Managing eight-terabyte Telco SAS Warehouse and make use of

Transkript

Managing eight-terabyte Telco SAS Warehouse and make use of
Managing eight-terabyte Telco
SAS Warehouse and make use
of SPD Server for performance
enhancements
Esra Bayram
TURKCELL
The Premier GSM Operator in
Turkey
Corporate Communications
Turkcell Genel Sunum Ing 280206
28.02.2006
Table of Contents
•
•
•
•
•
•
•
•
•
About Turkcell
Turkcell Data Mining Environment
Turkcell SAS Warehouse Architecture
Turkcell Data Mining Projects
Turkcell SAS Warehouse ETL
SPDS Server Configuration
SPDS Server Security Model
SPDS Server Enhancements
SAS EM4.3 Performance Tips
Corporate Communications
Turkcell Genel Sunum Ing 280206
28.02.2006
About Turkcell
ƒ
ƒ
ƒ
ƒ
ƒ
ƒ
Leading GSM operator with 28.7
million1
Among the top five mobile
operators in Europe in terms of
number of subscribers
Capital expenditures of US$4.7 billion
only in Turkey as of December 2005
Covers almost 100% of Turkey with
over 10,000 base stations
Among the leaders in Europe with
respect to GPRS and MMS services
offered to customers
Global operator with international
operations and worldwide roaming
including international, GPRS and
Camel
1. As of 31 March 2006
Corporate Communications
Turkcell Genel Sunum Ing 280206
28.02.2006
About Turkcell
ƒ
ƒ
ƒ
ƒ
ƒ
ƒ
Directly and indirectly creates
employment for more than 20,000
people in Turkey alone
The only Turkish company listed on the
New York Stock Exchange (NYSE)
Represented on the Board (21 members)
and Executive Management Committee
of the GSM Association (13 members)
Promotes ISO9001, EU and US regulatory,
financial and corporate governance
standards throughout its operations
Massive tax contribution to the Turkish
government
Socially responsible with extensive and
internationally renowned educational,
cultural and sports sponsorships
Corporate Communications
Turkcell Genel Sunum Ing 280206
28.02.2006
Coverage Area
ƒ
As of end of March 2006, Turkcell covers 96.58% of
Turkish population and 78.48% of the country’s
geography.
Corporate Communications
Turkcell Genel Sunum Ing 280206
28.02.2006
INTERNATIONAL ROAMING
ƒ
Turkcell is among the
top ten operators for
international roaming
and is the world’s #1 for
GPRS roaming
ƒ
Oceans are also
included in Turkcell
roaming and GPRS
roaming coverage
through an agreement
reached with MCP
Norway in 2005
Corporate Communications
Turkcell Genel Sunum Ing 280206
28.02.2006
As April 2006;
International Roaming: 501 operators in 191 countries
GPRS Roaming: 250 operators in 98 countries
Active Camel Roaming: 112 operators in 68 countries
Passive Camel Roaming: 119 operators in 68 countries
International and International GPRS Roaming on Oceans
Turkcell Services
ƒTurkcell offers products and services
concurrently with global operators
Mobile Payment
Mobile e-mail
Ring Back Tone
Mobile Music
Mobile Video
Messenger
Visual Radio
Push to Talk
WAP Blog
Corporate Communications
Turkcell Genel Sunum Ing 280206
28.02.2006
Corporate Solutions
ƒBy leveraging advancements in GSM
technology, Turkcell is making mobile
solutions an essential tool for business
and government
Blackberry
Teleconference
m-municipality
m-government
Messaging services
Mobile marketing
IVR Services
Corporate Communications
Turkcell Genel Sunum Ing 280206
28.02.2006
Turkcell Data Mining Environment - 1
Data Mining
EUL Data Mart
(Viper)
SAS EM5.3
SAS EG 3.0
SAS/ACCESS Interface to PC Files
SAS/ETS Software
SAS/STAT Software
Data Mining
ETL Data Mart
(Miner)
OLAP TOOL
TURKCELL
DATAWAREHOUSE
(20TB)
ETL
EXTRACT / TRANSFORM / LOAD
Customer
Billing
OPERATIONAL DATA
Corporate Communications
Turkcell Genel Sunum Ing 280206
28.02.2006
Call
Turkcell Data Mining Environment - 2
Data Mining ETL Data Mart(Miner)
•
•
•
•
•
•
Sun Fire 4800
Hostname : Miner
12X1200MHz CPU (1200 MHz),
24 GB Memory
11.5 TB Disk Space (8.5 TB Full )
Development and Production(ETL) Environment
•
•
•
•
•
•
•
•
•
•
•
•
•
SAS Base 9.1.3 ,
SAS/Share,
SAS/Stat,
SAS/Access to Oracle,
SAS/Access to Sybase,
SAS/Warehouse Administrator,
SAS/Connect,
SAS/Enterprise Miner,
SPDS 4.3 (Dev/Prod)
SAS Integration Technologies,
SAS/GRAPH,
SAS/ETS,
SAS/QC
Corporate Communications
Turkcell Genel Sunum Ing 280206
28.02.2006
Turkcell Data Mining Environment - 3
Data Mining EUL Data Mart(Viper)
•
•
•
•
•
•
•
Sun V490
Hostname : Viper ,
4X1500MHz CPU
16 GB Memory
1.6 TB Disk Space (1.1 TB Full )
Server Environment
– SAS Enterprise Miner
– SAS Integration Technologies
– Enterprise Miner Server
– Enterprise Miner Thin Client
– SAS/ACCESS Interface to ORACLE
User Environment – Windows
– SAS EM5.2
– SAS EG 3.0
– SAS/ACCESS Interface to PC Files
– SAS/ETS Software
– SAS/STAT Software
Corporate Communications
Turkcell Genel Sunum Ing 280206
28.02.2006
Turkcell SAS Warehouse Architecture-1
Development Environment
•
•
•
On Miner Server
Different SPDS4.3 Server
Different Libraries (tmp_dev, odd_dev, dm_dev, dev_user libraries)
libname=tmp_dev pathname=/cadm18/SPDS4/meta/tmp_dev owner=sasspds
roptions="metapath=('/cadm18/SPDS4/meta/tmp_dev')
datapath=('/cadm17/SPDS4/data/tmp_dev' '/cadm18/SPDS4/data/tmp_dev'
'/cadm19/SPDS4/data/tmp_dev')
indexpath=('/cadm19/SPDS4/index/tmp_dev')
•
•
•
•
•
Different file systems for SPDS libraries
Different Unix Users
Logically separated from Production Environment
Read permission on Production Libraries
Different SPDS and SAS Work
Corporate Communications
Turkcell Genel Sunum Ing 280206
28.02.2006
Turkcell SAS Warehouse Architecture-1
Production Environment
•
•
•
On Miner Server
Different SPDS4.3 Server
Different Libraries (tmptcell, oddtcell, dmtcell)
libname=tmptcell pathname=/cadm13/SPDS4/meta/tmptcell owner=sasspds
roptions="metapath=('/cadm13/SPDS4/meta/tmptcell')
datapath=('/cadm24/SPDS4/data/tmptcell' '/cadm29/SPDS4/data/tmptcell'
'/cadm35/SPDS4/data/tmptcell' '/cadm04/SPDS4/data/tmptcell'
'/cadm12/SPDS4/data/tmptcell' '/cadm33/SPDS4/data/tmptcell')
indexpath=('/cadm24/SPDS4/index/tmptcell' ‘
‘/cadm29/SPDS4/index/tmptcell')"
•
•
•
•
•
Different file systems for SPDS libraries
Different Unix User
Logically separated from Development Environment
Read Permission on Development Libraries
Different SPDS and SAS Work
Corporate Communications
Turkcell Genel Sunum Ing 280206
28.02.2006
Turkcell SAS Warehouse Architecture-2
End User Environment
•
•
•
On Viper Server, Windows
SPDS4.3 Server is installed
Each End user and EM Projects have their own libraries
– 29 end users, 20 EM projects
– libnames.parm
libname=usr_a pathname=/cadm01/spdsmeta/usr_a owner=sasspds
roptions="metapath=('/cadm01/spdsmeta/usr_a')
datapath=('/cadm02/spdsdata/usr_a' '/cadm03/spdsdata/usr_a'
'/cadm04/spdsdata/usr_a' )
indexpath=('/cadm01/spdsindex/usr_a')“
libname=prj_a pathname=/cadm01/spdsmeta/prj_a owner=sasspds
roptions="metapath=('/cadm01/spdsmeta/prj_a')
datapath=('/cadm02/spdsdata/prj_a' '/cadm03/spdsdata/prj_a'
'/cadm04/spdsdata/prj_a' )
indexpath=('/cadm01/spdsindex/prj_a')“
•
Read permission on Production and Development Libraries
Corporate Communications
Turkcell Genel Sunum Ing 280206
28.02.2006
Turkcell Data Mining Projects
• Payment Behavior Segmentation (Credit Scoring)
• Application Scoring
• Customer Value Score (CVS)
• Churn Prediction
• Needs Based Segmentation
• Credit Limit
Corporate Communications
Turkcell Genel Sunum Ing 280206
28.02.2006
Turkcell SAS Warehouse – ETL
• LSF ( Screen shot)
• How is the projects scheduled ?
• How many processes are running ?
• How is the new projects are added ?
Corporate Communications
Turkcell Genel Sunum Ing 280206
28.02.2006
LSF and SAS/Warehouse Administrator is used for job scheduling
Corporate Communications
Turkcell Genel Sunum Ing 280206
28.02.2006
Appropriate Calendar is created in LSF
Codes are copied from Development Env to Production Env
Codes are defined in a SAS catalog
Jobs are defined for SAS catalog programs in SAS/WA Process Editor
Using LSF Scheduler Add-Ins jobs are sent to LSF
Corporate Communications
Turkcell Genel Sunum Ing 280206
28.02.2006
Calendars used :
Project Name
AppMDL
Credit Scoring
CreditScoringUPS
CreditScoringCPS
AppDay
Loyalty
Churn
YTS
CoprCVS
CVS
CBS
AppMon
Credit Scoring Reporting
Calendar Name
miner_AppMDL
miner_billing
miner_billing
miner_billing
miner_daily
miner_Loyalty
miner_monthly
miner_monthly
miner_monthly
miner_monthly
miner_monthly
miner_reporting
miner_reporting
Calendar Definition
Application Scoring Model Development
after 3rd billing date of each month
after 3rd billing date of each month
after 3rd billing date of each month
daily at 01:00 pm
Loyalty Run Time
5-8 th day of each month
5-8 th day of each month
5-8 th day of each month
5-8 th day of each month
5-8 th day of each month
1-5 th day of each month
1-5 th day of each month
Period
Updated Calendar
Name
Update the autoexec yearmonth
Miner_Monthly
miner_monthly
YES
miner_billmonth
miner_billing
YES
miner_billreport
miner_billreport
miner_billrepmonth
Corporate Communications
Turkcell Genel Sunum Ing 280206
28.02.2006
Extraction from Oracle:
104jobs, 130Hrs
Transformation in SAS:
161 jobs, 120Hrs
68jobs, 120 Hrs
24jobs, 20Hrs
miner_billrepmonth
miner_reporting
Concurrent ETL Sessions: 8
Upload from SAS:
miner_billmonth
Miner_Reporting
ETL Period : 20 days
DataMart in SAS:
miner_billrepmonth
Miner_Billing
# of jobs : 357 LSF jobs
YES
SPD Server Configuration-1
•
There are three SPD Servers for Development, Production and End User Layer
Miner(ETL machine)
Development SPDS4.3
spdsna
5180/tcp
spdssn
5181/tcp
Production SPDS4.3
spdsna43
5200/tcp
spdssn43
5201/tcp
Viper (EUL machine)
EUL SPDS4.3
spdsna43
5200/tcp
spdssn43
5201/tcp
Corporate Communications
Turkcell Genel Sunum Ing 280206
28.02.2006
#SPDS name server port for SAS9
#SPDS Snet server port for SAS9
#SPDS name server port for SAS9 SPDS4.3
#SPDS Snet server port for SAS9 SPDS4.3
#SPDS name server port for SAS9 SPDS4.3
#SPDS Snet server port for SAS9 SPDS4.3
SPD Server Configuration-2
Production spdsserv.parm
Development spdsserv.parm
EUL spdsserv.parm
SORTSIZE=2048M;
INDEX_SORTSIZE=512M;
GRPBYROWCACHE=256M;
BINBUFSIZE=32K;
INDEX_MAXMEMORY=30M;
WORKPATH="('/cadm03/SpdsW
ork4.3‘)”;
NOCOREFILE;
SEQIOBUFMIN=64K;
RANIOBUFMIN=4K;
MAXWHTHREADS=32;
MAXSEGRATIO=75;
WHERECOSTING;
RANDOMPLACEDPF;
SORTSIZE=2048M;
INDEX_SORTSIZE=128M;
GRPBYROWCACHE=128M;
BINBUFSIZE=32K;
INDEX_MAXMEMORY=30M;
WORKPATH="('/cadm38/SpdsWo
rkDev4.3')";
NOCOREFILE;
SEQIOBUFMIN=64K;
RANIOBUFMIN=4K;
MAXWHTHREADS=32;
MAXSEGRATIO=75;
WHERECOSTING;
RANDOMPLACEDPF;
MINPARTSIZE=2048M;
SORTSIZE=2048M;
INDEX_SORTSIZE=512M;
GRPBYROWCACHE=256M;
BINBUFSIZE=32K;
INDEX_MAXMEMORY=30M;
WORKPATH="('/cadm01/Spds
Work4.3' )";
NOCOREFILE;
SEQIOBUFMIN=64K;
RANIOBUFMIN=4K;
MAXWHTHREADS=16;
MAXSEGRATIO=75;
WHERECOSTING;
RANDOMPLACEDPF;
MINPARTSIZE=512M;
MINPARTSIZE=2048M;
Corporate Communications
Turkcell Genel Sunum Ing 280206
28.02.2006
SPD Server Security Model-1
•
Unix user is created first.
– useradd –u userid username
•
This user must be added to the SPDS User list.
– Psmgr
• add yyucel SASpw9 SASpw9 3 - - SUPPORT - • chgpass yyucel SASpw9 SASpw2 SASpw2
•
A macro is written to assign security an easy way it accepts two parameters,
library to be secured and users who have write access all the rest will have
read access
•
Users are login with their Unix users password, and assign the library with the
spds user created the same as unix user.
libname tmp_dev sasspds 'tmp_dev' server=miner.spdsna43
user="&SYSUSERID" passwd="SASpw2" IP=YES;
Corporate Communications
Turkcell Genel Sunum Ing 280206
28.02.2006
SPD Server Security Model-2
%macro libsecure(_libname=, _users=);
...........................................................
proc SPDO lib=&_libname;
set ACLTYPE DATA;
set acluser &user;
delete acl _all_;
%if &user=sasadm %then
add acl/libname persist;
add acl /generic persist;
modify acl _all_/read nowrite noalter nocontrol;
..........................................................
%do ind2=1 %to &numofusers;
%let userpass = %sysfunc(scan(&_users,&ind2,' '));
&userpass=(y,y,y,y)
%end; /*do loop*/
quit;
......................................................
%mend;
libname tmp_dev sasspds 'tmp_dev' server=miner.spdsprod user="sasadm" passwd="SASpw2"
unixdomain=yes netcomp=no aclspecial=yes;
%libsecure(_libname=tmp_dev, _users=ebayram sasprod churnusr sasdev hsari ukaradag);
Corporate Communications
Turkcell Genel Sunum Ing 280206
28.02.2006
SPD Server Security Model-3
Corporate Communications
Turkcell Genel Sunum Ing 280206
28.02.2006
SPD Server Enhancements-1
•
MINPARTSIZE=2GB
Most of the tables are bigger than 2GB. MINPARTSIZE was 128M before.
Increasing to 2GB, result in less data partitions for a table and decrease the
number of I/O requests.
Test: Reading from a table having 22 partitions was 100% faster than reading
from the same table having only 351 partitions.
•
SORTSIZE
Increased from 48MB to 2048MB
Few MERGE jobs would benefit from the extra memory allocated for SORT.
The machine has 24GB of RAM.
•
The MAXWHTHREAD server parameter was set to 64 threads. During
scalability tests, it was determined that 32 threads were adequate. By
cutting the number by half, This would prevent the server from over
threading and further reduce thread contention during queries.
Corporate Communications
Turkcell Genel Sunum Ing 280206
28.02.2006
SPD Server Enhancements-2
126
127
128
129
130
131
132
133
134
135
136
137
data &perm_tmp_lib..Corporate_Indicators30 ;
merge
&perm_tmp_lib..Corporate_Indicators(in=in1)
&perm_tmp_lib..SegmentIndicators_&RefPeriod
&perm_tmp_lib..ScoreIndicators_&RefPeriod
&perm_tmp_lib..NumDiffBnoIndicators_&RefPeriod
;
by Party_id;
if in1;
run;
NOTE: There were 170660 observations read from the data set TMPTCELL.CORPORATE_INDICATORS.
NOTE: There were 35093410 observations read from the data set TMPTCELL.SEGMENTINDICATORS_200508.
NOTE: There were 27819995 observations read from the data set TMPTCELL.SCOREINDICATORS_200508.
NOTE: There were 38038783 observations read from the data set TMPTCELL.NUMDIFFBNOINDICATORS_200508.
NOTE: The data set TMPTCELL.CORPORATE_INDICATORS30 has 170660 observations and 1412 variables.
NOTE: Compressing data set TMPTCELL.CORPORATE_INDICATORS30 decreased size by 63.23 percent.
NOTE: DATA statement used (Total process time):
real time
1:19:12.66
cpu time
56:52.57
Corporate Communications
Turkcell Genel Sunum Ing 280206
28.02.2006
SPD Server Enhancements-3
140
proc sql;
141
create table &perm_tmp_lib..Corporate_Indicators31 as select a.*, b.*,c.* ,d.*
142
from &perm_tmp_lib..Corporate_Indicators a
143
left join &perm_tmp_lib..SegmentIndicators_&RefPeriod b on a.party_id=b.party_id
144
left join &perm_tmp_lib..ScoreIndicators_&RefPeriod c on a.party_id=c.party_id
145
left join &perm_tmp_lib..NumDiffBnoIndicators_&RefPeriod d on a.party_id=d.party_id
146
;
NOTE 49-169: The meaning of an identifier after a quoted string may change in a future SAS release. Inserting
white space between
a quoted string and the succeeding identifier is recommended.
NOTE 49-169: The meaning of an identifier after a quoted string may change in a future SAS release. Inserting
white space between
a quoted string and the succeeding identifier is recommended.
NOTE: Physical Name: TMPTCELL
NOTE: Compressing data set TMPTCELL.CORPORATE_INDICATORS31 decreased size by 63.23 percent.
NOTE: Table TMPTCELL.CORPORATE_INDICATORS31 created, with 170660 rows and 1412 columns.
147
quit;
NOTE: PROCEDURE SQL used (Total process time):
real time
25:30.96
cpu time
55.61 seconds
Corporate Communications
Turkcell Genel Sunum Ing 280206
28.02.2006
SPD Server Enhancements-4
177
178
proc sql;
create table &perm_tmp_lib..Corporate_Indicators2 as select a.*, b.*,c.*
,d.*,e.*,f.*,g.*,h.*,j.*,k.*,l.*,m.*,n.*
179
from &perm_odd_lib..Corporate_Indicators a
180
left join &perm_tmp_lib..SegmentIndicators_&RefPeriod b on a.party_id=b.party_id
181
left join &perm_tmp_lib..ScoreIndicators_&RefPeriod c on a.party_id=c.party_id
182
left join &perm_tmp_lib..NumDiffBnoIndicators_&RefPeriod d on a.party_id=d.party_id
183
left join &perm_tmp_lib..SekreterCellIndicators_&RefPeriod e on a.party_id=e.party_id
184
left join &perm_tmp_lib..ServiceIndicators_&RefPeriod f on a.party_id=f.party_id
185
left join &perm_tmp_lib..TelephoneNumHistInd_&RefPeriod g on a.party_id=g.party_id
186
left join &perm_tmp_lib..AccountStatsIndicators_&RefPeriod h on a.party_id=h.party_id
187
left join &perm_tmp_lib..Postpaidpayment_&RefPeriod j on a.party_id=j.party_id
188
left join &perm_tmp_lib..Freeminuteindicators_&RefPeriod k on a.party_id=k.party_id
189
left join &perm_tmp_lib..Freeminuteindicators2_&RefPeriod l on a.party_id=l.party_id
190
left join &perm_tmp_lib..MergeCallStatsIndicators_&RefPeriod m on a.party_id=m.party_id
191
left join &perm_tmp_lib..Loyaltyindicators_&RefPeriod n on a.party_id=n.party_id
192
;
NOTE: PROCEDURE SQL used (Total process time):
real time
1:16:48.51
cpu time
1:15.06
Corporate Communications
Turkcell Genel Sunum Ing 280206
28.02.2006
SPD Server Enhancements-5
•
Adding option IP=YES to all SPD Server libname statements.
The option would enable implicit SQL pass-through
Counting the rows of a table with proc sql is very fast now. In SAS
EM 4.3 also, when you choose a table from Input Data Source, all
the rows are counted. For a 4GB table, it lasts 2 minutes to count
the rows before, but 2 seconds after the option is set. This option will
increase the speed of choosing a table from Input Data Source
node in case of counting the rows.
Corporate Communications
Turkcell Genel Sunum Ing 280206
28.02.2006
SAS EM4.3 Performance Tips-1
• Duration of opening an EM4.3
project is too long.
• Duration of listing the libraries in
IDS is too long.
We decreased the number of
libraries from 67 to 26
Corporate Communications
Turkcell Genel Sunum Ing 280206
28.02.2006
Number
of
Libraries
Duration to
open the
EM4.3
project (sec)
Duratio to list
the libraries
in IDS (sec)
3
42
9
10
50-70
17
15
50-70
18
25
100-120
42
35
100-130
41
45
100-130
46
55
100-130
54
67
100-130
63
SAS EM4.3 Performance Tips-2
#of
TMPTCELL
Libraries (769 tables)
• Duration of listing the tables in
IDS is too long.
Corporate Communications
Turkcell Genel Sunum Ing 280206
28.02.2006
EERDAL
(42 tables)
3
120
5
10
131
5
15
87
6
25
99
4
35
92
5.5
45
86
6
55
96
5.5
67
96
6
SAS EM4.3 Performance Tips-3
# of
Records
• Duration of
opening a table
in IDS is too long.
Corporate Communications
Turkcell Genel Sunum Ing 280206
28.02.2006
100,000
200,000
300,000
400,000
500,000
600,000
700,000
800,000
900,000
1,000,000
1,200,000
1,400,000
1,600,000
1,800,000
2,000,000
2,159,786
2,159,786
2,159,786
2,159,786
2,159,786
2,159,786
2,159,786
2,159,786
2,159,786
# of
Columns
564
564
564
564
564
564
564
564
564
564
564
564
564
564
564
50
100
150
200
250
300
350
400
564
Table Name
Size of Table
(MB)
Duration to
open a table
in IDS(secs)
ODDTCELL.TEST100bin
ODDTCELL.TEST200bin
ODDTCELL.TEST300bin
ODDTCELL.TEST400bin
ODDTCELL.TEST500bin
ODDTCELL.TEST600bin
ODDTCELL.TEST700bin
ODDTCELL.TEST800bin
ODDTCELL.TEST900bin
ODDTCELL.TEST1000bin
ODDTCELL.TEST1200bin
ODDTCELL.TEST1400bin
ODDTCELL.TEST1600bin
ODDTCELL.TEST1800bin
ODDTCELL.TEST2000bin
ODDTCELL.NTEST50
ODDTCELL.NTEST100
ODDTCELL.NTEST150
ODDTCELL.NTEST200
ODDTCELL.NTEST250
ODDTCELL.NTEST300
ODDTCELL.NTEST350
ODDTCELL.NTEST400
CKOYLU.NEEDS_FINAL
402
803
1,205
1,606
2,008
2,410
2,811
3,213
3,614
4,016
4,819
5,622
6,425
7,229
8,032
788
1,606
2,430
3,279
4,185
4,663
5,487
6,265
8,673
0.3
0.6
1.0
1.3
2.0
2.7
3.1
2.6
3.2
3.6
4.1
5.2
5.3
6.0
7.3
0.5
1.1
1.9
2.7
6.4
2.8
6.0
4.6
7.1
THANK YOU
Corporate Communications
Turkcell Genel Sunum Ing 280206
28.02.2006

Benzer belgeler