Introduction to SAS 9
Creation date: 03/01/05
Author: Patrick McLeod
Objectives: This is the first of two courses in the SAS short course series. It is designed for beginning users who want to get started with the program and experienced users who want to acquaint themselves with the latest version of the SAS system. We focus on the latest version 9.1 in this course. After this course, you should be able to:
1. Understand how SAS processes a program;
2. Be familiar with the simple data handling process;
3. Get accustomed to the SAS reiterative process;
4. Manage data sets in SAS;
5. Perform simple statistical analysis.
I. What is SAS?
II. Who can use SAS for Windows?
III. Using SAS for Windows
IV. SAS data handling - Data Step
V. SAS Files
VI. SAS Procedures
Appendix: SAS function keys
I. What is SAS?
SAS stands for Statistical Analysis System. It is a statistical and information system that performs sophisticated data management and statistical analysis. SAS is available in multiple computing environments. At the University of North Texas, we have licensed the software for several operating systems, including Windows (version 9.1.3 is the most current), Mac (version 6.12 is the most current) and UNIX. In this series, we will focus on SAS 9.1 for Windows, which is a complete data analysis program with capabilities comparable to, and in some respects surpassing, its counterparts SPSS, S-Plus and Stata. SAS for Windows will do every task that other editions of SAS do; in addition, it is easy to use, and its graphical user interface can do far more in graphical analyses than the retired mainframe or current UNIX versions. The only limitation to SAS for Windows is the hardware that it runs on. SAS 9.1 for Windows requires a fully patched Windows 2000 or Windows XP operating system with at least a Pentium III processor. SAS 9.1 is a large program and requires at least a gigabyte of space on your hard disk; I would not recommend installing and running SAS 9.1 on a computer with less than 128MB of RAM. As with many Windows applications, the more RAM your machine has, the faster the program will run. SAS 9.1 will not work on any version of Windows prior to Windows 2000. If you cannot upgrade your OS and you need SAS, Research and Statistical Support can provide you with a currently licensed version of SAS 8.2.
II. Who can use SAS for Windows?
SAS software is distributed through the university's site license agreement with SAS Institute. UNT has a site license that allows students to use the software in any general access lab on campus or to purchase time-limited copies through the Union Bookstore. These copies expire at the end of the licensing period in which the software is produced (our SAS licenses typically expire on October 31 of each year, with a grace period that runs through January 1st). Full-time faculty and staff can request SAS installation on their desktop and laptop machines, on campus or at home. Check the UNT bookstore or the RSS office for more details pertaining to the student version of SAS 9.
UNT students and faculty are also eligible to use SAS on platforms other than Windows. As of this writing, SAS 9 is available on sol, our Solaris server. Users who want to use this version need a UNIX/sol account. Applications for these accounts are available from the Computing Center Helpdesk (565-2324).
III. Using SAS for Windows
1. Windows Display System
SAS 9 extends the basic three-window display system of earlier versions with three additional elements. The Explorer window and Results window give users better control over, and organization of, SAS objects (e.g. SAS libraries, SAS data sets) and system output. The Enhanced Editor is a more sophisticated programming editor equipped with color coding, macro features and better customizability for programmers' needs. For more features of the Enhanced Editor, see Appendix I. Although there are more windows to deal with, the SAS 9 workspace is more versatile and easier to use with the new windowing design.
In its original design, the SAS Display Manager System is composed of three windows labeled Program Editor, Log, and Output. The first is for program input, while the latter two provide the system log information and the output of the program, respectively. The log consists primarily of diagnostics of the program syntax: it indicates whether the program was written correctly and gives hints for debugging if it was not. It also reports information about the site license, the user, and the SAS release number. The Output window shows the results produced by SAS procedures. If the program does not run correctly, the Output window will usually be blank or show the previous output.
The new Explorer window provides easy access to SAS objects such as File Shortcuts and Libraries. You can assign SAS programs, or different versions of the same program, to a clickable shortcut object using a File Shortcut. Accessing SAS libraries and data sets is much easier in version 8. All objects are clickable, and clicking one opens the appropriate window, such as the Viewtable window for data sets.
The Results window is another Windows Explorer-like utility that allows output to be organized in a hierarchical fashion.
2. Keys Window - A Road Map to SAS Windows
The new SAS 9 has even more windows than its predecessor. Apart from the three default windows, they include the Library window, Filename window, Viewtable window, Keys window, Options window and Graphic windows, to name a few. With this many windows, it helps to have a "road map" for moving around the SAS workspace. The Keys window plays that role by giving you shortcuts to the different windows. To activate the Keys window, click on the Command box at the upper left corner and type "keys" (case-insensitive).
The Keys window allows you to assign function keys to switch to the most frequently used windows and perform the most used functions. By default, F5, F6 and F7 are assigned to the default windows (Program Editor, Log and Output respectively), as in previous versions and in other operating environments. For the rest of the function keys and hot keys (combinations of function or letter keys with the Control and/or Shift keys), you are free to assign whatever you use most. For instance, assigning F12 to "next" allows you to switch to the next window, and assigning F2 to "lib" opens the library window. Using function keys is much more convenient than clicking through the menus to reach frequently used functions or windows. An alternative is to take advantage of the window tabs at the bottom of the SAS window. Modeled on the Windows taskbar, this SAS taskbar provides shortcuts to the open windows.
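The same key assignments can also be made from a program rather than by typing into the Keys window. The following is a minimal sketch, assuming the display manager KEYDEF command is issued through the DM statement; the key choices (F12 and F2) simply mirror the examples in the text.

```sas
/* Assign function keys from a program instead of the Keys window.  */
/* Assumption: KEYDEF is run as a display manager command via DM.   */
dm 'keydef f12 next';   /* F12 switches to the next window */
dm 'keydef f2 lib';     /* F2 opens the library window     */
```

Assignments made this way can be reviewed afterwards in the Keys window.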
3. Entering SAS Statements
To enter SAS statements, simply move the cursor in the Enhanced Editor window and click the left mouse button.
A SAS program normally starts with a data step. Each statement can go on several lines, but it MUST end with a semi-colon. If you want to go to the next line, simply press ENTER.
Conventions on Windows operations are applicable in SAS for Windows. Cutting and pasting, for example, make program editing much easier for users.
By default, SAS for Windows displays the five windows simultaneously, but you can select Cascade or Tile under the Window option on the menu bar to choose the display format.
Normally, only one window is active at a time. To activate a window, move the cursor to it and click within the window area; the color change on the title bar (the top bar of the window) shows which window is active. When you enter a SAS session, the Program Editor window is active by default. To make the LOG window active, type LOG in the command box located at the left-hand corner underneath the menu bar.
4. Submitting SAS Statements
When you have entered your program correctly, you are ready to submit these statements for execution. There are many ways to do so. You can:
1. Highlight the syntax and press F3 or;
2. type SUBMIT at the command box right below the menu bar or;
3. click your right mouse button and select LOCAL, SUBMIT.
Processing of a SAS Program
If something goes wrong in your SAS statements, SAS will issue error messages in the LOG window. To check for error messages, go to the LOG window by typing LOG at the Program Editor command line. Use the Page Up and Page Down keys to scroll through the window. Once you have located the error, you may want to go back to your SAS program and make some changes. Type PGM at the command line in the LOG window to make the Program Editor window active, then type RECALL at the Program Editor command line to get your SAS program back. An easier way to recall the program is to press F4.
5. Saving SAS Statements
If you wish to save your SAS program, click File on the menu and select Save. Give a file name like "A:\mypgm.sas", which saves the SAS program file onto your floppy diskette. Alternatively, you can type "FILE A:\mypgm.sas" in the command box. This saves everything in the Program Editor window to drive A: under the name MYPGM.SAS. The same applies to the LOG and OUTPUT windows. Note that by convention, the file extension .sas stands for SAS program files, .log for SAS log files and .lst for listing or output files.
*.sas - SAS program file
*.log - SAS log file
*.lst - SAS output file
6. Bringing SAS Programs into a SAS Session
If you want to bring a file into the Program Editor window once you are in a SAS for Windows session, type INCLUDE 'A:\MYPGM.SAS' at the Program Editor command line. SAS will retrieve the file named 'MYPGM.SAS' from drive A: into the Program Editor window. An alternative is to use the menu bar and choose from the File option.
7. Ending a SAS Session
To end a SAS session, double click the uppermost left hand corner button or type ENDSAS (abbreviated: ENDS) at the command line in any window. You can close any window by typing END at the command line when the window is active.
IV. SAS data handling - Data Step
Normally, SAS users do not pay attention to what type of files SAS uses in a SAS session. This section distinguishes several types of files that SAS can handle. Knowing this, you will be able to use each type of file advantageously. The following flow chart illustrates how SAS data sets are processed:
A data file that has been entered in the SAS DATA step after the CARDS (or DATALINES) statement needs to be converted into a SAS data file before SAS can use it. The DATA step takes care of this:
data sample;   /* data set name assumed; the original name was lost */
   input x y z @@;
   datalines;
1 2 3 4 5 6 7 8 9
;
run;
This sample SAS program creates a temporary data set, using the DATALINES statement to read the in-line data. The DATA statement names the temporary data set; the INPUT statement defines the variables and their formats; the DATALINES statement tells SAS to start reading the data lines that follow. The trailing @@ on the INPUT statement holds the current line so that several observations can be read from it; here the nine values yield three observations of x, y and z. The RUN statement ends the step and submits it for processing.
The most common type of data is variously referred to as an External File, Raw File, or Text File. These files share the same characteristics: they are made up of numbers and/or characters, and they can be processed by other programming languages as well as SAS. There are two ways to incorporate this kind of file into a SAS program. The first, and the more commonly used, is to put the data after a CARDS statement as in the previous example. The other is to refer to the location of the data in the SAS program. The latter method is more efficient because it keeps the size of your SAS program manageable, especially when your data set has over a thousand observations. The following SAS program shows how to use the latter method.
FILENAME DATAIN 'A:\COUNTRY.DAT';
DATA COUNTRY;
   INFILE DATAIN;
   INPUT DEC 1 ID 2-4 NAME $CHAR26. SSCODE 31-33 CONTIN $ 34-35
         DODEV 36 POPULATE 37-43 AREA 44-49 GNP 50-56
         MILEXPED 57-64 .1 PEDEXPED 65-71 .1;
PROC PRINT DATA=COUNTRY;
RUN;
The FILENAME statement tells SAS to use DATAIN as a file reference (fileref) for the external file 'A:\COUNTRY.DAT'. The INFILE statement tells SAS to read the data from the file that DATAIN refers to, that is, 'COUNTRY.DAT' on drive A:.
The data file formats supported in SAS 9 include dBASE files, Lotus 1-2-3 files, Microsoft Excel spreadsheets and Microsoft Access tables. The Import Wizard guides the user through creating a data set from files in these formats. To import files, click on File on the menu and select Import as in the following:
The Import Wizard will guide you through the importation process.
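The import can also be written as a program with PROC IMPORT instead of clicking through the wizard. The following is a minimal sketch: the workbook path A:\survey.xls and the data set name SURVEY are hypothetical, and DBMS=EXCEL assumes that SAS/ACCESS Interface to PC Files is licensed at your site.

```sas
/* Programmatic counterpart of the Import Wizard (a sketch).          */
/* Assumptions: the workbook path and the data set name SURVEY are    */
/* hypothetical; DBMS=EXCEL needs SAS/ACCESS Interface to PC Files.   */
proc import datafile='A:\survey.xls'
            out=work.survey
            dbms=excel
            replace;
   getnames=yes;   /* use the first spreadsheet row as variable names */
run;

/* Verify the first few imported observations */
proc print data=work.survey (obs=10);
run;
```

Keeping the import in a program makes it easy to rerun when the spreadsheet changes.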
After reading in the data, you can check the data under Libraries in the Explorer window. Double-click the Libraries icon and open the Library window. The data file just created is in the WORK library.
V. SAS Files
SAS uses a special data format during data processing. This format is called a SAS data file or system file. If the data file you tell SAS to use is not a SAS file, SAS converts it to one before it starts processing the data. SAS files have special characteristics that make them more convenient and efficient for SAS to use. There are two types of SAS files: SAS Data Sets (*.sas7bdat) and SAS Catalogs (*.sas7bcat). The most commonly used is the SAS Data Set. In a SAS Data Set, variable names, variable labels, and variable formats are recorded together with the variable values.
A SAS file name is somewhat different from other types of data file names. A complete SAS file name consists of two parts separated by a period, for example PROJECT1.FITNESS. The first part is called the first-level name or libref, and identifies the directory or library where the file is saved. The second part, the second-level name, identifies the specific file in that directory or library. Anyone can create a SAS Data Set from a regular file. The following Example 3 shows how to do this.
TITLE 'SAS SAMPLE - COUNTRY DATA';
LIBNAME MYDATA 'A:\MYDATA';
DATA MYDATA.COUNTRY;
   ARRAY MSG GNP MILEXPED PEDEXPED;
   INFILE 'A:\country.dat';
   INPUT DEC 1 ID 2-4 NAME $CHAR26. SSCODE 31-33 CONTIN $ 34-35
         DODEV 36 POPULATE 37-43 AREA 44-49 GNP 50-56
         MILEXPED 57-64 .1 PEDEXPED 65-71 .1;
   LABEL NAME     = 'COUNTRY NAME'
         CONTIN   = 'CONTINENT'
         DODEV    = 'DEGREE OF DEVELOPMENT'
         GNP      = 'GNP IN MILLIONS OF DOLLARS'
         MILEXPED = 'MILITARY EXPENDITURE IN MILLIONS OF $'
         PEDEXPED = 'PUB. EDUCATION EXPENDITURE IN MIL. $';
   /* Recode the missing-value codes to SAS missing values */
   DO OVER MSG;
      IF MSG = 9999999 OR MSG = 999999.9 OR MSG = 99999.9 THEN MSG = .;
   END;
RUN;
PROC PRINT DATA=MYDATA.COUNTRY;
RUN;
The LIBNAME statement directs SAS to associate the libref MYDATA with the directory A:\MYDATA. (The libref WORK is reserved for the temporary library, so a permanent data set needs a libref of its own.) After this job has been executed, you will have a SAS Data Set saved under A:\MYDATA\country.sas7bdat. Retrieving a SAS Data Set is easy because you do not have to tell SAS the variable names, variable formats, variable labels, and variable locations.
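To illustrate how little is needed to retrieve a saved SAS Data Set, here is a minimal sketch. The libref MYLIB and the output name GNP_KNOWN are arbitrary names chosen for this illustration; the directory and data set are those described above.

```sas
/* Re-use the permanent data set in a later session (a sketch).     */
/* Assumption: the libref MYLIB is an arbitrary name chosen here.   */
libname mylib 'A:\MYDATA';

/* No INFILE or INPUT statement is needed: variable names, labels   */
/* and formats are stored in the data set itself.                   */
data work.gnp_known;
   set mylib.country;
   where gnp ne .;      /* keep only countries with a recorded GNP */
run;

proc print data=work.gnp_known;
   var name contin gnp;
run;
```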
In a SAS for Windows session, you can have as many SAS data steps as you want. You can use the LIBNAME statement as often as you need to direct SAS for Windows to different SAS data directories. When a SAS program uses many SAS data files, SAS for Windows lets you keep track of the files and their variables.
SAS for Windows has LIBNAME, DIRECTORY, and VARIABLES windows. The LIBNAME window lists the SAS data libraries in use. The DIRECTORY window displays the SAS data files in a SAS data library or directory. The VARIABLES window lists the SAS variables in each SAS data file. To go to the LIBNAME window, type LIB at the command box; a list of libraries or directories will be shown in a new window (the LIB window). You can also go to the DIR and VAR windows directly by typing DIR and VAR, respectively, at the command prompt in any SAS for Windows Display Manager window. By default, SAS for Windows displays the current directory, which is the WORK directory. To display a different directory, type its name at the top of the window. You can do the same with the VAR window.
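The information shown in the DIR and VAR windows can also be written to the Output window from a program. The sketch below uses PROC DATASETS for this; the small DEMO data set and its values are made up purely so that the WORK library is not empty.

```sas
/* Create a small temporary data set so that WORK is not empty.  */
/* The name DEMO and the values are illustrative only.           */
data work.demo;
   input x y;
   datalines;
1 2
3 4
;
run;

/* List the members of the WORK library, then the variables of   */
/* each member (similar to the DIR and VAR windows).             */
proc datasets library=work;
   contents data=_all_;
quit;
```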
VI. SAS Procedures
The following covers some of the most commonly used SAS procedures with which you can run some basic statistical analyses.
1. PROC PRINT
PROC PRINT is frequently used to check the data being read by SAS. It prints out the observations in a SAS data set, using all or some of the variables.
The syntax is as follows:
PROC PRINT DATA= SAS-data-set
The most common use is to have the PROC PRINT following the data step to verify the data:
INPUT X Y;
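A complete version of this pattern might look like the following sketch. The data set name XY and the data values are made up for illustration; only the INPUT X Y statement comes from the example above.

```sas
/* Read two variables from in-line data (values are illustrative) */
data work.xy;
   input x y;
   datalines;
1 10
2 20
3 30
;
run;

/* Print the data set right after the DATA step to verify it */
proc print data=work.xy;
   var x y;
run;
```

If the printed values do not match the raw data, the INPUT statement is the first place to look.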
2. PROC CONTENTS
PROC CONTENTS prints descriptions of the contents of one or more files from a SAS data library. It is another common way to verify a data set that has been read into a SAS library, especially a sizeable one: it is crucial, for example, to check that all observations and variables were read in correctly. It is also useful for documenting permanent SAS data sets (library members of DATA type).
Specific information pertaining to the physical characteristics of a member depends on whether the file is a SAS data set or another type of SAS file.
PROC CONTENTS <DATA= <libref.>member>
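As a brief illustration, the sketch below runs PROC CONTENTS on SASHELP.CLASS, a sample data set shipped with SAS. That data set is not part of this course's materials and is used here only as an assumption so the example can run as written.

```sas
/* Describe one data set: number of observations, variable names, */
/* types, lengths, formats and labels.                            */
proc contents data=sashelp.class;
run;

/* List every member of a library without the per-variable detail */
proc contents data=sashelp._all_ nods;
run;
```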
3. PROC UNIVARIATE
This procedure is useful for basic descriptives of the variables. It provides detail on the distribution of a variable. Features include:
If a BY statement is used, descriptive statistics are calculated separately for groups of observations.
PROC UNIVARIATE DATA= SASdataset
ROUND= roundoff unit...;
OUTPUT OUT= SASdataset keyword= names...;
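The following sketch shows the BY and OUTPUT statements together. It assumes the SASHELP.CLASS sample data set shipped with SAS (not part of this course's materials); the output data set and variable names are arbitrary.

```sas
/* BY-group processing requires data sorted by the BY variable */
proc sort data=sashelp.class out=work.class;
   by sex;
run;

/* Descriptive statistics for height and weight, separately by sex. */
/* The OUTPUT statement saves the means and standard deviations;    */
/* the names after each keyword follow the order of the VAR list.   */
proc univariate data=work.class;
   by sex;
   var height weight;
   output out=work.stats mean=mean_height mean_weight
                         std=sd_height sd_weight;
run;
```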
4. PROC FREQ
The procedure produces one-way to n-way frequency and crosstabulation tables. It shows the distribution of variable values and crosstabulation tables with combined frequency distributions for two or more variables. For one-way tables, PROC FREQ can compute chi-square tests for equal or specified proportions. For two-way tables, PROC FREQ computes tests and measures of association. For n-way tables, PROC FREQ does stratified analysis, computing statistics within as well as across strata.
PROC FREQ options;
OUTPUT <OUT= SAS-data-set><output-statistic-list>;
TABLES requests / options;
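A short sketch of one-way and two-way requests follows, again assuming the SASHELP.CLASS sample data set shipped with SAS rather than a data set from this course.

```sas
proc freq data=sashelp.class;
   tables sex / chisq;       /* one-way table, test of equal proportions  */
   tables sex*age / chisq;   /* two-way crosstabulation, chi-square tests */
run;
```

With a data set this small, SAS may warn that expected cell counts are too low for the chi-square test to be reliable; the example is meant to show the syntax only.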
5. PROC TABULATE
PROC TABULATE constructs tables of descriptive statistics using class variables, analysis variables, and keywords for statistics. Tables can have one to three dimensions: column; row and column; or page, row, and column.
The statistics that PROC TABULATE computes are many of the same statistics computed by other descriptive procedures such as MEANS, FREQ, and SUMMARY. In order for PROC TABULATE to execute, you need either a CLASS or VAR statement, and a TABLE statement. There are no default variables chosen for the procedure.
PROC TABULATE <option-list>;
FORMAT variable-list-1 format-1 <...variable-list-n format-n>;
LABEL variable-1='label-1' <...variable-n='label-n'>;
BY <NOTSORTED> <DESCENDING> variable-1
TABLE <<page_expression,> row_expression,> column_expression
KEYLABEL keyword-1 ='description-1'
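The sketch below builds a two-dimensional table (row and column) with the required CLASS, VAR and TABLE statements. It assumes the SASHELP.CLASS sample data set shipped with SAS; the label text is arbitrary.

```sas
proc tabulate data=sashelp.class;
   class sex age;                        /* classification variables */
   var height weight;                    /* analysis variables       */
   /* rows: age; columns: mean height and weight within each sex */
   table age, sex*(height weight)*mean;
   keylabel mean='Average';              /* relabel the MEAN keyword */
run;
```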
6. PROC MEANS
PROC MEANS computes statistics for an entire SAS data set or for groups of observations in the data set. If you use a BY statement, PROC MEANS calculates descriptive statistics separately for groups of observations. Each group is composed of observations having the same values of the variables used in the BY statement. The groups can be further subdivided by the use of the CLASS statement. PROC MEANS can optionally create one or more SAS data sets containing the statistics calculated.
PROC MEANS is the easiest and most direct descriptive procedure for computing univariate statistics. Other SAS procedures which compute univariate statistics and provide additional features are CHART, TABULATE, and UNIVARIATE.
The full syntax for PROC MEANS is as follows:
PROC MEANS <option-list> <statistic-keyword-list>;
VAR variable-list;
BY variable-list;
CLASS variable-list;
FREQ variable;
WEIGHT variable;
ID variable-list;
OUTPUT <OUT= SAS-data-set> <output-statistic-list> <MINID|MAXID <(var-1<(id-list-1)> <...var-n<(id-list-n)>>)>=name-list>;
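In practice only a few of these statements are needed at once. The sketch below uses CLASS, VAR and OUTPUT on the SASHELP.CLASS sample data set shipped with SAS (an assumption made so the example runs as written); the output names are arbitrary.

```sas
/* Selected statistics for height and weight, grouped by sex. */
/* CLASS does not require the data to be sorted, unlike BY.   */
proc means data=sashelp.class n mean std min max;
   class sex;
   var height weight;
   output out=work.class_means mean=avg_height avg_weight;
run;
```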
7. PROC REG
PROC REG is a general-purpose procedure for regression, while other regression procedures in the SAS System implement more specialized applications. PROC REG provides nine model-selection methods, tests linear hypotheses and multivariate hypotheses, generates scatter plots of data and various statistics, computes collinearity diagnostics and influence statistics, produces partial leverage plots, and outputs statistics to a SAS data set, including predicted values, residuals, ridge regression estimates and confidence limits. PROC REG fits linear regression models by least-squares estimation. Subsets of independent variables that "best" predict the dependent or response variable can be determined by various model-selection methods. PROC REG can be used interactively.
The full syntax for PROC REG is as follows:
PROC REG options;
label: MODEL dependents= regressors / <options>;
BY variable-list;
FREQ variable;
ID variable;
VAR variable-list;
ADD variable-list;
DELETE variable-list;
REWEIGHT <condition|ALLOBS> </options> | <STATUS|UNDO>;
WEIGHT variable;
label: MTEST <equation1, ... equationk / options>;
OUTPUT OUT= SAS-data-set keyword= names ...;
PAINT <condition|ALLOBS> </options> | <STATUS|UNDO>;
PLOT <yvariable1*xvariable1> <=symbol1>,... <yvariablek*xvariablek> <=symbolk> </options>;
PRINT <options ANOVA MODELDATA>;
REFIT;
RESTRICT equation1, ... equationk;
label: TEST equation1, ... equationk / option;
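A basic regression needs only the MODEL statement. The following sketch assumes the SASHELP.CLASS sample data set shipped with SAS and a SAS/STAT license; the output data set and variable names are arbitrary.

```sas
/* Least-squares regression of weight on height and age */
proc reg data=sashelp.class;
   model weight = height age;
   /* save predicted values and residuals for later inspection */
   output out=work.fit p=predicted r=residual;
run;
quit;   /* PROC REG is interactive, so QUIT ends the procedure */
```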
8. PROC ARIMA
The ARIMA procedure implements a flexible and powerful method to analyze and forecast time series data. PROC ARIMA's implementation is similar to that of programs 1-7 in Part V of Box and Jenkins (1976). PROC ARIMA can handle time series of moderate size; there should be more than 30 observations and less than 2000. You should consider other procedures, such as FORECAST or AUTOREG, if PROC ARIMA does not meet your needs.
PROC ARIMA models a value in a response time series as a linear combination of its own past values, past errors (shocks, innovations) and past values of other time series.
The full syntax for PROC ARIMA is as follows:
PROC ARIMA options;
IDENTIFY VAR=variable options;
ESTIMATE options;
FORECAST options;
BY variables;
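The identify, estimate and forecast stages can be sketched as follows. The series is simulated in a DATA step purely for illustration (the name SERIES, the seed and the coefficient 0.6 are arbitrary), and PROC ARIMA assumes a SAS/ETS license.

```sas
/* Simulate 100 observations of a first-order autoregressive series. */
/* The values are generated only so that the example can run.        */
data work.series;
   y = 0;
   do t = 1 to 100;
      y = 0.6*y + rannor(12345);
      output;
   end;
run;

proc arima data=work.series;
   identify var=y nlag=12;            /* autocorrelation diagnostics */
   estimate p=1;                      /* fit an AR(1) model          */
   forecast lead=5 out=work.fcst;     /* forecast five periods ahead */
run;
quit;
```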
Appendix: SAS function keys
Recommended mapping of SAS function keys (Windows, Mac, UNIX):
In the next SAS short course we will cover:
Last updated: 01/18/06 by Patrick McLeod