SAS 9 Intermediate Workshop, Part II
Creation date: 10/06/2005
Author: Patrick McLeod
Objectives: This is the third of the SAS short course series. It is designed for intermediate and experienced users who have taken the first two classes and want to focus on the statistical and reporting procedures in SAS. After this course, you should be able to:
1. Understand SAS procedure syntax;
2. Perform data analysis in SAS;
3. Perform advanced data management in SAS;
4. Be familiar with new developments in the latest version of SAS.
Topics:
Review
I. Introduction
II. Functional Categories of Base SAS procedures
III. Report Writing
IV. Examples
Before we get started with the PROC step, let's review what we learned in the previous class.
DATA step
A SAS program usually starts with a DATA step. A DATA step consists of a group of statements that reads ASCII text data (in a computer file) or existing SAS data sets in order to create a new SAS data set. The DATA step must begin with the DATA statement and should end with a RUN statement. A data set can be created and stored in a permanent library; otherwise, it is stored in a temporary library (WORK by default) that lasts only as long as the current SAS session, i.e. such data sets are erased when you exit SAS. Data manipulation, such as creating or renaming a variable, is done in a DATA step rather than in a PROC step.
You can embed the data in the DATA step with the CARDS statement (DATALINES is an equivalent name, used below) or read an external file with the INFILE statement. The following example uses the former method (note: list input, which separates values by blanks, is used even though the data are in fixed columns):
DATA CLASS;
INPUT NAME $ SEX $ AGE HEIGHT WEIGHT;
DATALINES;
Alice F 13 56.0 84
Barbara F 14 62.0 102
Bernadette F 13 65.0 98
Jane F 12 59.0 84
Janet F 15 62.0 112
Joyce F 11 51.0 50
Judy F 14 64.0 90
Louise F 12 56.0 77
Mary F 15 66.0 112
Alfred M 14 69.0 112
Henry M 14 63.0 102
James M 12 57.0 83
Jeffery M 13 62.0 84
John M 12 59.0 99
Philip M 16 72.0 150
Robert M 12 64.0 128
Ronald M 15 67.0 133
Thomas M 11 57.0 85
William M 15 66.0 112
;
RUN;
Note that the data lines following CARDS (or DATALINES) must be ended by a semicolon on a line by itself, and the DATA step ends with a RUN statement. You can always verify the data with PROC PRINT or PROC CONTENTS.
SAS 9 also provides an Import Wizard that reads data directly from various file formats.
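The INFILE route mentioned above can be sketched as follows. So that the sketch runs anywhere, it first writes three records of the class data to a temporary file (FILENAME ... TEMP) and then reads them back; with real data you would point INFILE at your own file. The fileref RAWCLASS and the data set CLASS3 are hypothetical names. PROC CONTENTS and PROC PRINT then verify the result.

```sas
/* Hypothetical fileref: a temporary file stands in for a real text file */
filename rawclass temp;

/* Write a few sample records to the temporary file */
data _null_;
   file rawclass;
   put 'Alice F 13 56.0 84';
   put 'Alfred M 14 69.0 112';
   put 'Judy F 14 64.0 90';
run;

/* Read the external file with INFILE and list input */
data class3;
   infile rawclass;
   input name $ sex $ age height weight;
run;

/* Verify: variable attributes first, then the observations */
proc contents data=class3;
run;

proc print data=class3;
run;
```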
Library
A SAS data library is a collection of SAS files that are recognized as a unit by the SAS system.
A SAS library is like a special SAS pointer to a location where your SAS files are stored. Once a library is created, SAS has access to the files in that library. When you delete a library, the files are still on your computer, but SAS no longer has access to them. By creating a library, you are essentially giving SAS a shortcut name or pointer to a storage location in your operating environment where you store SAS files.
To create a library, use the following statement:
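LIBNAME libref 'path-to-directory';
libref is the library reference name assigned by the programmer. It follows the usual SAS naming rule for librefs: at most eight characters, beginning with a letter or underscore and containing only letters, digits and underscores.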
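A minimal sketch of creating, using and clearing a library. To avoid assuming anything about your folders, the hypothetical libref MYLIB simply points at the directory that already holds WORK, and the data come from SASHELP.CLASS, a version of the class data shipped with SAS (its values differ slightly from the workshop's). After LIBNAME ... CLEAR the file is still on disk, which is why it remains visible as WORK.CLASS_COPY.

```sas
/* For illustration only: point the hypothetical libref MYLIB at the
   WORK directory. In practice use a folder of your own, e.g.
   libname mylib 'path-to-directory';                               */
libname mylib "%sysfunc(pathname(work))";

/* Store a copy of the sample class data through the new libref */
data mylib.class_copy;
   set sashelp.class;
run;

/* Clearing the libref removes SAS's pointer, not the file itself */
libname mylib clear;
```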
I. Introduction: The Procedure Step
In this workshop, we focus on the SAS procedure step that covers running SAS procedures on SAS data sets. A PROCedure step calls a SAS procedure to analyze or process a SAS data set. The PROC step begins with a PROC statement and ends with a RUN statement. All of the statistical procedures require a SAS data set as input. This data set should already have been prepared in a DATA step, since SAS procedures allow only limited adjustment of the data.
The general syntax for a PROC step is:
PROC name [DATA=dataset [dsoptions] ] [options];
[other PROC-specific statements;]
[BY varlist;]
RUN;
where:
name: identifies the procedure you want to use.
dataset: identifies the SAS data set to be used by the procedure; if omitted, the most recently created data set in the session is used.
dsoptions: specifies the data set options to be used.
varlist: specifies the variables that define the groups to be processed separately. The data set must already be sorted by these same variables.
options: specifies the PROC-specific options to be used.
The syntax above uses the following conventions for statements:
- SAS keywords are in UPPERCASE;
- User-supplied words (such as file names or variable names) are in lowercase;
- Options are in brackets [ ]. Note that you do not type the brackets.
This is a simplified form of the syntax conventions used in SAS manuals and in documentation for most statistical packages.
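To connect the general syntax to a concrete step, here is a sketch that fills in each slot: MEANS is the name, SASHELP.CLASS (the sample class data shipped with SAS) is the dataset, WHERE= and KEEP= are dsoptions, N MEAN MAXDEC=1 are options, and VAR is a PROC-specific statement. The age cutoff is arbitrary.

```sas
/* PROC name DATA=dataset(dsoptions) options;
   WHERE= keeps only students aged 13 or older,
   KEEP= reads only the variables this step needs */
proc means data=sashelp.class(where=(age >= 13) keep=age height weight)
           n mean maxdec=1;
   var height weight;   /* PROC-specific statement */
run;
```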
A SAS program can contain any number of DATA and PROC steps. The statements in each step are executed together as a unit. Once a data set has been created, it can be processed by any subsequent DATA or PROC step. Note the following rules for SAS statements:
- All SAS statements start with a keyword (DATA, INPUT, PROC, etc.)
- All SAS statements end with a semicolon (;). (The most common problem students encounter is omitting a semicolon: SAS then reads two statements as one.)
- SAS statements can be entered in free format: you can begin in any column, type several statements on one line, or split a single statement over several lines (as long as no word is split).
- Uppercase and lowercase are equivalent, except inside quote marks ( sex = 'm'; is not the same as sex = 'M';).
SAS procedures carry out the various forms of statistical analysis. A procedure is invoked in a "PROC step", which starts with the keyword PROC, for example:
PROC MEANS DATA=CLASS;
VAR HEIGHT WEIGHT;
RUN;
By default, this simple PROC MEANS step prints the number of nonmissing values (N), mean, standard deviation, minimum and maximum of each listed variable.
The VAR or VARIABLES statement can be used with most procedures to indicate which variables are to be analyzed. If this statement is omitted, the default is to include all variables of the appropriate type (character or numeric) for the given analysis.
Some other statements that can be used with most SAS procedure steps are:
BY variable(s);
Causes the procedure to be repeated automatically for each different value of the named variable(s). The data set must first be sorted by those variables.
ID variable(s);
Gives the name of a variable to be used as an observation IDentifier.
LABEL var='label';
Assigns a descriptive label to a variable.
WHERE (expression);
Selects only those observations for which the expression is true.
For example, the following lines produce separate means for males and females, with the variable SEX labeled 'Gender'. (No ID statement is used, because PROC MEANS prints only summary statistics rather than individual observations.)
PROC SORT DATA=CLASS;
BY SEX;
RUN;
PROC MEANS DATA=CLASS;
VAR HEIGHT WEIGHT;
BY SEX;
LABEL SEX='Gender';
RUN;
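The WHERE and ID statements described above can be combined in PROC PRINT, as in this sketch using SASHELP.CLASS (the sample class data shipped with SAS). Because comparisons inside quotes are case-sensitive, WHERE SEX = 'f' would select no observations.

```sas
proc print data=sashelp.class;
   where sex = 'F';        /* keep only the girls ('f' would match none) */
   id name;                /* NAME replaces the Obs column */
   var age height weight;
run;
```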
If the DATA= option is not used, SAS procedures process the most recently created data set. In the brief summaries below, only a few representative options are shown.
II. Functional Categories of Base SAS Procedures
Report Writing:
CALENDAR, CHART*, FORMS, FREQ*, MEANS*, PLOT, REPORT*, SQL*, SUMMARY*, TABULATE*, TIMEPLOT
*These procedures produce reports and compute statistics.
Statistics:
CHART, CORR, FREQ, MEANS, RANK, REPORT, SQL, STANDARD, SUMMARY, TABULATE, UNIVARIATE
Utilities:
APPEND, BMDP**, CATALOG, CIMPORT, COMPARE, CONTENTS, CONVERT**, COPY, CPORT, DATASETS, EXPLODE, EXPORT, FORMAT, FSLIST, IMPORT, OPTIONS, PDS**, PDSCOPY**, PMENU, PRINTTO, REGISTRY, RELEASE**, SORT, SOURCE**, SQL, TAPECOPY**, TAPELABEL**, TRANSPOSE, TRANTAB
**See the SAS documentation for your operating environment for a description of these procedures.
IV. Examples
PROC CORR
Correlations among a set of variables.
PROC CORR DATA=SASdataset options;
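options: NOMISS ALPHA
VAR variable(s);
WITH variable(s);
where the NOMISS option excludes observations that have a missing value for any of the analysis variables, ALPHA computes Cronbach's coefficient alpha, and the WITH statement correlates each WITH variable with each VAR variable.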
Example
To get the correlation coefficients for HEIGHT and WEIGHT, use the VAR statement:
PROC CORR;
VAR HEIGHT WEIGHT;
RUN;
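The output lists simple statistics for each variable, followed by the Pearson correlation coefficient and its p-value (Prob > |r| under H0: Rho=0).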
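A sketch of the WITH statement and the NOMISS option on SASHELP.CLASS (the sample class data shipped with SAS). OUTP=, which saves the Pearson correlations in a data set, is an additional PROC CORR option not listed above; the data set name CORR_OUT is hypothetical.

```sas
/* NOMISS: drop any observation missing a value for these variables.
   OUTP=:  also save the Pearson correlations in a data set.          */
proc corr data=sashelp.class nomiss outp=corr_out;
   var height weight;   /* columns of the correlation table */
   with age;            /* rows of the correlation table    */
run;

proc print data=corr_out;
run;
```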
PROC FREQ
Frequency tables, chi-square tests
PROC FREQ DATA=SASdataset;
TABLES variable(s) / options;
options: NOCOL NOROW NOPERCENT
OUTPUT OUT=SASdataset statistic-keyword(s);
Example
To get the frequency of AGE in data set CLASS:
PROC FREQ DATA=CLASS;
TABLES AGE;
RUN;
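The output lists each value of AGE with its frequency, percent, cumulative frequency and cumulative percent.
You can also get a crosstabulation of two variables. For example, to examine the relationship between AGE and HEIGHT, use the FREQ procedure to produce a cross table (because HEIGHT takes many distinct values, this table will be large and sparse):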
PROC FREQ DATA=CLASS;
TABLES AGE*HEIGHT;
RUN;
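For categorical variables such as SEX and AGE, a cross table is more compact. This sketch, using SASHELP.CLASS (the sample class data shipped with SAS), applies the NOCOL and NOROW options, requests a chi-square test with CHISQ, and saves the cell counts with the TABLES-statement option OUT= (the data set name FREQ_OUT is hypothetical). With only 19 students, PROC FREQ will probably warn that many cells have small expected counts, so the chi-square test should be read with caution.

```sas
/* Two-way table of SEX by AGE.
   NOCOL NOROW suppress column and row percentages;
   CHISQ adds chi-square tests; OUT= saves cell counts. */
proc freq data=sashelp.class;
   tables sex*age / nocol norow chisq out=freq_out;
run;
```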
PROC MEANS
PROC MEANS provides the user with means, standard deviations, and a host of other univariate statistics for a set of variables.
PROC MEANS DATA=SASdataset options;
options: N MEAN STD MIN MAX SUM VAR CSS USS
VAR variable(s);
BY variable(s);
OUTPUT OUT=SASdataset keyword=variablename ... ;
Statistical options on the PROC MEANS statement determine which statistics are printed. The (optional) OUTPUT statement is used to create a SAS dataset containing the values of these statistics.
Example
You can examine the mean WEIGHT for each value of SEX (BY processing requires the data to be sorted by SEX, as was done above):
PROC MEANS;
BY SEX;
VAR WEIGHT;
RUN;
The output shows a separate table of statistics for each value of SEX.
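The OUTPUT statement from the syntax above can save these BY-group statistics instead of printing them. This sketch sorts a copy of SASHELP.CLASS (the sample class data shipped with SAS) by SEX, then stores the N, mean and standard deviation of WEIGHT under the hypothetical names N_WT, MEAN_WT and SD_WT.

```sas
/* BY-group processing needs data sorted by the BY variable */
proc sort data=sashelp.class out=class_by_sex;
   by sex;
run;

/* NOPRINT suppresses the printed report; OUTPUT OUT= saves the
   requested statistics, one observation per BY group           */
proc means data=class_by_sex noprint;
   by sex;
   var weight;
   output out=weight_stats n=n_wt mean=mean_wt std=sd_wt;
run;

proc print data=weight_stats noobs;
run;
```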
PROC UNIVARIATE
PROC UNIVARIATE provides univariate statistics and displays for a set of variables.
PROC UNIVARIATE DATA=SASdataset options;
options: PLOT
VAR variable(s);
BY variable(s);
OUTPUT OUT=SASdataset keyword=variablename ... ;
Example
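You can examine univariate statistics such as the median and kurtosis of WEIGHT for each value of SEX. The following example also specifies the PLOT option, which adds a stem-and-leaf plot (or horizontal bar chart), a box plot and a normal probability plot (the data must already be sorted by SEX):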
PROC UNIVARIATE DATA=class PLOT;
VAR weight;
BY sex ;
RUN;
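The same median and kurtosis can be saved in a data set with the OUTPUT statement. This sketch works on a sorted copy of SASHELP.CLASS (the sample class data shipped with SAS); the names WEIGHT_SHAPE, MED_WT and KURT_WT are hypothetical.

```sas
proc sort data=sashelp.class out=class_by_sex;
   by sex;
run;

/* One output observation per SEX with the median and kurtosis of WEIGHT */
proc univariate data=class_by_sex noprint;
   by sex;
   var weight;
   output out=weight_shape median=med_wt kurtosis=kurt_wt;
run;

proc print data=weight_shape noobs;
run;
```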
SAS statements and options for regression (PROC REG) are described in more detail in the document PROC REG Summary. SAS statements and options for analysis of variance (PROC ANOVA and PROC GLM) are described in the document PROC ANOVA and PROC GLM.
PROC ANOVA
Analysis of variance (balanced designs)
PROC ANOVA DATA=SASdataset options;
CLASS variable(s);
MODEL dependent(s)= effect(s);
PROC GLM
General linear models, including ANOVA, regression and analysis of covariance models.
PROC GLM DATA=SASdataset options;
CLASS variable(s);
MODEL dependent(s)= effect(s);
OUTPUT OUT=SASdataset keyword=variablename ... ;
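A one-way analysis of variance of WEIGHT by SEX fills in the CLASS, MODEL and OUTPUT statements above. This sketch uses PROC GLM because it handles unbalanced designs in general (the sample SASHELP.CLASS data shipped with SAS have 9 girls and 10 boys); P= and R= save predicted values and residuals under hypothetical names.

```sas
/* One-way ANOVA: CLASS names the grouping variable */
proc glm data=sashelp.class;
   class sex;
   model weight = sex;
   output out=glm_out p=pred_wt r=resid_wt;
run;
quit;   /* GLM is interactive; QUIT ends the procedure */
```

In a one-way model the predicted value is each group's mean, so the residuals add up to zero (apart from rounding).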
PROC REG
Regression analysis
PROC REG DATA=SASdataset options;
MODEL dependent(s) = regressors / options;
PLOT variable|keyword. * variable|keyword. = symbol;
OUTPUT OUT=SASdataset P=name R=name ... ;
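A simple regression of WEIGHT on HEIGHT using the MODEL and OUTPUT statements above. This is a sketch on SASHELP.CLASS (the sample class data shipped with SAS), with the hypothetical output names REG_OUT, PRED_WT and RESID_WT.

```sas
proc reg data=sashelp.class;
   model weight = height;                     /* dependent = regressor */
   output out=reg_out p=pred_wt r=resid_wt;   /* predicted, residual   */
run;
quit;   /* REG is interactive; QUIT ends the procedure */
```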
PROC CHART
Histograms and bar charts
PROC CHART DATA=SASdataset options;
VBAR variable / options;
HBAR variable / options;
options: MIDPOINTS= GROUP= SUMVAR=
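A sketch of the VBAR and HBAR statements with the GROUP= and SUMVAR= options on SASHELP.CLASS (the sample class data shipped with SAS). DISCRETE, which gives one bar per distinct numeric value, and TYPE=MEAN, which makes bar length the mean of the SUMVAR= variable, are additional options not listed above.

```sas
proc chart data=sashelp.class;
   /* One bar per distinct AGE value, bars grouped by SEX */
   vbar age / discrete group=sex;
   /* Bar length is the mean WEIGHT for each SEX */
   hbar sex / sumvar=weight type=mean;
run;
```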
PROC PLOT
Scatter plots
PROC PLOT DATA=SASdataset options;
options: HPERCENT= VPERCENT=
PLOT yvariable * xvariable = symbol / options;
PLOT (yvariables) * (xvariables) = symbol / options;
PLOT options: BOX OVERLAY VREF= HREF=
BY variable(s) ;
Note that the parenthesized form in the PLOT statement plots each y-variable listed against each x-variable.
Two examples using PROC PLOT:
EXAMPLE 1: Combining PROC SORT and PROC PLOT
Import Excel data using the Import wizard; file name education.xls.
options nodate pageno=1 linesize=80 pagesize=35;
proc sort data=education;
by region;
run;
proc plot data=education;
by region;
plot expenditures*dropoutrate='*' $ state / href=28.6;
title 'Plot of Dropout Rate and Expenditure Per Pupil';
run;
EXAMPLE 2: Producing a contour plot using PROC PLOT. A contour plot represents the values of three variables in a two-dimensional plot by treating one of them as the contour variable: X and Y appear on the axes, and Z is the contour variable.
options nodate pageno=1 linesize=64 pagesize=25;
data contours;
format Z 5.1;
do X=0 to 400 by 5;
do Y=0 to 350 by 10;
z=46.2+.09*x-.0005*x**2+.1*y-.0005*y**2+.0004*x*y;
output;
end;
end;
run;
proc print data=contours(obs=5) noobs;
title 'CONTOURS Data Set';
title2 'First 5 Observations Only';
run;
options nodate pageno=1 linesize=120 pagesize=60 noovp;
proc plot data=contours;
plot y*x=z / contour=10;
title 'A Contour Plot';
run;
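The parenthesized PLOT form described earlier can also be tried on the class data. This sketch uses SASHELP.CLASS (the sample class data shipped with SAS) and produces two plots, HEIGHT against AGE and WEIGHT against AGE, marking each point with the first character of SEX.

```sas
proc plot data=sashelp.class;
   /* (height weight)*age expands to height*age and weight*age;
      the value of SEX (F or M) is the plotting symbol          */
   plot (height weight)*age = sex;
   title 'Height and Weight by Age';
run;
quit;
title;   /* reset the title for later steps */
```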
PROC PRINT
Print a SAS data set
PROC PRINT DATA= SASdataset options;
options: UNIFORM LABEL SPLIT='char'
VAR variable(s);
BY variable(s);
SUM variable(s);
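A sketch combining the SPLIT=, BY and SUM features listed above, on a sorted copy of SASHELP.CLASS (the sample class data shipped with SAS). The label text is arbitrary; SPLIT='*' both turns on labels and breaks each heading at the asterisk.

```sas
proc sort data=sashelp.class out=class_by_sex;
   by sex;
run;

/* BY prints one section per SEX; SUM totals WEIGHT for each
   section and overall                                        */
proc print data=class_by_sex split='*' noobs;
   by sex;
   var name age weight;
   sum weight;
   label weight = 'Body*Weight';
run;
```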
PROC SORT
Sort a SAS data set according to one or more variables.
PROC SORT DATA=SASdataset options;
options: OUT=
BY variable(s);
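A sketch of the OUT= option and a multi-variable BY statement, using SASHELP.CLASS (the sample class data shipped with SAS). DESCENDING, which reverses the sort order for the variable that follows it, is an additional BY-statement keyword not listed above.

```sas
/* Oldest students first; ties broken alphabetically by NAME.
   OUT= writes the sorted copy to a new data set and leaves
   SASHELP.CLASS unchanged.                                   */
proc sort data=sashelp.class out=by_age;
   by descending age name;
run;
```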
V. Using SAS Solutions and Tools
SAS provides a set of ready-to-use solutions, applications, and tools in its latest version of the software. You can access many of these tools through the Solutions menu. They include:
Analysis
Using the ANALYST application for statistics tasks
One-Way ANOVA
Linear Regression
Simple Statistics
Summary Statistics
Producing publication-quality scatterplots in Analyst:
Last updated: 01/18/06 by Patrick McLeod