
SAS 9 Intermediate Workshop, Part I
Updated: 03/10/2005
Author: Patrick McLeod

Objectives: This is the second part of the SAS short course series. It is designed for intermediate users who have taken the first class and want to advance their programming techniques in SAS. It focuses on the first building block of SAS programming: the DATA step. After this course, you should be able to:
1. Understand SAS libraries and the data file system;
2. Create and manipulate SAS data sets;
3. Store SAS data sets permanently;
4. Import data into SAS;
5. Export SAS data sets to other formats;
6. Use SAS/Analyst to perform simple statistical analysis.
Topics:
SAS for Windows: The DATA Step
In this workshop, we focus on the first of the two major parts of a SAS program: the DATA step. The workshop covers data entry, reading raw data, and data manipulation and management. We follow up on the material covered in the Introduction to SAS class and apply it in more hands-on programming. The appendix provides a more detailed explanation of the Display Manager System in SAS.
Before your data can be analyzed by SAS, it must be in a form that the SAS System can recognize, i.e., a SAS data set. This is the job of the DATA step, which reads in raw data, manipulates variables, and stores data in the designated area(s).
| A DATA step consists of a group of statements that reads ASCII text data (in a computer file) or existing SAS data sets in order to create a new SAS data set. |
The DATA step must begin with the DATA statement and should end with a RUN statement.
Data sets can be created and stored in a permanent library. Otherwise, they stay in a temporary library (by default, WORK), which lasts only as long as the current SAS session; such data sets are erased when you exit SAS. In this workshop, data manipulation is done in a DATA step rather than in a PROC step.
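The difference between temporary and permanent storage can be sketched as follows. This is a minimal illustration, not part of the original exercises: the data set names scratch and saved are hypothetical, and the folder c:\temp is assumed to exist (it is the folder used for the SASWS1 library elsewhere in this workshop).

```sas
/* Assumption: c:\temp exists; change the path to suit your machine. */
libname sasws1 'c:\temp';

/* One-level name: stored in WORK and erased when the SAS session ends. */
data scratch;
    x = 1;
run;

/* Two-level name: stored as a permanent file in the SASWS1 library. */
data sasws1.saved;
    x = 1;
run;
```

After running this, the Explorer window should show scratch under Work and saved under Sasws1.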
Data set options specify actions that apply only to the SAS data set with which they appear. They let you perform such operations as:
- creating new variables out of existing variables or random functions
- renaming variables
- selecting only the first or last n observations for processing
- dropping variables from processing or output
- in the past, specifying a password for a SAS mainframe data set.
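A minimal sketch of three of these options (OBS=, RENAME=, and DROP=) is given below. The data set and variable names are hypothetical and the values are made up for illustration.

```sas
/* Hypothetical test scores used only for this illustration. */
data scores;
    input id test1 test2;
    datalines;
1 80 90
2 70 85
3 60 75
;
run;

/* OBS=2 on the input data set reads only the first two observations.  */
/* DROP= and RENAME= on the output data set drop TEST2 and rename      */
/* TEST1 to MIDTERM as the new data set is written.                    */
data first2 (drop=test2 rename=(test1=midterm));
    set scores (obs=2);
run;

proc print data=first2;
run;
```

Note that each option sits in parentheses immediately after the data set it applies to.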
| A SAS data library is a collection of SAS files that are recognized as a unit by the SAS system. |
The three most common forms of general syntax for the DATA step are:
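1. Reading raw data from an external file:
DATA data-set-name;
INFILE 'file-specification';
INPUT variable-list;
RUN;

2. Reading an existing SAS data set:
DATA data-set-name;
SET existing-data-set;
RUN;

3. Combining existing SAS data sets:
DATA data-set-name;
MERGE data-set-1 data-set-2;
RUN;
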
The following examples illustrate how to implement these three data methods.
In Example 1, a temporary data set is defined and an external raw data file is read in using INFILE. The INPUT statement specifies the names, column locations, and types of the variables.
Example 1
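data geol;
    /* 'filename' is a placeholder for the path to the raw data file */
    infile 'filename';
    /* column input: variable name, $ if character, then the column range */
    input state $ 1-3 county $ 5-12 sqmile 14-19
          region 22-24 tract 27-29 code $ 32-33
          rainfall 37-40 temp 43-46 temptype $ 49;
run;

proc print data=geol;
run;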
Example 2 illustrates the use of a previously stored data set with the SET statement. Note that the data set is a permanent one stored in the library SASWS1.
Example 2
libname sasws1 'c:\temp';

/* copy the permanent data set into a temporary one */
data geolnew;
    set sasws1.geol;
run;

proc print data=sasws1.geol;
run;
The third example shows combining the current data sets geol and district into one data set called combine. Is this data set permanent or temporary?
Example 3
data combine;
merge geol district;
run;
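Example 3 merges without a BY statement, so observations are paired by position. When the data sets share a key variable, a match-merge with a BY statement is often what is wanted. The following is a sketch under stated assumptions: the data sets geol_demo and district_demo, the key variable county, and all values are hypothetical and are not the workshop's own files.

```sas
/* Hypothetical data: both data sets share the key variable COUNTY. */
data geol_demo;
    input county $ rainfall;
    datalines;
Adams 31.2
Baker 28.5
Clark 40.1
;
run;

data district_demo;
    input county $ district;
    datalines;
Baker 2
Adams 1
Clark 3
;
run;

/* Match-merging requires each data set to be sorted by the BY variable. */
proc sort data=geol_demo;
    by county;
run;

proc sort data=district_demo;
    by county;
run;

data combine_demo;
    merge geol_demo district_demo;
    by county;
run;

proc print data=combine_demo;
run;
```

Without the two PROC SORT steps, the unsorted district_demo data set would cause the BY statement to fail.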
Manipulating and subsetting data sets
The following example applies the if/then statements to create variables based on certain conditions.
If-Then/Else Statements
Example 4
libname sasws1 'c:\temp';

/* Read the raw file into a permanent data set. */
DATA SASWS1.COUNTRY;
    INFILE 'c:\temp\country.dat';
    INPUT DEC 1 ID 2-4 NAME $CHAR26. SSCODE 31-33
          CONTIN $ 34-35 DODEV 36 POPULATE 37-43
          AREA 44-49 GNP 50-56 MILEXPED 57-64 .1
          PEDEXPED 65-71 .1;
RUN;

/* Classify each country by its GNP. */
DATA temp;
    SET sasws1.country;
    IF GNP GE 20000 THEN GNPNEW = 'high';
    ELSE IF 10000 <= GNP < 20000 THEN GNPNEW = 'med';
    ELSE GNPNEW = 'lo';
RUN;

PROC PRINT;
    VAR GNP GNPNEW;
RUN;
This program demonstrates the use of the IF-THEN/ELSE statements. In the second DATA step, the third statement creates a new variable named GNPNEW and assigns it the value 'high' if the observation's value of GNP is greater than or equal to (GE) 20,000. The next statement uses a compound inequality (10000 <= GNP < 20000) to assign the value 'med' to GNPNEW if GNP is at least 10,000 but less than 20,000. Finally, the fifth statement assigns the value 'lo' to all other observations, that is, to all observations whose value of GNP is missing or less than 10,000 (SAS treats a missing numeric value as smaller than any number). Because GNPNEW first appears with the four-character value 'high', it is created as a character variable of length 4, which is long enough to hold 'med' and 'lo'.
Consider the following statements:
Example 5
data high medium low;
SET work.temp;
if GNPNEW= 'high' then output high;
else if GNPNEW= 'med' then output medium;
else output low;
run;
The IF-THEN statements split the data set into subsets according to the GNPNEW variable. Using the following statements, we can print the most recently created data set. Which one is it?
Example 6
PROC PRINT;
RUN;
If we just want to print the data set HIGH, we apply:
Example 7
PROC PRINT data=high;
RUN;
When using list input, SAS scans the input line for values instead of reading from specific columns. Features of list input include:
- The order of the variables in the INPUT statement and of their corresponding values in the data must be the same; values cannot be selectively read with list input.
- Values must be separated by at least one blank.
- Missing values must be represented by periods, not blanks.
- Numeric values cannot contain embedded blanks.
- Character values longer than eight characters are truncated unless a LENGTH statement or a format modifier with an informat is used.
The syntax for list input is:
Example 8
INPUT variable [$] [&] ... ;
where:
variable is the variable name for the data value to be read.
$ indicates that the variable has character values.
& indicates that a character value may have one or more single embedded blanks.
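The features above can be sketched with in-stream data. This is an illustrative example only: the data set people and its variables are hypothetical. It assumes a LENGTH statement is an acceptable way to allow character values longer than eight characters.

```sas
data people;
    /* Without LENGTH, OWNER would default to 8 characters. */
    length owner $ 20;
    /* & lets OWNER contain single embedded blanks; the value ends */
    /* at two or more consecutive blanks.                          */
    input owner $ & age weight;
    datalines;
Mary Ann Smith  34 120.5
Bob  . 180
;
run;

proc print data=people;
run;
```

In the second data line the period marks AGE as missing, so WEIGHT is still read from the correct position.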
An informat can be specified following a variable on the INPUT statement. The informat defines the variable's data type and field width, and how the values are to be read. An informat takes the form [$][name][w].[d], where $ indicates a character informat, w is the number of columns in the input data, and d gives the number of decimal places to be assigned to values without an explicit decimal point.
The syntax for using informats is:
Example 9
INPUT variable informat ... ;
where:
variable is the variable name for the data value to be read
informat gives the informat to use when reading the data value.
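A short sketch of formatted input follows. The data set sales, its variables, and the values are hypothetical; the informats used ($CHARw., MMDDYYw., and w.d) are standard SAS informats, and the @ column pointers are an assumption about how the fields are laid out.

```sas
data sales;
    /* @n moves the pointer to column n before the informat is applied. */
    input @1  name     $char10.
          @12 saledate mmddyy10.
          @23 amount   7.2;
    /* Formats control display only; they do not change stored values. */
    format saledate date9. amount 8.2;
    datalines;
Widget     03/10/2005 0012345
Gadget     12/01/2004  150.75
;
run;

proc print data=sales;
run;
```

The first AMOUNT value has no explicit decimal point, so the d part of the 7.2 informat supplies two decimal places (123.45); the second value already contains a decimal point and is read as written.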
Data set options can be specified whenever a SAS data set is specified. Some options can be specified as statements in a DATA step. These same options can be used following a specified data set in a SET, MERGE, or PROC statement. In this case, the data set options must be enclosed in parentheses and must immediately follow the data set to which they apply. The following are some commonly used data set options:
DROP=variables drops the listed variables from the data set being created.
The MERGE statement joins corresponding observations from two or more SAS data sets into single observations in a new SAS data set. You can merge data sets with or without a BY statement. Without a BY statement, MERGE performs one-to-one merging by joining the first observation in one data set with the first observation in another, the second observation in one data set with the second observation in another, and so on. With a BY statement, MERGE performs match-merging by joining observations from two or more sorted data sets, based on the values of the common BY variables. The syntax for the MERGE statement is:
Example 10
MERGE datasets [(options)] ;
[BY variables ;]
where:
datasets are two or more existing SAS data sets.
[(options)] are data-set options, enclosed in parentheses.
[BY variables ;] are the matching variables for the BY statement. Each data set must be sorted by these variables.
SAS functions are routines that return values computed from one or more arguments; they are used to create new variables or modify existing ones. Functions are used in statements that have the syntax:
Example 11
variable = function(arguments) ;
where:
variable is the name of the variable being created or modified.
function is the name of the function you want to use.
arguments are one or more variable names, constants, or expressions.
Commonly Used Functions
MAX returns the largest of the argument values
MIN returns the smallest of the argument values
SQRT calculates square root of the argument value
ROUND rounds value to the nearest indicated round-off unit
LOG gives the natural log of the argument
MEAN returns the mean of the nonmissing argument values
SUM returns the sum of the nonmissing argument values
STD returns the standard deviation of the nonmissing values
DATE gives the current date as a SAS date value
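The functions listed above can be tried out in a small DATA step. This is a sketch with made-up values; the data set funcs and all variable names are hypothetical.

```sas
data funcs;
    input x1 x2 x3;
    largest  = max(x1, x2, x3);
    smallest = min(x1, x2, x3);
    total    = sum(x1, x2, x3);      /* missing arguments are ignored */
    average  = mean(of x1-x3);       /* OF accepts a variable list    */
    spread   = std(of x1-x3);
    root     = sqrt(x1);
    rounded  = round(average, 0.1);  /* round to the nearest tenth    */
    lnx1     = log(x1);              /* natural logarithm             */
    rundate  = date();               /* today's date as a SAS date    */
    format rundate date9.;
    datalines;
4 9 16
25 . 1
;
run;

proc print data=funcs;
run;
```

The second data line has a missing x2, which shows how SUM, MEAN, and STD work from the nonmissing arguments only.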
Conditional IF statements, with a THEN clause, execute SAS statements for those observations that meet the condition defined in the IF clause. An optional ELSE statement executes alternative statements if the THEN clause is not executed. In the syntax of each IF statement:
expression is any valid SAS expression.
statement is any executable statement or DO group.
The expression can use the following comparison operators, as well as arithmetic operators:
EQ equal to
NE not equal to
GT greater than
GE greater than or equal to
LT less than
LE less than or equal to
Use the IF statement when you want to execute a SAS statement for some but not all of the observations in the data set being created. The expression following the IF is evaluated; if it is true, then the statement following the THEN is executed. Syntax:
IF expression THEN statement ;
Use the IF-THEN/ELSE statements when you want to conditionally process all the observations in the data set being created. When the expression following the IF is true, the statement following the THEN is executed and the statement following the ELSE is ignored. When the expression is false, the statement following the ELSE is executed and the statement following the THEN is ignored. Syntax:
IF expression THEN statement ;
ELSE [IF expression THEN] statement ;
Use the subsetting IF statement to select only those observations from the input data set that meet the IF condition. Therefore, the resulting data set contains a subset of the original observations. Syntax:
IF expression ;
In this case, SAS interprets the lack of a then-statement to mean "then include this observation in the data set".
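A minimal sketch of the subsetting IF is shown below. The data set gnpdemo and its values are hypothetical stand-ins for the country data used earlier.

```sas
/* Hypothetical data for illustration only. */
data gnpdemo;
    input name $ gnp;
    datalines;
Alpha 25000
Beta 12000
Gamma .
Delta 31000
;
run;

/* Only observations for which the condition is true are written out. */
data highonly;
    set gnpdemo;
    if gnp ge 20000;
run;

proc print data=highonly;
run;
```

The observation with a missing GNP is excluded as well, because a missing value is not greater than or equal to 20000.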
| An array is an alias used to represent a set of variables to be processed in a like manner |
General form of the ARRAY statement:
ARRAY array-name{dimension} $ length elements (initial values);
example:
data quarter;
set sasdata.donate;
array contrib{4} qtr1-qtr4;
meancon=mean(of qtr1-qtr4);
array differ{3} diff1-diff3;
do i=1 to 3;
differ{i}=contrib{i+1}-contrib{i};
end;
meandiff=mean(of diff1-diff3) ;
drop i;
run;
proc print data=quarter;
run;
Alternate method of importing data into SAS
In SAS 9, supported data file formats include dBASE, Lotus 1-2-3, and MS Excel spreadsheets. The Import Wizard guides the user through creating a data set from files in these formats. To import a file, click File on the menu and select Import Data.


The Import Wizard will guide you through the remainder of the importation process including saving SAS syntax based on your input to the Import Wizard.
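The syntax saved by the Import Wizard is a PROC IMPORT step. The sketch below shows roughly what such a step looks like for an Excel file; the file path, sheet name, and output data set name are hypothetical, and the code the wizard actually writes may differ in its details. DBMS=EXCEL normally requires the SAS/ACCESS Interface to PC Files to be licensed.

```sas
/* Hypothetical file, sheet, and data set names. */
proc import datafile='c:\temp\country.xls'
            out=work.country_xls
            dbms=excel
            replace;
    sheet='Sheet1';    /* worksheet to read                        */
    getnames=yes;      /* use the first row as variable names      */
run;

proc print data=work.country_xls;
run;
```

Saving the generated step in a program file lets you repeat the import later without going through the wizard again.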
The SAS/Analyst application is a data analysis module designed to provide easy access to basic statistical analyses. The graphical user interface includes a range of analytical and graphical tasks. You can compute descriptive statistics, perform simple hypothesis tests, and fit models with analysis of variance and regression analysis. The application also provides some tasks not covered by SAS procedures, such as sample size and power computations. In addition, you can produce several types of graphs.



To access SAS/Analyst, choose Solutions from the menu and select Analysis --> Analyst. This interface takes a task-oriented approach to produce analyses and associated graphics. You can compute descriptive statistics, perform simple hypothesis tests, fit statistical models with regression and analysis of variance, and perform survival analysis as well as some multivariate analyses.
Most of the tasks provide access to analyses performed by SAS/STAT software, but some provide analyses not currently available with SAS/STAT procedures, such as certain hypothesis tests and basic sample size and power computations. In addition, you can produce many types of graphs, including histograms, box-and-whisker plots, probability plots, contour plots, and surface plots.
The Analyst Application enables you to input data in many ways, including opening data from non-SAS sources such as Excel files, inputting SAS data sets, or manually entering the data yourself. The data are displayed in a data table in which columns correspond to variables and rows correspond to observations or cases. You can edit individual elements in the data table, and you can create new columns and rows. You can also perform a number of other data manipulations such as subsetting the data, performing transforms, recoding, and stacking and splitting columns.
Once the data are ready, you specify tasks from pull-down menus, a customizable toolbar, or an index of commonly used task descriptions. The analysis results and plots are presented in separate windows and managed by a tree-list structure called a project tree. The underlying SAS code used to produce the results is available as a node in the project tree, and results can also be displayed in HTML form and viewed with a web browser. You can save projects and then recall them for further work.
Elementary Statistics Procedures
PROC CORR computes correlation coefficients between variables, including Pearson product-moment and weighted product-moment correlations.
PROC FREQ produces one-way to n-way frequency and crosstabulation tables.
PROC MEANS produces simple univariate descriptive statistics for numeric variables.
PROC SUMMARY computes descriptive statistics on numeric variables in a SAS data set and outputs the results to a new SAS data set.
PROC TABULATE constructs tables of descriptive statistics from compositions of classification variables, analysis variables, and statistics keywords.
PROC UNIVARIATE produces simple descriptive statistics for numeric variables.
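Several of these procedures can be tried on a small data set. The sketch below uses hypothetical data (the data set demo and its variables group, height, and weight are made up for illustration).

```sas
/* Hypothetical measurements for illustration only. */
data demo;
    input group $ height weight;
    datalines;
A 60 115
A 65 130
B 70 155
B 72 170
;
run;

/* Descriptive statistics for the numeric variables. */
proc means data=demo n mean std min max;
    var height weight;
run;

/* One-way frequency table for the classification variable. */
proc freq data=demo;
    tables group;
run;

/* Pearson correlation between height and weight. */
proc corr data=demo;
    var height weight;
run;

/* More detailed univariate statistics for one variable. */
proc univariate data=demo;
    var weight;
run;
```

Each PROC step names its input with DATA=, so the order of the steps does not matter.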
Free on-line sources and tutorials: SAS Corner, Benchmarks February 2002 issue
The following gives a brief review of the Display Manager System in SAS.
By default, SAS 9 has six basic windows: Enhanced Program Editor, Explorer, Results, Program Editor, Log, and Output. The following function key map helps you navigate the SAS workspace and switch between windows.

On the command line, you can also type in the commands or the first four characters of each command to perform the task, e.g.:

The command KEYS displays the KEYS window, which lists the function keys and their definitions. This list can differ by system, administrator, and version of SAS. Page through this window, making note of the keys associated with functions you use often. For instance, notice the key defined as "home". This key (usually CONTROL-F or CONTROL-U) is a toggle, moving the cursor between the command line and its current position in the editor area. You can also assign your own definitions and save them across SAS sessions. For instance, you can save many keystrokes when you are debugging a program by defining a function key to execute the following sequence of DM (Display Manager) commands:
LOG;
CLEAR;
OUT;
CLEAR;
PGM;
To redefine a key permanently (omit step 3 to redefine temporarily):
(1) open the KEYS window;
(2) type your definition over the displayed definition (delete any extra characters);
(3) enter the command SAVE in the command box; and
(4) enter the command END to close the KEYS window.
You need not redefine a key for this tutorial. However, to simplify the learning process, we recommend that you clear the LOG and OUT windows after successfully completing each subsection in this tutorial. (Just remember to first use the FILE 'filename' command if you want to save the contents of a window to a file.) One way to simplify this cleanup process is to define a function key to issue the DM commands LOG; CLEAR; OUT; CLEAR; PGM.
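As an alternative to a function key, the same command sequence can be issued from inside a program with the DM statement. This is a sketch, assuming the commands behave in your session as described above; it is not part of the original tutorial.

```sas
/* Clear the Log and Output windows, then return to the Program Editor. */
dm 'log; clear; out; clear; pgm;';
```

Placing this statement at the top of a program clears the previous run's log and output each time the program is submitted.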
To be continued in SAS Programming II:
Statistical Procedures
Visualization of Data
Statistical Models
SAS for UNIX*