VIII. ANOVA and Linear Regression
This section covers some of the common SAS procedures for running
intermediate-level statistical analyses. Use the Import Wizard to import the
Example Data 1 file using the SPSS File (*.sav) source option, as
was done previously.
1. One-way ANOVA
Some sources recommend PROC ANOVA for the one-way or single-factor
analysis; however, PROC ANOVA is designed for balanced data (i.e., each cell has
an equal number of cases). It can handle a one-way layout with unequal group
sizes, but not unbalanced designs with more than one factor. Given that we
frequently do not have balanced cells, PROC GLM is the safer general choice.
The current example compares different stimuli
conditions on ability to recall at time 1. In this example, our factor or
independent variable, stimuli, has three levels (or conditions): spoken, written,
and combined. Our dependent variable is the familiar recall at time 1 (recall1).
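Whether the groups are balanced can be checked directly before choosing a procedure. The following is a minimal sketch, assuming the imported data set is named example1 as in the code in this section; equal counts across the three stimuli levels would indicate balanced cells.

```sas
/* Count the cases in each stimuli group to see whether the cells are balanced. */
PROC FREQ DATA=example1;
  TABLES stimuli;
RUN;
```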
PROC GLM DATA=example1;
CLASS stimuli;
MODEL recall1 = stimuli;
MEANS stimuli;
RUN;
We can run post hoc tests (here, Tukey's test) by
adding options after a slash (/) in the MEANS statement.
PROC GLM DATA=example1;
CLASS stimuli;
MODEL recall1 = stimuli;
MEANS stimuli / TUKEY;
RUN;
Here we request both Tukey's test and the Ryan-Einot-Gabriel-Welsch
multiple range test (REGWQ) for our post hoc comparisons.
PROC GLM DATA=example1;
CLASS stimuli;
MODEL recall1 = stimuli;
MEANS stimuli / TUKEY REGWQ;
RUN;
2. Multi-way or Factorial ANOVA
Here we are looking for mean differences across two factors, with six total conditions, on
ability to recall at time 1 (recall1). The first factor, stimuli, has three
conditions and was described above. The second factor, candy, has two conditions
(Skittles & no candy).
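The MODEL statement used for this example fits the two main effects only. If the question is also whether the effect of stimuli differs between the candy groups, the interaction term can be added. The following is a sketch under that assumption, not a replacement for the example in this section; it uses LSMEANS rather than MEANS because least-squares means remain appropriate when the cells are unbalanced.

```sas
/* Sketch: factorial ANOVA with the candy-by-stimuli interaction added. */
PROC GLM DATA=example1;
  CLASS candy stimuli;
  MODEL recall1 = candy stimuli candy*stimuli;
  /* Least-squares means for each of the six cells of the design. */
  LSMEANS candy*stimuli;
RUN;
QUIT;
```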
PROC GLM DATA=example1;
CLASS candy stimuli;
MODEL recall1 = candy stimuli;
MEANS stimuli / REGWQ;
MEANS candy stimuli;
RUN;
3. One-way MANOVA
Here, we are testing for
group differences on two dependent variables simultaneously using our
familiar three groups of stimuli. First, we run PROC MEANS to take a look at
the descriptive statistics for each group across the two dependent variables.
Then, we run the actual MANOVA.
PROC MEANS N MEAN STD MIN MAX DATA=example1;
CLASS stimuli;
VAR recall1 recall2;
RUN;
PROC GLM DATA=example1;
CLASS stimuli;
MODEL recall1 recall2 = stimuli / SS3;
CONTRAST 'Printed vs Spoken&Printed and Spoken' stimuli 2 -1 -1;
CONTRAST 'Spoken vs Printed and Spoken' stimuli 0 1 -1;
MANOVA h=_all_;
RUN;
QUIT;
Given that our two dependent variables above are really the
same variable measured at two points in time, it would be more appropriate to
run a repeated measures ANOVA.
PROC GLM DATA=example1;
CLASS stimuli;
MODEL recall1 recall2 = stimuli;
REPEATED TIME 2 (0 1) / SUMMARY;
RUN;
4. Factorial MANOVA
Here, we are looking at
differences between stimuli groups, as well as candy groups, on recall at time 1
and age. To begin, we will take a look at some of the descriptive statistics of
our variables, then the correlation between our two dependent variables (age &
recall1), and then run the GLM procedure.
PROC MEANS N MEAN STD MIN MAX DATA=example1;
CLASS stimuli;
VAR age;
RUN;
PROC MEANS N MEAN STD MIN MAX DATA=example1;
CLASS candy;
VAR age;
RUN;
PROC MEANS N MEAN STD MIN MAX DATA=example1;
CLASS stimuli;
VAR recall1;
RUN;
PROC MEANS N MEAN STD MIN MAX DATA=example1;
CLASS candy;
VAR recall1;
RUN;
PROC MEANS N MEAN STD MIN MAX DATA=example1;
CLASS stimuli candy;
VAR age recall1;
RUN;
PROC CORR DATA=example1;
VAR age recall1;
RUN;
PROC GLM DATA=example1;
CLASS stimuli candy;
MODEL age recall1 = stimuli candy / SS3;
CONTRAST 'Printed vs Spoken&Printed and Spoken' stimuli 2 -1 -1;
CONTRAST 'Spoken vs Printed and Spoken' stimuli 0 1 -1;
MANOVA h=_all_ / SUMMARY PRINTE;
RUN;
5. Linear Regression
Use the Import Wizard to
import the 'regression_example_data.sav' file using the SPSS File (*.sav) source option and the member name 'red'.
PROC PRINT DATA=red;
RUN;
First, we'll do a
linear ordinary least squares (OLS) regression with three predictors (prison, age &
peyrs) and apt as our outcome variable.
PROC REG DATA=red;
MODEL apt = prison age peyrs;
RUN;
SAS produces unstandardized regression coefficients by
default. If you also want SAS to produce the standardized coefficients,
you must add the STB (standardized beta) option after a slash (/)
following the name of the last predictor in the MODEL statement, as in the following example:
PROC REG DATA=red;
MODEL apt = prison age peyrs / STB;
RUN;
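To see what the STB column represents, the same slopes can be obtained by standardizing every variable first and then fitting the ordinary model. This is a minimal sketch, assuming there are no missing values in the four variables (otherwise PROC STANDARD and PROC REG would be working from different sets of cases); the data set name red_z is a hypothetical name chosen for illustration.

```sas
/* Rescale the outcome and predictors to mean 0 and standard deviation 1. */
PROC STANDARD DATA=red MEAN=0 STD=1 OUT=red_z;
  VAR apt prison age peyrs;
RUN;

/* The slopes from this fit correspond to the standardized estimates
   requested with STB; the intercept is approximately zero. */
PROC REG DATA=red_z;
  MODEL apt = prison age peyrs;
RUN;
QUIT;
```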
Next, we'll take a second look at the same regression model,
but have SAS plot the studentized residuals against Cook's distance.
PROC REG DATA=red;
MODEL apt = prison age peyrs;
OUTPUT OUT = T STUDENT = RES COOKD = COOKD;
RUN;
QUIT;
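The PROC GPLOT step for this plot refers to SYMBOL1 (through `= 1` in the PLOT statement) and to AXIS1 (through `vaxis=axis1`), and neither is defined in this section. The following is a minimal sketch of definitions that could be submitted before the plot; the marker, color, and label text are assumptions, not settings taken from the original example.

```sas
/* Assumed plotting settings; adjust to taste. */
SYMBOL1 VALUE=DOT COLOR=BLUE;                    /* used by "= 1" in the PLOT statement */
AXIS1 LABEL=(ANGLE=90 'Studentized residual');   /* used by VAXIS=AXIS1 */
```

Both are global statements, so they stay in effect for later graphics steps until they are redefined or cleared.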
PROC GPLOT DATA = T;
PLOT res*cookd = 1 / vaxis=axis1;
RUN;
QUIT;
Now, we'll review the residual values, which is a three-stage
process. We first generate a new variable, rabs, containing the
absolute value of the studentized residuals (res). Then we sort the data on rabs
in descending order. Finally, we list the first 50 observations.
DATA T2;
SET T;
RABS = abs(res);
RUN;
PROC SORT DATA=T2;
BY DESCENDING rabs;
RUN;
PROC PRINT DATA=T2 (obs=50);
RUN;
6. Robust Regression
Robust regression is done by iteratively reweighted least squares (IWLS).
The procedure for running robust regression is PROC ROBUSTREG. There are
several weight functions available for M estimation. We are going to use the Huber weight function (wf=huber) in this
example. We can save the final weights created by the IWLS process, which can be
very useful for spotting observations that were down-weighted. We will use the data set T2 generated above. It includes the
original data set and the diagnostic variables generated based on the OLS
regression model. *Note in the output the presence of the AIC & BIC for model
fit.
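The idea behind the Huber weights can be illustrated with a small DATA step: observations whose scaled residual is within a tuning constant c keep a weight of 1, and larger residuals are down-weighted in proportion to c/|r|. This is an illustration only, not what PROC ROBUSTREG runs internally; the value c = 1.345 is the commonly cited tuning constant and is an assumption here, and the residual values and the data set name huber_demo are hypothetical.

```sas
/* Illustration only: Huber weights for a few scaled residuals. */
DATA huber_demo;
  c = 1.345;                       /* assumed tuning constant */
  DO r = -4 TO 4 BY 1;             /* hypothetical scaled residuals */
    IF ABS(r) <= c THEN w = 1;     /* small residuals keep full weight */
    ELSE w = c / ABS(r);           /* large residuals are down-weighted */
    OUTPUT;
  END;
RUN;

PROC PRINT DATA=huber_demo;
RUN;
```

With these values, a scaled residual of 4 receives a weight of about 0.34, while residuals of -1, 0, and 1 keep a weight of 1.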
PROC ROBUSTREG DATA=T2 METHOD=m (wf=huber);
MODEL apt = prison age peyrs;
OUTPUT OUT = test1 weight=wgt;
RUN;
Next, we'll take a look at the weights from the robust
regression, listing the observations with the smallest weights first.
PROC SORT DATA=test1;
by wgt;
RUN;
PROC PRINT DATA=test1 (obs=50);
RUN;
Now let's compare the results of a regular OLS regression and
a robust regression. If the results are very different, you will most likely
want to use the results from the robust regression.
ODS LISTING CLOSE;
PROC REG DATA=red;
MODEL apt = prison age peyrs;
ODS OUTPUT PARAMETERESTIMATES = a;
RUN;
QUIT;
PROC ROBUSTREG DATA=T2 METHOD=m (wf=huber);
MODEL apt = prison age peyrs;
ODS OUTPUT PARAMETERESTIMATES = b;
RUN;
QUIT;
ODS LISTING;
TITLE "OLS Regression";
PROC PRINT DATA=a;
/* RUN ends the first PRINT step so each listing gets its own title. */
RUN;
TITLE "Robust Regression";
PROC PRINT DATA=b;
RUN;