Campus Computing News

The SPSS GLM Procedure, or What Happened to ANOVA?

By Craig Henderson, Research and Statistical Support Services Consultant

A question that I am sure many of you have asked (at least I did) when navigating through the statistics menu of SPSS version 7.5 or 8.0 is: what happened to the ANOVA models procedure? If you haven't asked this question, you may have asked why SPSS keeps the regression procedure while also incorporating the General Linear Model (GLM) procedure. In this article I discuss the similarities between ANOVA models and regression, different methods of coding categorical variables for regression analysis, features of SPSS's GLM procedure, and the advantages and disadvantages of using GLM rather than SPSS's regression procedure. Those familiar with SAS PROC GLM already know the flexibility that a general linear modeling approach brings to data analysis. However, the procedure is relatively new to SPSS, so SPSS users may be less familiar with this approach.

Regression and ANOVA

Both regression and analysis of variance are applications of the general linear model. In mathematical notation, $Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_p X_{pi} + \varepsilon_i$. Here $Y_i$ is the score of subject $i$ on the dependent variable, and $\alpha$ (the intercept) is the population mean of the dependent variable when all of the $X$s are zero. The $X$s are values on the independent variables; in ANOVA they carry information on group membership. The $\beta$s are effect parameters (regression coefficients) that describe the relationship between a particular independent variable and the dependent variable, and $\varepsilon_i$ represents error. The fundamental operation in both regression and ANOVA is to estimate $\alpha$ and the effect parameters from information on group membership and scores on the dependent measure.
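
The same model can be written compactly in matrix form. The sketch below assumes ordinary least squares, the usual estimation method for this model (the article itself does not name one). Here $\mathbf{y}$ stacks the $N$ scores. $\mathbf{X}$ has a first column of 1s for the intercept, followed by one column per independent variable. $\boldsymbol{\beta}$ collects the intercept and the $p$ effect parameters.

```latex
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},
\qquad
\hat{\boldsymbol{\beta}} = \left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}\mathbf{y}
```

The inverse exists only when the columns of $\mathbf{X}$ are linearly independent. This is one reason a categorical variable with k levels is represented by only k - 1 coded column vectors. If all k 1/0 indicator vectors were included alongside the intercept column, the indicators would sum to the intercept column and $\mathbf{X}^{\top}\mathbf{X}$ would be singular.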

Coding Categorical Variables for Regression Analysis

Pedhazur (1997) and Kirk (1982) discuss procedures by which group membership can be coded and used as a categorical predictor in a regression analysis. Whichever of these coding schemes is employed, regression with categorical predictors and ANOVA yield identical overall results: the same sums of squares and the same overall F test. Some intermediate results, however, notably the regression coefficients and their interpretation, differ depending on the coding scheme used. The most commonly used coding schemes are dummy coding, effect coding, and orthogonal coding.
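
The claim that the two approaches agree is easy to check numerically. The Python sketch below is not SPSS syntax. It uses invented scores for three unequal-sized groups and computes the one-way ANOVA F alongside the overall regression F obtained under dummy coding and under effect coding. The helper names (ols, anova_f, code_row, regression_f) are hypothetical. The least-squares solver uses the normal equations, which is adequate for a toy example but not intended for real data.

```python
import math

# Hypothetical scores for three groups (invented for illustration only).
DATA = [("g1", 4), ("g1", 5), ("g1", 6),
        ("g2", 7), ("g2", 8), ("g2", 9), ("g2", 8),
        ("g3", 1), ("g3", 2), ("g3", 3)]
LEVELS = ["g1", "g2", "g3"]


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X)b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * v for r, v in zip(X, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def anova_f(data):
    """One-way ANOVA F from between- and within-group sums of squares."""
    y = [v for _, v in data]
    grand = sum(y) / len(y)
    means = {}
    for g in LEVELS:
        scores = [v for h, v in data if h == g]
        means[g] = sum(scores) / len(scores)
    ss_between = sum((means[g] - grand) ** 2 for g, _ in data)
    ss_within = sum((v - means[g]) ** 2 for g, v in data)
    k, n = len(LEVELS), len(y)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def code_row(g, scheme):
    """Intercept column plus k - 1 coded column vectors. The last level is the
    reference group (all 0s) under dummy coding, or the group coded -1 in
    every vector under effect coding."""
    row = [1.0]
    for lev in LEVELS[:-1]:
        if g == lev:
            row.append(1.0)
        elif scheme == "effect" and g == LEVELS[-1]:
            row.append(-1.0)
        else:
            row.append(0.0)
    return row


def regression_f(scheme):
    """Overall F test for a regression of the scores on the coded vectors."""
    X = [code_row(g, scheme) for g, _ in DATA]
    y = [float(v) for _, v in DATA]
    b = ols(X, y)
    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    ybar = sum(y) / len(y)
    ss_reg = sum((f - ybar) ** 2 for f in fitted)
    ss_res = sum((v - f) ** 2 for v, f in zip(y, fitted))
    k, n = len(LEVELS), len(y)
    return (ss_reg / (k - 1)) / (ss_res / (n - k))


f_anova = anova_f(DATA)
f_dummy = regression_f("dummy")
f_effect = regression_f("effect")
print(f"ANOVA F = {f_anova:.1f}, dummy-coded F = {f_dummy:.1f}, "
      f"effect-coded F = {f_effect:.1f}")
print("identical:", math.isclose(f_anova, f_dummy) and math.isclose(f_anova, f_effect))
```

For these made-up data all three F values print as 36.2, and the last line prints True. The coefficients differ between the two codings. The fitted values (the group means) do not, and so neither do the sums of squares.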

In essence, coding schemes involve creating column vectors that represent the levels of your independent variable. Regardless of the coding scheme, the number of column vectors created is one less than the number of levels of the independent variable. The schemes differ in the values chosen to represent group membership. In dummy coding, groups are differentiated by ones and zeros. For example, individuals in a control group are coded as zeros, and those in an experimental group are coded as ones. If there are three groups, Group 1 is coded as ones in one column vector and as zeros in the other. Group 2 is coded in the opposite manner, and Group 3 consists of zeros in both column vectors. With dummy coding, the intercept equals the mean of the group coded 0 in all column vectors, known as the reference category or comparison group. Each individual's predicted score equals the mean of his/her group. Each regression coefficient equals the difference between the mean of the group coded 1 in that column vector and the mean of the comparison group (Pedhazur, 1997).
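
The next example sketches the dummy-coding interpretation in Python, again with invented data and a hypothetical ols helper. Group g3 is the reference category. The fitted intercept should therefore recover g3's mean, and each coefficient should equal the difference between a group's mean and g3's mean.

```python
# Hypothetical scores (invented); g3 is the reference category, coded 0 in
# both column vectors.
DATA = [("g1", 4), ("g1", 5), ("g1", 6),
        ("g2", 7), ("g2", 8), ("g2", 9), ("g2", 8),
        ("g3", 1), ("g3", 2), ("g3", 3)]
LEVELS = ["g1", "g2", "g3"]


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X)b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * v for r, v in zip(X, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def dummy_row(g):
    """Intercept, then a 1/0 indicator for every level except the reference."""
    return [1.0] + [1.0 if g == lev else 0.0 for lev in LEVELS[:-1]]


X = [dummy_row(g) for g, _ in DATA]
y = [float(v) for _, v in DATA]
a, b1, b2 = ols(X, y)

# Predicted score for a group: intercept plus the coefficients of its 1s.
predicted = {g: sum(c * x for c, x in zip((a, b1, b2), dummy_row(g))) for g in LEVELS}
print(f"intercept = {a:.2f}  (mean of reference group g3)")
print(f"b1 = {b1:.2f}  (mean of g1 minus mean of g3)")
print(f"b2 = {b2:.2f}  (mean of g2 minus mean of g3)")
print("predicted scores:", {g: round(p, 2) for g, p in predicted.items()})
```

Here the group means are 5, 8 and 2. The printed intercept is therefore 2.00, and the coefficients are 3.00 and 6.00. The predicted score for every member of a group is that group's mean.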

Effect coding differs from dummy coding in that one group, instead of being coded with zeros, is coded with -1s in all column vectors. The number of column vectors created is still one less than the number of groups in the analysis. In each vector, the group that vector represents is coded 1, the group coded -1 throughout is coded -1, and all other groups are coded 0. With three groups, this means that in each vector one group is coded 1, one 0, and one -1. With effect coding, the intercept equals the grand mean of the dependent variable. Strictly, it is the unweighted mean of the group means, which equals the grand mean when group sizes are equal. Each regression coefficient represents the deviation of the corresponding group's mean from the intercept, thereby reflecting a treatment effect (Pedhazur, 1997).

In orthogonal coding, column vectors are coded in the same manner as coefficients are assigned for orthogonal contrasts. For example, with three groups, the first column vector can be coded with 1s for the first group, -1s for the second, and zeros for the third. The second column vector can then be coded with 1s for the first two groups and -2s for the third (Pedhazur, 1997). Remember that in setting up orthogonal contrasts, the coefficients within each vector must sum to zero. For the contrasts to be orthogonal, the products of corresponding coefficients across any two vectors must also sum to zero; here, (1)(1) + (-1)(1) + (0)(-2) = 0. The advantage of orthogonal coding is that it yields results directly interpretable as tests of orthogonal contrasts. The column vectors are thus coded to reflect the research hypotheses, and the statistical tests of the regression coefficients can be interpreted as supporting or not supporting one's research hypotheses.
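
The following Python sketch fits three equal-sized groups under the effect coding and the orthogonal coding described above. It uses invented data and hypothetical helper names (ols, fit). It also checks that the orthogonal vectors are valid, mutually orthogonal contrasts. Equal group sizes are assumed so that the effect-coded intercept is the grand mean.

```python
# Hypothetical, equal-sized groups (invented): group means 4, 9 and 2, so the
# grand mean is 5.
DATA = [("g1", 3), ("g1", 4), ("g1", 5),
        ("g2", 8), ("g2", 9), ("g2", 10),
        ("g3", 1), ("g3", 2), ("g3", 3)]

# Each coding scheme maps a group to its values on the two column vectors.
EFFECT = {"g1": [1, 0], "g2": [0, 1], "g3": [-1, -1]}
ORTHOGONAL = {"g1": [1, 1], "g2": [-1, 1], "g3": [0, -2]}


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X)b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * v for r, v in zip(X, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def fit(scheme):
    """Regress the scores on an intercept plus the scheme's column vectors."""
    X = [[1.0] + [float(c) for c in scheme[g]] for g, _ in DATA]
    y = [float(v) for _, v in DATA]
    return ols(X, y)


# Contrast check: each orthogonal vector sums to zero, and the products of
# corresponding coefficients across the two vectors also sum to zero.
v1 = [ORTHOGONAL[g][0] for g in ("g1", "g2", "g3")]
v2 = [ORTHOGONAL[g][1] for g in ("g1", "g2", "g3")]
assert sum(v1) == 0 and sum(v2) == 0
assert sum(p * q for p, q in zip(v1, v2)) == 0

effect = fit(EFFECT)          # intercept 5 (grand mean); effects 4-5 and 9-5
orthogonal = fit(ORTHOGONAL)  # intercept 5; (4-9)/2 and (4+9-2*2)/6
print("effect coding:    ", [round(b, 2) for b in effect])
print("orthogonal coding:", [round(b, 2) for b in orthogonal])
```

With group means 4, 9 and 2, the effect-coded coefficients print as [5.0, -1.0, 4.0]. These are the grand mean and the treatment effects of g1 and g2; g3's effect, -3, is minus their sum. The orthogonal-coded coefficients print as [5.0, -2.5, 1.5], each a scaled estimate of its contrast.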

There are several reasons why one might want to use regression with categorical predictors rather than ANOVA. First, regression provides a natural measure of effect size, R², in its standard output. Second, it is more flexible, in that both continuous and categorical independent variables can be used in the same analysis. Third, the researcher can test competing hypotheses by entering independent variables in a sequential, or hierarchical, manner. Finally, unlike ANOVA, regression has no difficulty with unbalanced designs.
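
The two regression quantities mentioned here can be written out. The formulas below are the standard definition of R² and the usual F test for the change in R² when a block of predictors is added in a hierarchical analysis. Here k denotes the number of predictor column vectors in a model (excluding the intercept) and N the number of cases. The labels "full" and "reduced" are notation introduced for this sketch, not the author's.

```latex
R^2 = \frac{SS_{\text{regression}}}{SS_{\text{total}}},
\qquad
F_{\text{change}} =
  \frac{\left(R^2_{\text{full}} - R^2_{\text{reduced}}\right) / \left(k_{\text{full}} - k_{\text{reduced}}\right)}
       {\left(1 - R^2_{\text{full}}\right) / \left(N - k_{\text{full}} - 1\right)}
```

The F statistic is evaluated against an F distribution with $k_{\text{full}} - k_{\text{reduced}}$ and $N - k_{\text{full}} - 1$ degrees of freedom. Entering all k - 1 coded vectors of a categorical predictor as one block tests that predictor as a whole.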

The SPSS GLM Procedure

With numerous independent variables, each with numerous levels, coding categorical predictors for regression analysis can be fairly labor intensive. Enter the SPSS GLM procedure. GLM is a very flexible procedure that allows the researcher to test hypotheses suited to either regression or ANOVA. It offers several features of the former ANOVA procedure, such as tests of interactions and post hoc power analyses, along with some of the flexibility of the regression procedure. The researcher can use both continuous and categorical independent variables, and unbalanced designs are not problematic (with Type III sums of squares). The Type III sums of squares option also tests the unique contribution of each independent variable, that is, its contribution after the effects of all other independent variables in the analysis have been removed. This resembles entering that variable last in a hierarchical regression. Besides the convenience of not having to code categorical independent variables, GLM offers other benefits over the SPSS regression procedure. As previously mentioned, it allows researchers to assess the power of their analyses, and it also provides main-effect and interaction plots.

However, some options available in the regression procedure are not available in GLM. For instance, the diagnostic tests of residuals are much more comprehensive in the regression procedure, which also provides outlier and collinearity diagnostics. Finally, partial plots are available only from the regression procedure.
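
The idea of a "unique contribution" can be made concrete with a small Python sketch (Python rather than SPSS syntax). The data and the helper names ols, codes, rss and unique_ss are invented for illustration. The sketch assumes a main-effects model with one categorical factor, one continuous covariate and no interaction term. In that case, the Type III sum of squares for a term equals the increase in residual sum of squares when that term alone is dropped from the full model. This is an equivalent calculation for that restricted case, not a description of how SPSS computes Type III sums of squares internally.

```python
import math

# Hypothetical data (invented): group label, continuous covariate x, score y.
DATA = [("g1", 1.0, 3.1), ("g1", 2.0, 4.0), ("g1", 3.0, 5.2),
        ("g2", 1.5, 6.0), ("g2", 2.5, 7.1), ("g2", 3.5, 7.9),
        ("g3", 1.0, 1.2), ("g3", 2.0, 2.1), ("g3", 4.0, 3.8)]
LEVELS = ["g1", "g2", "g3"]


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X)b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * v for r, v in zip(X, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def codes(g, scheme):
    """k - 1 coded column vectors for one observation (dummy or effect)."""
    out = []
    for lev in LEVELS[:-1]:
        if g == lev:
            out.append(1.0)
        elif scheme == "effect" and g == LEVELS[-1]:
            out.append(-1.0)
        else:
            out.append(0.0)
    return out


def rss(rows, y):
    """Residual sum of squares after a least-squares fit of y on rows."""
    b = ols(rows, y)
    return sum((v - sum(bj * xj for bj, xj in zip(b, r))) ** 2
               for r, v in zip(rows, y))


def unique_ss(scheme):
    """Increase in residual SS when the factor, or the covariate, alone is
    dropped from the main-effects model (the Type III SS for that term when
    the model has no interaction terms)."""
    y = [v for _, _, v in DATA]
    full = [[1.0, x] + codes(g, scheme) for g, x, _ in DATA]
    no_factor = [[1.0, x] for _, x, _ in DATA]
    no_covariate = [[1.0] + codes(g, scheme) for g, _, _ in DATA]
    base = rss(full, y)
    return rss(no_factor, y) - base, rss(no_covariate, y) - base


ss_factor, ss_cov = unique_ss("dummy")
print(f"unique SS for group factor: {ss_factor:.3f}")
print(f"unique SS for covariate x:  {ss_cov:.3f}")
# The coding scheme changes the coefficients but not these sums of squares.
ss_factor_e, ss_cov_e = unique_ss("effect")
print("same under effect coding:",
      math.isclose(ss_factor, ss_factor_e) and math.isclose(ss_cov, ss_cov_e))
```

The final line should print True. Dummy and effect coding span the same model space, so they give identical sums of squares even though their coefficients differ.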

And the answer is . . .

In answer to my opening question, SPSS ANOVA has been subsumed under the more general GLM procedure. I have outlined some of the main advantages this new procedure offers. Although GLM is more flexible than what was previously available, I hope future releases will bring further improvements; for example, simple effects tests are difficult to run directly from the GLM menu. I hope this article has reached you in a timely manner. Happy researching!


References

Kirk, R. E. (1982). Experimental design (2nd ed.). Monterey, CA: Brooks/Cole Publishing Company.

Pedhazur, E. J. (1997). Multiple regression in behavioral research (3rd ed.). Orlando, FL: Harcourt Brace College Publishers.