
Canonical Correlation with SPSS
By Dr. Mike Clark, Research and Statistical Support Services Consultant
Many in the social sciences employ multiple regression (MR) to
determine how several variables predict another variable. A linear
combination of the independent variables (IVs) is created that
minimizes the squared errors of prediction. The squared correlation
between that linear combination and the dependent variable (DV) is the
proportion of variance in the DV accounted for by the predictors.
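To make the last point concrete, here is a minimal sketch in plain Python. The data and the helper names (`pearson`, `fit_two_predictors`) are made up for illustration and are not part of the SPSS example. It fits a two-predictor regression and checks that the squared correlation between the fitted linear combination and the DV matches R-squared computed from sums of squares.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / sqrt(saa * sbb)

def fit_two_predictors(x1, x2, y):
    """Least-squares slopes for y ~ x1 + x2, via the centered normal equations."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * t for a, t in zip(c1, cy))
    s2y = sum(b * t for b, t in zip(c2, cy))
    det = s11 * s22 - s12 ** 2  # zero only if x1 and x2 are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

# Hypothetical data: two IVs and one DV
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [3, 4, 8, 8, 13, 12]

b1, b2 = fit_two_predictors(x1, x2, y)
combo = [b1 * a + b2 * b for a, b in zip(x1, x2)]  # linear combination of the IVs

# Squared correlation between the linear combination and the DV
r_squared = pearson(combo, y) ** 2

# The same quantity from residual and total sums of squares
b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)
ss_res = sum((yi - (b0 + ci)) ** 2 for yi, ci in zip(y, combo))
ss_tot = sum((yi - mean(y)) ** 2 for yi in y)
r_squared_ss = 1 - ss_res / ss_tot

print(round(r_squared, 4), round(r_squared_ss, 4))
```

The two printed values agree, and no other weighting of `x1` and `x2` gives a higher correlation with `y`.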
Although it is easy to think of the independent variables as a set
that one believes has some relation to the dependent variable, many do
not as often think of a set of dependent variables that
one wishes to predict. Canonical correlation analyzes the
relationship between sets of variables, with one set of variables
typically seen as the independent set and another as the dependent set,
though the causal arrow is not necessarily specified. In a sense
it can be thought of as multivariate regression, though multiple
regression is actually a special case of canonical correlation.
To begin with, it helps to visualize what we’re about to do.
The figure below gives us an idea of what is going to happen.

Just like in MR we want to create linear combinations of the set of
IVs (X1-X3). However, now we have a set of DVs and will want to
create a linear combination of those also (Y1-Y3). Canonical
correlation analysis will create linear combinations (variates, X* and
Y* above) of the two sets that will have maximum correlation with one
another.
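In symbols, the idea can be written roughly as follows. The weights a and b and the symbol R_c are notation introduced here for illustration; they do not appear in the SPSS output under these names.

```latex
\[
X^{*} = a_1 X_1 + a_2 X_2 + a_3 X_3, \qquad
Y^{*} = b_1 Y_1 + b_2 Y_2 + b_3 Y_3
\]
\[
R_c = \max_{a,\,b}\ \operatorname{corr}(X^{*}, Y^{*})
\quad \text{subject to} \quad
\operatorname{Var}(X^{*}) = \operatorname{Var}(Y^{*}) = 1
\]
```

The unit-variance constraint only fixes the scale of the weights; the correlation itself does not depend on it.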
The advantage that canonical correlation has over typical MR is that
it can take into account the complex nature of data: we don’t have to
restrict ourselves to one DV, and it also allows for the possibility
that the two sets of variables have a relationship along more than one
dimension. In other words, we may find other linear combinations of
the two sets of variables whose variates have a sizable (though
smaller) correlation that is also of practical significance. A given
analysis yields as many canonical correlations as there are variables
in the smaller set.
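One way to see where that count comes from is the standard eigenvalue formulation, sketched here with notation that is an assumption of this note: R_xx (p by p) and R_yy (q by q) are the within-set correlation matrices, and R_xy (p by q) holds the between-set correlations, with R_yx its transpose.

```latex
\[
R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}\, a = R_c^{2}\, a
\]
\[
\#\{\, R_c^{2} > 0 \,\} \;\le\; \operatorname{rank}(R_{xy}) \;\le\; \min(p, q)
\]
```

Each eigenvalue is a squared canonical correlation and its eigenvector gives the weights for set 1, so with three variables in each set, as in the example that follows, there are at most three canonical correlations.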
The mechanics of canonical correlation are covered in many
multivariate texts (see references below for some examples). Our
focus here is its use in SPSS. To begin with, the menu system will
not be able to assist us this time. The macro involved must be called
via syntax; however, there isn’t much to it.
Once we specify the macro to be used (it is available in the SPSS
folder), we then just note which variables go with each set (one can
think of set 1 as the IVs). The general format is as follows:
---
include file 'c:\Program Files\SPSS\Canonical correlation.sps'.
cancorr
set1= /
set2= .
---
The example provided here regards the association between a set of
job characteristics and measures of employee satisfaction. The raw data
can be found by following the SAS example link below.
Three variables associated with job characteristics are:
- task variety: degree of variety involved in tasks, expressed as a
percent
- feedback: degree of feedback required in job tasks, expressed as a
percent
- autonomy: degree of autonomy required in job tasks, expressed as a
percent
Three variables associated with job satisfaction are:
- career track satisfaction: employee satisfaction with career
direction and the possibility of future advancement, expressed as a
percent
- management and supervisor satisfaction: employee satisfaction with
supervisor's communication and management style, expressed as a percent
- financial satisfaction: employee satisfaction with salary and other
benefits, using a scale measurement from 1 to 10 (1=unsatisfied,
10=satisfied)
So our syntax will look something like:
include file 'c:\Program Files\SPSS\Canonical correlation.sps'.
cancorr
set1= Variety Feedback Autonomy/
set2= Career Supervisor Financial.
Unfortunately our output in SPSS is not in the familiar neat table
form but rather regular text format. As such I often paste it into
MS Word to make it a little easier to move around in. So what are
we looking at?
Correlation: we get the correlation coefficients for
items within each set, and also the correlations among all the variables
involved.
Canonical Correlation: depending on the number of
variables involved, we will see two or more canonical correlations
between the variates created for each set.
Significance test: Bartlett’s chi-square based on Wilks’
lambda. Note that these tests do not assess each canonical
correlation individually; rather, each one tests all the canonical
correlations that remain once the larger preceding ones are removed.
Essentially it is a test of whether the eigenvalues are greater than
zero. Also be aware that, as with regular correlation coefficients, we
are typically more interested in the size of the correlation than in
its statistical significance. Here it looks like the first solution is
both very large and statistically significant (R = .92, p = .02).
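For readers who want to see how such sequential tests are built, here is a rough sketch of the usual Bartlett approximation in plain Python. Only the .92 comes from the output discussed above; the other two correlations and the sample size are made-up values, and `bartlett_tests` is a hypothetical helper, not something the SPSS macro exposes.

```python
from math import log

def bartlett_tests(canon_corrs, n, p, q):
    """Sequential Bartlett chi-square tests based on Wilks' lambda.

    canon_corrs: canonical correlations, largest first
    n: sample size; p, q: number of variables in set 1 and set 2
    Returns one (wilks_lambda, chi_square, df) tuple per test.
    """
    results = []
    for k in range(len(canon_corrs)):
        # Wilks' lambda for the k-th test uses correlation k and all smaller ones
        wilks = 1.0
        for r in canon_corrs[k:]:
            wilks *= 1.0 - r ** 2
        chi_square = -(n - 1 - (p + q + 1) / 2) * log(wilks)
        df = (p - k) * (q - k)
        results.append((wilks, chi_square, df))
    return results

# Illustrative values only: .92 is from the text, the rest are assumptions
tests = bartlett_tests([0.92, 0.42, 0.11], n=14, p=3, q=3)
for wilks, chi_square, df in tests:
    print(f"lambda={wilks:.4f}  chi-square={chi_square:.3f}  df={df}")
```

The first row tests all three correlations together, the second tests the last two, and the third tests the smallest alone, which is why a significant first test does not by itself say which individual correlations are nonzero.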
Coefficients: Standardized and raw coefficients
used to create the linear combinations. The true ‘raw’
coefficients, the eigenvectors, are not provided.
Loadings: these are the structure coefficients (whenever you see
the term ‘loading’, be sure it is clear which coefficients are being
interpreted). They are the correlations between the variables in
a set and the variate created from their linear combination. Here you
have regular and cross loadings (loadings on the other set's variate).
Our largest loadings for this correlation are autonomy and feedback
for job characteristics, and career track satisfaction and
management/supervisor satisfaction for job satisfaction. Note that
financial considerations do not seem as important in the relationship
between these sets of variables.
Redundancy: each set gets a pair of outputs with regard
to ‘redundancy’. The proportions of variance of each set explained by
its own canonical variates sum to 1. In other words, the entire
canonical solution extracts all the variance seen with respect to the
variables involved. Regarding the individual values (i.e. ‘Explained by
its own Can. Var.’), these are adequacy coefficients, or the average
squared loading of the set's variables on that variate (e.g. with set 1,
squaring and averaging the loadings = .446). The amount explained
by the opposite variate is the redundancy, which can be seen in some
sense as a measure of predictive validity. However, some caution
should be exercised regarding its interpretation as it has limited
utility within the canonical correlation framework. Canonical
correlation does not try to maximize this value, but instead the
correlation among the variates. If one is more interested in
redundancy, one should instead perform ‘redundancy analysis’, which
searches for linear combinations of the variables in one group that
maximize the variance of the other group explained by the
linear combination. Such a procedure is available in SAS and R.
See the Thompson references for more on this matter.
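The adequacy and redundancy figures are simple to reproduce by hand. The sketch below uses made-up loadings (not the values from the SPSS output) together with the first canonical correlation of .92 reported above. It assumes the usual definition of redundancy as the adequacy coefficient multiplied by the squared canonical correlation.

```python
def adequacy(loadings):
    """Average squared loading of a set's variables on one variate."""
    return sum(l ** 2 for l in loadings) / len(loadings)

def redundancy(loadings, canon_corr):
    """Variance in a set explained by the opposite set's variate."""
    return adequacy(loadings) * canon_corr ** 2

# Hypothetical loadings for the three set 1 variables on the first variate
loadings_set1 = [0.50, 0.60, 0.85]
rc = 0.92  # first canonical correlation, from the output discussed above

# Cross loadings are the regular loadings scaled by the canonical correlation
cross_loadings = [l * rc for l in loadings_set1]

print(round(adequacy(loadings_set1), 3))
print(round(redundancy(loadings_set1, rc), 3))
print(round(adequacy(cross_loadings), 3))  # matches the redundancy
```

Averaging the squared cross loadings gives the same number as the redundancy, which is one way to check the ‘Explained by the opposite Can. Var.’ values against the loadings table.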
So there you have a basic introduction to canonical correlation; one
can find the procedures in other packages below. The analysis is
often thought of as exploratory, but if your hypotheses regard sets of
continuous variables, canonical correlation may be a more suitable
alternative to running a multiple regression for each DV under
consideration, and so well worth utilizing.
Other packages
References
- Lattin, Carroll, & Green (2003). Analyzing Multivariate Data.
- Tatsuoka (1971). Multivariate Analysis.
- Thompson, B. (1984). Canonical Correlation Analysis.
- Thompson, B. (1991). A primer on the logic and use of canonical correlation analysis.