Title: A primer on the logic and use of canonical correlation analysis
Author: Thompson, Bruce
Source: Measurement & Evaluation in Counseling & Development, Jul 1991, Vol. 24, Issue 2, p. 80 (16 pp., 5 charts, 1 diagram)
ISSN: 0748-1756

Section: METHODS, PLAINLY SPEAKING
A PRIMER ON THE LOGIC AND USE OF CANONICAL CORRELATION ANALYSIS

This article (a) explains the basic logic of canonical analysis; (b) illustrates that canonical analysis is a general parametric analytic method subsuming other methods; and (c) offers some guidelines regarding the correct use of this analytic approach.

Hinkle, Wiersma, and Jurs (1979, p. 415) noted that "it is becoming increasingly important for behavioral scientists to understand multivariate procedures even if they do not use them in their own research." And recent empirical studies of research practice confirm that multivariate methods are employed with some regularity in behavioral research (Elmore & Woehlke, 1988).

There are two reasons why multivariate methods are so important, as noted by Fish (1988). First, multivariate methods limit the inflation of Type I "experimentwise" error rates. Most researchers are familiar with "testwise" alpha, which refers to the probability of making a Type I error on a given hypothesis test. "Experimentwise" error rate refers to the probability of having made a Type I error anywhere within the study. For example, if a researcher conducts a balanced three-way factorial ANOVA, testing each of the three main effects, the three two-way interaction effects, and the single three-way interaction effect at the testwise .05 alpha level, the experimentwise error rate for the study will be: alpha_EW = 1 - (1 - .05)^7 = 30.2%. The same difficulty can occur when multiple dependent variables are tested in a given study. The problem is that the researcher will know that an "experimentwise" error is likely, but will not know which of the statistically significant results are errors and which are not.
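
This rate is easy to verify computationally. The following minimal SAS sketch (not part of the article's Appendix A program) simply evaluates the formula for seven tests:

DATA _NULL_;
  * experimentwise Type I error rate for 7 tests at testwise alpha = .05;
  ALPHA_EW = 1 - (1 - .05)**7;
  PUT ALPHA_EW=;   * writes a value of about 0.3017 to the SAS log;
RUN;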

But an even more important reason to use multivariate methods is that multivariate methods best honor the reality to which the researcher is purportedly trying to generalize. Most researchers live in a reality "in which the researcher cares about multiple outcomes, in which most outcomes have multiple causes, and in which most causes have multiple effects" (Thompson, 1986, p. 9). We must use analytic models that honor our view of reality, or else we will arrive at interpretations that actually distort reality (Eason, 1991; Tatsuoka, 1973, p. 273).

Just as independent variables can interact to change results in ways that would go unnoticed if these interactions were not analyzed (Benton, 1991), so too dependent variables can interact with each other to create effects that would go unnoticed absent a multivariate analysis. Only multivariate analyses simultaneously consider the full network of variable relationships, and honor a reality in which all the variables can and often do simultaneously interact and influence each other. Thus, multivariate analyses can yield results that would remain undetected if univariate analyses (e.g., ANOVA, regression) were employed, as both Fish (1988) and Maxwell (in press) demonstrate using actual examples.

Canonical correlation analysis is a multivariate analytic method that subsumes other parametric methods (e.g., t-tests, ANOVA, ANCOVA, regression, discriminant analysis, MANOVA) as special cases (Knapp, 1978). Some researchers have found canonical analysis to be useful. For example, Wood and Erskine (1976) identified more than 30 published applications of these methods. More recently, Thompson (1989a) cited roughly 100 canonical applications reported during the last decade. The purposes of the current article are (a) to explain the basic logic of canonical analysis in a concrete and accessible fashion; (b) to illustrate that canonical analysis is a general parametric analytic method subsuming other methods; and (c) to offer some guidance regarding the correct use of this analytic approach.

THE BASIC LOGIC OF CANONICAL CALCULATIONS

Thompson (1984) noted that canonical correlation can be presented in bivariate terms. This conceptualization is appealing, because most researchers feel very comfortable thinking in terms of the familiar bivariate correlation coefficient. Table 1 presents a small data set that will be employed to illustrate the basic logic of canonical correlation analysis (CCA). Appendix A presents the SAS computer program used to analyze the data; readers may find it useful to replicate these analyses and to examine other results reported in the output but not presented here, given space limitations.

The 12 cases of scores on each of two sets of scales ("CHA6" to "OTH2") were randomly sampled from a database generated in one of the "Heart Smart" studies, an offshoot of the Bogalusa Heart Study longitudinal examination of the origins of cardiovascular disease during childhood. The first set of scores involves actual values for these subjects on three scales, each with six items, measuring children's perceptions of the sources of their health:

  1. Chance, i.e., random uncontrollable external factors ("CHA6");
  2. Internal, i.e., decisions or actions within one's own control ("INT6");
  3. Powerful Others, i.e., external factors under the direct control of others, such as nurses or doctors ("OTH6").

The second set of scores ("CHA2" to "OTH2") involved responses on six items (two per scale) from a different source, but purportedly measuring the same three constructs. The example is elaborated by Thompson, Webber, and Berenson (1988), who presented one of the several related analyses conducted with the full database.

Thus, the small heuristic Table 1 data set involves a concurrent validity context. Sexton, McLean, Boyd, Thompson, and McCormick (1988) presented a CCA invoked in an analytically similar measurement context, but with different variables and a realistic sample size. Of course, CCA can be useful in addressing either substantive or measurement issues, but the latter context is perhaps more relevant to the focus of this journal.

Various analytic methods yield weights that are applied to variables to optimize some condition--such weights include beta weights, factor pattern coefficients, and discriminant function coefficients. These weights are all equivalent (e.g., Thompson & Borrello, 1985; Thompson, 1988), at least after a transformation in metric, but in canonical correlation analysis the weights are usually labeled standardized canonical function coefficients. It is difficult to fathom why the equivalent weights used in the various parametric methods are given different names, since the primary result is confusion and the illusion that parametric methods are different. The CCA function coefficients are applied to each individual's standardized data to yield the synthetic variables that are the basis for canonical analysis.

In regression only one set of weights is produced, but in canonical analysis several sets of weights, and thus several sets of the resulting synthetic variables, can be created. These canonical functions are related to principal components, are uncorrelated or orthogonal, and can be rotated in various ways (Thompson, 1984; Thorndike, 1976). The number of functions that can be computed in a canonical analysis equals the number of variables in the smaller of the two variable sets. In the present example, since both sets of variables consisted of three variables, three canonical functions were extracted. Some of the computations used in this extraction are explained elsewhere by Thompson (1984, pp. 11-14) and are illustrated in the computer program, CANBAK (Thompson, 1982).

Table 2 depicts the computation of the synthetic variable scores actually correlated in CCA. The computations for Function I are presented here; readers may wish to compute for themselves the synthetic scores for Functions II and III. The weights for the criterion variables on Function I were: (a) .6717, CHA6; (b) .3570, INT6; (c) .4214, OTH6. Thus, the weighted and aggregated criterion Z-scores of subject 1 yield a synthetic criterion score for this subject of .81896 ((.6717 x .5654) + (.3570 x -.2868) + (.4214 x 1.2852) = .3798 - .1024 + .5416). The weights for the predictor variables on Function I were: (a) .4494, CHA2; (b) .7200, INT2; (c) .2228, OTH2. Thus, the weighted and aggregated predictor Z-scores of subject 1 yield a synthetic predictor score for this subject of .70814 ((.4494 x -.3317) + (.7200 x .9202) + (.2228 x .8738) = -.1491 + .6625 + .1947).
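
These same computations can be expressed in a few SAS statements. The following minimal sketch, paralleling the lowercase statements in Appendix A, reproduces subject 1's two Function I synthetic scores from the Table 2 Z-scores:

DATA _NULL_;
  * subject 1's Z-scores from Table 2, weighted by the Function I coefficients;
  CRIT1 = (.6717 * .5654) + (.3570 * -.2868) + (.4214 * 1.2852);   * = .8190;
  PRED1 = (.4494 * -.3317) + (.7200 * .9202) + (.2228 * .8738);    * = .7081;
  PUT CRIT1= PRED1=;
RUN;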

The bivariate correlation between the synthetic scores on Function I is nothing more (or less) than the canonical correlation coefficient (Rc). Thus, for Function I, Rc = .932195 = the r between CRIT1 and PRED1. This is graphically illustrated in Figure 1. The synthetic variables are themselves Z-scores, the intercept (a) for the regression line is at the 0,0 coordinate, and the slope of the regression line is also Rc. Similarly, the bivariate correlation between the two sets of synthetic scores on Function II is the Rc for that function. The canonical function coefficients are specifically computed to optimize the calculated relationships between the synthetic variables on each function.
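
Readers running the Appendix A program can confirm this equality directly, since the program correlates the synthetic scores after creating them; a one-statement sketch of the relevant check is simply:

proc corr; var crit1 pred1; run;
  * the r reported for CRIT1 with PRED1 equals the Rc (.932195) reported by PROC CANCORR;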

Table 3 presents most of the results for the full canonical analysis. The structure coefficients presented in the table have the same meaning in a canonical analysis as in other analyses, e.g., structure coefficients are always bivariate correlation coefficients between observed variable scores (e.g., "CHA6", "OTH6") and a synthetic variable (e.g., "CRIT1") created using weights. For example, if regression predictors are multiplied by regression weights and the products are summed for each individual, the correlation between scores on a given observed predictor and the synthetic variable scores (the predicted Y scores) is the structure coefficient for that predictor. Similarly, in the canonical case a structure coefficient on a given function is the bivariate correlation between a given criterion or predictor variable and the synthetic variable involving the variable set to which the variable belongs. For example, since "ZCHA6" was a criterion variable in the Table 2 example, the correlation (+.8098) between "ZCHA6" and "CRIT1" is the structure coefficient for "ZCHA6" on Function I.
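
These correlations can be obtained directly. A minimal sketch (not among the Appendix A statements, and assuming the hlocnew data set created there) uses the WITH statement of PROC CORR:

proc corr data=hlocnew;
  * correlate each observed criterion Z-score with the Function I synthetic criterion;
  var zcha6 zint6 zoth6;
  with crit1;   * yields the Function I criterion structure coefficients: .8098, .4428, .7071;
run;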

In terms of actual contemporary analytic practice, Eason, Daniel, and Thompson (1990) found that in about one-third of the published canonical studies researchers only report and interpret function coefficients. But structure coefficients are vitally important in interpreting results in other analytic cases, such as factor analysis (Gorsuch, 1983, p. 207) and multiple regression analysis (Cooley & Lohnes, 1971, pp. 54-55; Thompson & Borrello, 1985). Similarly, with respect to CCA it is important not to interpret results based solely on function coefficients (Kerlinger & Pedhazur, 1973, p. 344; Levine, 1977, p. 20; Meredith, 1964, p. 55), though Harris (1989) may disagree. The structure and function coefficients for a variable set will be equal only if the variables in a set are all exactly uncorrelated with each other (Thompson, 1984, pp. 22-23), as would be the case, for example, if the variables in a set consisted of scores on orthogonally rotated principal components.

It would be dangerous to conclude that consulting either function or structure coefficients will always yield the same interpretations for a given data set. For example, Sexton, McLean, Boyd, Thompson and McCormick (1988) presented a canonical analysis in which one variable had a function coefficient of +0.02 on Function I, but the same variable had a structure coefficient of +0.89 on the same function. It is important to know when either set of coefficients suggests that a variable may be noteworthy.

CANONICAL CORRELATION ANALYSIS (CCA)

Long ago, Cohen (1968, p. 426) noted that ANOVA and ANCOVA are special cases of multiple regression analysis, and argued that in this realization "lie possibilities for more relevant and therefore more powerful exploitation of research data." Knapp (1978) subsequently offered mathematical proofs that CCA subsumes parametric methods, including both univariate and multivariate analyses. This realization is a basis for understanding how parametric methods are interrelated, which students often find to be helpful.

Three important insights can be gained from this perspective. All classical parametric methods (t-tests, ANOVA, MANOVA, etc.) are procedures that either implicitly or explicitly (a) use least squares weights, (b) focus on synthetic variables, and (c) yield effect sizes analogous to r2. Put differently, all classical analytic methods are correlational. As Keppel and Zedeck (1989) repeatedly emphasized, the power to make causal inferences inures to design features and not the analytic method selected, since conventional parametric analyses are all correlational.

It is beyond the scope of the present treatment to explore all the possible relationships among analytic techniques. Knapp (1978) presented the mathematical proofs, and additional concrete illustrations are offered elsewhere (Thompson, 1988). However, a brief exploration of a couple of linkages may be useful to the reader. The Appendix A SAS program can be run using the Table 1 data to yield additional insights.

The linkage of CCA and multiple regression analysis is particularly easy to see, since both procedures are explicitly named as correlational procedures. Suppose that the researcher wanted to predict "INT6" with "CHA2," "INT2," and "OTH2," and did so using both regression and canonical correlation procedures. When the Appendix A SAS program file was applied to the Table 1 data to yield these analyses, PROC REG computed the squared multiple correlation coefficient to be .4016 (F = 1.789, df = 3/8, p = .2269); PROC CANCORR computed the squared canonical correlation coefficient to be .401566 (F = 1.7894, df = 3/8, p = .2269). These results differ only as to the arbitrary number of digits used to report the identical results.

The relationship between the beta weights produced by PROC REG and the function coefficients produced by PROC CANCORR is a bit harder to see. These results are presented in Table 4. The table illustrates that the two sets of weights are equivalent, though they are standardized using a different metric: each beta weight, divided by R, equals the corresponding function coefficient. Thompson and Borrello (1985) provide more detail.
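
As a quick check on the Table 4 relationship, a minimal SAS sketch divides one of the reported beta weights by R:

DATA _NULL_;
  * a PROC REG beta weight divided by R yields the PROC CANCORR function coefficient;
  BETA_INT2 = 0.28335280;
  R = 0.633692;
  FUNC_INT2 = BETA_INT2 / R;   * = 0.4471, the Table 4 function coefficient for INT2;
  PUT FUNC_INT2=;
RUN;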

The linkages between CCA and factorial ANOVA illustrate how CCA subsumes OVA methods (e.g., ANOVA, ANCOVA, MANOVA, MANCOVA) generally. For the 3 x 2 factorial ANOVA involving the IQ and experimental group assignment data presented in Table 1, PROC ANOVA yielded the following results for the three omnibus hypotheses: (1) IQ, F = 3.90; (2) experimental assignment, F = 1.85; (3) two-way interaction, F = 1.08. The Appendix A program was used to test four related canonical models, and the lambdas calculated from PROC CANCORR were then expressed as Fs, using the process summarized in Table 5.
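
The Table 5 arithmetic can be verified in a few statements. The following minimal sketch (not part of Appendix A) reproduces the F for the IQ effect from the two lambdas reported in Step 1:

DATA _NULL_;
  * ratio of the full-model lambda to the lambda without the IQ contrasts;
  RATIO = .33717579 / .77521614;             * = .434944 (Table 5, Step 2);
  * convert the ratio to an F, with df effect = 2 and df error = 6 (Table 5, Step 3);
  F_IQ = ((1 - RATIO) / RATIO) * (6 / 2);    * = 3.897435, matching the PROC ANOVA F of 3.90;
  PUT RATIO= F_IQ=;
RUN;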

These illustrative results correctly indicate that you can do regression with CCA, though you cannot do CCA with regression. You can do factorial ANOVA with CCA, though you cannot do CCA with ANOVA. The same relationship holds with other parametric methods (e.g., t-tests, ANCOVA, MANOVA). In short, CCA is a general parametric method subsuming other parametric methods as special cases.

GUIDELINES FOR INTERPRETING CANONICAL RESULTS

Canonical correlation analysis is a potent analytic method. It is especially useful when one has two sets of variables, each consisting of at least two variables. When the variables are intervally scaled, CCA does not require the researcher to convert some variables to nominal scale in order to conduct an OVA method (Thompson, 1991). But the difficulty of interpreting canonical results can challenge even the most seasoned analyst. As Thompson (1980, pp. 1, 16-17) noted,

one reason why the technique is [somewhat] rarely used involves the difficulties which can be encountered in trying to interpret canonical results... The neophyte student of canonical correlation analysis may be overwhelmed by the myriad coefficients which the procedure produces... [But] canonical correlation analysis produces results which can be theoretically rich, and if properly implemented, the procedure can adequately capture some of the complex dynamics involved in educational reality.

CCA is only as complex as reality itself. Nevertheless, some general guidelines for interpreting canonical results may be useful. Five such guidelines will be offered.

First, use both Rc2 values and statistical significance test results to decide which canonical functions to interpret. Significance testing has limited use in behavioral science, and is often only a test of whether the researcher has a large sample, which even the most ill-informed researcher knows prior to running the test (Thompson, 1987, 1989c, 1991). Furthermore, there are special difficulties in testing all but the last Rc in CCA. Strictly speaking, the tests presented in the statistics packages are not tests of the significance of single functions (Thompson, 1984, pp. 19-20). For example, for the Table 1 data the first F (6.6738) from the SAS PROC CANCORR printout is a test involving the complete set of three Rc's, and not a test that the first Rc (.932195) is zero. Finally, CCA statistical significance tests do require the researcher to evaluate the multivariate normality of the data, and this distributional assumption cannot always be met. Thompson (1990b) describes a computer program that can be employed to evaluate this assumption.

Second, interpret both the function and the structure coefficients on functions that are deemed noteworthy. The reasons for this recommendation, suggested by the various researchers noted previously, primarily involve the fact that structure coefficients have special use in revealing the meaning of the synthetic variables actually being correlated in CCA, as Thompson (1990c) explained.

Third, do not interpret redundancy coefficients (Rd), except in the few concurrent validity applications in which both variable sets consist of the same variables. As explained in the Table 3 notes, an adequacy coefficient equals the mean of the squared structure coefficients for one variable set on one function. What is called a redundancy coefficient for a given variable set on a given function equals the adequacy coefficient for the set times the squared Rc for the function.

As Cramer and Nicewander (1979) proved in detail, redundancy coefficients are not truly multivariate (see also Thompson, 1988). This is very disturbing, because the main argument in favor of multivariate methods (for both substantive and statistical reasons) is that these methods simultaneously consider all relationships during the analysis (Fish, 1988; Thompson, 1986)! Furthermore, it is contradictory to routinely employ an analysis (CCA) that uses function coefficients to optimize Rc, and then to interpret coefficients (Rd) that were not optimized as part of the analysis.

The redundancy coefficient can only equal 1 when the synthetic variables for the function represent all the variance of every variable in the set, and the squared Rc also exactly equals 1. Thus, redundancy coefficients are useful only to test outcomes that rarely occur and which are generally not expected (Thompson, 1980, p. 16; Thompson, 1984). These coefficients are useful only when g functions (like g factors) are expected (cf. Sexton et al., 1988).

Fourth, consult communality coefficients to determine which variables are not contributing at all to the CCA solution. The communality coefficient for a variable equals the sum of the variable's squared structure coefficients across all functions. It may be useful to consider why variables with small communality coefficients did not contribute to obtained results. It may even be useful to omit these variables from the analysis (Thompson, 1984, pp. 47-51).
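
The quantities named in guidelines three and four can all be reproduced from the Table 3 squared structure coefficients. A minimal sketch using the Function I criterion-set values:

DATA _NULL_;
  * adequacy = mean squared structure coefficient for the criterion set on Function I;
  ADEQUACY = (.656 + .196 + .500) / 3;   * = .451, i.e., 45.1% in Table 3;
  * redundancy = adequacy times the squared Rc for the function;
  RD = ADEQUACY * .869;                  * = .392, i.e., 39.2% in Table 3;
  * communality = sum of a variable's squared structure coefficients across functions;
  H2_CHA6 = .656 + .317 + .027;          * = 1.00, i.e., 100.0% for CHA6 in Table 3;
  PUT ADEQUACY= RD= H2_CHA6=;
RUN;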

Fifth, use statistical or (better yet) empirical methods to evaluate the generalizability of the results in hand. The business of science is formulating generalizable insight. No one study, taken singly, establishes the basis for such insight. As Neale and Liebert (1986, p. 290) observed:

No one study, however shrewdly designed and carefully executed, can provide convincing support for a causal hypothesis or theoretical statement. . . Too many possible (if not plausible) confounds, limitations on generality, and alternative interpretations can be offered for any one observation. Moreover, each of the basic methods of research (experimental, correlational, and case study) and techniques of comparison (within- or between-subjects) has intrinsic limitations. How, then, does social science theory advance through research? The answer is, by collecting a diverse body of evidence about any major theoretical proposition.

Evaluating the generalizability of canonical results to other samples of subjects or of variables is a difficult task, but a task that the serious scholar can ill-afford to shirk. It must be emphasized that statistical significance testing does not inform the researcher regarding the likelihood that CCA Rc2 (i.e., effect sizes) or other coefficients (e.g., function or structure coefficients) will be replicable in future research (Carver, 1978).

With respect to the replicability of CCA effect sizes, these estimates appear to be reasonably stable if the researcher uses at least 5 to 10 subjects per variable (Thompson, 1990a). Furthermore, several statistical corrections of the effect sizes can be invoked. One might apply Wherry's (1931) correction formula to Rc2, as suggested by Cliff (1987, p. 446). But as incisively implied by Stevens (1986, pp. 78-84) with respect to the related regression case, the correction suggested by Herzberg (1969) may be especially useful, though it is more conservative. For example, for the Function I results reported in Table 3, the Wherry correction can be evaluated as:

Rc2 - ((1 - Rc2) * (VTot / (NTot - VTot - 1)))
= .869 - ((1 - .869) * (6 / (12 - 6 - 1)))
= .869 - (.131 * (6 / 5))
= .869 - (.131 * 1.2)
= .869 - .1572 = .7118,

where VTot is the total number of variables in both sets (6) and NTot is the sample size (12).
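
The same correction in executable form, as a minimal SAS sketch using the Table 3 Function I values:

DATA _NULL_;
  * Wherry correction of the Function I squared canonical correlation;
  RC2 = .869;    * squared Rc for Function I (Table 3);
  VTOT = 6;      * total number of variables in both sets;
  NTOT = 12;     * sample size for the Table 1 data;
  ADJ_RC2 = RC2 - ((1 - RC2) * (VTOT / (NTOT - VTOT - 1)));   * = .7118;
  PUT ADJ_RC2=;
RUN;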

Efforts to estimate the sampling specificity of coefficients for specific variables are more difficult, or at least more tedious. CCA function and structure coefficients appear to be less stable than CCA omnibus effect sizes (Rc2's), though the function and structure coefficients appear to be about equally unstable relative to each other (Thompson, 1989b). Thus, it is especially important to evaluate the generalizability of these coefficients.

Some researchers randomly split their sample data, conduct separate analyses for the two subgroups, and then subjectively compare the results to determine if they appear to be similar. Two points need to be emphasized about such an approach. Such procedures almost always overestimate the invariance or generalizability of results, as Thompson (1984, p. 46) explains. Also, inferences regarding replicability must be made empirically rather than subjectively, i.e., not by visually comparing coefficients across two randomly identified sample subgroups. Subjective comparisons will not do, because the functions in the two solutions may not occupy a common factor space. Functions that appear to be quite different may in fact yield quite similar synthetic variable scores; apparent differences in functions yielding comparable values for the synthetic variables actually related in canonical analysis are not very noteworthy (Thompson, 1989c). Cliff (1987, pp. 177-178) suggested that such cases involve "insensitivity" of the weights to departures from least squares constraints.

Empirical methods for evaluating the generalizability of CCA coefficients are explored by Thompson (1984, pp. 41-47; 1990c). A sophisticated logic called the "bootstrap," popularized by Efron and more recently by Lunneborg (e.g., 1987), may be especially useful (Thompson & Daniel, 1991). The bottom line is that in all studies, CCA or not, results from a single study must be interpreted with some caution.

An abridged illustrative interpretation of the Table 3 results using some of these five guidelines may be useful. Since the data involve a concurrent validity study, one would expect large effect sizes. Functions I and II both yield large Rc2 values, 86.9% and 82.5%, respectively. However, the coefficients for the variables are not sensible. On Function I, "INT2" has a large function coefficient (.7200) and shares considerable variance with the synthetic variable "PRED1," i.e., rs2 = 82.5%. The function coefficients (.6717 and .4214) and squared structure coefficients (rs2's = 65.6% and 50.0%) suggest that Function I primarily involves "CHA6" and "OTH6" from the criterion variable set. One plausible expectation might have been that "INT2" and "INT6" would be related with the same function, but this expectation is not supported by the Function I results. Similarly, Function II primarily involves the relationship between "OTH2" (rs2 = 69.6%) and an aggregate primarily involving the same two criterion variables, i.e., "CHA6" and "OTH6."

Of course, the heuristic analysis involved only two subjects per variable. Furthermore, scores on the two-item predictor variables were doubtless very unreliable. In the bivariate case, a correlation coefficient cannot exceed the square root of the product of the two variables' reliability coefficients; for example, if each of two variables had a reliability of .50, the observed correlation between them could not exceed the square root of (.50 x .50) = .50. Since all parametric methods are correlational and involve effect sizes analogous to r2, it should be clear that measurement error attenuates effect sizes in all analyses.

SUMMARY

As Stevens (1986, p. 373, emphasis omitted) noted, CCA

is appropriate if the wish is to parsimoniously describe the number and nature of mutually independent relationships between the two [variable] sets... since the [function] combinations are uncorrelated, we will obtain a very nice additive partitioning of the total between association.

The current article has explained the basic logic of canonical correlation analysis. It was noted that all parametric analytic methods are correlational, and that all parametric tests can be conducted using canonical analysis, since canonical analysis subsumes parametric methods as special cases. Canonical analysis is potent because it does not require the researcher to discard variance of any of the variables, and because the analysis honors the complexity of a reality in which variables interact simultaneously.

TABLE 1 Random Sample (n = 12) of Health Locus of Control Data With Hypothetical IQ and Experimental Group Assignments
Legend for Table:

A - CHA6
B - INT6
C - OTH6
D - CHA2
E - INT2
F - OTH2
G - IQ
H - IQGRP
I - EXPERGRP
J - CIQGRP1
K - CIQGRP2
L - CEXGRP1
M - CIQBYEX1
N - CIQBYEX2

ID   A   B   C   D  E  F    G  H  I   J   K   L   M   N

 1   20  17  19  7  7  7   68  1  1  -1  -1   1  -1  -1
 2   21  20  15  8  7  5   69  1  1  -1  -1   1  -1  -1
 3   17  15  20  7  6  7   50  1  2  -1  -1  -1   1   1
 4   16  14  13  8  5  5   85  1  2  -1  -1  -1   1   1
 5   20  20  15  8  7  5   90  2  1   0   2   1   0   2
 6   14  21  15  7  5  7  109  2  1   0   2   1   0   2
 7   14  19  14  7  5  6  102  2  2   0   2  -1   0  -2
 8   14  23  10  6  6  6  108  2  2   0   2  -1   0  -2
 9   21  14  12  8  5  2  111  3  1   1  -1   1   1  -1
10   19  12  10  7  6  4  140  3  1   1  -1   1   1  -1
11   24  24  19  8  8  8  120  3  2   1  -1  -1  -1   1
12   18  18   9  6  6  5  183  3  2   1  -1  -1  -1   1
TABLE 2 Variables in Z-Score Form and Synthetic Composite Scores on Function I
OBS    ZCHA6     ZINT6     ZOTH6     ZCHA2     ZINT2     ZOTH2

 1     .5654    -.2868    1.2852    -.3317     .9202     .8738
 2     .8738     .5075     .2029     .9950     .9202    -.3598
 3    -.3598    -.8164    1.5558    -.3317    -.0837     .8738
 4    -.6682   -1.0811    -.3382     .9950   -1.0875    -.3598
 5     .5654     .5075     .2029     .9950     .9202    -.3598
 6   -1.2849     .7722     .2029    -.3317   -1.0875     .8738
 7   -1.2849     .2427    -.0676    -.3317   -1.0875     .2570
 8   -1.2849    1.3018   -1.1499   -1.6583    -.0837     .2570
 9     .8738   -1.0811    -.6088     .9950   -1.0875   -2.2101
10     .2570   -1.6107   -1.1499    -.3317    -.0837    -.9766
11    1.7989    1.5665    1.2852     .9950    1.9240    1.4905
12    -.0514    -.0021   -1.4205   -1.6583    -.0837    -.3598

          CRIT1     PRED1

OBS

 1       .81896    .70814
 2       .85358   1.02950
 3       .12251   -.01460
 4      -.97730   -.41598
 5       .64644   1.02950
 6      -.50189   -.73735
 7      -.80495   -.87476
 8      -.88295   -.74822
 9      -.05561   -.82823
10      -.88697   -.42685
11      2.30918   2.16449
12      -.64101   -.88563
TABLE 3 Canonical Solution for the Table 1 Data
Legend for Table:

A - Func. (function coefficient)
B - Str. (structure coefficient)
C - Str.2 (squared structure coefficient)

                    Function I                Function II
Variable/
Coef.          A        B      C            A        B       C

CHA6         .6717   .8098   65.6%       -.8174   -.5633   31.7%
INT6         .3570   .4428   19.6%        .2433    .3901   15.2%
OTH6         .4214   .7071   50.0%        .7837    .5673   32.2%
Adequacy                     45.1%[b]                      26.4%
Rd                           39.2%[c]                      21.8%
Rc2                    86.9%                         82.5%
Rd                           38.3%                         20.7%
Adequacy                     44.0%                         25.2%
CHA2         .4494   .5564   31.0%        .1669   -.2017    4.1%
INT2         .7200   .9082   82.5%       -.6422   -.1324    1.8%
OTH2         .2228   .4314   18.6%       1.1368    .8345   69.6%

                         Function III
Variable/
Coef.               A         B        C     h2

CHA6             -.0266     .1643     2.7%   100.0%[a]
INT6             -.9253    -.8073    65.2%   100.0%
OTH6              .6099     .4220    17.8%   100.0%
Adequacy                             28.6%
Rd                                    4.6%
Rc2                            16.2%
Rd                                    5.0%
Adequacy                             30.8%
CHA2              .9721     .8061    65.0%   100.0%
INT2             -.6576    -.3971    15.8%   100.0%
OTH2              .1307     .3427    11.7%   100.0%

[a] Canonical communality (h2) coefficients are directly
analogous to the factor analytic coefficients of the same name,
and indicate how much of the variance of an observed variable
is contained within the set of synthetic variables. For example,
the communality coefficient for "CHA6" equals 65.6% + 31.7%
+ 2.7%.

[b] An adequacy coefficient indicates how adequately the
synthetic scores on a function reproduce the variance of the
observed variables in a set, and equals the mean of the squared
structure coefficients for the set on that function. For example,
the adequacy coefficient for the criterion set on Function I
equals (65.6% + 19.6% + 50.0%) / 3 = 135.2 / 3 = 45.1%.

[c] A redundancy (Rd) coefficient equals an adequacy coefficient
times Rc2, e.g., 45.1% times 86.9% equals 39.2%.
TABLE 4 The Relationship Between Regression beta Weights and CCA Function Coefficients
                                            Function
Variable               beta               Coefficients

CHA2             -0.07129038 / R =          -0.1125
INT2              0.28335280 / R =           0.4471
OTH2              0.45229076 / R =           0.7137

Note. R = Rc = 0.633692. The weights are
reported to the same number of decimal places
produced on the SAS output.
TABLE 5 Three Steps to Convert CCA Results to Factorial ANOVA F's

Step #1: Get the CCA lambda for four sets of orthogonal contrast variables.

Model   Predictors                                          lambda

1       CIQGRP1  CIQGRP2  CEXGRP1  CIQBYEX1  CIQBYEX2    .33717579
2                         CEXGRP1  CIQBYEX1  CIQBYEX2    .77521614
3       CIQGRP1  CIQGRP2           CIQBYEX1  CIQBYEX2    .44092219
4       CIQGRP1  CIQGRP2  CEXGRP1                        .45821326

Step #2: Convert the lambdas to ratios for each effect.

Effect                  Models   Full-model lambda    lambda w/o effect     Ratio

IQ                      1 / 2    .33717579        /   .77521614        =   .434944
Exp. Assignment         1 / 3    .33717579        /   .44092219        =   .764705
IQ x Exp. Interaction   1 / 4    .33717579        /   .45821326        =   .735849

Step #3: Convert the ratios to ANOVA F's, by the algorithm F = ((1 - ratio) / ratio) x (df error / df effect).

IQ                      ((1 - .434944) / .434944) x (6 / 2) = 3.897435
Exp. Assignment         ((1 - .764705) / .764705) x (6 / 1) = 1.846153
IQ x Exp. Interaction   ((1 - .735849) / .735849) x (6 / 2) = 1.076923

FIGURE 1: Plot of CRIT1 by PRED1

REFERENCES

Benton, R. (1991). Statistical power considerations in ANOVA. In B. Thompson (Ed.), Advances in educational research: Substantive findings, methodological developments (Vol. 1, pp. 119-132). Greenwich, CT: JAI Press.

Carver, R. P. (1978). The case against statistical significance testing. Harvard Educational Review, 48(3), 378-399.

Cliff, N. (1987). Analyzing multivariate data. San Diego: Harcourt Brace Jovanovich.

Cohen, J. (1968). Multiple regression as a general data-analytic system. Psychological Bulletin, 70, 426-443.

Cooley, W. W., & Lohnes, P. R. (1971). Multivariate data analysis. New York: John Wiley and Sons.

Cramer, E. M., & Nicewander, W. A. (1979). Some symmetric, invariant measures of multivariate association. Psychometrika, 44, 43-54.

Eason, S. (1991). Why generalizability theory yields better results than classical test theory: A primer with concrete examples. In B. Thompson (Ed.), Advances in educational research: Substantive findings, methodological developments (Vol. 1, pp. 83-98). Greenwich, CT: JAI Press.

Eason, S., Daniel, L., & Thompson, B. (1990, January). A review of practice in a decade's worth of canonical correlation analysis studies. Paper presented at the annual meeting of the Southwest Educational Research Association, Austin, Texas.

Elmore, P. B., & Woehlke, P. L. (1988). Statistical methods employed in American Educational Research Journal, Educational Researcher, and Review of Educational Research from 1978 to 1987. Educational Researcher, 17(9), 19-20.

Fish, L. J. (1988). Why multivariate methods are usually vital. Measurement and Evaluation in Counseling and Development, 21, 130-137.

Gorsuch, R. L. (1983). Factor analysis (2nd ed.). Hillsdale, NJ: Erlbaum.

Harris, R. J. (1989). A canonical cautionary. Multivariate Behavioral Research, 24, 17-39.

Herzberg, P. A. (1969). The parameters of cross validation. Psychometrika, Monograph supplement, No. 16.

Hinkle, D. E., Wiersma, W., & Jurs, S. G. (1979). Applied statistics for the behavioral sciences. Chicago: Rand McNally.

Keppel, G., & Zedeck, S. (1989). Data analysis for research designs. New York: W. H. Freeman.

Kerlinger, F. N., & Pedhazur, E. J. (1973). Multiple regression in behavioral research. New York: Holt, Rinehart and Winston.

Knapp, T. R. (1978). Canonical correlation analysis: A general parametric significance testing system. Psychological Bulletin, 85, 410-416.

Levine, M. S. (1977). Canonical analysis and factor comparison. Newbury Park: Sage.

Lunneborg, C. E. (1987). Bootstrap applications for the behavioral sciences. Seattle: University of Washington.

Maxwell, S. (in press). Recent developments in MANOVA applications. In B. Thompson (Ed.), Advances in social science methodology (Vol. 2). Greenwich, CT: JAI Press.

Meredith, W. (1964). Canonical correlations with fallible data. Psychometrika, 29, 55-65.

Neale, J. M., & Liebert, R. M. (1986). Science and behavior: An introduction to methods of research (3rd ed.). Englewood Cliffs, NJ: Prentice-Hall.

Sexton, J. D., McLean, M., Boyd, R. D., Thompson, B., & McCormick, K. (1988). Criterion-related validity of a new standardized developmental measure for use with infants who are handicapped. Measurement and Evaluation in Counseling and Development, 21, 16-24.

Stevens, J. (1986). Applied multivariate statistics for the social sciences. Hillsdale, NJ: Erlbaum.

Tatsuoka, M. M. (1973). Multivariate analysis in educational research. In F. N. Kerlinger (Ed.), Review of research in education (pp. 273-319). Itasca, IL: Peacock.

Thompson, B. (1980, April). Canonical correlation: Recent extensions for modelling educational processes. Paper presented at the annual meeting of the American Educational Research Association, Boston. (ERIC Document Reproduction Service No. ED 199 269)

Thompson, B. (1982). CANBAK: A computer program which performs stepwise canonical correlation analysis. Educational and Psychological Measurement, 42, 849-851.

Thompson, B. (1984). Canonical correlation analysis: Uses and interpretation. Newbury Park: Sage.

Thompson, B. (1986, November). Two reasons why multivariate methods are usually vital. Paper presented at the annual meeting of the Mid-South Educational Research Association, Memphis.

Thompson, B. (1987, April). The use (and misuse) of statistical significance testing: Some recommendations for improved editorial policy and practice. Paper presented at the annual meeting of the American Educational Research Association, Washington, D.C. (ERIC Document Reproduction Service No. ED 287 868)

Thompson, B. (1988, April). Canonical correlation analysis: An explanation with comments on correct practice. Paper presented at the annual meeting of the American Educational Research Association, New Orleans. (ERIC Document Reproduction Service No. ED 295 957)

Thompson, B. (1989a, August). Applications of multivariate statistics: A bibliography of canonical correlation analysis studies. Paper presented at the annual meeting of the American Psychological Association, New Orleans. (ERIC Document Reproduction Service No. ED 311 070)

Thompson, B. (1989b, March). Invariance of multivariate results: A Monte Carlo study of canonical coefficients. Paper presented at the annual meeting of the American Educational Research Association, San Francisco. (ERIC Document Reproduction Service No. ED 306 226)

Thompson, B. (1989c). Statistical significance, result importance, and result generalizability: Three noteworthy but somewhat different issues. Measurement and Evaluation in Counseling and Development, 22, 2-6.

Thompson, B. (1990a). Finding a correction for the sampling error in multivariate measures of relationship: A Monte Carlo study. Educational and Psychological Measurement, 50, 15-31.

Thompson, B. (1990b). MULTINOR: A FORTRAN program that assists in evaluating multivariate normality. Educational and Psychological Measurement, 50, 845-848.

Thompson, B. (1990c, April). Variable importance in multiple and canonical correlation/regression. Paper presented at the annual meeting of the American Educational Research Association, Boston. (ERIC Document Reproduction Service No. ED 317 615)

Thompson, B. (1991). Review of Data analysis for research designs by G. Keppel & S. Zedeck. Educational and Psychological Measurement, 51.

Thompson, B., & Borrello, G. M. (1985). The importance of structure coefficients in regression research. Educational and Psychological Measurement, 45, 203-209.

Thompson, B., & Daniel, L. (1991, April). Canonical correlation analyses that incorporate measurement and sampling error considerations. Paper presented at the annual meeting of the American Educational Research Association, Chicago.

Thompson, B., Webber, L., & Berenson, G. S. (1988). Validity of a children's health locus of control measure: A "Heart Smart" study. American Journal of Health Promotion, 3(2), 44-49.

Thorndike, R. M. (1976, April). Strategies for rotating canonical components. Paper presented at the annual meeting of the American Educational Research Association, San Francisco. (ERIC Document Reproduction Service No. ED 123 259)

Wherry, R. J. (1931). A new formula for predicting the shrinkage of the coefficient of multiple correlation. Annals of Mathematical Statistics, 2, 440-451.

Wood, D. A., & Erskine, J. A. (1976). Strategies in canonical correlation with application to behavioral data. Educational and Psychological Measurement, 36, 861-878.

APPENDIX A: SAS Program for Table 1 Data

DATA HLOCMECD; INFILE ABC;

INPUT ID 3-4 CHA6 6-7 INT6 9-10 OTH6 12-13 CHA2 15-16 INT2 18-19 OTH2 21-22 IQ 24-26 IQGRP 28 EXPERGRP 30 CIQGRP1 32-33 CIQGRP2 35-36 CEXGRP1 38-39 CIQBYEX1 41-42 CIQBYEX2 44-45;

PROC PRINT; VAR ID CHA6 INT6 OTH6 CHA2 INT2 OTH2 IQ IQGRP EXPERGRP CIQGRP1 CIQGRP2 CEXGRP1 CIQBYEX1 CIQBYEX2; RUN;

TITLE '1. DESCRIPTION OF RAW DATA';

PROC CORR; VAR CHA6 INT6 OTH6 CHA2 INT2 OTH2 IQ IQGRP EXPERGRP CIQGRP1 CIQGRP2 CEXGRP1 CIQBYEX1 CIQBYEX2; RUN;

TITLE '2. THE LOGIC OF CCA';

PROC CANCORR ALL; VAR CHA6 INT6 OTH6; WITH CHA2 INT2 OTH2;

data hlocnew; set hlocmecd;

zcha6=(cha6-18.166666667)/3.24270744;

zint6=(int6-18.083333333)/3.77692355;

zoth6=(oth6-14.250000000)/3.69582074;

zcha2=(cha2-07.250000000)/0.75377836;

zint2=(int2-06.083333333)/0.99620492;

zoth2=(oth2-05.583333333)/1.62135372;

crit1=(0.6717*zcha6)+(0.3570*zint6)+(0.4214*zoth6);

pred1=(0.4494*zcha2)+(0.7200*zint2)+(0.2228*zoth2);

crit2=(-.8174*zcha6)+(0.2433*zint6)+(0.7837*zoth6);

pred2=(0.1669*zcha2)+(-.6422*zint2)+(1.1368*zoth2);

crit3=(-.0266*zcha6)+(-.9253*zint6)+(0.6099*zoth6);

pred3=(0.9721*zcha2)+(-.6576*zint2)+(0.1307*zoth2);

proc print; var zcha6 zint6 zoth6 zcha2 zint2 zoth2 crit1 pred1 crit2 pred2 crit3 pred3; run;

title '2a AN r MATRIX WITH MANY REVELATIONS';

proc corr; var zcha6 zint6 zoth6 zcha2 zint2 zoth2 crit1 pred1 crit2 pred2 crit3 pred3; run;

title '2b THE 1ST FUNCTION IN GRAPHIC FORM';

proc plot; plot crit1*pred1=id/ vaxis=-3 to 7 by 1 vref=0 haxis=-3 to 8 by 1 href=0; run;

TITLE '3. CCA SUBSUMES PEARSON CORRELATION';

PROC CORR; VAR OTH6 OTH2;

PROC CANCORR ALL; VAR OTH6; WITH OTH2; RUN;

TITLE '4. CCA SUBSUMES T-TESTS & ONE-WAY ANOVA';

PROC TTEST; CLASS EXPERGRP; VAR OTH6;

PROC ANOVA; CLASS EXPERGRP; MODEL OTH6=EXPERGRP;

PROC CANCORR ALL; VAR OTH6; WITH CEXGRP1; RUN;

TITLE '5. CCA SUBSUMES FACTORIAL ANOVA';

PROC ANOVA; CLASS IQGRP EXPERGRP;

MODEL CHA6=IQGRP EXPERGRP IQGRP*EXPERGRP;

PROC CANCORR; VAR CHA6; WITH CIQGRP1 CIQGRP2 CEXGRP1 CIQBYEX1 CIQBYEX2;

PROC CANCORR; VAR CHA6; WITH CEXGRP1 CIQBYEX1 CIQBYEX2;

PROC CANCORR; VAR CHA6; WITH CIQGRP1 CIQGRP2 CIQBYEX1 CIQBYEX2;

PROC CANCORR; VAR CHA6; WITH CIQGRP1 CIQGRP2 CEXGRP1; RUN;

TITLE '6. CCA SUBSUMES MULTIPLE REGRESSION';

PROC REG; MODEL INT6=CHA2 INT2 OTH2/ STB;

PROC CANCORR ALL; VAR INT6; WITH CHA2 INT2 OTH2; RUN;

TITLE '7. CCA SUBSUMES FACTORIAL MANOVA';

PROC ANOVA; CLASS IQGRP EXPERGRP; MODEL CHA6 INT6=IQGRP EXPERGRP IQGRP*EXPERGRP; MANOVA H=_ALL_/SUMMARY;

PROC CANCORR ALL; VAR CHA6 INT6; WITH CIQGRP1 CIQGRP2 CEXGRP1 CIQBYEX1 CIQBYEX2;

PROC CANCORR ALL; VAR CHA6 INT6; WITH CEXGRP1 CIQBYEX1 CIQBYEX2;

PROC CANCORR ALL; VAR CHA6 INT6; WITH CIQGRP1 CIQGRP2 CIQBYEX1 CIQBYEX2;

PROC CANCORR ALL; VAR CHA6 INT6; WITH CIQGRP1 CIQGRP2 CEXGRP1; RUN;

TITLE '8. CCA SUBSUMES DISCRIMINANT';

PROC CANDISC ALL; VAR CHA6 INT6; CLASS EXPERGRP;

PROC CANCORR ALL; VAR CHA6 INT6; WITH CEXGRP1;

Note. The bulk of the program was executed as the first of two runs. The lowercase commands required the results from the first run, since the relevant coefficients were not yet known. These commands were then added into the program, and the program was executed a second time. Of course, since the required values used in the lowercase commands are presented here, for this particular example the reader can execute the full program in a single step.


By BRUCE THOMPSON

Bruce Thompson is a professor of educational psychology at Texas A & M University, College Station, Texas, and adjunct professor of community medicine at Baylor College of Medicine, Houston, Texas.

