Path Analysis with Manifest Variables
First, let's take a moment to describe our fictional* model.
Our model consists of seven directly measured (manifest) variables:
Education, Responsibility, Ambition, Wealth, Suggestibility,
(Ethical) Flexibility, and Political Success. The model reflects hypothesized
causal relationships among characteristics of American politicians. It
hypothesizes three key causal variables (Wealth, Suggestibility, and [Ethical]
Flexibility) for political success. We further expect politicians who exhibit
high levels of education, responsibility, and ambition to also exhibit greater
wealth.
*Again, this is a fictional example and is not meant to be taken seriously as
a research finding supported by empirical evidence. It is used here for
instructional purposes only.
If you are unfamiliar with standard path and structural equation models, there
are a few conventions to note in our path diagram that are common in published
path models and structural equation models. First, squares or rectangles denote
observed or measured variables (often referred to as manifest variables).
Second, straight, single-headed arrows denote hypothesized causal relationships
(often referred to as paths). Third, curved, double-headed arrows denote
bi-directional relationships (often referred to as correlations or
covariances). Specific hypotheses should be stated to clarify what
the researcher expects to find (e.g. a very strong positive relationship between
Wealth and Education).
One of the key issues in path analysis and SEM is overidentification. A model
is said to be overidentified if it contains more unique inputs (sometimes
called informations; these are the unique variances and covariances among the
measured variables) than parameters being estimated. In our example, we have
seven measured variables. We can apply the following formula to calculate the
number of unique inputs:
number of unique inputs = p(p + 1) / 2
where p = the number of manifest or measured variables. With our 7 manifest
variables, we calculate (7 x 8) / 2 = 28 unique inputs or informations, which is
greater than the number of parameters we are estimating. Looking at the diagram,
we see 10 covariances (C?), 6 paths (P?), 5 variable variances, and 2 error
variances (VAR?). Adding these up, we get 23 parameters to be estimated, which
leaves 28 - 23 = 5 degrees of freedom.
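The counting above can be checked with a short DATA step. This is only a sketch; the variable names (p, inputs, params, df) are ours and are not part of the tutorial's syntax.

```sas
DATA _NULL_;
   p      = 7;                   /* number of manifest variables             */
   inputs = p * (p + 1) / 2;     /* unique variances and covariances         */
   params = 10 + 6 + 5 + 2;      /* covariances + paths + variable variances
                                    + error variances                        */
   df     = inputs - params;     /* a positive value means overidentified    */
   PUT inputs= params= df=;      /* writes the three values to the SAS log   */
RUN;
```

The log should show 28 inputs, 23 parameters, and 5 degrees of freedom, matching the figures in the text.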
Remember too that path analysis and SEM require large sample sizes. Several
rules of thumb have been put forth for the lowest reasonable sample size: at
least 200 cases, at least 5 cases per manifest or measured variable, at least
400 cases, at least 25 cases per measured variable, and so on. The bottom line
is this: path analysis and SEM are powerful when done with adequately large
samples, and the larger the better.
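As a rough illustration, the rules of thumb above can be compared against a given sample. The sketch below assumes the n = 750 and 7 measured variables of the example data used in this tutorial; the variable names are ours.

```sas
DATA _NULL_;
   n = 750;                       /* cases in the example data               */
   p = 7;                         /* measured variables                      */
   min_5_per_var  = 5 * p;        /* "5 cases per variable" rule             */
   min_25_per_var = 25 * p;       /* "25 cases per variable" rule            */
   /* 1 if every rule of thumb listed above is satisfied, otherwise 0 */
   meets_all = (n >= 200) AND (n >= 400)
           AND (n >= min_5_per_var) AND (n >= min_25_per_var);
   PUT min_5_per_var= min_25_per_var= meets_all=;
RUN;
```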
The procedure for conducting path analysis and/or SEM in SAS is PROC CALIS;
however, PROC CALIS needs to be given the data. There are three ways to
'feed' PROC CALIS the data: (1) a correlation matrix with the number of
observations and standard deviations for each variable, (2) a covariance matrix,
or (3) the raw data. Here we will use the correlation matrix with the number of
observations and standard deviations. To obtain these, first import the raw
data into SAS using the Import Wizard: import the
Example Data 5c file using the SPSS File (*.sav) source option and the
member name ex5c. Once imported, you can get the descriptive statistics and
correlations which you will need to run the path analysis.
PROC CORR DATA=ex5c;
RUN;
Using the number of observations (n = 750), the standard deviations, and
the correlation matrix, you can proceed to the path analysis.
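As an alternative to retyping the statistics by hand, PROC CORR can write them to a TYPE=CORR data set with its OUTP= option. The sketch below assumes the variables in ex5c are named V1 through V7, which this tutorial does not confirm; corr_out is a hypothetical data set name.

```sas
/* Assumption: ex5c contains numeric variables named V1-V7. */
PROC CORR DATA=ex5c OUTP=corr_out NOPRINT;
   VAR V1-V7;
RUN;

/* Inspect the result: rows for MEAN, STD, N, and the CORR matrix. */
PROC PRINT DATA=corr_out;
RUN;
```

A data set created this way could be passed to PROC CALIS through its DATA= option in place of a hand-entered correlation matrix.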
The syntax for estimating or fitting our path model is displayed below.
Note that the top half of the syntax (the DATA step) simply enters the data for
the path analysis. The bottom half (PROC CALIS) is used to fit the path model.
/* pathdata is a placeholder data set name; the original name is not shown */
DATA pathdata (TYPE=CORR);
INPUT _TYPE_ $ _NAME_ $ V1-V7;
LABEL
V1 = 'education'
V2 = 'responsibility'
V3 = 'ambition'
V4 = 'wealth'
V5 = 'suggestibility'
V6 = 'moral flexibility'
V7 = 'political success';
DATALINES;
N . 750 750 750 750 750 750 750
STD . 0.9709 1.0218 0.9873 0.9999 0.9666 1.0072 1.0001
CORR V1 1.0000 . . . . . .
CORR V2 .3546 1.0000 . . . . .
CORR V3 .3377 .3198 1.0000 . . . .
CORR V4 .5912 .6581 .5319 1.0000 . . .
CORR V5 .0203 .0131 .0422 .0138 1.0000 . .
CORR V6 .0225 -.0034 .0591 .0349 .5249 1.0000 .
CORR V7 -.0047 .0016 .0046 -.0236 .7047 .7185 1.0000
;
RUN;
PROC CALIS COVARIANCE CORR RESIDUAL MODIFICATION DATA=pathdata;
/* Linear equations: one per endogenous variable */
LINEQS
V7 = PV7V4 V4 + PV7V5 V5 + PV7V6 V6 + E1,
V4 = PV4V1 V1 + PV4V2 V2 + PV4V3 V3 + E2;
/* Variances to be estimated */
STD
E1 = VARE1,
E2 = VARE2,
V1 = VARV1,
V2 = VARV2,
V3 = VARV3,
V5 = VARV5,
V6 = VARV6;
/* Covariances among the exogenous variables */
COV
V1 V2 = CV1V2,
V1 V3 = CV1V3,
V1 V5 = CV1V5,
V1 V6 = CV1V6,
V2 V3 = CV2V3,
V2 V5 = CV2V5,
V2 V6 = CV2V6,
V3 V5 = CV3V5,
V3 V6 = CV3V6,
V5 V6 = CV5V6;
VAR V1 V2 V3 V4 V5 V6 V7;
RUN;
The PROC CALIS statement is followed by options. First, COVARIANCE tells SAS we
want to analyze the covariance matrix. Even though we are
using the correlation matrix as our data input, SAS calculates the covariance
matrix from the correlations and standard deviations. The CORR option specifies
that we want the output to include the correlation or covariance matrix on
which the analysis is run. The RESIDUAL option allows us to see the raw and
standardized residuals in the output. The MODIFICATION option tells SAS to
print the modification indices (e.g. the Lagrange multiplier test). The next
part of the syntax, LINEQS, provides SAS with the linear equations which specify
the paths we want estimated. The first of these can be read as: variable 7 is
causally affected by variable 4 (path PV7V4), variable 5 (path PV7V5), and
variable 6 (path PV7V6), plus the error term (E1) associated with variable 7.
Next, the STD statement specifies which variances we want estimated (labeled
VAR here and in the diagram above). The COV statement then specifies all the
covariances which need to be estimated. Last, the VAR statement simply lists
the variables to be used in the analysis.
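To see how a covariance follows from a correlation and two standard deviations (covariance = r x sd1 x sd2), here is a minimal sketch for V1 and V4 using the values from the data lines above. The variable names are ours, and this only illustrates the arithmetic, not how PROC CALIS is implemented internally.

```sas
DATA _NULL_;
   r_v1v4   = 0.5912;   /* correlation of V1 and V4 (CORR V4 row)    */
   sd_v1    = 0.9709;   /* standard deviation of V1 (STD row)        */
   sd_v4    = 0.9999;   /* standard deviation of V4 (STD row)        */
   cov_v1v4 = r_v1v4 * sd_v1 * sd_v4;
   PUT cov_v1v4= 8.4;   /* writes the covariance to the SAS log      */
RUN;
```

Because the standard deviations in this example are all close to 1, each covariance ends up close to its corresponding correlation.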
*Please note: the first page of output was produced by the PROC CORR run
directly after importing the data (above). Therefore, the page numbers of the
PROC CALIS output begin on the second page (p. 2) of the total output file
(e.g. page 1 of the PROC CALIS output actually has the number 2 in the top
right corner). The discrepancy is noted here because PROC CALIS tends to
produce several pages of output.
The first page of the PROC CALIS output consists of general information,
including the number of endogenous variables (any variable with a straight
single-headed arrow pointing at it) and the number of exogenous variables (any
variable without any straight single-headed arrows pointing to it).
The second page of the PROC CALIS output consists of a listing of the parameters
to be estimated; essentially a review of the model specified in the PROC CALIS
syntax.
The third page shows the general components of the model (e.g. number of
variables, number of informations, number of parameters, etc.), as well as the
descriptive statistics and covariance matrix for the variables entered in the
analysis.
The fourth page provides the initial parameter estimates.
The fifth page includes the iteration history. It is important to check
the last line of the Optimization results (left side of the middle of the page),
which states whether or not the convergence criterion was satisfied. Also of
importance is the beginning of the predicted covariance matrix, which is
compared to the original (observed) covariance matrix to
produce residual values.
The sixth page continues the predicted covariance matrix.
The seventh page displays fit indices. As you can see, a fairly comprehensive
list is provided. Please note that although chi-square is displayed, it should
not be used to judge goodness-of-fit, because the large sample sizes necessary
for path analysis and SEM inflate the chi-square statistic to the point that
even trivial discrepancies appear significant. Some of the more commonly
reported fit indices are the RMSEA (root mean square error of approximation),
which indicates good fit when below .05; Schwarz's Bayesian Criterion (also
called BIC, the Bayesian Information Criterion), where the smaller the value
(i.e. below zero) the better the fit; and Bentler & Bonett's non-normed fit
index (NNFI) and normed fit index (NFI), both of which should be greater than
.90 to indicate good fit.
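For readers curious how RMSEA relates to chi-square, a commonly used point estimate is the square root of max(chi-square - df, 0) / (df x (N - 1)). The sketch below applies it with a HYPOTHETICAL chi-square of 10, chosen only for illustration. The df of 5 (28 unique inputs minus 23 parameters) and N of 750 come from this example. The actual RMSEA for the model is the one PROC CALIS reports on this page of output.

```sas
DATA _NULL_;
   chisq = 10;     /* HYPOTHETICAL chi-square, for illustration only  */
   df    = 5;      /* 28 unique inputs minus 23 parameters            */
   n     = 750;    /* number of observations                          */
   rmsea = SQRT(MAX(chisq - df, 0) / (df * (n - 1)));
   good_fit = (rmsea < 0.05);   /* 1 if below the .05 guideline       */
   PUT rmsea= 6.4 good_fit=;
RUN;
```

Note that N sits in the denominator, so for a given chi-square and df, a larger sample yields a smaller RMSEA.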
Page 8 provides the Raw residual matrix and the ranking of the 9 largest
Raw residuals.
The 9th page shows the Standardized residual matrix and the 9 largest
Standardized residuals; we expect values close to zero, which indicate good
fit. Any value greater than 2.00 in absolute value indicates lack of fit and
should be examined.
The 10th page displays a sideways histogram of the distribution of the
Standardized residuals. Generally we expect to see a normal distribution of
residuals with no absolute values greater than 2.00.
The 11th page displays our path coefficients in raw (unstandardized) form,
along with the standard error and t-value for each. Further down
on the 11th page, we see the estimated variance and covariance parameters,
each with its standard error and t-value. Remember
that a coefficient is statistically significant (p <
.05, two-tailed) if the absolute value of its t-value is greater than 1.96,
meaning it is significantly different from zero. It is also recommended that
you review the standard errors, as extremely small standard errors (those very
close to zero) may indicate an estimation problem caused by one variable
being linearly dependent upon one or more other variables.
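The 1.96 cutoff is the two-tailed .05 critical value of the standard normal distribution, and each t-value is simply the estimate divided by its standard error. A minimal sketch follows; the estimate and standard error are HYPOTHETICAL values, not taken from this output.

```sas
DATA _NULL_;
   crit = PROBIT(0.975);     /* two-tailed .05 critical value (about 1.96) */
   est  = 0.50;              /* HYPOTHETICAL parameter estimate            */
   se   = 0.10;              /* HYPOTHETICAL standard error                */
   t    = est / se;          /* t-value as reported by PROC CALIS          */
   significant = (ABS(t) > crit);   /* 1 if significantly nonzero          */
   PUT crit= 6.4 t= significant=;
RUN;
```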
The 12th page provides Standardized path coefficients and squared multiple
correlations for endogenous variables (often considered the dependent variables
in such a model). The 'Squared Multiple Correlations' R-square column gives us
an idea of how well our model fits because these values are interpreted as the
proportion of variance in each endogenous variable accounted for by its
predictors. As an example, we could interpret V7 (Political
Success) as having 66.66% of its variance accounted for by the combination of V4
(Wealth), V5 (Suggestibility), and V6 (Ethical Flexibility).
The 13th page begins the listing of the modification indices, which continues to
the end of the output. One should be careful when interpreting modification
indices and should do so only after carefully interpreting all the previous
output. Modification indices generally take two forms: those which
recommend removing a parameter from the specified model (Wald tests) and those
which recommend adding a parameter to the model (Lagrange multiplier tests).
Both types estimate the change in chi-square that implementing the
recommendation would produce: adding a parameter lowers chi-square, while
removing one raises it, ideally by a trivial amount. However, as mentioned
above, chi-square is generally not an acceptable measure of goodness-of-fit and
therefore modification indices should be treated with caution.
The 14th, 15th, and 16th pages continue the listing of the modification indices.
Below you will find our completed path diagram with standardized path
coefficients.
Generally speaking, the output for any PROC CALIS will follow the same format
seen here for path analysis; for example, the order of the output's presentation
will be the same for the SEM example in the next tutorial.
Please realize this tutorial is not meant to be an exhaustive review; it
is merely an introduction. This tutorial is not meant to replace one or several
And that concludes the tutorial on Path Analysis with manifest variables.
The tutorial on the basics of Structural Equation Modeling (SEM) can be found