Logistic Regression (Binary)
Binary (also called binomial) logistic regression is appropriate when the
outcome is a dichotomous variable (i.e., categorical with only two categories)
and the predictors are of any type: nominal, ordinal, and/or interval/ratio
(numeric). Either Multinomial Logistic Regression or Discriminant Function
Analysis is appropriate when the outcome variable is polytomous (i.e.,
categorical with more than two categories). Standard multiple regression can
only accommodate an outcome variable which is continuous or nearly continuous
(i.e., interval/ratio in scale), and it works best with continuous or nearly
continuous predictor variables. Standard regression can, however, accommodate
categorical predictors using one of the following coding strategies: dummy
coding, effects coding, orthogonal coding, or criterion coding. Binary logistic
and multinomial logistic regression can also accommodate categorical
predictors, but those predictors must be identified as categorical in the
menu system when the analysis is being specified.
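To make the dummy-coding idea concrete, here is a minimal sketch in Python using pandas; the three-level variable and its name are hypothetical and are not part of the tutorial data.

import pandas as pd

# Hypothetical three-level categorical predictor (not in logreg1.sav)
group = pd.Series(["low", "medium", "high", "low", "high"], name="group")

# Dummy (indicator) coding: k - 1 indicator columns for k categories; the
# dropped category ("high", alphabetically first) serves as the reference level
dummies = pd.get_dummies(group, prefix="group", drop_first=True)
print(dummies)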
For the duration of this tutorial we will be using the logreg1.sav file, which
contains one dichotomous categorical outcome variable (y) and four predictor
variables (x1 - x4). The outcome (y) contains the values 0 and 1.
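If you want to inspect the same data outside SPSS, a minimal sketch in Python with pandas is shown below; it assumes logreg1.sav is in the working directory and that the optional pyreadstat package (which pandas.read_spss relies on) is installed.

import pandas as pd

# Read the SPSS data file (pandas.read_spss requires pyreadstat)
df = pd.read_spss("logreg1.sav")

print(df[["y", "x1", "x2", "x3", "x4"]].describe())
print(df["y"].value_counts())   # outcome coded 0 / 1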
Begin by clicking on Analyze, Regression, Binary Logistic...
Next, highlight / select the outcome variable (y) and use the top arrow
button to move it to the Dependent: box. Then, highlight the four predictor
variables (x1, x2, x3, x4) and use the second arrow button to move them to the
Covariates: box. Note that if one or more of the predictors were categorical,
we would need to click on the Categorical... button to specify them as such.
Click on the Options... button and select Classification plots,
Hosmer-Lemeshow goodness-of-fit, Casewise listing of residuals, Correlations
of estimates, Iteration history, and CI for exp(B). Then click the Continue
button, and then click the OK button.
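For readers following along outside SPSS, the sketch below fits the same model in Python with statsmodels. It is an approximate cross-check only (statsmodels reports z statistics rather than Wald chi-squares, and its default output differs from the SPSS tables), and it assumes the same local logreg1.sav file and pyreadstat as the earlier sketch.

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_spss("logreg1.sav")          # requires pyreadstat

# Binary logistic regression with all four predictors entered as one block
result = smf.logit("y ~ x1 + x2 + x3 + x4", data=df).fit()

# Coefficients, standard errors, z statistics (z^2 = Wald), p-values, CIs
print(result.summary())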
The output should be similar to what is displayed below.
The Case Processing Summary table provides an overview of missing data; here,
there were no missing data. The Dependent Variable Encoding table shows how the
outcome variable was coded internally, if any recoding was needed. Here, the
outcome variable did not need to be recoded; therefore, the same values are
listed in both columns. If, for example, the outcome variable represented
responses yes and no, then the left column (Original Value) would show the
associated value labels, 'yes' and 'no', while the right column (Internal
Value) would show the values 0 and 1 assigned to each. By default, binary
logistic regression predicts the odds of membership in the outcome category
with the higher internal value; here it predicts membership in the 1 category,
as opposed to membership in the 0 category.
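If the outcome had been stored as text, the same 0/1 internal coding could be reproduced outside SPSS along these lines; the column of responses here is hypothetical.

import pandas as pd

# Hypothetical yes/no responses recoded to the internal 0/1 values
responses = pd.Series(["yes", "no", "yes", "yes", "no"])
coded = responses.map({"no": 0, "yes": 1})
print(coded.tolist())   # [1, 0, 1, 1, 0]; 1 is the predicted category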
The Beginning Block evaluates our model with only the constant in the
equation (sometimes called the null model). The constant is analogous to the
y-intercept in OLS regression. Because the iteration history was requested in
the options, it is displayed throughout the output file. The first Iteration
History table (directly below) shows that estimation was terminated at
iteration # 1 because the parameter estimates did not change by more than
0.001. The -2 log likelihood (-2 LL) is used in likelihood-ratio tests and
reflects the unexplained variation in the outcome variable; therefore, the
smaller the value, the better the fit. The Classification Table shows how well
our null model correctly classifies cases. The rows represent the number of
cases in each category in the actual data and the columns represent the number
of cases in each category as classified by the null model. The key piece of
information is the overall percentage in the lower right corner, which shows
our null model is only 50% accurate; that is no better than random guessing.
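A sketch of the same null (intercept-only) model in Python, under the same assumptions about the data file as the earlier sketches:

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_spss("logreg1.sav")                       # requires pyreadstat

# Intercept-only (null) model, analogous to SPSS Block 0
null_fit = smf.logit("y ~ 1", data=df).fit(disp=0)
print("Null model -2LL:", round(-2 * null_fit.llf, 3))

# At the default .50 cutoff every case is assigned to the same category,
# so accuracy equals the proportion of cases in the larger group
predicted = (null_fit.predict(df) >= 0.5).astype(int)
print("Null model % correct:", round((predicted == df["y"]).mean() * 100, 1))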
The Variables in the Equation table shows the logistic coefficient (B)
associated with the constant, which is the only term included in this model.
This table is similar to, and contains information analogous to, the
coefficients table in a standard regression. The logistic coefficient for the
constant is similar to the y-intercept term in standard regression. The Wald
statistic is a chi-square 'type' of statistic and is used to test the
significance of the variable in the model. Exp(B) is the odds ratio associated
with the variable: the multiplicative change in the odds for a one-unit
increase in that variable.
The Variables not in the Equation table simply lists the Wald test score, df,
and p-value for each of the variables not included in the beginning block
model. Notice that the Overall Statistics row is not a total, but rather an
estimate of the overall Wald statistic the model would have had if all the
variables had been included.
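The Wald statistic SPSS reports is simply (B / SE) squared, and Exp(B) is e raised to B; the sketch below recovers both for the constant-only model, under the same data-file assumptions as above.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_spss("logreg1.sav")                       # requires pyreadstat
null_fit = smf.logit("y ~ 1", data=df).fit(disp=0)

b = null_fit.params["Intercept"]
se = null_fit.bse["Intercept"]

wald = (b / se) ** 2        # Wald statistic: (B / SE)^2, with df = 1
exp_b = np.exp(b)           # Exp(B): odds of y = 1 under the null model
print(round(wald, 3), round(exp_b, 3))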
The number of blocks in the output corresponds to the number of blocks of
covariates (predictors) entered into the model. That is, when specifying the
variables for inclusion in the model, we could have clicked the Next button in
the Logistic Regression dialog box (second figure above) and entered more
variables as a distinct block, as is done in sequential or hierarchical
regression. Here, we only have one set of predictors, so there is only the
intercept-only block (Block 0) above and the complete model (Block 1) below.
The iteration history (above) was requested in the options and so is displayed
throughout the output file. The Iteration History table (above left) shows that
estimation was terminated at iteration # 11 because the parameter estimates did
not change by more than 0.001. The -2 LL reflects the unexplained variation in
the outcome variable; therefore, the smaller the value, the better the fit.
Notice here the -2 LL (57.759) is substantially lower than that given above for
the null model (554.518). The Omnibus Tests of Model Coefficients table reports
the chi-square associated with each step in a stepwise model. Here, there is
only one step from the constant-only model to the block containing the
predictors, so the Step, Block, and Model values are the same. The significance
value or p-value indicates our model (Block 1, with predictors) is
significantly different from the constant-only model, meaning there is a
significant effect of the combined predictors on the outcome variable.
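The omnibus chi-square is just the drop in -2 LL from the null model to the full model; a quick check in Python, with the same data-file assumptions as before:

import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf

df = pd.read_spss("logreg1.sav")                       # requires pyreadstat

null_fit = smf.logit("y ~ 1", data=df).fit(disp=0)
full_fit = smf.logit("y ~ x1 + x2 + x3 + x4", data=df).fit(disp=0)

# Likelihood-ratio (omnibus) chi-square: drop in -2LL, df = predictors added
chi_sq = -2 * (null_fit.llf - full_fit.llf)
p_value = stats.chi2.sf(chi_sq, df=4)
print(round(chi_sq, 3), p_value)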
The Model Summary table displays the -2 LL as was shown and discussed
directly above. The two R² estimates are not truly R² estimates; they are
pseudo-R² statistics, meaning they are analogous to R² in standard multiple
regression but do not carry the same interpretation. They are not
representative of the amount of variance in the outcome variable accounted for
by all the predictor variables. The larger the Cox & Snell estimate, the better
the model; however, its maximum is less than 1 even for a perfect model. The
Nagelkerke estimate rescales the Cox & Snell value so that it is constrained
between 0 and 1 and can therefore be read as an index of model fit, with a
better model displaying a value closer to 1. These metrics should be
interpreted with caution; although they should not be ignored, they offer
little confidence in interpreting the model fit. The Hosmer and Lemeshow Test
table (above right) is the preferred test of goodness-of-fit. As with most
chi-square based tests, however, it is prone to inflation as sample size
increases. Here, we see model fit is acceptable,
χ² (8) = 14.559, p = .068,
which indicates the values our model predicts are not significantly different
from the values we observed. To be clear, you want the p-value to be greater
than your established cutoff (generally 0.05) to indicate good fit. The
Contingency Table for Hosmer and Lemeshow Test (below left) simply shows the
observed and expected values for each category of the outcome variable, as
used to calculate the Hosmer and Lemeshow chi-square.
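The pseudo-R² values and the Hosmer-Lemeshow statistic can be approximated outside SPSS as sketched below. Note that the decile grouping used here is a common implementation whose group boundaries may not match SPSS exactly, so the chi-square can differ slightly; same data-file assumptions as the earlier sketches.

import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf

df = pd.read_spss("logreg1.sav")                       # requires pyreadstat
fit = smf.logit("y ~ x1 + x2 + x3 + x4", data=df).fit(disp=0)
n = fit.nobs

# Cox & Snell and Nagelkerke pseudo-R-squared from the log likelihoods
r2_cs = 1 - np.exp(2 * (fit.llnull - fit.llf) / n)
r2_nk = r2_cs / (1 - np.exp(2 * fit.llnull / n))

# Hosmer-Lemeshow: group cases by deciles of predicted probability and
# compare observed with expected counts (df = number of groups - 2)
p = fit.predict(df)
g = pd.qcut(p, 10, labels=False, duplicates="drop")
grouped = pd.DataFrame({"y": df["y"], "p": p, "g": g}).groupby("g")
obs = grouped["y"].sum()
exp = grouped["p"].sum()
size = grouped.size()
hl = (((obs - exp) ** 2) / (exp * (1 - exp / size))).sum()
hl_p = stats.chi2.sf(hl, df=len(size) - 2)
print(round(r2_cs, 3), round(r2_nk, 3), round(hl, 3), round(hl_p, 3))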
The Classification Table (above right) shows how well our full model
correctly classifies cases. A perfect model would show only values in the
diagonal, correctly classifying all cases. Adding across the rows gives the
number of cases in each category in the actual data, and adding down the
columns gives the number of cases in each category as classified by the full
model. The key piece of information is the overall percentage in the lower
right corner, which shows our model (with all predictors and the constant) is
98.3% accurate, which is excellent. One way of assessing the model's fit is to
compare the overall percentage in the full model's table to the overall
percentage in the null model's table. Another, more highly regarded way is to
compare the full model's overall percentage to the chance percentage (50% in
this example) plus 25%, or 75%. Note that the chance percentage can be weighted
by the proportion of cases in each category of the outcome variable, which
makes it more conservative. For instance, if we had 250 cases (62.5%) in the 1
category and 150 cases (37.5%) in the 0 category, then we would square and sum
those proportions to arrive at a more conservative chance percentage for
comparison (53.2%). Taking the weighted chance percentage and adding 25% brings
the comparison value to 78.2%; still far below what our full model is capable
of.
Weighted chance percentage: .625² + .375² = .391 + .141 = .532 = 53.2%
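The classification table and the proportional (weighted) chance criterion can be reproduced as sketched below, again assuming the same local data file.

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_spss("logreg1.sav")                       # requires pyreadstat
fit = smf.logit("y ~ x1 + x2 + x3 + x4", data=df).fit(disp=0)

# Classification table at the default .50 cutoff
predicted = (fit.predict(df) >= 0.5).astype(int)
print(pd.crosstab(df["y"], predicted,
                  rownames=["Observed"], colnames=["Predicted"]))
print("Overall % correct:", round((predicted == df["y"]).mean() * 100, 1))

# Weighted (proportional) chance criterion: sum of squared group proportions,
# plus the conventional 25 percentage points
props = df["y"].value_counts(normalize=True)
criterion = (props ** 2).sum() + 0.25
print("Comparison criterion:", round(criterion * 100, 1), "%")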
The Variables in the Equation table (above) shows the logistic coefficient
(B) for each predictor variable. The logistic coefficient is the expected
amount of change in the logit for each one-unit change in the predictor. The
logit is what is being predicted; it is the natural log of the odds of
membership in the category of the outcome variable with the numerically higher
value (here a 1, rather than 0). The closer a logistic coefficient is to zero,
the less influence it has in predicting the logit. The table also displays the
standard error, Wald statistic, df, and Sig. (p-value), as well as the Exp(B)
and the confidence interval for Exp(B). The Wald test (and associated p-value)
is used to evaluate whether or not the logistic coefficient is different from
zero. Exp(B) is the odds ratio associated with each predictor. We expect
predictors which increase the logit to display Exp(B) values greater than 1.0,
predictors which have no effect on the logit to display an Exp(B) of 1.0, and
predictors which decrease the logit to have Exp(B) values less than 1.0. Note
that the Exp(B) is wildly large for the x3 predictor. This is due to a
combination of the strong relationship between that variable and the outcome
variable and the fact that x3 is nearly categorical itself. Generally, when
using continuous variables as predictors, you will not see such large Exp(B)
values.
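The same quantities can be assembled from a statsmodels fit as sketched below (statsmodels reports z statistics, so z squared is used for the Wald statistic); same data-file assumptions as before.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_spss("logreg1.sav")                       # requires pyreadstat
fit = smf.logit("y ~ x1 + x2 + x3 + x4", data=df).fit(disp=0)

coef = pd.DataFrame({
    "B": fit.params,
    "SE": fit.bse,
    "Wald": (fit.params / fit.bse) ** 2,   # z^2 = Wald chi-square, df = 1
    "Sig.": fit.pvalues,
    "Exp(B)": np.exp(fit.params),
})
# 95% CI for Exp(B): exponentiate the confidence limits for B
ci = np.exp(fit.conf_int())
coef["Lower"] = ci[0]
coef["Upper"] = ci[1]
print(coef.round(3))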
The Correlation Matrix table simply shows the correlations among the parameter
estimates, for each of the predictor variables and the constant (note the
outcome is not included).
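These correlations come from the covariance matrix of the estimates; a minimal sketch, under the same assumptions as above:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_spss("logreg1.sav")                       # requires pyreadstat
fit = smf.logit("y ~ x1 + x2 + x3 + x4", data=df).fit(disp=0)

# Correlations of the parameter estimates, from their covariance matrix
cov = fit.cov_params()
sd = np.sqrt(np.diag(cov))
print((cov / np.outer(sd, sd)).round(3))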
The graph (above) shows how well our full model predicts membership. It is
unusually clear in the middle because our model is so accurate. With a less
accurate model, more symbols (here, 1s and 0s) would appear in the middle of
the plot, at predicted probabilities (x-axis) near .50. The better the model,
the clearer the middle of the graph.
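A rough stand-in for this classification plot is a stacked histogram of predicted probabilities split by observed group (matplotlib assumed available); a good model pushes the two groups toward opposite ends of the x-axis.

import matplotlib.pyplot as plt
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_spss("logreg1.sav")                       # requires pyreadstat
fit = smf.logit("y ~ x1 + x2 + x3 + x4", data=df).fit(disp=0)
p = fit.predict(df)

# Stacked histogram of predicted probabilities, split by observed outcome
plt.hist([p[df["y"] == 0], p[df["y"] == 1]], bins=20, stacked=True,
         label=["observed 0", "observed 1"])
plt.xlabel("Predicted probability of y = 1")
plt.ylabel("Frequency")
plt.legend()
plt.show()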
The Casewise List table displays cases which were incorrectly classified by
the model. Here, only four cases were misclassified.
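SPSS typically restricts this casewise listing to cases with large standardized residuals; the simpler sketch below just lists the cases misclassified at the .50 cutoff, under the same assumptions as the earlier sketches.

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_spss("logreg1.sav")                       # requires pyreadstat
fit = smf.logit("y ~ x1 + x2 + x3 + x4", data=df).fit(disp=0)

p = fit.predict(df)
predicted = (p >= 0.5).astype(int)

# Cases the model classifies incorrectly at the .50 cutoff
wrong = predicted != df["y"]
print(df.loc[wrong, ["y", "x1", "x2", "x3", "x4"]].assign(pred_prob=p[wrong]))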
As with most of the tutorials / pages within this site, this page should not
be considered an exhaustive review of the topic covered and it should not be
considered a substitute for a good textbook.