Initial Data Analysis (IDA) continued from the previous module. What are
descriptive statistics? Descriptive statistics allow us to *describe* a set
of scores or multiple sets of scores. There are typically four categories of
descriptive statistics: central tendency, dispersion, distribution, and
relation.
Central Tendency: There are three general measures of central tendency. (1)
Mean. The mean, the arithmetic average of a series of scores, is the measure
most frequently used to describe the center of a distribution. The mean is very
sensitive to outliers; for this reason, it is often preferable to use the
trimmed mean, which *trims* some percentage of extreme scores (e.g.,
20%). (2) Median. The median is the point that divides an ordered series of
scores into two equal halves. (3) Mode. The mode is the most frequently
occurring score in a series.
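Since this module computes everything in SPSS, the following is only an illustrative sketch in Python; the scores and the `trimmed_mean` helper are invented for the example, not taken from any data file used here:

```python
from statistics import mean, median, mode

# Hypothetical scores; the 40 is an outlier
scores = [4, 5, 5, 6, 7, 8, 9, 10, 11, 40]

avg = mean(scores)    # arithmetic average: pulled upward by the outlier
mid = median(scores)  # midpoint of the ordered series of scores
top = mode(scores)    # most frequently occurring score

def trimmed_mean(xs, proportion=0.2):
    """20% trimmed mean: drop 10% of scores from each end, then average."""
    xs = sorted(xs)
    k = int(len(xs) * proportion / 2)  # number of scores trimmed per end
    return mean(xs[k:len(xs) - k] if k else xs)

trimmed = trimmed_mean(scores)  # far less affected by the outlier
```

Here the ordinary mean (10.5) sits above nine of the ten scores because of the single outlier, while the median (7.5) and the trimmed mean (7.625) land in the middle of the bulk of the data.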
Dispersion: There are five general measures of dispersion. (1) Variance.
Variance is the sum of the squared deviations from the mean divided by the
degrees of freedom. In lay terms, variance is the average squared deviation of
the scores around the mean. (2) Standard Deviation. The standard deviation is
the square root of the variance. It is a standardized measure of dispersion
(the most frequently used) which allows us to compare distributions of
different variables. Notice that the sum of squares is crucial to both. (3)
Z-scores (also called Standard Scores). Z-scores are a transformation applied
to each score which allows us to compare scores from different distributions.
(4) Range. The range is simply the highest score minus the lowest score and
gives an idea of the spread of the scores. (5) Minimum & Maximum. Simply the
smallest and largest scores. All measures of dispersion provide an idea of
distance or spread.
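The five measures can be sketched from their definitions in a few lines of Python (the scores below are invented for illustration):

```python
from statistics import mean, variance, stdev

scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical scores

var = variance(scores)  # sum of squared deviations / degrees of freedom (n - 1)
sd = stdev(scores)      # square root of the variance
rng = max(scores) - min(scores)  # highest score minus lowest score

# Z-scores: each score re-expressed as standard deviations from the mean
m = mean(scores)
z_scores = [(x - m) / sd for x in scores]
```

Note that the sum of squared deviations appears in both the variance and (through it) the standard deviation, and the Z-scores always average to zero regardless of the original scale.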
Distribution: There are two measures of distribution; both offer a
description of the shape of a distribution of scores. Skewness refers to the
amount of asymmetry a distribution of scores contains. Negative skew is when
the tail points toward the smaller values and most scores are located at the
higher values. Positive skew is when the tail points toward the larger values
and most scores are located at the smaller values. Zero skew indicates
symmetry. Kurtosis measures the magnitude of a distribution's tails, commonly
referred to as the *peakedness* or *flatness* of a distribution, and is also
referred to as a measure of normality. A distribution with a large, *positive*
kurtosis has heavy tails and looks peaked; this is known as leptokurtic. A
distribution with a large, *negative* kurtosis has thin tails and looks flat;
this is known as platykurtic (like a plateau).
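As a sketch, moment-based skewness and excess kurtosis can be computed directly from their definitions (the data below are invented; SPSS reports small-sample-adjusted versions of both, so its values differ slightly):

```python
from statistics import mean

def shape(xs):
    """Moment-based skewness and excess kurtosis of a list of scores."""
    m, n = mean(xs), len(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in xs) / n  # third central moment
    m4 = sum((x - m) ** 4 for x in xs) / n  # fourth central moment
    skew = m3 / m2 ** 1.5    # 0 for a symmetric distribution
    kurt = m4 / m2 ** 2 - 3  # positive = heavy tails (leptokurtic)
    return skew, kurt

right_skew, _ = shape([1, 2, 2, 3, 3, 3, 4, 15])  # tail toward larger values
sym_skew, _ = shape([1, 2, 3, 4, 5])              # perfectly symmetric
```

The first data set is positively skewed (its tail points toward the larger values), so its skewness comes out positive, while the symmetric set yields a skewness of exactly zero.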
Relation: There are two measures of relation; both refer to the amount of
variance two variables share. Measures of relation are unique in that they are
descriptive but can also be used inferentially when assessing the magnitude of
a relationship. Covariance is an unstandardized measure of relation.
Correlation is a standardized measure of relation, meaning it can be used to
compare the relationships among multiple variables.
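The distinction between the two can be sketched in a few lines of Python (the paired scores below are made up for illustration):

```python
from statistics import mean

def cov(xs, ys):
    """Unstandardized relation: average cross-product of deviations (n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def corr(xs, ys):
    """Standardized relation: covariance scaled by both standard deviations."""
    return cov(xs, ys) / (cov(xs, xs) * cov(ys, ys)) ** 0.5

x = [1, 2, 3, 4, 5]             # hypothetical paired scores
y = [2, 4, 6, 8, 10]            # perfectly linear in x
y_scaled = [10 * v for v in y]  # same relationship, different scale
```

Rescaling `y` changes the covariance tenfold but leaves the correlation untouched at 1.0, which is exactly why the standardized measure can be compared across variables.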
**Getting descriptive statistics in SPSS.**
Open the data file named "Cars.sav" which is available
here.
**Method 1:**
With the Cars data file open in the Data window, go to Analyze, Descriptive
Statistics, and then Descriptives...
Now you should have a smaller window open. Highlight/select "Time to
Accelerate from 0 to 60 (sec) [accel]" and use the arrow to move it into the
variables box.
Next, click on "Options..." and select the descriptive statistics you want
(typically mean, standard deviation, variance, range, standard error (S.E.) of
the mean, minimum and maximum, as well as kurtosis and skewness). Then click
"Continue".
If you also need the Z-scores for a variable, check the box in the lower left
corner, "Save standardized values as variables". This will create a new
variable in your data sheet (in the right-most column of Data View) containing
the Z-score corresponding to each individual score on that variable (accel).
Next, click "OK". The output should contain a single, very wide table with all
the descriptive statistics specified (except the Z-scores, which are saved in
the data file).
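For comparison, the same table of statistics can be reproduced outside SPSS. The Python sketch below uses invented stand-in values for accel, not the actual Cars.sav data:

```python
from statistics import mean, stdev

accel = [12.0, 15.5, 14.0, 16.5, 13.0, 18.0, 15.0, 14.5]  # hypothetical values

n = len(accel)
sd = stdev(accel)
summary = {
    "N": n,
    "Minimum": min(accel),
    "Maximum": max(accel),
    "Range": max(accel) - min(accel),
    "Mean": mean(accel),
    "Std. Deviation": sd,
    "S.E. of Mean": sd / n ** 0.5,  # standard deviation / sqrt(N)
}
```

The standard error of the mean is simply the standard deviation divided by the square root of the sample size, which is why it shrinks as N grows.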
**Method 2:**
With the Cars data file open in the Data window, go to Analyze, Descriptive
Statistics, and then Frequencies...
Now you should have a smaller window open. Highlight/select "Time to
Accelerate from 0 to 60 (sec) [accel]" and use the arrow to move it into the
variables box.
Next, click on "Statistics..." and select all the statistics specified
earlier, as well as quartiles; then click "Continue".
Next, click on "Charts..." and select Histograms and Show normal curve on
histogram. Then click "Continue" and then click "OK".
You should now see some output similar to that below. You'll notice the
output table containing all the descriptive statistics is smaller and easier to
read than the one produced by the Descriptives function above.
There are four benefits to using the Frequencies function for gathering
descriptive statistics. First, you can get additional descriptive statistics
(quartiles). Second, you can get a graphical display of the variable (a
histogram for continuous variables and a bar graph for categorical variables).
Third, you get a frequencies table. Fourth, the descriptive statistics table
is smaller and easier to read. However, you can only get the standardized
scores (Z-scores) through the Descriptives function.
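The quartiles that Frequencies adds can be sketched with the standard library's quantiles helper (the values below are hypothetical, and the weighted-average percentile definition used here may differ slightly from SPSS's default):

```python
from statistics import quantiles

accel = [11.0, 12.5, 13.0, 14.0, 15.0, 15.5, 16.0, 17.5, 19.0]  # hypothetical

# 25th, 50th, and 75th percentiles: the cut points for four equal groups
q1, q2, q3 = quantiles(accel, n=4)
```

Note that the second quartile is just the median, so for these nine ordered scores it lands on the middle value.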
**Method 3: The Explore Function for getting descriptive statistics by group**
With the
Explore Example data file open in the Data window, go to Analyze, Descriptive
Statistics, and then Explore...
Next, pick your dependent variable; in this example we'll use the variable
"total score on blame scale [bt]". Highlight it and move it to the Dependent
List: box. Then pick your independent variable; in this example we'll use the
grouping variable "GENDER [sex]". Highlight it and move it to the Factor List:
box. Then click on the Statistics... button.
Now we can specify what we want to get. Check Descriptives, M-estimators,
Outliers, and Percentiles. Then click the Continue button. Next, click on the
Plots button and select Histogram and Normality plots with tests. Then click the
Continue button. Then click the OK button.
You should see some output similar to that displayed below.
You'll notice you get the Case Processing Summary, which simply reports the
number of participants/cases, percentages, and the number missing for each
group of your independent or grouping variable. Then you get the descriptive
statistics for each group, the percentiles, and then the table of extreme
values. This last table, Extreme Values, is very handy for detecting and/or
evaluating outliers. Likewise, the Tests of Normality are helpful for
evaluating the assumptions of some common inferential (parametric) analyses.
Finally, you're given the plots for each group: histogram, stem-and-leaf, and
box plot. The box plot is also very handy for evaluating normality and
outliers within the groups. Notice that within the box plot, extreme values
are marked with the case number and a star symbol, while less extreme (but
likely influential) points are marked with the case number and a circle symbol.
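The by-group logic of Explore, splitting the cases on a factor and then describing each group separately, can be sketched in a few lines (the group labels and scores below are made up, not the Explore Example data):

```python
from statistics import mean, stdev
from collections import defaultdict

# Hypothetical (sex, blame-score) pairs standing in for the real cases
rows = [("male", 22), ("female", 30), ("male", 25),
        ("female", 28), ("male", 24), ("female", 35)]

groups = defaultdict(list)
for sex, bt in rows:  # split the cases by the grouping variable
    groups[sex].append(bt)

# Descriptive statistics computed separately for each group
by_group = {sex: {"N": len(xs), "Mean": mean(xs), "SD": stdev(xs)}
            for sex, xs in groups.items()}
```

This split-then-describe pattern is all Explore is doing under the hood before it adds the percentiles, extreme values, and plots per group.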
Obviously, SPSS is capable of more complex graphing. If one is so inclined,
one can simply go to Graphs in the tool bar and practice making different
types of graphs with the current data. Like most functions of SPSS, it is
often easy enough to point and click one's way through a short trial-and-error
session to get what one wants. Recall that the strength of SPSS, and what it
takes pride in, is its user-friendliness. SPSS is extremely easy to use, and
figuring out how to get what one wants often takes less time than working
through a tutorial (such as this one).
**Method 4: Correlation**
With the
Explore Example data file open in the Data window, go to Analyze, Correlate,
Bivariate...
Now you can move 'total score on blame scale' and 'total score on reasons for
assigned prison time' to the Variables: box. Notice that you can request any
or all of the three types of correlation coefficients, with one- or two-tailed
significance, with or without flagging of significant results. Next, click the
Options... button and specify Means and standard deviations as well as
Cross-product deviations and covariances. Then click the Continue button, then
the OK button.
You should see output similar to that provided below. Notice that, as is the
case with most analyses in SPSS, we specified and received the descriptive
statistics for the variables we analyzed (mean, standard deviation, number of
observations).
So, we see the correlation between these two variables is -.050, with a *p*-value
of .159. We could also say that only about 0.25% of the variance in one
variable is accounted for by the other. The squared correlation gives the
proportion of variance in one variable accounted for by the other, a form of
effect size measure (-.050 × -.050 = .0025 = 0.25%). Clearly, there is a very
weak (and not statistically significant) relationship between these two
variables. The covariance is -3.431, and 793 cases were used to compute the
correlation/covariance. Notice that only cases with complete data were used.
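The effect-size arithmetic in the paragraph above is simply the square of the reported correlation:

```python
r = -0.050            # correlation reported in the SPSS output
r_squared = r * r     # proportion of variance shared by the two variables
percent_shared = 100 * r_squared  # about 0.25% of the variance
```

Squaring removes the sign, so a correlation of -.050 and one of +.050 represent the same (tiny) amount of shared variance.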