Some basic IDA graphing
What is IDA? Initial Data Analysis (IDA) should be performed on every data set
you access or collect. Its purpose is to determine whether your data represent
what they should represent and whether the data contain errors.
There are three displays we typically use. Frequency tables, for all variables,
display the frequency and percentage for each value. Bar graphs, for
categorical variables, let us quickly see the distribution of cases across
categories. Histograms, for continuous or nearly continuous variables, let us
observe the distribution of scores on that variable.
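To make the idea of a frequency table concrete, here is a minimal Python sketch (not SPSS, and not how SPSS computes it internally) that builds the same kind of rows SPSS shows: each value, its frequency, its percentage, and the cumulative percentage. The function name frequency_table and the data are hypothetical, and the sketch assumes missing values have already been removed.

```python
from collections import Counter

def frequency_table(values):
    """Build rows of (value, frequency, percent, cumulative percent),
    in the spirit of a frequency table. Assumes no missing values (None)."""
    counts = Counter(values)
    n = len(values)
    rows = []
    running = 0  # cumulative count, used for the cumulative percentage
    for value in sorted(counts):
        running += counts[value]
        rows.append((value,
                     counts[value],
                     round(100 * counts[value] / n, 1),
                     round(100 * running / n, 1)))
    return rows

# Hypothetical Gender/Sex codes (1 = female, 2 = male); not the example data set
gender = [1, 2, 1, 1, 2, 1, 1, 1]
print("Value  Freq  Percent  Cum. Percent")
for value, freq, pct, cum in frequency_table(gender):
    print(f"{value:>5}  {freq:>4}  {pct:>7.1f}  {cum:>12.1f}")
```

Computing the cumulative percentage from a running count, rather than by adding rounded percentages, keeps the last row at exactly 100.0.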
Generally speaking, there are two ways of displaying data outside the text of a
manuscript or research report: Tables, which are text based and can therefore
be set in type; and Figures, which are graphical displays (e.g., histograms,
pie charts, topographical maps, wiring schematics). Both forms of display can
be used during IDA to discover data entry errors, describe the sample
characteristics, and determine whether your data fit the assumptions of a
particular analysis.
Data Entry Errors. When conducting IDA, you can examine the frequency tables
for missing data and for values that do not correspond to the values expected
for a variable. For example, if Gender/Sex is coded 1 = female and 2 = male, but
the frequency table shows a value of 12 (which you then trace to case #5), that
value is inconsistent with the coding scheme and with the known genders/sexes
of most species.
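The check described above, scanning for codes outside the coding scheme, can be sketched in a few lines of Python. This is an illustrative sketch only: find_invalid_codes is a hypothetical helper, the data are made up, and case numbers are counted from 1 so they line up with row numbers in Data View.

```python
def find_invalid_codes(values, valid_codes):
    """Return (case_number, value) for every entry not in valid_codes.
    Missing entries (None) are flagged too, since None is not a valid code.
    Case numbers start at 1 to match spreadsheet-style row numbers."""
    return [(case, value)
            for case, value in enumerate(values, start=1)
            if value not in valid_codes]

# Hypothetical Gender/Sex column (1 = female, 2 = male) with a typo at case #5
gender = [1, 2, 2, 1, 12, 1, 2, 1]
print(find_invalid_codes(gender, valid_codes={1, 2}))  # [(5, 12)]
```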
Describing the Sample. When writing a research report, or simply assessing the
external validity of your study, you must evaluate the sample (i.e., the
participants' individual characteristics). First, you may be concerned with the
external validity of the sample (how representative it is of the population you
are studying). Second, you will likely want to describe your sample when
writing up the study, so that others can replicate your findings. For example,
if only 10% of your sample were male, your results would really apply only to
females. Using the Frequencies function in SPSS, you can easily produce tables
and graphs for one or more variables.
Example of Simple Graphing with the Frequencies Function, as Done during IDA
The mock study we will use today concerns the effectiveness of two types of
therapy for depression in increasing Life Satisfaction Rating. The independent
variable was Type of Therapy, with two conditions: Cognitive Behavioral Therapy
(CBT) or Electroconvulsive Therapy (ECT). The dependent variable was Life
Satisfaction Rating, a series of 10 questions totaled to yield a score between
10 and 50 for each of the 16 participants.
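Because the dependent variable has a known possible range, any total outside 10 to 50 must be an error. A minimal Python sketch of such a range check follows; out_of_range is a hypothetical helper, its default bounds come from the range stated above, and the scores are invented for illustration.

```python
def out_of_range(scores, low=10, high=50):
    """Return (case_number, score) for totals outside [low, high].
    Defaults follow the 10-50 range stated for Life Satisfaction Rating;
    scores exactly equal to a bound are treated as valid."""
    return [(case, score)
            for case, score in enumerate(scores, start=1)
            if not (low <= score <= high)]

# Hypothetical totals; not the example data set
scores = [34, 41, 27, 50, 10, 39, 55, 45]
print(out_of_range(scores))  # [(7, 55)]
```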
The Example Data can be found
here.
Getting the Frequencies and Tables/Figures:
Click Analyze, then Descriptive Statistics, then Frequencies. First, you will
do the categorical data, so select “Gender/Sex” and “Type of Therapy” and move
them into the Variables box. (Make sure “Display Frequency Tables” is checked.)
Then click Charts and select Bar Charts (because these are categorical
variables). Click Continue, then OK. You should get output similar to that
displayed below.
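A bar chart is simply one bar per category whose length reflects that category's frequency. The Python sketch below draws a plain-text version to show the idea; it is not SPSS output, text_bar_chart is a hypothetical helper, and the therapy codes and labels are made up.

```python
from collections import Counter

def text_bar_chart(values, labels=None, symbol="#"):
    """Return one line per category: its label, a bar whose length equals
    the category's frequency, and the frequency itself."""
    labels = labels or {}
    counts = Counter(values)
    names = {value: str(labels.get(value, value)) for value in counts}
    width = max(len(name) for name in names.values())  # align the bars
    return [f"{names[value]:<{width}} | {symbol * counts[value]} {counts[value]}"
            for value in sorted(counts)]

# Hypothetical Type of Therapy codes (1 = CBT, 2 = ECT); not the example data set
therapy = [1, 2, 1, 2, 2, 1, 1, 2, 1]
for line in text_bar_chart(therapy, labels={1: "CBT", 2: "ECT"}):
    print(line)
```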
Now flip back to Data View.
Click Analyze, then Descriptive Statistics, then Frequencies again. Second, you
will do the continuous data, so click the Reset button (at the bottom of the
dialog box).
Now select “Age” and “Life Satisfaction Rating” and move them into the
Variables box. Then click Statistics, select the statistics you think are
necessary, and click Continue. Next, click Charts, select Histograms, and check
the box for “with normal curve”. Click Continue, then OK. You should see output
similar to that provided below.
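A histogram counts scores within equal-width bins, and the normal-curve option overlays a normal distribution for comparison. The Python sketch below approximates that idea by comparing observed bin counts with the counts expected under a normal curve that uses the sample mean and standard deviation. It assumes Python 3.8+ (for statistics.NormalDist); histogram_with_normal, the bin settings, and the ages are hypothetical, and SPSS may choose its bins differently.

```python
import statistics

def histogram_with_normal(values, start, bin_width, n_bins):
    """For each bin [lo, lo + bin_width), return (lo, observed count, expected
    count under a normal curve using the sample mean and standard deviation).
    Values outside the n_bins bins are not counted."""
    normal = statistics.NormalDist(statistics.mean(values), statistics.stdev(values))
    n = len(values)
    rows = []
    for k in range(n_bins):
        lo = start + k * bin_width
        hi = lo + bin_width
        observed = sum(1 for v in values if lo <= v < hi)
        expected = n * (normal.cdf(hi) - normal.cdf(lo))
        rows.append((lo, observed, round(expected, 1)))
    return rows

# Hypothetical ages; not the example data set
ages = [19, 21, 22, 24, 25, 27, 30, 33]
width = 5
print(f"mean = {statistics.mean(ages):.2f}, sd = {statistics.stdev(ages):.2f}, "
      f"min = {min(ages)}, max = {max(ages)}")
for lo, observed, expected in histogram_with_normal(ages, start=15, bin_width=width, n_bins=4):
    print(f"{lo:>3}-{lo + width:<3} {'#' * observed:<8} expected ~ {expected}")
```

Large gaps between observed and expected counts, or a score far from the rest, are the kind of thing to look for when judging normality or spotting an invalid value.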
Some interpretation questions:
1. What can we say about the gender of our sample in terms of external
validity? Hint: look at the bar chart for Gender/Sex.
2. Could we have a data entry error or an invalid score for one of our
participants on Life Satisfaction Rating? Hint: reread the description of our
dependent variable, Life Satisfaction Rating, and look closely at the histogram
of that variable.
