Research and Statistical Support

MODULE 2

Some basic IDA graphing

IDA? Initial Data Analysis (IDA) should be performed on every data set you access or collect. The rationale for IDA is to determine whether your data represent what they should represent and whether there are errors in the data.

There are three displays we typically use. Frequency tables, for all variables, display the frequency and percentage of each value. Bar graphs, for categorical variables, allow us to quickly see the distribution of cases across the categories. Histograms, for continuous or nearly continuous variables, allow us to observe the distribution of scores on a continuous variable.
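To make the idea of a frequency table concrete outside SPSS, here is a minimal Python sketch. The function name frequency_table and the therapy codes are hypothetical and made up for illustration; they are not the example data set.

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (value, count, percent), sorted by value."""
    counts = Counter(values)
    total = len(values)
    return [(v, n, 100.0 * n / total) for v, n in sorted(counts.items())]

# Hypothetical categorical codes, for illustration only.
therapy = ["CBT", "ECT", "CBT", "ECT", "CBT", "ECT", "CBT", "ECT"]

# Print one row per category: value, frequency, percentage.
for value, count, percent in frequency_table(therapy):
    print(f"{value:<5}{count:>5}{percent:>8.1f}%")
```

Each row holds the same information SPSS reports in the Frequency and Percent columns; a bar graph simply draws the counts as bar heights.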

Generally speaking, there are two methods of displaying data outside the text when preparing a manuscript or research report: tables, which are text based and can therefore be typeset; and figures, which are graphical displays (e.g., histograms, pie charts, topographical maps, wiring schematics). Both forms of display can be used during IDA to discover data entry errors, describe the sample characteristics, and determine whether your data fit the assumptions of a particular analysis.

Data Entry Errors. When conducting IDA, you can inspect the frequency tables for missing data and for values that do not correspond to the values expected for a variable. For example, if Gender/Sex is coded as 1 = female and 2 = male, but the frequency table shows a value of 12 (entered for, say, case #5), that value is inconsistent with the coding strategy and with the known sexes of most species.
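The same check can be sketched in a few lines of Python. The helper find_invalid and the column of values are hypothetical; only the coding scheme (1 = female, 2 = male) and the stray value of 12 come from the example above.

```python
VALID_SEX_CODES = {1, 2}  # 1 = female, 2 = male, per the coding scheme above

def find_invalid(values, valid_codes):
    """Return (case_number, value) pairs for missing or out-of-scheme values."""
    return [
        (case, v)
        for case, v in enumerate(values, start=1)
        if v is None or v not in valid_codes
    ]

# Hypothetical column: case 5 holds the mistyped 12, case 7 is missing.
sex = [1, 2, 2, 1, 12, 1, None, 2]

print(find_invalid(sex, VALID_SEX_CODES))  # [(5, 12), (7, None)]
```

Unlike a frequency table, which only shows that an unexpected value exists, this sketch also reports the case number, so the entry can be traced back to the raw data.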

Describing the Sample. When writing a research report, or simply assessing the external validity of your study, you must evaluate the sample (i.e., the characteristics of the individuals in it). First, you may be concerned with the external validity of the sample (how representative it is of the population you are studying). Second, you will likely want to describe your sample when writing up the study (so that others can attempt to replicate your findings). For example, if only 10% of your sample were male, your results would really only be applicable to females. Using the Frequencies function in SPSS, you can easily produce graphical representations for one variable or for several variables.

Example of Simple Graphing with the Frequencies function as would be done during IDA.

The mock study we will use today concerns the effectiveness of two types of therapy for depression at increasing Life Satisfaction Rating. The independent variable was Type of Therapy, with two conditions: either Cognitive Behavioral Therapy (CBT) or Electro-Convulsive Therapy (ECT). The dependent variable was Life Satisfaction Rating (a series of 10 questions that were totaled to yield a score between 10 and 50 for each of the 16 participants).
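Because the description fixes the possible range of the dependent variable (10 to 50), a range check is a natural part of IDA. The following is a minimal Python sketch; the helper out_of_range and the scores are hypothetical and are not taken from the example data.

```python
MIN_SCORE, MAX_SCORE = 10, 50  # range stated in the study description

def out_of_range(scores, low=MIN_SCORE, high=MAX_SCORE):
    """Return (case_number, score) pairs that fall outside [low, high]."""
    return [
        (case, s)
        for case, s in enumerate(scores, start=1)
        if not low <= s <= high
    ]

# Hypothetical scores, for illustration only (not the example data set).
scores = [23, 41, 35, 55, 10, 50]

print(out_of_range(scores))  # [(4, 55)]
```

The boundary values 10 and 50 are treated as valid, since the description says scores fall between 10 and 50.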

The Example Data can be found here.

Getting the Frequencies and Tables/Figures:

1. Click on Analyze, then Descriptive Statistics, then Frequencies. Start with the categorical data: highlight/select “Gender/Sex” and “Type of Therapy” and put them in the Variables box. (Make sure “Display Frequency Tables” is checked.) Then click on Charts and select Bar Charts (because these are categorical variables). Now click Continue and then click OK. You should get output similar to that displayed below.

Now flip back to Data View.

2. Click on Analyze, then Descriptive Statistics, then Frequencies. Next, you are going to do the continuous data, so click on the Reset button (at the bottom of the dialog).

Now select “Age” and “Life Satisfaction Rating” and put them in the Variables box. Then click on Statistics, select the statistics you think are necessary, and click Continue. Next, click on Charts, select Histograms, and check the box for “with normal curve”. Now click Continue and then OK. You should see output similar to that provided below.
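To see what a histogram does with a continuous variable, here is a minimal Python sketch that groups values into equal-width bins and prints a text histogram. The helper histogram_bins, the bin width of 10, and the ages are all hypothetical choices for illustration; SPSS chooses its own bin widths, and this sketch does not draw the normal curve.

```python
from collections import Counter

def histogram_bins(values, bin_width):
    """Count values per bin; each bin is keyed by its lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical ages, for illustration only (not the example data set).
ages = [19, 22, 24, 25, 31, 33, 38, 41, 47, 52]
width = 10

# One row per bin: the bin's range followed by one asterisk per case.
for lower, n in histogram_bins(ages, width).items():
    print(f"{lower:>3}-{lower + width - 1:<3} {'*' * n}")
```

A bin that sits far from the rest of the distribution, or outside the range a variable can legitimately take, is the kind of feature worth looking for in the SPSS histograms.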

Some interpretation questions:

1. What can we say about the gender of our sample in terms of external validity? Hint: look at the bar chart for Gender.

2. Could it be that we have a data entry error or invalid score for one of our participants on the Life Satisfaction Rating? Hint: re-read the description of our dependent variable, Life Satisfaction Rating, and look closely at the histogram of that variable.