

SPSS Workshop II

 

Examples |Census, Anxiety, Anxiety 2, Anxiety3, Anxietyb, Cars | Evaluation

Objectives

This is the second of the SPSS short course series.  The series is designed for beginning users who want to get started with the program and experienced users who want to refresh the basics of the program. After this course, you should be able to:

1. Understand the file structure of SPSS.
2. Be familiar with data access via SPSS.
3. Manage data sets in SPSS.
4. Perform simple statistical analysis and graphing procedures.
5. Perform statistical analysis with the General Linear Model procedure of SPSS and be familiar with the output.

You can review the first course by following this link: Introduction to SPSS.  This course will focus on data manipulation as well as provide an introduction to syntax and data analysis in SPSS.

Data Entry and Manipulation

Data entry is the starting point for all statistical analysis in SPSS and other programs.  Getting data into SPSS is easy if you are entering it directly.  A general guideline is to stick to numbers for all data, and to label the categories of categorical variables according to their numeric codes.  This will ensure no problems with subsequent analyses and allow for easier transfer to other statistical programs.  Getting data in from other places can be a little more tricky, particularly when merging datasets.

Concatenating and Merging SPSS Files

There are several instances in which you will probably want to merge two or more existing SPSS data files.  SPSS merges files in two ways:  (a) concatenating parallel SPSS system files and (b) merging with parallel and nonparallel matches.

1. Concatenating SPSS system files

Concatenating refers to placing two things on top of one another.  Concatenating data files basically involves adding cases to an existing data file.  In order to use this option, the two files must be identical in number of variables and variable names.

In the following example, the Anxiety dataset from the SPSS folder is used.  A similar dataset was created for demonstration purposes, and we'll be appending the data file Anxiety.sav with the cases from Anxietyb.sav.

Concatenating files via the GUI interface is a very straightforward procedure.   Select the "Merge Files" option found under the Data Menu and select "Add Cases".  You will then be prompted to select the file that you wish to merge.  Simply double click on that file, and the resulting dialog box will look something like this:

 

 

As can be seen in this figure, there are no unpaired variables.  You do have an option to rename variables by clicking the "Rename" button, but it actually works better if the variables in the two files are the same prior to merging.  The option to "Indicate case source as variable" will indicate which data file your cases came from.

If you happen to have more variables in the data set to be merged, they will show up in the Unpaired Variables box above.  One must move them over to include them in the merging process or SPSS will ignore them.  The old data will now have missing values on that new variable.
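For those who prefer syntax, the same concatenation can be sketched with the ADD FILES command.  This is only a minimal sketch: the file locations are assumptions (adjust the paths to wherever your copies live), and the variable name "source" is a hypothetical stand-in for the case-source indicator described above.

```spss
* Open the file that will receive the new cases (path is an assumption).
GET FILE='Anxiety.sav'.
* Stack the cases from Anxietyb.sav underneath the active file.
* The IN subcommand flags cases that came from Anxietyb.sav with a 1.
ADD FILES /FILE=*
  /FILE='Anxietyb.sav'
  /IN=source.
EXECUTE.
```

The asterisk refers to the active data file, so the cases from Anxietyb.sav are added beneath those already open.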

2. Parallel and Nonparallel Merging

If, instead of simply appending more cases to a data file, you would like to add variables from another SPSS system file, or add both cases and variables, a different procedure for merging data files is warranted.

In this example, we are merging the files Anxiety2 from the SPSS folder and Anxiety3, which I created.  We are adding variables from Anxiety3 to Anxiety2.  The subjects in the two files are the same, so the files are merged by subject.  You can get hopelessly messed up in merging files if you do not include this option to match by a unique identifier such as ID number.  You must also have your data files sorted by the unique identifier prior to merging.

In using the GUI interface, again select "Merge Files" from the Data Menu, but in this instance, select "Add Variables".  Again, double-click on the file that you are interested in merging, and you will get a dialog box that looks like this:

 

 

Variables will be excluded if they already belong to the working data file.   Also, we are matching cases on key variables.  In this example, we are using "subject" as the key variable.  This basically means that values for variables from both data files will be matched for each specific case.  If there are additional cases in the file to be merged, they will appear in the same place on the new file.  One should perform a healthy inspection of the new dataset to make sure things matched up as they should.
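In syntax, a merge of this kind might look like the sketch below, which uses the MATCH FILES command.  The file paths and the name of the sorted copy ("Anxiety3_sorted.sav") are assumptions made for illustration; the key variable "subject" is the one used in the example above.

```spss
* Sort the file that supplies the new variables and save a sorted copy.
GET FILE='Anxiety3.sav'.
SORT CASES BY subject.
SAVE OUTFILE='Anxiety3_sorted.sav'.
* Open and sort the file that will receive the new variables.
GET FILE='Anxiety2.sav'.
SORT CASES BY subject.
* Match the two files case by case on the key variable.
MATCH FILES /FILE=*
  /FILE='Anxiety3_sorted.sav'
  /BY subject.
EXECUTE.
```

Both files are sorted by the key variable before the merge, in keeping with the sorting requirement noted earlier.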

Inserting Variables and Cases into an SPSS File

Inserting variables and cases into SPSS data files is a very straightforward procedure. The "Insert Variable" option is available from the Data Menu.  Simply move your cursor to the place in the data file that you would like to insert a variable, select the variable adjacent to which you want your new variable to go by clicking once on the variable name, and select the "Insert Variable" option from the Data Menu.   Inserting cases is very similar.  Move your cursor to the point in which you want to add the case, select the row by clicking once on the row heading, and select the "Insert Case" option, also from the Data Menu.  An alternative and perhaps even easier method for either is to simply right click on a variable heading or case number and select to insert a new one from the menu provided.

Moving to Specific Variables and Cases in an SPSS File

(Census data is used in the rest of the examples until the very end)

Moving to existing variables and cases is also a simple procedure.  To move to a variable in an SPSS data file, select the "Variables" option from the Utilities Menu.  At this point a dialog box opens similar to the figure below:

 

From here, you can select the variable in which you are interested.  Click on the "Go To" option to go to that variable in the Data file.  Moving to specific cases is accomplished with the Data Menu by choosing the "Go To Case" option.

Sort and Split File

The Sort File option allows the user to sort the file by ascending or descending values.  It is found under the Data Menu.  The Sort File command is often accompanied by the Split File command that allows the user to split the active system file into subgroups that can be analyzed separately by SPSS.  For example, if a researcher wanted separate analyses for males and females, he/she could create a split file and then run the analysis he/she wanted to run.  The Split File command must be preceded by the Sort File command in the syntax.  However, these two procedures can be run from a single dialog box via the GUI interface.

 

Selecting "Compare groups" will provide the separate analyses together in the same output box.  The option "Organize output by groups" will present the results separately for males and females in the Output window.  To do this, select the variable "Sex" and move it to the "Groups Based on" box by using the large arrow separating this box from the variable list.  You also have an option if the file is already sorted or if you would like to sort the file by the grouping variable, in this case, "Sex".  If you select the option to "Sort the file by grouping variable," SPSS will run the "Sort File" command in the background.
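A rough syntax equivalent of the sort-then-split sequence is sketched below.  In syntax the sorting command is written SORT CASES.  The variable "sex" comes from the example above; the FREQUENCIES procedure on "educ" is only a hypothetical analysis chosen to show where your own procedure would go.

```spss
* Sort by the grouping variable before splitting.
SORT CASES BY sex.
* LAYERED corresponds to "Compare groups".
* Use SEPARATE instead for "Organize output by groups".
SPLIT FILE LAYERED BY sex.
FREQUENCIES VARIABLES=educ.
* Turn the split off when finished so later analyses use the whole file.
SPLIT FILE OFF.
```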

Compute and Recode

1. The Compute Command

The Compute Command creates new numeric variables or modifies the values of existing numeric or string variables for each case in the active system file.  It conducts its operations across columns.  There are a multitude of options available for the Compute Command:  arithmetic operators, arithmetic functions, statistical functions, and missing value functions.  I will leave it to the reader to fully explore the functionality of this command.

Using the GUI interface, the Compute Command is available from the Transform Menu, as can be seen in the figure below:

 

 

The Target Variable is the name of the new variable.  Existing variables can be selected from the variable list in the left hand column and moved to the scratch pad with the large arrow.  If you are interested in using Arithmetic Operators, the operations can be selected from the dialog box that resembles a calculator.  *Note:  the ~ symbol means not; therefore ~= represents "not equal to".  Arithmetic, statistical, and missing value functions are available from the Function dialog box, the box that has "ABS(numexpr)" and other functions visible.  The "If" option is if you want to create a new variable based on modifications of an existing variable.  For example, when recoding the continuous variable, "educ" for educational level into ordinal categories (i.e., some high school, high school graduate, some college, etc), the researcher can name the Target Variable "educlev" for educational level category, and use the following logic:

If educ < 12, educlev = 1

In this example, the category 1 for educlev would represent that the individual completed some high school.
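Written as syntax, the logic above might be sketched with conditional IF commands.  Only the first line reflects the example given; the remaining cut points and category codes are assumptions added to show how further categories could be filled in.

```spss
* Category 1: some high school (from the example above).
IF (educ < 12) educlev = 1.
* The cut points and codes below are hypothetical.
IF (educ = 12) educlev = 2.
IF (educ > 12) educlev = 3.
EXECUTE.
```

Cases with a missing value on educ satisfy none of the conditions and are left system-missing on educlev.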

2. The Recode Command

The Recode Command changes the coding scheme of an existing variable.  A common use for the Recode Command is for dealing with reverse-scored questionnaire items.   For example, if a Likert scale item is coded from 1 to 5, with 1 being "Strongly Agree" and 5 being "Strongly Disagree", and the item is reverse scored, for data analysis purposes, you would want to recode the item, with 5 recoded to 1, 4 to 2, 2 to 4, and 1 to 5.  Other common uses of the Recode Command are creating categorical variables from continuous variables and recoding System Missing values.  The Recode Command can also be used to recode character string values to numeric values.  With the Recode Command, you have the option of whether you would like to overwrite the existing variable, or if you would like to create a new recoded variable (the latter will prevent possible data loss).

 

A handy recent addition to recoding in SPSS is the Automatic Recode option in the transform menu.  This will transform a string variable to numeric while automatically labeling the new numeric variable with the values from the old string variable.
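Both kinds of recoding can also be run from syntax.  In the sketch below all variable names ("item1", "item1r", "region", "regionn") are hypothetical.  The reverse-scoring example writes to a new variable so that the original item is preserved.

```spss
* Reverse score a 1 to 5 item into a new variable.
* ELSE=COPY carries over the values not listed, such as 3.
RECODE item1 (1=5) (2=4) (4=2) (5=1) (ELSE=COPY) INTO item1r.
EXECUTE.
* Convert a string variable to a labeled numeric variable.
AUTORECODE VARIABLES=region /INTO regionn.
```

Without ELSE=COPY, any value not named in the recode specification would end up system-missing in the new variable.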

Selecting Specific Cases

There are many times that you will want to carry out operations on certain cases in your data and not others.  In order to do this, you will use what is known as a conditional if statement.  With the Select If Command, you can select specific cases for analysis based upon logical conditions found in the data (e.g., select if gender=1).   There are three ways to go about doing this:

1.  Creating a filter variable

Select Cases is an option located under the Data Menu.  When you select this option, the resulting dialog box will resemble the figure below:

 

 

To select specific cases, you would select the radio button adjacent to "If condition is satisfied" and then select the If button.  You will get a dialog box identical to what you have seen before with the Compute Command, and when selecting groups of cases to recode from the Recode Command:

 

 

Variables can be selected from the variable list to the left of the dialog box and moved over with the large arrow, or they can be directly typed into the large blank scratch pad.  If, for example, you would like to conduct a separate analysis for subjects younger than 50, you could type "age < 50" in the scratch pad and select "Continue".  This will bring you back to the main menu.  Now direct your attention toward the bottom of the dialog box to the section that has the heading of "Unselected Cases Are".  The default option for selecting cases is to create what is known as a filter variable.  The filter variable is a dichotomous variable, composed of 0s and 1s.  Cases that you have selected to work with will receive a value of 1 and will be the only cases used in the following analyses.   The cases that you have selected out will have values of zero and will not be employed in subsequent analyses.  In addition, the box adjacent to where the data values begin for a case will be marked through for each unselected case to indicate that they will not be used in the subsequent analyses.  As can be seen in the figure below, cases with the ages of 1, 4, and 7 will not be used in subsequent analyses.

 

 

One may create their own filter variable for smaller datasets, and so use that new variable in lieu of creating one via selection by some other method as described previously.  Also there are some instances in which you will want to select a certain subset of cases to create a completely separate data file.  This is useful if you have a subset of cases that you will be performing many analyses on.  In these cases you can select the cases that you want, and permanently delete all other cases. 
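The two approaches just described can be sketched in syntax as follows.  The filter variable name follows the pattern SPSS uses when it builds one for you; the DESCRIPTIVES procedure and the output file name ("census_under50.sav") are assumptions used only for illustration.

```spss
* Approach 1: filter, leaving the unselected cases in the file.
USE ALL.
COMPUTE filter_$ = (age < 50).
FILTER BY filter_$.
DESCRIPTIVES VARIABLES=educ.
* Remember to switch the filter off afterwards.
FILTER OFF.
USE ALL.

* Approach 2: permanently drop the unselected cases and save a new file.
SELECT IF (age < 50).
SAVE OUTFILE='census_under50.sav'.
```

Saving the subset under a new name, rather than over the original file, keeps the full data set available should it be needed again.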

 

Syntax

It is important to become familiar with SPSS syntax because the SPSS syntax window offers the convenience of a text editor, which can be more time efficient than the GUI interface.  One can run and rerun analyses while making slight changes by simply selecting the appropriate text and running it, rather than clicking three or four times just to get to the appropriate dialog box.  Secondly, some of the statistical procedures available in SPSS are not available with the point-and-click approach (e.g. canonical correlation).  A final and very practical reason to know some syntax regards communication.  It is much easier for example to send syntax in an email than explain all the menus and clicking one would have to do to perform complex or multiple analyses.  The person on the receiving end, assuming they have a copy of the data, can simply run the syntax file or copy and paste the syntax and get the exact same output.

There is not enough time in these short courses for an extended discussion of SPSS syntax.  However we can get you started and provide a couple useful bits of code you can use right away that may make things easier during your analyses.

1.  Paste

Perhaps the easiest way to start getting into syntax for those more 'menu-inclined' is with the "Paste" button.  You'll find this button in any box brought up in the menu system.

 

 

By selecting Paste instead of OK, SPSS will create a syntax file with the exact code that would be used to provide the output desired.
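As an illustration of what to expect, pasting from the Frequencies dialog with a single variable selected produces something along these lines (the variable "sex" is an assumption; your own selections will appear in its place):

```spss
FREQUENCIES VARIABLES=sex
  /ORDER=ANALYSIS.
```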

 

 

At this point one can highlight the text and click the right triangle to get their output.  To learn more about the syntax and perhaps about options for analyses that are not available in the menu system, one may peruse SPSS's Command Syntax Reference from the help menu.  Almost 1900 pages are available to sift through to help you find what you need (not always easy).  Again though, learning some of the nuances of the syntax can be made easier with this approach.

2.  Using the Temporary Command

As we just finished going over selecting cases, we'll discuss the Temporary Command as a starting point for syntax.  There is a problem with creating the filter variable, and that is that you (or maybe I should say I) will often forget that the filter variable is on when you want to go back to analyzing the entire data set.  In addition, although selecting cases via creating a filter variable is easy to do using the GUI interface, as can be seen above, if you are directly generating the syntax, the syntax is rather unwieldy.  This is where the Temporary Command can come into play.  Temporary lets SPSS know that the Select If statement should only apply to the procedure immediately following.  Once it has run this procedure, SPSS goes back to using all cases.  This option is not available with the point-and-click approach; it must be coded using SPSS syntax.  The syntax for selecting subjects younger than 50, using the Temporary Command is demonstrated below:
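A minimal sketch of such syntax is given here.  The selection condition comes from the example in the text; the DESCRIPTIVES procedure on "educ" is an assumption standing in for whatever analysis you wish to run on the subset.

```spss
* The selection applies only to the next procedure.
TEMPORARY.
SELECT IF (age < 50).
DESCRIPTIVES VARIABLES=educ.
* Any procedure run after this point uses all cases again.
```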

3.  Using the Do If, Else If, and End If Statements

The Do If-End If structure extends the Select If Command by allowing the user to execute one or more transformations on the same subset of cases based on logical expressions defined by the user.  Basically, Do If statements are similar in nature to the Select If Command; the Do If statement tells SPSS that the following procedures are to be done on the subset of cases included in the statement.  For example, the syntax below indicates that the following compute statement should be applied only for occupational categories (i.e., occcat80) 5, 6, and 7.  The Do If statement is then closed off by an End If statement.  This tells SPSS to go back to analyzing all cases.  The way in which the Do If-End If structure differs from using the Select If Command is that the Do If-End If structure is best used for executing multiple transformations, whereas select if is more useful when used with a single conditional statement.  The Do If-End If structure also differs in that it can be extended by the Else If Command.  For example, in the syntax below, the routine is begun with a Do If statement, for which a new variable is computed and given the value zero for occupational categories 5, 6, and 7.  However, other requirements are necessary for occupational categories 1 through 4.  These requirements can be executed by building Else If statements into the Do If-End If structure.  This involves multiplying the salary of the subject by different values (e.g., the salary for subjects in occupational category 1 is multiplied by .1).  There is no GUI counterpart for the Do If-End If structure; this is only accessed by coding SPSS syntax.
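A sketch of the structure described above follows.  The variable "occcat80" and the multiplier of .1 for category 1 come from the description; the names "salary" and "adjust" and the multipliers for categories 2 through 4 are hypothetical placeholders.

```spss
* Categories 5, 6, and 7 receive a value of zero.
DO IF (ANY(occcat80, 5, 6, 7)).
COMPUTE adjust = 0.
ELSE IF (occcat80 = 1).
COMPUTE adjust = salary * .1.
ELSE IF (occcat80 = 2).
COMPUTE adjust = salary * .2.
ELSE IF (occcat80 = 3).
COMPUTE adjust = salary * .3.
ELSE IF (occcat80 = 4).
COMPUTE adjust = salary * .4.
END IF.
EXECUTE.
```

Each case is handled by the first condition it satisfies, so the order of the Else If statements matters whenever conditions overlap.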


 

4. The Do Repeat Command

The Do Repeat-End Repeat structure is very similar to the Do If-End If structure.   The difference is that the Do Repeat-End Repeat structure conducts the same transformations on a set of variables.  This can greatly reduce the number of commands that you must enter to accomplish the task, as well as the time that it will take you to perform the operations.  For example, if a large set of variables needs to be recoded in the same way, instead of using numerous recode commands, a single recode command can be nested within the Do Repeat-End Repeat structure.  The Do Repeat-End Repeat structure uses a stand-in variable to represent a list of variables.  In subsequent transformations, the name of the stand in variable is used.  The following syntax is an example using Do Repeat-End Repeat structure:
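A minimal sketch of this structure, assuming the variables to be recoded sit next to one another in the data file so that they can be listed with the TO keyword, might look like this:

```spss
* X stands in for each variable from hlth1 through work9 in turn.
DO REPEAT X = hlth1 TO work9.
RECODE X (1=2) (2=1).
END REPEAT.
EXECUTE.
```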


 


In this example, the variables hlth1 through work9 in the data file will be recoded so that values of one will be changed to values of 2 and vice versa.  The stand-in variable for this command is simply denoted as "X".

 

Analysis & Graphical Display

Analysis in SPSS can take several forms, and as this is not designed to be a statistics course, demonstration will be limited to providing a feel for the Analyze and Graph menu.

Descriptive Statistics

A starting point for any data analysis entails becoming familiar with the dataset itself.  Exploratory Analysis, Initial Data Analysis etc. all refer to the required portion of the research process in which we come to understand the variables involved.  One can do this with the SPSS 'Descriptive Statistics' menu.

 

 

As one can see, we have a few options.  Frequencies are commonly selected, particularly for categorical data.  The 'Descriptives' menu offers nothing that can't be found in the 'Explore' menu, and so is useless outside of a less cluttered display and creating standardized variables from those of interest.  Plus Explore allows for a breakdown according to a factor variable without needing to use the split file option in the data menu.  Example:

 

       

 

Clicking on the 'Statistics' button brings up the second window above where one can choose to add to the output.  SPSS provides trimmed means and M-estimators but considering that robust procedures that would actually use such values are nonexistent in this package, one wonders why they bothered.  The 'outliers' option is not too informative in many cases as it just shows the 5 highest and lowest values, of which none may qualify for being a true outlier.  The 'Plots' button brings up the third image, where one can implement tests of normality and obtain graphical displays of the distribution of the data.  One may right-click on the words or click the help button for a further understanding of the options available.  Note also that descriptive output may be available as an option in the inferential analyses one performs using other menus, but may not provide the desired detail.
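The Explore menu corresponds to the EXAMINE command in syntax.  The sketch below is an assumption about how one might request the options just discussed; the variables "age" and "sex" are illustrative choices from the Census data.

```spss
* Explore age separately for each level of sex.
* NPPLOT requests normality plots along with tests of normality.
* EXTREME(5) lists the 5 highest and lowest values.
EXAMINE VARIABLES=age BY sex
  /PLOT=BOXPLOT HISTOGRAM NPPLOT
  /STATISTICS=DESCRIPTIVES EXTREME(5)
  /NOTOTAL.
* The SAVE subcommand adds a standardized version of the variable.
DESCRIPTIVES VARIABLES=age
  /SAVE.
```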

Inferential Statistics

Many analyses are available in the Analyze menu, from simple correlations to multivariate designs, and more are available through syntax.  However, one should not let SPSS's options dictate what analyses are performed.  SPSS does not provide much in the way of statistical analysis post-1975 and so other packages may be necessary to accomplish one's tasks with more statistical power.  But what SPSS may lack in more modern analysis it makes up for in performing the analyses available with relative ease.

For example, with the Cars data one might suspect there is a relationship between the cars' weights and their miles per gallon gasoline consumption.  Running the linear regression is rather easy:  simply go to Analyze/Regression and select weight as the independent variable and mpg as the dependent variable.  Here is the output:

 
 

       

 

Graphical Display of Data

We would also like to take a look at the relationship graphically.  SPSS has typically been fairly weak with regard to graphical display though substantial improvement was seen with v. 12.0.  The Graphs menu is easy to use and gives a few options for simple display of graphics.  However, one doesn't have a whole lot of control of the output nor can one manipulate it very easily.  Still, an example is given below of a simple scatterplot from the above examination.  A couple of things stick out.  One is that there appears to be a curvilinear relationship rather than a strictly linear one, and secondly, one case appears to be an extreme data point (lower left) that will require some action.  In this case, the data point is a miskey of some kind (4 cu inch engine?) and has missing data on several of the other variables and so we would not want to include it in the analysis.
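For those following along in a syntax window, the regression and scatterplot discussed in this section can be sketched as below.  This is a bare-bones version that relies on default settings; syntax pasted from the dialog boxes will include additional subcommands.

```spss
* Regress miles per gallon on vehicle weight.
REGRESSION
  /DEPENDENT mpg
  /METHOD=ENTER weight.
* Scatterplot with weight on the horizontal axis and mpg on the vertical.
GRAPH
  /SCATTERPLOT(BIVAR)=weight WITH mpg.
```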

 

 

So there you have it for the second part of the SPSS short course series.  One should by this point be able to dive in and not feel too overwhelmed, though it will still take a bit to get used to so be patient.  Always feel free to set up an appointment with the folks at Research and Statistical Support when you get stuck, and we'll be happy to help you out.