This page is currently under construction!
Second SPSS short course page
This page is designed for beginning users who want to get started with the program, and perhaps for more experienced users who want to refresh a couple of the basics.
You can review the first course by following this link: Introduction to SPSS. This course will focus on data manipulation as well as provide an introduction to syntax and data analysis in SPSS.
Datasets: Census | Anxiety | Anxiety 2 | Anxiety3 | Anxietyb | Cars
Data entry is the starting point for all statistical analysis in SPSS and other programs. Getting data into SPSS is easy if you are entering it directly. A general guideline is to stick to numbers for all data, and to label the categories of categorical variables according to their numeric codes. This will help avoid problems with subsequent analyses and allow for easier transfer to other statistical programs. Getting data in from other places can be a little trickier, particularly when merging datasets. See the previous course page for information about getting different file types into SPSS.
There are several instances in which you will probably want to merge two or more existing SPSS data files. SPSS merges files by adding cases or by adding variables, and in the latter case one can do so in parallel or nonparallel form. To save yourself a tremendous headache, make the files to be merged look exactly alike.
Adding Cases
To begin with we'll add cases to an existing data file. In the following example, the Anxiety dataset from the SPSS folder is used (linked above also). A similar dataset was created for demonstration purposes, and we'll be appending the data file Anxiety.sav with the cases from Anxietyb.sav.
Concatenating files via the GUI interface is a very straightforward procedure. Select the "Merge Files" option found under the Data Menu and select "Add Cases". You will then be prompted to select the file that you wish to merge. Simply double click on that file, and the resulting dialog box will look something like this:

As can be seen in this figure, there are no unpaired variables. You do have an option to rename variables by clicking the "Rename" button, but it actually works better if the variables in the two files are the same prior to merging. The option to "Indicate case source as variable" will indicate which data file your cases came from.
If you happen to have more variables in the data set to be merged, they will show up in the Unpaired Variables box above. One must move them over to include them in the merging process or SPSS will ignore them. The old data will now have missing values on that new variable.
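The same append can be written in syntax with the ADD FILES command. The following is a minimal sketch that assumes both files sit in the current working directory (the bare file names are placeholders for your own paths) and that you want a source indicator, here given the hypothetical name source:

```spss
* Open the original file, then append the cases from Anxietyb.sav.
GET FILE='Anxiety.sav'.
ADD FILES /FILE=*
  /FILE='Anxietyb.sav'
  /IN=source.
EXECUTE.
```

The /IN subcommand plays the role of the "Indicate case source as variable" option: source is 1 for cases that came from Anxietyb.sav and 0 otherwise.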
Adding variables or variables and cases
If, instead of simply appending more cases to a data file, you would like to add variables from another SPSS system file, or add both cases and variables, a different merge procedure is needed.
In this example, we are merging the file Anxiety2 from the SPSS folder with Anxiety3, a file I created that simply contains trials 5 and 6, to be added to the dataset that contains the first four trials. The subjects in the two files are the same, and they will be merged by subject number. You can get in quite a bind when merging files if you do not match by a unique identifier, as we are doing with the ID number in this case. You must also have your data files sorted by the unique identifier prior to merging.
In using the GUI interface, again select "Merge Files" from the Data Menu, but in this instance, select "Add Variables". Again, double-click on the file that you are interested in merging, and you will get a dialog box that looks like this:

Variables will be excluded if they already belong to the working data file. Also, we are matching cases on key variables. In this example, we are using "subject" as the key variable. This basically means that values for variables from both data files will be matched for each specific case. If there are additional cases in the file to be merged, they will appear in the same place on the new file. One should perform a healthy inspection of the new dataset to make sure things matched up as they should.
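In syntax, this kind of merge is handled by MATCH FILES. The sketch below assumes the two files are in the working directory and that the key variable is named subject in both; the name Anxiety3_sorted.sav is a hypothetical scratch file used only to hold the sorted copy:

```spss
* Sort the second file by the key variable and save the sorted copy.
GET FILE='Anxiety3.sav'.
SORT CASES BY subject.
SAVE OUTFILE='Anxiety3_sorted.sav'.

* Open and sort the first file, then match the two on subject.
GET FILE='Anxiety2.sav'.
SORT CASES BY subject.
MATCH FILES /FILE=*
  /FILE='Anxiety3_sorted.sav'
  /BY subject.
EXECUTE.
```

Both files are sorted before the merge because of the sorting requirement noted above.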
Adding variables and cases at the same time results in a new dataset with lots of gaps, and missing data is a serious problem for any analysis. Ignoring cases with missing data leads to biased estimates, and the model misspecification that comes from leaving out pertinent variables leads to erroneous conclusions in any analysis that uses only the partial data. I would suggest you think very hard about why you want to create an incomplete dataset.
Inserting Variables and Cases into an SPSS File
Inserting variables and cases into SPSS data files is a very straightforward procedure. The "Insert Variable" option is available from the Data Menu, but it's easier to just go to the place in the dataset where you want to add the case or variable. Simply right click on a variable heading or case number and choose to insert a new one from the menu that results.
Moving to Specific Variables and Cases in an SPSS File
Moving to existing variables and cases is also a simple procedure. To move to a variable or case in an SPSS data file, select either option from the Edit menu (they both bring up the same dialog box), and just select where you want to go. This is typically only needed for very large datasets, as scrolling can generally get you where you need to go quickly enough. Also, if SPSS minimizes your dataset when you select the option from the Edit menu, it's not something you did wrong, but one of the many bugs associated with v. 16.

(Census data is used in the rest of the examples until the very end)
The Sort File option allows the user to sort the file by ascending or descending values. It is found under the Data Menu. The Sort File command is often accompanied by the Split File command, which allows the user to split the active system file into subgroups that can be analyzed separately by SPSS. For example, if a researcher wanted separate analyses for males and females, he/she could create a split file and then run the analysis he/she wanted to run. The Split File command must be preceded by the Sort File command in the syntax. However, these two procedures can be run from a single dialog box via the GUI interface.

Selecting "Compare groups" will provide the separate analyses together in the same output box. The option "Organize output by groups" will present the results separately for males and females in the Output window. Personally I find the former preferable visually. To do this, select the variable "Sex" and move it to the "Groups Based on" box by using the large arrow separating this box from the variable list. You also have an option if the file is already sorted or if you would like to sort the file by the grouping variable, in this case, "Sex". If you select the option to "Sort the file by grouping variable," SPSS will run the "Sort File" command in the background.
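For reference, the syntax behind this dialog box looks roughly like the sketch below. It assumes the grouping variable is named sex, and it uses a frequency table of age purely as a stand-in for whatever analysis you actually want to run:

```spss
* Sort first, then split. LAYERED corresponds to "Compare groups".
SORT CASES BY sex.
SPLIT FILE LAYERED BY sex.
FREQUENCIES VARIABLES=age.

* Turn the split off when you are done.
SPLIT FILE OFF.
```

Replacing LAYERED with SEPARATE gives the "Organize output by groups" layout instead. The split stays in effect until SPLIT FILE OFF is run, so it is worth ending with that line.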
There are many times that you will want to carry out operations on certain cases in your data and not others. In order to do this, you will use what is known as a conditional if statement. With the Select If Command, you can select specific cases for analysis based upon logical conditions found in the data (e.g., select if gender=1). There are three ways to go about doing this:
1. Creating a filter variable
Select Cases is an option located under the Data Menu. When you select this option, the resulting dialog box will resemble the figure below:

To select specific cases, you would select the radio button adjacent to "If condition is satisfied" and then select the If button. You will get a dialog box similar to what you will see later with the Compute function and when selecting groups of cases to recode from the Recode Command:

Variables can be selected from the variable list to the left of the dialog box and moved over with the large arrow, or they can be typed directly into the large blank scratch pad. If, for example, you would like to conduct a separate analysis for subjects younger than 50 (see footnote 1), you could type "age < 50" in the scratch pad and select "Continue". This will bring you back to the main dialog box. Now direct your attention toward the bottom of the dialog box, to the section with the heading "Unselected Cases Are". The default option for selecting cases is to create what is known as a filter variable. The filter variable is a dichotomous variable, composed of 0s and 1s. Cases that you have selected to work with will receive a value of 1 and will be the only cases used in the following analyses. The cases that you have selected out will have values of 0 and will not be employed in subsequent analyses. In addition, the box adjacent to where the data values begin for a case will be marked through for each unselected case to indicate that it will not be used in the subsequent analyses. As can be seen in the figure below, cases 1, 4, and 7 will not be used in subsequent analyses.

One may create their own filter variable for smaller datasets, and so use that new variable in lieu of creating one via selection by some other method as described previously. Also there are some instances in which you will want to select a certain subset of cases to create a completely separate data file. This is useful if you have a subset of cases that you will be performing many analyses on. In these cases you can select the cases that you want, and permanently delete all other cases.
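If you paste rather than run the filter selection described above, the generated syntax is along the following lines. Treat this as a sketch: the exact lines can vary by version, and filter_$ is the name SPSS conventionally gives the automatically created filter variable.

```spss
* Build a 0/1 filter variable and switch it on.
USE ALL.
COMPUTE filter_$=(age < 50).
VARIABLE LABELS filter_$ 'age < 50 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMATS filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.

* Return to the full dataset.
FILTER OFF.
USE ALL.
```

A filter variable you create yourself works the same way: compute any 0/1 variable and name it on FILTER BY.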
Creating new variables from existing data: Compute and Recode
The Compute Function
The Compute Command creates new numeric variables or modifies the values of existing numeric or string variables for each case in the active system file. It conducts its operations across columns. There are a multitude of options available for the Compute Command: arithmetic operators, arithmetic functions, statistical functions, and missing value functions. I will leave it to the reader to fully explore the functionality of this command.
Using the GUI interface, the Compute Command is available from the Transform Menu, as can be seen in the figure below:

The Target Variable is the name of the new variable. Existing variables can be selected from the variable list in the left hand column and moved to the scratch pad with the large arrow or via drag and drop. If you are interested in using Arithmetic Operators, the operations can be selected from the dialog box that resembles a calculator. Note: the ~ symbol means not; therefore ~= represents "not equal to". Arithmetic, statistical, and missing value functions are available from the Function dialog box, the box that has "ABS(numexpr)" and other functions visible. The "If" option is if you want to create a new variable based on modifications of an existing variable. For example, when recoding the continuous variable, "educ" for educational level into ordinal categories (i.e., some high school, high school graduate, some college, etc), the researcher can name the Target Variable "educlev" for educational level category, and use the following logic:
If educ < 12, educlev = 1
In this example, the category 1 for educlev would represent that the individual completed some high school. Just as an aside, in general I find using compute much easier to do via syntax.
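In syntax, that logic can be written with the IF command. Only the first line below comes from the example above; the remaining cutoffs and the value labels are hypothetical and included only to show how the pattern extends:

```spss
* Category 1 is some high school, as in the example above.
IF (educ < 12) educlev = 1.
* The next two cutoffs are illustrative assumptions.
IF (educ = 12) educlev = 2.
IF (educ > 12) educlev = 3.
VALUE LABELS educlev 1 'Some high school' 2 'High school graduate' 3 'Beyond high school'.
EXECUTE.
```

Cases with a missing value on educ satisfy none of the conditions and are left system-missing on educlev.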
The Recode Function
The Recode Command changes the coding scheme of an existing variable. A common use for the Recode Command is for dealing with reverse-scored questionnaire items. For example, if a Likert scale item is coded from 1 to 5, with 1 being "Strongly Agree" and 5 being "Strongly Disagree", and the item is reverse scored, for data analysis purposes you would want to recode the item, with 5 recoded to 1, 4 to 2, 2 to 4, and 1 to 5. Other common uses of the Recode Command are creating categorical variables from continuous variables (as mentioned in footnote 1, rarely if ever recommended) and recoding System Missing values. The Recode Command can also be used to recode character string values to numeric values. With the Recode Command, you have the option of overwriting the existing variable or creating a new recoded variable (the latter will prevent possible data loss).
A handy recent addition to recoding in SPSS is the Automatic Recode option in the transform menu. This will transform a string variable to numeric while automatically labeling the new numeric variable with the values from the old string variable.
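Both kinds of recoding can be sketched in syntax as follows. The variable names item1, item1_r, jobtitle, and jobcode are hypothetical placeholders and do not come from the course datasets:

```spss
* Reverse-score a 1 to 5 item into a new variable.
* ELSE=COPY carries the midpoint and any other values over unchanged.
RECODE item1 (1=5) (2=4) (4=2) (5=1) (ELSE=COPY) INTO item1_r.
EXECUTE.

* Convert a string variable to a labeled numeric variable.
AUTORECODE VARIABLES=jobtitle /INTO jobcode.
```

Recoding INTO a new variable keeps the original item intact. Without the ELSE=COPY clause, any value not listed (here, the midpoint 3) would come out system-missing in the new variable.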
It is important to become familiar with SPSS syntax because programming can be, and usually is, much more time efficient than the GUI interface. One can run and rerun analyses while making slight changes by simply selecting the appropriate text and running it, rather than clicking three or four times just to get to the appropriate dialog box. Secondly, there are many statistical procedures available in SPSS that are not available with the point-and-click approach (e.g. canonical correlation). A final and very practical reason to know some syntax regards communication. It is much easier, for example, to send syntax in an email than to explain all the menus and clicking one would have to do to perform complex or multiple analyses. The person on the receiving end, assuming they have a copy of the data, can simply run the syntax file or copy and paste the syntax and get the exact same output, which is especially important given that SPSS is typically not backwards compatible regarding output files.
There is not enough time in these short courses for an extended discussion of SPSS syntax. However we can give a couple examples to get you started and provide a couple useful bits of code you can use right away that may make things easier during your analyses.
Perhaps the easiest way to start getting into syntax for those more 'menu-inclined' is with the "Paste" button. You'll find this button in any box brought up in the menu system.

By selecting Paste instead of OK, SPSS will create a syntax file with the exact code that would be used to provide the output desired.

At this point one can highlight the text and click the right triangle to get their output. To learn more about the syntax and perhaps about options for analyses that are not available in the menu system, one may peruse SPSS's Command Syntax Reference from the help menu. Over 2000 pages are available to sift through to help you find what you need (not always easy admittedly). Again though, learning some of the nuances of the syntax can be made easier with this approach.
As we went over selecting cases earlier, we'll discuss the Temporary Command as a starting point for syntax. There is a problem with creating the filter variable, and that is that you (or maybe I should say I) will often forget that the filter variable is on when you want to go back to analyzing the entire data set. In addition, although selecting cases via a filter variable is easy to do using the GUI interface, as can be seen above, the syntax is rather unwieldy if you are writing it directly. This is where the Temporary Command can come into play. Temporary lets SPSS know that the Select If statement should only apply to the procedure immediately following. Once it has run this procedure, SPSS goes back to using all cases. This option is not available with the point-and-click approach; it must be coded using SPSS syntax. The syntax for selecting subjects younger than 50 using the Temporary Command is demonstrated below:
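A minimal sketch of that pattern, assuming the Census variables are named age and educ, with FREQUENCIES standing in for whatever procedure you actually want to run:

```spss
* The selection applies only to the first procedure that follows.
TEMPORARY.
SELECT IF (age < 50).
FREQUENCIES VARIABLES=educ.

* This second run uses all cases again.
FREQUENCIES VARIABLES=educ.
```

Be aware that a Select If without Temporary in front of it permanently drops the unselected cases from the active file.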

Using the Do If, Else If, and End If Statements
The Do If-End If structure extends the Select If Command by allowing the user to execute one or more transformations on the same subset of cases based on logical expressions defined by the user. Basically, Do If statements are similar in nature to the Select If Command; the Do If statement tells SPSS that the following procedures are to be done on the subset of cases included in the statement. For example, the syntax below indicates that the following compute statement should be applied only for occupational categories (i.e., occcat80) 5, 6, and 7. The Do If statement is then closed off by an End If statement. This tells SPSS to go back to analyzing all cases. The way in which the Do If-End If structure differs from using the Select If Command is that the Do If-End If structure is best used for executing multiple transformations, whereas select if is more useful when used with a single conditional statement. The Do If-End If structure also differs in that it can be extended by the Else If Command. For example, in the syntax below, the routine is begun with a Do If statement, for which a new variable is computed and given the value zero for occupational categories 5, 6, and 7. However, other requirements are necessary for occupational categories 1 through 4. These requirements can be executed by building Else If statements into the Do If-End If structure. This involves multiplying the salary of the subject by different values (e.g., the salary for subjects in occupational category 1 is multiplied by .1). There is no GUI counterpart for the Do If-End If structure; this is only accessed by coding SPSS syntax.
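The structure described above might look something like the following sketch. The variable occcat80 and the .1 multiplier for category 1 come from the description; the names salary and bonus and the multipliers for categories 2 through 4 are placeholders made up for illustration:

```spss
* Categories 5 to 7 get zero; categories 1 to 4 get a share of salary.
DO IF (occcat80 >= 5 AND occcat80 <= 7).
  COMPUTE bonus = 0.
ELSE IF (occcat80 = 1).
  COMPUTE bonus = salary * .1.
ELSE IF (occcat80 = 2).
  COMPUTE bonus = salary * .2.
ELSE IF (occcat80 = 3).
  COMPUTE bonus = salary * .3.
ELSE IF (occcat80 = 4).
  COMPUTE bonus = salary * .4.
END IF.
EXECUTE.
```

Each case is handled by the first condition it satisfies, and a case that matches none of them is left system-missing on the new variable.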

The Do Repeat-End Repeat structure is very similar to the Do If-End If structure. The difference is that the Do Repeat-End Repeat structure conducts the same transformations on a set of variables. This can greatly reduce the number of commands that you must enter to accomplish the task, as well as the time that it will take you to perform the operations. For example, if a large set of variables needs to be recoded in the same way, instead of using numerous recode commands, a single recode command can be nested within the Do Repeat-End Repeat structure. The Do Repeat-End Repeat structure uses a stand-in variable to represent a list of variables. In subsequent transformations, the name of the stand in variable is used. The following syntax is an example using Do Repeat-End Repeat structure:

In this example, the variables hlth1 through work9 in the data file will be recoded so that values of 1 will be changed to values of 2 and vice versa. The stand-in variable for this command is simply denoted as "X".
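A sketch of syntax matching that description, assuming hlth1 through work9 sit next to each other in the data file so that the TO keyword picks up every variable in between:

```spss
* X stands in for each variable from hlth1 through work9 in turn.
DO REPEAT X = hlth1 TO work9.
  RECODE X (1=2) (2=1).
END REPEAT.
EXECUTE.
```

Because there is no INTO clause, this recode overwrites the original variables, so it is wise to work on a copy of the data.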
While one can do quite a bit with the syntax, the fact is that SPSS as a program lags behind many others in its offerings for academic researchers, and when it can't do something already, people will use macros and scripts to get the job done. Macros use SPSS syntax to create a function to do a very specific job. For example, since the t-test syntax for independent samples above allows for only testing against a largely uninteresting zero difference value, I could maybe write a macro that would perhaps allow me to test whether the difference I see is greater than or less than some more meaningful value based on findings from previous research. I could then use that same macro and just change the data involved. Unfortunately SPSS isn't much of a programming language either, and so even doing some simple things can easily become unwieldy using its syntax. And while you can find plenty of macros on the web, e.g. www.spsstools.net, SPSS itself only provides a couple 'certified' macros and the others you find are not adequately tested nor come with much in the way of a helpfile.
Scripting involves SPSS using other programming languages (Visual Basic, Python, R) to do these sorts of things. My opinion is that if you can use those, you need to move to a different statistical program with a better programming language. For example, it is a bit mind boggling that one would run a script from the R statistical program in SPSS, rather than just stay in the R environment and have everything you'd need there.
Analysis in SPSS can take several forms, and as this is not designed to be a statistics course, demonstration will be limited to providing a feel for the Analyze and Graph menu.
Descriptive Statistics
A starting point for any data analysis entails becoming familiar with the dataset itself. Exploratory Analysis, Initial Data Analysis etc. all refer to the required portion of the research process in which we come to understand the variables involved. One can do this with the SPSS 'Descriptive Statistics' menu.

As one can see, we have a few options. Frequencies are commonly selected, particularly for categorical data. The 'Descriptives' menu offers nothing that can't be found in the 'Explore' menu, and so is useless outside of a less cluttered display and creating standardized variables from those of interest. Plus Explore allows for a breakdown according to a factor variable without needing to use the split file option in the data menu. Example:

Clicking on 'Statistics' brings up the second window above, where one can choose what to add to the output. SPSS provides trimmed means and M-estimators, but considering that robust procedures that would actually use such values are nonexistent in this package, one wonders why they bothered. The 'outliers' option is not too informative in many cases, as it just shows the 5 highest and lowest values, none of which may qualify as a true outlier. The 'Plots' button brings up the third image, where one can implement tests of normality and obtain graphical displays of the distribution of the data. One may right-click on the words or click the help button for a further understanding of the options available. Note also that descriptive output may be available as an option in the inferential analyses one performs using other menus, but may not provide the desired detail.
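In syntax, the Explore menu corresponds to the EXAMINE command. The sketch below assumes age as the variable of interest and sex as the factor, neither of which is necessarily what the figures show:

```spss
* Descriptives, extreme values, and normality plots for age by sex.
EXAMINE VARIABLES=age BY sex
  /PLOT BOXPLOT HISTOGRAM NPPLOT
  /STATISTICS DESCRIPTIVES EXTREME
  /NOTOTAL.

* Descriptives with /SAVE adds a standardized copy of age to the file.
DESCRIPTIVES VARIABLES=age /SAVE.
```

The NPPLOT keyword is what requests the normality plots and tests, EXTREME corresponds to the 'outliers' option, and /NOTOTAL suppresses the output for the combined sample so that only the breakdown by sex is shown.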
Inferential Statistics
Many analyses are available in the Analyze menu from simple correlations to multivariate design and more that are available through syntax. However, one should not let SPSS's options dictate what analyses are performed. SPSS does not provide much in the way of statistical analysis post-1975 and so other packages may be necessary to accomplish one's tasks with more statistical power. But what SPSS may lack in more modern analysis it makes up for in performing the most rudimentary forms of analyses available with relative ease (if you prefer clicking 15 things versus typing one line of code in some packages; I personally find the latter much easier).
For example, with the Cars data one might suspect there is a relationship between the cars' weights and their miles per gallon gasoline consumption. Running the linear regression is rather easy: simply go to Analyze/Regression and select weight as the independent variable and mpg as the dependent variable. Here is the output:

Graphical Display of Data
We would also like to take a look at the relationship graphically. SPSS has always been fairly weak with regard to graphical display relative to other programs, with some of the graphs bordering on laughable (e.g. the default 3d scatterplots). The Graphs menu is easy to use until you want to tweak and tailor the graph to your own liking, in which case you're likely in for a headache (and bugs). One doesn't have a whole lot of control over the initial output, nor can one manipulate it very easily. However, an example is given below of a simple scatterplot from the above examination. A couple of things stick out. One is that there appears to be a curvilinear relationship rather than a strictly linear one (there are actually subgroups in this data with linear relationships of varying degrees), and secondly, one case appears to be an extreme data point (lower left) that will require some action. In this case, the data point is a miskey of some kind (4 cu inch engine?) and has missing data on several of the other variables, and so we would not want to include it in the analysis.
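For those following along in syntax, the regression and scatterplot in this section can be requested roughly as follows. This is a bare-bones sketch using the mpg and weight variables from the Cars data; pasting from the dialog boxes will produce more subcommands than are shown here:

```spss
* Simple linear regression of mpg on weight.
REGRESSION
  /DEPENDENT mpg
  /METHOD=ENTER weight.

* Scatterplot with weight on the horizontal axis and mpg on the vertical.
GRAPH
  /SCATTERPLOT(BIVAR)=weight WITH mpg.
```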

WRAP-UP
So there you have it for the second part of the SPSS short course series. One should by this point be able to dive in and not feel too overwhelmed, though it will still take a bit to get used to so be patient. Always feel free to set up an appointment with the folks at Research and Statistical Support when you get stuck, and we'll be happy to help you out.
Footnotes
1. I would rarely if ever recommend dichotomizing any continuous variable; this is for demonstration only.
Questions, comments and corrections for this site: Rich Herrington, Patrick McLeod, Mike Clark