SPSS syntax

Entering Data

*Comma-delimited data. Compared to SAS, the two are very similar. The (A15) tells SPSS that make is a string variable (with a length of 15).
DATA LIST LIST
/ make (A15) mpg weight price .
BEGIN DATA.
"AMC Concord",22,2930,4099
"AMC Pacer",17,3350,4749
"AMC Spirit",22,2640,3799
"Buick Century",20, 3250,4816
"Buick Electra",15,4080,7827
END DATA.

LIST.

**Another example of creating a dataset from the syntax. Not recommended if you have a good spreadsheet.

DATA LIST FIXED/
make 1-17 (A) price 19-23 mpg 25-26 rep78 28 hdroom 30-32
trunk 34-35 weight 37-40 length 42-44 turn 46-47
displ 49-51 gratio 53-56 foreign 58 .
BEGIN DATA.
AMC Concord 4099 3 2.5 11 2930 186 40 121 3.58 0
AMC Pacer 4749 3 3.0 11 3350 173 40 258 2.53 0
AMC Spirit 3799 3.0 12 2640 168 35 121 3.08 0
Audi 5000 9690 17 5 3.0 15 2830 189 37 131 3.20 1
Audi Fox 6295 23 3 2.5 11 2070 174 36 97 3.70 1
BMW 320i 9735 25 4 2.5 12 2650 177 34 121 3.64 1
Buick Century 4816 20 3 4.5 16 3250 196 40 196 2.93 0
Buick Electra 7827 15 4 4.0 20 4080 222 43 350 2.41 0
Buick LeSabre 5788 18 3 4.0 21 3670 218 43 231 2.73 0
Buick Opel 4453 26 3.0 10 2230 170 34 304 2.87 0
Buick Regal 5189 20 3 2.0 16 3280 200 42 196 2.93 0
Buick Riviera 10372 16 3 3.5 17 3880 207 43 231 2.93 0
Buick Skylark 4082 19 3 3.5 13 3400 200 42 231 3.08 0
Cad. Deville 11385 14 3 4.0 20 4330 221 44 425 2.28 0
Cad. Eldorado 14500 14 2 3.5 16 3900 204 43 350 2.19 0
Cad. Seville 15906 21 3 3.0 13 4290 204 45 350 2.24 0
Chev. Chevette 3299 29 3 2.5 9 2110 163 34 231 2.93 0
Chev. Impala 5705 16 4 4.0 20 3690 212 43 250 2.56 0
Chev. Malibu 4504 22 3 3.5 17 3180 193 31 200 2.73 0
Chev. Monte Carlo 5104 22 2 2.0 16 3220 200 41 200 2.73 0
Chev. Monza 3667 24 2 2.0 7 2750 179 40 151 2.73 0
Chev. Nova 3955 19 3 3.5 13 3430 197 43 250 2.56 0
Datsun 200 6229 23 4 1.5 6 2370 170 35 119 3.89 1
Datsun 210 4589 35 5 2.0 8 2020 165 32 85 3.70 1
Datsun 510 5079 24 4 2.5 8 2280 170 34 119 3.54 1
Datsun 810 8129 21 4 2.5 8 2750 184 38 146 3.55 1
Dodge Colt 3984 30 5 2.0 8 2120 163 35 98 3.54 0
Dodge Diplomat 4010 18 2 4.0 17 3600 206 46 318 2.47 0
Dodge Magnum 5886 16 2 4.0 17 3600 206 46 318 2.47 0
Dodge St. Regis 6342 17 2 4.5 21 3740 220 46 225 2.94 0
Fiat Strada 4296 21 3 2.5 16 2130 161 36 105 3.37 1
Ford Fiesta 4389 28 4 1.5 9 1800 147 33 98 3.15 0
Ford Mustang 4187 21 3 2.0 10 2650 179 43 140 3.08 0
Honda Accord 5799 25 5 3.0 10 2240 172 36 107 3.05 1
Honda Civic 4499 28 4 2.5 5 1760 149 34 91 3.30 1
Linc. Continental 11497 12 3 3.5 22 4840 233 51 400 2.47 0
Linc. Mark V 13594 12 3 2.5 18 4720 230 48 400 2.47 0
Linc. Versailles 13466 14 3 3.5 15 3830 201 41 302 2.47 0
Mazda GLC 3995 30 4 3.5 11 1980 154 33 86 3.73 1
Merc. Bobcat 3829 22 4 3.0 9 2580 169 39 140 2.73 0
Merc. Cougar 5379 14 4 3.5 16 4060 221 48 302 2.75 0
Merc. Marquis 6165 15 3 3.5 23 3720 212 44 302 2.26 0
Merc. Monarch 4516 18 3 3.0 15 3370 198 41 250 2.43 0
Merc. XR-7 6303 14 4 3.0 16 4130 217 45 302 2.75 0
Merc. Zephyr 3291 20 3 3.5 17 2830 195 43 140 3.08 0
Olds 98 8814 21 4 4.0 20 4060 220 43 350 2.41 0
Olds Cutl Supr 5172 19 3 2.0 16 3310 198 42 231 2.93 0
Olds Cutlass 4733 19 3 4.5 16 3300 198 42 231 2.93 0
Olds Delta 88 4890 18 4 4.0 20 3690 218 42 231 2.73 0
Olds Omega 4181 19 3 4.5 14 3370 200 43 231 3.08 0
Olds Starfire 4195 24 1 2.0 10 2730 180 40 151 2.73 0
Olds Toronado 10371 16 3 3.5 17 4030 206 43 350 2.41 0
Peugeot 604 12990 14 3.5 14 3420 192 38 163 3.58 1
Plym. Arrow 4647 28 3 2.0 11 3260 170 37 156 3.05 0
Plym. Champ 4425 34 5 2.5 11 1800 157 37 86 2.97 0
Plym. Horizon 4482 25 3 4.0 17 2200 165 36 105 3.37 0
Plym. Sapporo 6486 26 1.5 8 2520 182 38 119 3.54 0
Plym. Volare 4060 18 2 5.0 16 3330 201 44 225 3.23 0
Pont. Catalina 5798 18 4 4.0 20 3700 214 42 231 2.73 0
Pont. Firebird 4934 18 1 1.5 7 3470 198 42 231 3.08 0
Pont. Grand Prix 5222 19 3 2.0 16 3210 201 45 231 2.93 0
Pont. Le Mans 4723 19 3 3.5 17 3200 199 40 231 2.93 0
Pont. Phoenix 4424 19 3.5 13 3420 203 43 231 3.08 0
Pont. Sunbird 4172 24 2 2.0 7 2690 179 41 151 2.73 0
Renault Le Car 3895 26 3 3.0 10 1830 142 34 79 3.72 1
Subaru 3798 35 5 2.5 11 2050 164 36 97 3.81 1
Toyota Celica 5899 18 5 2.5 14 2410 174 36 134 3.06 1
Toyota Corolla 3748 31 5 3.0 9 2200 165 35 97 3.21 1
Toyota Corona 5719 18 5 2.0 11 2670 175 36 134 3.05 1
Volvo 260 11995 17 5 2.5 14 3170 193 37 163 2.98 1
VW Dasher 7140 23 4 2.5 12 2160 172 36 97 3.74 1
VW Diesel 5397 41 5 3.0 15 2040 155 35 90 3.78 1
VW Rabbit 4697 25 4 3.0 15 1930 155 35 89 3.78 1
VW Scirocco 6850 25 4 2.0 16 1990 156 36 97 3.78 1
END DATA.
FORMATS hdroom (F3.1) gratio (F4.2) .


*Space-delimited data is pretty much the same.

DATA LIST LIST
/ make (A15) mpg weight price .
BEGIN DATA.
"AMC Concord" 22 2930 4099
"AMC Pacer" 17 3350 4749
"AMC Spirit" 22 2640 3799
"Buick Century" 20 3250 4816
"Buick Electra" 15 4080 7827
END DATA.

LIST.

*Just save the data from the file menu.
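*Saving can also be done in syntax; a minimal sketch, with an assumed file path you would replace with your own.

SAVE OUTFILE='C:\Users\mike\Desktop\data\cars.sav'.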


Importing Data

*SPSS pretty much always gets data import wrong in terms of the measurement levels (nominal, ordinal, scale) unless you specifically tell it
what the variable types are via syntax. For example, the SAS import below, which should work as is, creates an empty data set because SPSS
thinks the variables are all nominal, when in fact only one of them is.


*Import an Excel File

GET DATA
/TYPE=XLS
/FILE='C:\Documents and Settings\mjc0016\Desktop\fitness.xls'
/SHEET=name 'Sheet1'
/CELLRANGE=full
/READNAMES=on
/ASSUMEDSTRWIDTH=32767.
DATASET NAME DataSet1 WINDOW=FRONT.

*Import a SAS dataset
GET
SAS DATA='C:\Documents and Settings\mjc0016\Desktop\5700\Code\carsdata\cars2.sas7bdat'.
DATASET NAME DataSet2 WINDOW=FRONT.

*Import R

*As SPSS now has an R add-on, there 'shouldn't' be any issue with just doing things as if you were in R, although you'd be
using SPSS's crappy syntax editor instead. Otherwise, R data files can be exported as text files and imported in that fashion.


Manipulating data

Sort

*Sort cases. The (A) means ascending.

SORT CASES BY
Varname (A) .
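*You can sort by multiple variables and mix directions; (D) means descending. The variable names here are hypothetical.

SORT CASES BY
Gender (A) Age (D) .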

Split

*This will produce single tables with separate output for males and females.

SORT CASES BY Gender .
SPLIT FILE
LAYERED BY Gender .

Merge
*Adding cases. The following assumes you are adding datasets that look the same.

*From an external file


ADD FILES /FILE=*
/FILE='C:\Users\mike\Desktop\data\dataset2.sav'.
EXECUTE.

*From a dataset that's already open. The number in Dataset will depend on how many datasets you've opened in that session.

ADD FILES /FILE=*
/FILE='DataSet2'.
EXECUTE.

*Adding variables. Change the word ADD to MATCH in the code above.
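*A sketch of adding variables with MATCH FILES. Note that you will typically also want a BY subcommand with a key variable, and both files must be sorted by it; 'id' here is an assumed key name.

MATCH FILES /FILE=*
/FILE='C:\Users\mike\Desktop\data\dataset2.sav'
/BY id.
EXECUTE.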


*This will produce separate output tables for each gender.
SORT CASES BY Gender .
SPLIT FILE
SEPARATE BY Gender .
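*Split file stays on until you explicitly turn it off, so remember to do so when you're done.

SPLIT FILE OFF.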

Using Subsets of the data
*Select if. As you can see, this is horrendous for such a simple process, because it actually has to create a needless filter variable.


USE ALL.
COMPUTE filter_$=(Gender=1).
VARIABLE LABEL filter_$ 'Gender=1 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMAT filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE .
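*When you're done with the selection, turn the filter off and restore all cases.

FILTER OFF.
USE ALL.
EXECUTE.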

*Easier and better, since the selection turns itself off after the next procedure runs. The key here is the TEMPORARY command.

temporary.
select if (gender=1).
DESCRIPTIVES
VARIABLES=age.

Recoding Variables

*The following creates a new variable Gender2 that simply reverses the values.

RECODE
Gender
(1=0) (0=1) INTO Gender2 .
EXECUTE .

*Recode from a numeric into a string variable

STRING Gender2 (A8) .
RECODE
Gender
(0='Male') (1='Female') INTO Gender2 .
EXECUTE .

*Autorecode: let SPSS decide the new coding scheme. This is probably only useful for turning multiple-choice responses into a numeric variable.

AUTORECODE
VARIABLES=Gender /INTO Gender2
/PRINT.
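*Recoding ranges also works, via the LO, HI and THRU keywords. A hypothetical example collapsing age into three groups:

RECODE
Age
(LO THRU 17=1) (18 THRU 64=2) (65 THRU HI=3) INTO AgeGrp .
EXECUTE .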


Computing New Variables
**Example of summing two different ways. This is also a good example of one of the 'quirks' of SPSS. Method 1 (the SUM function) will sum
regardless of missing values, i.e. every case gets a score as long as it has a value on at least one of the variables. Method 2 (the + operator)
will only sum for complete cases.

*Method1

COMPUTE Newvar = SUM(Var1,Var2) .
EXECUTE .

*Method2

COMPUTE Newvar = var1+var2 .
EXECUTE .
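*A middle ground: SUM.n requires at least n valid values before it will compute a score. With SUM.2 here, a case must have both values, so this behaves like Method 2 while still using the function form.

COMPUTE Newvar = SUM.2(Var1,Var2) .
EXECUTE .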


Frequencies

*This uses the University of Florida data in the SPSS folder, and adds a barchart. You cannot do anything about the look of the bar chart from here; instead you must take the much more time-consuming route of clicking your way through the graph itself and changing everything that's there to how you'd like it. Compare R, where one straightforward line of code can produce a perfectly tailored graph complete with titles and legend.
 

FREQUENCIES VARIABLES=college
/BARCHART FREQ
/ORDER=ANALYSIS.
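*FREQUENCIES can at least tack on some summary statistics via a STATISTICS subcommand; for a nominal variable like this, the mode is about all that makes sense.

FREQUENCIES VARIABLES=college
/STATISTICS=MODE
/ORDER=ANALYSIS.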

*The FREQUENCIES procedure above is extremely limited. To go further we'll need the CROSSTABS procedure. The TABLES subcommand specifies what you want on the rows BY the variable you want on the columns. This barchart produces something like what you see with the R code, but again with extremely limited functionality; indeed, all you can do is what you see below. To reverse the display, e.g. to see a bar chart of gender counts for each college, you'd have to rerun the entire procedure with 'college BY gender', and you'd still have to waste time pointing and clicking through the color specifications, titles, etc.

CROSSTABS
/TABLES=gender BY college
/FORMAT=AVALUE TABLES
/CELLS=COUNT
/COUNT ROUND CELL
/BARCHART.


Getting Measures of Central Tendency and Variability

*Getting basic measures of central tendency and variability. SPSS is sorely lacking in comparison to both R and SAS with
respect to both options and flexibility. Not recommended for anything 'robust'.


*Create a variable "X" of 100 cases with mean 100 sd 15. Very similar to SAS, and like SAS extremely clunky compared to R.

NEW FILE.
INPUT PROGRAM.
LOOP #I=1 TO 100.
COMPUTE X = RV.NORMAL(100,15).
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXECUTE.

*While the descriptives procedure is 'prettier' in my opinion, it has only the most rudimentary offerings.
DESCRIPTIVES VARIABLES=X
/STATISTICS=MEAN STDDEV.
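*Roughly the full extent of what DESCRIPTIVES offers, for comparison:

DESCRIPTIVES VARIABLES=X
/STATISTICS=MEAN SEMEAN STDDEV VARIANCE RANGE MIN MAX SKEWNESS KURTOSIS.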

*EXAMINE, which is for some reason called "Explore" in the menus, offers more but no tweaking (e.g. it gives a 5% trimmed mean,
which I've never seen recommended, and which you can't change).


EXAMINE VARIABLES=X
/PLOT NONE
/MESTIMATORS HUBER(1.339) ANDREW(1.34) HAMPEL(1.7,3.4,8.5) TUKEY(4.685)
/STATISTICS DESCRIPTIVES
/CINTERVAL 95
/MISSING LISTWISE
/NOTOTAL.
 


Summarizing Data

**Compute a new variable with 100 cases sampling from a normal distribution of mean 100 and standard deviation of 15
NEW FILE.
INPUT PROGRAM.
LOOP #I=1 TO 100.
COMPUTE newvar = RV.NORMAL(100,15).
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXECUTE.

**Create a new variable with the same mean and standard deviation. Its N is set by the size of the current dataset (defined as 100 above).
COMPUTE newvar2 = RV.NORMAL(100,15).
EXECUTE.

**Check the mean and sd of both. Note how they are not exactly 100 or 15 respectively, but in the ballpark at least.

DESCRIPTIVES VARIABLES=newvar newvar2
/STATISTICS=MEAN STDDEV MIN MAX.

**Get histograms with a normal curve plotted on them. NOTABLE suppresses the frequency table, which we don't want since each value occurs only once.

FREQUENCIES VARIABLES=newvar newvar2
/FORMAT=NOTABLE
/HISTOGRAM NORMAL
/ORDER=ANALYSIS.


**A simple paired-samples t-test. Given the random data, there should be no correlation and no statistically significant outcome,
**though there is a low probability one might occur by chance. Unlike other programs, SPSS can only test against a zero difference,
**which is typically a very boring and uninformative test, and technically inappropriate if prior research indicates there is a non-zero difference.


T-TEST PAIRS=newvar WITH newvar2 (PAIRED)
/CRITERIA=CI(.95)
/MISSING=ANALYSIS.
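*A workaround for testing against a non-zero difference: compute the difference yourself and run a one-sample test against the hypothesized value (the 5 here is an arbitrary example).

COMPUTE diff = newvar - newvar2.
EXECUTE.
T-TEST
/TESTVAL = 5
/VARIABLES = diff
/CRITERIA = CI(.95).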


GOING FURTHER
#1. Create two new variables but with different means, same sd, same means different sd or whatever you like, again plotting the histograms.
Key concepts: normal distribution, central tendency, variance. Related: homogeneity of variance assumption, hypothesis testing of mean differences

#2. Assuming normal distributions as we had above, a null hypothesized value of 0 difference, and a typical cutoff of p=.05 for statistical
significance, what is the probability that even with such random data we might, just by chance, obtain a
"statistically significant" difference between the means of the two samples of data?
Key concepts: hypothesis testing of mean differences, observed p-value, alpha, random sampling.
 


t-tests

The T-TEST procedure is largely useless, I find. It offers nothing over the menu settings, which already suck. Only in the one-sample case can you specify a specific null hypothesis value, there is no way to get an effect size, and one-sided tests are not possible (you have to halve the two-sided p-value yourself).

One-sample case
T-TEST
/TESTVAL = value       
/MISSING = ANALYSIS
/VARIABLES = varname
/CRITERIA = CI(value) .
 
Two independent samples

T-TEST
GROUPS = groupvarname(0 1)
/MISSING = ANALYSIS
/VARIABLES = var1 var2 var3
/CRITERIA = CI(value) .

*The group values always have to be specified, even if the factor has only 2 groups in total.
*List whichever variables you want a t-test for; this makes doing multiple t-tests easy.

Two dependent samples

T-TEST
PAIRS = var1 WITH var2 (PAIRED)  
/CRITERIA = CI(value)
/MISSING = ANALYSIS.

*This can actually be modified to make it easy to do e.g. multiple paired t-tests.

T-TEST PAIRS=TEACHER CONSTRUC MANAGER.
T-TEST PAIRS=TEACHER MANAGER WITH CONSTRUC ENGINEER.
T-TEST PAIRS=TEACHER MANAGER WITH CONSTRUC ENGINEER (PAIRED).

The first T-TEST compares TEACHER with CONSTRUC, TEACHER with MANAGER, and CONSTRUC with MANAGER.
The second T-TEST compares TEACHER with CONSTRUC, TEACHER with ENGINEER, MANAGER with CONSTRUC, and MANAGER with ENGINEER. TEACHER is not compared with MANAGER, and CONSTRUC is not compared with ENGINEER.
The third T-TEST compares TEACHER with CONSTRUC and MANAGER with ENGINEER.


Correlation

Not much here that you don't have in the menus, except that the correlation matrix information can be written out to a dataset of its own, e.g. with /MATRIX OUT(*). However, one handy option is the ability to use WITH, so that you could e.g. display correlations of predictors with the DV as opposed to every single correlation possible (see the second example).

CORRELATIONS
/VARIABLES=var1 var2 var3
/PRINT=TWOTAIL SIG
/STATISTICS DESCRIPTIVES
/MISSING=PAIRWISE .

CORRELATIONS
/VARIABLES= var1 var2 WITH var3
/PRINT=TWOTAIL SIG
/STATISTICS DESCRIPTIVES
/MISSING=PAIRWISE .
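*Writing the correlation matrix out to the active dataset, as mentioned above, looks like this.

CORRELATIONS
/VARIABLES=var1 var2 var3
/MATRIX=OUT(*).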

Simple regression

The following is about all you can do to begin on regression in SPSS. Along with basic descriptive and model output, residuals and predicted values are saved, as are outlier values for each case, confidence intervals for the mean predicted value, and prediction intervals for specific values, i.e. what is plotted in the IGRAPH. For assumption inspection you get: a histogram of the residuals, a graph of predicted vs. residual values, normality of the residuals (via EXAMINE), and the Durbin-Watson statistic. So for your assumptions that's 1 statistical test (normality of residuals), 2 graphs (regarding normality and homoscedasticity), and 1 statistic with no p-value (regarding the independence assumption) before having to go to macros etc. to adequately test the rest. And if you actually want to do something about violated assumptions? Good luck.

REGRESSION
/DESCRIPTIVES MEAN STDDEV CORR SIG N
/MISSING LISTWISE
/STATISTICS COEFF OUTS CI R ANOVA
/CRITERIA=PIN(.05) POUT(.10) CIN(95)
/NOORIGIN
/DEPENDENT DVName
/METHOD=ENTER PredictorName
/SCATTERPLOT=(*ZRESID ,*ZPRED )
/RESIDUALS DURBIN HIST(ZRESID)
/SAVE PRED SEPRED MAHAL COOK LEVER MCIN ICIN RESID ZRESID SDRESID .
 

EXAMINE
VARIABLES=RES_1
/PLOT NPPLOT
/STATISTICS NONE
/CINTERVAL 95
/MISSING LISTWISE
/NOTOTAL.

IGRAPH /VIEWNAME='Scatterplot' /X1 = VAR(PredictorName) TYPE = SCALE /Y = VAR(DVName)
TYPE = SCALE /COORDINATE = VERTICAL /FITLINE METHOD = REGRESSION LINEAR
INTERVAL(95.0) = MEAN INDIVIDUAL LINE = TOTAL SPIKE=OFF /X1LENGTH=3.0
/YLENGTH=3.0 /X2LENGTH=3.0 /CHARTLOOK='NONE' /SCATTER COINCIDENT = NONE.
EXE.