Lab-specific code.  This page is for SPSS or R syntax that directly reflects the datasets used in a lab demonstration.  Additions are only made here at the request of the lab instructors; otherwise you can typically tailor the syntax on the data and code page to fit your needs.

Week of 2-9
From now on, just right-click, save, and open within the program.
2-16: R, SPSS  Regression
2-23: R, SPSS  Assumptions  Data: Charity, Depression
3-2: R, SPSS   Data: Mireault
3-9: R, SPSS   Data: Ginzberg, Sobel test data
3-30: R, SPSS Data: Available via instructor.
4-6: R, SPSS  Data: Anorexia
4-13: R, SPSS  Data
4-20: R, SPSS  Data: Recall1, Recall2, Marriage, Ice
4-27: R, SPSS  Data: Stress, Smoking, Anorexia
Week of 9-8
Week of 9-15
Week of 9-22
Week of 9-29
Week of 10-6
Week of 10-13
Week of 10-20 used no new code
Week of 10-27
Week of 11-3
Week of 11-10
Week of 11-17

 

Week of 9-8

What it requires:
Create 3 variables in Variable View (e.g. ID number, age, gender), making at least one of them categorical.
Create a new variable, Depression. Pretend the values are the subjects’ scores on a depression measure. Enter their scores in Data View, with values ranging from 1 through 10.
Create a new variable, Anxiety. Enter scores with values ranging from 1 through 10.
 
What you will do:
Recode into a new, categorical variable that indicates a cut score of 5 to differentiate between “depressed” and “not depressed.”
Combine the anxiety scores with depression scores to come up with a total depression and anxiety score (depanx) for each subject.
Sort data by ‘depression’ variable.
Split the dataset by gender, so that the analyses you run will be conducted separately for males and females.
 
RECODE
depression (lowest thru 5=0) (6 thru highest=1) into sad.
variable labels sad 'sadcategory'.
execute.

COMPUTE depanx=depression + anxiety.
execute.
 
SORT cases by depression(a).
 
SORT cases by gender.
SPLIT FILE separate by gender.
descriptives variables=depression
/statistics=mean stddev min max.
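
For students working in R, here is a minimal sketch of the same four steps. It assumes a data frame called scores with depression, anxiety, and gender columns; the names are illustrative, not from the lab dataset.

scores$sad = ifelse(scores$depression <= 5, 0, 1)       #recode: cut score of 5 (0 = not depressed, 1 = depressed)
scores$depanx = scores$depression + scores$anxiety      #compute the combined depression + anxiety score
scores = scores[order(scores$depression), ]             #sort cases by depression (ascending)
tapply(scores$depression, scores$gender, mean)          #like SPLIT FILE: mean depression separately by gender
tapply(scores$depression, scores$gender, sd)            #and the standard deviations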

 

Week of 9-15

What it requires:
Creating a dataset
 
What you will do:
Recode gender from numeric to a factor variable
If-else statements in both SPSS and R
Get simple graphical displays of data in R: histogram, barplot, boxplot, scatterplot, and violin plot

 

ID=1:5
Score=c(17,24,15,29,16)
mean(Score)
sd(Score)
ScoreData=data.frame(ID, Score)

#Add a variable called “Gender” (0, 0, 1, 1, 0) to the database.
ScoreData$Gender=c(0,0,1,1,0)
attach(ScoreData)

#Recode “Gender” from 0s and 1s to Males and Females.
library(car)
ScoreData$Gender2= recode(Gender, "'0'='Male'; else='Female'",as.factor.result=TRUE)

ScoreData$Score2=ifelse(Gender == 0,Score*.5, Score)
attach(ScoreData)
#To see any object you’ve created so far, type in the name of the object. Call up the database to see ID, Score, Gender, Gender2, and Score2.

ScoreData

#Plot the means of Score2 by Gender2. plotMeans comes from the Rcmdr package, so load Rcmdr first if it isn't already loaded.
plotMeans(Score2,Gender2,error="conf.int",level=.9,main="this is my plot.")


Graphical part

#after importing the 1991 GSS data (SPSS folder)
attach(GSS1991)
hist(AGE,col="lightblue",border="violetred2")

#Create a barplot of education level counts, broken down by region.
barplot(table(REGION,EDUC),beside=T, legend = levels(REGION))
#Reverse region and education on the barplot.
barplot(table(EDUC,REGION),beside=T)

#Create a boxplot of age and region.
boxplot(AGE~REGION, col="lightblue")

#Create a scatterplot of occupational prestige against education, separately by region (scatterplot comes from the car package, loaded earlier).
scatterplot(PRESTG80~EDUC|REGION,reg.line=FALSE,smooth=F)

#Violin plots
library(vioplot)
newGSS=na.omit(data.frame(AGE, REGION)) #na.omit gets rid of missing values
newGSS2=split(newGSS$AGE, newGSS$REGION)
vioplot(newGSS2$West, newGSS2$'South East', newGSS2$'North East', col="tomato", rectCol="blue")

#Compare boxplot to violin.
par(mfrow=c(2,1))     #2 rows, 1 column of plots; after making this graph, close it (or run par(mfrow=c(1,1))) so subsequent graphs are not drawn in the 2 x 1 format
boxplot(newGSS2, col="papayawhip")
vioplot(newGSS2$West, newGSS2$'South East', newGSS2$'North East', col="tomato", rectCol="blue")

SPSS

do if (sex=2).
compute happy2=happy*.5.
else if (sex=1).
compute happy2=happy.
end if.
execute.



Week of 9-22

Goal: get basic descriptive information.  See last week to add associated graphs.

SPSS

*Categorical variables require barcharts.  For any continuous variable you'll want a histogram.  For large datasets and/or variables with many values, you do not want a table of frequencies for continuous variables, as it would be unwieldy.

FREQUENCIES
VARIABLES=gender race program   
/BARCHART FREQ
/ORDER= ANALYSIS.

FREQUENCIES
VARIABLES=age
/HISTOGRAM
/ORDER= ANALYSIS.

DESCRIPTIVES
VARIABLES=HoursStudyF08 readingClass
/STATISTICS=MEAN STDDEV MIN MAX .

*Example of a weighted average. Although this is for demonstration, the weights you apply should serve some theoretical or statistical function, i.e. it's not something you'd just play around with to see what happens.

WEIGHT
BY age .
DESCRIPTIVES
VARIABLES=HoursStudyF08 readingClass
/STATISTICS=MEAN STDDEV MIN MAX .
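
For reference, the R counterpart is a single call to a base function. A minimal sketch; it assumes the class dataset is attached, and the same caveat about choosing weights applies.

weighted.mean(HoursStudyF08, w = age, na.rm = TRUE)   #hours of study, weighted by age
mean(HoursStudyF08, na.rm = TRUE)                     #unweighted mean for comparison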

EXAMINE
VARIABLES=HoursStudyF08 readingClass
/PLOT BOXPLOT STEMLEAF HISTOGRAM
/COMPARE GROUP
/MESTIMATORS HUBER(1.339) ANDREW(1.34) HAMPEL(1.7,3.4,8.5) TUKEY(4.685)
/STATISTICS DESCRIPTIVES
/CINTERVAL 95
/MISSING LISTWISE
/NOTOTAL.

R

attach(Dataset)   #make sure to match this to whatever name you gave the data upon importing it. The following assumes the class dataset has been imported.
library(psych)
describe(readingClass, trim=.20) #20% trim (off both sides of the distribution).  Feel free to play around with it.  Don't forget to match the case of the variable name.  Compare, for example, the mean, the median, and the trimmed mean.
hist(readingClass)  #Because of the more extreme scores, the mean would suggest almost a full hour more of reading per week by the typical student.

Depending on the variable you choose and how the dataset is imported, you may need to make sure the describe function sees it as numeric, ignores missing values, and so forth. as.numeric makes sure the variable is treated as numeric; na.rm=T makes sure that NAs (not available, i.e. missing values) are removed (rm) first.  For example:

describe(as.numeric(varname1), tr=.2, na.rm=TRUE)
 
library(robustbase)
huberM(readingClass)  
 
#mu is Huber's M, an example of an M-estimator.  It also returns 's', which is roughly the median absolute deviation (compare to the describe function above), and the number of iterations it took to come up with the estimate (not something for us to worry about; it arrives at the final estimate after a sequence of 'guesses' based on some complex math).
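
To line these up yourself, here is a quick sketch; it assumes readingClass is still attached from above.

median(readingClass, na.rm=TRUE)   #compare to huberM's mu
mad(readingClass, na.rm=TRUE)      #median absolute deviation, roughly huberM's s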

Week of 9-29

SPSS Lab Syntax - week of 9.29

*1. descriptives and bar graph for eagle vs. dolphin - this could be done any number of ways.

FREQUENCIES
VARIABLES=animal
/BARCHART FREQ
/ORDER= ANALYSIS .

*2. social scale standardized, with mean, sd, var, min, and max.

DESCRIPTIVES
VARIABLES=social /SAVE
/STATISTICS=MEAN STDDEV VARIANCE MIN MAX .

*3. z score for social using the equation (the double minus below subtracts the negative mean, -3.9990).

COMPUTE Zsocial2 = (social--3.9990)/2.07124 .
EXECUTE .

*4. obtain histograms of social, Zsocial, and Zsocial2, include mean, sd, var.

FREQUENCIES
VARIABLES=social Zsocial Zsocial2 /FORMAT=NOTABLE
/STATISTICS=STDDEV VARIANCE MEAN
/HISTOGRAM NORMAL
/ORDER= ANALYSIS .

*5. obtain resistant descriptives for internet, then run histogram.

EXAMINE
VARIABLES=internet
/PLOT NONE
/MESTIMATORS HUBER(1.339) ANDREW(1.34) HAMPEL(1.7,3.4,8.5) TUKEY(4.685)
/STATISTICS DESCRIPTIVES
/CINTERVAL 95
/MISSING LISTWISE
/NOTOTAL.

*histogram for internet using legacy dialog.

GRAPH
/HISTOGRAM(NORMAL)=internet .

*6. obtain histogram for happiness, predict descriptives using chart builder.

GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=happiness MISSING=LISTWISE
REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: happiness=col(source(s), name("happiness"))
GUIDE: axis(dim(1), label("General happiness over last month: Higher = ",
"more happy"))
GUIDE: axis(dim(2), label("Frequency"))
ELEMENT: interval(position(summary.count(bin.rect(happiness))) ,
shape.interior(shape.square))
END GPL.


R Lab Syntax - week of 9.29

*1. obtain descriptives and graph for sleep
library(psych)       #give psych a second to load if need be
attach(classdata)    #Note the warning, one of R's base datasets is called sleep. If you ever use an object with the name of another object it will let you know.

describe(sleep)
hist(sleep)

Syntax if using the menus for descriptives and a graph: Statistics > Summaries > Numerical summaries.  Rcmdr spits out the following, which gives you an idea of how menu systems (Rcmdr, SPSS, or otherwise) have to do extra work to get the same thing you could get via commands on your own.

numSummary(classdata[,"sleep"], statistics=c("mean", "sd"))

In menus, Graphs > histogram leads to the following

Hist(classdata$sleep, scale="frequency", breaks="Sturges", col="darkgray")

*2. standardize
Syntax from the menus. Again, this is an Rcmdr-specific approach.

.Z <- scale(classdata[,c("economic")])
classdata$Z.economic <- .Z[,1]
remove(.Z)

Alternatively, use this syntax to save the standardized scores as a new variable

classdata$Z.economic=scale(economic)

*3. OPTIONAL: use the z score equation (the double minus below subtracts the negative mean, -4.14)
describe(economic)
classdata$Z.economic2=(economic--4.14)/2.32

Since we've added two variables, reattach the updated dataset.

attach(classdata)

*4. histograms
hist(economic)
hist(Z.economic)

*5. resistant descriptives
describe(stress, trim=.2)
hist(stress)

*6. histogram, descriptives
hist(places)
describe(places,trim=.2)

*7. sampling from normal distribution
x=rnorm(10,50,10)
hist(x)
mean(x)
x=rnorm(50,50,10)
hist(x)
mean(x)
x=rnorm(100,50,10)
hist(x)
mean(x)
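
To compare the three sample sizes on one screen, the par trick from the week of 9-15 works here as well. A sketch; each call draws a fresh sample, so your histograms will differ.

par(mfrow=c(3,1))                      #3 rows, 1 column of plots
hist(rnorm(10,50,10), main="n = 10")
hist(rnorm(50,50,10), main="n = 50")
hist(rnorm(100,50,10), main="n = 100")
par(mfrow=c(1,1))                      #reset to a single plot per screen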

*8. bootstrapping
library(bootstrap)

Take 500 bootstrapped samples of the original stress variable and calculate the mean each time

bootmean=bootstrap(stress,500,mean)

bootmean

Regarding the output: thetastar holds the resampled means; the entries showing NULL are additional options that we didn't use; call is just a reminder of the actual code we typed earlier

hist(bootmean$thetastar)

The above plots the part of the bootmean output called thetastar, i.e. the means

mean(stress)
mean(bootmean$thetastar)

Compare the bootstrapped distribution's mean with the original
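
If you also want a rough interval estimate, the percentile method just reads quantiles off thetastar. A sketch, using the bootmean object created above:

quantile(bootmean$thetastar, c(.025, .975))   #middle 95% of the bootstrapped means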


Week of 10-6

SPSS

*The following assumes you have already found the mean and SD via DESCRIPTIVES; replace Mean and SD below with those values.

COMPUTE happrob =Cdf.NORMAL(happiness,Mean,SD).   
EXECUTE.
 
R
attach(classdata)   #change classdata if necessary to your dataset name
 
#Probability of getting a certain score or greater/less
pnorm(5,mean(happiness),sd(happiness),lower.tail=T)
pnorm(5,mean(happiness),sd(happiness),lower.tail=F)
pnorm(happiness,mean(happiness),sd(happiness))
 
#Find score at a certain percentile
qnorm(.80,mean(OnCampus),sd(OnCampus),lower.tail=T)
qnorm(.10,mean(OnCampus),sd(OnCampus),lower.tail=T)
qnorm(.50,mean(OnCampus),sd(OnCampus),lower.tail=T)
 
library(psych)
describe(OnCampus)
 
#Create a sampling distribution of a statistic
library(bootstrap)
bootstat=bootstrap(variable,500,statistic)   #Change 'variable' appropriately. 'statistic' needs to be replaced with mean, median, sd, or mad
bootstat
hist(bootstat$thetastar)
statistic(variable)      #depending on which you chose earlier, 'statistic' and 'variable' should be the same as above
mean(bootstat$thetastar)     #If you looked at a resistant measure initially, change this to median for comparison.
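
For instance, filling in the template with the OnCampus variable from above and the MAD as the resistant statistic. A sketch; if OnCampus has missing values, run it through na.omit first.

bootmad=bootstrap(OnCampus,500,mad)
hist(bootmad$thetastar)
mad(OnCampus)                  #MAD of the original scores
median(bootmad$thetastar)      #center of the bootstrapped MADs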
 
#See the bootstrap in action
library(animation)
data=rnorm(50, 50,10) #create some normal data; in this case 50 T scores
mean(data) #for comparison
boot.iid(data, statistic = mean, m=100, main=c("Original Data, 50 cases","Density of bootstrap estimates"))
 
It uses a sunflower plot: the ‘petals’ in the top graph of the original data represent how many times each specific score was sampled.
 

Week of 10-13

attach(classdata)

library(bootstrap)
bootstat=bootstrap(variable,500,mean)  #Change 'variable' appropriately.
bootstat
hist(bootstat$thetastar)
mean(variable)      #'variable' should be the same as above
mean(bootstat$thetastar)

library(animation)   #boot.iid below comes from the animation package
mean(variable) #Change 'variable' appropriately here and in the next line
boot.iid(variable, statistic = mean, m=100, main=c("Original Data, 35 cases","Density of bootstrap estimates"))


Week of 10-27

SPSS

EXAMINE
VARIABLES=HoursStudyF08 email
/PLOT HISTOGRAM
/STATISTICS DESCRIPTIVES
/CINTERVAL 95
/MISSING LISTWISE
/NOTOTAL.

Change the test value to your own
T-TEST
/TESTVAL = 5
/MISSING = ANALYSIS
/VARIABLES = email
/CRITERIA = CI(.95) .

T-TEST
/TESTVAL = 3
/MISSING = ANALYSIS
/VARIABLES = HoursStudyF08
/CRITERIA = CI(.95) .

CORRELATIONS
/VARIABLES=HoursStudyF08 BookCostF08 email
/PRINT=TWOTAIL SIG
/STATISTICS DESCRIPTIVES
/MISSING=PAIRWISE .

R

The first t.test example assumes the data have been attached.

t.test(email, alternative='greater', mu=5.0, conf.level=.95)

library(MASS)
library(psych)
samp1=mvrnorm(n =30, 50, 100, empirical = T)   #30 scores, mean 50, variance 100 (sd 10)
describe(samp1)
samp2=mvrnorm(n =30, 55, 100, empirical = T)   #same n and variance, higher mean
describe(samp2)
samp3=mvrnorm(n =30, 50, 225, empirical = T)   #same n and mean, larger variance (sd 15)
describe(samp3)
samp4=mvrnorm(n =60, 50, 100, empirical = T)   #same mean and variance, larger n
describe(samp4)


t.test(samp1, mu=45)
t.test(samp2, mu=45)
t.test(samp3, mu=45)
t.test(samp4, mu=45)

cor(classdata[,c("BookCostF08","email","HoursStudyF08")], use="complete.obs")

scatterplot.matrix(~BookCostF08+email+HoursStudyF08, reg.line=lm, smooth=TRUE, span=0.5, diagonal = 'density', data=classdata)


Week of 11-3

SPSS

Uses tips data.

temporary.
*pick your day: 1=FRI, 2=SAT, 3=SUN, 4=THUR.
select if (DAY=##).
T-TEST
/TESTVAL = 3       
/VARIABLES = TIP
/CRITERIA = CI(.95).
 
CORRELATIONS
/VARIABLES= TOTBILL SIZE WITH TIP
/STATISTICS DESCRIPTIVES
/MISSING=PAIRWISE.
 
REGRESSION
/DESCRIPTIVES MEAN STDDEV CORR SIG N
/STATISTICS COEFF OUTS R ANOVA
/DEPENDENT TIP
/METHOD=ENTER TOTBILL.
 
Extra
GRAPH
/SCATTERPLOT(BIVAR)=TIP WITH TOTBILL BY SMOKER
/MISSING=LISTWISE .

R

Assumes the dataset is named tips and has been attached
mean(TIP)
myday <- subset(tips, DAY=="?")  #the name of the object (it does not have to be myday), choose the day to put in the quotes: one of thur fri sat sun
t.test(datasetname$TIP, alternative='two.sided', mu=3, conf.level=.95)  #change datasetname accordingly (e.g. to myday as it was called above), and feel free to do a one-sided test.
cor(tips[,c("TOTBILL","SIZE")], TIP, use="pairwise.complete.obs")
mymodel <- lm(TIP~TOTBILL, data=tips)    #data=tips not required if attached.
summary(mymodel)
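 
If you also want interval estimates for the intercept and slope, to go along with the CI(.95) requested in the SPSS runs, base R's confint works on the fitted model (a sketch):

confint(mymodel, level=.95)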
 
Extra
scatterplot.matrix(~SIZE+TIP+TOTBILL | SMOKER, data=tips)
 

Week of 11-10

1. Basic Model

 

attach(workpub)

RegModel.1 <- lm(num_pubs~work_eth, data=workpub)

summary(RegModel.1)

range(work_eth)

 

2. The following mimics Rcmdr's diagnostic output, but you can get the graphs one at a time by running plot(RegModel.1) and cycling through them.

oldpar <- par(oma=c(0,0,3,0), mfrow=c(2,2))

plot(RegModel.1)

par(oldpar)

 

influencePlot(RegModel.1)   #influencePlot comes from the car package

 

3. Requires library(lmtest)

library(lmtest)   #resettest, bptest, and dwtest all come from lmtest

resettest(num_pubs ~ work_eth, power=2:3, type="regressor", data=workpub)

bptest(num_pubs ~ work_eth, varformula = ~ fitted.values(RegModel.1), studentize=FALSE, data=workpub)

dwtest(num_pubs ~ work_eth, alternative="two.sided", data=workpub)

shapiro.test(RegModel.1$residuals)

 

1. Independent samples t-test

t.test(cooperation~condition, alternative='two.sided', conf.level=.95, var.equal=FALSE, data=Guyer)

2. Dependent samples t-test

t.test(States$SATM, States$SATV, alternative='two.sided', conf.level=.95, paired=TRUE)  #For some reason, when doing paired t-tests the data set specification is not an option
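
As a check on the logic, a paired t-test is the same as a one-sample test on the difference scores. A quick sketch with the same States data:

t.test(States$SATM - States$SATV, mu=0, alternative='two.sided', conf.level=.95)   #matches the paired test above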


Week of 11-17

1. Input your R or R-squared and other model specifics.  K refers to the number of predictors.
attach(Davis)
cor(repht,height,use="complete")
mymodel=lm(repht~height)
summary(mymodel)
 
library(MBESS)
ci.R(R=.74, N=183, K=1, conf.level=.95)
ci.R2(R2=.55, N=183, K=1, conf.level=.95)
 
2. Two ways to calculate Cohen's d (i.e. standardized mean difference).
library(psych)
describe.by(height,sex)   #in newer versions of the psych package this function is called describeBy

smd(Mean.1=163.74, Mean.2=178.01, s.1=11.64, s.2=6.44, n.1=112, n.2=88)
smd(Davis$height[sex=="F"], Davis$height[sex=="M"])
 
Now, once you have it, get the interval estimate of it.
ci.smd(smd=-1.47, n.1=112, n.2=88, conf.level=.95)
 
Take out the outlier. Notice the increase in Cohen's d.  This is because the variance is 1/4 of its original value when that case is removed.
Fem = Davis$height[sex=="F"]
Fem2 = Fem[c(1:3,5:112)]
smd(Fem2, Davis$height[sex=="M"])
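
Negative indexing does the same thing in one step (equivalent to the line above, since Fem has 112 values):

Fem2 = Fem[-4]                        #drop the 4th case, the outlier
smd(Fem2, Davis$height[sex=="M"])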
 

Optional:

ss.power.R2(Population.R2 = .10, alpha.level = .05, desired.power = .80, p = 1)   #from MBESS; p is the number of predictors
power.t.test(power = .80, delta = .5, sig.level = .05)   #from base R (stats); delta is the difference in means (in SD units here, since sd defaults to 1)

 


2-9

SPSS SYNTAX   We hope to have it ready by lab, but there were some SPSS 17.0 compatibility issues. Download.

Histograms

* Chart Builder.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=simplicity MISSING=LISTWISE REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: simplicity=col(source(s), name("simplicity"))
GUIDE: axis(dim(1), label("simplicity"))
GUIDE: axis(dim(2), label("Frequency"))
ELEMENT: interval(position(summary.count(bin.rect(simplicity))), shape.interior(shape.square))
END GPL.

* Chart Builder.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=fatalism MISSING=LISTWISE REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: fatalism=col(source(s), name("fatalism"))
GUIDE: axis(dim(1), label("fatalism"))
GUIDE: axis(dim(2), label("Frequency"))
ELEMENT: interval(position(summary.count(bin.rect(fatalism))), shape.interior(shape.square))
END GPL.

* Chart Builder.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=depression MISSING=LISTWISE REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: depression=col(source(s), name("depression"))
GUIDE: axis(dim(1), label("depression"))
GUIDE: axis(dim(2), label("Frequency"))
ELEMENT: interval(position(summary.count(bin.rect(depression))), shape.interior(shape.square))
END GPL.

Scatterplot

* Chart Builder.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=simplicity fatalism depression MISSING=LISTWISE
REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: simplicity=col(source(s), name("simplicity"))
DATA: fatalism=col(source(s), name("fatalism"))
DATA: depression=col(source(s), name("depression"))
GUIDE: axis(dim(1.1), ticks(null()))
GUIDE: axis(dim(2.1), ticks(null()))
GUIDE: axis(dim(1), gap(0px))
GUIDE: axis(dim(2), gap(0px))
TRANS: simplicity_label = eval("simplicity")
TRANS: fatalism_label = eval("fatalism")
TRANS: depression_label = eval("depression")
ELEMENT: point(position((simplicity/simplicity_label+fatalism/fatalism_label+
depression/depression_label)*(simplicity/simplicity_label+fatalism/fatalism_label+
depression/depression_label)))
END GPL.
 

Correlations

CORRELATIONS
/VARIABLES=simplicity fatalism depression
/PRINT=TWOTAIL SIG
/MISSING=PAIRWISE.
NONPAR CORR
/VARIABLES=simplicity fatalism depression
/PRINT=SPEARMAN TWOTAIL SIG
/MISSING=PAIRWISE.

R code  download

Packages to install: Rcmdr, simpleboot, aplpack, MASS(VR), HH

Bootstrap Review

x=rnorm(1000)

mean(x)

sd(x)

 

library(simpleboot)

b=one.boot(x, mean, R=100)

hist(b,do.rug=T)
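
The same idea with more resamples; the histogram of the bootstrapped means smooths out as R grows (a sketch using only the functions above).

b2=one.boot(x, mean, R=1000)

hist(b2,do.rug=T)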

 

Robust Correlation   The following assumes the work ethic and publications data have been imported

library(aplpack)
attach(workethic) #workethic needs to be whatever you named the data upon import
bagplot(WORK_ETH,NUM_PUBS)

library(MASS)
cor(workethic)
cov.rob(workethic, cor=T,method="mcd")$cor
 

Regression

mymodel = lm(NUM_PUBS~WORK_ETH) #mymodel is an arbitrary name

summary(mymodel)

 

This alternative approach will automatically spit out the summary

(summary(lm(NUM_PUBS~WORK_ETH)))

 

Obtain predicted values

mypredvalues=predict(mymodel, interval = "prediction")

mypredvalues
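
For contrast, intervals for the mean response (confidence intervals rather than prediction intervals for individual cases) come from the same function (a sketch):

myconfvalues=predict(mymodel, interval = "confidence")

myconfvalues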

library(HH)

ci.plot(mymodel)