Lab specific code. This page is for SPSS or R syntax that directly reflects the datasets used in a lab demonstration. Additions are only made here at the request of the lab instructors, otherwise you can typically tailor the syntax on the data and code page to fit you needs.
|
|
ID=1:5
Score=c(17,24,15,29,16)
mean(Score)
sd(Score)
ScoreData=data.frame(ID, Score)
#Add a variable called “Gender” (0, 0, 1, 1, 0) to the database.
ScoreData$Gender=c(0,0,1,1,0)
attach(ScoreData)
#Recode “Gender” from 0s and 1s to Males and Females.
library(car)
ScoreData$Gender2= recode(Gender, "'0'='Male'; else='Female'",as.factor.result=TRUE)
ScoreData$Score2=ifelse(Gender == 0,Score*.5, Score)
attach(ScoreData)
#To see any object you’ve created so far, type in the name of the object. Call
up the database to see ID, Score, Gender, Gender2, and Score2.
ScoreData
#Plot the means of Score2 and Gender2.
plotMeans(Score2,Gender2,error="conf.int",level=.9,main="this is my plot.")
Graphical part
#after Import of 1991 GSS data (SPSS folder)
attach(GSS1991)
hist(AGE,col="lightblue",border="violetred2")
#Create a barplot of region and mean education level.
barplot(table(REGION,EDUC),beside=T, legend = levels(REGION))
#Reverse region and education on the barplot.
barplot(table(EDUC,REGION),beside=T)
#Create a boxplot of age and region.
boxplot(AGE~REGION, col="lightblue")
#Create a scatterplot of occupational prestige, education, and region.
scatterplot(PRESTG80~EDUC|REGION,reg.line=FALSE,smooth=F)
#Violin plots
library(vioplot)
newGSS=na.omit(data.frame(AGE, REGION)) #na.omit gets rid of missing values
newGSS2=split(newGSS$AGE, newGSS$REGION)
vioplot(newGSS2$West, newGSS2$'South East', newGSS2$'North East', col="tomato",
rectCol="blue")
#Compare boxplot to violin.
par(mfrow=c(2,1)) #after making this graph close it so
subsequent graphs will not be in 2 X 1 format or run par(mfrow=c(1,1))
boxplot(newGSS2, col="papayawhip")
vioplot(newGSS2$West, newGSS2$'South East', newGSS2$'North East', col="tomato",
rectCol="blue")
SPSS
do if (sex=2).
compute happy2=happy*.5.
else if (sex=1).
compute happy2=happy.
end if.
execute.
Goal: get basic descriptive information. See last week to add associated graphs.
SPSS
*Categorical variables require barcharts. For any continuous variable you'll want a histogram. For large datasets and/or with many values, you do not want to get a table of frequencies on continuous variables as that would be unwieldy.
FREQUENCIES
VARIABLES=gender race program
/BARCHART FREQ
/ORDER= ANALYSIS.
FREQUENCIES
VARIABLES=age
/HISTOGRAM
/ORDER= ANALYSIS.
DESCRIPTIVES
VARIABLES=HoursStudyF08 readingClass
/STATISTICS=MEAN STDDEV MIN MAX .
*Example of weighted average, although this is for demonstration, the weights you apply should have serve some theoretical or statistical function i.e. it's not something you'd just play around with to see what happens.
WEIGHT
BY age .
DESCRIPTIVES
VARIABLES=HoursStudyF08 readingClass
/STATISTICS=MEAN STDDEV MIN MAX .
EXAMINE
VARIABLES=HoursStudyF08 readingClass
/PLOT BOXPLOT STEMLEAF HISTOGRAM
/COMPARE GROUP
/MESTIMATORS HUBER(1.339) ANDREW(1.34) HAMPEL(1.7,3.4,8.5) TUKEY(4.685)
/STATISTICS DESCRIPTIVES
/CINTERVAL 95
/MISSING LISTWISE
/NOTOTAL.
R
Depending on the variable you choose and how a dataset is imported you may need to make sure the describe function sees it as numeric, ignores missing values and so forth. as.numeric makes sure the variable is seen as numeric, na.rm=T makes sure that NAs (non-applicable i.e. missing values) are removed (rm) first. For example:
SPSS
Lab Syntax - week of 9.29
*1. descriptives and bar graph for eagle vs. dolphin - this could be done any
number of ways.
FREQUENCIES
VARIABLES=animal
/BARCHART FREQ
/ORDER= ANALYSIS .
*2. social scale standardized, with mean, sd, var, min, and max.
DESCRIPTIVES
VARIABLES=social /SAVE
/STATISTICS=MEAN STDDEV VARIANCE MIN MAX .
*3. z score for social using equation.
COMPUTE Zsocial2 = (social--3.9990)/2.07124 .
EXECUTE .
*4. obtain histograms of social, Zsocial, and Zsocial2, include mean, sd, var.
FREQUENCIES
VARIABLES=social Zsocial Zsocial2 /FORMAT=NOTABLE
/STATISTICS=STDDEV VARIANCE MEAN
/HISTOGRAM NORMAL
/ORDER= ANALYSIS .
*5. obtain resistant descriptives for internet, then run histogram.
EXAMINE
VARIABLES=internet
/PLOT NONE
/MESTIMATORS HUBER(1.339) ANDREW(1.34) HAMPEL(1.7,3.4,8.5) TUKEY(4.685)
/STATISTICS DESCRIPTIVES
/CINTERVAL 95
/MISSING LISTWISE
/NOTOTAL.
*histogram for internet using legacy dialog.
GRAPH
/HISTOGRAM(NORMAL)=internet .
*6. obtain histogram for happiness, predict descriptives using chart builder.
describe(sleep)
hist(sleep)
Syntax: if using menus for descriptives and graph:
Statistics > summaries > numerical summaries, Rcmdr spits out the following.
Gives you an idea of how menu systems (Rcmdr SPSS or otherwise) have to do extra
work to get the same thing you could at via commands on your own.
numSummary(classdata[,"sleep"], statistics=c("mean", "sd"))
In menus, Graphs > histogram leads to the following
Hist(classdata$sleep, scale="frequency", breaks="Sturges", col="darkgray")
*2. standardize
Syntax from menus. Again, this is an Rcmdr specific approach.
.Z <- scale(classdata[,c("economic")])
classdata$Z.economic <- .Z[,1]
remove(.Z)
Alternatively, use this syntax to standardize the z as a
new variable
classdata$Z.economic=scale(economic)
*3. OPTIONAL: use z score equation
describe(economic)
classdata$Z.economic2=(economic--4.14)/2.32
since we've added two variables, reattach the new dataset.
*8. bootstrapping
library(bootstrap)
Take 500 bootstrapped samples of the original stress
variable and calculates the mean each time
bootmean=bootstrap(stress,500,mean)
bootmean
Regarding the output: thetastar are the sampled means, the
values with nulls are additional options that we didn't use, call is just a
reminder of the actual code we typed earlier
hist(bootmean$thetastar)
The above plots the part of the bootmean output called
thetastar, i.e. the means
mean(stress)
mean(bootmean$thetastar)
Compare the bootstrapped distribution's mean with the
original
*The following assumes you have already found the mean and sd via descriptives
attach(classdata)
library(bootstrap)
bootstat=bootstrap(variable,500,mean)
#Change 'variable' appropriately.
bootstat
hist(bootstat$thetastar)
mean(variable)
#'variable' should be the same as above
mean(bootstat$thetastar)
mean(variable)
#Change 'variable' appropriately here and in the
next line
boot.iid(variable, statistic = mean, m=100, main=c("Original
Data, 35 cases","Density of bootstrap estimates"))
SPSS
EXAMINE
VARIABLES=HoursStudyF08 email
/PLOT HISTOGRAM
/STATISTICS DESCRIPTIVES
/CINTERVAL 95
/MISSING LISTWISE
/NOTOTAL.
Change the test value to
your own
T-TEST
/TESTVAL = 5
/MISSING = ANALYSIS
/VARIABLES = email
/CRITERIA = CI(.95) .
T-TEST
/TESTVAL = 3
/MISSING = ANALYSIS
/VARIABLES = HoursStudyF08
/CRITERIA = CI(.95) .
CORRELATIONS
/VARIABLES=HoursStudyF08 BookCostF08 email
/PRINT=TWOTAIL SIG
/STATISTICS DESCRIPTIVES
/MISSING=PAIRWISE .
R
1st t.test example assumes data has been attached.
cor(classdata[,c("BookCostF08","email","HoursStudyF08")], use="complete.obs")
scatterplot.matrix(~BookCostF08+email+HoursStudyF08, reg.line=lm, smooth=TRUE,
span=0.5, diagonal = 'density', data=classdata)
SPSS
Uses tips data.
R
1. Basic Model
attach(workpub)
RegModel.1 <- lm(num_pubs~work_eth, data=workpub)
summary(RegModel.1)
range(work_eth)
2. The following does it like Rcmdr, but you can get the graphs by themselves by just plot(RegModel.1) and cycling through them.
oldpar <- par(oma=c(0,0,3,0), mfrow=c(2,2))
plot(RegModel.1)
par(oldpar)
influencePlot(RegModel.1)
3. Requires library(lmtest)
resettest(num_pubs ~ work_eth, power=2:3, type=”regressor”, data=workpub)
bptest(num_pubs ~ work_eth, varformula = ~ fitted.values (RegModel.1), studentize=FALSE, data=workpub)
dwtest (num_pubs ~ work_eth, alternative=”two.sided”, data=workpub)
shapiro.test(RegModel.1$residuals)
1. Independent samples t-test
t.test(cooperation~condition, alternative='two.sided', conf.level=.95, var.equal=FALSE, data=Guyer)
2. Dependent samples t-test
t.test(States$SATM, States$SATV, alternative='two.sided', conf.level=.95, paired=TRUE) #For some reason, when doing paired t-tests the data set specification is not an option
Optionals:
ss.power.R2(Population.R2 = .10,
alpha.level = .05, desired.power = .80, p = 1)
power.t.test(power = .80, delta = .5, sig.level = .05)
SPSS SYNTAX It is hoped to be ready by lab but there were some 17.0 compatibility issues. Download.
Histograms
* Chart
Builder.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=simplicity MISSING=LISTWISE
REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: simplicity=col(source(s), name("simplicity"))
GUIDE: axis(dim(1), label("simplicity"))
GUIDE: axis(dim(2), label("Frequency"))
ELEMENT: interval(position(summary.count(bin.rect(simplicity))),
shape.interior(shape.square))
END GPL.
* Chart Builder.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=fatalism MISSING=LISTWISE
REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: fatalism=col(source(s), name("fatalism"))
GUIDE: axis(dim(1), label("fatalism"))
GUIDE: axis(dim(2), label("Frequency"))
ELEMENT: interval(position(summary.count(bin.rect(fatalism))),
shape.interior(shape.square))
END GPL.
* Chart Builder.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=depression MISSING=LISTWISE
REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: depression=col(source(s), name("depression"))
GUIDE: axis(dim(1), label("depression"))
GUIDE: axis(dim(2), label("Frequency"))
ELEMENT: interval(position(summary.count(bin.rect(depression))),
shape.interior(shape.square))
END GPL.
Scatterplot
* Chart
Builder.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=simplicity fatalism depression
MISSING=LISTWISE
REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: simplicity=col(source(s), name("simplicity"))
DATA: fatalism=col(source(s), name("fatalism"))
DATA: depression=col(source(s), name("depression"))
GUIDE: axis(dim(1.1), ticks(null()))
GUIDE: axis(dim(2.1), ticks(null()))
GUIDE: axis(dim(1), gap(0px))
GUIDE: axis(dim(2), gap(0px))
TRANS: simplicity_label = eval("simplicity")
TRANS: fatalism_label = eval("fatalism")
TRANS: depression_label = eval("depression")
ELEMENT: point(position((simplicity/simplicity_label+fatalism/fatalism_label+
depression/depression_label)*(simplicity/simplicity_label+fatalism/fatalism_label+
depression/depression_label)))
END GPL.
Correlations
CORRELATIONS
/VARIABLES=simplicity fatalism depression
/PRINT=TWOTAIL SIG
/MISSING=PAIRWISE.
NONPAR CORR
/VARIABLES=simplicity fatalism depression
/PRINT=SPEARMAN TWOTAIL SIG
/MISSING=PAIRWISE.
R code download
Packages to install: Rcmdr, simpleboot, aplpack, MASS(VR), HH
Bootstrap Review
x=rnorm(1000)
mean(x)
sd(x)
library(simpleboot)
b=one.boot(x, mean, R=100)
hist(b,do.rug=T)
Robust Correlation The following assumes the work ethic and publications has been imported
library(aplpack)
attach(workethic) #workethic needs to be whatever you named the data upon import
bagplot(WORK_ETH,NUM_PUBS)
library(MASS)
cor(workethic)
cov.rob(workethic, cor=T,method="mcd")$cor
Regression
mymodel = lm(NUM_PUBS~WORK_ETH) #mymodel is an arbitrary name
summary(mymodel)
This alternative approach will automatically spit out the summary
(summary(lm(NUM_PUBS~WORK_ETH)))
Obtain predicted values
mypredvalues=predict(mymodel, interval = "prediction")
mypredvalues
library(HH)
ci.plot(mymodel)