
Link to the last RSS article here: R for the Windows Platform: Installation and Configuration - Ed.
R Commander: A Simple Windows Interface for R on the Windows Platform
By Dr. Rich Herrington, Research and Statistical Support Services Manager
This month we demonstrate how to use the R library "Rcmdr" (R
Commander), a simple menu system for R on the Windows platform. The following is
an excerpt from the R website, http://www.r-project.org - "R is a language and
environment for statistical computing and graphics. It is a GNU project which is
similar to the S language and environment which was developed at Bell
Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers
and colleagues. R can be considered as a different implementation of
S. There are some important differences, but much code written for S
runs unaltered under R. R provides a wide variety of statistical
(linear and nonlinear modeling, classical statistical tests,
time-series analysis, classification, clustering, ...) and graphical
techniques, and is highly extensible. The S language is often the
vehicle of choice for research in statistical methodology, and R
provides an Open Source route to participation in that activity. One
of R's strengths is the ease with which well-designed
publication-quality plots can be produced, including mathematical
symbols and formulae where needed. Great care has been taken over the
defaults for the minor design choices in graphics, but the user
retains full control. R is available as Free Software under the terms
of the Free Software Foundation's GNU General Public License in source
code form. It compiles and runs
out of the box on a wide variety of UNIX platforms and similar systems
(including FreeBSD and Linux). It also compiles and runs on Windows
9x/NT/2000 and MacOS" (from Introduction).
Starting R in the SDI Mode
To use the Rcmdr library, we first configure the R system
window to run in "single document interface" (SDI) mode. Right-click
the R icon on your desktop and select Properties.
Add " --sdi " to the Target field after the Rgui.exe statement
(see below). Click Apply, then OK. Double-click
the R icon on your desktop to start the R system.

To load a library in R, select Packages from the main menu
bar and select Load Package.

Select the Rcmdr library in
the package selection window. The following menu system will
appear:
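
The same step can also be done from the R console. The lines below are a minimal sketch, assuming the Rcmdr package has already been installed from CRAN; the installation check and the message text are illustrative additions rather than part of the menu procedure described above.

```r
# Check whether Rcmdr is installed before trying to load it.
has_rcmdr <- requireNamespace("Rcmdr", quietly = TRUE)

if (has_rcmdr) {
  # In an interactive session, loading the package opens the R Commander window.
  library(Rcmdr)
} else {
  message("Rcmdr is not installed; install.packages(\"Rcmdr\") fetches it from CRAN.")
}
```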

The Rcmdr or "R Commander" library is a simple menu system based on
Tcl/Tk. You can read more about Rcmdr at
http://www.socsci.mcmaster.ca/jfox/Misc/Rcmdr/index.html .
To import an SPSS data set:

Give the imported data set a working session name (e.g., Survey), then browse the file
system to select your SPSS data set. We will use survey data that
was collected from an undergraduate statistics class.
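
For readers who prefer the console, an SPSS file can usually be read with read.spss() from the "foreign" package. This is only a sketch: the file name "survey.sav" is hypothetical, and the code simply reports a missing file instead of failing.

```r
# Hypothetical file name; replace it with the path to your own SPSS file.
spss_file <- "survey.sav"

Survey <- NULL
if (file.exists(spss_file)) {
  # read.spss() lives in the 'foreign' package; to.data.frame = TRUE
  # returns a data frame instead of a list.
  Survey <- foreign::read.spss(spss_file, to.data.frame = TRUE)
  str(Survey)
} else {
  message("File not found: ", spss_file)
}
```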


The data can be edited or viewed by selecting the
"Edit data set" button or the "View data set" button:

This survey data was collected as part of an undergraduate course in
applied statistics. The survey was administered on the UNT "Zope"
server using the open source package "QSurvey". You can read
more about collecting surveys on this system at:
http://www.unt.edu/rss/class/survey/QSurvey.html
If you are interested in learning how to collect surveys on the UNT Zope
server, you can sign up for the "New Survey Technologies" Short Course;
details at:
http://www.unt.edu/training/shortcrs.htm
The actual survey can be found at:
http://kryton.cc.unt.edu:8080/psy3610_survey/QSurvey1
For example, the first page of questions:

The first step in the modeling process is to set up a model. An
obvious choice might be ANOVA or a Regression Model. Suppose we wish
to predict "current general happiness" from "income", "stress", and
"physical fitness":




Many of the functions in Rcmdr depend on setting up a model first.
To set up a regression model we go to the menu bar and select
"Statistics-Fit models-Linear regression":


Select a response variable: HAPPY, then select the predictor
variables: STRESS, FIT, and SALARY. The regression output will
appear in the R console:
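
The menu selection corresponds roughly to a call to lm() at the console. The sketch below uses simulated stand-in data, not the class survey, so its numbers will not match the output discussed next; the object name RegModel and the simulation settings are assumptions made for illustration.

```r
# Simulated stand-in for the survey data (NOT the real class survey).
set.seed(123)
n <- 40
Survey <- data.frame(
  STRESS = rnorm(n, mean = 5, sd = 2),
  FIT    = rnorm(n, mean = 5, sd = 2),
  SALARY = rnorm(n, mean = 3, sd = 1)
)
Survey$HAPPY <- 4 + 0.7 * Survey$SALARY + rnorm(n)

# Regress HAPPY on the three predictors and print the coefficient table.
RegModel <- lm(HAPPY ~ STRESS + FIT + SALARY, data = Survey)
summary(RegModel)
```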

In the output above, we see that SALARY has a positive association with
general happiness: the unstandardized regression coefficient of .75046 has
a significance value of p=.002. STRESS and FIT do not appear to
be statistically significant (the p values - Pr(>| t |) - are not less
than .05). Any regression analysis should also include a check of the
residual diagnostics and a check for outliers.

The diagnostic plots on the residuals indicate that the residuals are
approximately normally distributed (upper right panel - we want the dots
to fall along a straight line). However, two values are flagged as potentially
influential observations by Cook's distance.
Cases 10 and 15 should be considered for removal:
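
Comparable diagnostics can be requested from the console with plot() and cooks.distance(). This is a sketch on simulated stand-in data (not the class survey), so the cases it flags are arbitrary; which panels plot() draws by default also depends on the R version.

```r
# Simulated stand-in data and model (NOT the real survey).
set.seed(123)
n <- 40
Survey <- data.frame(STRESS = rnorm(n, 5, 2), FIT = rnorm(n, 5, 2),
                     SALARY = rnorm(n, 3, 1))
Survey$HAPPY <- 4 + 0.7 * Survey$SALARY + rnorm(n)
RegModel <- lm(HAPPY ~ STRESS + FIT + SALARY, data = Survey)

# Default diagnostic panels for a fitted lm, arranged in a 2 x 2 layout.
old_par <- par(mfrow = c(2, 2))
plot(RegModel)
par(old_par)

# Cook's distance for every case; list the two largest values.
cooks <- cooks.distance(RegModel)
head(sort(cooks, decreasing = TRUE), 2)
```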

A more informative view of influential observations can be gained by
looking at an influence plot:

The mouse can be used to identify the case numbers that are potential
outliers. After identifying outliers with the interactive
crosshair, click the "Stop" button in the upper left of the graphics window to
return control to the windowing system.


We can remove the data points identified by creating a new data set called
"Survey2". Type the following script in the Rcmdr log window,
highlight it, and click the "Submit" button in the lower left
portion of the Rcmdr window:

This short script removes the cases that are outliers but retains
all other observations. In this case we have removed cases 10
and 15. Next, we attach the "Survey2" data set so that it is the active working
data set:
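
One common way to write such a step is negative row indexing followed by attach(). The sketch below is an assumption about the general approach, not a transcription of the article's script, and it uses a small simulated data frame in place of the survey.

```r
# Simulated stand-in with 20 cases (NOT the real survey).
set.seed(123)
Survey <- data.frame(HAPPY = rnorm(20), SALARY = rnorm(20))

# Negative indices drop rows 10 and 15 and keep all other rows.
Survey2 <- Survey[-c(10, 15), ]

# Put Survey2 on the search path so its variables can be used by name.
attach(Survey2)
```

Passing data = Survey2 to a modeling function is an alternative to attach() that avoids name clashes on the search path.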


Next, we need to set up a new active model. Go to the menu bar
once again, select "Statistics-Fit models-Linear regression", and
select HAPPY as the outcome variable and STRESS, FIT, and SALARY as the
predictor variables. The following output is obtained in the R
Console:

It would appear that even with the two outliers removed, the
relationship between SALARY and HAPPY remains. Scatter-plot matrices
are useful for checking the linearity of the relationships
between the predictor and outcome variables.
 
Select the variables: HAPPY, STRESS, FIT, and SALARY.
Select "Plot by groups" and select GENDER. The "Plot by groups"
option allows one to look for sub-populations in the data. That is, are the
relationships between the pairs of variables different for different
groups?
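
Outside the menus, a grouped scatter-plot matrix can be approximated with base R's pairs(), coloring points by group. The data below are simulated stand-ins, so the picture will not match the survey figure discussed next; the colors and the per-group correlation summary are illustrative choices.

```r
# Simulated stand-in data with a two-level GENDER factor (NOT the real survey).
set.seed(123)
n <- 40
Survey <- data.frame(
  STRESS = rnorm(n, 5, 2), FIT = rnorm(n, 5, 2), SALARY = rnorm(n, 3, 1),
  GENDER = factor(rep(c("Female", "Male"), each = n / 2))
)
Survey$HAPPY <- 4 + 0.7 * Survey$SALARY + rnorm(n)

vars <- c("HAPPY", "STRESS", "FIT", "SALARY")

# 4 x 4 scatter-plot matrix; point color marks the GENDER group.
pairs(Survey[, vars], col = c("red", "blue")[Survey$GENDER], pch = 19)

# Correlation matrix within each group, as a numeric companion to the plot.
group_cors <- lapply(split(Survey[, vars], Survey$GENDER), cor)
group_cors
```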

In the figure above, relationships between pairs of variables are seen
at the intersection of the rows and columns for the 16 panels. For example, the relationship
between STRESS and SALARY seems to be different for men and women.
There is a positive relationship between STRESS and SALARY for women
(increasing levels of stress are associated with increasing levels of
salary), whereas for men there is a negative relationship (increasing
levels of salary are associated with decreasing levels of stress).
However, for both men and women, increasing levels of SALARY are
associated with increasing levels of HAPPY (positive relationship).
Additionally, the relationship between HAPPY and STRESS seems to be
different for men and women. For women, HAPPY does not seem to change as a
function of STRESS. For men, as STRESS increases, HAPPY declines.
On the basis of this new information, we would
want to add GENDER as a predictor to our original model, along with an
interaction term between
GENDER and STRESS. Since GENDER is a categorical predictor, we will
want to choose a linear model that can accommodate continuous and
categorical predictors. Under "Statistics", choose "Fit models -
Linear model":

Label the model as "Model1". Type in the model formula as
depicted below. In R formula notation a tilde separates the response from
the predictors, so the main effects, both continuous and categorical, appear
as HAPPY ~ STRESS+FIT+SALARY+GENDER. Interaction terms are constructed
with the colon (GENDER:STRESS). So the full model is:
HAPPY ~ STRESS+FIT+SALARY+GENDER+GENDER:STRESS
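
At the console, the same model can be fit with lm(), which accepts both continuous predictors and factors. The sketch below uses simulated stand-in data rather than the survey, so the estimates are illustrative only.

```r
# Simulated stand-in data with a two-level GENDER factor (NOT the real survey).
set.seed(123)
n <- 40
Survey <- data.frame(
  STRESS = rnorm(n, 5, 2), FIT = rnorm(n, 5, 2), SALARY = rnorm(n, 3, 1),
  GENDER = factor(rep(c("Female", "Male"), each = n / 2))
)
Survey$HAPPY <- 4 + 0.7 * Survey$SALARY + rnorm(n)

# Four main effects plus the GENDER-by-STRESS interaction (the colon term).
Model1 <- lm(HAPPY ~ STRESS + FIT + SALARY + GENDER + GENDER:STRESS,
             data = Survey)
summary(Model1)
```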


Since we have a categorical predictor in the model, we might want to
display the output as an ANOVA table rather than as regression output.
Click "Models - ANOVA table":


In Type II tests, each term is tested after accounting for all other
terms in the model that do not contain it, so the order of entry of the
predictors does not change the result (it would in sequential, Type I, tests).
After accounting for STRESS, FIT, and GENDER, SALARY is a significant predictor of
HAPPY. After accounting for STRESS, FIT, and SALARY, GENDER is a
significant predictor of HAPPY. After accounting for all four
predictors, the interaction between GENDER and STRESS is not statistically
significant.
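
ANOVA tables for a fitted model can also be requested from the console. In the sketch below, anova() gives the sequential table built into R, and Anova() from the "car" package gives a Type II table; the car call runs only if that package is installed, and the data are simulated stand-ins, so the values will not match the survey results.

```r
# Simulated stand-in data and model (NOT the real survey).
set.seed(123)
n <- 40
Survey <- data.frame(
  STRESS = rnorm(n, 5, 2), FIT = rnorm(n, 5, 2), SALARY = rnorm(n, 3, 1),
  GENDER = factor(rep(c("Female", "Male"), each = n / 2))
)
Survey$HAPPY <- 4 + 0.7 * Survey$SALARY + rnorm(n)
Model1 <- lm(HAPPY ~ STRESS + FIT + SALARY + GENDER + GENDER:STRESS,
             data = Survey)

# Sequential (Type I) table: each term is adjusted for the terms before it.
seq_table <- anova(Model1)
seq_table

# Type II table: each term is adjusted for all other terms not containing it.
if (requireNamespace("car", quietly = TRUE)) {
  print(car::Anova(Model1, type = "II"))
}
```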
Next Time
We will continue our exploration of the R commander window system in
our third and final part in this series.
References
Dalgaard, Peter (2002). Introductory Statistics with R.
Springer: New York.
Fox, John (2003). The R Commander: A Basic-Statistics GUI for R.
http://www.socsci.mcmaster.ca/jfox/Misc/Rcmdr/index.html