Research and Statistical Support - University of North Texas

RSS Matters

Link to the last RSS article here: R for the Windows Platform: Installation and Configuration - Ed.

R Commander: A Simple Windows Interface for R on the Windows Platform

By Dr. Rich Herrington, Research and Statistical Support Services Manager

This month we demonstrate how to use the R library "Rcmdr" (R Commander), a simple menu system for R on the Windows platform.  The following is an excerpt from the Introduction on the R website, http://www.r-project.org:

"R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.  R provides a wide variety of statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity. One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control. R is available as Free Software under the terms of the Free Software Foundation's GNU General Public License in source code form. It compiles and runs out of the box on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux). It also compiles and runs on Windows 9x/NT/2000 and MacOS."

Starting R in the SDI Mode

To use the Rcmdr library, we will want to configure the R system window to run in "single document interface" (SDI) mode.  Right-click the R icon on your desktop and select Properties.  Add " --sdi" to the Target field after the Rgui.exe statement (see below).  Click Apply, then OK.  Double-click the R icon on your desktop to start the R system.

R system window configuration example

To load a library in R, select Packages from the main menu bar, then select Load Package:

 R console example

Select the Rcmdr library in the package selection window.  The following menu system will appear:

R menu  example
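The same step can also be done from the R console. The following is a minimal sketch, assuming the Rcmdr package is already installed; `start_commander` is a hypothetical helper name used only for illustration and is not part of Rcmdr:

```r
# Hypothetical helper: load Rcmdr from the console instead of the Packages menu.
start_commander <- function() {
  if (!requireNamespace("Rcmdr", quietly = TRUE)) {
    stop("Rcmdr is not installed; try install.packages(\"Rcmdr\") first.")
  }
  # Attaching the package is what opens the R Commander window.
  library(Rcmdr)
}

# start_commander()   # uncomment to launch the menu system
```

Wrapping the call in a function simply keeps the window from opening until you ask for it; typing library(Rcmdr) directly at the prompt should have the same effect.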

The Rcmdr or "R Commander" library is a simple menu system based on Tcl/Tk.  You can read more about Rcmdr at http://www.socsci.mcmaster.ca/jfox/Misc/Rcmdr/index.html .

To import an SPSS data set:

R import example

Give the imported data set a working session name (e.g., Survey), then browse the file system to select your SPSS data set.  We will use survey data that was collected from an undergraduate statistics class.

R import example
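For those who want to script the import, the sketch below reads an SPSS file with read.spss() from the "foreign" package, which ships with standard R distributions. The helper name `import_spss` and the file name "survey.sav" are assumptions made for illustration; the command that Rcmdr itself generates may differ in its options.

```r
library(foreign)  # provides read.spss()

# Hypothetical helper: read an SPSS .sav file into a data frame.
import_spss <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  # use.value.labels = TRUE turns labelled SPSS values into R factors;
  # to.data.frame = TRUE returns a data frame rather than a list.
  read.spss(path, use.value.labels = TRUE, to.data.frame = TRUE)
}

# Survey <- import_spss("survey.sav")   # hypothetical file name
```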

R Commander screen

The data can be edited or viewed by selecting the "Edit data set" button or the "View data set" button:

Editing/Viewing data example

This survey data was collected as part of an undergraduate course in applied statistics.  The survey was administered on the UNT "Zope" server using the open source package "QSurvey".  You can read more about collecting surveys on this system at: http://www.unt.edu/rss/class/survey/QSurvey.html .  If you are interested in learning how to collect surveys on the UNT Zope server, you can sign up for the "New Survey Technologies" Short Course; details are at: http://www.unt.edu/training/shortcrs.htm .  The actual survey can be found at: http://kryton.cc.unt.edu:8080/psy3610_survey/QSurvey1 .  For example, the first page of questions:

QSurvey example

The first step in the modeling process is to set up a model.  An obvious choice might be ANOVA or a Regression Model.  Suppose we wish to predict "current general happiness" from "income", "stress", and "physical fitness":

QSurvey example

QSurvey example

QSurvey example

QSurvey example

Many of the functions in Rcmdr depend on setting up a model first.  To set up a regression model we go to the menu bar and select "Statistics-Fit models-Linear regression":

R Commander screen

Linear Regression screen

Select a response variable:  HAPPY, then select the predictor variables:  STRESS, FIT, and SALARY.  The regression output will appear in the R console:

Regression output in the R Console

In the output above, we see that SALARY has a positive association with general happiness:  the unstandardized coefficient of .75046 has a significance value of p = .002.  STRESS and FIT do not appear to be statistically significant (their p values, Pr(>|t|), are not less than .05).  Part of conducting a regression analysis should be checking residual diagnostics and checking for outliers.
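The "Linear regression" menu selection corresponds roughly to a call to lm() at the console. The sketch below uses simulated stand-in data, because the class survey file is not reproduced here. The variable names follow the article, but the values, the sample size, and the object name RegModel are assumptions, so the coefficients will not match the output discussed above.

```r
# Simulated stand-in for the survey data (NOT the real class survey).
set.seed(42)
n <- 40
Survey <- data.frame(
  STRESS = rnorm(n, mean = 5, sd = 2),
  FIT    = rnorm(n, mean = 5, sd = 2),
  SALARY = rnorm(n, mean = 5, sd = 2)
)
Survey$HAPPY <- 2 + 0.7 * Survey$SALARY + rnorm(n)

# Response on the left of the tilde, predictors on the right.
RegModel <- lm(HAPPY ~ STRESS + FIT + SALARY, data = Survey)

# Coefficient table with estimates, standard errors, t values, and Pr(>|t|).
summary(RegModel)
```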

The diagnostic plots of the residuals indicate that the residuals are approximately normally distributed (upper right panel; we want the points to fall along a straight line).  However, two values are flagged as potentially influential observations by Cook's distance.  Cases 10 and 15 should be considered for removal:

diagnostic plots
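The same diagnostics can be requested from the console. This is a sketch on simulated stand-in data (not the class survey), so the flagged cases will differ from those in the figure. The 4/n cutoff used here is one common rule of thumb and an assumption on our part, not necessarily the criterion behind the plot.

```r
# Simulated stand-in for the survey data (NOT the real class survey).
set.seed(42)
n <- 40
Survey <- data.frame(
  STRESS = rnorm(n, mean = 5, sd = 2),
  FIT    = rnorm(n, mean = 5, sd = 2),
  SALARY = rnorm(n, mean = 5, sd = 2)
)
Survey$HAPPY <- 2 + 0.7 * Survey$SALARY + rnorm(n)
fit <- lm(HAPPY ~ STRESS + FIT + SALARY, data = Survey)

# Cook's distance for every case; larger values mean more influence.
cooks <- cooks.distance(fit)

# Cases above the (assumed) 4/n rule-of-thumb cutoff.
flagged <- which(cooks > 4 / n)
print(flagged)

# The standard four-panel diagnostic display, drawn only in an interactive session.
if (interactive()) {
  op <- par(mfrow = c(2, 2))
  plot(fit)
  par(op)
}
```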

A more informative view of influential observations can be gained by looking at an influence plot:

choosing an influence plot

The mouse can be used to identify the case numbers that are potential outliers.  After identifying outliers with the interactive crosshair, click the "Stop" button in the upper left of the graphics window to return control to the windowing system.

using your mouse

Graphics screen
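The quantities that an influence plot typically combines (leverage, studentized residuals, and Cook's distance) can also be tabulated with base R functions. The sketch below again uses simulated stand-in data rather than the class survey, so the case numbers it lists are illustrative only.

```r
# Simulated stand-in for the survey data (NOT the real class survey).
set.seed(42)
n <- 40
Survey <- data.frame(
  STRESS = rnorm(n, mean = 5, sd = 2),
  FIT    = rnorm(n, mean = 5, sd = 2),
  SALARY = rnorm(n, mean = 5, sd = 2)
)
Survey$HAPPY <- 2 + 0.7 * Survey$SALARY + rnorm(n)
fit <- lm(HAPPY ~ STRESS + FIT + SALARY, data = Survey)

# One row per case: leverage, studentized residual, Cook's distance.
infl <- data.frame(
  hat     = hatvalues(fit),
  student = rstudent(fit),
  cook    = cooks.distance(fit)
)

# The three cases with the largest Cook's distance, by row name.
head(infl[order(-infl$cook), ], 3)
```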

We can remove the identified data points by creating a new data set called "Survey2".  Type the following script in the Rcmdr log window, highlight it, and click the "Submit" button in the lower left portion of the Rcmdr window:

R Commander window

This short script removes those cases that are outliers, but retains all other observations.  In this case we have removed case numbers 10 and 15.  Next, we attach the "Survey2" data set so that it is the active working data set:

R Commander window

selecting a data set
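The script itself is visible only in the screenshot. One way such a script might be written is with negative row indexing, shown here on simulated stand-in data. This is our sketch, not necessarily the exact script used in the session.

```r
# Simulated stand-in for the survey data (NOT the real class survey).
set.seed(42)
n <- 40
Survey <- data.frame(
  STRESS = rnorm(n, mean = 5, sd = 2),
  FIT    = rnorm(n, mean = 5, sd = 2),
  SALARY = rnorm(n, mean = 5, sd = 2)
)
Survey$HAPPY <- 2 + 0.7 * Survey$SALARY + rnorm(n)

# A negative index drops rows 10 and 15 and keeps every other row and all columns.
Survey2 <- Survey[-c(10, 15), ]

nrow(Survey)    # original number of cases
nrow(Survey2)   # two fewer cases
```

Note that the remaining rows keep their original row names, so case 11 in "Survey" is still labelled 11 in "Survey2".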

Next, we need to set up a new active model.  Go to the menu bar once again and select "Statistics-Fit models-Linear regression", then select HAPPY as the outcome variable and STRESS, FIT, and SALARY as the predictor variables.  The following output is obtained in the R Console:

R Console output

It would appear that, even with the two outliers removed, the relationship between SALARY and HAPPY remains.  Scatter-plot matrices are usually useful for checking the linearity of the relationships between the predictor and outcome variables.

Choosing Scatter Plot screen

Scatter Plot choices

Select the variables:  HAPPY, STRESS, FIT, and SALARY.  Select "Plot by groups" and select GENDER.  The "Plot by groups" option allows one to look for sub-populations in the data.  That is, are the relationships between the pairs of variables different for different groups?

Scatter Plots
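A rough console equivalent of a grouped scatter-plot matrix is sketched below with the base pairs() function, together with within-group correlations as a numerical companion to the picture. The data are simulated stand-ins, and the GENDER levels "female" and "male" are assumed labels, so the patterns will not match the survey results.

```r
# Simulated stand-in for the survey data (NOT the real class survey).
set.seed(42)
n <- 40
Survey2 <- data.frame(
  STRESS = rnorm(n, mean = 5, sd = 2),
  FIT    = rnorm(n, mean = 5, sd = 2),
  SALARY = rnorm(n, mean = 5, sd = 2),
  GENDER = factor(rep(c("female", "male"), each = n / 2))  # assumed labels
)
Survey2$HAPPY <- 2 + 0.7 * Survey2$SALARY + rnorm(n)

# Scatter-plot matrix with one colour per group (interactive sessions only).
if (interactive()) {
  pairs(Survey2[, c("HAPPY", "STRESS", "FIT", "SALARY")],
        col = as.integer(Survey2$GENDER))
}

# Correlation between STRESS and SALARY computed separately for each group.
group_cor <- sapply(split(Survey2, Survey2$GENDER),
                    function(g) cor(g$STRESS, g$SALARY))
print(group_cor)
```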

In the figure above, relationships between pairs of variables are seen at the intersection of the rows and columns of the 16 panels.  For example, the relationship between STRESS and SALARY seems to be different for men and women.  There is a positive relationship between STRESS and SALARY for women (increasing levels of stress are associated with increasing levels of salary), whereas for men there is a negative relationship (increasing levels of salary are associated with decreasing levels of stress).  However, for both men and women, increasing levels of SALARY are associated with increasing levels of HAPPY (a positive relationship).  Additionally, the relationship between HAPPY and STRESS seems to be different for men and women.  For women, HAPPY does not seem to change as a function of STRESS.  For men, as STRESS increases, HAPPY declines.

On the basis of this new information, we would want to add GENDER as a predictor to our original model, along with an interaction term between GENDER and STRESS.  Since GENDER is a categorical predictor, we will want to choose a linear model that can accommodate both continuous and categorical predictors.  Under "Statistics", choose "Fit models - Linear model":

R Commander screen

Label the model as "Model1".  Type in the model formula as depicted below.  R writes a model formula with a tilde (~) between the response and the predictors.  Main effects, both continuous and categorical, are joined with plus signs:  HAPPY ~ STRESS + FIT + SALARY + GENDER.  Interaction terms are constructed with the colon (GENDER:STRESS).  So the full model is:  HAPPY ~ STRESS + FIT + SALARY + GENDER + GENDER:STRESS

Linear Model Choices

Output to R Console
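At the console, the same model can be fit with lm(), which handles continuous and categorical predictors in one formula. The sketch below uses simulated stand-in data with assumed GENDER levels, so its estimates are illustrative only.

```r
# Simulated stand-in for the survey data (NOT the real class survey).
set.seed(42)
n <- 40
Survey2 <- data.frame(
  STRESS = rnorm(n, mean = 5, sd = 2),
  FIT    = rnorm(n, mean = 5, sd = 2),
  SALARY = rnorm(n, mean = 5, sd = 2),
  GENDER = factor(rep(c("female", "male"), each = n / 2))  # assumed labels
)
Survey2$HAPPY <- 2 + 0.7 * Survey2$SALARY + rnorm(n)

# Main effects joined by "+", the interaction written with ":".
Model1 <- lm(HAPPY ~ STRESS + FIT + SALARY + GENDER + GENDER:STRESS,
             data = Survey2)

# GENDER is a factor, so R creates a dummy variable for its second level
# and a separate STRESS slope adjustment for that level.
summary(Model1)
```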

Since we have a categorical predictor in the model, we might want to display the output as an ANOVA table rather than as regression output.  Click "Models - ANOVA table":

Choosing ANOVA

Anova output

In Type II tests, the order of entry of the predictors does not matter:  each term is tested after accounting for all other terms in the model that do not contain it.  After accounting for STRESS, FIT, and GENDER, SALARY is a significant predictor of HAPPY.  After accounting for STRESS, FIT, and SALARY, GENDER is a significant predictor of HAPPY.  After accounting for all four predictors, the interaction between GENDER and STRESS is not statistically significant.
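Whether order of entry matters depends on the type of test. Sequential (Type I) sums of squares, as produced by the base anova() function, change with the order of the terms, whereas Type II tests adjust each term for the others. The sketch below illustrates the difference on simulated data with two deliberately correlated predictors. The data and object names are assumptions, and the optional last step uses Anova() from the "car" package only if it is installed.

```r
# Simulated data with correlated predictors (NOT the real class survey).
set.seed(42)
n <- 40
d <- data.frame(STRESS = rnorm(n, mean = 5, sd = 2))
d$SALARY <- 0.6 * d$STRESS + rnorm(n, mean = 5, sd = 2)
d$HAPPY  <- 2 + 0.7 * d$SALARY - 0.2 * d$STRESS + rnorm(n)

fit_ab <- lm(HAPPY ~ STRESS + SALARY, data = d)
fit_ba <- lm(HAPPY ~ SALARY + STRESS, data = d)

# Sequential sum of squares for STRESS when it is entered first versus last.
ss_first <- anova(fit_ab)["STRESS", "Sum Sq"]
ss_last  <- anova(fit_ba)["STRESS", "Sum Sq"]

# drop1() tests each term adjusted for the other; in a main-effects-only
# model this matches the Type II sum of squares and ignores term order.
type2 <- drop1(fit_ab, test = "F")
ss_type2 <- type2["STRESS", "Sum of Sq"]

print(c(first = ss_first, last = ss_last, type2 = ss_type2))

# Optional: the same Type II table from the car package, if available.
if (requireNamespace("car", quietly = TRUE)) {
  print(car::Anova(fit_ab, type = "II"))
}
```

The adjusted sum of squares for STRESS equals its sequential value only when STRESS is entered last, which is why the two orderings of the sequential table disagree while the Type II result does not depend on the ordering.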

Next Time

We will continue our exploration of the R Commander window system in the third and final part of this series.

References

Dalgaard, Peter (2002).  Introductory Statistics with R.  Springer:  New York.

Fox, John (2003).  The R Commander: A Basic-Statistics GUI for R.  http://www.socsci.mcmaster.ca/jfox/Misc/Rcmdr/index.html