
RSS Matters

Link to the last RSS article here: R for the Windows Platform: Installation and Configuration - Ed.

R Commander: A Simple Windows Interface for R on the Windows Platform

By Dr. Rich Herrington, Research and Statistical Support Services Manager

This month we demonstrate how to use the R library "Rcmdr" (R Commander), a simple menu system for R on the Windows platform. The following is an excerpt from the R website, http://www.r-project.org:

"R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R. R provides a wide variety of statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity. One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control. R is available as Free Software under the terms of the Free Software Foundation's GNU General Public License in source code form. It compiles and runs out of the box on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux). It also compiles and runs on Windows 9x/NT/2000 and MacOS" (from Introduction).

Starting R in the SDI Mode

To use the Rcmdr library, we will want to configure the R system window to run in "single document interface" (SDI) mode.
Right-click the R icon on your desktop and select Properties. Add " --sdi" to the Target field after the Rgui.exe statement, so that the target ends in Rgui.exe --sdi. Click Apply, then OK. Double-click the R icon on your desktop to start the R system.
To load a library in R, select Packages from the main menu bar and select Load Package.
Select the Rcmdr library in the package selection window. The following menu system will appear:
The Rcmdr or "R Commander" library is a simple menu system based on Tcl/Tk. You can read more about Rcmdr at http://www.socsci.mcmaster.ca/jfox/Misc/Rcmdr/index.html.

To import an SPSS data set:
Give the imported data set a working session name (e.g., Survey), then browse the file system to select your SPSS data set. Editing and viewing of the data can be accomplished by selecting the "Edit data set" button or the "View data set" button:

We will use survey data that was collected as part of an undergraduate course in applied statistics. The survey was administered on the UNT "Zope" server using the open source package "QSurvey". You can read more about collecting surveys on this system at: http://www.unt.edu/rss/class/survey/QSurvey.html

If you are interested in learning how to collect surveys on the UNT Zope server, you can sign up for the "New Survey Technologies" Short Course - details at: http://www.unt.edu/training/shortcrs.htm

The actual survey can be found at: http://kryton.cc.unt.edu:8080/psy3610_survey/QSurvey1

For example, the first page of questions:

The first step in the modeling process is to set up a model. An obvious choice might be ANOVA or a regression model. Suppose we wish to predict "current general happiness" from "income", "stress", and "physical fitness". Many of the functions in Rcmdr depend on setting up a model first. To set up a regression model, we go to the menu bar and select "Statistics - Fit models - Linear regression":
Select a response variable: HAPPY, then select the predictor variables: STRESS, FIT, and SALARY. The regression output will appear in the R console:
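For readers who prefer the console, the menu selection above corresponds roughly to a call to lm(). The following is a minimal sketch, not the author's session: the data frame is simulated as a stand-in for the survey file (which is not reproduced here), and the object name RegModel.1 is only an illustrative label.

```r
# ASSUMPTION: simulated stand-in for the survey data; values are made up
# purely so that the example runs. Variable names follow the article.
set.seed(42)
n <- 30
Survey <- data.frame(
  STRESS = sample(1:10, n, replace = TRUE),
  FIT    = sample(1:10, n, replace = TRUE),
  SALARY = sample(1:5,  n, replace = TRUE)
)
Survey$HAPPY <- 2 + 0.75 * Survey$SALARY + rnorm(n)

# Regress HAPPY on the three predictors (the model name is hypothetical)
RegModel.1 <- lm(HAPPY ~ STRESS + FIT + SALARY, data = Survey)

# Coefficient table with estimates, standard errors, t values and Pr(>|t|)
print(summary(RegModel.1))
```

The Pr(>|t|) column of the printed coefficient table is where the p-values discussed in the text are read from.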
In the output above, we see that SALARY has a positive association with general happiness: the unstandardized regression coefficient of .75046 has a significance value of p = .002. STRESS and FIT do not appear to be statistically significant (their p-values, Pr(>|t|), are not less than .05). Regression analysis should also include checking residual diagnostics and checking for outliers.
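The diagnostic checks can also be run from the console. The sketch below again uses simulated stand-in data, and the 4/n cutoff for Cook's distance is a common rule of thumb that we assume here for illustration; it is not necessarily the criterion Rcmdr applies.

```r
# ASSUMPTION: simulated stand-in for the survey data (values are made up)
set.seed(42)
n <- 30
Survey <- data.frame(
  STRESS = sample(1:10, n, replace = TRUE),
  FIT    = sample(1:10, n, replace = TRUE),
  SALARY = sample(1:5,  n, replace = TRUE)
)
Survey$HAPPY <- 2 + 0.75 * Survey$SALARY + rnorm(n)

fit <- lm(HAPPY ~ STRESS + FIT + SALARY, data = Survey)

# Cook's distance for every case; larger values mean more influence
cooks <- cooks.distance(fit)

# ASSUMPTION: flag cases above the 4/n rule-of-thumb cutoff
cutoff  <- 4 / nrow(Survey)
flagged <- names(cooks)[cooks > cutoff]
print(flagged)

# Standard four-panel diagnostic display (drawn only in an interactive session)
if (interactive()) {
  op <- par(mfrow = c(2, 2))
  plot(fit)
  par(op)
}
```

Any flagged case is a candidate for closer inspection, not automatic deletion.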
The diagnostic plots on the residuals indicate that the residuals are approximately normally distributed (upper right panel - we want a straight line through the dots). However, two values are marked as potentially influential observations by Cook's distance. Cases 10 and 15 should be considered for removal:

A more informative view of influential observations can be gained by looking at an influence plot:

The mouse can be used to identify the case numbers that are potential outliers. After identifying outliers with the interactive crosshair, click the "Stop" button in the upper left of the graphics window to return control to the windowing system.

We can remove the data points identified by creating a new data set called "Survey2". In the Rcmdr log window, type the following script, highlight it, and click the "Submit" button in the lower left portion of the Rcmdr window:

This short script removes those cases that are outliers, but retains all other observations. In this case we have removed cases 10 and 15. Next, we attach the "Survey2" data set so that it is the active working data set:

Next, we need to set up a new active model. Go to the menu bar once again, select "Statistics - Fit models - Linear regression", and select HAPPY as the outcome variable and STRESS, FIT, and SALARY as the predictor variables. The following output is obtained in the R Console:

It would appear that, even with the two outliers removed, the relationship between SALARY and HAPPY remains.

Scatterplot matrices are usually useful for checking the linearity of the relationships between the predictor and outcome variables. Select the variables HAPPY, STRESS, FIT, and SALARY. Select "Plot by groups" and select GENDER. The "Plot by groups" option allows one to look for subpopulations in the data. That is, are the relationships between the pairs of variables different for different groups?
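The case-removal and grouped-scatterplot steps described above might look like the following at the console. This is a sketch under stated assumptions, not the script from the original article: the data are simulated, negative row indexing is one of several ways to drop cases, and base R's pairs() stands in for the Rcmdr scatterplot-matrix dialog.

```r
# ASSUMPTION: simulated stand-in for the survey data (values are made up)
set.seed(42)
n <- 30
Survey <- data.frame(
  STRESS = sample(1:10, n, replace = TRUE),
  FIT    = sample(1:10, n, replace = TRUE),
  SALARY = sample(1:5,  n, replace = TRUE),
  GENDER = factor(sample(c("Female", "Male"), n, replace = TRUE))
)
Survey$HAPPY <- 2 + 0.75 * Survey$SALARY + rnorm(n)

# Drop cases 10 and 15 with negative row indices; all other rows are kept
Survey2 <- Survey[-c(10, 15), ]

# Refit the regression on the reduced data set
fit2 <- lm(HAPPY ~ STRESS + FIT + SALARY, data = Survey2)
print(summary(fit2))

# Scatterplot matrix with points colored by GENDER
# (drawn only in an interactive session)
if (interactive()) {
  group_cols <- c("red", "blue")[as.integer(Survey2$GENDER)]
  pairs(Survey2[, c("HAPPY", "STRESS", "FIT", "SALARY")],
        col = group_cols, pch = 19)
}
```

Passing data = Survey2 to lm() avoids the need to attach the data set when working at the console.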
In the figure above, relationships between pairs of variables are seen at the intersection of the rows and columns for the 16 panels. For example, the relationship between STRESS and SALARY seems to be different for men and women. There is a positive relationship between STRESS and SALARY for women (increasing levels of stress are associated with increasing levels of salary), whereas for men there is a negative relationship (increasing levels of salary are associated with decreasing levels of stress). However, for both men and women, increasing levels of SALARY are associated with increasing levels of HAPPY (positive relationship). Additionally, the relationship between HAPPY and STRESS seems to be different for men and women. For women, HAPPY does not seem to change as a function of STRESS. For men, as STRESS increases, HAPPY declines. On the basis of this new information, we would want to add GENDER as a predictor to our original model, and add an interaction term between GENDER and STRESS.

Since GENDER is a categorical predictor, we will want to choose a linear model that can accommodate continuous and categorical predictors. Under "Statistics", choose "Fit models - Linear model":

Label the model as "Model1". Type in the model formula as depicted below. Main effects, both continuous and categorical, appear as HAPPY ~ STRESS + FIT + SALARY + GENDER. Interaction terms are constructed with the colon (GENDER:STRESS). So the full model is:

HAPPY ~ STRESS + FIT + SALARY + GENDER + GENDER:STRESS

Since we have a categorical predictor in the model, we might want to display the output as an ANOVA table rather than as regression output. Click "Models - ANOVA table":

In Type II tests, each term is tested after all other terms that do not contain it, so the order of entry of the predictors does not affect the results. After accounting for STRESS, FIT, and GENDER, SALARY is a significant predictor of HAPPY. After accounting for STRESS, FIT, and SALARY, GENDER is a significant predictor of HAPPY.
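A console version of the model with the interaction term can be sketched as follows. The data are again simulated, so the printed tables will not match the article's results. Base R's anova() gives sequential tests, in which order of entry matters; the Type II table is requested from the car package only if it happens to be installed, which is an assumption about the reader's setup.

```r
# ASSUMPTION: simulated stand-in for the Survey2 data (values are made up)
set.seed(42)
n <- 30
Survey2 <- data.frame(
  STRESS = sample(1:10, n, replace = TRUE),
  FIT    = sample(1:10, n, replace = TRUE),
  SALARY = sample(1:5,  n, replace = TRUE),
  GENDER = factor(sample(c("Female", "Male"), n, replace = TRUE))
)
Survey2$HAPPY <- 2 + 0.75 * Survey2$SALARY + rnorm(n)

# Main effects plus the GENDER-by-STRESS interaction (colon operator)
Model1 <- lm(HAPPY ~ STRESS + FIT + SALARY + GENDER + GENDER:STRESS,
             data = Survey2)

# Sequential (Type I) ANOVA table from base R: each term is adjusted
# only for the terms listed before it
seq_table <- anova(Model1)
print(seq_table)

# Type II table, if the car package is available
if (requireNamespace("car", quietly = TRUE)) {
  print(car::Anova(Model1, type = "II"))
}
```

Comparing the two tables on the same model is a quick way to see which conclusions depend on the order in which the predictors were entered.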
After accounting for all four predictors, the interaction between GENDER and STRESS is not statistically significant.

Next Time

We will continue our exploration of the R Commander window system in the third and final part of this series.

References

Dalgaard, Peter (2002). Introductory Statistics with R. Springer: New York.

Fox, John (2003). The R Commander: A Basic-Statistics GUI for R. http://www.socsci.mcmaster.ca/jfox/Misc/Rcmdr/index.html
