Statistical Computing Tips: S-Plus

By Rich Herrington, Research and Statistical Support Services Consultant

The Research and Statistical Support Group in Academic Computing Services will be supporting the S-Plus statistical package this fall. Additionally, short courses on S-Plus will be provided. In this column, we give a brief introduction to this product.

An object-oriented programming environment for statistics and data analysis

S-Plus is a value-added commercial version (sold by MathSoft, Inc.) of the programming environment "S" developed by Richard A. Becker, John M. Chambers, and Allan R. Wilks of AT&T Bell Laboratories Statistics Research Department (now Lucent Technologies). S-Plus has an integrated suite of software facilities for data manipulation, computation, and graphical display.

Among the facilities included in version 4.5 are:

  1. An elegant object-oriented programming language with an enormous amount of functionality (over 2,000 functions), including conditionals, loops, and user-defined recursive functions;

  2. State-of-the-art exploratory graphical methods such as lowess (locally weighted regression smoothing) and trellis graphics (methods for visualizing multivariate data), along with interactive 2-D and 3-D scatterplot brushing techniques;

  3. Robust methods such as LTS regression (least trimmed squares regression), LAD regression (least absolute deviation regression), and M-estimates, as well as nonparametric regression (Alternating Conditional Expectations, Projection Pursuit Regression);

  4. Modern methods of statistical modeling (general bootstrap and jackknife routines, linear and nonlinear mixed effects models, an object-oriented matrix library, Generalized Additive Models (GAM), and Generalized Linear Models (GLM)).
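The language features named in item 1 (conditionals, loops, and user-defined recursive functions) can be sketched in a few lines. The example below is a minimal illustration written to run in R, the S clone discussed later in this column; it is assumed, but not verified here, that it behaves the same way at the S-Plus 4.5 command line. The function name fact is hypothetical and chosen only for this example.

```r
# Recursive, user-defined function: factorial of a positive integer.
# (Hypothetical helper; not part of S-Plus or R.)
fact <- function(n) {
  if (n <= 1) 1 else n * fact(n - 1)   # conditional plus recursive call
}

# Loop: accumulate the sum of the first five factorials.
total <- 0
for (i in 1:5) {
  total <- total + fact(i)
}

fact(5)   # 120
total     # 153  (1 + 2 + 6 + 24 + 120)
```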

S-Plus (as of version 4.0) includes a graphical interface to most of the data manipulation and statistical algorithms available in S-Plus (for a demonstration of the graphical capabilities of S-Plus, refer to http://www.mathsoft.com/splus/splsprod/splsdes.html). However, the real power of S-Plus lies in its programmability through the command interface. The S-Plus language is interactive: each command is executed as it is entered. The language is based on the use of functions to perform calculations, set system options, manipulate graphical objects, fit statistical models, etc. Variables can refer to scalar values, vectors, matrices, or other forms, such as lists (arbitrary collections of scalars, vectors, matrices, or other objects).
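As a rough illustration of these variable types, the following sketch creates a scalar, a vector, a matrix, and a list, and then calls a function whose result is itself a list. It is written for R and is assumed, not confirmed, to carry over to the S-Plus command interface; the variable names and the made-up data are illustrative only.

```r
x <- 3.5                      # scalar (a vector of length one)
v <- c(2, 4, 6, 8)            # numeric vector
m <- matrix(1:6, nrow = 2)    # 2 x 3 matrix, filled column by column
obj <- list(value = x, data = v, table = m)   # list holding all three

mean(v)        # 5
dim(m)         # 2 3
obj$data[2]    # 4

# Functions return objects too: lowess() returns a list of smoothed points.
px <- 1:20                    # made-up predictor values
py <- sqrt(px) + sin(px)      # made-up response values
fit <- lowess(px, py)
names(fit)                    # "x" "y"
```

Typing each line at the prompt and inspecting the result immediately is what is meant by an interactive language.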

Most importantly, the S-Plus language is an object-oriented programming language. See "Object-Oriented Programming in S-PLUS" for a discussion of object-oriented programming in S-Plus.
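One way to picture this object-oriented style is a generic function that picks a method according to the class of its argument. The sketch below uses the class attribute and UseMethod mechanism as it works in R; it is assumed that S-Plus 4.5 behaves similarly. The class name "temperature" and the functions temperature and describe are hypothetical and exist only for this example.

```r
# Constructor for a hypothetical class "temperature".
temperature <- function(degrees) {
  obj <- list(degrees = degrees)
  class(obj) <- "temperature"
  obj
}

# Generic function: dispatches on the class of its first argument.
describe <- function(x, ...) UseMethod("describe")

# Fallback method used when no class-specific method exists.
describe.default <- function(x, ...) "an ordinary object"

# Method used for objects of class "temperature".
describe.temperature <- function(x, ...) {
  paste("a temperature of", x$degrees, "degrees")
}

describe(temperature(72))   # "a temperature of 72 degrees"
describe(1:3)               # "an ordinary object"
```

The same call, describe(), produces different behavior depending on the object it receives, which is how built-in functions such as print and summary adapt to different kinds of fitted models.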

The S-Plus User Community

Perhaps the most valuable feature of S-Plus is its active user community, which provides S-Plus code and programming advice on the S-news mailing list (to subscribe, send an electronic message to s-news-request@utstat.toronto.edu with the message "subscribe"). A fairly large collection of freeware S-Plus code can be found on the StatLib Web server. Moreover, S-Plus seems to be a popular choice among applied statisticians for the development of new statistical algorithms, and authors usually provide their S-Plus functions free of charge. For example, the following texts include complete S-Plus implementations of their statistical algorithms:

  • "Applied Wavelet Analysis with S-PLUS" by Andrew Bruce and Hong-Ye Gao (MathSoft Data Analysis Products Division), Springer-Verlag, New York, NY, 1996.

  • "Statistical Tools for Nonlinear Regression: A Practical Guide with S-PLUS Examples" by S. Huet, A. Bouvier, M.-A. Gruet, and E. Joliet, Springer-Verlag, New York, NY, 1996.

  • "Algorithms, Routines and S Functions for Robust Statistics" by A. Marazzi, Chapman & Hall, 1993.

  • "Smoothing Techniques with Implementation in S" by W. Haerdle, Springer, 1991.

General references on the S language are:

  • "The New S Language" by Richard A. Becker, John M. Chambers, and Allan R. Wilks, Wadsworth & Brooks/Cole, Pacific Grove, CA, 1988. Available through the MathSoft Seattle office.

  • "Statistical Models In S" by John Chambers and Trevor Hastie, Chapman and Hall, 1992. Available through the MathSoft Seattle office.

A more complete listing of third party texts on S-Plus is available at http://www.mathsoft.com/splsprod/Biblio.htm.

An Advantage for Educators

An advantage of the S or S-Plus language for educators is the existence of the freely available S clone, "R". R is a collaborative project whose purpose is to develop a freeware system for statistical computation and graphics. R has been heavily influenced by S, and the resulting language is very similar in appearance to S, although the underlying implementation and semantics are derived from the programming language "Scheme". Even so, the majority of S code (as described in the general references above) will run in R unchanged. There is a WIN95/NT version of R available for download at http://www.stat.math.ethz.ch/R-CRAN/bin/ms-windows/win-32/rw0613b.zip. (Note: Some memory-resident programs, such as virus-checking software or previously loaded DLL files, can interfere with this WIN95/NT version of R. Pressing CTRL-ALT-DELETE brings up a menu from which one can selectively shut down these programs to find the offending one.)

Unlike S-Plus, R is not a value-added commercial application, and it lacks the GUI of S-Plus 4.0 and 4.5; R provides only a command interface and a graphics display. Additionally, it lacks quite a bit of the statistical functionality of its commercial counterpart. However, R is an ongoing project and its functionality increases steadily. For the purposes of a course on statistical programming, R cannot be beaten for its value and breadth (R can also be more efficient than S-Plus in its use of memory). More importantly, R source code and binaries exist for a number of platforms: UNIX, LINUX, Macintosh, and WIN95/NT. At some point in the future, the Research and Statistical Support Office will support a UNIX version of R on Sol.