RSS Matters

By Craig Henderson, Research and Statistical Support Services

A Report on the S-Plus User's Conference, October 1999

I recently had the privilege of attending the S-Plus User's Conference in New Orleans. It was a very intimate gathering: there were approximately 300 attendees, and the major software developers were present. The informal atmosphere made it easy to talk with users of all levels. I would like to report on some of the sessions I was able to attend. The conference had three main emphases: new developments in S-Plus 2000, the introduction of the S-Plus StatServer, and new statistical developments. Some of the biggest names in applied statistics were present, such as Rob Tibshirani, Frank Harrell, William Venables, and Brian Ripley. I found the conference to be a very informative and helpful experience.

S-Plus 2000

Much development has gone into expanding the menu capabilities in S-Plus 2000, primarily to make the software friendlier for new users. The statistics dialog boxes have been redesigned to expose more functionality through the GUI. In addition, tips of the day are shown at start-up, and data tips are available from the mouse cursor. The idea behind these developments is to give S-Plus 2000 "an Office 98 look and feel". However, the vast majority of users and developers agreed that to use S-Plus as it was intended to be used, knowledge of the command language is essential. S-Plus 2000 also adds new graphical procedures and improved statistical routines. New graphics include HLOC plots, nonlinear curve fitting plots, combined vertical/horizontal error bar charts, and Smith charts, and the available smoothing options have been expanded.
A strength of S-Plus has always been its support for advanced, modern statistical techniques. S-Plus 2000 lives up to this reputation, adding state-of-the-art linear and nonlinear mixed effects modeling and updated survival analysis. S-Plus 2000 also offers discriminant function analysis, which was not previously available. However, a speaker in another session advised using logistic regression rather than discriminant analysis, since the assumptions of discriminant analysis (for the linear version, multivariate normal predictors with a common covariance matrix in every group) are so restrictive. S-Plus 2000 has also incorporated some of the more popular user-written libraries into its core features, including Venables and Ripley's MASS library and Frank Harrell's Hmisc and Design libraries. Other developments include more examples in the documentation and HTML online help, available under the S-Plus object hierarchy. S-Plus 2000 has also addressed some long-standing performance problems. By default, S-Plus now checks whether your system has an Intel Pentium processor and, if so, uses the BLAS routines from Intel's Math Kernel Library (in SHOME/cmd/mkl_intf.dll). These routines are optimized for Pentium processors, so S-Plus operations that call BLAS routines, such as matrix multiplication, should run noticeably faster. Further speed-up is possible on multiprocessor Pentium machines.

StatServer

S-Plus advertises the product as follows: "StatServer is a statistical data mining system for distributing analyses and graphics to decision-makers. It is designed to help non-statistician users discover patterns and trends hidden in corporate databases. StatServer enables better, more informed decision making throughout organizations." I attended one session on StatServer presented by a research scientist from Merck Research Laboratories.
Their division's guiding philosophy in adopting StatServer is: "Instead of the statistician as 'qualified specialist' doing data analysis for us, we prefer advice and tools from statisticians in order to do it ourselves." From a desktop computer running Windows 98 with Excel and Netscape, their clients submit jobs to StatServer. StatServer, which uses S-Plus as its processing engine, returns editable output, graphics, and tables to the client's web browser. Merck was then able to focus its efforts on automating routines useful to its customers and on letting those customers do the analyses themselves, rather than hiring a full-time statistical consultant. This model seemed useful to me for large consulting firms, but I saw little application for most professors working in an academic setting.

Statistical Development

Perhaps the most useful portion of the conference for me was the information presented on new statistical techniques. I attended a very instructive one-day workshop delivered by Brian Ripley. He built on the theme of his book with William Venables, Modern Applied Statistics with S-Plus, to demonstrate how modern techniques such as smoothing splines and automated transformations can be used in regression, robust regression, mixed effects models, and related methods. Rather than belabor the details, I will refer you to Brian Ripley's website, which has the slides from his presentation along with the S-Plus scripts used to run the analyses: http://www.stats.ox.ac.uk/pub/bdr/NewOrleans99/.

Tim Hesterberg presented some very interesting work on improving the performance of the bootstrap by sampling from the empirical distribution with unequal probabilities. He presented data suggesting that bootstrap tilting has a 17:1 advantage in the number of bootstrap iterations required compared with bootstrap empirical limits, and a 37:1 advantage compared with BCa limits.
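To make the idea of "sampling from the empirical distribution with unequal probabilities" concrete, here is a minimal Python sketch. It is not Hesterberg's software and does not implement his tilting confidence limits; it is a generic importance-sampling bootstrap, assuming the statistic of interest is the sample mean. Observations are resampled with exponentially tilted probabilities proportional to exp(tau * (x_i - mean)), and each resample is reweighted by a likelihood ratio so the estimate still refers to the ordinary equal-probability bootstrap. The function names tilted_probs and importance_tail_prob, the parameter tau, and the toy data are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence


def tilted_probs(x: Sequence[float], tau: float) -> List[float]:
    """Exponentially tilted resampling probabilities for the sample mean.

    For the mean, the empirical influence value of observation i is
    x_i - mean(x), so the weights are proportional to exp(tau * (x_i - mean(x))).
    tau = 0 recovers the ordinary bootstrap (every probability equal to 1/n);
    tau > 0 favors large observations, tau < 0 favors small ones.
    """
    n = len(x)
    m = sum(x) / n
    w = [math.exp(tau * (xi - m)) for xi in x]
    total = sum(w)
    return [wi / total for wi in w]


def importance_tail_prob(x: Sequence[float], c: float, tau: float,
                         B: int, seed: int = 0) -> float:
    """Estimate P*(bootstrap mean >= c) for ordinary (equal-probability) resampling.

    Resamples are drawn with the tilted probabilities p instead of 1/n, and each
    resample that lands in the tail is weighted by the likelihood ratio
    prod_j (1/n) / p[i_j], which corrects for the unequal sampling.
    """
    rng = random.Random(seed)
    n = len(x)
    p = tilted_probs(x, tau)
    indices = range(n)
    total = 0.0
    for _ in range(B):
        draw = rng.choices(indices, weights=p, k=n)
        mean_star = sum(x[i] for i in draw) / n
        if mean_star >= c:
            # log of prod (1/n) / p_i, accumulated in log space for stability
            log_ratio = -sum(math.log(n * p[i]) for i in draw)
            total += math.exp(log_ratio)
    return total / B


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0]
    plain = importance_tail_prob(data, c=4.1, tau=0.0, B=20000, seed=1)
    tilted = importance_tail_prob(data, c=4.1, tau=1.0, B=20000, seed=1)
    print(f"equal-probability bootstrap estimate: {plain:.4f}")
    print(f"tilted (importance-sampling) estimate: {tilted:.4f}")
```

With tau = 0 the estimate is simply the fraction of ordinary resamples in the tail. With tau > 0 most resamples land near the tail of interest, so fewer of them are wasted; that is the general kind of efficiency gain unequal-probability resampling aims for. For this toy data set the exact answer is 126/3125 (about 0.040), which can be checked by enumerating all 5^5 resamples. The 17:1 and 37:1 figures come from Hesterberg's own comparisons and are not reproduced by this sketch.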
Hesterberg's paper and software for bootstrap tilting are available at http://www.statsci.com/Hesterberg/tilting.

Jose Pinheiro from Bell Laboratories presented a paper on the linear and nonlinear mixed effects modules available in S-Plus. These modules give the user great flexibility in analyzing mixed effects models, including repeated measures, longitudinal data, growth curves, and multilevel models. What particularly stands out about these modules is their support for nonlinear models. The NLME library, which contains the software for these analyses, is available from http://nlme.stat.wisc.edu.

But I won't kid myself: the best part of the trip was spending the weekend in New Orleans. I enjoyed some great jazz, some great food, and a cultural experience that only New Orleans can offer. It was even uncharacteristically cool for that time of year. Jealous?

As always, if I can be of any assistance, please contact me at 565-2140, or email craigh@unt.edu.