RSS Matters

Resampling Based Statistics in S-Plus

By Rich Herrington, Research and Statistical Support Services

This month we take a look at the bootstrap resampling capabilities of S-Plus. S-Plus has general bootstrapping functionality available so that nearly all statistical functions and expressions can be bootstrapped. S-Plus provides both parametric and nonparametric bootstrap confidence intervals.
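The idea behind the nonparametric bootstrap can be sketched outside S-Plus. The Python fragment below is only an illustration and not the S-Plus implementation: the function name bootstrap_replicates, the sample values, and the seed are made up for this example. It resamples the data with replacement, recomputes a statistic on each resample, and summarizes the spread of the replicates.

```python
import random
import statistics

def bootstrap_replicates(data, statistic, n_boot=1000, seed=0):
    """Return n_boot values of `statistic`, each computed on a resample
    drawn with replacement from `data` and of the same size as `data`."""
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(resample))
    return replicates

# Made-up sample, for illustration only.
sample = [4.1, 5.3, 2.8, 6.0, 5.5, 3.9, 4.7, 5.1, 4.4, 6.2]
reps = bootstrap_replicates(sample, statistics.median, n_boot=2000, seed=1)

observed = statistics.median(sample)
boot_se = statistics.stdev(reps)               # bootstrap standard error
boot_bias = statistics.mean(reps) - observed   # bootstrap estimate of bias
print(observed, boot_se, boot_bias)
```

Any function that maps a data set to a single number can be passed as the statistic, which is the same idea as supplying an arbitrary expression to the S-Plus bootstrap menu.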

From the main menu bar, we access the resampling menu from: Statistics - Resample - Bootstrap.

wpe1.jpg (20885 bytes)  

The Bootstrap menu has five tabs for setting up the analysis: Model, Options, Results, Plot, and Jack After Boot. Each of these tabs is initialized with default values. The one critical field that has no default is the Expression entry field. Entering an expression to bootstrap can be tricky, because it assumes that the user has some knowledge of the syntax of the S-Plus language.

wpe2.jpg (18869 bytes)

One way to avoid needing detailed knowledge of the syntax for a particular analysis is to run the analysis beforehand from the drop-down menu system. The menu system generates the syntax as its entry fields are filled in, and once the analysis has been run, that syntax is displayed. It can then be saved, or cut and pasted into the Expression entry field. In the following example we will perform a four-group MANOVA with four dependent measures.

Example

The data set we will use for our analysis will have four groups: a control group and three experimental groups (c1, e1, e2, e3). We see a screen capture of the object browser and the data worksheet:

wpe3.jpg (84452 bytes)

From the main menu bar select: Statistics - Multivariate - MANOVA. Select the Create Formula tab. Fill out the create formula tab with the following specifics. First select q1 through q4 and click Add Response. Then select group and click Add Main Effect:

wpe4.jpg (23478 bytes)

Select OK to return to the previous menu. Select OK once more to actually run the analysis. In the report window we see the following:

wpe5.jpg (16534 bytes)

The calling function is listed under Call. Copy the manova(formula.....) call and paste it into your Commands window. Wrap the call in the summary function to summarize the manova fit, and assign this summary to an object, for example man.out:

wpe7.jpg (14157 bytes)

Typing man.out by itself displays the contents of this object. The names function displays the components of this list. The list has six components. To extract the fifth component, "Stats", we have to index the list in the following fashion:

wpe8.jpg (33146 bytes)

We see that Wilks' lambda (.9240) is the second entry within the fifth component of the list man.out. So the complete expression to pass to the bootstrap function will be:

wpe9.jpg (5994 bytes)

This expression returns a value of .9240 for Wilks' lambda for this particular data set. We need to copy the expression, summary(manova.......))[[5]][2], into the Expression window on the Bootstrap menu.

wpeA.jpg (19130 bytes)

wpeB.jpg (19280 bytes)
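For readers who want to see what the bootstrapped statistic measures, Wilks' lambda is the ratio of the determinant of the within-group sums of squares and cross-products (SSCP) matrix to the determinant of the total SSCP matrix. The sketch below computes it in plain Python. The helper names (determinant, sscp, column_means, wilks_lambda) and the small data set are hypothetical and chosen only for illustration; this is not how S-Plus computes the value internally.

```python
def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det                      # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def column_means(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def sscp(rows, center):
    """Sums of squares and cross-products of `rows` about `center`."""
    p = len(center)
    out = [[0.0] * p for _ in range(p)]
    for row in rows:
        d = [x - m for x, m in zip(row, center)]
        for i in range(p):
            for j in range(p):
                out[i][j] += d[i] * d[j]
    return out

def wilks_lambda(groups):
    """groups: dict mapping a group label to a list of response rows."""
    all_rows = [row for rows in groups.values() for row in rows]
    total = sscp(all_rows, column_means(all_rows))
    p = len(total)
    within = [[0.0] * p for _ in range(p)]
    for rows in groups.values():
        g = sscp(rows, column_means(rows))   # centered on the group's own means
        for i in range(p):
            for j in range(p):
                within[i][j] += g[i][j]
    return determinant(within) / determinant(total)

# Made-up data: three groups, two responses per case.
groups = {
    "c1": [[1.0, 2.0], [2.0, 1.5], [3.0, 3.5], [2.5, 2.0]],
    "e1": [[2.0, 3.0], [3.5, 2.5], [4.0, 4.5], [3.0, 3.0]],
    "e2": [[5.0, 6.0], [6.5, 5.5], [7.0, 8.0], [6.0, 6.5]],
}
lam = wilks_lambda(groups)
print(lam)
```

Values near 1 indicate that the group mean vectors differ little relative to the within-group variation; values near 0 indicate strong separation.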

For the Options tab we need to select the grouping variable and how many bootstrap iterations we need:

wpeC.jpg (18178 bytes)
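Supplying a grouping variable means, as far as we understand the S-Plus option, that resampling is stratified: cases are drawn with replacement within each group, so every resample keeps the original group sizes. The following Python sketch illustrates that idea under this assumption. The function name stratified_resample, the group sizes, and the row values are invented for the example.

```python
import random
from collections import Counter

def stratified_resample(rows, labels, rng):
    """Resample rows with replacement separately within each group, so
    every group keeps its original size in the resample."""
    index_by_group = {}
    for i, label in enumerate(labels):
        index_by_group.setdefault(label, []).append(i)
    chosen = []
    for indices in index_by_group.values():
        chosen.extend(rng.choice(indices) for _ in indices)
    return [rows[i] for i in chosen], [labels[i] for i in chosen]

# Made-up design: four groups of unequal size, two responses per case.
labels = ["c1"] * 3 + ["e1"] * 4 + ["e2"] * 2 + ["e3"] * 3
rows = [[float(i), float(i) ** 0.5] for i in range(len(labels))]

rng = random.Random(42)
new_rows, new_labels = stratified_resample(rows, labels, rng)
print(Counter(new_labels))
```

Repeating this draw once per bootstrap iteration, and recomputing the statistic on each resample, yields the bootstrap replicates.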

For the Results tab we select empirical percentiles:

wpeD.jpg (16953 bytes)
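Empirical percentiles are simply quantiles of the sorted bootstrap replicates. The Python sketch below shows one way to compute them. The linear interpolation rule is an assumption and may differ from the rule S-Plus applies, and the replicates are simulated stand-in values rather than output from the MANOVA example.

```python
import random

def empirical_percentile(sorted_values, prob):
    """Percentile by linear interpolation between adjacent order statistics."""
    position = prob * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])

# Simulated stand-in for a vector of bootstrap replicates.
rng = random.Random(7)
replicates = [rng.gauss(0.0, 1.0) for _ in range(1000)]

ordered = sorted(replicates)
limits = {p: empirical_percentile(ordered, p) for p in (0.025, 0.05, 0.95, 0.975)}
for prob, value in limits.items():
    print(prob, value)
```

The 2.5% and 97.5% values bound a 95% percentile interval, and the 5% and 95% values bound a 90% interval.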

For the Plot tab we select Normal Quantile-Quantile to see how well the bootstrap sampling distribution matches "normal distribution" theory.

wpeE.jpg (14390 bytes)
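A normal quantile-quantile plot pairs the sorted replicates with the quantiles a normal distribution would produce at the same ranks; points that fall close to a straight line suggest approximate normality. The Python sketch below builds those pairs and uses their correlation as a rough straightness measure. The plotting positions (i + 0.5) / n, the helper names, and the simulated samples are assumptions made for illustration, and S-Plus may use different plotting positions.

```python
import random
from statistics import NormalDist, mean

def normal_qq_points(values):
    """Pair each sorted value with the standard normal quantile at its rank."""
    n = len(values)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, sorted(values)))

def pearson_r(pairs):
    """Pearson correlation of (x, y) pairs; near 1 means a nearly straight line."""
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Simulated samples: one roughly normal, one right-skewed.
rng = random.Random(3)
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(500)]
skewed_sample = [rng.expovariate(1.0) for _ in range(500)]

r_normal = pearson_r(normal_qq_points(normal_sample))
r_skewed = pearson_r(normal_qq_points(skewed_sample))
print(r_normal, r_skewed)
```

A long right tail, like the one in the skewed sample, pulls the upper points away from the line and lowers the correlation.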

Selecting OK generates the following report window:

wpeF.jpg (23646 bytes)

And the following plots:

wpe10.jpg (16274 bytes)

wpe11.jpg (13916 bytes)

We see that the empirically resampled sampling distribution for Wilks' lambda follows normal theory fairly closely, except in the right tail region. The observed value of Wilks' lambda falls inside both the 2.5th/97.5th and the 5th/95th percentile limits. We take this as a failure to reject the null hypothesis for Wilks' lambda. In general, the BCa percentiles will be more accurate than the empirical percentiles.

Further Reading

Davison, A.C. and Hinkley, D.V. (1997). Bootstrap Methods and Their Application. Cambridge University Press.

Efron, B. and Tibshirani, R. J. (1993). An Introduction to the Bootstrap. San Francisco: Chapman & Hall.