Last month we
examined using the bootstrap and robust estimation to calculate statistical power; this
month we explore the use of the smoothed bootstrap with small data sets. The GNU S
language, "R", is used to implement this procedure. R is a statistical
programming environment that is a clone of the S and S-Plus languages developed at Lucent
Technologies. In the following document we illustrate the use of a GNU Web interface to
the R engine on the "rss" server, http://rss.acs.unt.edu/cgi-bin/R/Rprog.
This GNU Web interface is a derivative of the "Rcgi" Perl scripts available for
download from the CRAN website, http://www.cran.r-project.org
(the main "R" website). Scripts can be submitted interactively,
edited, and resubmitted with changed parameters by selecting the hypertext link
buttons that appear below the figures. For example, clicking the "Run
Program" button below samples 1000 random numbers from a normal distribution,
then uses nonparametric density estimation to fit a density curve to the data. To
view any text output, scroll to the bottom of the browser window. To view the
density curve, select the "Display Graphic" link. The script can be edited
and resubmitted by changing the script in the form window and then selecting
"Run the R Program". Selecting the browser "back page"
button will return the reader to this document.
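The script behind the "Run Program" button is not reproduced in this text. As a rough
sketch of what it describes, and assuming base R's density() function with its default
Gaussian kernel and default bandwidth, the following draws 1000 normal deviates and fits
a kernel density curve. The seed and the plot title are arbitrary choices made for
illustration.

```r
# Sketch only: not the original script from the web interface.
set.seed(1)                 # arbitrary seed so the run is repeatable
x <- rnorm(1000)            # 1000 random numbers from a standard normal
fit <- density(x)           # nonparametric (kernel) density estimate

print(summary(x))           # text output
cat("bandwidth used:", fit$bw, "\n")

# graphic output: the fitted density curve with the data marked underneath
plot(fit, main = "Kernel density estimate of 1000 normal draws")
rug(x)
```

Changing the argument of rnorm() or passing a different bw value to density() is the
kind of edit the form window is meant for.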
The Disadvantages of Using Small Sample Sizes with the Bootstrap
In the nonparametric bootstrap, samples are drawn from a discrete set. This can be a
serious disadvantage with small samples, because spurious fine structure in the original
data, which does not occur in the population, may be faithfully reproduced in the
simulated data. Difficulties can arise if the goal of the simulation is to produce
samples that have the true underlying structure of the observed data without spurious
details arising from random effects. Another concern is that with small samples, where
there are only a few values to select from, the bootstrap samples will underestimate the
true variability. Statisticians generally regard sample sizes of less than 10 as too
small for the bootstrap to be relied on (Chernick, 1999).
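Both concerns can be checked with a few lines of R. The sketch below is illustrative
only: the sample size of 8, the 2000 resamples, and the seed are assumptions chosen for
the demonstration. It confirms that every resampled value is one of the original
observations, counts how many distinct values a typical bootstrap sample contains, and
compares the average variance of the bootstrap samples with the variance of the original
sample.

```r
# Illustrative sketch; n, B and the seed are arbitrary choices.
set.seed(42)
n <- 8
x <- rnorm(n)                                        # small original sample
B <- 2000
boot <- replicate(B, sample(x, n, replace = TRUE))   # n x B matrix of resamples

# every resampled value is one of the n original values
all_from_original <- all(boot %in% x)

# number of distinct values in each bootstrap sample
distinct_per_sample <- apply(boot, 2, function(b) length(unique(b)))

# average variance of the bootstrap samples
mean_boot_var <- mean(apply(boot, 2, var))

cat("all resampled values come from the original sample:", all_from_original, "\n")
cat("mean number of distinct values per bootstrap sample:",
    mean(distinct_per_sample), "of", n, "\n")
cat("variance of original sample:", var(x),
    " mean variance of bootstrap samples:", mean_boot_var, "\n")
```

On average the variance of a bootstrap sample is (n-1)/n times the variance of the
original sample, so the shortfall grows as n shrinks.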
The Smoothed Bootstrap
One approach to dealing with the discreteness of the empirical distribution function
in small samples is to smooth the empirical distribution function and then resample
from the smoothed version. It has been shown that the nonparametric bootstrap is
improved by smoothing in non-smooth cases, such as the median (Fernholz, 1993). Even
though the smoothed bootstrap was considered early on by bootstrap researchers, there was
little evidence to indicate under which conditions smoothing would be beneficial (Hall,
DiCiccio, & Romano, 1989; Silverman & Young, 1987).

Recent research on the smoothed bootstrap demonstrates that for small sample sizes, with
proper kernel bandwidth selection, smoothing the empirical distribution function can
yield a first-order reduction in coverage error for the one-sided percentile method. The
one-sided percentile method, based on the smoothed bootstrap with an optimally chosen
bandwidth, becomes asymptotically as accurate as either the bootstrap t or the
bias-corrected and accelerated (BCa) methods (Polansky & Schucany, 1997). Similar
arguments show that second-order corrections can be realized for first-order correct
confidence intervals, such as the two-sided percentile method intervals and bootstrap t
intervals (Polansky, 2001). The smoothed bootstrap can also decrease coverage error for
finite samples (Polansky, 2000). That is, when the percentile bootstrap method is used to
calculate confidence intervals, type I error for small sample sizes can be reduced by
smoothing the empirical distribution function.

It is important for the present study to note that Fernholz (1993, 1997) proved that by
smoothing the empirical distribution function with an appropriate kernel, the variance
and the mean square error of certain statistical functionals can be reduced. A functional
is a mapping that assigns a real value to a function. Examples of functionals are the
parameters of distribution functions, including the mean, the variance, the skewness, and
the kurtosis of the distribution. Other examples include sample quantiles, some
L-estimators, and M-estimators (Fernholz, 1997). Specifically, Fernholz demonstrates that
a smaller variance is achieved when the influence function is either discontinuous (as
for the median) or piecewise linear with convexity towards the x-axis (as for the Huber
and biweight type M-estimators). Essentially, the smoothed bootstrap can be used to
improve overall performance (decrease the bias and MSE of estimators) in small samples.
Brown, Hall, and Young (2001) show that, for normal data, smoothing the median increases
its efficiency over that of the conventional median.

The algorithm of the smoothed bootstrap is outlined in Silverman (1986, page 141). The
basic idea is to set:
$$Y_i = X_i^{*} + h\,\varepsilon_i, \qquad i = 1, \dots, n,$$
where the $X_i^{*}$ are the observations in a bootstrap sample; each $\varepsilon_i$ is a
random deviate from a probability density function K; and h is a smoothing parameter
that can be calculated from the sample moments (e.g. the standard deviation). The
probability density K is often referred to as the kernel. A natural candidate for the
kernel is the Gaussian (normal) distribution. If a Gaussian kernel is selected, then an
optimal smoothing parameter can be estimated from the data (a so-called plug-in
estimate of h):
$$h = 1.06\, s\, n^{-1/5},$$
where s is the standard deviation estimated from the sample data. This choice of h will
work well if the population is normally distributed, but it may oversmooth if the
population is either multimodal or skewed (the sample estimate s is not a resistant
measure). Silverman (1986, page 46) reports that for heavily skewed data h will
oversmooth, but that the formula is remarkably insensitive to kurtosis within the t
family of distributions.

To give some sense of how the smoothing operation affects a skewed distribution, the
figures below show a sample of size 50 from an exponential distribution. The top panel
displays little or no smoothing, and the bottom panel displays much greater smoothing. It
is evident that with heavy smoothing most of the skewness in the original data set is
gone, and the result tends toward a symmetric distribution resembling a normal
distribution. If the point is only to smooth local irregularities while retaining the
overall shape of the distribution, oversmoothing will misrepresent the underlying
population distribution.
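The resampling step and the effect of oversmoothing can be sketched in a few lines of R.
This is a minimal illustration under stated assumptions, not the code used to produce the
figures: the helper names smoothed_boot(), plugin_h() and skewness() are hypothetical,
the kernel is Gaussian, the plug-in value is taken to be 1.06 times the sample standard
deviation times n^(-1/5), and the multipliers 0.05 and 5 are arbitrary stand-ins for
"little" and "much" smoothing.

```r
# Hypothetical helpers for illustration only.

# One smoothed bootstrap sample: resample with replacement, then add
# Gaussian kernel noise scaled by the smoothing parameter h.
smoothed_boot <- function(x, size = length(x), h) {
  x_star <- sample(x, size, replace = TRUE)   # ordinary bootstrap draw
  eps <- rnorm(size)                          # deviates from the Gaussian kernel
  x_star + h * eps
}

# Plug-in smoothing parameter for a Gaussian kernel (assumed form).
plugin_h <- function(x) 1.06 * sd(x) * length(x)^(-1/5)

# Moment-based sample skewness (base R has no built-in skewness function).
skewness <- function(y) mean((y - mean(y))^3) / mean((y - mean(y))^2)^1.5

set.seed(123)                      # arbitrary seed
x <- rexp(50)                      # skewed sample of size 50
h_small <- 0.05 * plugin_h(x)      # little smoothing
h_large <- 5 * plugin_h(x)         # heavy smoothing

y_small <- smoothed_boot(x, size = 5000, h = h_small)
y_large <- smoothed_boot(x, size = 5000, h = h_large)

cat("skewness, original sample:  ", round(skewness(x), 2), "\n")
cat("skewness, little smoothing: ", round(skewness(y_small), 2), "\n")
cat("skewness, heavy smoothing:  ", round(skewness(y_large), 2), "\n")
```

Adding kernel noise leaves the third central moment of the resampled values unchanged
but inflates their variance by h squared, so the standardized skewness shrinks as h
grows. Replacing the skewness printout with plot(density(y_small)) and
plot(density(y_large)) gives a visual comparison.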
The Smoothed Bootstrap Implemented in the "S" Language
The following S code implements the smoothed bootstrap using two different kernels (K)
and several different window estimators (h). Twenty numbers are sampled from the normal
distribution; this sample becomes the population from which we resample. The code
chooses a Gaussian kernel with a standard smoothing parameter for the smoothed
bootstrap. The population mean is estimated using the sample mean. One might change the
code to explore the effects of: 1) using different sample sizes; 2) using different
numbers of bootstrap samples; 3) using either the variance-corrected or uncorrected
versions of the kernels; 4) using robust estimators, rather than the mean, with the
smoothed bootstrap; and 5) using different window estimators combined with different
kernels.
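The article's own listing, with its rdensity3 and rdensity5 functions, does not appear
in this text. As a stand-in, here is a minimal sketch of the procedure just described.
Everything in it is an assumption made for illustration: the function name
smooth_boot_gauss(), its argument names, the plug-in value of h, the seed, and in
particular the variance correction, which uses the shrinkage form
xbar + (y - xbar) / sqrt(1 + h^2 / sigma2) and may differ from what rdensity3 does.

```r
# Hypothetical stand-in, not the article's rdensity3/rdensity5.
# Returns a samp_size x num_boot matrix, one smoothed bootstrap sample per column.
smooth_boot_gauss <- function(orig_data, samp_size, num_boot, h,
                              variance_corrected = FALSE) {
  xbar <- mean(orig_data)
  sigma2 <- mean((orig_data - xbar)^2)      # variance of the empirical distribution
  replicate(num_boot, {
    y <- sample(orig_data, samp_size, replace = TRUE) + h * rnorm(samp_size)
    if (variance_corrected) {
      # shrink toward the mean so the kernel noise does not inflate the variance
      y <- xbar + (y - xbar) / sqrt(1 + h^2 / sigma2)
    }
    y
  })
}

set.seed(2002)                               # arbitrary seed
xdata <- rnorm(20)                           # twenty draws from a normal
h <- 1.06 * sd(xdata) * length(xdata)^(-1/5) # assumed plug-in smoothing parameter
samp_size <- 500                             # size of each bootstrap sample
num_boot <- 500                              # number of bootstrap samples

boot_unc <- smooth_boot_gauss(xdata, samp_size, num_boot, h)
boot_cor <- smooth_boot_gauss(xdata, samp_size, num_boot, h,
                              variance_corrected = TRUE)

# average mean and average variance across the bootstrap samples
summarize_boot <- function(b) {
  c(mean = mean(colMeans(b)), variance = mean(apply(b, 2, var)))
}

print(rbind(
  original    = c(mean = mean(xdata), variance = var(xdata)),
  uncorrected = summarize_boot(boot_unc),
  corrected   = summarize_boot(boot_cor)
))
```

Swapping mean() for median() inside summarize_boot(), or changing rnorm(20) to another
sample size, covers several of the variations listed above.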
Results
Running the S code listed above produces the following text and graphics:
The preceding code uses the variance-adjusted Gaussian kernel (rdensity3) to smooth the
empirical distribution function. We see that the average mean of the 500 bootstrap
samples is virtually identical to that of the original sample: .568. The average
variance of the 500 bootstrap samples underestimates the variance by a substantial
amount: .885 versus the population variance of 1.00. The 200th bootstrap sample is
extracted and plotted against the original sample to see how well the 200th bootstrap
sample, with N=500, captures the shape of the original sample. As can be seen below, the
larger sample size smooths the discreteness of the original sample while retaining the
overall shape. These results also suggest that the variance-corrected version of the
Gaussian kernel "overadjusts" for the variance introduced by the smoothing parameter.
Further simulation results using the unadjusted version of the Gaussian kernel
(rdensity5) show that the unadjusted version does very well in recapturing the
population variance. Try re-running the S code above, replacing the sample size with
N=10 (use rnorm(10) instead of rnorm(20) at the beginning of the program) and replacing
the kernel density estimator "rdensity3" with "rdensity5" in the call
x.new.boot<-rdensity3(orig.data=xdata, samp.size, num.boot.samp, window=1). Compare the
mean and variance of the original data (xdata) with the mean and variance of all 500
bootstrap samples. How close are they compared to the results displayed above?
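When making that comparison it may help to know what variance each version targets. The
short derivation below is a sketch under two assumptions that are not taken from the
code above: the kernel has unit variance, and the variance-corrected version uses the
shrinkage form shown in the second line. Here $\hat\sigma^2 = n^{-1}\sum_i (x_i -
\bar{x})^2$ is the variance of the empirical distribution, and $\mathrm{Var}^{*}$
denotes variance conditional on the observed sample.

```latex
\begin{align*}
\mathrm{Var}^{*}\!\left(X^{*} + h\varepsilon\right)
  &= \mathrm{Var}^{*}(X^{*}) + h^{2}\,\mathrm{Var}(\varepsilon)
   = \hat\sigma^{2} + h^{2}, \\[4pt]
\mathrm{Var}^{*}\!\left(\bar{x} +
  \frac{X^{*} - \bar{x} + h\varepsilon}{\sqrt{1 + h^{2}/\hat\sigma^{2}}}\right)
  &= \frac{\hat\sigma^{2} + h^{2}}{1 + h^{2}/\hat\sigma^{2}}
   = \hat\sigma^{2}.
\end{align*}
```

Under these assumptions the uncorrected draws carry an extra h squared of variance,
while the corrected draws reproduce the variance of the observed sample, which in a
small sample can itself fall below the population variance.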
Conclusions
The bootstrap methodology allows the performance of classical and robust estimators to
be evaluated in real-world data sets. However, much still remains unknown about the
finite sample properties of the bootstrap. In particular, small sample sizes can cause
the bootstrap to fail and to give poor coverage error (type I error rates) for a number
of the more popular methods for calculating confidence intervals (BCa, the studentized
bootstrap t, and the percentile bootstrap). In contrast, the smoothed bootstrap holds
promise for a number of robust estimators (the median, L-estimators, M-estimators, and
quantile estimators) in small sample settings (i.e. approximately N < 15).

However, it is evident that proper selection of the smoothing parameter (h) is important
so that neither oversmoothing nor undersmoothing occurs. Various approaches to choosing h
have been tried, for example adaptive estimators (Silverman, 1986, page 48). Like robust
estimators, smoothing the empirical distribution function can reduce the impact of heavy
tails on a location estimator. Robust estimators such as M-estimators optimally
downweight the tails according to a statistical criterion (maximum likelihood) for a
given set of tuning constants. Recommended tuning constants work well with a wide range
of distributions found in real data (Hoaglin, Mosteller, & Tukey, 1983). The combined use
of the smoothed bootstrap with an M-estimator as a location estimate calls for an optimal
combination of the tuning constants (e.g. k) for robust location and the smoothing
parameter (e.g. h) for the smoothed bootstrap. One computational approach towards this
goal would be to try various combinations of h and k, and choose the combination that
produces the shortest possible confidence intervals while minimizing the coverage error
under the null hypothesis. Used in this way, the parameters h and k become calibration
coefficients (Polansky, 2001, page 822). Polansky (2001) reports theoretical and
simulation results that support this approach for the choice of h in small sample sizes
(N < 20).

In summary, the bootstrap has become such an important tool, both theoretically and in
applications, that it has led Peter Hall, an eminent figure in the bootstrap research
field, to comment: "The bootstrap has had a great impact on the practice of statistics,
to the extent that the property of being bootstrappable might well be added to those of
efficiency, robustness and ease of computation, as a fundamentally desirable property
for statistical procedures in general" (Brown, Hall, & Young, 2001).
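The grid search over h and k described above can be sketched as a small simulation.
This is only an outline under stated assumptions, not Polansky's procedure: the Huber
estimator is a simple hand-written version with the scale fixed at the MAD, h is
expressed as a multiple of an assumed plug-in value, the data are standard normal with
N = 12, the nominal level is 90%, and the numbers of simulations and resamples are kept
very small so the example runs quickly. All function names are hypothetical.

```r
# Illustrative sketch only; every name and setting here is an assumption.

# Huber M-estimate of location by iteratively reweighted averaging,
# with tuning constant k and the scale held fixed at the MAD.
huber_loc <- function(x, k = 1.5, tol = 1e-6, max_iter = 50) {
  mu <- median(x)
  s <- mad(x)
  if (s == 0) return(mu)
  for (i in seq_len(max_iter)) {
    w <- pmin(1, k * s / abs(x - mu))   # weight 1 within k*s of mu, less outside
    mu_new <- sum(w * x) / sum(w)
    done <- abs(mu_new - mu) < tol * s
    mu <- mu_new
    if (done) break
  }
  mu
}

# Two-sided percentile interval from a Gaussian-kernel smoothed bootstrap.
smooth_pct_ci <- function(x, h, k, B = 99, alpha = 0.10) {
  n <- length(x)
  est <- replicate(B, huber_loc(sample(x, n, replace = TRUE) + h * rnorm(n), k))
  quantile(est, c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

# Monte Carlo coverage and mean interval length for one (h multiplier, k) pair.
evaluate <- function(h_mult, k, n = 12, n_sim = 60, true_loc = 0) {
  res <- replicate(n_sim, {
    x <- rnorm(n, mean = true_loc)
    h <- h_mult * 1.06 * sd(x) * n^(-1/5)   # h_mult = 0 is the ordinary bootstrap
    ci <- smooth_pct_ci(x, h, k)
    c(covered = ci[1] <= true_loc && true_loc <= ci[2], len = ci[2] - ci[1])
  })
  c(coverage = mean(res["covered", ]), mean_length = mean(res["len", ]))
}

set.seed(7)                                  # arbitrary seed
grid <- expand.grid(h_mult = c(0, 0.5, 1), k = c(1.0, 1.5))
results <- cbind(grid, t(mapply(evaluate, grid$h_mult, grid$k)))
print(results)
```

One would then look for the rows whose coverage is closest to the nominal 0.90 and,
among those, the shortest mean interval length. With so few simulations the estimates
are noisy, so n_sim and B should be raised considerably before drawing any conclusion.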
References
Brown, B.M., Hall, P., & Young, G.A. (2001). The smoothed median and the bootstrap.
Biometrika, 88(2), 519-534.
Chernick, M.R. (1999). Bootstrap Methods: A Practitioner's Guide. John Wiley and Sons
Inc.: New York.
Fernholz, L.T. (1993). Smoothed versions of statistical functionals. In: New Directions
in Statistical Data Analysis and Robustness. Eds: S. Morgenthaler, E. Ronchetti, W.A.
Stahel. Birkhauser Verlag, Boston.
Fernholz, L.T. (1997). Reducing the variance by smoothing. Journal of Statistical
Planning and Inference, 57, 29-38.
Hall, P., DiCiccio, T., & Romano, J. (1989). On smoothing and the bootstrap. The
Annals of Statistics, 17(2), 692-704.
Hoaglin, D.C., Mosteller, F., & Tukey, J.W. (1983). Understanding robust and
exploratory data analysis. New York: Wiley.
Polansky, A.M. (2001). Bandwidth selection for the smoothed bootstrap percentile
method. Computational Statistics and Data Analysis, 36, 333-349.
Polansky, A.M., & Schucany, W.R. (1997). Kernel smoothing to improve bootstrap
confidence intervals. J.R. Statist. Soc. B, 59(4), 821-838.
Polansky, A.M. (2000). Stabilizing bootstrap-t confidence intervals for small samples.
The Canadian Journal of Statistics, 28, 501-516.
Silverman, B.W. (1986). Density estimation for statistics and data analysis. Chapman
and Hall, London.
Silverman, B.W., & Young, G.A. (1987). The bootstrap: To smooth or not
to smooth. Biometrika, 74, 469-479.