RSS Matters
An Introduction to Robust Measures of Location Using GNU S-Plus
By Dr. Rich Herrington, Research and Statistical Support Consultant
This month we demonstrate the use of robust estimators of location using the GNU S-Plus language, "R". R is
a statistical programming environment that is a clone of the S and S-Plus language developed at Lucent
Technologies. In the following document we will illustrate the use of a GNU Web interface to the R engine
on the "rss" server, http://rss.acs.unt.edu/cgi-bin/R/Rprog. This GNU Web
interface is a derivative of the "Rcgi" Perl scripts available for download from the CRAN Website,
http://www.cran.r-project.org (the main "R" Website). Scripts
can be submitted interactively, edited, and re-submitted with changed parameters by selecting the hypertext
link buttons that appear below the figures. For example, clicking one of these buttons
opens a new window with a "browser entry form" where the program code that has been submitted is
displayed. The script can be edited and re-submitted to produce a new program output. Scrolling down the
browser window displays text from the program execution. Selecting the "Display Graphic" link will open
another browser window where graphics will be displayed. Readers are encouraged to change program
parameters to see what the effect will be on results.
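For instance, a short script such as the following (a hypothetical example; any R code can be pasted into the entry form) prints text output to the browser window and produces a graphic that appears under the "Display Graphic" link:

  # Generate 100 standard normal deviates
  x <- rnorm(100)
  # Print a numerical summary to the program output
  summary(x)
  # Produce a histogram; the plot appears under the "Display Graphic" link
  hist(x)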
Introduction to Robust Estimation
Conventional wisdom has often promoted the view that standard ANOVA techniques are robust to non-normality.
However, this view holds only with respect to Type I error (Wilcox, 1998). When it is assumed that there are no
differences between groups in a group-difference testing setting, the probability level corresponding
to the critical cut-off score used to reject the null hypothesis is found to be close to the nominal
level of .05. However, numerous journal articles have pointed out that standard methods are not robust when
differences exist (Hampel, 1973; Tukey, 1960). As early as 1960, it was known that slight deviations away
from normality could have a large negative impact on power whenever means were being compared, and that
popular measures of effect size could be misleading (Tukey, 1960). Later, a theory of robustness was developed
by Huber (1964) and Hampel (1968). Today, there is a well-established mathematical foundation for dealing
with these issues (Huber, 1981; Rousseeuw & Leroy, 1987). Moreover, basic coverage of the theory and the
use of computer software in performing robust analyses can be found in introductory textbooks (Wilcox,
1997, 2001).
Dealing with Outliers
It is often assumed in the social sciences that data conform to a normal distribution. Numerous studies have
examined real-world data sets for conformity to normality, and have strongly questioned this assumption
(Hampel, 1973; Tukey, 1960; Micceri, 1989; Stigler, 1977). Sometimes we may believe that a normal
distribution is a good approximation to the data, and at other times we may believe this to be only a rough
approximation. Two approaches have been taken to incorporate this reality. One approach is a two-stage
process whereby influential observations are identified and removed from the data. So-called outlier
analysis involves the calculation of leverage and influence statistics to help identify influential
observations (Rousseeuw & Leroy, 1987). The other approach, robust estimation, involves calculating
estimators that are relatively insensitive to the tails of a data distribution, but which conform to the normal
theory approximation at the center of the data distribution. These robust estimators fall somewhere between
a nonparametric, or distribution-free, approach and a parametric approach. Consequently, a robust approach
distinguishes between plausible distributions the data may come from, unlike a nonparametric approach,
which treats all possible distributions as equal. The positive aspect of this is that robust estimators are
very nearly as efficient as the best possible (optimal) estimators (Huber, 1981). It also becomes
possible to get a sense of how much the normality assumption can be violated before inferences are
compromised.
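A minimal sketch of the two-stage approach in R, using the simple boxplot rule as a stand-in for formal leverage and influence statistics:

  # 50 standard normal observations plus two gross outliers
  x <- c(rnorm(50), 8, -9)
  # Tukey's boxplot rule flags points beyond 1.5 interquartile
  # ranges of the quartiles
  out <- boxplot.stats(x)$out
  x.clean <- x[!(x %in% out)]
  mean(x)        # distorted by the outliers
  mean(x.clean)  # mean after the two-stage removal
  median(x)      # a robust estimator needs no removal step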
Symmetric and Asymmetric Distributions
Historically, statisticians have focused on estimators that assume symmetry in the population. The reason
for this is that estimators of location are best understood when a distribution's natural candidates for
location (e.g., mean, median, mode) all nearly coincide. Additionally, when a distribution is treated in a
symmetric way so that no bias arises, a trade-off is not needed between bias and variability (e.g.,
M-estimators with odd influence functions are unbiased estimators whenever the distribution is symmetric).
Moreover, whenever a distribution is admitted as skewed, there is some question as to what measure of
location we are trying to estimate. That is, asymmetric distributions do not have a natural location
parameter analogous to the center of symmetry of a symmetric distribution (Hoaglin, Mosteller, & Tukey, 1983).
It is a common practice to re-express the data with a functional transformation (e.g., a
log transformation) so that the data more nearly resemble a symmetric distribution. Often, if the
departure from symmetry is not too large, estimators that rely on symmetry are found to still be
satisfactory (Hoaglin, Mosteller, & Tukey, 1983). Quantile-quantile plots can aid in the
assessment of skewness (see below). In the case of M-estimators of location, we would like the M-estimate
to be an unbiased, robust estimate of the population mean. This goal can be realized in the case of a
symmetric distribution.
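A short R sketch of this assessment: lognormal data are markedly right-skewed, and a log re-expression brings the quantile-quantile plot close to the reference line.

  # A right-skewed (lognormal) sample
  x <- rlnorm(200)
  # Normal quantile-quantile plot: points bow away from the line
  qqnorm(x); qqline(x)
  # After a log re-expression the points fall near the line
  qqnorm(log(x)); qqline(log(x))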
The Contaminated Normal
For example, Tukey (1960) showed that classical estimators are quite sensitive to distributions which have
heavy tails. The approach Tukey took was to sample from a continuous distribution called the contaminated
normal (CN). The contaminated normal is a mixture of two normal distributions: one is standard normal, and
the other has a much larger variance. The contaminated normal has tails which are heavier,
or thicker, than those of the normal distribution. This can be illustrated by the use of quantile-quantile
plots, in which the empirical quantiles of a data set are graphed against the theoretical quantiles of a
reference distribution (e.g., the normal distribution). Deviations away from the straight line indicate
deviations away from the reference distribution. In the figure below, the quantile-quantile plot illustrates
a heavy-tailed distribution.
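The contaminated normal is easy to sample from in R; a minimal sketch, using the 90%/10% mixture described below, together with its quantile-quantile plot:

  # With probability eps draw from N(0, 10), otherwise from N(0, 1)
  rcn <- function(n, eps = 0.10) {
    heavy <- rbinom(n, 1, eps) == 1
    ifelse(heavy, rnorm(n, 0, 10), rnorm(n, 0, 1))
  }
  x <- rcn(1000)
  # The tails bend away from the reference line
  qqnorm(x); qqline(x)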
Robust estimators are considered resistant if small changes in many of the observations, or large changes in
only a few data points, have a small effect on their value. For example, the median is considered an example
of a resistant measure of location, while the mean is not. In the figure below, the sampling distributions
of the mean and median are plotted when sampling from the contaminated normal (CN) distribution. Sampling
occurred from the CN distribution with a 90% probability of sampling from N(0, 1) and a 10%
probability of sampling from N(0, 10); the population mean of this CN is zero. Notice that there is
substantially more variability in our estimate of the population mean when using the sample mean than when
using the sample median. Also, the sample median is closer to the population mean of zero than is the
sample mean.
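A small simulation along these lines (a sketch, not the original program used for the figure):

  # 2000 samples of size 20 from the contaminated normal
  B <- 2000; n <- 20
  sim <- replicate(B, {
    x <- ifelse(rbinom(n, 1, 0.10) == 1, rnorm(n, 0, 10), rnorm(n, 0, 1))
    c(mean = mean(x), median = median(x))
  })
  # Variance of each sampling distribution: the mean's is far larger
  apply(sim, 1, var)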
The Trimmed Mean
One problem with the median, however, is that its value is determined by only one or two values in the data
set, so information from the remaining observations is lost. The trimmed mean represents a compromise between
the mean and the median (Huber, 1981). The trimmed mean is computed by putting the observations in order,
trimming the d largest and d smallest observations, and then computing the average of the remaining numbers;
d can be between 0 and n/2. Trimming the maximum amount yields the sample median. A common rule of thumb is
that 20%-25% trimming (d = .2*n for 20%) works well in a wide range of settings (Wilcox, 1997). Another
approach to selecting the trimming amount is to calculate the trimmed mean for trimming proportions of 0,
.10, and .20, and then use the trimming proportion whose estimate has the smallest estimated standard error
(Leger & Romano, 1990).
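In R, the trim argument of mean() performs this computation directly; a small illustrative example:

  # Nine well-behaved values plus one gross outlier
  x <- c(2, 3, 4, 5, 6, 7, 8, 9, 10, 95)
  mean(x)               # pulled upward by the outlier
  mean(x, trim = 0.20)  # drops the 2 smallest and 2 largest values first
  median(x)             # maximal trimming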
M-Estimators
The trimmed mean is based on a preset amount of trimming. A different approach is to determine the necessary
amount of trimming empirically. If the data come from a normal distribution, then light or no trimming is
appropriate. If the data come from a heavy-tailed distribution, then a heavier amount of trimming is desired
in both tails. If the distribution has a heavy right tail, then more trimming might be desired from the
right tail; if the distribution has a heavy left tail, more trimming from the left tail might be
appropriate. Essentially, M-estimators accomplish this appropriate amount of trimming by meeting certain
statistical criteria for what is considered a good estimator (i.e., the maximum-likelihood principle). For the
M-estimator, the degree of trimming is determined by a trimming constant, k.
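One readily available implementation is the huber() function in the MASS package distributed with R; a minimal sketch:

  library(MASS)  # provides huber()
  # Normal data plus two gross outliers
  x <- c(rnorm(40), 15, -20)
  # Huber M-estimate of location; k is the trimming constant
  huber(x, k = 1.5)  # returns $mu (location) and $s (scale)
  mean(x)            # for comparison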
Desirable Properties of a Robust Estimator
A good robust estimator is asymptotically consistent and unbiased (the estimator converges on the true
population value as sample size increases). Additionally, a good robust estimator should be efficient when
the underlying distribution is normal, but still be relatively efficient when the tails of the distribution
deviate from normality. That is, the variance of the sampling distribution for the estimator should be
small whether we are sampling from a normal or a non-normal distribution. When sampling data from a normal
distribution, the mean is a minimum variance estimator. That is, the mean is considered an optimal estimator
because the variance of its sampling distribution is as small as possible, assuming an underlying normal
distribution. While the mean is an optimal estimator under normality, it does not possess other
characteristics which are associated with a good estimator. Whenever sampling from a non-normal
distribution, the mean can lose many of the properties which make it an optimal estimator. Efficient
estimators exist for situations where non-normality is present. These estimators are referred to as robust
estimators.
Comparing Estimators - Asymptotic Relative Efficiency
Efficiency refers to the variance of the sampling distribution for the estimator: high-efficiency
estimators have small variance in their sampling distributions. Efficiency affects the
power of a test procedure, in that less variance in the sampling distribution of the estimator being
tested leads to higher power for the statistical test. There are two ways of viewing efficiency. Finite
sample efficiency refers to the variance of the sampling distribution for the estimator as it is applied in
small sample settings. Asymptotic efficiency refers to the way an estimator performs as the sample size
gets larger. It is common practice to compare estimators to one another using Asymptotic Relative
Efficiency (ARE). For a fixed underlying distribution, we define the Relative Efficiency (RE) of one
estimator to another as the ratio of the variances of the two estimators; ARE is the limiting value of RE
as the sample size goes to infinity. For example, to compare the efficiencies of the mean and median, one
would sample from a fixed underlying distribution (e.g., the normal distribution) with a fixed sample size,
then divide the variance of the median into the variance of the mean. As the sample size increases, this
ratio converges to the ARE of the two estimators. In this way, estimators can be compared with respect to
the different types of non-normality that are found in data analysis settings.
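This ratio is easy to approximate by simulation; at the normal distribution, the variance ratio for the median relative to the mean approaches the well-known ARE of 2/pi (about .64):

  # Sampling distributions of the mean and median at the normal
  n <- 100; B <- 5000
  est <- replicate(B, { x <- rnorm(n); c(mean(x), median(x)) })
  # Relative efficiency of the median to the mean
  var(est[1, ]) / var(est[2, ])  # close to 2/pi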
Robustness Properties: High Breakdown and Resistance
The breakdown point is the largest percentage of data points that can be changed arbitrarily without unduly
influencing the estimator (e.g., a location parameter). For example, the median has 50% breakdown. That is,
for 100 rank-ordered data points, the first 49 points can be changed arbitrarily, such that the values are
still less than the median, and the median will not change. The mean is not considered a robust estimator
because arbitrarily changing even one observation can greatly influence it. This implies that the mean has a
breakdown of (1/n) x 100. As n increases, the breakdown of the mean shrinks toward zero. In comparison, the
median has a much higher breakdown than the mean and, as such, is considered a more robust estimate of
location.
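A brief demonstration of this in R:

  x <- 1:11
  mean(x); median(x)  # both equal 6
  x[11] <- 10000      # corrupt a single observation
  mean(x); median(x)  # the mean jumps to about 914; the median is still 6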
A Comparison of Four Robust Estimators of Location
The median has a breakdown of 50%. The trimmed mean has a breakdown that corresponds to the degree of
trimming utilized; for example, a 20% trimmed mean has a breakdown of 20%. The mean has a breakdown
of (1/n) x 100, where n is the sample size. For Huber-type estimators, the breakdown depends on the
trimming constant k. In the figure below, the sampling distributions of the sample mean, sample trimmed
mean, sample M-estimator, and sample median are plotted. Sampling occurred from the CN distribution
with a 90% probability of sampling from N(0, 1) and a 10% probability of sampling from N(0, 10), with a
population mean of zero. We see that the sample median, sample M-estimate, and sample trimmed mean are all
considerably closer to the population mean of zero. Additionally, there is less variability in these
estimates than in the sample mean.
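A simulation in the same spirit as the figure (a sketch, not the original program) comparing the four estimators under the same CN mixture:

  library(MASS)  # for huber()
  B <- 2000; n <- 20
  est <- replicate(B, {
    x <- ifelse(rbinom(n, 1, 0.10) == 1, rnorm(n, 0, 10), rnorm(n, 0, 1))
    c(mean = mean(x), trim20 = mean(x, trim = 0.20),
      huber = huber(x)$mu, median = median(x))
  })
  # The sample mean shows by far the largest sampling variance
  apply(est, 1, var)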
References
Hampel, F.R. (1973). Robust estimation: A condensed partial survey. Zeitschrift für
Wahrscheinlichkeitstheorie und Verwandte Gebiete, 27, 87-172.
Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J., & Stahel, W.A. (1986). Robust statistics.
New York: Wiley.
Hoaglin, D.C., Mosteller, F., & Tukey, J.W. (1983). Understanding robust and exploratory
data analysis. New York: Wiley.
Huber, P.J. (1981). Robust statistics. New York: Wiley.
Leger, C., & Romano, J.P. (1990). Bootstrap adaptive estimation: The trimmed-mean example.
The Canadian Journal of Statistics, 18(4), 297-314.
Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures.
Psychological Bulletin, 105(1), 156-166.
Rousseeuw, P.J., & Leroy, A.M. (1987). Robust regression and outlier detection.
New York: Wiley.
Sawilowsky, S.S., & Blair, R.C. (1992). A more realistic look at the robustness and
Type II error properties of the t test to departures from population normality.
Psychological Bulletin, 111, 352-360.
Stigler, S.M. (1977). Do robust estimators work with real data? Annals of Statistics,
5, 1055-1098.
Tukey, J.W. (1960). A survey of sampling from contaminated distributions. In I. Olkin (Ed.),
Contributions to probability and statistics. Stanford, CA: Stanford University Press.
Wilcox, R.R. (2001). Fundamentals of modern statistical methods. New York:
Springer-Verlag.
Wilcox, R.R. (1998). How many discoveries have been lost by ignoring modern statistical
methods? American Psychologist, 53(3), 300-314.