Section: Research Methods in Psychiatry
DO YOU SEE WHAT I MEAN? INDICES OF CENTRAL TENDENCY



There are many indices of the middle, or central tendency, of a set of numbers, including the mode, median, and mean. Indeed, there are several "means," of which the arithmetic mean is only one. When data are skewed, or when there are outliers at one or both ends of the distribution that may distort the results, "robust" estimators of the mean, such as the trimmed mean or the bisquare weight mean, give better results than does the arithmetic mean. If the data reflect growth over time, the geometric mean is a more accurate reflection of the middle point than are other indices, and in determining sample size when the sample size varies among groups, the harmonic mean is the one of choice. Finally, this paper discusses the difference between the lay and statistical use of the term "average" and how this difference can lead to problems in interpretation.

(Can J Psychiatry 2000;45:833-836)

Key Words: mean, median, mode, average, statistics, harmonic mean, geometric mean, robust estimators, biweight mean, trimmed mean

A recent headline in a leading Canadian paper stated, with unfeigned horror, that one-half of the hospitals in Toronto were below average. Needless to say, this brings to mind Garrison Keillor's fabled town of Lake Wobegon, where "the women are strong, the men are beautiful, and all the children are above average." It also illustrates how difficult it is to come to grips with the seemingly simple concept of "average." The purpose of this article is threefold: to try to explain the difference between the lay and the statistical concepts of average, to review the more common indices of average, and to introduce some lesser-known but very useful ways of calculating averages.

The Lay and Statistical Definitions

To the statistician, "average" has a very definite meaning: a number, determined through some arithmetic operations, that best describes the middle--or "central tendency"--of a group of numbers. So, if 2 statisticians are given the same set of numbers and told to compute the same measure of central tendency, they will come up with identical answers (assuming they make no computational errors along the way--a tenuous assumption, at best). This number will divide the original group of numbers roughly in half, although as we'll see, "half" can refer to the number of numbers, or to the "weight" of the numbers, or to some other criterion. In any case, we would be very surprised if all of the numbers fell above or below this index of central tendency; it wouldn't be central then. To the lay person, though, "average" has a very different meaning, implying an "ordinary standard" (1) or "good enough." In this light, neither the headline nor Keillor's description is so surprising; they do actually reflect a psychological truth--people may believe that all (or none) of the hospitals meet their standards for adequate care, irrespective of the fact that some hospitals are better than others, and all of the children in Lake Wobegon are doing more, or better, than just "good enough" work.

The difficulty arises when we confuse these meanings of "average." Using the statistical definition, one-half of the physicians graduating each year are below average. Even if we were to extend medical school by another 5 years to cram more facts into their already-overstuffed brains and raise the mean score on their qualifying exams by 50 points, one-half of the students will fall below this new mean. This is an inescapable fact of the strict definition of "average," and no amount of training (or cheating) will get around it. If, on the other hand, we mean "average" as "meeting acceptable standards," then it is quite conceivable that all of the graduates are above this criterion (it is equally possible, logically, that all are below it). The difference is best illustrated by the mathematical dictum that people have, on average, fewer than 2 legs. Since nobody has more than 2 legs, the amputation of even 1 leg reduces the average in the population to something less than 2 legs. The major problem occurs when a study defines "average" using statistical criteria but interprets it according to lay criteria, or vice versa.

The Usual Suspects--Mean, Median, and Mode

When we use the term "mean" or "average" statistically, what occurs to most people (including statisticians) is to add up all of the numbers and divide by the number of numbers, n; in statistical notation:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

where Σ (the Greek capital letter sigma) means "to sum." Strictly speaking, this is called the "arithmetic mean" (AM), to differentiate it from some other means we'll discuss later, and is often referred to by the cognoscenti as "X-bar." Because many word processors have difficulty producing X (with a bar over it), a new symbol to denote the mean has been introduced: M, which adequately accommodates the machines invented to serve us.

If the distribution of the numbers is symmetrical, then the number of numbers above and below the mean is equal. But, more generally, the mean is the average of the "weight" or magnitude of the numbers. That is, the mean of 4, 5, and 9 is 6. In this case, 4 is two units below 6 and 5 is one unit below, balancing the 9, which is three units above. As we shall see, this can be a disadvantage of the AM in some situations.
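
A brief sketch in Python (not part of the original article) illustrates this balancing property with the same 3 numbers:

from statistics import mean

scores = [4, 5, 9]
am = mean(scores)                      # (4 + 5 + 9) / 3 = 6.0

# The deviations from the mean balance out: the numbers below the mean
# exactly offset the number above it.
deviations = [x - am for x in scores]  # [-2.0, -1.0, 3.0]
print(am, sum(deviations))             # 6.0 0.0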

In some instances, numbers are used to indicate ranks of objects or people rather than absolute amounts; the statistical term for these data is ordinal, reflecting the fact that the numbers put the observations in rank order. We cannot assume, however, that the distances between successive ranks are equal. For example, breast cancer is usually staged as I, II, III, and IV, but the amount of disease progression from stage I to stage II is not necessarily the same as between stages II and III or between III and IV. Consequently, it doesn't make sense to talk of the "average" stage of cancer of 100 women. For these data, we use a different index of central tendency, called the median. The median is the number (or rank) such that 50% of the cases fall above it and 50% below.

In Statistics 101 we learn that we use the mean when the data have equal intervals between successive numbers (interval or ratio data--let's not worry about the distinction) and the median for ordinal data. Now that we've learned the rules, we can break them. If the data deviate from symmetry to a large degree (that is, they are skewed, with a much longer tail at one end of the distribution than the other), then the mean may give a misleading representation of the "average." For example, if 9 physicians each net $100 000 yearly and a 10th earns $300 000, then the average is $120 000, which is 20% greater than the average for 90% of the sample. In this case, the median is a more accurate reflection of the central tendency than is the mean. The take-home message is that when the data are highly skewed, use the median.
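
A short Python sketch, using the hypothetical incomes from the example above, shows how the outlier pulls the mean but not the median:

from statistics import mean, median

# Nine physicians netting $100,000 each and a 10th netting $300,000
incomes = [100_000] * 9 + [300_000]

print(mean(incomes))    # 120000 -- pulled upward by the single high earner
print(median(incomes))  # 100000 -- matches what 9 of the 10 physicians earn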

Conversely, most of the scales used in psychiatry to measure affective states, such as the Beck Depression Inventory (BDI) or the State-Trait Anxiety Inventory, represent, in fact, ordinal data. That is, we cannot assume that the increase in the underlying depression reflected in a difference in BDI score from 5 to 10 is the same as the difference between 20 and 25. Most of the time, the measure of central tendency used with these scales is the mean, and unless the data are highly skewed, this choice is appropriate. So when, with ordinal data, can we switch from using the median (cancer staging) to the mean (the BDI)? The answer is a definite "We don't know." A rule of thumb is that if the scale yields more than 10 possible numbers, it's fairly safe to use the mean. Indeed, it's often used with Likert scales (which rank responses; for example, strongly agree, agree, neutral, disagree, strongly disagree) that have 5 or 7 points, as long as the distribution of scores is relatively normal or flat. The reason people prefer the mean to the median is that you can do much more with it; many more statistical tests use the mean than use the median, so we can perform more sophisticated analyses.

Last (and definitely least), the mode is used with nominal (that is, named) data, such as sex, diagnosis, race, eye colour, political affiliation, and the like, where there is no rank ordering among the categories (sexists and racists to the contrary notwithstanding). The mode is simply the most frequently occurring category and suffers from 3 major problems. First, it is very unstable; unless one category is clearly dominant, adding new subjects to the sample can shift the mode from one category to another quite unpredictably. Second, 2 or more categories often have nearly equal numbers of subjects, so that assigning primacy of place to just one of them is artificial. Finally, even fewer statistical tests involve the mode than the median; once we've specified the most frequently occurring category, there is little else we can do with it.
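
The first 2 problems are easy to demonstrate with a small, invented example. The sketch below (with made-up diagnostic categories) uses Python's multimode function, which reports ties explicitly:

from statistics import multimode

# Hypothetical diagnoses for a small sample
diagnoses = ["MDD", "MDD", "GAD", "GAD", "bipolar"]
print(multimode(diagnoses))   # ['MDD', 'GAD'] -- two categories tie for the mode

# Adding a single new subject shifts the mode entirely
diagnoses.append("GAD")
print(multimode(diagnoses))   # ['GAD']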

Variations on a Theme--Geometric and Harmonic Means

It is so ingrained to think of the AM when the term "average" is used that many people find it difficult to believe that there are, in fact, several different means and that, in some situations, these alternatives are preferable. One of these, the geometric mean (GM), is defined as:

$$GM = \sqrt[n]{\prod_{i=1}^{n} X_i} = \sqrt[n]{X_1 \times X_2 \times \cdots \times X_n}$$

where Π (the Greek capital letter pi) means the product of all the Xs (that is, X_1 × X_2 × ... × X_n) and the n before the radical indicates the nth root. For example, the geometric mean of 3, 4, and 5 is:

$$GM = \sqrt[3]{3 \times 4 \times 5} = \sqrt[3]{60} = 3.91$$

The GM finds most use when we look at data that increase exponentially over time. For example, Figure 1 shows the Medicare expenditure for outpatient hospital services in the US in 1980, 1985, and 1990. If we had data only for the years 1980 and 1990 and used the AM to estimate the 1985 expenditure, the result would be the value indicated by the open circle. Using the GM, however, the result would be the filled square, which is much closer to the actual amount, represented by the line. The AM assumes that the change between 1980 and 1990 is best approximated by a straight line (the equal interval assumption of interval and ratio data), whereas in reality it follows an exponential growth curve. Because the GM is always smaller than the AM (except when all the values are the same, in which case the AM and GM give identical results), it gives a more accurate estimate with data of this sort, which we often encounter in looking at changes in populations over time (remember Malthus?). Be aware, though, that if any of the numbers is 0, your computer will have an infarct, because the GM is usually calculated by way of logarithms and the logarithm of 0 is undefined; and if any number is negative, the results won't make sense (if an odd number of values is negative, an even more massive infarct will occur).
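
The difference between the two estimates is easy to reproduce. The sketch below uses made-up figures that roughly double every 5 years, not the actual Medicare amounts plotted in Figure 1:

from statistics import geometric_mean, mean

# Illustrative expenditures (in billions) that double every 5 years
spend_1980, spend_1990 = 10.0, 40.0

print(mean([spend_1980, spend_1990]))            # 25.0 -- straight-line estimate of 1985
print(geometric_mean([spend_1980, spend_1990]))  # 20.0 -- matches the exponential curve exactly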

The harmonic mean (M_H) is defined as:

$$M_H = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}}$$

Using the same numbers as before, the M_H of 3, 4, and 5 is:

$$M_H = \frac{3}{\frac{1}{3} + \frac{1}{4} + \frac{1}{5}} = \frac{3}{0.7833} = 3.83$$

which is smaller than both the AM of 4 and the GM of 3.91. This is always the case: unless all the numbers are identical (in which case the 3 means are identical), the AM yields the largest number, the M_H the smallest, and the GM is somewhere in between. The harmonic mean is used primarily in figuring out the mean sample size across groups, where each group has a different number of subjects.
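
For completeness, a short Python sketch; the three unequal group sizes are invented for illustration:

from statistics import harmonic_mean, mean

print(harmonic_mean([3, 4, 5]))   # 3.83 -- smaller than the GM (3.91) and the AM (4)

# Mean sample size across groups of unequal size
group_ns = [12, 20, 45]
print(harmonic_mean(group_ns))    # about 19.3
print(mean(group_ns))             # about 25.7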

Robust Estimators of the Mean

Let's take a closer look at some of the implicit assumptions of the various means and the median. With the mean, each number contributes equally to the final result. In other words, it is as if we multiplied each number by 1. As we saw, though, this can be a disadvantage when the data are skewed--very extreme values can pull the mean away from where the bulk of the data are. To get around this problem, we use the median, or the one central value that divides the number of data values into equal halves. In this case, it is tantamount to multiplying that single value of the median by 1, and all of the other numbers by 0. The disadvantage here is that we are disregarding almost all of the data. Even if we were to add 1000 to each of the values above the median or multiply the most extreme values by 1 000 000, the median itself would not shift; 50% of the values would still be above and 50% below it. This definitely seems to be an overreaction to the problem of the mean.

Over the years, several variants of the mean have emerged, called robust estimators, because extreme values influence them much less than they influence the AM; that is, they are "robust" against the effects of long tails on one or both sides of the distribution. They fall intermediately between the mean and the median: not all numbers are given a weight of 1, as is the case with the mean. By the same token, however, they make use of all or most of the data, which the median does not.

The simplest robust estimator is the trimmed mean. "Trimmed" refers to the fact that the extreme 5% or 10% of the data are discarded. One form of this statistic is symmetric: both ends of the distribution are trimmed equally. If the distribution itself is roughly symmetrical, then the trimmed mean and the AM will yield equivalent estimates. However, the standard deviation (SD) of the trimmed mean will be smaller (sometimes quite a bit smaller) than that of the AM. Consequently, because the SD is part of the error term in most statistical tests, you may find significance with the trimmed mean when you don't with the more traditional mean. Another form of the trimmed mean deletes data only from one end of the distribution, which is useful if the data are highly skewed. We often find this when there is a "barrier" at one end, where numbers cannot be smaller or larger than some value, but no barrier at the other end. For example, there is a natural lower limit on the number of days in hospital or in jail (that is, 0) but no upper limit. It is quite common to find that most people in a study cluster around a relatively low number of days, but a few people have been in hospital or behind bars for extended periods of time, highly skewing the data to the right (that is, toward longer times). It doesn't make sense to trim the lower end of the distribution, only the upper end. In this case, the trimmed mean and AM would differ from each other, and again the SD of the former is lower than that of the latter.
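
A sketch of both forms of trimming, using invented hospital-days data; SciPy's trim_mean trims symmetrically, so the one-sided trim is done by hand:

import numpy as np
from scipy import stats

# Invented hospital-days data: most stays are short, one is very long
days = np.array([2, 3, 3, 4, 4, 5, 5, 6, 7, 120])

print(days.mean())                  # 15.9 -- dragged upward by the 120-day stay
print(stats.trim_mean(days, 0.10))  # 4.625 -- symmetric trim of 10% from each end

# One-sided trim: discard only the top 10% (the long right tail)
upper_trimmed = np.sort(days)[: int(len(days) * 0.9)]
print(upper_trimmed.mean())         # about 4.3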

With the trimmed mean, we give 90% or 95% of the data a weight of 1 and assign a weight of 0 to the trimmed values. There is no reason for the weights to be either 0 or 1; they can take any value between these extremes. This is the rationale behind a large class of robust estimators, of which the best known is the bisquare weight mean, also referred to as the biweight mean (2). The name comes from the way it is calculated. First, we determine how much each value deviates from the AM, and then we square the result. If the squared deviation is less than some criterion (that is, it is near the mean), then the results are squared again (hence, "bisquare"), and this becomes the weight for that value; the nearer to the mean, the closer the weight is to 1. If the deviation exceeds the criterion, then the weight is 0. The mean is then recalculated, using the original values multiplied by their weights. The procedure is repeated using this new estimate of the mean until further iterations have a minimal effect on the mean. In essence, this (and other robust estimators) gives higher weight to numbers that are near the mean and lower (or no) weight to numbers that deviate considerably from it.

There are 2 limitations of the biweight mean. First, the need to iterate makes it quite labour intensive; it must, therefore, be done on a computer or programmable calculator. Second, the user must supply the values for 2 constants used in the formula and also determine when the change in the mean from one iteration to the next is small enough to stop. Mosteller and Tukey provide guidelines for these decisions (2), but others have criticized the subjectivity involved (3). In most situations, though, different values for the constants do not affect the final results very much.
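
A rough sketch of such an iterative bisquare estimator is shown below. The tuning constant, the scale estimate (here, the median absolute deviation), and the stopping rule are the user-supplied choices mentioned above; the particular values follow common practice and are not necessarily those recommended by Mosteller and Tukey (2).

import numpy as np

def biweight_mean(x, c=6.0, tol=1e-6, max_iter=50):
    """Iterative bisquare (biweight) mean -- an illustrative sketch."""
    x = np.asarray(x, dtype=float)
    m = np.median(x)                           # start from a resistant estimate
    for _ in range(max_iter):
        spread = np.median(np.abs(x - m))      # median absolute deviation as a resistant spread
        if spread == 0:
            spread = 1.0                       # guard against a zero spread
        u = (x - m) / (c * spread)             # scaled deviation from the current estimate
        w = np.where(np.abs(u) < 1, (1 - u**2) ** 2, 0.0)  # weights near 1 close to m, 0 far away
        new_m = np.sum(w * x) / np.sum(w)      # recalculate the mean with these weights
        if abs(new_m - m) < tol:               # stop when further iterations barely change it
            break
        m = new_m
    return m

values = [3, 4, 4, 5, 5, 6, 40]   # one wild value
print(np.mean(values))            # about 9.57 -- pulled toward 40
print(biweight_mean(values))      # about 4.5 -- stays with the bulk of the data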

Summary

Measures of central tendency should not be used blindly, nor should we necessarily trust the one printed out by a computer program. The choice depends on the type of data (nominal, ordinal, or interval and ratio); the distribution of the data (symmetrical or skewed); the presence or absence of outliers; and whether the trend is linear, exponential, or some other shape. Nothing takes the place of looking at the data, knowing what they represent, and using good judgement.

Manuscript received February and accepted May 2000.

This is the 18th article in the series on Research Methods in Psychiatry. For previous articles please see Can J Psychiatry 1990;35:616-20, 1991;36:357-62, 1993;38:9-13, 1993;38:140-8, 1994;39:135-40, 1994;39:191-6, 1995;40:60-6, 1995;40:439-44; 1996;41:137-43, 1996;41:491-7, 1996;41:498-502, 1997;42:388-94, 1998;43:173-9, 1998;43:411-5, 1998;43:737-41, 1998;43:837-42, and 1999;44:175-9.

[sup1]Director, Kunin-Lunenfeld Applied Research Unit, Baycrest Centre for Geriatric Care; Professor, Department of Psychiatry, University of Toronto, Toronto, Ontario.

Address for correspondence: Dr DL Streiner, Kunin-Lunenfeld Applied Research Unit, Baycrest Centre for Geriatric Care, 3560 Bathurst Street, Toronto, ON M6A 2E1

e-mail: dstreiner@rotman-baycrest.on.ca

GRAPH: Figure 1. Medicare expenditures in 1980, 1985, and 1990, showing estimates of 1985 based on the arithmetic and geometric means.

References

1. Barber K, editor. The Canadian Oxford dictionary. Toronto: Oxford University Press; 1998.

2. Mosteller F, Tukey J. Data analysis and regression: a second course in statistics. Reading (MA): Addison-Wesley Publishing; 1977.

3. Glickman ME, Noether M. An examination of cross-specialty linkage applied to the resource-based relative value scale. Med Care 1997;35:843-66.


By David L Streiner, PhD[sup1]


