## Monday, May 24, 2010

### Effective Sample Size

It is Monday morning and you are traveling from home to work, a 20-mile journey, when you hit a traffic jam. Things are slow, traffic is doing about 20 mph; you start getting anxious about that meeting they scheduled for 8. After 10 miles of slow traffic the road suddenly clears up and you decide to travel the remaining 10 miles at a speed of 80 mph. You are late to the meeting and tell your boss what happened. She listens quietly and asks, "What was your average speed?"

Some people, when they hear the word average, automatically think of adding the two numbers, 20+80, and dividing the result by 2. The arithmetic mean of these two speeds is 50 mph. However, miles per hour (mph) is a ratio of two quantities, distance and time, and the arithmetic average is not appropriate here because you spent far more time at the slower speed. The first 10 miles, traveling at 20 mph, took you 30 minutes, while the last 10 miles, traveling at 80 mph, took you 7.5 minutes. Since you traveled 20 miles in 37.5 minutes, your average speed is 20*(60/37.5) = 32 mph.
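The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name `average_speed` and the list-of-legs representation are illustrative choices, not anything from the original post.

```python
def average_speed(legs):
    """Average speed over legs given as (distance_miles, speed_mph) pairs.

    Average speed is total distance divided by total time,
    not the plain average of the individual speeds.
    """
    total_distance = sum(distance for distance, _ in legs)
    total_hours = sum(distance / speed for distance, speed in legs)
    return total_distance / total_hours


# The Monday commute: 10 miles at 20 mph, then 10 miles at 80 mph.
commute = [(10, 20), (10, 80)]
print(f"{average_speed(commute):.1f} mph")  # prints "32.0 mph"
```

Working from total distance and total time makes it clear why the slow leg dominates: it accounts for 30 of the 37.5 minutes on the road.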

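The "average" of these two speeds is given not by the arithmetic mean but by the harmonic mean, which for two numbers is their product divided by their arithmetic average. This works here because the two legs cover the same distance. The "average" speed is then [20x80] / [(20+80)/2] = 32 mph. In general, for a set of n positive numbers x_1, ..., x_n, the harmonic mean H resembles the arithmetic average but in an inverted sort of way: it is the reciprocal of the arithmetic average of the reciprocals,

$$H = \frac{n}{\dfrac{1}{x_1} + \dfrac{1}{x_2} + \cdots + \dfrac{1}{x_n}}$$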
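For any number of positive values, the harmonic mean is the count divided by the sum of the reciprocals. A minimal sketch follows. The hand-written `harmonic_mean` function is illustrative, and it is compared against `statistics.harmonic_mean` from the Python standard library.

```python
import statistics


def harmonic_mean(values):
    """Harmonic mean: n divided by the sum of reciprocals of n positive numbers."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("harmonic mean needs at least one value, all positive")
    return len(values) / sum(1 / v for v in values)


speeds = [20, 80]
print(f"{harmonic_mean(speeds):.1f}")             # prints 32.0
print(f"{statistics.harmonic_mean(speeds):.1f}")  # prints 32.0
print(f"{statistics.mean(speeds):.1f}")           # prints 50.0 (arithmetic mean)
```

The harmonic mean is never larger than the arithmetic mean, and the gap widens as the values become more unequal.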

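The harmonic mean is useful for determining the effective sample size when comparing two population means. This is because the precision of an average, X̄, is given by its standard deviation (the standard error), σ/√n, which is inversely related to the square root of the sample size. Assuming both populations share a common standard deviation σ, the variance of the difference between two averages is σ²(1/n1 + 1/n2), and two samples of equal size n_eff give the same variance when 2/n_eff = 1/n1 + 1/n2. The effective sample size for comparing two means with sample sizes n1 and n2 is therefore given by

$$n_{\text{eff}} = \frac{n_1 n_2}{(n_1 + n_2)/2} = \frac{2}{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$$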

Let's say you are conducting a study to compare two products, A and B, using a two-sample t-test, and you take a random sample of 4 product A units and a random sample of 12 product B units (since you want to be "sure" about product B you take "extra" samples). The effective sample size for comparing the average of the 4 product A samples with the average of the 12 product B samples is then [4x12]/[(4+12)/2] = 6. What this means is that your study with 16 samples is equivalent to a study with 6 samples each from populations A and B, for a total of 12 samples. In other words, you are using 4 extra samples.
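The product A and B example can be verified with exact fractions. This is a sketch under the usual assumption that both populations have the same variance; the function name `effective_sample_size` is illustrative.

```python
from fractions import Fraction


def effective_sample_size(n1, n2):
    """Harmonic mean of two sample sizes: 2*n1*n2 / (n1 + n2)."""
    return Fraction(2 * n1 * n2, n1 + n2)


n_a, n_b = 4, 12
n_eff = effective_sample_size(n_a, n_b)
print(n_eff)  # prints 6

# With a common variance, Var(mean_A - mean_B) is proportional to 1/n1 + 1/n2.
unbalanced = Fraction(1, n_a) + Fraction(1, n_b)  # 4 and 12 units
balanced = 2 / n_eff                              # n_eff units per product
print(unbalanced == balanced)  # prints True

# Samples that add nothing compared with the balanced design.
print((n_a + n_b) - 2 * n_eff)  # prints 4
```

Both designs give a variance factor of 1/3, so the 16-sample unbalanced study is no more precise than a balanced study with 12 samples.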

As you can see, having a balanced number of samples (n1=n2) per population is not just a statistical nicety, but can save you materials, time, and money.