Standard Deviation: Why divide by (n-1) and not n?


Suppose that I am interested in the number of hours per day that high school students in North America spend doing their mathematics homework. The “population” of interest is all high school students in North America, a very large number of people. Let's call this number N, and write $x_1, x_2, \ldots, x_N$ for the hours spent by each student. My real interest is the mean and standard deviation of this population. When talking about a population, statisticians usually use Greek letters to designate these quantities, so the mean of the population is written $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$ ($\mu$ is the Greek letter mu). Likewise the standard deviation is $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}$ ($\sigma$ is the Greek letter sigma). Notice that here the denominator in the calculation is N.

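Rather than trying to deal with this large population, a statistician would usually select a “sample” of students, say n of them, and perform calculations on this smaller data set to estimate mu and sigma. Here n might be 25 or 30 or 100 or maybe even 1000, but certainly much smaller than N. To estimate mu it seems natural to use $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, the mean of the sample, where $x_1, \ldots, x_n$ are now the sampled values. Likewise, to estimate sigma it seems reasonable to use $\sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}$, but this quantity tends to underestimate sigma, particularly for small n. The reason is that the deviations are measured from the sample mean $\bar{x}$, which is by construction the point closest to the sample values in the squared-distance sense, so the sum of squares about $\bar{x}$ is never larger than the sum of squares about the true mean. For this and other technical reasons the quantity $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$ is usually preferred as the estimator of sigma.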
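The underestimate can be seen in a small simulation. The sketch below is only an illustration and not part of the original argument: it assumes a made-up population that is normally distributed with mean 2 hours and sigma equal to 1, and the function names (`sd_divide_by_n`, `sd_divide_by_n_minus_1`, `average_estimates`) are hypothetical. It draws many samples of size n, computes both estimates for each sample, and averages them.

```python
import random
import statistics


def sd_divide_by_n(sample):
    """Square root of the mean squared deviation from the sample mean (denominator n)."""
    n = len(sample)
    mean = sum(sample) / n
    return (sum((x - mean) ** 2 for x in sample) / n) ** 0.5


def sd_divide_by_n_minus_1(sample):
    """Same sum of squares, but with denominator n - 1 (the usual sample estimate s)."""
    n = len(sample)
    mean = sum(sample) / n
    return (sum((x - mean) ** 2 for x in sample) / (n - 1)) ** 0.5


def average_estimates(n, trials=5000, sigma=1.0, seed=0):
    """Average both estimates over many samples of size n.

    Assumption for illustration only: the population is normal with
    mean 2.0 and standard deviation `sigma`.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total_n = 0.0
    total_n_minus_1 = 0.0
    for _ in range(trials):
        sample = [rng.gauss(2.0, sigma) for _ in range(n)]
        total_n += sd_divide_by_n(sample)
        total_n_minus_1 += sd_divide_by_n_minus_1(sample)
    return total_n / trials, total_n_minus_1 / trials


if __name__ == "__main__":
    # The standard library offers the same two calculations:
    # statistics.pstdev divides by n, statistics.stdev divides by n - 1.
    data = [1.0, 2.0, 4.0, 7.0]
    print(sd_divide_by_n(data), statistics.pstdev(data))
    print(sd_divide_by_n_minus_1(data), statistics.stdev(data))

    for n in (2, 5, 30):
        avg_n, avg_n_minus_1 = average_estimates(n)
        print(n, round(avg_n, 3), round(avg_n_minus_1, 3))
```

Under these assumptions, the average of the divide-by-n estimate should fall noticeably below sigma for small n, while the divide-by-(n-1) estimate lands closer, and the gap between the two shrinks as n grows. Strictly speaking, dividing by n-1 makes the squared quantity $s^2$ an unbiased estimate of $\sigma^2$; $s$ itself can still come out slightly low on average, just less so.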

