Ref: https://www.quora.com/What-is-the-difference-between-99-and-95-confidence-interval

Confidence intervals are a little tricky because people often don’t define what they really mean by a confidence interval.

Let me walk you through a scenario that will help you start understanding CIs at a very basic level.

**Imagine you want to find the mean height of all the people in a particular US state.**

You could go to each person in that state and ask for their height, or you could do the smarter thing and take a sample of, let’s say, 1000 people in the state.

Then you use the mean of their heights (**Estimated Mean**) to estimate the average height in the state (**True Mean**).

But, being the true statistician you are, you are not satisfied.

So you think of spicing it up a little.

You come up with a strategy: you take N random samples (or you may break the sample you already have into smaller pieces).

The thing you want to use here is the **Central Limit Theorem.**

Now CLT is a big topic in itself. But I will try to define it in an intuitive sense here.

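It simply says that if I take many random samples, compute the mean of each one, and plot those means, the resulting distribution is approximately normal, as long as each sample is reasonably large. This holds even when the underlying data are not normally distributed.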
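To see this for yourself, here is a minimal simulation sketch. The skewed exponential "population", the sample size of 50, and the helper name `sample_means` are all assumptions picked for illustration; nothing here comes from real height data.

```python
import random
import statistics

def sample_means(population, sample_size, num_samples, rng):
    """Draw num_samples random samples and return the mean of each one."""
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(num_samples)
    ]

rng = random.Random(0)  # fixed seed so the run is repeatable

# A deliberately skewed, non-normal population (exponential, mean close to 1.0).
population = [rng.expovariate(1.0) for _ in range(100_000)]

means = sample_means(population, sample_size=50, num_samples=2_000, rng=rng)

# The sample means cluster around the population mean, and their spread
# is close to (population SD) / sqrt(sample size).
print("population mean:         ", round(statistics.fmean(population), 3))
print("mean of sample means:    ", round(statistics.fmean(means), 3))
print("SD of sample means:      ", round(statistics.stdev(means), 3))
print("population SD / sqrt(50):", round(statistics.pstdev(population) / 50 ** 0.5, 3))
```

If you plot a histogram of `means`, it should look roughly bell-shaped even though the population itself is heavily skewed.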

You have probably seen the **Normal Distribution** by now. The bell curve is a **Probability Density Function** (PDF).

By saying PDF I mean that it provides the probability density. Density is the term to focus on here: the height of the curve is not itself a probability, but the area under the curve between two points is. The dark blue region in the graph gives the probability that X (for our example, the sample mean) will lie within the limits defined by the start and end of the dark blue region.

Knowing all this, you become curious. You take the mean of the sample means and use it to estimate the true mean.

You also find the standard deviation of the N sample means you have (let’s call it SD).

**You want to find intervals on the X axis that have a 95% probability of occurring.**

So you use the **68-95-99.7% Rule.**

About 68% probability within 1 standard deviation (SD) of the mean.

About 95% probability within 2 SD of the mean.

About 99.7% probability within 3 SD of the mean.
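You can check these three numbers yourself. The sketch below uses the standard identity P(|Z| ≤ k) = erf(k/√2) for a standard normal variable; the helper name `prob_within` is just an illustrative choice.

```python
import math

def prob_within(k):
    """Probability that a normal variable lies within k SDs of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {prob_within(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```

Notice that 2 SD actually covers a bit more than 95%; exactly 95% corresponds to about 1.96 SD, which is why you will often see 1.96 in textbooks.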

The reason the 95% CI and 99.7% CI are so popular is simply that they correspond to roughly 2 and 3 SD, which makes them easy to calculate. (Strictly, 95% corresponds to 1.96 SD; 2 is the usual rounding.) They also make intuitive sense.

Now when I say that I estimate the true mean to be X (the mean of the N sample means) with a confidence interval of [X-2SD, X+2SD], I am saying that:

Loosely speaking, there is a 95% probability that an interval built this way contains the true population mean.
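Here is a small sketch of that whole procedure, from taking N samples to building [X-2SD, X+2SD]. The population of 100,000 heights (mean 170 cm, SD 10 cm), N = 200, and the sample size of 50 are made-up numbers for illustration, not real data.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population: 100,000 heights in cm (made-up numbers).
population = [rng.gauss(170.0, 10.0) for _ in range(100_000)]
true_mean = statistics.fmean(population)

# Take N random samples of 50 people each and record each sample's mean.
N = 200
sample_means = [statistics.fmean(rng.sample(population, 50)) for _ in range(N)]

X = statistics.fmean(sample_means)   # mean of the N sample means
SD = statistics.stdev(sample_means)  # standard deviation of the N sample means

low, high = X - 2 * SD, X + 2 * SD
print(f"true mean: {true_mean:.2f}")
print(f"estimate:  {X:.2f}, interval: [{low:.2f}, {high:.2f}]")
```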

**When you take a 99% CI you essentially increase the probability and cast a wider net.**

**So that is the standard theoretical way to think about CIs.**
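To put numbers on the "wider net", the sketch below converts a confidence level into the number of standard deviations you need, using `statistics.NormalDist` from the Python standard library. The SD of 1.5 cm and the helper name `z_for` are hypothetical values chosen only for the demonstration.

```python
from statistics import NormalDist

def z_for(confidence):
    """Number of SDs on each side of the mean that covers `confidence`."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

sd_of_means = 1.5  # hypothetical SD of the sample means, in cm

for confidence in (0.95, 0.99, 0.997):
    z = z_for(confidence)
    print(f"{confidence:.1%} -> z = {z:.3f}, half-width = {z * sd_of_means:.2f} cm")
```

The higher the confidence level, the larger z gets, so the interval has to grow to keep its promise.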

**In practice** you don’t split your sample into N samples or take N samples (since your data is expensive or there might be time constraints); you just use this formula to calculate your CI:

CI = Xbar ± Z * s/(root(n))

– Here Xbar is the sample mean (the mean of the 1000 heights you sampled).

– Z is the number of standard deviations away from the sample mean (about 2 for 95%, 3 for 99.7%), set by the **level of confidence** you want.

– s is the standard deviation of the sample.

– n is the size of the sample.

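**This formula just says that, as per the CLT, sample means are distributed around the true mean with standard deviation sigma/(root(n)), where sigma is the population standard deviation. Since we don’t know sigma, we estimate it with s.**

We center the interval on Xbar, our best estimate of the true mean, and use that to calculate the confidence interval.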
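As a rough sketch, the formula Xbar ± Z * s/(root(n)) takes only a few lines of Python. The function name `confidence_interval` and the simulated sample of 1000 heights are assumptions for illustration; with a real dataset you would pass in your own measurements.

```python
import math
import random
import statistics

def confidence_interval(sample, z=1.96):
    """Return (low, high) for Xbar ± z * s / sqrt(n).

    z is the number of standard deviations: about 1.96 (roughly 2)
    for 95%, about 3 for 99.7%.
    """
    n = len(sample)
    xbar = statistics.fmean(sample)  # sample mean
    s = statistics.stdev(sample)     # sample standard deviation
    margin = z * s / math.sqrt(n)
    return xbar - margin, xbar + margin

rng = random.Random(7)  # fixed seed so the run is repeatable
# Hypothetical sample: 1000 heights in cm (made-up numbers).
heights = [rng.gauss(170.0, 10.0) for _ in range(1000)]

low, high = confidence_interval(heights, z=1.96)
print(f"95% CI for the mean height: [{low:.2f}, {high:.2f}]")
```

Using a Z value is reasonable here because the sample is large; for small samples people usually swap Z for a value from the t-distribution.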

By confidence interval we mean that if you were to repeatedly sample the same population, each time calculating a 95% CI, then about 95% of those intervals would contain the true population mean *μ*.

The dotted line in this figure is the true population mean *μ*. See how some of these intervals don’t contain the true population mean *μ*.
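This repeated-sampling definition is easy to test with a simulation. In the sketch below, the true mean of 170 cm, SD of 10 cm, sample size of 100, and 2000 repetitions are all made-up settings; the point is only to count how often the interval catches the true mean.

```python
import math
import random
import statistics

def ci_95(sample):
    """95% CI for the mean: Xbar ± 1.96 * s / sqrt(n)."""
    xbar = statistics.fmean(sample)
    margin = 1.96 * statistics.stdev(sample) / math.sqrt(len(sample))
    return xbar - margin, xbar + margin

def coverage(true_mean, true_sd, sample_size, trials, rng):
    """Fraction of repeated 95% CIs that contain the true mean."""
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(sample_size)]
        low, high = ci_95(sample)
        if low <= true_mean <= high:
            hits += 1
    return hits / trials

rng = random.Random(42)  # fixed seed so the run is repeatable
fraction = coverage(true_mean=170.0, true_sd=10.0,
                    sample_size=100, trials=2_000, rng=rng)
print(f"fraction of intervals containing the true mean: {fraction:.3f}")
```

The fraction should land close to 0.95. Any single interval either contains the true mean or it doesn't; the 95% describes how often the procedure succeeds over many repetitions.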

Hope that clears things up a little bit for you. 🙂