The Central Limit Theorem allows us to make inferences about a population (for example, to estimate its mean or a proportion) by taking "small" samples from that population. There are some requirements as to when we can apply the Central Limit Theorem:

For all samples of size \(n\gt30\) drawn randomly from any population with a mean \(\mu\) and a standard deviation \(\sigma\), the sampling distribution of \(\bar{x}\) (the sample means) can be approximated by a normal distribution with mean \(\mu\) and standard deviation \(\sigma /\sqrt{n}\).

If the population is normally distributed (or approximately normal), the sampling distribution of sample means is normally distributed for any sample size \(n\), even less than 30.

This leads to the following:

Suppose a population has a mean \(\mu\) and a standard deviation \(\sigma\), and simple random samples (SRS) of the same size \(n\) are chosen from the population. Then:

Case 1: When the population is normally distributed and simple random samples of any size \(n\) are taken, or

Case 2: When the population is not normally distributed, but the SRS is of size \(n\gt30\), then the sampling distribution of \(\bar{x}\) is normal (Case 1) or approximately normal (Case 2), with
\begin{align*} \mu_{\bar{x}}&=\mu \\ \sigma_{\bar{x}}&=\frac{\sigma}{\sqrt{n}} \end{align*}

Case 3: If \(n\le 30\) and the population distribution is not known to be normal, then the sampling distribution of \(\bar{x}\) cannot be approximated with a normal distribution. Other methods may be required in these situations.
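The three cases can be sketched as a small decision helper. This is only an illustration: the function name `sampling_distribution` and its `population_is_normal` flag are hypothetical, and the \(n\gt30\) cutoff is the rule of thumb stated above, not a hard mathematical boundary.

```python
from math import sqrt
from typing import Optional, Tuple


def sampling_distribution(mu: float, sigma: float, n: int,
                          population_is_normal: bool) -> Optional[Tuple[float, float]]:
    """Return (mean, standard deviation) of the sampling distribution of the
    sample mean when a normal model is justified, otherwise None."""
    # Case 1: normal population, any n.  Case 2: any population, n > 30.
    if population_is_normal or n > 30:
        return mu, sigma / sqrt(n)
    # Case 3: n <= 30 and normality not known, so no normal approximation.
    return None


# Illustrative values: a population with mean 96 and standard deviation 16.
print(sampling_distribution(96, 16, 40, population_is_normal=False))
print(sampling_distribution(96, 16, 10, population_is_normal=False))
```

The first call falls under Case 2 and returns the mean together with \(\sigma/\sqrt{n}\); the second falls under Case 3 and returns `None`.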

The Standard z-score

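Recall the standard z-score for a single item is given by:
\begin{align} z&=\frac{x-\mu}{\sigma} \label{zscore1} \end{align}

For the mean \(\bar{x}\) of a sample of size \(n\), the standard deviation \(\sigma\) is replaced by \(\sigma/\sqrt{n}\):
\begin{align} z&=\frac{\bar{x}-\mu}{\sigma/\sqrt{n}} \label{zscore2} \end{align}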

Consider the geometric distribution with parameter \(\lambda=0.3\) (the probability of success on each trial), which counts the number of failures before the first success. This discrete distribution has the following probability histogram:

This is certainly not a normal distribution. Let's take a random sample from this distribution of size \(n=40\). We would expect mostly small values, and rarely a number greater than 12.

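\[\begin{array}{cccccccccc} 7 & 4 & 1 & 1 & 0 & 2 & 0 & 5 & 2 & 3 \\ 1 & 2 & 7 & 7 & 0 & 3 & 1 & 1 & 0 & 2 \\ 0 & 1 & 0 & 7 & 2 & 7 & 8 & 2 & 1 & 2 \\ 0 & 1 & 1 & 3 & 1 & 1 & 4 & 0 & 0 & 0 \end{array}\]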

The sample mean is \(\bar{x}=2.25\). Now let's take 5000 samples of size 40 and make a histogram of all 5000 sample means. Here's what we get:

Notice how the distribution of the sample means looks fairly normal, nothing like the original distribution. If we increase our sample size \(n\), the distribution will look even more normal! For the 5000 sample means above we have \(\mu_{\bar{x}}=2.323\) and \(\sigma_{\bar{x}}=0.434\). The actual population mean and standard deviation of the geometric distribution are \(\mu=2.333\) and \(\sigma=2.789\). The mean of the sampling distribution, \(\mu_{\bar{x}}\), and the population mean, \(\mu\), are nearly identical. Likewise, the theorem predicts \(\sigma_{\bar{x}}=\frac{\sigma}{\sqrt{n}}=\frac{2.789}{\sqrt{40}}\approx 0.441\), which again is nearly identical to the value of \(\sigma_{\bar{x}}\) calculated from the 5000 sample means.
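The experiment above can be reproduced with a short simulation. The sketch below is one possible way to do it, not the procedure used to produce the figures in the text: it assumes the geometric variable counts failures before the first success, generates values by inverse-transform sampling, and uses an arbitrary seed, so its results will differ slightly from 2.323 and 0.434.

```python
import random
from math import floor, log, sqrt
from statistics import fmean, pstdev

P_SUCCESS = 0.3        # the parameter the text calls lambda
N = 40                 # size of each sample
NUM_SAMPLES = 5000     # number of repeated samples


def geometric(rng: random.Random, p: float) -> int:
    """Number of failures before the first success (values 0, 1, 2, ...)."""
    u = 1.0 - rng.random()          # uniform on (0, 1], avoids log(0)
    return floor(log(u) / log(1.0 - p))


def simulate_sample_means(seed: int = 0) -> list:
    """Draw NUM_SAMPLES samples of size N and return their means."""
    rng = random.Random(seed)       # arbitrary seed, only for repeatability
    return [fmean(geometric(rng, P_SUCCESS) for _ in range(N))
            for _ in range(NUM_SAMPLES)]


means = simulate_sample_means()
mu = (1 - P_SUCCESS) / P_SUCCESS            # population mean, about 2.333
sigma = sqrt(1 - P_SUCCESS) / P_SUCCESS     # population sd, about 2.789

print("mean of sample means:", round(fmean(means), 3), "theory:", round(mu, 3))
print("sd of sample means:  ", round(pstdev(means), 3),
      "theory:", round(sigma / sqrt(N), 3))
```

Increasing `N` should shrink the spread of the sample means in proportion to \(1/\sqrt{n}\), which is the behavior the theorem describes.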

That's the Central Limit Theorem!!

Example 1

A.C. Nielsen reported that children between the ages of 2 and 5 watch an average of 25.2 hours of television per week. Assume the distribution is normally distributed and the standard deviation is 3 hours.

a) Find the probability that if a single child is randomly selected, the number of hours they watch television per week is greater than 27 hours.

b) If 20 children between the ages of 2 and 5 are randomly selected, find the probability that the sample mean of all 20 children is greater than 27 hours of television watched.

a) Here we have \(\mu=25.2\) and \(\sigma=3\). We are to find \(P(x\gt 27)\). We can either use equation (\ref{zscore1}) and look up the z-score in a table, or simply use the calculator:
\begin{align*} P(x\gt 27)&=\text{normalCDF(}27,\infty,25.2,3\text{)} \\ &\approx 0.274 \end{align*}

This is a fairly likely outcome; there is about a 27% probability that a randomly selected child will watch more than 27 hours of television each week.

b) Since we were told to assume the population is normally distributed, we can apply the CLT even though \(n=20\) is less than 30, using either equation (\ref{zscore2}) or the calculator:
\begin{align*} P(\bar{x}\gt 27)&=\text{normalCDF(}27,\infty,25.2,3/\sqrt{20}\text{)} \\ &\approx 0.0036 \end{align*}

Here we see it's very unlikely (about 0.36%) for a randomly selected group of 20 children aged 2 to 5 to have a mean television watching time of over 27 hours per week. If we did observe such a group, either we found a group of kids that watch WAY too much TV, or the reported mean of 25.2 hours per week might not be entirely correct.
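For readers without a calculator, the two probabilities in this example can be checked with Python's standard library. This is a minimal sketch that uses `statistics.NormalDist` in place of the calculator's normalCDF; the variable names are chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 25.2, 3.0, 20

# (a) one child: X is modeled as normal with mean mu and sd sigma
p_single = 1 - NormalDist(mu, sigma).cdf(27)

# (b) mean of 20 children: x-bar is normal with sd sigma / sqrt(n)
p_mean = 1 - NormalDist(mu, sigma / sqrt(n)).cdf(27)

print(f"P(x > 27)     = {p_single:.4f}")
print(f"P(x-bar > 27) = {p_mean:.4f}")
```

The only difference between the two lines is the standard deviation: dividing by \(\sqrt{20}\) narrows the distribution enough to turn a fairly common event into a rare one.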

Example 2

The average age of a vehicle registered in the United States is 8 years, or 96 months. Assume the standard deviation is 16 months.

a) Assuming the ages are normally distributed, find the probability that a randomly selected vehicle will have an age between 90 and 100 months.

b) A random sample of 40 vehicles is selected. Find the probability that the mean of their ages is between 90 and 100 months.

c) Find the probability that a sample of 10 cars has a mean age of more than 9 years.

a) We can either use equation (\ref{zscore1}) and a normal table, or use our calculator:
\begin{align*} P(90\lt x\lt 100)&=\text{normalCDF(}90,100,96,16\text{)} \\ P &\approx 0.245 \end{align*}

b) Normality is not mentioned, but \(n \gt 30\), so we can use the CLT with \(\sigma_{\bar{x}}=16/\sqrt{40}\):
\begin{align*}P(90\lt \bar{x}\lt100)&=\text{normalCDF(}90,100,96,16/\sqrt{40}\text{)}\\ P &\approx 0.934 \end{align*}

c) Normality isn't given and we have \(n \lt 30\), so we can't apply the CLT, and the probability can't be calculated with the methods of this section.
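Parts a) and b) can be checked with a small helper that mimics the four-argument form of the calculator's normalCDF. The helper name `normal_cdf` is hypothetical; it is a thin wrapper around Python's `statistics.NormalDist`, offered as a sketch only.

```python
from math import sqrt
from statistics import NormalDist


def normal_cdf(lower: float, upper: float, mean: float, sd: float) -> float:
    """P(lower < X < upper) for a normal X with the given mean and sd."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)


mu, sigma = 96.0, 16.0

# (a) a single vehicle: standard deviation sigma
p_vehicle = normal_cdf(90, 100, mu, sigma)

# (b) mean of 40 vehicles: standard deviation sigma / sqrt(40)
p_sample = normal_cdf(90, 100, mu, sigma / sqrt(40))

print(f"single vehicle: {p_vehicle:.3f}")
print(f"mean of 40:     {p_sample:.3f}")
```

Note that the mean (96) must be passed in both calls; only the standard deviation changes when moving from a single vehicle to a sample mean.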

Example 3

If, under a certain assumption, the probability of an observed event is extremely small, say \(p\lt0.05\), we conclude that the assumption is probably not correct. A machine in a bottling plant fills 12-oz water bottles, and its fill volumes have a standard deviation of 0.2 oz. If it is determined that the machine is over-filling or under-filling bottles, the machine needs to be shut down and recalibrated. Let's assume the machine is filling bottles with a mean volume of 12 oz.

a) Assuming a normal distribution, a single bottle of water is tested and found to contain 12.3 oz. Should the bottling line be shut down and recalibrated?

b) A random sample of 20 bottles is found to have a mean of 11.5 oz. Is recalibration necessary?

c) A random sample of 50 bottles is found to have a mean of 12.06 oz. Is recalibration necessary?

a) Since we can assume normality, we can calculate
\begin{align*}P(x\ge 12.3)&=\text{normalCDF(}12.3,\infty,12, 0.2\text{)}\\ &\approx0.0668 \end{align*}
This means the probability of a single bottle having a volume of 12.3 oz or greater is about 6.7%. Since this is not less than 0.05, it is not an unusual outcome, and this one bottle does not justify shutting down the line.
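The decision rule used in this example (compare a tail probability with 0.05) can be written out explicitly. The sketch below covers part a) only; the names `ALPHA` and `upper_tail` are illustrative, and the 0.05 cutoff is simply the one this example adopts.

```python
from statistics import NormalDist

ALPHA = 0.05  # cutoff for an "extremely small" probability in this example


def upper_tail(observed: float, mean: float, sd: float) -> float:
    """P(X >= observed) for a normal X with the given mean and sd."""
    return 1 - NormalDist(mean, sd).cdf(observed)


# Part a): one bottle measured at 12.3 oz, assuming mean 12 oz and sd 0.2 oz
p_value = upper_tail(12.3, mean=12.0, sd=0.2)
recalibrate = p_value < ALPHA

print(f"P(x >= 12.3) = {p_value:.4f}, recalibrate: {recalibrate}")
```

For a sample mean, the same function could be called with `sd` set to \(\sigma/\sqrt{n}\), provided one of the CLT cases applies.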

b) We are not told whether the distribution is normal, and the sample size is less than 30, so we can't apply the CLT to this part. Other methods would have to be used.

c) Our sample size of \(n=50\) is greater than 30, so we can apply the CLT to find \(P(\bar{x}\ge 12.06)\). Because the population standard deviation \(\sigma=0.2\) is known, the test statistic is a z-score:
\begin{align}z&=\frac{\bar{x}-\mu}{\sigma / \sqrt{n}}\\&= \frac{12.06-12.0}{0.2/\sqrt{50}} \nonumber \\ &\approx 2.1213 \label{tvalue}\end{align}

Using (\ref{tvalue}) above, we can calculate the probability
\begin{align*}P(\bar{x}\ge12.06)&=\text{normalCDF(}2.1213,\infty,0,1\text{)}\\&\approx 0.017\end{align*}

Since this is less than 0.05, we conclude that the machine probably does not have a mean of 12.0 oz, is over-filling, and should be shut down and recalibrated.