Week 6 - Central limit theorem Flashcards
The Central Limit Theorem is a very powerful statement in statistics. It says that if you take many samples from a
random variable, the distribution of the means of those samples (if you completed the lesson titled
“The Mean of Means”, you will recognize this as “the sampling distribution of the sample means”) will approximate
a normal distribution, and the approximation improves as the size of each sample grows. This holds regardless of the
original distribution of the random variable, provided it has a finite mean and standard deviation (as a rule of thumb,
the number of data points in each sample should be 30 or more). In fact, as demonstrated in the video above, even a
discrete random variable with a pretty odd distribution will yield an approximately normal distribution of sample
means when the samples are large enough.
Central limit theorem
Formally, the CLT says:
If samples of size n are drawn at random from any population with a finite mean and standard deviation, then
the sampling distribution of the sample means, \(\bar{x}\), approximates a normal distribution as n increases.
If you collect many samples from an ordinary random variable, and calculate the mean of each sample, then
the means will be distributed in an approximate bell-curve, and the “mean of means” will be the same as the
mean of the population. The larger the size of the samples you collect, the more closely the distribution of
their means will approximate a normal distribution.
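A quick simulation can make this concrete. The sketch below is illustrative only: the exponential population, the sample size of 40, the 2,000 repetitions, and the fixed seed are all assumptions chosen for the example, not values from the lesson.

```python
import random
import statistics
from math import sqrt

random.seed(6)        # fixed seed so the run is repeatable (arbitrary choice)

n = 40                # size of each sample (assumed)
num_samples = 2000    # number of samples drawn (assumed)

# Population: exponential with rate 1, so mean = 1 and standard deviation = 1.
# It is strongly skewed, i.e. clearly not normal.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_sd = 1.0 / sqrt(n)   # sigma / sqrt(n)

# For a normal distribution, roughly 68% of values lie within one
# standard deviation of the mean.
within_one_sd = sum(
    abs(m - 1.0) <= theoretical_sd for m in sample_means
) / num_samples

print(f"mean of means: {mean_of_means:.3f} (population mean is 1)")
print(f"sd of means:   {sd_of_means:.3f} (sigma/sqrt(n) is {theoretical_sd:.3f})")
print(f"share within one sd of the mean: {within_one_sd:.1%}")
```

If the theorem holds for this setup, the mean of means should land near 1, the standard deviation of the means near 1/√40 ≈ 0.158, and the share within one standard deviation near 68%.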
Notes to remember:
- As long as your sample size is 30 or greater, you may assume the distribution of the sample means to be
approximately normal, meaning that you can calculate the probability that the mean of a single sample of size
30 or greater will occur by using the z-score of the mean.
- The mean of the distribution created from many sample means approaches the mean of the population.
Formally: \(\mu_{\bar{x}} = \mu\)
- The standard deviation of the distribution of the means is estimated by dividing the standard deviation of the
population by the square root of the sample size. Formally: \(\sigma_{\bar{x}} = \sigma / \sqrt{n}\)
- Use the notation \(\bar{x}\) (x-bar) rather than the random variable x to indicate that the random variable you are
describing is a sample mean.
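These notes can be sketched as three small helper functions. The function names and the example numbers (a population with mean 100 and standard deviation 15, sampled 36 at a time) are hypothetical and do not come from the lesson; `NormalDist` is from Python's standard `statistics` module.

```python
from math import sqrt
from statistics import NormalDist


def standard_error(sigma, n):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)


def z_of_mean(x_bar, mu, sigma, n):
    """z-score of a sample mean x_bar, using mu_x_bar = mu."""
    return (x_bar - mu) / standard_error(sigma, n)


def prob_mean_below(x_bar, mu, sigma, n):
    """P(sample mean < x_bar) under the normal approximation."""
    return NormalDist().cdf(z_of_mean(x_bar, mu, sigma, n))


# Hypothetical example values, chosen only so the arithmetic is easy to follow.
se = standard_error(15, 36)              # 15 / 6 = 2.5
z = z_of_mean(105, 100, 15, 36)          # (105 - 100) / 2.5 = 2.0
p = prob_mean_below(105, 100, 15, 36)    # area to the left of z = 2.0

print(f"standard error = {se}, z = {z}, P(x_bar < 105) = {p:.4f}")
```

The normal approximation inside `prob_mean_below` is only reasonable under the condition in the first note (a sample size of about 30 or more).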
z table
analyze
Mack asked 42 fellow high-school students how much they spent for lunch, on average. According to his research
online, the amount spent for lunch by high school students nationwide has µ = $15, with σ = $9. What is the
probability that the mean of Mack’s random sample falls within $0.01 of the national average?
There are a few important facts to note here:
* Mack’s sample is 42 students. Since 42 ≥ 30, he can safely assume that the distribution of his sample mean is
approximately normal, according to the Central Limit Theorem.
* The range we are considering is $14.99 to $15.01, since that represents $0.01 above and below the mean.
* The mean of the sample should approximate the mean of the population, in other words \(\mu_{\bar{x}} = \mu\)
* The standard deviation of the mean of Mack’s sample, \(\sigma_{\bar{x}}\), can be calculated as
\(\sigma_{\bar{x}} = \sigma / \sqrt{n}\), where n = 42
Let’s start by finding the standard deviation of the sample mean, \(\sigma_{\bar{x}}\):
\(\sigma_{\bar{x}} = \frac{9}{\sqrt{42}} = \frac{9}{6.48} \approx 1.389\)
Since the mean of Mack’s sample of 42 students can be assumed to be normally distributed, and since we now know its
standard deviation, 1.39, we can calculate the z-scores of the range using
\(Z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}\):
\(Z_1 = \frac{15.01 - 15.00}{1.389} \approx +0.01\)
\(Z_2 = \frac{14.99 - 15.00}{1.389} \approx -0.01\)
Finally, we look up Z1 and Z2 on the Z-score probability table: the area below Z1 is 50.40% and the area below Z2 is
49.60%, so the probability of landing between them is 50.40% - 49.60% = 0.80%
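The steps above can be checked numerically. This is a minimal sketch using `NormalDist` from Python's standard library; the variable names are my own. It also shows how much rounding z to two decimals (which a printed z-table forces) affects an interval this narrow.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 15.00, 9.00, 42    # values from the problem statement
se = sigma / sqrt(n)              # standard deviation of the sample mean

# Probability that the sample mean falls between $14.99 and $15.01,
# computed without rounding the z-scores.
sampling_dist = NormalDist(mu, se)
p_exact = sampling_dist.cdf(15.01) - sampling_dist.cdf(14.99)

# Same calculation with z rounded to two decimals, as in a z-table lookup.
z_rounded = round((15.01 - mu) / se, 2)
p_table = NormalDist().cdf(z_rounded) - NormalDist().cdf(-z_rounded)

print(f"standard error: {se:.3f}")
print(f"unrounded z:    {p_exact:.2%}")
print(f"z-table style:  {p_table:.2%}")
```

The table-style result matches the 0.80% found by hand. The unrounded result is somewhat smaller (about 0.57%) because the true z-scores are closer to ±0.007 than to ±0.01.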
The time it takes a student to complete the mid-term for Algebra II has a bi-modal distribution with µ = 1 hr and
σ = 1 hr. During the month of June, Professor Spence administers the test 64 times. What is the probability that the
average mid-term completion time for students during the month of June exceeds 48 minutes?
Important facts:
* There are 64 tests in the sample, which is more than 30, so the Central Limit Theorem applies
* The mean of the sample should approximate the mean of the population, in other words \(\mu_{\bar{x}} = \mu\)
* The standard deviation of the mean of Professor Spence’s sample, \(\sigma_{\bar{x}}\), can be calculated as
\(\sigma_{\bar{x}} = \sigma / \sqrt{n}\), where n = 64 (the number of tests/samples)
* 48 minutes is the same as 48/60 = 0.8 hrs, so the range we are interested in is \(\bar{x} > 0.8\) hrs
First calculate the standard deviation of the sample mean, using \(\sigma_{\bar{x}} = \sigma / \sqrt{n}\):
\(\sigma_{\bar{x}} = \frac{1}{\sqrt{64}} = 0.125\)
Since the sample mean is approximately normally distributed, according to the CLT, we can use its standard deviation to
calculate the z-score of the minimum value in the relevant range, 0.80 hrs:
\(Z = \frac{0.80 - 1}{0.125} = -1.60\)
Finally, we use the z-score probability reference above to correlate the z-score of -1.60 to the probability of a value
greater than that:
\(P(Z > -1.6) = 0.9452\) or 94.52%
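As a check on the upper-tail lookup, the same calculation can be sketched in Python. The variable names are illustrative; the key step is that an "exceeds" question uses one minus the cumulative probability.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 1.0, 1.0, 64     # hours, from the problem statement
threshold = 48 / 60             # 48 minutes expressed in hours

se = sigma / sqrt(n)            # standard deviation of the sample mean
z = (threshold - mu) / se       # z-score of the threshold

# Upper tail: probability that the sample mean is greater than the threshold.
p_exceeds = 1 - NormalDist().cdf(z)

print(f"se = {se}, z = {z:.2f}, P(x_bar > 0.8 hr) = {p_exceeds:.4f}")
```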
Evan price-checked 123 online auction sellers to record their average asking price for his favorite game. According
to a major national price-checking site, the national average online auction cost for the game is $35.00 with a standard
deviation of $3.00. Evan found that the prices averaged less than $34.86. How likely is this result?
Since there are more than 30 samples (123 > 30), we can apply the CLT and treat the sample mean as normally
distributed.
The standard deviation of the sample mean is:
\(\sigma_{\bar{x}} = \frac{3}{\sqrt{123}} = \frac{3}{11.09} \approx 0.27\)
The z-score for Evan’s price point of $34.86 is:
\(Z = \frac{34.86 - 35}{0.27} = \frac{-0.14}{0.27} \approx -0.518\)
Consulting the z-score probability table, we learn that the area under the normal curve to the left of z = -0.52 is
.3015 or 30.15%
The likelihood of 123 samples having a mean of less than $34.86 is approximately 30.15%
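The lower-tail lookup can be reproduced in a few lines of Python. This is a sketch with my own variable names. It computes the probability once without intermediate rounding and once with z rounded to -0.52 as in the table lookup.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 35.00, 3.00, 123    # values from the problem statement
x_bar = 34.86                      # Evan's observed average price

se = sigma / sqrt(n)               # standard deviation of the sample mean
z = (x_bar - mu) / se

# Lower tail: probability that the sample mean is below x_bar.
p_below = NormalDist().cdf(z)

# Table-style lookup with z rounded to two decimals.
p_table = NormalDist().cdf(round(z, 2))

print(f"se = {se:.4f}, z = {z:.3f}")
print(f"P(x_bar < 34.86) = {p_below:.4f} (unrounded), {p_table:.4f} (z = -0.52)")
```

The two results differ only in the third decimal place, so the table answer of about 30% is a fair summary.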
What is the Central Limit Theorem? How does the Central Limit Theorem relate other distributions to the normal
distribution?
The Central Limit Theorem says that the larger the sample size, the more closely the distribution of the means of
multiple samples will resemble a normal distribution. Since that is true regardless of the original distribution, the
CLT can be used as a bridge between other types of distributions and a normal distribution.
The time it takes to drive from Cheyenne WY to Denver CO has a µ of 1 hr and σ of 15 minutes. Over the course of
a month, a hig
The mean of the sampling distribution, \(\mu_{\bar{x}}\), is the same as the population mean: 1 hr = 60 mins.
The standard deviation of the sample mean is \(\sigma_{\bar{x}} = \frac{15}{\sqrt{55}} = \frac{15}{7.42} \approx 2.02\) min
The 55 trips made by the patrolman exceed the minimum sample size of 30 required to apply the CLT, so we may
assume the sample means to be normally distributed.
The z-score of the patrolman’s average time is: \(Z = \frac{60 - 60}{2.02} = \frac{0}{2.02} = 0\)
According to the z-score percentage reference, a z-score of 0 corresponds to .50 or 50%
There is a 50% probability that the patrolman’s mean travel time is greater than 60 mins.
Abbi polls 95 high school students for their GPA. According to the school, the GPA of high school students
has a mean of 3.0 and a standard deviation of 0.5. What is the probability that Abbi’s random sample will have a
mean within 0.01 of the population mean?
The mean of the sampling distribution for the 95 polled GPA scores is the same as the population mean: 3.0
The standard deviation of the sample mean is \(\sigma_{\bar{x}} = \frac{0.5}{\sqrt{95}} = \frac{0.5}{9.75} \approx 0.05\)
The 95 sampled G.P.A.’s exceed the minimum sample size of 30, so we may apply the CLT.
The z-scores of the minimum and maximum values in the range of interest, 2.99 to 3.01, are:
\(Z_1 = \frac{2.99 - 3.00}{0.05} = \frac{-0.01}{0.05} = -0.2\)
\(Z_2 = \frac{3.01 - 3.00}{0.05} = \frac{0.01}{0.05} = +0.2\)
Referring to the z-score reference table, the z-scores -0.2 and 0.2 cover a range of approximately 15.86%
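The interval probability can be verified with a short sketch. The variable names are my own. The first result keeps the standard error unrounded, the second rounds it to 0.05 as in the hand calculation, which shows how sensitive a narrow interval is to that rounding.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 3.0, 0.5, 95     # values from the problem statement
low, high = 2.99, 3.01          # range of interest for the sample mean

se = sigma / sqrt(n)            # standard deviation of the sample mean

# Interval probability with the unrounded standard error.
dist = NormalDist(mu, se)
p_exact = dist.cdf(high) - dist.cdf(low)

# Interval probability with the standard error rounded to two decimals,
# which gives z-scores of about -0.2 and +0.2.
dist_rounded = NormalDist(mu, round(se, 2))
p_rounded = dist_rounded.cdf(high) - dist_rounded.cdf(low)

print(f"se = {se:.4f}")
print(f"P(2.99 < x_bar < 3.01) = {p_exact:.4f} (unrounded se)")
print(f"P(2.99 < x_bar < 3.01) = {p_rounded:.4f} (se rounded to 0.05)")
```

Both versions give a probability of roughly 15 to 16 percent, consistent with the table-based answer.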
A recipe website has calculated that the time it takes to cook Sunday dinner has µ of 1 hour with σ of 25 minutes.
Over the course of a month, 172 users report their time spent cooking Sunday dinner. What is the probability that
the average reported cooking time is less than 45 minutes?
The mean of the sampling distribution, \(\mu_{\bar{x}}\), is the same as the population mean: 1 hr = 60 mins.
The standard deviation of the sample mean is \(\sigma_{\bar{x}} = \frac{25}{\sqrt{172}} = \frac{25}{13.11} \approx 1.91\) min
The 172 users reporting cooking times exceed the minimum sample size of 30 required to apply the CLT, so we may
assume the sample means to be normally distributed.
The z-score of the average reported cooking time is: \(Z = \frac{45 - 60}{1.91} = \frac{-15}{1.91} \approx -7.85\)
According to the z-score percentage reference, a z-score of -7.85 corresponds to 0%.
There is essentially zero probability that 172 users would average only 45 mins.
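Most z-tables stop around z = ±3.5, so a value this extreme cannot be looked up directly. The sketch below computes the tail with `math.erfc`, using the identity Φ(z) = ½·erfc(−z/√2). Using `erfc` is my own choice for numerical accuracy in the far tail and is not part of the lesson.

```python
from math import erfc, sqrt

mu, sigma, n = 60.0, 25.0, 172    # minutes, from the problem statement
x_bar = 45.0                      # average cooking time in question

se = sigma / sqrt(n)              # standard deviation of the sample mean
z = (x_bar - mu) / se

# Lower-tail probability P(Z < z), written with erfc so that very small
# probabilities are not lost to floating-point rounding.
p_below = 0.5 * erfc(-z / sqrt(2))

print(f"se = {se:.3f} min, z = {z:.2f}")
print(f"P(x_bar < 45 min) = {p_below:.3e}")
```

The result is positive but many orders of magnitude below anything a table would show, which agrees with the conclusion that the probability is essentially zero.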