Week 2 - Lesson 5.2 The Density Curve of the Normal Distribution Flashcards
An idealized representation of a distribution in which the area under the curve is defined to be 1. A density curve need not be normal, but the normal density curve will be the most useful to us.
Density curve
We already know from the Empirical Rule that approximately 2/3 of the data in a normal distribution lies within 1
standard deviation of the mean. With a normal density curve, this means that about 68% of the total area under the curve is within z-scores of ±1. Look at the following three density curves:
-read
You may have noticed that the density curve changes shape at two points in each of our examples. These are the
points where the curve changes concavity. Starting from the mean and heading outward to the left and right, the
curve is concave down. (It looks like a mountain, or ’n’ shape.) After passing these points, the curve is concave
up. (It looks like a valley, or ’u’ shape.) The points at which the curve changes between being concave down and being
concave up are called the inflection points. On a normal density curve, these inflection points are always exactly
one standard deviation away from the mean.
-read
Example: Estimate the standard deviation of the distribution represented by the following histogram.
This distribution is fairly normal, so we could draw a density curve to approximate it as follows:
Now estimate the inflection points as shown below:
It appears that the mean is about 0.5 and that the x-coordinates of the inflection points are about 0.45 and 0.55,
respectively. This would lead to an estimate of about 0.05 for the standard deviation.
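The estimate above can be checked numerically. The following is a minimal Python sketch, not part of the original lesson: the function names are made up for illustration, and it assumes a normal density with the estimated mean of 0.5 and standard deviation of 0.05. It uses the sign of the second derivative to confirm that the curve switches concavity at 0.45 and 0.55.

```python
import math

def normal_pdf(x, mean, sd):
    """Height of the normal density curve at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def second_derivative(x, mean, sd):
    """Analytic second derivative of the normal density at x.
    Positive means concave up, negative means concave down."""
    z = (x - mean) / sd
    return normal_pdf(x, mean, sd) * (z * z - 1) / (sd * sd)

# Assumed values, read off the sketch in the example
mean, sd = 0.5, 0.05
left_inflection, right_inflection = 0.45, 0.55

# Half the distance between the inflection points estimates the standard deviation
estimated_sd = (right_inflection - left_inflection) / 2
print("estimated standard deviation:", round(estimated_sd, 4))

# Points just outside and just inside each inflection point
for x in (0.44, 0.46, 0.54, 0.56):
    shape = "concave up" if second_derivative(x, mean, sd) > 0 else "concave down"
    print(x, shape)
```

The curve is concave down between the two inflection points and concave up outside them, which is the behavior the estimate relies on.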
The actual statistics for this distribution are as follows:
s ≈ 0.04988
x̄ ≈ 0.4997
We can verify these figures by using the expectations from the Empirical Rule. In the following graph, we have
highlighted the bins that are contained within one standard deviation of the mean.
If you estimate the relative frequencies from each bin, their total is remarkably close to 68%. Make sure to divide
the relative frequencies from the bins on the ends by 2 when performing your calculation.
While it is convenient to estimate areas under a normal curve using the Empirical Rule, we often need more precise
methods to calculate these areas. Luckily, we can use formulas or technology to help us with the calculations.
-read
All normal distributions have the same basic shape, and therefore, rescaling and re-centering can be used
to change any normal distribution into one with a mean of 0 and a standard deviation of 1. This configuration is
referred to as a standard normal distribution. In a standard normal distribution, the variable along the horizontal
axis is the z-score. This score is another measure of the performance of an individual score in a population. To
review, the z-score measures how many standard deviations a score is away from the mean. The z-score of the term
x in a population distribution whose mean is µ and whose standard deviation is s is given by: z = (x − µ)/s. Since s is
always positive, z will be positive when x is greater than µ and negative when x is less than µ. A z-score of 0 means
that the term has the same value as the mean. The value of z is the number of standard deviations the given value of
x is above or below the mean.
-read
Example: On a nationwide math test, the mean was 65 and the standard deviation was 10. If Robert scored 81, what
was his z-score?
z = (81 − 65)/10 = 1.6
Example: On a college entrance exam, the mean was 70 and the standard deviation was 8. If Helen’s z-score was
−1.5, what was her exam score?
x = 70 + (−1.5)(8) = 58
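Both examples can be reproduced with a few lines of Python. This is an illustrative sketch only: the helper names are invented here, and Helen's z-score is taken as −1.5, the value consistent with the stated answer of 58.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def raw_score(z, mean, sd):
    """Invert the z-score formula to recover the original value."""
    return mean + z * sd

# Robert: mean 65, standard deviation 10, score 81
robert_z = z_score(81, 65, 10)
print(robert_z)  # 1.6

# Helen: mean 70, standard deviation 8, z-score -1.5
helen_x = raw_score(-1.5, 70, 8)
print(helen_x)  # 58.0
```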
Now you will see how z-scores are used to determine the probability of an event.
Suppose you were to toss 8 coins 256 times. The following figure shows the histogram and the approximating
normal curve for the experiment. The random variable represents the number of tails obtained.
The blue section of the graph represents the probability that exactly 3 of the coins turned up tails. One way to
determine this is by the following:
Geometrically, this probability represents the area of the blue shaded bar divided by the total area of the bars. The
area of the blue shaded bar is approximately equal to the area under the normal curve from 2.5 to 3.5.
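To see how close the two areas are, the sketch below compares the exact probability of 3 tails with the area under the approximating normal curve from 2.5 to 3.5. It is a rough illustration under stated assumptions, not the lesson's own method: the coins are assumed fair, and the curve is assumed to have the usual binomial mean n·p = 4 and standard deviation √(n·p·(1 − p)) = √2, neither of which is given in the text.

```python
import math
from statistics import NormalDist

n_coins = 8
p_tails = 0.5  # assumption: fair coins

# Exact probability of exactly 3 tails out of 8 (binomial formula)
exact = math.comb(n_coins, 3) * p_tails**3 * (1 - p_tails) ** (n_coins - 3)

# Assumed parameters of the approximating normal curve
mean = n_coins * p_tails
sd = math.sqrt(n_coins * p_tails * (1 - p_tails))
curve = NormalDist(mean, sd)

# Area under the normal curve over the width of the bar for 3 tails
approx = curve.cdf(3.5) - curve.cdf(2.5)

print("exact: ", exact)             # 0.21875
print("approx:", round(approx, 4))
```

The two values agree to about two decimal places, which is why the area under the curve can stand in for the area of the bar.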
Since areas under normal curves correspond to the probability of an event occurring, a special normal distribution
table is used to calculate the probabilities. This table can be found in any statistics book, but it is seldom used today.
The following is an example of a table of z-scores and a brief explanation of how it works:
The values inside the given table represent the areas under the standard normal curve for values between 0 and
the relative z-score. For example, to determine the area under the curve between z-scores of 0 and 2.36, look in
the intersecting cell for the row labeled 2.3 and the column labeled 0.06. The area under the curve is 0.4909. To
determine the area between 0 and a negative value, look in the intersecting cell of the row and column whose labels sum
to the absolute value of the number in question. For example, the area under the curve between −1.3 and 0 is equal
to the area under the curve between 0 and 1.3, so look at the cell that is the intersection of the 1.3 row and the 0.00
column. (The area is 0.4032.)
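The table entries can be reproduced in software. The sketch below is one way to do it in Python, using the error function from the standard library; the function names are hypothetical and chosen only for this illustration.

```python
import math

def standard_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def table_area(z):
    """Area between 0 and z, as listed in the table.
    By symmetry, a negative z gives the same area as its absolute value."""
    return standard_normal_cdf(abs(z)) - 0.5

print(round(table_area(2.36), 4))   # 0.4909
print(round(table_area(-1.3), 4))   # 0.4032
```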
It is extremely important, especially when you first start with these calculations, that you get in the habit of relating
it to the normal distribution by drawing a sketch of the situation. In this case, simply draw a sketch of a standard
normal curve with the appropriate region shaded and labeled.
-read
Example: Find the probability of choosing a value that is greater than z = −0.528. Before even using the table, first
draw a sketch and estimate the probability. This z-score is just below the mean, so the answer should be more than
0.5.
Next, read the table to find the correct probability for the data below this z-score. We must first round this z-score
to −0.53, so this will slightly under-estimate the area below it, but it is the best we can do using the table. The table
returns a value of 0.5 − 0.2019 = 0.2981 as the area below this z-score. Because the area under the density curve is
equal to 1, we can subtract this value from 1 to find the correct probability of about 0.7019.
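The effect of rounding can be checked without a table. This is a small sketch using Python's `statistics.NormalDist`; it takes the z-score as −0.528, the value consistent with "just below the mean" and with the answer of 0.7019, and the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Table method: round the z-score to -0.53 first
below_rounded = z.cdf(-0.53)
above_rounded = 1 - below_rounded

# Software method: no rounding needed
above_exact = 1 - z.cdf(-0.528)

print(round(below_rounded, 4))  # 0.2981
print(round(above_rounded, 4))  # 0.7019
print(round(above_exact, 4))
```

Rounding to −0.53 makes the area below slightly too small, so the table's answer for the area above comes out slightly larger than the unrounded one.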
What about values between two z-scores? While it is an interesting and worthwhile exercise to do this using a table,
it is so much simpler using software or a graphing calculator.
Example: Find P(−2.60 < z < 1.30)
This probability can be calculated as follows:
P(−2.60 < z < 1.30) = P(z < 1.30) − P(z < −2.60) = 0.9032 − 0.0047 = 0.8985
It can also be found using the TI-83/84 calculator. Use the ’normalcdf(-2.60, 1.30, 0, 1)’ command, and the
calculator will return the result 0.898538. The syntax for this command is ’normalcdf(min, max, µ, s)’. When
using this command, you do not need to first standardize. You can use the mean and standard deviation of the given
distribution.
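The same subtraction of cumulative areas can be sketched in Python. The second half illustrates the remark about not needing to standardize: it reuses the mean of 65 and standard deviation of 10 from the earlier math test example, and the bounds 39 and 78 are hypothetical values chosen here because they correspond to z-scores of −2.60 and 1.30.

```python
from statistics import NormalDist

# Standard normal curve: subtract the area left of the lower bound
# from the area left of the upper bound
z = NormalDist()
p_standard = z.cdf(1.30) - z.cdf(-2.60)
print(round(p_standard, 4))  # 0.8985

# Same probability without standardizing first (hypothetical bounds)
scores = NormalDist(mu=65, sigma=10)
p_raw = scores.cdf(78) - scores.cdf(39)
print(round(p_raw, 4))  # 0.8985
```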
Technology Note: The ’normalcdf(’ Command on the TI-83/84 Calculator
Your graphing calculator has already been programmed to calculate probabilities for a normal density curve using
what is called a cumulative distribution function. The command you will use is found in the DISTR menu, which you
can bring up by pressing [2ND][DISTR].
Press [2] to select the ’normalcdf(’ command, which has a syntax of ’normalcdf(lower bound, upper bound, mean,
standard deviation)’.
The command has been programmed so that if you do not specify a mean and standard deviation, it will default to
the standard normal curve, with µ = 0 and s = 1.
For example, entering ’normalcdf(-1, 1)’ will specify the area within one standard deviation of the mean, which we
already know to be approximately 0.68.
-read
’Normalcdf (a,b,µ,s)’ gives values of the normal cumulative distribution function. In other words, it gives the
probability of an event occurring between x = a and x = b, or the area under the probability density curve between the
vertical lines x = a and x = b, where the normal distribution has a mean of µ and a standard deviation of s. If µ and
s are not specified, it is assumed that µ = 0 and s = 1.
-read
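For readers without a TI-83/84, the behavior described on this card can be imitated in Python. The function below is a hypothetical stand-in written for this note, not the calculator's actual implementation; it only mirrors the argument order and the default of µ = 0 and s = 1.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under a normal density curve between lower and upper.
    Defaults to the standard normal curve when mu and sigma are omitted."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

# Area within one standard deviation of the mean
print(round(normalcdf(-1, 1), 4))  # 0.6827
```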
Example: Find the probability that x < 1.58.
The calculator command must have both an upper and lower bound. Technically, though, the density curve does not
have a lower bound, as it continues infinitely in both directions. We do know, however, that a very small percentage
of the data lies more than 3 standard deviations to the left of the mean. Use −3 as the lower bound and see what answer
you get.
The answer is fairly accurate, but you must remember that there is really still some area under the probability density
curve, even though it is just a little, that we are leaving out if we stop at −3. If you look at the z-table, you can see
that we are, in fact, leaving out about 0.5 − 0.4987 = 0.0013. Next, try going out to −4 and −5.
Once we get to −5, the answer is quite accurate. Since we cannot really capture all the data, entering a sufficiently
large negative value should be enough for any reasonable degree of accuracy. A quick and easy way to handle this is to enter
−99999 (or “a bunch of nines”). It really doesn’t matter exactly how many nines you enter. The difference between
five and six nines will be beyond the accuracy that even your calculator can display.
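The effect of the lower bound can be seen by trying several values. The sketch below uses Python's `statistics.NormalDist` rather than the calculator, takes the upper bound of 1.58 as printed in the example, and uses lower bounds of −3, −4, −5, and −99999 (negative, since they lie to the left of the mean). The helper name is made up for illustration.

```python
from statistics import NormalDist

z = NormalDist()   # standard normal curve
upper = 1.58       # upper bound as printed in the example

def area_from(lower):
    """Area under the standard normal curve between lower and the upper bound."""
    return z.cdf(upper) - z.cdf(lower)

# Area with no lower cutoff at all, for comparison
full_area = z.cdf(upper)

for lower in (-3, -4, -5, -99999):
    left_out = full_area - area_from(lower)
    print(lower, round(area_from(lower), 6), "left out:", left_out)
```

The area left out at −3 is about 0.0013, matching the z-table figure, and it shrinks rapidly as the lower bound moves further left.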