Data Unit 6 Test Flashcards
In the previous unit, we looked primarily at the individual outcomes of probability experiments. We will now turn our focus to models that show the distribution of all possible outcomes of an experiment.
Creating distribution charts
Random Variable
A function that maps a numerical value to the outcome of an experiment, usually X.
The random variable is used to account for all the possible outcomes of an experiment.
Example:
If X is the number rolled on a die, there are 6 possible values for x.
X = the random variable (all possibilities), x = one particular value (each case)
Probability Distribution — The set of all possible values of a random variable and the corresponding probabilities.
Example: Let X be the random variable that counts the number of heads in 3 flips of a coin.
# of heads, x | P(X = x)
0 (TTT) | 1/8 = 0.125
1 (TTH, HTT, THT) | 3/8 = 0.375
2 (HHT, HTH, THH) | 3/8 = 0.375
3 (HHH) | 1/8 = 0.125
Note: The sum of the probabilities is always equal to 1.
Example: Let X be the random variable defined by the sum of the top faces of two fair dice. Construct a probability distribution table and probability histogram.
Sum of dice, x | P(X = x)
2 | 1/36
3 | 2/36
4 | 3/36
5 | 4/36
6 | 5/36
7 | 6/36
8 | 5/36
9 | 4/36
10 | 3/36
11 | 2/36
12 | 1/36
An expectation or expected value, E(X), is the predicted average of all possible outcomes of a probability experiment.
E(X) = Σ x · P(X = x), summed over all possible outcomes x
Example: For the dice example,
E(X) = 2(1/36) + 3(2/36) + 4(3/36) + … + 11(2/36) + 12(1/36)
E(X) = 7
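The dice table and its expected value can be double-checked by brute force. The following is a minimal Python sketch, not part of the original notes; the names `distribution` and `expected_sum` are just illustrative. It lists all 36 equally likely rolls, counts each sum, and applies E(X) = Σ x · P(X = x).

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely (die 1, die 2) outcomes and count each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Probability distribution: P(X = x) = (number of ways to roll x) / 36.
distribution = {x: Fraction(n, 36) for x, n in sorted(counts.items())}

# Expected value: sum of x * P(X = x) over every possible sum.
expected_sum = sum(x * p for x, p in distribution.items())

for x, p in distribution.items():
    print(x, p)
print("E(X) =", expected_sum)
```

Note that `Fraction` prints in lowest terms, so 2/36 appears as 1/18; the values are the same as in the table.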
Example: If you rolled a pair of dice 360 times, how many times would you expect to get a 9? What do you expect you will get when you roll a pair of dice?
P(9) = 4/36 = 1/9
1/9 x 360 = 40 times.
On a single roll, the expected sum is E(X) = 7.
Example: If you flipped a coin 4 times, how many heads would you expect?
# of heads, x | P(X = x)
0 | 1/16
1 | 4/16
2 | 6/16
3 | 4/16
4 | 1/16
E(X) = 0(1/16) + 1(4/16) + 2(6/16) + 3(4/16) + 4(1/16)
All tails … all heads
E(X) = 2
Example: A carnival game has the following rules. If you roll a 4 on a fair die, you get $3; otherwise, you lose $1. (Pay $1, get $4 back: quadruple your money!) Is this a fair game?
Roll | Outcome x ($) | P(X = x)
1 | -1 | 1/6
2 | -1 | 1/6
3 | -1 | 1/6
4 | +3 | 1/6
5 | -1 | 1/6
6 | -1 | 1/6
Here, the outcomes are with respect to the amount of money that can be won or lost.
E(X) = (-1)(5)(1/6) + 3(1/6)
E(X) = -1/3
-$0.33/game
Therefore, not a fair game because in a fair game E(X) = 0.
You need to break even on average for an even chance to win or lose.
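The fairness check above can be sketched in a few lines of Python. This is only an illustration under the rules stated in the example (win $3 with probability 1/6, lose $1 with probability 5/6); `expected_value` is a hypothetical helper name, not something from the course.

```python
from fractions import Fraction

def expected_value(outcomes):
    """Return the sum of value * probability over (value, probability) pairs."""
    return sum(value * prob for value, prob in outcomes)

# Carnival game: +$3 on a roll of 4, -$1 on any of the other five faces.
carnival = [(3, Fraction(1, 6)), (-1, Fraction(5, 6))]

ev = expected_value(carnival)
is_fair = ev == 0  # a fair game has E(X) = 0

print("E(X) =", ev)
print("Fair game?", is_fair)
```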
Binomial Distributions
Need to know the difference between binomial and hypergeometric
A binomial distribution looks at the distribution of the outcomes of several trials of a probability experiment in which the:
1) trials are independent
2) each trial has only two outcomes: success or failure
3) The probability of success or failure for each independent trial is unchanged with each trial.
Repeated Trials — A stochastic process in which:
a) experiments are identical
b) experiments are independent
Bernoulli Trials —
repeated trials that have exactly 2 outcomes (Success or failure)
Ex. A bag contains 3 yellow marbles and 4 blue marbles. Three marbles are selected at random from a bag, one at a time with replacement. Construct a probability distribution table for the number of blue marbles in the sample.
Here, there are 3 repeated, independent trials of the same experiment in which there are only 2 outcomes;
success(picking a blue marble) or
failure (picking a yellow one).
Since there is a replacement after each pick, the probability of success is 4/7 and the probability of failure is 3/7, both remaining unchanged. Since all of the trials are independent, we can apply the product rule for independent events.
# of blue marbles, x | P(X = x)
0 (all yellow) | (3 C 0) (4/7)^0 (3/7)^3 = 0.079
1 | (3 C 1) (4/7)^1 (3/7)^2 = 0.315
2 | (3 C 2) (4/7)^2 (3/7)^1 = 0.419
3 | (3 C 3) (4/7)^3 (3/7)^0 = 0.187
E(X) = 0(0.079) + 1(0.315) + 2(0.419) + 3(0.187)
E(X) = 1.714, so we expect about 2 blue marbles
ROUND to the whole number with the highest probability
ROUND UP to 2 marbles because 2 has the higher probability (0.419 is bigger than 0.315)
In a binomial distribution…
- Do the choose bracket, then the success bracket, then the failure bracket
- For E(X), multiply x by the probability for each option and add them up
- ROUND depending on which has the highest probability
- The choose bracket counts which trials the successes occur on
Probability in a Binomial Distribution formula:
P(X = x) = (n C x) p^x q^(n-x)
Where p is the probability of success, q is the probability of failure, q = 1 - p
n = # of trials
x = # of successes we want in those n trials
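As a quick check of the formula, here is a small Python sketch that rebuilds the marble table (n = 3 picks with replacement, p = 4/7 for blue). The function name `binomial_pmf` is an illustrative choice; `math.comb` is the standard-library "choose" function.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x): exactly x successes in n independent trials with success probability p."""
    q = 1 - p  # probability of failure
    return comb(n, x) * p**x * q**(n - x)

# Marble example: 3 picks with replacement, success = blue marble, p = 4/7.
marble_table = [binomial_pmf(x, 3, 4 / 7) for x in range(4)]

for x, prob in enumerate(marble_table):
    print(x, round(prob, 3))
```

The four probabilities should add up to 1, which is a handy way to catch a bracket entered the wrong way around.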
The Expectation for a Binomial Distribution of n number of independent trials:
Don’t need the table
E(X) = np
For the marble example above:
E(X) = 3(4/7)
=1.714
round to 2 because its probability is higher
Ex. A test consists of 10 multiple-choice questions each with 5 possible answers, only one of which is correct. If answers are chosen at random, what is the probability that
a) 6 answers are correct?
P(X=6) = (10 C 6)(1/5)^6(4/5)^4 = 0.0055
x = 6, n = 10, p = 1/5 correct, q = 4/5 failure
6/10 = 60%, pick where you’re right, success of 1/5 - 6 times, fail of 4/5 - 4 times
Ex. A test consists of 10 multiple-choice questions each with 5 possible answers, only one of which is correct. If answers are chosen at random, what is the probability that:
b) A guesser would get at least 20% (or 2 out of 10)
Indirect
1 - P(none correct) - P(1 correct)
P(X ≥ 2) = 1 - (10 C 0)(1/5)^0(4/5)^10 - (10 C 1)(1/5)^1(4/5)^9
= 0.624
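The indirect ("1 minus") method can be sketched as a short Python function. This is a minimal illustration, assuming the same binomial setup as the question (n = 10, p = 1/5); `prob_at_least` is a hypothetical name.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial distribution."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def prob_at_least(k, n, p):
    """P(X >= k) using the complement: 1 - P(X = 0) - ... - P(X = k - 1)."""
    return 1 - sum(binomial_pmf(x, n, p) for x in range(k))

# At least 2 correct out of 10 when guessing among 5 choices.
guesser = prob_at_least(2, 10, 1 / 5)
print(round(guesser, 3))
```

The complement is less work here because only two cases (0 and 1 correct) need to be subtracted, instead of adding nine cases (2 through 10).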
Ex. A basketball player makes 75% of her free throws. What is the expected number of free throws she will make on her next 10 shots?
We know this is a binomial distribution because the trials are independent (taking one shot does not affect the next, so the probability stays the same). There is no need for a table; we can use E(X) = np.
E(X) = np = 10 x 0.75 = 7.5
But she doesn’t shoot 0.5 of a free throw, so is it closer to 7 or 8?
P(X=7) = (10 C 7)(0.75)^7(0.25)^3 = 0.25
P(X=8) = (10 C 8)(0.75)^8(0.25)^2 = 0.282
Therefore, 8 free throws since it has a higher probability.
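The "is it closer to 7 or 8?" step can also be checked by computing every probability and picking the largest. The sketch below is only an illustration of that comparison; the variable names are made up for this example.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial distribution."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.75

# Expected value from the shortcut formula E(X) = np.
expected = n * p

# The whole number of made shots with the highest probability.
most_likely = max(range(n + 1), key=lambda x: binomial_pmf(x, n, p))

print("E(X) =", expected)
print("P(X=7) =", round(binomial_pmf(7, n, p), 3))
print("P(X=8) =", round(binomial_pmf(8, n, p), 3))
print("Most likely # of made shots:", most_likely)
```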
Geometric Distributions
Subset of Binomial Distribution
Opening Exercise: A bag contains 1 green marble, 3 red marbles, 4 blue marbles, and 8 yellow marbles. If marbles are drawn at random, with replacement, find :
a) the probability that a red marble is selected 3 times in 7 picks.
1) Two outcomes — red or not red
2) Independent > replacement
3) Success and failure don’t change with each trial
n=7, x=3, p=3/16 - success, q=13/16 - failure
P(X = 3) = (7 C 3) (3/16)^3 (13/16)^4 = 0.101 approx 10%
Opening Exercise: A bag contains 1 green marble, 3 red marbles, 4 blue marbles, and 8 yellow marbles. If marbles are drawn at random, with replacement, find :
b) the number of times you would expect to see a blue marble in 28 picks
E(X) = 28(4/16) = 7
Opening Exercise: A bag contains 1 green marble, 3 red marbles, 4 blue marbles, and 8 yellow marbles. If marbles are drawn at random, with replacement, find :
c) the probability that the first green marble occurs on the eighth pick
Waiting time: P(X = 7) = (15/16)^7 (1/16)^1 = 0.04
Waiting until the 8th pick
The first 7 picks are not green
We don’t need choose brackets to decide when it will happen
Waiting Time -
The number of trials before the first success in a set of Bernoulli trials
Geometric Distribution Formulas — Waiting Time:
P(X = x) = q^x p^1
E(X) = q/p
Waiting time = geometric
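The two waiting-time formulas translate directly into code. This is a minimal Python sketch, assuming X counts the number of failures before the first success as defined above; the function names are illustrative. It reuses the green-marble question (p = 1/16, first green on the eighth pick).

```python
def geometric_pmf(x, p):
    """P(X = x): x failures in a row, then the first success."""
    q = 1 - p
    return q**x * p

def expected_waiting_time(p):
    """E(X) = q/p, the expected number of failures before the first success."""
    return (1 - p) / p

# First green marble on the 8th pick means a waiting time of 7 picks.
green_on_eighth = geometric_pmf(7, 1 / 16)
print(round(green_on_eighth, 2))

# Expected number of non-green picks before the first green one.
print(expected_waiting_time(1 / 16))
```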
Ex. The probability that a professional billiards player sinks a ball is 0.9. Assuming the probability remains the same throughout her turn,
a) What is the probability that she will miss on the third shot of her turn?
P(X = 2) = (0.9)^2 (0.1)^1 = 0.081
Waiting for the 3rd shot: the first 2 shots are sunk (waiting time of 2), then the 3rd shot is a miss
Ex. The probability that a professional billiards player sinks a ball is 0.9. Assuming the probability remains the same throughout her turn,
b) What is the probability she won’t miss for 4 shots?
P(A) = (0.9)^4 = 0.6561
“Double negative” - success
Sink 4, 4 successes, product rule for independent events
Ex. The probability that a professional billiards player sinks a ball is 0.9. Assuming the probability remains the same throughout her turn,
c) What is the expected waiting time for a missed shot?
E(X) = q/p = 0.9/0.1 = 9
When she is going to miss a shot
Opposite thinking “missed shot”
p = success is MISSING
q = failure is SCORING
E(X) = q/p
Do not overthink this formula: if you do it backwards, the calculator will show you a long decimal, so just flip it
Hypergeometric Distributions
Opposite of binomial distribution
Hypergeometric distributions are used for sampling without replacement. In other words, it is a distribution having a number of dependent trials of which success and failure are the only two outcomes. Since the trials are dependent, the probability of success (and failure) changes with each trial.
You pick a marble out of a bag and it stays out of the bag
Ex. In a computer chip factory, each chip manufactured has a 3.5% chance of being defective. A batch of 20 chips is chosen at random. What is the probability that, at most, 10% of the chips chosen are defective?
0.10 X 20 = 2 -> AT MOST
OR 0,1,2 -> CASES
Note: even though you are selecting without replacement, you use the binomial distribution because each chip is equally likely to be defective.
P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2)
= (20 C 0) (0.035)^0 (0.965)^20 + (20 C 1) (0.035)^1 (0.965)^19 +
(20 C 2) (0.035)^2 (0.965)^18
= 0.9687
Ex. Five cards are drawn from a shuffled deck of regular playing cards (without replacement). Construct a probability distribution table for the number of spades in the hand.
# of spades, x | P(X = x)
0 | (13 C 0)(39 C 5)/(52 C 5) = 0.2215
1 | (13 C 1)(39 C 4)/(52 C 5) = 0.4114
2 | (13 C 2)(39 C 3)/(52 C 5) = 0.2743
3 | (13 C 3)(39 C 2)/(52 C 5) = 0.0815
4 | (13 C 4)(39 C 1)/(52 C 5) = 0.0107
5 | (13 C 5)(39 C 0)/(52 C 5) = 0.000495
Expected value E(X) — In a hypergeometric distribution,
the probability of success (or failure) in a sample should be proportional to the population.
Ex. How many spades would you expect in a hand of 5 cards?
population = sample
13/52 = x/5
*cross multiply
52x = 65
x = 65/52
x = 1.25
Look at the table
P(1) = 41%
P(2) = 27%
Expect to get 1 spade in a hand of 5 cards.
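Both the spades table and the proportion shortcut can be verified with a short Python sketch. The function name and its parameter names are illustrative assumptions; the formula is the (successes C x)(failures C rest)/(population C draws) pattern used in the table.

```python
from math import comb

def hypergeometric_pmf(x, successes, population, draws):
    """P(X = x): x successes when drawing without replacement."""
    failures = population - successes
    return comb(successes, x) * comb(failures, draws - x) / comb(population, draws)

# 13 spades in a 52-card deck, 5 cards drawn.
spades = [hypergeometric_pmf(x, 13, 52, 5) for x in range(6)]

# Expected value from the table: sum of x * P(X = x).
expected_spades = sum(x * p for x, p in enumerate(spades))

for x, p in enumerate(spades):
    print(x, round(p, 4))
print("E(X) =", round(expected_spades, 2))
```

The table-based expected value agrees with the cross-multiplication result of 1.25, which is why the shortcut is enough when a question only asks what to expect.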
Ex. In the Spring, the Ministry of the Environment caught and tagged 500 raccoons in a wilderness area. The raccoons were released after being vaccinated against rabies. To estimate the raccoon population in the area, the ministry caught 40 raccoons during the summer. Of these, 15 had tags.
a) Determine whether this situation can be modelled with a hypergeometric distribution.
Yes! > Dependent, no replacement (once tagged, they are set aside), probability of success changes
b) Estimate the raccoon population in the wilderness area.
tags/total, population = sample
500/x = 15/40
15x = 20000
x = 1333.33, so about 1333 raccoons
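The cross-multiplication can be written as a one-line function. This is only a sketch of the proportion used above (tagged/total in the population equals tagged/total in the sample); `estimate_population` and its parameter names are hypothetical.

```python
def estimate_population(tagged_total, sample_size, tagged_in_sample):
    """Solve tagged_total / N = tagged_in_sample / sample_size for N."""
    return tagged_total * sample_size / tagged_in_sample

# 500 tagged raccoons; a later sample of 40 contains 15 with tags.
raccoons = estimate_population(500, 40, 15)
print(round(raccoons))
```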
Up to now, we have been examining discrete probability distribution models, such as the uniform, binomial, geometric and hypergeometric distributions. We will now consider data better represented by a continuous model.
Positively Skewed
The hump is to the left (something was big, positive, and decreased over time)
“tail” is to the right (trails off into fewer children)
Ex. # of children in a Canadian family
The "positive" part refers to the mean: the extreme values in the right tail pull the mean toward them
Negatively Skewed
The hump is to the right (something was small, negative, and increased over time)
“Tail” is to the left (younger age of boomers)
The "negative" part refers to the mean: the extreme values in the left tail pull the mean toward them
Bimodal Distribution
2 modes
Ex. shoe sizes and height of population
Height/weight of men and women
Exponential Distribution (not really tested on)
Predicts the waiting time between consecutive events in any random sequence of events.
Ex. Car accidents at a particular intersection and time between calls at a company’s switchboard
The Normal Distribution
“Bell curved”
A continuous distribution that ideally is symmetric about the mean and has the same mean, median, and mode.
The total area under the curve is equal to 1.
The Standard Deviation (σ) is the distance from the mean to the point of inflection.
Ex. Heights of males/females, weights of males/females, gas consumption, IQ, leaf sizes, admissions to U of T, # of raisins in a box of cereal.
Class averages
In a normal distribution:
68% of the data lies within one standard deviation of the mean
95% of the data lies within two standard deviations of the mean
99.7% of the data lies within three standard deviations of the mean
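These three percentages can be reproduced with Python's standard library. The sketch below uses `statistics.NormalDist` (available in Python 3.8+); the helper name `within` is an illustrative choice.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard = NormalDist(mu=0, sigma=1)

def within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard.cdf(k) - standard.cdf(-k)

for k in (1, 2, 3):
    print(k, "SD:", round(within(k) * 100, 1), "%")
```

The computed values are slightly more precise than the rounded 68%, 95% and 99.7% used in class.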
Giselle is 168 cm tall. In her high school, boys’ heights are normally distributed with a mean of 174 cm and a standard deviation of 6 cm. What is the probability that the first boy Giselle meets at school tomorrow will be taller than she is?
Begin by drawing a diagram with the mean in the middle, and mark off the standard deviations on either side
Giselle's height falls exactly on a standard deviation line: 168 = 174 - 6, one SD below the mean
N(174, 6^2)
To find the probability that a boy is TALLER than Giselle, we need
P(x > 168), as 168 is Giselle's height
Everything from the mean of 174 and onwards accounts for 50% of the data
The interval from 168 to 174 is not accounted for yet, but it is half of the region within one SD of the mean, which holds 68% of the data
Thus, P(x > 168) = 50% + 68%/2 = 50% + 34%
= 84%
The probability that the first boy Giselle meets is taller than her is approx. 84%
In Example 1, the area under the normal curve was bounded by a standard deviation line, so the percentages listed above could be used to find the probability. If the value does not fall on a standard deviation line, then we need to find the z-scores of the values and use the z-score chart to find the corresponding probability.
Note that the area listed in the chart for each z-score is the area under the normal curve from the far left edge of the curve up to the value.
“Less than”
from the left
“Greater than”
use 1 - the values from the left
a) P(z < -0.78)
= 0.2177
b) P(z > 1.53)
= 1 - P(z < 1.53)
= 1 - 0.9370
= 0.0630
c) P(-1.00 < z < 1.50)
= P(z < 1.50) - P(z< - 1.00)
= 0.9332 - 0.1587
= 0.7745
The above probabilities only work for N(0,1^2). Data with any other mean and SD needs to be converted to a mean of 0 and standard deviation of 1 using a standardization process (z-scores)
Example 3:
Find the probability that a first-year student picked at random at a university got between 70 and 80 as an overall average if the mean was 72 and the standard deviation was 5.
N(72,5^2)
Draw a bell shape with 72 in the middle and 3 SDs on each side
Recognize that 70 and 80 do not fall on the line
Complete a z-score for both 70 and 80
z-score for 70:
z = (x - mean)/SD
z = (70 - 72)/5
z = -0.4
P(z < -0.4) = 0.3446
z-score for 80:
z = (x - mean)/SD
z = (80 - 72)/5
z = 1.6
P(z < 1.6) = 0.9452
Go up to 80 and subtract up to 70
P(70 <x <80)
= P(x < 80) - P(x < 70)
=P(z < 1.6) - P(z < - 0.4)
= 0.9452 - 0.3446
= 0.6006
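The same steps (standardize both values, look up each area from the left, subtract) can be sketched in Python. Here `statistics.NormalDist().cdf` plays the role of the z-score chart; `z_score` and `prob_between` are illustrative helper names.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mean) / sd

def prob_between(low, high, mean, sd):
    """P(low < x < high) for N(mean, sd^2), using areas from the left."""
    chart = NormalDist()  # standard normal, N(0, 1^2)
    return chart.cdf(z_score(high, mean, sd)) - chart.cdf(z_score(low, mean, sd))

# First-year averages: mean 72, SD 5, between 70 and 80.
print(z_score(70, 72, 5), z_score(80, 72, 5))
print(round(prob_between(70, 80, 72, 5), 4))
```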
z-score calculation
z = (x - mean)/SD
When it asks for EXPECT and it gives you a sample, DO THE HYPERGEOMETRIC PROPORTION AND CROSS MULTIPLY
- A bag contains two red, five black, and four green marbles. Four marbles are selected at random, without replacement. Calculate
a) the probability that all four are black
(5 C 4)(6 C 0)/(11 C 4) = 0.0152
- A bag contains two red, five black, and four green marbles. Four marbles are selected at random, without replacement. Calculate
b) the probability that exactly two are green
(4C2)(7C2)/(11C4) = 0.3818
- A bag contains two red, five black, and four green marbles. Four marbles are selected at random, without replacement. Calculate
c) the probability that exactly two are green and none are red
(4C2)(5C2)/(11C4) = 0.1818
- A bag contains two red, five black, and four green marbles. Four marbles are selected at random, without replacement. Calculate
d) the expected numbers of red, black, and green marbles
red: 2/11 = x/4
11x = 8
x = 0.7273
black: 5/11 = x/4
x = 1.818
green: 4/11 = x/4
x = 1.455
In the following equation: P = (9 C 2) (2/5)^2 (3/5)^7 , identify the:
Number of trials:
Number of successes:
Probability of failure:
Number of trials: 9
Number of successes: 2
Probability of failure: 3/5
Determine the type of continuous distribution (normal, positively or negatively skewed, bimodal) that would best represent the following situations?
a) The heights of Ontarians.
b) The age of people who have grey hair.
c) The shoe size of NBA players.
d) The age of all people at SDSS.
a) The heights of Ontarians. Bimodal
b) The age of people who have grey hair. Negatively Skewed
c) The shoe size of NBA players. Normal
d) The age of all people at SDSS. Positively Skewed
A bag of marbles contains 18 red, 7 green, and 23 blue marbles. If marbles are chosen at random and replaced, determine the probability that a green marble is not chosen until the 10th try?
Success: choosing green marble, 9 failures then success
Total: 48 marbles (Waiting until the 10th try for success -> geometric)
q (not choosing green) = 41/48
p (choosing green) = 7/48
(41/48)^9 (7/48)^1 = 0.0353 OR 3.53%
Therefore, the probability that a green marble is not chosen until the 10th try is 3.53%.
Kaitlyn plays a game at the Markham Fair. If she rolls a 2 or 3 with a fair, six-sided die, then she wins n^2 dollars. If she rolls anything else, then she will lose n dollars. By calculating the expected payout for this game, determine if this is a fair game or not.
# on dice | Outcome in $ | P(X = x)
1 | -$1 | 1/6
2 | +$4 | 1/6
3 | +$9 | 1/6
4 | -$4 | 1/6
5 | -$5 | 1/6
6 | -$6 | 1/6
E(X) = (-1)(1/6) + (4)(1/6) + (9)(1/6) + (-4)(1/6) + (-5)(1/6) + (-6)(1/6)
= -0.5
not a fair game
The probability that Oscar makes a foul shot in basketball is 3/17. Oscar attempts 4 foul shots in today’s game. Let X be the random variable defined as the number of successful shots Oscar gets in 4 attempts.
a) Complete the following probability distribution table for x
x | P(X = x)
0 | (4C0) (3/17)^0 (14/17)^4 = 0.45996
1 | (4C1) (3/17)^1 (14/17)^3 = 0.3942
2 | (4C2) (3/17)^2 (14/17)^2 = 0.1267
3 | (4C3) (3/17)^3 (14/17)^1 = 0.0181
4 | (4C4) (3/17)^4 (14/17)^0 = 0.00097
The probability that Oscar makes a foul shot in basketball is 3/17. Oscar attempts 4 foul shots in today’s game. Let X be the random variable defined as the number of successful shots Oscar gets in 4 attempts.
b) Find the expected number of shots Oscar gets in if he attempts ten shots
E(X) = np
= 10(3/17)
= 1.765 -> 1 or 2 expected shots made?
With n = 10: P(1) = (10C1)(3/17)^1(14/17)^9 = 0.3075, P(2) = (10C2)(3/17)^2(14/17)^8 = 0.2965
Therefore, making 1 shot has a higher probability than making 2, so the expected # of shots made in 10 attempts is 1.
To win a game of chance using a 12-sided die, you must roll a 6 or a 10.
a) If the game is played 150 times, how many wins are expected?
E(X) = np
= 150(2/12) = 25
Therefore, 25 wins are expected if the game is played 150 times.
To win a game of chance using a 12-sided die, you must roll a 6 or a 10.
b) How many games are you expected to wait before winning?
E(X) = q/p
= 10/12 ÷ 2/12 -> cross out the 12s
= 10/2
= 5
q = probability of losing p = probability of winning
Five cards are drawn from a shuffled deck of regular playing cards (without replacement).
a) Construct a probability distribution table for the number of face cards in the hand.
# of face cards, x | P(X = x)
0 | (12C0)(40C5)/(52C5) = 0.2532
1 | (12C1)(40C4)/(52C5) = 0.42197
2 | (12C2)(40C3)/(52C5) = 0.2509
3 | (12C3)(40C2)/(52C5) = 0.0660
4 | (12C4)(40C1)/(52C5) = 0.0076
5 | (12C5)(40C0)/(52C5) = 0.00030
Five cards are drawn from a shuffled deck of regular playing cards (without replacement).
b) How many face cards would you expect in a hand of 5 cards?
12/52 = x/5
52x = 60
x = 60/52
x = 1.1538 -> 1 or 2?
P(1 face card) = 0.42197, P(2 face cards) = 0.2509
Since the probability of 1 face card is higher than that of 2, I would expect 1 face card in a hand of 5.
The LSAT law school entry test has a mean score of 150 and a standard deviation of 10.
Without using z-scores, determine the probability that a law school candidate selected at random will score at least 140 on the test. (Draw a diagram to illustrate).
Mean = 150
SD = 10
P(x ≥ 140), where 140 is exactly one SD below the mean
= 50% + 68%/2
= 50% + 34%
= 84%
At Trendy Trinkets, the number of trinkets in each box sold is normally distributed. The mean number of trinkets in a package is 35 with a standard deviation of 4. A box of trinkets will only be sold if there are between 33 and 38 trinkets in the box. What percentage of boxes would you expect to be rejected for sale?
Mean = 35
SD = 4
Z-scores
z = (38 - 35)/4
= 0.75
z = (33 - 35)/4
= -0.50
P(33 < x < 38) -> will be sold
= P(x < 38) - P(x < 33)
= P(z < 0.75) - P(z < -0.50)
= 0.7734 - 0.3085
= 0.4649 OR 46.49% <- % of boxes sold
100% - 46.49% = 53.51%
Therefore, I would expect that 53.51% of the boxes would be rejected for sale.
PERFORMANCE PROBLEM: Field Goalie
A high school field goal kicker has an 85% chance of kicking a successful field goal within 30 yards. His success rate drops to 67% outside of 30 yards. If his team had field goal opportunities of 15 yards, 27 yards, 33 yards, 39 yards, and 41 yards, what is the probability that the kicker made 80% of his field goals?
15yd, 27yd = Within 30yd -> 85% 2 opportunities
33yd, 39yd, 41yd = Outside 30yd -> 67% 3 opportunities
5 opportunities -> 4 makes and 1 miss, 4/5 (80%)
The single miss is either within 30 yards (85% range) or outside 30 yards (67% range):
Case #1: 1 success and 1 failure within 30yd, 3 successes outside 30yd
Case #2: 2 successes within 30yd, 2 successes and 1 failure outside 30yd
Case #1
(2C1)(0.85)^1(0.15)^1 x (3C3)(0.67)^3
= 0.255 x 0.30076
= 0.0767
Case 2
(2C2)(0.85)^2 x (3C2)(0.67)^2(0.33)^1
= 0.7225 x 0.4444
= 0.3211
Case 1 + Case 2
= 0.0767 + 0.3211
= 0.3978 or 39.78%
Therefore, the probability that the kicker made 80% of his field goals is 39.78%.
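The two-case answer can be cross-checked by brute force: list every make/miss pattern for the five kicks and add up the ones with exactly 4 makes. This Python sketch is a different route to the same number, not the method used above, and it assumes the kicks are independent; `prob_exactly` is a hypothetical helper name.

```python
from itertools import product

# Success probability for each attempt: 15 and 27 yd are within 30 yd,
# while 33, 39 and 41 yd are outside 30 yd.
make_prob = [0.85, 0.85, 0.67, 0.67, 0.67]

def prob_exactly(makes_needed, probs):
    """Probability of exactly makes_needed successes in independent kicks."""
    total = 0.0
    # Each outcome is a tuple of True (make) / False (miss), one per kick.
    for outcome in product([True, False], repeat=len(probs)):
        if sum(outcome) != makes_needed:
            continue
        p = 1.0
        for made, prob in zip(outcome, probs):
            p *= prob if made else 1 - prob
        total += p
    return total

answer = prob_exactly(4, make_prob)
print(round(answer, 4))
```

Enumerating all 32 patterns is slower than the two-case argument but does not depend on spotting the cases correctly.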