Stats Exam #2 Flashcards

1
Q

When interpreting a graph, what is the correct order: variable (__) versus variable (__)?

A

Y versus X

2
Q

There are four things to look for in scatterplots. What are they?

A

Direction ( + or -)
Form (linear/non)
Strength (strong/weak)
Unusual Features (outliers, groups, clusters)

3
Q

What are the three Correlation Conditions?

A

Quantitative Variables (2 quants)
Straight Enough
No outliers

4
Q

What is the Linear Model equation? Interpret its components.

A

Y hat = B0 + B1X
Y hat: predicted value
B0: y-intercept
B1: slope

5
Q

The Linear Model is the model that “______” the data.

A

best fits

5
Q

True or False: The line of “best fit” has the LEAST error.

A

True

6
Q

What is the Residual equation, and what does it do?

A

Residual = observed value - predicted value
e = y - y hat
It measures the error in the model's prediction for each observation.

7
Q

If the model fits well, residuals will all be close to what number?

A

0

8
Q

Ex: A popular food item has 31 grams of protein, for which the model predicts 36.6 grams of fat. It actually has 22 grams of fat. If X = grams of protein, Y = grams of fat, and the line of fit for the model is Fat hat = 8.4 + 0.91(protein), calculate the residual for this observation & interpret.

A

y hat = 8.4 + 0.91(31) = 36.6
e = 22 - 36.6 = -14.6

The item actually has 14.6 grams less fat than the model predicts.
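
The residual calculation on this card can be checked with a short sketch. The function names `predict_fat` and `residual` are made up for illustration; the coefficients 8.4 and 0.91 come from the card's model.

```python
def predict_fat(protein_g, b0=8.4, b1=0.91):
    """Predicted fat (g) from the card's model: fat_hat = 8.4 + 0.91 * protein."""
    return b0 + b1 * protein_g


def residual(observed, predicted):
    """Residual e = y - y_hat (observed minus predicted)."""
    return observed - predicted


y_hat = predict_fat(31)   # about 36.6 g of fat predicted
e = residual(22, y_hat)   # about -14.6, so the model overpredicted
print(round(y_hat, 1), round(e, 1))
```

A negative residual means the observed value sits below the line; a positive one means it sits above.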

9
Q

True or False: Line with the MOST residual value is the linear model.

A

False

10
Q

The std. dev. of the model is the distance from y-bar. When finding std. dev. what is done to the residual values?

A

They are squared to make all values positive. The best-fitting line has the smallest sum of squared residuals.

11
Q

Interpret b0 and b1.

A

b0 = y-intercept and is where the line crosses the y-axis.

b1 = slope of the line that explains how rapidly y hat changes as a result of x.

12
Q

Interpret this linear model’s slope. Fat hat = 8.4 + 0.91(protein)

A

b1 = 0.91 grams of fat per gram of protein
For every additional gram of protein, expect there to be an additional 0.91 grams of fat, on average.

13
Q

Interpret this linear model’s y-intercept. Fat hat = 8.4 + 0.91(protein)

A

b0 = 8.4 grams of fat
An item with 0 grams of protein would be expected to have 8.4 grams of fat.

14
Q

True or False: the X variable can be practical or non-practical.

A

True

15
Q

Ex:
y = avg. home game attendance per year.
x = number of wins per year

Is X practical?

A

No, because no professional team has ever lost every home game.

16
Q

Ex:
y = total # of hours on the internet per month
x = # of Facebook friends

Is X practical?

A

Yes, because one can be on the internet and not use Facebook.

17
Q

How do you find b1? Will slope direction/sign match correlation coefficient sign?

A

Correlation times the ratio of the std. dev. of y to the std. dev. of x.

b1 = r(Sy/Sx)

Yes, the signs will match. A negative r makes for a negative slope and vice versa.

18
Q

How do you find b0?

A

b0 = y bar (avg. of y) minus b1 (slope) times x bar (avg. of x)

b0 = y bar - (b1 times x bar)
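
The two formulas b1 = r(Sy/Sx) and b0 = y bar - (b1 times x bar) can be sketched in a few lines. The data below are hypothetical values chosen only for illustration, and `fit_line` is a made-up helper name.

```python
from statistics import mean, stdev


def fit_line(xs, ys):
    """Return (r, b0, b1) using b1 = r * (s_y / s_x) and b0 = y_bar - b1 * x_bar."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    # Correlation: sum of products of deviations over (n - 1) * s_x * s_y
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b1 = r * s_y / s_x
    b0 = y_bar - b1 * x_bar
    return r, b0, b1


# Hypothetical data, not from the flashcards
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r, b0, b1 = fit_line(xs, ys)
print(round(r, 3), round(b0, 2), round(b1, 2))
```

Because Sy and Sx are never negative, the slope always takes the sign of r.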

19
Q

What are the four conditions for regression?

A

Quantitative Variables (2 quants)
Straight Enough
No Outliers
Does the Plot Thicken?

20
Q

The value of r is/is not affected by variable placements?

A

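It is not: r is the same no matter which variable is x and which is y. The regression equation does depend on placement, though:
Explanatory/predictor variable = x
Response variable = y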

21
Q

Ex: Since 1980, yearly average mortgage interest rates have fluctuated from a low of under 6% to a high of over 14%.
r = -0.8400 Sig. of prob = 0.0001
Mortgages = 220.893 - 7.775(interest)
Is there a relationship between the amount of money people borrow and the interest rate that is offered? What would you expect the relationship to look like?

(assume they pass the straight enough and no outliers conditions)

A

These variables pass the quantitative condition. X = interest Y = mortgage

The correlation (r = -0.84) shows that the relationship is negative and strong; with the straight enough condition met, it is linear. The sig. prob shows that the association is statistically significant, as 0.0001 is less than 0.05.

22
Q

Ex Continued: Since 1980, yearly average mortgage interest rates have fluctuated from a low of under 6% to a high of over 14%.
r = -0.8400 Sig. of prob = 0.0001
Mortgages = 220.893 - 7.775(interest)

Interpret the data.

A

b1 = -7.775
On avg., for every additional 1 percentage point increase in the interest rate, expect to see mortgages decrease by $7.775 billion.

b0 = 220.893 billion.
When the i-rate is 0, expect to have mortgages of $220.893 billion. There is no practical interpretation of b0 bc the i-rate has never hit 0.

23
Q

Ex: y hat = 220.893 - 7.775x
Observation #21 is y = 168.2 ($billions) and x = 7.9 (%)
Calculate the predicted value associated with this observation and interpret. Calculate the residual for this observation and interpret.

A

y hat = 220.893 - 7.775(7.9) = 159.47
When the i-rate is 7.9%, one can expect mortgages to total $159.47 billion.

e = 168.2 - 159.47 = 8.73
The model underpredicted. Actual mortgages were $8.73 billion more than predicted.

24
Q

True or False: Residuals have not been modeled by the regression equation.

A

True.

e = y - y hat OR residual = data - model

25
Q

When observing a scatter plot of the residuals, do we want to see a pattern?

A

No.

26
Q

Regression has 4 conditions. What are they?

A

If data passes the Quantitative, Straight Enough, and No Outliers conditions, fit the model.
The Does the Plot Thicken? condition concerns the residual plot: the spread of the residuals should stay roughly constant, with no pattern.

27
Q

If the regression line fit all the data perfectly, the standard deviation of residuals (Se) should be what number?

A

0

28
Q

Squared correlation (r)^2 = R^2.
R^2 gives the proportion of the data’s variance that is _______ for by the model.

A

accounted

29
Q

For the popular food ex:
Correlation chart shows:
              Fat(g)    Protein(g)
Fat(g)        1.00      0.76
Protein(g)    0.76      1.00

Find the R^2 and interpret

A

R^2 = (0.76)^2 = 0.58

This can be found in the Summary of Fit Chart.
It means that 0.58 or 58% of the variation in fat is accounted for by the model.

30
Q

R^2 = 0 : No variance in data is in the model. All in residuals

R^2 = 1.0 : All of the variance in the data is captured by the model.

In the popular food ex, 58% of the variation in total fat (y) is associated with/explained by the variation in protein content (x).

A

no answer

31
Q

Ex: Verify that the square of the correlation coefficient (r) is RSquare. Interpret its value in the context of these data.

Correlations:
                 Mortgages    Interest Rate
Mortgages        1.0000       -0.8400
Interest Rate    -0.8400      1.0000

Summary of Fit:
RSquare: 0.705573
RSquare Adj: 0.693305
Root Mean of Square Error: 13.21113
Mean of Response: 151.8731
Observations (or Sum Wgts): 26

A

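r^2 = (-0.8400)^2 = 0.7056, which matches RSquare = 0.705573 up to rounding.
Working backwards: square root of 0.705573 = 0.84, and r takes the negative sign of the relationship, so r = -0.84.

70.56% of the variation of mortgages is explained by the variation of interest rate.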

32
Q

Ex: A graph shows that Max wind speed = 1031.24 - 0.975(central pressure).

To work backwards with RSquare to find the correlation you take the square root of what? To find the direction of correlation (r) what do you look for?

A

RSquare

Ex:
RSquare = 0.81
|r| = square root of 0.81 = 0.9

Because the slope is negative and you use correlation to find slope, assume correlation is also negative. r = (-0.9)
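
Working backwards from RSquare can be sketched as follows. The helper name `r_from_rsquare` is hypothetical; the inputs come from the wind-speed card above and the earlier mortgage card.

```python
import math


def r_from_rsquare(r_square, slope):
    """Recover r: the magnitude is sqrt(RSquare), the sign is copied from the slope."""
    magnitude = math.sqrt(r_square)
    return math.copysign(magnitude, slope)


r_wind = r_from_rsquare(0.81, -0.975)          # about -0.9
r_mortgage = r_from_rsquare(0.705573, -7.775)  # about -0.84
print(round(r_wind, 2), round(r_mortgage, 2))
```

The square root alone loses the direction, which is why the sign of the slope is needed.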

33
Q

True or False: the value of RSquare should be withheld from the audience.

A

False. It should be reported.

34
Q

What is it called when you use a regression equation outside of the context by plugging in a value of x that is outside of the range of values for the data?

A

An extrapolation. This can be very dangerous and give inaccurate predictions.

35
Q

True or False: While an r value must be between -1 and 1, the slope of a regression line can be any value.

A

True! When b1 is anything other than 0, it indicates some linear association between x & y.

36
Q

Always initially assume regression is by random chance. What can you use to finalize this answer?

A

P-value & 0.05 threshold.

37
Q

If an audience has two spinners that are different, should there be any statistically significant association?

A

No! It should be by random chance.

38
Q

Threshold says that if the P-value is “greater than 0.05” the association _________. If the P-value is “less than 0.05,” the association _______.

A

is by random chance; statistically significant

39
Q

Ex: Data for a linear fit of y = 6.2276 - 0.317x shows a probability of 0.1284. Is this by random chance or no?

A

Because 0.1284 is greater than 0.05, the association is not statistically significant; it could be due to random chance.

40
Q

True or False: Residual plots can sometimes expose more subtle curved relationships in data than the original scatterplot.

A

True

41
Q

Ex: What is the relationship between Run Time (minutes) and Budget (millions)? Which variable should be x and which should be y? Assume all four conditions are met.

b1 = 0.7144001
b0 = -31.38695
RSquare = 0.154156

A

X = run time; y = budget

Regression Analysis: Budget hat = -31.38695 + 0.7144001(run time)

In the case of this data, 15.4% of the budget variance is explained by run time. The remaining 84.6% is left unexplained by the model (other variables and random variation).

42
Q

Can doing multiple, smaller analyses on the same data set create a stronger line of fit?

A

Yes.

43
Q

What are extrapolations?

A

Predicted values outside of the range of data available. They are questionable assumptions.

44
Q

True or False: Though dangerous, extrapolation is sometimes necessary.

A

True. Regression, extrapolation, and “forecasting” are better options than guessing.

45
Q

Because the correlation coefficient is used to create the slope, outliers can _______ a regression analysis.

A

strongly influence

46
Q

Correlation does NOT mean causation. When there is a high RSquare, do changes in x cause variation in y?

A

Not necessarily.

47
Q

Ex: In a given data set of doctors/person and avg. life expectancy, RSquare is 0.629. In the set TVs/person vs. avg. life expectancy, RSquare is 0.725. Does this mean TVs are better for health than doctors?

A

No! This just means there are lurking variables affecting the data. For example, in places with higher standards of living, many people who have longer life expectancies have more doctors AND tvs.

48
Q

What do randomized samples attempt to do?

A

Reduce bias

49
Q

How do we learn details about the population?

A

Through the average of the population or a sample statistic.

50
Q

Population parameter is whatever we’re analyzing about the population. It can be a mean, std. deviation, percentile, percent, etc. How is this found?

A

Through taking a sample

51
Q

Sample should be as ________ of the population as possible.

A

representative

52
Q

Notation is very important. Name the statistic and parameter denotations.
Stat. Parameter
Mean:
Std. Dev:
Correlation:
Regression Coefficient:
Proportion:

A

                  Stat.     Parameter
Mean:             y bar     μ (mu)
Std. Dev:         s         σ (sigma)
Correlation:      r         ρ (rho)
Regression Co:    b         β (beta)
Proportion:       p hat     p (pi)

53
Q

What matters when taking a sample? (two answers)

A

Size & collection process

54
Q

The sampling frame is a list or collection of things/individuals from which the sample was ________.

A

drawn/collected

55
Q

Sampling frame can be biased if it misses part of the population. Provide an example.

A

Kroger takes sample from ppl w/ Kroger cards. This excludes customers w/o cards. The sampling frame is strictly customers w/ the loyalty card.

56
Q

The population is the what of the sample?

A

entire group of individuals

57
Q

Sampling frame is the what of the sample?

A

List of all individuals from which the sample was drawn (who is eligible to be sampled).

58
Q

The sample design is the what of the sample?

A

Method used to draw the sample

59
Q

The sample of data is what?

A

Those actually chosen

60
Q

What does it mean for a sample to be representative of the population?

A

Good, unbiased

61
Q

What is voluntary response bias?

A

This happens when a group is invited to respond but only those who participate are counted in the study. The sample only includes those with a strong interest in the topic.

62
Q

Ex: Statistics 201 samples students’ opinions on a range of issues and uses the data for a class project. Students aren’t required to respond. All data from students who had strong enough reason to participate in the study are included. What bias is found here?

A

Voluntary Response Bias

63
Q

What is Convenience Sampling?

A

Convenience sampling surveys whoever is easiest to reach (people nearby, social media followers, etc.), so the sample is only representative of those individuals, often ones w/ high interest in the topic.

64
Q

Ex: Study using one’s Instagram followers about the person OR a table is set up in SU cafeteria about SU food options. What bias is found here?

A

Convenience Sampling

65
Q

What is undercoverage?

A

Undercoverage is when a portion of the population is not sampled or has less representation in the sample than it has in the population.

66
Q

Ex: The US census sends a survey to every household in America. What bias is found here?

A

Undercoverage is present because the homeless population has been left out. They might have no access to the study, no way to send back the survey, no household, etc.

67
Q

What is nonresponse bias?

A

This happens when those that don’t respond to the survey differ in opinions from those that do.

68
Q

Where is nonresponse bias predominantly found?

A

Telephone surveys

69
Q

What is response bias?

A

Response bias occurs when external influences affect a respondent’s answer.

70
Q

Wording the question to lean in one direction, people not wanting to admit flaws, people hiding personal facts, and people not wanting to admit to illegal acts are all examples of what kind of bias?

A

Response bias

71
Q

What are the four main sampling designs and one combo?

A

Simple Random Sample, Systematic Sample, Stratified Sample, Cluster Sample, and Multistage Sample

72
Q

What is a simple random sample?

A

Elements of a population are enumerated with equally likely chances of selection. Use this when there is easy access to the entire population.

73
Q

Drawing straws, random # generator, etc. are examples of what type of sampling method?

A

SRS

74
Q

What is Systematic sampling?

A

Every kth element sampled.

75
Q

Ex: The police set up a roadblock where every 5th car is stopped and searched. What kind of sampling design is this?

A

Systematic

76
Q

What is stratified sampling?

A

The population is put into distinct groups known as strata. Then an SRS is done within each stratum. Used when we know there are groups within the population that differ from each other, such as by gender, income, ethnicity, etc.

77
Q

What is cluster sampling?

A

Cluster sampling is used for non-identifiable strata (similar groups). The clusters are collected at random then a census is done within each.

78
Q

Ex: You have 100,000 bags of M&Ms (200/bag) = 20 million M&Ms. You want to know how many are blue. Using a sample size of 1000, what design should you use?

A

No need to stratify because they are all plain M&Ms. Make each bag a cluster; then SRS 5 of the bags & complete a census.

79
Q

What is multistage sampling?

A

A design that contains several methods of sampling.

80
Q

Most professional polling surveys use a combination of stratified, cluster, & SRS methods for sampling. What method is this?

A

Multistage

81
Q

Ex: Police officers stop every 10th car and check the driver for evidence of alcohol use. This is what sampling design?

A

Systematic

82
Q

Ex: An academic officer enters 25 randomly chosen classrooms and surveys all students about whether their instructors dress well. This is what sampling design?

A

Cluster

83
Q

Ex: Students on both the main campus and the agricultural campus are randomly selected and asked how easy it is to park when coming to school here at UT. What sampling method is this?

A

Stratified

84
Q

Ex: TSA’s method for determining which airline passengers are selected for additional security screening is known as what design method?

A

Multistage: Random, Stratified, etc.

85
Q

Occasions on which a random phenomenon is observed are called _______. The values of the random phenomenon are called ________. A combination of outcomes is known as an _______.

A

trials; outcomes; event

86
Q

True or False: the collection of ALL possible outcomes is called the sample space and denoted (S).

A

True

87
Q

If individual trials are independent, then the outcome of one trial ______ influence/change the outcome of another trial.

A

DOES NOT; ex: rolling die or flipping coins

88
Q

The Law of Large Numbers (LLN) says that in the long-run, relative frequency of repeated independent events gets closer to a single value. What is a ‘single value’ in the case of probability?

A

the probability of the event happening

89
Q

True or False: The law of averages says that if you flipped a coin four times and get heads, the fifth flip will also be heads.

A

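False; the so-called law of averages claims the fifth flip “must” be tails. That belief is itself mistaken: coin flips are independent, so the fifth flip is still 50/50.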

90
Q

Subjective/Personal Probabilities are based on what?

A

non-mathematical calculations

91
Q

There are times when all possible outcomes are NOT equally likely. Name an example.

A

Winning the lottery

92
Q

If outcomes are equally likely, the probability of an event A can be found with:

P(A) = (# of outcomes in A)/(# of possible outcomes)

What is the probability of flipping heads in a coin toss? What about rolling a 6 on a die?

A

P(head) = 1/2 = 0.5 or 50%
P(6) = 1/6 = 0.1667 or 16.67%

93
Q

Ex: You have one of every denomination of US paper currency. What is the probability that you select a bill with a portrait of a president?

S = {$1, $2, $5, $10, $20, $50, $100}

Hint 1, 2, 5, 20, and 50 are presidential bills.

A

P(President) = 5/7 = 0.714 or 71.4% chance

94
Q

Ex: you flip a fair coin 4 times. There are 16 equally likely possible outcomes. What is the probability that you get at least three heads?

A

P(at least 3) = 5/16 = 0.3125 or 31.25%
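
The 5/16 can be confirmed by listing all 16 outcomes and counting those with at least three heads. This is a small enumeration sketch, not part of the original card.

```python
from fractions import Fraction
from itertools import product

# All 2^4 = 16 equally likely sequences of four flips
outcomes = list(product("HT", repeat=4))

# Keep the sequences with three or four heads
favorable = [o for o in outcomes if o.count("H") >= 3]

p_at_least_3 = Fraction(len(favorable), len(outcomes))
print(len(favorable), len(outcomes), float(p_at_least_3))
```

The five favorable outcomes are HHHH plus the four sequences with exactly one tail.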

95
Q

There are a few requirements for probabilities. For example, a probability must be between what two integers?

A

0 and 1

96
Q

For any event A, 0 ≤ P(A) ≤ 1.

If P(A) = 0, event A ___________.
If P(A) = 1, event A _____________.

A

can NEVER happen; will happen with ABSOLUTE certainty

97
Q

The Probability assignment rule says that S represents the set of all possible outcomes. If this is the case, what should P(S) equal?

A

P(S) = 1.
ex: roll a fair six-sided die. the possible outcomes are S = {1,2,3,4,5,6}. Therefore, P(S) represents the probability of getting 1-6. Since these are the only options, P(S) = 1 or 100% chance.

98
Q

The Complement rule says that a set of outcomes NOT in event A is the complement of A. What is the equation for this probability?

A

P(not in A) = 1 - P(A)

P(A) = 1 - P(not in A)

Therefore, P(A) + P(not in A) = 1

99
Q

Ex: You have one of every denomination of US paper currency. What is the probability that you do not select a bill with a portrait of a president?

S = {$1, $2, $5, $10, $20, $50, $100}

Hint 1, 2, 5, 20, and 50 are presidential bills.

A

P(president) = 5/7 so P(not president) = 1 - 5/7 = 2/7 or 0.286

28.6% chance of not picking a presidential bill

100
Q

Events that have no outcomes in common, and so cannot occur simultaneously, are called what?

A

Disjoint or mutually exclusive

101
Q

In order for the addition rule of probability to work, the events must be disjoint/mutually exclusive.

Ex: roll a single die. Let
Event A = 5 or more; Event B = 2 or less; Event C = even #

  1. Are events A & B disjoint?
  2. Are events A & C disjoint?

A

  1. Yes, because you cannot roll a 5 or 6 and a 1 or 2 in the same roll.
  2. No, because rolling a 6 satisfies both events.

102
Q

For two disjoint events, A & B, the probability that one OR the other occurs is the _____ of both events’ probabilities.

A

Sum.

P(A or B) = P(A) + P(B)

103
Q

For an individual trial to be independent, the result of one trial cannot ________ the outcome of another trial.

A

change/influence

ex: coin flips are independent

104
Q

The multiplication rule calculates the probability of more than one thing occurring. For example, flipping two heads in a row. To use the multiplication rule, the events must be _______ of one another.

A

independent

105
Q

The multiplication rule states that the probability of both independent events, A & B, occurring is the _______ of their probabilities.

A

product

Ex: P(A and B) = P(A) times P(B)

106
Q

Ex: The probability of an airline not losing your luggage is 0.993. What is the probability of the airline never losing your luggage on 3 flights?

A

P(not lost) = 0.993
P(not lost on 3 flights) = 0.993 x 0.993 x 0.993
OR (0.993)^3 = 0.979

107
Q

Ex: The probability of an airline not losing your luggage is 0.993. What is the probability of the airline losing your luggage at least once on 3 flights?

A

P(at least one) = 1 - P(none)

P(none) = (0.993)^3 = 0.979

P(at least one) = 1 - 0.979 = 0.021

108
Q

Ex: An opinion polling org contacts their respondents by phone. The probability of contacting someone using this method is 0.76. Assume this method is independent.

A.) The previous call didn’t make contact. What’s the probability the next call will?
B.) Probability the interviewer successfully contacts the next two callers.
C.) Probability the interviewer’s first contact is the third call.
D.) Probability the interviewer makes at least one contact among 5 calls.

A

A.) P(contact) = 0.76 & independent, so 0.76

B.) P(contact) = 0.76 P(next 2) = 0.76^2 = 0.5776

C.) P(no contact) = 1 - 0.76 = 0.24
P(first contact on third call) = (0.24)(0.24)(0.76) = 0.0438

D.) P(at least one) = 1 - P(none)
P(no contact on a single call) = 1 - 0.76 = 0.24
P(at least one) = 1 - (0.24)^5 = 1 - 0.000796 = 0.999204
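
The four parts of this card follow from the multiplication and complement rules under the stated independence assumption. A minimal sketch, with variable names chosen here for illustration:

```python
p_contact = 0.76
p_miss = 1 - p_contact  # 0.24

# A) Independence: an earlier miss does not change the next call's chance
p_next = p_contact

# B) Two contacts in a row
p_next_two = p_contact ** 2

# C) First contact on the third call: miss, miss, then contact
p_first_on_third = p_miss ** 2 * p_contact

# D) At least one contact in 5 calls: complement of five misses
p_at_least_one_in_5 = 1 - p_miss ** 5

print(round(p_next_two, 4), round(p_first_on_third, 4), round(p_at_least_one_in_5, 6))
```

Part C multiplies the three probabilities directly; no complement is involved.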

109
Q

Are these events independent? Mutually exclusive?

Event A = stock market increases by 100 points next Tuesday
Event B = Knox gets less than 0.5’ rain next Tuesday

A

Not disjoint, but independent!

110
Q

Are these events independent? Mutually exclusive?

Event A = high temps on next 6/1 in Knox will be over 95 degrees.
Event B = Ice-cream truck comes by your house but is out of your favorite flavor next 6/1.

A

Dependent & not mutually exclusive

111
Q

To estimate a whole population, what situations can we look at regarding population parameters?

A

Situations where the population parameter is known!

112
Q

Ex: It’s known a lottery has 100 ping-pong balls each being white or gold. You must get 6 gold to win. You’re told the last draw was [w,w,w,g,g,w]. Estimate the proportion of 100 balls that are gold.

A

The only reasonable guess is 2/6, i.e. about 33 or 34 of the 100 balls.

113
Q

What theorems are fundamental in impacting our ability to statistically guess?

A

The Sampling Distributions and Central Limit Theorems.

114
Q

Properties of Sampling Distribution for a Proportion says that Event X represents a “success” of a categorical variable. Successes are defined by the context of the question, so are they always a good thing?

A

No! Success is not always the favorable outcome. For example, you could be sampling cancer patients. A patient with cancer would be a yes, but this is not the favorable outcome.

115
Q

The sample proportion is denoted p hat. The equation is # of successes/sample size, or x/n.

A

p hat = x/n

116
Q

What is the Sampling distribution and what does it tell us about the sample proportion?

A

It is the distribution of ALL possible sample proportions, and it explains the shape (center & spread)

117
Q

P hat is the sample proportion. The Central Limit Theorem (CLT) explains the expected value of p hat and the standard deviation of p hat. What are they?

A

The expected value of P hat is p, the true population proportion. This is the center.

m(p hat) = p

The standard deviation of p hat is the typical difference between what we measure and the true population proportion. This is the spread.

SD(p hat) = the square root of (pxq)/n where q = 1 - p; p = true population proportion
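
The spread formula can be sketched as a small function. `sd_p_hat` is a name made up for this example; the inputs below come from other cards in this deck (p = 0.116 with n = 100, and p = 0.75 with n = 50).

```python
import math


def sd_p_hat(p, n):
    """Standard deviation of the sample proportion: sqrt(p * q / n), with q = 1 - p."""
    q = 1 - p
    return math.sqrt(p * q / n)


print(round(sd_p_hat(0.116, 100), 3))  # premature-births card
print(round(sd_p_hat(0.75, 50), 4))    # p = 0.75, n = 50 card
```

Since n sits in the denominator under the square root, quadrupling the sample size halves the standard deviation.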

118
Q

True or False: the shape of the sampling distribution is approximately normal.

A

True! (when CLT is met)

119
Q

Ex: NCHS found that out of all 3,945,102 U.S. live births in 1998, 11.6% were premature. The avg. of all p hat (sample proportions) should be close to what?

A

11.6%

120
Q

Ex: NCHS found that out of all 3,945,102 U.S. live births in 1998, 11.6% were premature. If the population parameter of premature births is 11.6%, then the true population proportion should be p = ______

For a sampling distribution of a sample proportion of sample size (n = 100):

The expected value is:
The Std. Dev. is:

A

p = 0.116
m(p hat) = p = 0.116

q = 0.884
std. dev. = square root of (0.116 x 0.884)/100 =
0.032

121
Q

True or False: Sample variability should never be accounted for.

A

False! It must be considered

122
Q

Ex: Calculate the std. dev. of p hat when p = 0.75 and n = 50

A

SD(p hat) = square root of (0.75x 0.25)/50 = 0.0612

Larger sample = smaller std. dev.
Smaller sample = bigger std. dev.

123
Q

CLT implies sampling distributions are approximately normal meaning the distribution should be ______

A

A normal distribution! The more samples makes for a better approximation

124
Q

There are three assumptions and conditions to check for with sampling distributions of proportions. What are they?

A

Randomization: the sample should be a simple random sample of the population

10% condition: the sample size, n, must be no larger than 10% of the population size

Success/Failure Condition: the sample size has to be big enough so that both np and nq are at least 10.
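
The three checks can be written as a small helper. This is only a sketch: `check_proportion_conditions` is a hypothetical name, and randomization cannot be computed from the numbers, so it is passed in as a yes/no judgment.

```python
def check_proportion_conditions(n, p, population_size, is_random_sample):
    """Return a dict with the outcome of each condition for a sample proportion."""
    q = 1 - p
    return {
        "randomization": is_random_sample,                # judged from the design
        "ten_percent": n <= 0.10 * population_size,       # n at most 10% of population
        "success_failure": n * p >= 10 and n * q >= 10,   # np and nq both at least 10
    }


# Values from the premature-births example: n = 925, p = 0.116, N = 3,945,102
checks = check_proportion_conditions(925, 0.116, 3_945_102, True)
print(checks)
```

A sample of 50 with p = 0.05 would fail the success/failure check, because np = 2.5.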

125
Q

Ex: NCHS found that out of all 3,945,102 U.S. live births in 1998, 11.6% were premature. Thus the population proportion of premature births is 0.116. Suppose we sampled 925 of these births and 124 were premature. What is p hat (sample proportion)? What is std. dev?

This was random sample, 925/3,945,102 = 0.000234 which is way less than 10%, and
np = 0.116(925) = 107.3 & nq = 0.884(925) = 817.7 both of which are larger than 10. All conditions passed.

A

p hat = x / n = 124/925 = 0.134 or 13.4%

std. dev. = square root of (0.116x0.884)/925 = 0.011

The margin of error is generally taken to be +/- 2 standard deviations. In this case 2 x 0.011 = 0.022, or 2.2 percentage points. If the true proportion were 0.116, 95% of samples of n = 925 would have sample proportions ranging from 0.094 to 0.138.

z-score = (observed value - expected)/std. dev
z-score = (0.134 - 0.116)/0.011 = 1.71 std. devs. (using the unrounded std. dev. of 0.0105), which is not unusually large
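
The numbers on this card can be reproduced without rounding the intermediate steps. A minimal sketch, with variable names chosen for illustration:

```python
import math

p = 0.116   # population proportion of premature births
n = 925     # sample size
x = 124     # premature births observed in the sample

p_hat = x / n                      # about 0.134
sd = math.sqrt(p * (1 - p) / n)    # about 0.0105
z = (p_hat - p) / sd               # about 1.71

# Range covering roughly 95% of sample proportions (+/- 2 std. devs.)
low, high = p - 2 * sd, p + 2 * sd

print(round(p_hat, 3), round(sd, 4), round(z, 2), round(low, 3), round(high, 3))
```

With the unrounded standard deviation the interval comes out slightly narrower than the card's 0.094 to 0.138, which uses the rounded 0.011.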

126
Q

Ex: 52% of voters plan to vote ‘yes’ on budget. SRS 300 voters. What might be the % of yes voters in poll?

Random? yes; srs 10%? yes np = 156 nq = 144

A

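p = 0.52 (52%); q = 0.48; n = 300
Center: m(p hat) = p = 0.52
Spread: std. dev. = square root of (0.52x0.48)/300 = 0.029
About 95% of such polls should fall within 2 std. devs. of the center: 0.52 +/- 0.058, or roughly 46% to 58% yes.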

127
Q

Ex: A company produces AAA batteries. It’s known that 2% are defective. If UT got 25,000 batteries, how many might not work?

A

p = 0.02; q = 0.98; n = 25,000
np = 25,000(0.02) = 500; nq = 25,000(0.98) = 24,500
m(p hat) = p = 0.02, so expect about 500 defective batteries.

std. dev. = square root of (0.02x0.98)/25000 = 0.000885 (about 22 batteries)

128
Q

True or False: the CLT states that the distribution of ALL possible averages of a population is nearly normal.

A

True

129
Q

There are three assumptions and conditions to check for when sampling distribution of means. What are they, and what do they look for?

A

Random Condition: SRS of population

10% condition: sample size, n, must be no larger than 10% of population size

Large enough condition: the more unimodal and symmetric the population, the smaller the sample you need. As a rule of thumb:
n = 25 is usually large enough for most pops
n = 100 may be required if population is very skewed

130
Q

The normal model is a close approximation of sampling distribution of the sample mean for smaller samples when the population is close to the ______.

A

normal distribution

131
Q

The more non-normal the distribution of population is, the _____ the sample size needs to be to have an approximately normal distribution of sample mean.

A

larger

132
Q

True or False: the smaller n is, the closer the sample avg. is to the population mean (Law of large numbers)

A

false; the larger

133
Q

CLT says that the sampling distribution of any mean or proportion is approximately _______.

For proportions: clt is centered at population prop., p. where m(p hat) = p

For means: clt is centered at population mean, m. where m(y bar) = M

A

normal

134
Q

The equation for std. dev. of a sample proportion is SD(p hat) = square root of (pxq)/n. What is the equation for std. dev. of a sample mean?

A

SD(y bar) = sigma/square root of n
where sigma = pop. std. deviation

135
Q

Ex: Mr. Brady needs to build a balcony capable of supporting the weight of 100 people (n = 100). He wonders: if he builds a balcony designed to support 19,000 lbs, will it be sufficient? A survey of over 8000 adults implies that the mean weight of U.S. adults aged 20 and above is 176 lbs with a standard deviation of 61 lbs. [m = 176; sigma = 61]
First check conditions:

Random: yes (100 random ppl)
10%: yes (100 is much less than 10% of US population)
Large enough: yes (100 is enough)

A

Because the conditions are met, we can model the average weight of 100 people as a normal distribution. The true pop avg. m(y bar) = 176.
Std. dev. = 61/square root of 100 = 6.1

y bar = N(176, 6.1) where 176 is the center and 6.1 is the spread

The balcony holds if the average weight is at most 19,000/100 = 190 lbs. 190 is (190 - 176)/6.1 = 2.295 standard deviations above the mean, so the probability of the balcony holding is 0.989 or 98.9%. There is only a 1.1% chance it doesn’t hold, so it is reasonably safe for Mr. Brady to take the chance.
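
The balcony calculation can be sketched with the standard normal CDF, built here from `math.erf`. The helper name `normal_cdf` is an illustrative choice; the inputs are the card's m = 176, sigma = 61, n = 100, and the 19,000 lb design load.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


mu, sigma, n = 176, 61, 100
capacity_lbs = 19_000

sd_mean = sigma / math.sqrt(n)           # SD(y bar) = 6.1
limit_per_person = capacity_lbs / n      # average weight the balcony can carry: 190
z = (limit_per_person - mu) / sd_mean    # about 2.295
p_holds = normal_cdf(z)                  # about 0.989

print(round(sd_mean, 1), round(z, 3), round(p_holds, 3))
```

The key step is converting the total load into an average per person, so that the sampling distribution of the mean applies.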

136
Q

If the correlation coefficient is equal to 0.50, what is RSquare equal to?

A

0.25

137
Q

If a residual is positive, which was larger?

Intercept
Slope
Actual
Predicted

A

Actual

138
Q

Find the predicted value for a regression equation where:
b0 = 10
b1 = 2
Obs. value of x = 9

A

28

139
Q

For each one unit increase in X, we expect Y to increase by b1 units, on average.

Residual interpretation
Intercept interpretation
Rsquared interpretation
Slope interpretation

A

Slope interpretation

140
Q

True or False: Adding together all the residuals from a regression plot will sum to zero, on average.

A

true

141
Q

When a sample has characteristics which correspond to characteristics of the population, the sample is said to be ___________.

A

representative

142
Q

True or False: Response bias is always possible in a survey that requires an answer from a human being.

A

True

143
Q

Ex: You are heading to the LLN Casino for Spring break and plan to play their world-famous slot machine. The slot machine at the LLN Casino has three wheels that spin when you pull the lever. When pulled, each wheel spins and stops on one of the symbols. Assume that the outcome of each wheel is independent of the outcomes of the other 2 wheels.

Each wheel has 10 equally likely symbols: 4 skulls, 3 lemons, 2 Cherries, and 1 bell.

If you play, what is the probability that you get a lemon, a cherry, and a bell?

A

0.006 (0.3 x 0.2 x 0.1, reading the three wheels in that order)
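
The 0.006 corresponds to lemon, cherry, bell appearing in that order on wheels 1, 2, and 3. The enumeration sketch below checks that reading and, as an assumption beyond what the card asks, also counts the case where the three symbols may appear in any order.

```python
from fractions import Fraction
from itertools import product

# One wheel: 10 equally likely symbols
wheel = ["skull"] * 4 + ["lemon"] * 3 + ["cherry"] * 2 + ["bell"]

# Three independent wheels give 10^3 = 1000 equally likely results
spins = list(product(wheel, repeat=3))

in_order = sum(1 for s in spins if s == ("lemon", "cherry", "bell"))
any_order = sum(1 for s in spins if sorted(s) == ["bell", "cherry", "lemon"])

p_in_order = Fraction(in_order, len(spins))    # 6/1000 = 0.006
p_any_order = Fraction(any_order, len(spins))  # 36/1000 = 0.036

print(float(p_in_order), float(p_any_order))
```

The any-order figure is six times larger because there are 3! = 6 ways to arrange the three symbols across the wheels.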

144
Q

Suppose the probability of the bus arriving on time is 0.58. For the next 4 stops, what is the probability the bus is on time for at least one of the stops? Assume independence.

A

0.969
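
The 0.969 comes from the complement rule: the chance of at least one on-time arrival is 1 minus the chance of none. A minimal sketch, with `p_at_least_one` as a made-up helper name:

```python
def p_at_least_one(p_event, n_trials):
    """P(event occurs at least once in n independent trials) = 1 - (1 - p)^n."""
    return 1 - (1 - p_event) ** n_trials


p_bus = p_at_least_one(0.58, 4)  # bus on time at least once in 4 stops
print(round(p_bus, 3))
```

The same function covers any "at least one" question, provided the trials are independent.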

145
Q

True or False: In probability, the value of each trial is called a consequence.

A

False; it is an outcome

146
Q

True or False: All other things being equal, the larger the sample size, the smaller the standard deviation of the sampling distribution.

A

True

147
Q

True or False: the Success/Failure condition states that your measurement must be binary, either a success or a failure.

A

False; states that you need at least 10 of each.

148
Q

According to the Central Limit Theorem, which of the following will happen to the distribution of the sample mean as the sample size increases?

The mean gets larger
The distribution gets less normal
The distribution gets more normal
The mean gets smaller

A

The distribution gets more normal

149
Q

The Central Limit Theorem relates to which of the following conditions?

10% condition
Randomization
Nearly Normal Condition

A

Nearly Normal Condition

150
Q

The sampling distribution of the sample mean will always have the same _____ as the original distribution?

Shape
Standard deviation
Mean

A

Mean