Stats Exam #2 Flashcards by Amelia Yager

When interpreting a graph, what is the correct order: variable () versus variable ()

Y versus X

There are four things to look for in scatterplots. What are they?

Direction ( + or -)
Form (linear/non)
Strength (strong/weak)
Unusual Features (outliers, groups, clusters)

What are the three Correlation Conditions?

Quantitative Variables (2 quants)
Straight Enough
No outliers

What is the Linear Model equation? Interpret its components.

Y hat = B0 + B1X
Y- hat: predicted value
B0: y-intercept
B1: slope

The Linear Model is the model that “______” the data.

best fits

True or False: The line of “best fit” has the LEAST error.

True

What is the Residual equation, and what does it do?

Residual = observed value - predicted value
e = y - y hat
This explains the errors in the model.

If the linear model fits well, residuals will all be close to what number?

0

Ex: A popular food item is said to have 31 grams of protein and 36.6 grams of fat. It actually has 22 grams of fat. If X = grams of protein, Y = grams of fat, and the line of fit for the model is Fat hat = 8.4 + 0.91(protein), calculate the residual for this observation & interpret.

y hat = 8.4 + 0.91(31) = 36.6
e = 22 - 36.6 = -14.6

The actual fat content is 14.6 grams less than what the model predicts.
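
The arithmetic on this card can be checked with a short Python sketch. The function name predict_fat is made up for illustration; the coefficients and values come from the card.

```python
# Fitted line from the card: fat_hat = 8.4 + 0.91 * protein
def predict_fat(protein_g):
    """Predicted grams of fat for a given amount of protein (hypothetical helper)."""
    return 8.4 + 0.91 * protein_g

observed_fat = 22.0                      # actual grams of fat
predicted_fat = predict_fat(31)          # model's prediction at 31 g of protein
residual = observed_fat - predicted_fat  # e = y - y hat

print(round(predicted_fat, 1), round(residual, 1))
```

A negative residual means the model overpredicted: the observed value sits below the line.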

True or False: Line with the MOST residual value is the linear model.

False

The std. dev. of the model is the distance from y-bar. When finding std. dev. what is done to the residual values?

They are squared to make all values positive. The best-fitting line has the smallest sum of squared residuals.

Interpret b0 and b1.

b0 = y-intercept and is where the line crosses the y-axis.

b1 = slope of the line that explains how rapidly y hat changes as a result of x.

Interpret this linear model’s slope. Fat hat = 8.4 + 0.91(protein)

b1 = 0.91 grams of fat per gram of protein
For every additional gram of protein, expect there to be an additional 0.91 grams of fat, on average.

Interpret this linear model’s y-intercept. Fat hat = 8.4 + 0.91(protein)

b0 = 8.4 grams of fat
An item with 0 grams of protein would be expected to have 8.4 grams of fat.

True or False: the X variable can be practical or non-practical.

True

Ex:
y = avg. home game attendance per year.
x = number of wins per year

Is X practical?

No, because no professional team has ever lost every home game.

Ex:
y = total # of hours on the internet per month
x = # of Facebook friends

Is X practical?

Yes, because one can be on the internet and not use Facebook.

How do you find b1? Will slope direction/sign match correlation coefficient sign?

The correlation times the ratio of the standard deviations (Sy over Sx).

b1 = r(Sy/Sx)

Yes, the signs will match. A negative r makes for a negative slope and vice versa.

How do you find b0?

b0 = y bar (avg. of y) minus b1 (slope) times x bar (avg. of x)

b0 = y bar - (b1 times x bar)
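
The two formulas b1 = r(Sy/Sx) and b0 = y bar - (b1 times x bar) can be tried out in Python. This is a minimal sketch; the five data points are hypothetical and chosen only so the numbers come out cleanly.

```python
from statistics import mean, stdev

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = mean(x), mean(y)
s_x, s_y = stdev(x), stdev(y)  # sample standard deviations

# Correlation coefficient r from its definition
r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ((n - 1) * s_x * s_y)

b1 = r * s_y / s_x       # slope
b0 = y_bar - b1 * x_bar  # intercept

print(round(r, 4), round(b1, 4), round(b0, 4))
```

Because Sy and Sx are never negative, the slope always carries the sign of r.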

What are the four conditions for regression?

Quantitative Variables (2 quants)
Straight Enough
No Outliers
Does the Plot Thicken?

The value of r is/is not affected by variable placements?

It is not! Swapping x and y leaves r unchanged. The regression line does depend on placement, though: the Explanatory/predictor variable = x
Response variable = y

Ex: Since 1980, yearly average mortgage interest rates have fluctuated from a low of under 6% to a high of over 14%.
r = -0.8400 Sig. of prob = 0.0001
Mortgages = 220.893 - 7.775(interest)
Is there a relationship between the amount of money people borrow and the interest rate that is offered? What would you expect the relationship to look like?

(assume they pass the straight enough and no outliers conditions)

These variables pass the quantitative condition. X = interest Y = mortgage

The correlation shows that this model is negative, strong, and linear. The sig. prob shows that this association is statistically significant, as it is less than 0.05.

Ex Continued: Since 1980, yearly average mortgage interest rates have fluctuated from a low of under 6% to a high of over 14%.
r = -0.8400 Sig. of prob = 0.0001
Mortgages = 220.893 - 7.775(interest)

Interpret the data.

b1 = -7.775
On avg., for every additional percentage point in the interest rate, expect to see total mortgages decrease by $7.775 billion.

b0 = 220.893 billion.
When the i-rate is 0, expect total mortgages of $220.893 billion. There is no practical interpretation of the intercept because the i-rate has never hit 0.

Ex: y hat = 220.893 - 7.775x
Observation #21 is y = 168.2 ($billions) and x = 7.9 (%)
Calculate the predicted value associated with this observation and interpret. Calculate the residual for this observation and interpret.

y hat = 220.893 - 7.775(7.9) = 159.47
When the I-rate is 7.9%, one can expect the mortgage amount to be $159.47 billion.

e = 168.2 - 159.47 = 8.73
The model underpredicted. The actual mortgage was $8.73 billion more on average than expected.
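
The same prediction and residual can be reproduced in Python. This is a sketch only; the variable names are made up, and the coefficients and observation come from the card.

```python
# Fitted line from the card: mortgages_hat = 220.893 - 7.775 * interest
b0, b1 = 220.893, -7.775

x_obs = 7.9    # interest rate (%) for observation #21
y_obs = 168.2  # observed mortgages ($ billions)

y_hat = b0 + b1 * x_obs  # predicted value
residual = y_obs - y_hat  # e = y - y hat

# A positive residual means the observed value is above the line
verdict = "underpredicted" if residual > 0 else "overpredicted"
print(round(y_hat, 2), round(residual, 2), verdict)
```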

True or False: Residuals have not been modeled by the regression equation.

True. e = y - y hat OR residual = data - model

When observing a scatter plot of the residuals, do we want to see a pattern?

No.

Regression has 4 conditions. What are they?

If data passes the Quantitative, Straight Enough, and No Outliers conditions, make the model. The Does the Plot Thicken? condition concerns residual plots: the spread of the residuals should stay roughly constant, with no patterns.

If the regression line fit all the data perfectly, the standard deviation of residuals (Se) should be what number?

0

Squared correlation (r)^2 = R^2. R^2 gives the proportion of the data's variance that is _______ for by the model.

accounted

For the popular food ex, the correlation chart shows:

            Fat(g)  Protein(g)
Fat(g)      1.00    0.76
Protein(g)  0.76    1.00

Find the R^2 and interpret.

R^2 = (0.76)^2 = 0.58 This can be found in the Summary of Fit Chart. It means that 0.58 or 58% of the variation in fat is accounted for by the model.

R^2 = 0 : None of the variance in the data is captured by the model; it is all in the residuals.
R^2 = 1.0 : All of the variance in the data is captured by the model.
In the popular food ex, 58% of the variation in total fat (y) is associated with/explained by the variation in protein content (x).

Ex: Verify that the square of the correlation coefficient (r) is RSquare. Interpret its value in the context of these data.

Correlations
               Mortgages  Interest Rate
Mortgages      1.0000     -0.8400
Interest Rate  -0.8400    1.0000

Summary of Fit
RSquare: 0.705573
RSquare Adj: 0.693305
Root Mean Square Error: 13.21113
Mean of Response: 151.8731
Observations (or Sum Wgts): 26

r^2 = (-0.8400)^2 = 0.7056, which matches RSquare = 0.705573 after rounding.
70.56% of the variation of mortgages is explained by the variation of interest rate.
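
A quick Python check of the r and RSquare relationship, using the values from the tables on this card. It also works backwards from RSquare to r, taking the sign from the slope of Mortgages = 220.893 - 7.775(interest). This is a sketch; the variable names are made up.

```python
from math import sqrt, copysign

r = -0.8400          # from the Correlations table
r_square = 0.705573  # from the Summary of Fit
slope = -7.775       # from the fitted line

# Forward: squaring r should reproduce RSquare, up to rounding of r
r_squared_from_r = r ** 2

# Backward: the square root gives only the size of r; the slope gives its sign
r_recovered = copysign(sqrt(r_square), slope)

print(round(r_squared_from_r, 4), round(r_recovered, 2))
```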

Ex: A graph shows that Max wind speed = 1031.24 - 0.975(central pressure). To work backwards from RSquare to find the correlation, you take the square root of what? To find the direction of correlation (r), what do you look for?

Take the square root of RSquare. Ex: RSquare = 0.81, so r = +/- square root of 0.81 = +/- 0.9. Because the slope is negative and the slope has the same sign as the correlation, the correlation is also negative. r = -0.9

True or False: the value of RSquare should be withheld from the audience.

False. It should be reported.

What is it called when you use a regression equation outside of the context by plugging in a value of x that is outside of the range of values for the data?

An extrapolation. This can be very dangerous and can give inaccurate predictions.

True or False: While an r value must be between -1 and 1, the slope of a regression line can be any value.

True! When b1 is anything other than 0.0, it indicates some linear association between x & y.

Always initially assume regression is by random chance. What can you use to finalize this answer?

P-value & 0.05 threshold.

If an audience has two spinners that are different, should there be any statistically significant association?

No! It should be by random chance.

Threshold says that if the P-value is "greater than 0.05" the association _________. If the P-value is "less than 0.05," the association _______.

is by random chance; statistically significant

Ex: Data for a linear fit of y = 6.2276 - 0.317x shows a probability of 0.1284. Is this by random chance or no?

Because this is greater than 0.05, this data is by random chance.

True or False: Residual plots can sometimes expose more subtle curved relationships in data than the original scatterplot.

True

Ex: What is the relationship between Run Time (minutes) and Budget (millions)? Which variable should be x and which should be y? Assume all four conditions are met. b1 = 0.7144001 b0 = -31.38695 RSquare = 0.154156

X = run time; y = budget
Regression Analysis: Budget hat = -31.38695 + 0.7144001(run time)
In the case of this data, 15.4% of the budget variance is explained by run time. The other 84.6% is not explained by the model.

Can doing multiple, smaller analyses on the same data set create a stronger line of fit?

Yes.

What are extrapolations?

Predicted values outside of the range of data available. They are questionable assumptions.

True or False: Though dangerous, extrapolation is sometimes necessary.

True. Regression, extrapolation, and "forecasting" are better options than guessing.

Because the correlation coefficient is used to create the slope, outliers can _______ a regression analysis.

strongly influence

Correlation does NOT mean causation. When there is a high RSquare, do changes in x cause variation in y?

Not necessarily.

Ex: In a given data set of doctors/person and avg. life expectancy, RSquare is 0.629. In the set TVs/person vs. avg. life expectancy, RSquare is 0.725. Does this mean TVs are better for health than doctors?

No! This just means there are lurking variables affecting the data. For example, in places with higher standards of living, many people who have longer life expectancies have more doctors AND tvs.

What do randomized samples attempt to do?

Reduce bias

How do we learn details about the population?

Through the average of the population or a sample statistic.

Population parameter is whatever we're analyzing about the population. It can be a mean, std. deviation, percentile, percent, etc. How is this found?

Through taking a sample

Sample should be as ________ of the population as possible.

representative

Notation is very important. Name the statistic and parameter denotations for: Mean, Std. Dev., Correlation, Regression Coefficient, Proportion.

                         Statistic  Parameter
Mean:                    y bar      mu
Std. Dev:                s          sigma
Correlation:             r          rho
Regression Coefficient:  b          beta
Proportion:              p hat      p (or pi)

What matters when taking a sample? (two answers)

Size & collection process

The sampling frame is a list or collection of things/individuals from which the sample was ________.

drawn/collected

Sampling frame can be biased if it misses part of the population. Provide an example.

Kroger takes sample from ppl w/ Kroger cards. This excludes customers w/o cards. The sampling frame is strictly customers w/ the loyalty card.

The population is the what of the sample?

entire group of individuals

Sampling frame is the what of the sample?

List of all individuals from which the sample was drawn (who is eligible to be sampled).

The sample design is the what of the sample?

Method used to draw the sample

The sample of data is what?

Those actually chosen

What does it mean for a sample to be representative of the population?

Good, unbiased

What is voluntary response bias?

This happens when a group is invited to respond but only those who participate are counted in the study. The sample only includes those with a strong interest in the topic.

Ex: Statistics 201 samples students opinions on a range of issues and uses the data for a class project. Students aren't required to respond. All data from students who had strong enough reason to participate in the study are included. What bias is found here?

Voluntary Response Bias

What is Convenience Sampling?

A convenience sample includes only individuals who are convenient for the study to reach (in the relevant area, a social media follower, etc.) or who have high interest in the topic, so it is unlikely to be representative of the population.

Ex: Study using one's Instagram followers about the person OR a table is set up in SU cafeteria about SU food options. What bias is found here?

Convenience Sampling

What is undercoverage?

Undercoverage is when a portion of the population is not sampled or has less representation in the sample than in the population.

Ex: The US census sends a survey to every household in America. What bias is found here?

Undercoverage is present because the homeless population has been left out. They might have no access to the study, no way to send back the survey, no household, etc.

What is nonresponse bias?

This happens when those that don't respond to the survey differ in opinions from those that do.

Where is nonresponse bias predominantly found?

Telephone surveys

What is response bias?

Response bias occurs when external influences affect a respondent's answer.

Wording the question to lean in one direction, people not wanting to admit flaws, people hiding personal facts, and people not wanting to admit to illegal acts are all examples of what kind of bias?

Response bias

What are the four main sampling designs and one combo?

Simple Random Sample, Systematic Sample, Stratified Sample, Cluster Sample, and Multistage Sample

What is a simple random sample?

Elements of a population are enumerated with equally likely chances of selection. Use this when there is easy access to the entire population.

Drawing straws, random # generator, etc. are examples of what type of sampling method?

SRS

What is Systematic sampling?

Every kth element sampled.

Ex: The police set up a roadblock where every 5th car is stopped and searched. What kind of sampling design is this?

Systematic

What is stratified sampling?

The population is put into distinct groups known as strata. Then an SRS is done within each stratum. This is done when we know there are groups within the population that differ from each other, such as by gender, income, ethnicity, etc.

What is cluster sampling?

Cluster sampling is used for non-identifiable strata (similar groups). The clusters are collected at random then a census is done within each.

Ex: You have 100,000 bags of M&Ms (200/bag) = 20 million M&Ms. You want to know how many are blue. Using a sample size of 1000, what design should you use?

No need to stratify because they are all plain M&Ms. Make each bag a cluster; then SRS 5 of the bags & complete a census.

What is multistage sampling?

A design that contains several methods of sampling.

Most professional polling surveys use a combination of stratified, cluster, & SRS methods for sampling. What method is this?

Multistage

Ex: Police officers stop every 10th car and check the driver for evidence of alcohol use. This is what sampling design?

Systematic

Ex: An academic officer enters 25 randomly chosen classrooms and surveys all students about whether their instructors dress well. This is what sampling design?

Cluster

Ex: Students on both the main campus and the agricultural campus are randomly selected and asked how easy it is to park when coming to school here at UT. What sampling method is this?

Stratified

Ex: TSA's method for determining which airline passengers are selected for additional security screening is known as what design method?

Multistage: Random, Stratified, etc.

Occasions where we observe a random phenomenon are called _______. The values of the random phenomenon are called ________. A combo of outcomes is known as an _______.

trials; outcomes; event

True or False: the collection of ALL possible outcomes is called the sample space and denoted (S).

True

If individual trials are independent, then the outcome of one trial ______ influence/change the outcome of another trial.

DOES NOT; ex: rolling die or flipping coins

The Law of Large Numbers (LLN) says that in the long-run, relative frequency of repeated independent events gets closer to a single value. What is a 'single value' in the case of probability?

the probability of the event happening

True or False: The law of averages says that if you flipped a coin four times and get heads, the fifth flip will also be heads.

False; it says it "must" be tails

Subjective/Personal Probabilities are based on what?

non-mathematical calculations

There are times when all possible outcomes are NOT equally likely. Name an example.

Winning the lottery

If outcomes are equally likely, they can use the probability (an event) equation. P(an event) = (# of outcomes in A)/(# of possible outcomes) What is the probability of flipping heads in a coin toss? What about rolling a 6 on a die?

P(head) = 1/2 = 0.5 or 50%
P(6) = 1/6 = 0.1667 or 16.67%

Ex: You have one of every denomination of US paper currency. What is the probability that you select a bill with a portrait of a president? S = {$1, $2, $5, $10, $20, $50, $100} *Hint* 1, 2, 5, 20, and 50 are presidential bills.

P(President) = 5/7 = 0.714 or 71.4% chance

Ex: you flip a fair coin 4 times. There are 16 equally likely possible outcomes. What is the probability that you get at least three heads?

P(at least 3) = 5/16 = 0.3125 or 31.25%
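
The count of 5 out of 16 can be confirmed by listing every outcome. A minimal Python sketch, with variable names chosen for illustration:

```python
from itertools import product

# Every sequence of 4 flips: 2^4 = 16 equally likely outcomes
outcomes = list(product("HT", repeat=4))

# Keep the outcomes with 3 or 4 heads
at_least_three = [o for o in outcomes if o.count("H") >= 3]

p_at_least_three = len(at_least_three) / len(outcomes)
print(len(outcomes), len(at_least_three), p_at_least_three)
```

The five favorable outcomes are HHHH plus the four sequences with exactly one tail.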

There are a few requirements for probabilities. For example, a probability must be between what two integers?

0 and 1

For any event A, 0 <= P(A) <= 1. If P(A) = 0, event A ___________. If P(A) = 1, event A _____________.

can NEVER happen; will happen with ABSOLUTE certainty

The Probability assignment rule says that S represents the set of all possible outcomes. If this is the case, what should P(S) equal?

P(S) = 1. ex: roll a fair six-sided die. the possible outcomes are S = {1,2,3,4,5,6}. Therefore, P(S) represents the probability of getting 1-6. Since these are the only options, P(S) = 1 or 100% chance.

The Complement rule says that a set of outcomes NOT in event A is the complement of A. What is the equation for this probability?

P(not in A) = 1 - P(A)
P(A) = 1 - P(not in A)
Therefore, P(A) + P(not in A) = 1

Ex: You have one of every denomination of US paper currency. What is the probability that you do not select a bill with a portrait of a president? S = {$1, $2, $5, $10, $20, $50, $100} *Hint* 1, 2, 5, 20, and 50 are presidential bills.

P(president) = 5/7 so P(not president) = 1 - 5/7 = 2/7 or 0.286 28.6% chance of not picking a presidential bill

Events with no outcomes in common, which cannot occur simultaneously, are called what?

Disjoint or mutually exclusive

In order for the addition rule of probability to work, the events must be disjoint/mutually exclusive. Ex: roll a single die. Let Event A = 5 or more; Event B = 2 or less; Event C = even #
1. Are events A & B disjoint?
2. Are events A & C disjoint?

1. Yes, because a single roll cannot be both 5 or more and 2 or less.
2. No, because rolling a 6 satisfies both events.

For two disjoint events, A & B, the probability that one OR the other occurs is the _____ of both events' probabilities.

Sum. P(A or B) = P(A) + P(B)
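
The die events A, B, and C from the previous card make a handy check of when the addition rule applies. This is a sketch using Python sets; the helper prob is made up for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
A = {s for s in sample_space if s >= 5}      # 5 or more
B = {s for s in sample_space if s <= 2}      # 2 or less
C = {s for s in sample_space if s % 2 == 0}  # even number

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

a_b_disjoint = A.isdisjoint(B)  # no shared outcomes
a_c_disjoint = A.isdisjoint(C)  # 6 is in both

p_a_or_b = prob(A | B)  # equals P(A) + P(B) because A and B are disjoint
p_a_or_c = prob(A | C)  # smaller than P(A) + P(C): the 6 would be counted twice

print(a_b_disjoint, a_c_disjoint, p_a_or_b, p_a_or_c)
```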

For an individual trial to be independent, the result of one trial cannot ________ the outcome of another trial.

change/influence ex: coin flips are independent

The multiplication rule calculates the probability of more than one thing occurring. For example, flipping two heads in a row. To use the multiplication rule, the events must be _______ of one another.

independent

The multiplication rule states that the possibility of both events, A & B, occurring is the _______ of their probabilities.

product Ex: P(A and B) = P(A) times P(B)

Ex: The probability of an airline not losing your luggage is 0.993. What is the probability of the airline never losing your luggage on 3 flights?

P(not lost) = 0.993 P(not lost on 3 flights) = 0.993 x 0.993 x 0.993 OR (0.993)^3 = 0.979

Ex: The probability of an airline not losing your luggage is 0.993. What is the probability of the airline losing your luggage at least once on 3 flights?

P(at least one) = 1 - P(none) P(none) = (0.993)^3 = 0.979 P(at least one) = 1 - 0.979 = 0.021
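
The "at least once" pattern (complement rule plus multiplication rule) is easy to script. A minimal Python sketch for the luggage example, assuming the three flights are independent as the card does:

```python
p_not_lost = 0.993  # probability luggage is not lost on one flight
flights = 3

# Multiplication rule for independent flights
p_never_lost = p_not_lost ** flights

# Complement rule: "at least once" is everything except "never"
p_lost_at_least_once = 1 - p_never_lost

print(round(p_never_lost, 3), round(p_lost_at_least_once, 3))
```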

Ex: An opinion polling org contacts their respondents by phone. The probability of contacting someone using this method is 0.76. Assume calls are independent.
A.) The previous call didn't make contact. What's the probability the next call will?
B.) Probability the interviewer successfully contacts the next two callers.
C.) Probability the interviewer's first contact is the third call.
D.) Probability the interviewer makes at least one contact among 5 calls.

A.) P(contact) = 0.76 & independent, so 0.76
B.) P(next 2) = 0.76^2 = 0.5776
C.) P(no contact) = 1 - 0.76 = 0.24, so P(first contact on third call) = (0.24)(0.24)(0.76) = 0.0438
D.) P(at least one) = 1 - P(none), where P(none) = (0.24)^5 = 0.000796, so P(at least one) = 1 - 0.000796 = 0.999204
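
Parts A through D can be computed in a few lines of Python. This is a sketch under the card's independence assumption; the variable names are made up.

```python
p_contact = 0.76
p_miss = 1 - p_contact  # 0.24

part_a = p_contact                # independence: the earlier miss changes nothing
part_b = p_contact ** 2           # contact on both of the next two calls
part_c = p_miss ** 2 * p_contact  # miss, miss, then contact on the third call
part_d = 1 - p_miss ** 5          # complement of "no contact in 5 calls"

print(part_a, round(part_b, 4), round(part_c, 4), round(part_d, 6))
```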

Are these events independent? Mutually exclusive? Event A = stock market increases by 100 points next Tuesday Event B = Knox gets less than 0.5' rain next Tuesday

Not disjoint, but independent!

Are these events independent? Mutually exclusive? Event A = high temps on next 6/1 in Knox will be over 95 degrees. Event B = Ice-cream truck comes by your house but is out of your favorite flavor next 6/1.

Dependent & not mutually exclusive

To estimate a whole population, what situations can we look at regarding population parameters?

Situations where the population parameter is known!

Ex: It's known a lottery has 100 ping-pong balls each being white or gold. You must get 6 gold to win. You're told the last draw was [w,w,w,g,g,w]. Estimate the proportion of 100 balls that are gold.

The only reasonable guess is 2/6, or about 33 of the 100 balls.

What theorems are fundamental in impacting our ability to statistically guess?

The sampling distribution model and the Central Limit Theorem.

Properties of Sampling Distribution for a Proportion says that Event X represents a "success" of a categorical variable. Successes are defined by the context of the question, so are they always a good thing?

No! Success is not always the favorable outcome. For example, you could be sampling cancer patients. A patient with cancer would be a yes, but this is not the favorable outcome.

The sample proportion is denoted with a p-hat. The equation is # of successes/sample size or x/n.

p hat = x/n

What is the Sampling distribution and what does it tell us about the sample proportion?

It is the distribution of ALL possible sample proportions, and it explains the shape (center & spread)

P hat is the sample proportion. The Central Limit Theorem (CLT) explains the expected value of p hat and the standard deviation of p hat. What are they?

The expected value of p hat is p, the true population proportion. This is the center. m(p hat) = p
The standard deviation of p hat is the typical difference between what we measure and the true population proportion. This is the spread. SD(p hat) = the square root of (pq/n), where q = 1 - p and p = true population proportion

True or False: the shape of the sampling distribution is approximately normal.

True! (when CLT is met)

Ex: NCHS found that out of all 3,945,102 U.S. live births in 1998, 11.6% were premature. The avg. of all p hat (sample proportions) should be close to what?

11.6%

Ex: NCHS found that out of all 3,945,102 U.S. live births in 1998, 11.6% were premature. If the population parameter of premature births is 11.6%, then the true population proportion should be p = ______ For a sampling distribution of a sample proportion of sample size (n = 100): The expected value is: The Std. Dev. is:

p = 0.116 m(p hat) = p = 0.116 q = 0.884 std. dev. = square root of (0.116 x 0.884)/100 = 0.032

True or False: Sample variability should never be accounted for.

False! It must be considered

Ex: Calculate the std. dev. of p hat when p = 0.75 and n = 50

SD(p hat) = square root of (0.75 x 0.25)/50 = 0.0612
Larger sample = smaller std. dev. Smaller sample = bigger std. dev.
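
A small Python sketch of the SD(p hat) formula shows how the spread shrinks as n grows. The function name sd_p_hat is made up for illustration.

```python
from math import sqrt

def sd_p_hat(p, n):
    """Standard deviation of the sample proportion: sqrt(p * q / n), with q = 1 - p."""
    return sqrt(p * (1 - p) / n)

sd_50 = sd_p_hat(0.75, 50)    # the card's example
sd_200 = sd_p_hat(0.75, 200)  # four times the sample size

print(round(sd_50, 4), round(sd_200, 4))
```

Quadrupling the sample size cuts the standard deviation in half, because n sits under a square root.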

CLT implies sampling distributions are approximately normal meaning the distribution should be ______

A normal distribution! A larger sample makes for a better approximation.

There are three assumptions and conditions to check for with sampling distributions of proportions. What are they?

Randomization: the sample should be a simple random sample of the population
10% condition: the sample size, n, must be no larger than 10% of the population size
Success/Failure Condition: the sample size has to be big enough so that both np and nq are at least 10.

Ex: NCHS found that out of all 3,945,102 U.S. live births in 1998, 11.6% were premature. Thus the population proportion of premature births is 0.116. Suppose we sampled 925 of these births and 124 were premature. What is p hat (sample proportion)? What is std. dev?
Conditions: This was a random sample; 925/3,945,102 = 0.000234, which is way less than 10%; and np = 0.116(925) = 107.3 & nq = 0.884(925) = 817.7, both of which are larger than 10. All conditions passed.

p hat = x / n = 124/925 = 0.134 or 13.4%
std. dev. = square root of (0.116 x 0.884)/925 = 0.0105, or about 0.011
The margin of error is generally taken to be +/- 2 standard deviations. In this case 2 x 0.011 = 0.022, or 2.2 percentage points. If the true proportion were 0.116, about 95% of samples of size n = 925 would have sample proportions ranging from 0.094 to 0.138.
z-score = (observed value - expected)/std. dev
z-score = (0.134 - 0.116)/0.0105 = 1.71 std. devs., which is not unusually large
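
The condition checks and the z-score for the premature-birth sample can be run together in Python. This is a sketch, not a definitive procedure; the variable names are made up and the randomization condition is taken as given.

```python
from math import sqrt

p = 0.116               # population proportion of premature births
population = 3_945_102  # all U.S. live births in 1998
n = 925                 # sample size
x = 124                 # premature births in the sample

# Conditions (randomization is assumed, as on the card)
ten_percent_ok = n <= 0.10 * population
success_failure_ok = n * p >= 10 and n * (1 - p) >= 10

p_hat = x / n
sd = sqrt(p * (1 - p) / n)  # SD(p hat)
z = (p_hat - p) / sd        # how many SDs the sample sits from p

print(ten_percent_ok, success_failure_ok, round(p_hat, 3), round(sd, 4), round(z, 2))
```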

Ex: 52% of voters plan to vote 'yes' on budget. SRS 300 voters. What might be the % of yes voters in poll?
Random? yes; SRS
10%? yes
Success/Failure? np = 156, nq = 144

p = 52% = 0.52; q = 0.48; n = 300
m(p hat) = p = 0.52 --> center
std. dev. = square root of (0.52 x 0.48)/300 = 0.029 --> spread

Ex: A company produces AAA batteries. It's known that 2% are defective. If UT got 25,000 batteries, how many might not work?

np = 25,000(0.02) = 500
nq = 25,000(0.98) = 24,500
m(p hat) = p = 0.02
std. dev. = square root of (0.02 x 0.98)/25000 = 0.000885
Expect about 500 defective batteries.

True or False: the CLT states that the distribution of ALL possible averages of a population is nearly normal.

True

There are three assumptions and conditions to check for when sampling distribution of means. What are they, and what do they look for?

Random Condition: SRS of population
10% condition: sample size, n, must be no larger than 10% of population size
Large enough condition: the more unimodal and symmetric the population, the smaller the sample you need. As a rule of thumb:
n = 25 is usually large enough for most pops
n = 100 may be required if population is very skewed

The normal model is a close approximation of sampling distribution of the sample mean for smaller samples when the population is close to the ______.

normal distribution

The more non-normal the distribution of population is, the _____ the sample size needs to be to have an approximately normal distribution of sample mean.

larger

True or False: the smaller n is, the closer the sample avg. is to the population mean (Law of large numbers)

False; the larger n is, the closer the sample avg. is to the population mean.

CLT says that the sampling distribution of any mean or proportion is approximately _______.
For proportions: the sampling distribution is centered at the population proportion, p, where m(p hat) = p
For means: the sampling distribution is centered at the population mean, mu, where m(y bar) = mu

normal

The equation for std. dev. of a sample proportion is SD(p hat) = square root of (pxq)/n. What is the equation for std. dev. of a sample mean?

SD(y bar) = sigma/square root of n where sigma = pop. std. deviation

Ex: Mr. Brady needs to build a balcony capable of supporting the weight of 100 people (n = 100). He wonders if a balcony designed to support 19,000 lbs will be sufficient. A survey of over 8000 adults implies that the mean weight of U.S. adults aged 20 and above is 176 lbs with a standard deviation of 61 lbs. [mu = 176; sigma = 61]
First check conditions:
Random: yes (100 random ppl)
10%: yes (100 is much less than 10% of US population)
Large enough: yes (100 is enough)

Because the conditions are met, we can model the average weight of 100 people as a normal distribution.
The true pop avg. m(y bar) = 176. Std. dev. = 61/square root of 100 = 6.1
y bar ~ N(176, 6.1), where 176 is the center and 6.1 is the spread
The balcony holds if the average weight is at most 19,000/100 = 190 lbs. 190 is over two standard deviations above the mean, at z = (190 - 176)/6.1 = 2.295, so the probability of the balcony holding is 0.989 or 98.9%. There is only a 1.1% chance it doesn't hold, so it is safe for Mr. Brady to take the chance.
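
The balcony probability can be reproduced with Python's standard-library NormalDist. This is a sketch under the card's assumptions (conditions met, so y bar is modeled as normal); the variable names are made up.

```python
from statistics import NormalDist

mu, sigma, n = 176, 61, 100  # population mean, population SD, people on the balcony
capacity = 19_000            # design load in lbs

sd_mean = sigma / n ** 0.5   # SD(y bar) = sigma / sqrt(n)
limit_avg = capacity / n     # average weight at which the balcony is at capacity

z = (limit_avg - mu) / sd_mean
p_holds = NormalDist(mu, sd_mean).cdf(limit_avg)  # P(y bar <= 190)

print(round(sd_mean, 1), round(z, 3), round(p_holds, 3))
```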

If the correlation coefficient is equal to 0.50, what is RSquare equal to?

0.25

If a residual is positive, which was larger?
Intercept
Slope
Actual
Predicted

Actual

Find the predicted value for a regression equation where: b0 = 10 b1 = 2 Obs. value of x = 9

y hat = 10 + 2(9) = 28

"For each one unit increase in X, we expect Y to increase by b1 units, on average." This is which interpretation?
Residual interpretation
Intercept interpretation
Rsquared interpretation
Slope interpretation

Slope interpretation

True or False: Adding together all the residuals from a regression plot will sum to zero, on average.

true

When a sample has characteristics which correspond to characteristics of the population, the sample is said to be ___________.

representative

True or False: Response bias is always possible in a survey that requires an answer from a human being.

True

Ex: You are heading to the LLN Casino for Spring break and plan to play their world-famous slot machine. The slot machine at the LLN Casino has three wheels that spin when you pull the lever. When pulled, each wheel spins and stops on one of the symbols. Assume that the outcome of each wheel is independent of the outcomes of the other 2 wheels. Each wheel has 10 equally likely symbols: 4 skulls, 3 lemons, 2 Cherries, and 1 bell. If you play, what is the probability that you get a lemon, a cherry, and a bell?

0.006
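
The answer 0.006 corresponds to reading the question as lemon on the first wheel, cherry on the second, and bell on the third: (0.3)(0.2)(0.1). The Python sketch below checks that by listing all 1,000 equally likely spins. It also counts the "any order" reading, which is an assumption added here for comparison and not part of the card.

```python
from itertools import product

# One wheel: 10 equally likely symbols
wheel = ["skull"] * 4 + ["lemon"] * 3 + ["cherry"] * 2 + ["bell"]

# Three independent wheels: 10^3 = 1000 equally likely spins
spins = list(product(wheel, repeat=3))

# Lemon, then cherry, then bell (the reading that gives the card's answer)
in_order = sum(1 for s in spins if s == ("lemon", "cherry", "bell"))

# One of each symbol in any order (alternative reading, for comparison only)
any_order = sum(1 for s in spins if sorted(s) == ["bell", "cherry", "lemon"])

print(in_order / len(spins), any_order / len(spins))
```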

Suppose the probability of the bus arriving on time is 0.58. For the next 4 stops, what is the probability the bus is on time for at least one of the stops? Assume independence.

0.969

True or False: In probability, the value of each trial is called a consequence.

False; it is an outcome

True or False: All other things being equal, the larger the sample size, the smaller the standard deviation of the sampling distribution.

True

True or False: the Success/Failure condition states that your measurement must be binary, either a success or a failure.

False; states that you need at least 10 of each.

According to the Central Limit Theorem, which of the following will happen to the distribution of the sample mean as the sample size increases?
The mean gets larger
The distribution gets less normal
The distribution gets more normal
The mean gets smaller

The distribution gets more normal

The Central Limit Theorem relates to which of the following conditions?
10% condition
Randomization
Nearly Normal Condition

Nearly Normal Condition

The sampling distribution of the sample mean will always have the same _____ as the original distribution?
Shape
Standard deviation
Mean

Mean

Stats Exam #2 Flashcards

(151 cards)