Prob/Stats Final Exam Review Flashcards

1
Q

What is the difference between quantitative and categorical data?

A

Quantitative data measure a quantity, with units. Categorical data describe a category that a case falls into.

2
Q

Is “zip code” a quantitative or categorical variable?

A

Categorical variable. Zip codes describe a category (city), and they do not measure a quantity.

3
Q

Is “distance from home” a quantitative or categorical variable?

A

Quantitative variable. “Distance from home” measures a quantity with units of length.

4
Q

Is “social security number” a quantitative or categorical variable?

A

Categorical variable. Social security numbers do not measure a quantity. They categorize people by both state and individual identity.

5
Q

What types of graphs can represent quantitative data?

A

Histograms, dot plots, stemplots.

6
Q

What types of graphs can represent categorical data?

A

Bar graphs, pie charts.

7
Q

Imagine a table with two variables: Gender and Handedness. Describe what marginal distributions would be.

A

Marginal distributions come from numbers in the…drum roll please…MARGINS! In other words, the numbers in the “totals” row or column. These usually are reported as PERCENTS.

8
Q

Imagine a table with two variables: Gender and Handedness. Describe what conditional distributions would be.

A

Conditional distributions come from a row or a column “inside” the table (NOT from the margins or totals). For example: the handedness of males would be a conditional distribution. Or the gender of left-handers would be another example. And use PERCENTS!
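
A minimal Python sketch of a marginal and a conditional distribution for a Gender by Handedness table. The counts in `table` are made up purely for illustration.

```python
# Hypothetical two-way table of counts (invented for illustration only).
table = {
    "Male":   {"Left": 6, "Right": 44},
    "Female": {"Left": 4, "Right": 46},
}

grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of gender: row totals divided by the grand total.
marginal_gender = {
    gender: sum(row.values()) / grand_total for gender, row in table.items()
}

# Conditional distribution of handedness among males:
# counts inside the "Male" row divided by the "Male" row total.
male_total = sum(table["Male"].values())
handedness_given_male = {
    hand: count / male_total for hand, count in table["Male"].items()
}

print(marginal_gender)
print(handedness_given_male)
```

Multiply each proportion by 100 to report it as a percent.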

9
Q

What is the difference between a population and a sample?

A

A population is the “larger” group about which we hope to learn. A sample is a subset of the population that is easier to obtain and analyze.

10
Q

What are some features of a histogram?

A

It has a continuous and labeled number line on the horizontal axis. Then the data is grouped so that the heights of equally wide bars represent the number of data points in each interval. The vertical axis can represent counts OR percents.

11
Q

If you are asked to “describe this distribution,” what should you always include?

A

Always comment about the shape, center, spread and outliers. AND be sure to include CONTEXT (words that show what the data is representing). “SOCS + context” can help you remember.

12
Q

What is the “5-Number Summary?”

A

The Five Number Summary is the minimum, quartile 1, median, quartile 3, and maximum.
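
A short Python sketch of one way to compute the summary. It assumes the "median of each half" convention for quartiles (the overall median is left out of both halves when the count is odd) and at least two data values. Calculators and software may use slightly different quartile rules, so treat this as one reasonable convention rather than the only one. The data set is hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]    # values below the median position
    upper = data[-half:]   # values above the median position
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Hypothetical data set
summary = five_number_summary([7, 1, 9, 3, 5, 2, 8, 4, 6])
iqr = summary[3] - summary[1]  # IQR = Q3 - Q1
print(summary, iqr)
```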

13
Q

Describe the features of a box plot.

A

The box in the middle represents the “middle 50%” of the data. The width of the box is the IQR. If there is a segment inside the box, this is the MEDIAN. Each “whisker” represents another 25% of the data. Each of the five intersection points represents the “5 Number Summary.”

14
Q

What does a box plot NOT reveal?

A

Sample size, mean, shape (at least not completely)

15
Q

If the mean is higher than the median in a data set, then which way is the data likely skewed?

A

If the mean is higher than the median, typically the data is skewed right. Higher numbers (especially high outliers) will tend to “pull” the mean up. Medians are more resistant to outliers and skewness.

16
Q

What does a z-score tell us about a data point?

A

A z-score tells us how many standard deviations it is from the mean. Positive z-scores are above the mean, negative z-scores are below the mean.

17
Q

How do you calculate a z-score?

A

z-score = (x – mean) ÷ (standard deviation)
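
The formula as a small Python function. The mean of 170 cm and standard deviation of 5 cm are illustrative values, and the function name is just a label chosen for this sketch.

```python
def z_score(x, mean, standard_deviation):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / standard_deviation

# Illustrative values: heights with mean 170 cm and standard deviation 5 cm
z_tall = z_score(180, 170, 5)   # above the mean, so positive
z_short = z_score(165, 170, 5)  # below the mean, so negative
print(z_tall, z_short)
```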

18
Q

What does the standard deviation of a data set describe?

A

The standard deviation of a data set is the “typical” (“average”) variation from the mean.

19
Q

Assume the mean height of students is 170 cm and the standard deviation is 5 cm. If all heights were converted to inches, which statistics would change?

A

To change to inches, we would need to DIVIDE every height by 2.54. Therefore ALL measures of position and spread would be divided by 2.54, including the mean and standard deviation. (Unitless values such as z-scores would not change.)

20
Q

Assume the mean height of students is 170 cm and the standard deviation is 5 cm. If all heights were decreased by 3 cm, which statistics would change?

A

When 3cm is subtracted from data, ONLY MEASURES OF POSITION will change by 3cm (mean, median, minimum, Q3, etc.). Measures of spread will NOT change (standard deviation, range, IQR, etc.)
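
A quick Python check of how shifting and rescaling affect the mean and standard deviation. The five heights are hypothetical, and `pstdev` (the population standard deviation) is used only to keep the sketch simple.

```python
from statistics import mean, pstdev

heights_cm = [160, 165, 170, 175, 180]     # hypothetical heights

shifted = [h - 3 for h in heights_cm]      # subtract 3 cm from every value
inches = [h / 2.54 for h in heights_cm]    # convert every value to inches

# Subtracting a constant moves the mean but leaves the spread alone.
print(mean(heights_cm), mean(shifted))
print(pstdev(heights_cm), pstdev(shifted))

# Dividing by a constant divides both the mean and the spread.
print(mean(inches), pstdev(inches))
```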

21
Q

If data were graphed on a scatterplot showing outdoor temperature vs. heating costs, which variable would be the explanatory variable?

A

Temperature would be the explanatory variable since temperature is “explaining” (or perhaps causing) the amount of heating costs. Heating costs are “responding” to the temperatures, so it is the response variable.

22
Q

Would it be appropriate to use a scatterplot to graph gender vs. height?

A

No. Scatterplots are only for quantitative variables, and gender is a categorical variable. Parallel dot plots or parallel box plots would be better.

23
Q

When describing a scatterplot between two variables, what should you always include?

A

Strength (weak, moderate, strong)
Direction (positive, negative, none)
Form (linear or not; outliers or not)
Context (words that describe the data story)

24
Q

If you saw a graph of students’ heights vs. students’ arm spans, what would be a good description of the scatterplot?

A

“There is a strong, positive, linear association between heights and arm spans of students.”

25
Q

“If two variables have a strong association, then they have a strong correlation.” True?

A

NO–False! Correlation (r) only measures the strength and direction of LINEAR data! If data is curved, it can have a strong association, but strong or weak correlation. Correlation should NEVER be used to describe curved data.
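
A small Python illustration with two made-up data sets that both follow y = x² exactly (a perfect curved association). The helper `pearson_r` is written out by hand for this sketch so it does not depend on any particular library.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Curve sampled symmetrically around zero: r comes out at zero.
xs_symmetric = [-3, -2, -1, 0, 1, 2, 3]
r_symmetric = pearson_r(xs_symmetric, [x ** 2 for x in xs_symmetric])

# Same curve sampled only on the positive side: r comes out high.
xs_positive = list(range(1, 11))
r_positive = pearson_r(xs_positive, [x ** 2 for x in xs_positive])

print(r_symmetric, r_positive)
```

Both data sets are perfectly predictable from x, yet r differs wildly between them, which is why r alone says little about curved data.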

26
Q

If there is a strong correlation between two variables, then the explanatory variable is likely causing the reactions in the response variable. True?

A

NO–False! For example: there can be a strong correlation (r) between the number of people eating ice cream and the number of drownings, but that does not necessarily mean that eating ice cream causes drownings. The outside temperature might be a third (lurking) variable.

27
Q

What is the name of the line that we sometimes create to fit onto data?

A

The Least-Squares Regression Line (LSRL). It’s called LinReg(a+bx) on graphing calculators, and sometimes referred to as the “regression line” or “best-fit line” in other textbooks.

28
Q

If two variables show a high correlation, then the data must be linear. True?

A

NO–False! If two variables showed a very slight curvature, then if someone calculated the correlation (r) value, it might still be pretty high even though the data is clearly curved.

29
Q

Which correlation (r) value shows the strongest linear association?
–.08, –.29, –.88, .38, .82

A

–.88 is stronger than .82 even though it’s negative. The strongest correlations are the ones closest to 1 or –1.
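
The same comparison in Python: strength depends on the absolute value of r, so the sign is ignored when ranking.

```python
r_values = [-0.08, -0.29, -0.88, 0.38, 0.82]

# The strongest linear association has the largest |r|.
strongest = max(r_values, key=abs)
print(strongest)
```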

30
Q

Where do you go on your graphing calculator to enter data?

A

STAT–EDIT

31
Q

Where do you go on your graphing calculator to create the LSRL?

A

STAT–CALC–8: LinReg(a+bx)

Then enter L1, L2, Y1

32
Q

How do you calculate a residual after you find the least squares line?

A

residual = (observed value) – (predicted value)

the predicted value is found by using the least squares equation

33
Q

When you create a LSRL and see it on the scatterplot, some dots will be above the line and some will be below the line. The dots above the line have a positive residual. True?

A

True! Dots above the line have a positive residual (they are higher than predicted) and dots below the line have a negative residual (they are lower than predicted).

34
Q

The least squares regression line (LSRL), for Heating Cost vs. Temperature, might look like this:

A

Predicted Heating Cost = $14.5 + $0.32(Temperature)

the word “predicted” can be replaced by putting a “hat” over “heating cost”

35
Q

If a regression equation is Predicted Heating Cost = $14.5 + $0.32(temperature), interpret the slope.

A

For every additional degree of temperature, this model predicts an additional heating cost of ≈ $0.32.
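
A Python sketch that uses this regression equation to make a prediction, check the slope interpretation, and compute a residual. The temperature of 50 degrees and the observed cost of $33.00 are hypothetical values chosen for illustration.

```python
def predicted_heating_cost(temperature):
    """Predicted Heating Cost = 14.5 + 0.32 * Temperature (dollars)."""
    return 14.5 + 0.32 * temperature

# Slope: one more degree changes the prediction by about $0.32.
slope_effect = predicted_heating_cost(51) - predicted_heating_cost(50)

# Residual = observed value - predicted value
observed_cost = 33.00                      # hypothetical observation at 50 degrees
residual = observed_cost - predicted_heating_cost(50)

print(round(slope_effect, 2), round(residual, 2))
```

A positive residual means the observed point sits above the line.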

36
Q

Name this sampling technique: Take a random sample of 40 students from an alphabetized list.

A

This is a Simple Random Sample (SRS). EVERY combination of 40 students has the same chance of being selected.

37
Q

Name this sampling technique: Take a random sample of 40 students from each of these groups: Seniors, Juniors, Sophomores, Freshmen.

A

This is a Stratified Sample. Student groups were created FIRST, by some characteristic they all shared (grade in school).

38
Q

Name this sampling technique: Take a random sample of 5 AL classes. Give all students in these ALs a survey.

A

This is a Cluster Sample. Each AL contains a diverse group of students (different genders, GPAs, neighborhoods, socioeconomic status, grades, courses taken).

39
Q

Name this sampling technique: Ask your 10 closest friends what they think about some issue.

A

This is a convenience sample. This is NOT a good representative sample, and therefore a poor way to get accurate data.

40
Q

Name this sampling technique: Select a random number, n, between 10 and 20. Then ask every nth student who comes into school a survey question.

A

This is a Systematic Sample. This is a fairly easy way to obtain a random sample from a large group of people.
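
A rough Python sketch of three of these techniques (SRS, stratified, systematic) on a made-up roster. The roster, the student names, and the 100-students-per-grade sizes are all hypothetical.

```python
import random

rng = random.Random(2024)  # fixed seed so the sketch is repeatable
grades = ["Freshman", "Sophomore", "Junior", "Senior"]

# Hypothetical roster: 100 students in each grade
roster = [(f"{grade}_{i}", grade) for grade in grades for i in range(100)]

# Simple random sample: 40 students chosen from the whole list
srs = rng.sample(roster, 40)

# Stratified sample: form the groups first, then sample 40 from each group
stratified = []
for grade in grades:
    stratum = [student for student in roster if student[1] == grade]
    stratified.extend(rng.sample(stratum, 40))

# Systematic sample: pick n between 10 and 20, then take every nth student
n = rng.randint(10, 20)
systematic = roster[n - 1::n]

print(len(srs), len(stratified), n, len(systematic))
```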

41
Q

We want to know what seniors think about their experiences at NHS. We obtain a SRS of 50 seniors on the day before final exams. What is the sampling frame?

A

The sampling frame is all seniors who are at school on the day before finals–the seniors that COULD be selected for the survey.

42
Q

We want to know what seniors think about their experiences at NHS. We obtain a SRS of 50 seniors on the day before final exams. What is the sample?

A

The 50 seniors selected to take the survey.

43
Q

We want to know what seniors think about their experiences at NHS. We obtain a SRS of 50 seniors on the day before final exams. What is the population?

A

All NHS seniors.

44
Q

If a survey is written with words like “never,” “prohibit,” “ban,” “restrict,” “Trump,” etc., it will likely have what type of bias?

A

Response bias. This can result from ANYTHING in a survey that might have an effect on the truthfulness of a person’s answers.

45
Q

A randomly selected person chooses to NOT take a survey. What type of bias is this?

A

Nonresponse bias. This occurs when a person has been selected for a survey (i.e. they are in the sample), but they choose to not respond.

46
Q

Students are asked during lunch to stop by a table and take a survey. What bias will this survey likely have?

A

This survey will suffer from voluntary response bias. Only students who care about this issue (or who like to take surveys) will be represented.

47
Q

Cheerleaders in their uniforms conduct a survey asking whether school spirit is increased by having cheerleaders at games. This survey will likely have what type of bias?

A

Response bias. This can result from ANYTHING in a survey that might have an effect on the truthfulness of a person’s answers. In this case, their uniforms might affect answers.

48
Q

Fictional candidates Lakisha Washington, Greg Baker and Santiago Hernandez are “running for office.” White voters tended to “prefer” Greg Baker when asked about the election. What is this bias called?

A

Response bias, which results from ANYTHING in a survey that affects a person’s truthfulness. Even if voters do not know a candidate, they can be influenced by the candidates’ names.

49
Q

People were asked if they eat at least one apple a day or not. They also told the number of cavities they currently have. Is this an observational study or an experiment?

A

Observational study. The researchers were simply documenting subjects’ habits to see if there is a relationship between eating apples and getting cavities.

50
Q

50 people were randomly assigned into two groups: one eats an apple a day, the other does not. After 10 years, the number of new cavities is recorded. Experiment or observational study?

A

Experiment. Treatments were assigned to subjects by the experimenters.

51
Q

Two waterproofing products for shoes are being tested. Fifty volunteers will wear a pair of shoes for three months where each shoe of the pair is randomly assigned one of the two treatments. This design is called:

A

A matched pairs design. Each shoe of a pair is given one of two treatments. Then the shoes are compared after three months.

52
Q

If the subjects AND the person measuring the response variable both do not know what treatment the subjects received, the experiment is called:

A

double-blind. (If only the subjects do not know, then it is called single-blind or just “blind.”)

53
Q

A treatment (or pill) that actually has no active ingredient is called a:

A

placebo. (Even a surgery can be a placebo–they cut someone open and sew them back up, but do not actually do the procedure.)

54
Q

What is the placebo effect?

A

A real response to a placebo. Typically, a person actually feels relief or improvement from a placebo even though it has no “active” or “real” ingredient.

55
Q

The Law of Large Numbers, as applied to a coin flip and getting “heads.”

A

The more times you flip a coin, the closer its frequency of “heads” is to its true frequency (50% in a fair coin).
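
A simulation sketch in Python, assuming a fair coin. The seed and the flip counts are arbitrary choices for this illustration, and the exact proportions will differ with a different seed.

```python
import random

rng = random.Random(1)  # arbitrary fixed seed so runs are repeatable

def proportion_heads(num_flips):
    """Simulate num_flips fair coin flips and return the share of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips

# The proportion of heads tends to settle near 0.5 as the flips pile up.
results = {n: proportion_heads(n) for n in (10, 1_000, 100_000)}
for n, proportion in results.items():
    print(n, proportion)
```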

56
Q

The Law of Large Numbers, as applied to the car accident rate of Noblesville teenagers.

A

The more time that passes and the more data that is collected, the closer the sample accident rate becomes to the true accident rate of Noblesville teens.

57
Q

If a roulette wheel comes up “black” six times in a row, does the Law of Large Numbers help a smart gambler predict the next spin?

A

No. The probability of “black” on the next spin is still 18/38. Only in the LONG RUN does the percent black “correct itself.” There is no guaranteed short run correction (which people sometimes call the fictitious “Law of Averages.”)

58
Q

When rolling three dice and calculating the sum, what is the sample space?

A

ALL 216 possible outcomes (rolls). Sometimes it helps to think of the dice in three different colors to keep track of all the outcomes.
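
A Python sketch that lists the sample space by treating the three dice as distinguishable (like the three colors suggested above) and then tallies the sums.

```python
from collections import Counter
from itertools import product

# Every ordered roll of three distinguishable dice
sample_space = list(product(range(1, 7), repeat=3))

# How many outcomes produce each sum
sum_counts = Counter(sum(roll) for roll in sample_space)

print(len(sample_space))
print(sum_counts[10] / len(sample_space))  # probability that the sum is 10
```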

59
Q

How would you calculate the probability of a thumbtack landing “pin up?”

A

Drop it LOTS AND LOTS of times and calculate the percent of time it lands “pin up.” Its probability is the LONG RUN FREQUENCY of landing “pin up.”

60
Q

What are the boundaries of any probability?

A

All probabilities must be between 0 and 1.

61
Q

What is the sum of the probabilities of all the possible outcomes of any chance process?

A

1 (or 100%)

62
Q

Describe a standard deck of cards.

A

52 cards; 26 red and 26 black; four suits: spades, clubs, hearts, diamonds; 13 cards in each suit (Ace through King); three face cards in each suit: Jack, Queen, King; two black suits (clubs, spades) and two red suits (hearts, diamonds)

63
Q

What are disjoint (mutually exclusive) events?

A

Events that have no outcomes in common. (Example: “NHS senior” vs. “NHS freshman”)

64
Q

What are independent events?

A

Events that do not affect each other’s probabilities. (Example: “card is a heart” vs. “card is King”–25% of cards are hearts and 25% of Kings are hearts)

65
Q

Are “picking a seven” and “picking a black card” independent, disjoint or neither?

A

Independent: 50% of cards are black and 50% of sevens are black. A card can be BOTH black and a seven, so these events are NOT disjoint.

66
Q

Are “picking a red card” and “picking a heart” independent?

A

NO! 50% of cards are red, but 100% of hearts are red. So knowing that a card is a heart DOES affect the probability that the card is red.
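
A Python sketch that checks independence by comparing P(A) with P(A | B) on a standard 52-card deck. The helper names (`prob`, `is_black`, and so on) are just labels for this sketch.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "clubs", "hearts", "diamonds"]
deck = list(product(ranks, suits))

def prob(event, given=lambda card: True):
    """P(event | given), found by counting cards in the deck."""
    pool = [card for card in deck if given(card)]
    return Fraction(sum(1 for card in pool if event(card)), len(pool))

def is_black(card): return card[1] in ("spades", "clubs")
def is_red(card): return card[1] in ("hearts", "diamonds")
def is_seven(card): return card[0] == "7"
def is_heart(card): return card[1] == "hearts"

# Independent: knowing the card is a seven does not change P(black).
black_vs_seven = prob(is_black) == prob(is_black, given=is_seven)

# Not independent: knowing the card is a heart changes P(red) from 1/2 to 1.
red_vs_heart = prob(is_red) == prob(is_red, given=is_heart)

print(black_vs_seven, red_vs_heart)
```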

67
Q

Generally, what do “and” and “or” suggest?

A

“AND” generally means you should multiply probabilities and “OR” generally means you should add probabilities. But be sure to follow the GENERAL rules for each type.

68
Q

When you calculate the “expected value” from a probability table, what does it MEAN?

A

(Ha Ha–that was a joke!) The expected value is the LONG-RUN AVERAGE (or mean…get it?) of the situation. So, for instance, if the table represented a casino game, then E(X) could represent the long-run average winnings of the game.

69
Q

When you calculate the standard deviation from a probability table, what does this tell you?

A

The standard deviation measures the typical variation in the outcomes from trial to trial. So in a casino game, it might measure the typical variation in winnings you could expect from game to game.
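
A Python sketch of both calculations from a probability table. The game (win $5 with probability 0.1, lose $1 with probability 0.9) is hypothetical and chosen only because the numbers come out cleanly.

```python
from math import sqrt

# Hypothetical casino game: winnings in dollars -> probability
table = {5: 0.1, -1: 0.9}

# Expected value: sum of (outcome * probability)
expected_value = sum(x * p for x, p in table.items())

# Variance: sum of (squared distance from the mean * probability)
variance = sum((x - expected_value) ** 2 * p for x, p in table.items())
standard_deviation = sqrt(variance)

print(round(expected_value, 2), round(standard_deviation, 2))
```

Here the long-run average is a loss of about 40 cents per game, with winnings typically varying by about $1.80 from game to game.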

70
Q

What four conditions must be met for a probability situation to be BINOMIAL?

A

B–Bi–two outcomes possible per trial
I–Independent trials
N–known or fixed Number of trials
S–Same probability every time

71
Q

What is the probability of a 73% free throw shooter hitting exactly 4 of her next 6 shots?

A

31%. Two ways to calculate this:
binompdf(6, 0.73, 4)
6C4 • (0.73)^4 • (0.27)^2
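
The same calculation in Python, written as a small helper that mirrors the formula nCk • p^k • (1 – p)^(n – k). The function name `binomial_pmf` is just a label for this sketch.

```python
from math import comb

def binomial_pmf(n, p, k):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

probability = binomial_pmf(6, 0.73, 4)
print(round(probability, 4))
```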

72
Q

How do you create a Venn diagram?

A

Draw two intersecting circles surrounded by a rectangle. Frequently, it is advantageous to fill in the “both” area first. Try to fill in all four areas if you have enough info.

73
Q

Venn problem: 75% of houses have wi-fi and 95% of houses have a smoke detector. If 4% of houses with wi-fi do not have smoke detectors, what percent of houses have neither wi-fi nor a smoke detector?

A

2%. The second sentence says “4% of houses with wifi…”. In other words, 4% of the 75% do NOT have smoke detectors. That would be 3%. So 3% goes in the part of the wi-fi circle that is OUTSIDE the smoke detector circle. You can get the rest from there…
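
The remaining steps, worked through in Python as a check on the arithmetic. The variable names are chosen for this sketch.

```python
wifi = 0.75
smoke = 0.95

wifi_only = 0.04 * wifi          # 4% of the wi-fi houses lack a detector
both = wifi - wifi_only          # rest of the wi-fi circle
smoke_only = smoke - both        # rest of the smoke detector circle
neither = 1 - (wifi_only + both + smoke_only)

print(round(wifi_only, 2), round(both, 2), round(smoke_only, 2), round(neither, 2))
```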

74
Q

Venn problem: Of all NHS students, 23% are seniors, 22% are 18 years old, and 76% are neither. What percent of seniors are 18?

A

91.3%. (Hint: if 76% are neither, this number goes outside the two circles, leaving 24% for the rest of the diagram, which is the two interlocking circles…)
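
Finishing the hint in Python: the overlap comes from adding the two circles and subtracting what the diagram actually has room for. The variable names are chosen for this sketch.

```python
seniors = 0.23
eighteen = 0.22
neither = 0.76

in_either_circle = 1 - neither                  # 24% are inside the circles
both = seniors + eighteen - in_either_circle    # overlap of the two circles
eighteen_given_senior = both / seniors          # P(18 | senior)

print(round(both, 2), round(eighteen_given_senior, 3))
```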

75
Q

In a standard deck of cards, P(ace | red) =

A

2/26 (out of the 26 red cards, 2 are aces)

76
Q

In a standard deck of cards, P(red | ace) =

A

2/4 (out of the four aces, two are red)
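
A Python sketch showing that the two conditional probabilities differ because the "given" group changes the denominator. It counts cards in a standard 52-card deck; the variable names are chosen for this sketch.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "clubs", "hearts", "diamonds"]
deck = list(product(ranks, suits))

red_cards = [card for card in deck if card[1] in ("hearts", "diamonds")]
aces = [card for card in deck if card[0] == "A"]
red_aces = [card for card in red_cards if card[0] == "A"]

# P(ace | red): restrict to the 26 red cards, then count aces.
p_ace_given_red = Fraction(len(red_aces), len(red_cards))

# P(red | ace): restrict to the 4 aces, then count red cards.
p_red_given_ace = Fraction(len(red_aces), len(aces))

print(p_ace_given_red, p_red_given_ace)
```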