Final Exam Flashcards

Just for stats final, notecards from summaries of chapters

1
Q

what are marginal distributions in tables?

A

row totals and column totals, can be presented as percent of table total

2
Q

what are conditional distributions?

A

distributions of row variable for each value of column variable, and column variable for each value of row variable
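A minimal sketch of marginal and conditional distributions, assuming a made-up two-way table of counts (the row and column names are hypothetical, not from the course):

```python
# Hypothetical two-way table of counts (rows: class year, columns: survey answer).
table = {
    "junior": {"yes": 30, "no": 20},
    "senior": {"yes": 10, "no": 40},
}

total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of the row variable: row totals as a share of the table total.
row_marginal = {r: sum(cols.values()) / total for r, cols in table.items()}

# Marginal distribution of the column variable: column totals as a share of the table total.
col_names = ["yes", "no"]
col_marginal = {c: sum(table[r][c] for r in table) / total for c in col_names}

# Conditional distribution of the column variable for each value of the row variable.
conditional_given_row = {
    r: {c: n / sum(cols.values()) for c, n in cols.items()}
    for r, cols in table.items()
}

print(row_marginal)
print(col_marginal)
print(conditional_given_row)
```

Each conditional distribution divides by its own row total, while the marginals divide by the table total.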

3
Q

4 step process for statistical problems?

A

state, plan, do, conclude
SPDC

4
Q

what is Simpson's paradox?

A

an association between 2 variables that holds for each value of a third variable can be changed or reversed when the data for all values of 3rd variable are combined. example: helicopter rescues have more deaths even though the care is more advanced, why? They are used for more severe rescues (severity of rescue can be 3rd variable here)
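A rough sketch of the helicopter example, assuming illustrative counts chosen only to show the reversal (they are not data from the course):

```python
# Illustrative (died, total) counts by transport type and accident severity.
data = {
    "helicopter": {"serious": (48, 100), "less serious": (16, 100)},
    "road": {"serious": (60, 100), "less serious": (200, 1000)},
}

def death_rate(died, total):
    return died / total

# Death rate within each severity level (the third variable).
by_severity = {
    sev: {mode: death_rate(*data[mode][sev]) for mode in data}
    for sev in ("serious", "less serious")
}

# Death rate after combining all severity levels.
overall = {
    mode: death_rate(
        sum(d for d, _ in groups.values()),
        sum(t for _, t in groups.values()),
    )
    for mode, groups in data.items()
}

print(by_severity)  # helicopter is lower within each severity level
print(overall)      # helicopter is higher once the levels are combined
```

The reversal happens because helicopters carry a much larger share of serious cases in these counts.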

5
Q

know what a dotplot, stemplot, and histogram are.

A

Show the distribution of a quantitative variable. A dotplot shows values on a number line. Stemplots separate each observation into a stem and a one digit leaf. Histograms plot the counts or percents of values in equal width classes.

6
Q

other words for counts and percents?

A

frequency and relative frequency.

7
Q

How to describe patterns of distributions?

A

SOCS:
s: shape
o:outliers
c: center
s:spread

8
Q

simple shapes for distributions?

A

symmetric or skewed… # of modes can also be used to describe shape (unimodal, bimodal, multimodal)

9
Q

Mean vs median?

A

mean is the average of the observations, median is midpoint of listed values in numerical order

10
Q

Five Number summary?

A

minimum: lowest value
quartile 1 (Q1): median of the values below the overall median
median: middle of the values in numerical order
quartile 3 (Q3): median of the values above the overall median
maximum: highest value

11
Q

Interquartile range?

A

Q3 - Q1

12
Q

how to use IQR to find outliers?

A

it is an outlier if:

-smaller than Q1 - (1.5 × IQR)
-larger than Q3 + (1.5 × IQR)
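A small sketch of the five-number summary and the 1.5 × IQR rule, assuming quartiles are taken as medians of the lower and upper halves (software often uses slightly different quartile conventions) and using made-up scores:

```python
from statistics import median

def five_number_summary(values):
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]              # values to the left of the median's position
    upper = data[half + (n % 2):]    # values to the right of the median's position
    return min(data), median(lower), median(data), median(upper), max(data)

def iqr_outliers(values):
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]

scores = [5, 7, 8, 9, 10, 11, 12, 13, 40]  # hypothetical data
summary = five_number_summary(scores)
outliers = iqr_outliers(scores)
print(summary)   # (min, Q1, median, Q3, max)
print(outliers)
```

Here IQR = 12.5 - 7.5 = 5, so the fences sit at 0 and 20 and only 40 is flagged.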

13
Q

Box plots

A

draw lines at Q1, median, and Q3 and make a divided box with them. Whiskers go to the smallest and largest values that are not outliers. Outliers are plotted as separate points so they do not distort the shape.

14
Q

What are variance and standard deviation?

A

common measures of spread about the mean as center. standard deviation is the square root of the variance.

15
Q

Resistant vs nonresistant measures

A

resistant: not largely affected by extreme observations
example: median, IQR

nonresistant: affected by extreme observations
example: mean, standard deviation

16
Q

transforming data? how do addition/subtraction compare to multiplication/division in how they affect measures of data?

A

adding a constant a to all the values in a data set: measures of center and location increase by a, measures of spread are unaffected.

multiplying all values in a data set by a constant b: measures of center and location are multiplied by b, and measures of spread are multiplied by |b|.
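A quick check of both transformation rules, assuming a made-up data set and hypothetical constants a = 10 and b = 3:

```python
from math import isclose
from statistics import mean, median, stdev

data = [2, 4, 4, 6, 9]   # hypothetical data
a, b = 10, 3             # hypothetical constants

shifted = [x + a for x in data]   # add a to every value
scaled = [x * b for x in data]    # multiply every value by b

# Adding a: center moves by a, spread stays the same.
print(mean(shifted) - mean(data), median(shifted) - median(data))
print(isclose(stdev(shifted), stdev(data)))

# Multiplying by b: center and spread are both multiplied by b.
print(mean(scaled) / mean(data), median(scaled) / median(data))
print(isclose(stdev(scaled), b * stdev(data)))
```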

17
Q

density curve

A

total area 1 underneath, an area under it gives proportion of data in that region.

18
Q

how to locate mean and median on a density curve?

A

mean is balance point, median is where area under it is .5

19
Q

normal distributions

A

bell shaped, symmetric density curves.
mean: μ
standard dev: σ
mean is the center of the curve, and stddev can be used to divide graphs into sections with predictable area.

20
Q

percent rule for normal distributions

A

-68% of values lie within one stddev of the mean
-95% of values lie within two
-99.7% lie within three

21
Q

z score equation? for normal distributions

A

z = (x-μ)/σ
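A short sketch of the z-score formula and the 68-95-99.7 rule using Python's standard-library `statistics.NormalDist`; the example value (x = 130 with μ = 100, σ = 15) is hypothetical:

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mu) / sigma

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within = {k: round(share_within(k), 4) for k in (1, 2, 3)}
z = z_score(130, mu=100, sigma=15)  # hypothetical observation

print(within)  # roughly 0.68, 0.95, 0.997
print(z)
```

The exact areas are slightly different from the rounded 68, 95, and 99.7 percent figures, which is why the rule is an approximation.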

22
Q

What is a scatterplot?

A

displays relationship between TWO quantitative variables

23
Q

explanatory and response variables on scatterplot

A

if we think that a variable x may help explain, predict, or even cause variable y we call x explanatory variable and y a response variable. Always plot explanatory on x.

24
Q

how to explain a scatterplot? hint: DOFS

A

D: direction
O: outliers
F: form
S: strength

25
Q

How to explain direction? of a scatterplot

A

positive association: high values of the two variables occur together, positive slope on the line of best fit (LOBF)
negative association: high values of one variable occur with low values of the other, negative slope

26
Q

how to explain form of a scatterplot?

A

linear relationships, points show a straight line pattern
Curved and clustered are also good ways to describe form!

27
Q

how to explain strength of a scatterplot?

A

determined by how close the points in the scatterplot lie to a simple form such as a line

28
Q

what does correlation r value measure with two variables?

A

the strength and direction of the linear association between two quantitative variables x and y. r only measures straight-line relationships. It is always between -1 and 1, and strength is indicated by how close it is to -1 or 1 (-1 for a perfect negative association, 1 for a perfect positive association). CORRELATION IS NOT RESISTANT
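A sketch of r computed as the average product of standardized x and y values (dividing by n - 1), with made-up data showing how a single outlier can change it:

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r: sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    ) / (n - 1)

# Hypothetical points lying exactly on a line.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
r_line = correlation(xs, ys)

# Same points plus one extreme observation.
r_with_outlier = correlation(xs + [6], ys + [0])

print(r_line)          # close to 1
print(r_with_outlier)  # much weaker, because r is not resistant
```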

29
Q

what is a regression line?

A

straight line that describes how a response variable y changes as explanatory variable x changes. You can use it to PREDICT value of y for value of x.

30
Q

what is the form of a regression line?

A

y=a+bx

31
Q

what is the least squares regression line? how does it work?

A

a straight line y = a+bx that minimizes sum of squares of vertical distances of observed points from line.
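A minimal sketch of fitting y = a + bx by least squares, assuming made-up data and the usual closed-form slope and intercept; the `sum_squared_errors` helper is only there to compare against a nearby line:

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (a, b) for the line y = a + b*x that minimizes squared vertical distances."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b

def sum_squared_errors(xs, ys, a, b):
    """Sum of squared vertical distances from the points to the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]   # hypothetical explanatory values
ys = [2, 4, 5, 4, 5]   # hypothetical response values
a, b = least_squares_line(xs, ys)

print(a, b)
print(sum_squared_errors(xs, ys, a, b))        # smallest possible for a straight line
print(sum_squared_errors(xs, ys, a, b + 0.1))  # any other line does worse
```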

32
Q

what is extrapolation? should you avoid it?

A

use of a regression line for prediction using values outside range of data from which the line was calculated. YES AVOID

33
Q

what are residuals on a scatterplot?

A

differences between observed and predicted values of y: residual = observed y - predicted y.

34
Q

what does the standard deviation of residuals (s) measure?

A

approximately the typical size of the prediction errors when using the regression line

35
Q

what is the coefficient of determination?

A

r^2. fraction of the variation in the response variable that is accounted for by the least squares regression on the explanatory variable.
Example:
(r^2*100)% of y's variation can be explained by the least squares regression of y on x!
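A sketch tying residuals, s, and r^2 together on made-up data. It assumes s is computed with n - 2 in the denominator and that r^2 is obtained as 1 - SSE/SST, which holds for a least squares line with an intercept:

```python
from math import sqrt
from statistics import mean

xs = [1, 2, 3, 4, 5]   # hypothetical explanatory values
ys = [2, 4, 5, 4, 5]   # hypothetical response values

# Fit the least squares line y = a + b*x.
x_bar, y_bar = mean(xs), mean(ys)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
a = y_bar - b * x_bar

# Residual = observed y - predicted y.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

sse = sum(e ** 2 for e in residuals)        # variation left over after the fit
sst = sum((y - y_bar) ** 2 for y in ys)     # total variation in y
s = sqrt(sse / (len(xs) - 2))               # standard deviation of the residuals
r_squared = 1 - sse / sst                   # fraction of variation explained

print(residuals)
print(s)
print(r_squared)
```

With these numbers about 60% of the variation in y is accounted for by the line.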

36
Q

important stuff about correlation and regression:

A

always interpret with caution. look for outliers that could affect regression line. do not conclude cause and effect between two variables JUST because of a strong correlation.

37
Q

What is a sample survey?

A

selects a sample from the population of all individuals about which we want information.

38
Q

what is random sampling?

A

uses chance to select a sample

39
Q

what is a simple random sample (SRS)?

A

gives every possible sample of a given size the same chance to be chosen (not the same as giving every individual the same chance). Choose an SRS by labeling members with numbers and using random digits to select the sample.

40
Q

what is a stratified random sample?

A

divide population into strata, groups of individuals that are similar in some way that might affect their responses. Choose a separate SRS from each stratum.

41
Q

what is a cluster sample?

A

divide population into groups or clusters. randomly select some of these clusters. All individuals in the chosen clusters are included in the sample.
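A sketch contrasting an SRS, a stratified random sample, and a cluster sample. The population, the `grade` strata, and the `homeroom` clusters are all hypothetical, and the generator is seeded only so the run is repeatable:

```python
import random

rng = random.Random(0)  # seeded so the sketch is repeatable

# Hypothetical population: 12 students tagged with a grade (stratum) and homeroom (cluster).
population = [
    {"id": i, "grade": "junior" if i < 6 else "senior", "homeroom": i % 3}
    for i in range(12)
]

def simple_random_sample(units, n):
    """Every possible sample of size n is equally likely."""
    return rng.sample(units, n)

def stratified_sample(units, key, n_per_stratum):
    """Split into strata by `key`, then take a separate SRS from each stratum."""
    strata = {}
    for u in units:
        strata.setdefault(u[key], []).append(u)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(units, key, n_clusters):
    """Randomly choose whole clusters and keep every individual in them."""
    clusters = sorted({u[key] for u in units})
    chosen = set(rng.sample(clusters, n_clusters))
    return [u for u in units if u[key] in chosen]

srs = simple_random_sample(population, 4)
strat = stratified_sample(population, "grade", 2)
clus = cluster_sample(population, "homeroom", 1)
print(len(srs), len(strat), len(clus))
```

The stratified sample always contains both grades, while the cluster sample contains everyone from the chosen homeroom and nobody else.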

42
Q

when to use Simple random, stratified random, or clustered samples?

A

SRS: when you want every member of the population to have an equal chance of being selected.
stratified: when you want to ensure representation from different subgroups within the population.
cluster: when you need to study large, geographically dispersed populations by randomly selecting groups (clusters) to sample from.

43
Q

what is bias in sampling? two examples?

A

systematic errors in the way the sample represents the population.
voluntary response samples: respondents choose themselves, can cause bias
convenience samples: individuals are close by and included in sample, prone to large bias.

44
Q

What is sampling error? two types?

A

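errors that come from the act of choosing your sample
random sampling error: chance difference between the sample result and the truth about the population, because a random sample is only part of the population
undercoverage: some members of the population are left out of the sampling frame, the list from which the sample is chosen.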

45
Q

what are two nonsampling errors?

A

errors that have nothing to do with choosing the sample.

nonresponse: people can't be contacted or choose not to answer. response bias: people give incorrect answers.

wording of questions can also influence answers.

46
Q

What is an observational study?

A

gathers data on individuals as they are

47
Q

what is an experiment?

A

actively do something to measure a response.

48
Q

what are confounded variables?

A

variables whose effects on a response can't be distinguished from each other. observational studies and uncontrolled experiments often fail to show that changes in an explanatory variable actually cause changes in a response variable, because the explanatory variable is confounded with lurking variables.

49
Q

what are treatments?

A

a combination of values of the explanatory variables.

50
Q

what are experimental units?

A

the smallest unit a treatment of an experiment is applied to.

51
Q

what is control, random assignment of treatments, and replication in experiment?

A

control keeps lurking variables from being confounded with the explanatory variable, most simply by comparing two or more treatments. random assignment uses chance to assign treatments to experimental units. replication means using enough experimental units that treatment effects can be distinguished from chance differences.

52
Q

double blind and single blind treatments?

A

DB: neither the subjects nor the people who interact with them know who has which treatment in an experiment
SB: one party knows who has which treatment and the other does not.

53
Q

what is blocking in an experiment?

A

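a block is a group of individuals that are similar in some way important to the response. blocking means forming these groups first and then randomly assigning treatments within each block.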

54
Q

what does making an inference about a population require?

A

the individuals taking part in the study must be randomly selected from the larger population. Inference about cause and effect instead requires random assignment of individuals to treatments.

55
Q

law of large numbers in probability?

A

the proportion of times that a particular outcome occurs in many repetitions will approach a single number, the probability of that outcome.
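A small simulation sketch of this idea, assuming a hypothetical chance process with success probability 0.5 (a fair coin) and a seeded generator so the run is repeatable:

```python
import random

def running_proportion(n_trials, p=0.5, seed=1):
    """Simulate n_trials repetitions and record the proportion of successes at checkpoints."""
    rng = random.Random(seed)
    successes = 0
    checkpoints = {}
    for i in range(1, n_trials + 1):
        if rng.random() < p:
            successes += 1
        if i in (10, 100, 1000, 10000, 100000):
            checkpoints[i] = successes / i
    return checkpoints

props = running_proportion(100000)
for n, prop in props.items():
    print(n, prop)  # proportions settle toward 0.5 as n grows
```

Early proportions can wander quite far from 0.5; only the long-run proportion is predictable.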

56
Q

what is a simulation?

A

imitation of chance behavior. follows 4 step process SPDC

57
Q

what is a probability model?

A

describes chance behavior by listing possible outcomes in the sample space S and giving the probability of each outcome.

58
Q

what is an event?

A

a subset of possible outcomes.

59
Q

complement rule?

A

P(A^c) = 1 - P(A)

60
Q

mutually exclusive events?

A

events A and B are mutually exclusive if they have no outcomes in common.

61
Q

addition rule for mutually exclusive events.

A

P(A or B) = P(A) + P(B)

62
Q

what does P(A U B) mean? P(A ∩ B)?

A

P(A or B), P(A and B)

63
Q

general addition rule can be used to find P(A U B) “P(A or B)”

A

P(A U B) = P(A) + P(B) - P(A ∩ B)
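A quick check of the general addition rule, assuming a hypothetical sample space of one fair die roll with equally likely outcomes:

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die (assumed equally likely)

def prob(event):
    """Probability of an event as (outcomes in event) / (outcomes in sample space)."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {2, 4, 6}  # roll is even
B = {5, 6}     # roll is greater than 4

p_union_direct = prob(A | B)                      # count outcomes in A or B
p_union_rule = prob(A) + prob(B) - prob(A & B)    # general addition rule

print(p_union_direct, p_union_rule)
```

Subtracting P(A ∩ B) removes the outcome 6, which would otherwise be counted twice.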

64
Q

what is conditional probability? Notation?

A

if one event has happened, the chance another will happen is a conditional prob. Notation P(B|A) represents prob of B given A has happened

65
Q

what are independent events?

A

the chance that event B occurs is not affected by whether or not A has occurred.
P(B|A) = P(B) and P(A|B) = P(A)
if events are mutually exclusive, they cannot be independent.

66
Q

general multiplication rule for probability (for independent events too)

A

P(A ∩ B) = P(A) × P(B|A)
for independent:
P(A ∩ B) = P(A) × P(B)

67
Q

conditional probability formula?

A

divide both sides of general multiplication rule by P(A) and we get

P(B|A) = P(A ∩ B) / P(A)
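A sketch of the conditional probability formula and the independence check P(B|A) = P(B), assuming a hypothetical fair die roll; the events A, B, and C are made up for illustration:

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die (assumed equally likely)

def prob(event):
    return Fraction(len(event & sample_space), len(sample_space))

def cond_prob(b, a):
    """P(B|A) = P(A and B) / P(A)."""
    return prob(a & b) / prob(a)

A = {2, 4, 6}  # roll is even
B = {5, 6}     # roll is greater than 4
C = {1, 2, 3}  # roll is at most 3

independent_AB = cond_prob(B, A) == prob(B)  # knowing A does not change the chance of B
independent_AC = cond_prob(C, A) == prob(C)  # knowing A does change the chance of C

print(cond_prob(B, A), prob(B), independent_AB)
print(cond_prob(C, A), prob(C), independent_AC)
```

For the independent pair, P(A ∩ B) also equals P(A) × P(B), matching the multiplication rule for independent events.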

68
Q

what is a binomial setting? acronym? binomial random variable?

A

consists of n independent trials of the same chance process, each resulting in a success or failure, prob of success = p. The count X of successes is a binomial random variable. Its probability distribution is a binomial distribution.
BINS
B: Binary (2 outcomes, success or failure)
I: Independent trials
N: Number of trials fixed in advance
S: Same value of p for all trials

69
Q

binomial probability of observing k successes in n trials?

A

P(X = k) = (n choose k) × p^k × (1-p)^(n-k)
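A minimal sketch of the binomial probability formula using `math.comb` for the "n choose k" term; the values n = 5 and p = 0.5 are hypothetical:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 5, 0.5  # hypothetical setting: 5 fair coin flips
distribution = {k: binomial_pmf(k, n, p) for k in range(n + 1)}

print(distribution[2])             # probability of exactly 2 successes
print(sum(distribution.values()))  # probabilities over k = 0..n add up to 1
```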

70
Q

mean and stddev of a binomial random variable?

A

μ_X = np
σ_X = sqrt(np(1-p))
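A simulation sketch comparing these formulas with simulated binomial counts, assuming hypothetical values n = 10 and p = 0.3 and a seeded generator so the run is repeatable:

```python
import random
from math import sqrt
from statistics import mean, pstdev

def simulate_binomial(n, p, reps, seed=0):
    """Simulate `reps` binomial counts, each the number of successes in n trials."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) for _ in range(reps)]

n, p = 10, 0.3  # hypothetical binomial setting
counts = simulate_binomial(n, p, reps=20000)

formula_mean = n * p
formula_sd = sqrt(n * p * (1 - p))
simulated_mean = mean(counts)
simulated_sd = pstdev(counts)

print(formula_mean, simulated_mean)
print(formula_sd, simulated_sd)
```

The simulated values should land close to np and sqrt(np(1-p)), though not exactly on them.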