AP exam flashcards
interpret standard deviations
- standard deviation measures how far the values typically vary from the mean
height of students typically varied by about 3.2 inches from the mean height of 64 inches
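A minimal sketch of where a number like this comes from, using Python's standard statistics module. The five heights are made-up values chosen for illustration, not data from any study.

```python
import statistics

# Hypothetical heights in inches (made-up values for illustration only).
heights = [60, 62, 64, 66, 68]

mean_height = statistics.mean(heights)  # center of the data
sd_height = statistics.stdev(heights)   # sample standard deviation (divides by n - 1)

print(f"heights typically vary by about {sd_height:.1f} inches "
      f"from the mean height of {mean_height} inches")
```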
scope of inference cause and effect
cause and effect conclusions can only be drawn if subjects were randomly assigned to treatments and we find a statistically significant difference
a difference is statistically significant if it is larger than what would be expected to happen by chance alone
generalizing to a larger population
we can generalize the results of a study to a larger population if we randomly select from that population.
however, sampling variability can affect estimates: different random samples of the same size from the same population will produce different estimates
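Sampling variability can be seen in a small simulation. This is only a sketch: the population is simulated (not real data), and the mean of 64, spread of 3.2, and sample size of 25 are assumptions picked for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population of 1000 heights (simulated, not real data).
population = [random.gauss(64, 3.2) for _ in range(1000)]

# Take several random samples of the same size and record each sample mean.
sample_means = [
    statistics.mean(random.sample(population, 25)) for _ in range(5)
]

# Each sample gives a slightly different estimate of the population mean.
print([round(m, 2) for m in sample_means])
```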
replication and control
2 of the 4 principles of a good experiment
replication - giving each treatment to enough subjects or units so that any difference in the effect of treatments can be distinguished from chance differences
control - keeping other variables the same for all groups, especially variables that are likely to cause confounding (control helps reduce variability in the response variable)
experimental units, factors and levels, treatments
experimental units - the objects to which treatments are randomly assigned. when the units are people, they are often called “subjects”
factor - an explanatory variable that is manipulated and may cause a change in the response variable
level - different values of a factor
treatment - a specific combination of levels of the factors; all the possible combinations make up the treatments
control groups and blinding
the other 2 principles of a good experiment
control group - provides a baseline for comparing the effects of other treatments. A control group is often given an inactive treatment (placebo), an active treatment, or no treatment
blind - the subjects don’t know which treatment they are receiving, or the people recording or measuring the response variable don’t know which treatment each subject received. when both groups don’t know, it is called “double-blind”
blocking and matched pairs design
blocking - before random assignment, divide the experimental units into groups (blocks) that are expected to respond similarly. then randomly assign treatments within each block.
a matched pairs design uses blocks of size 2 or gives both treatments to each subject in random order
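A sketch of random assignment within blocks. The function name block_assign, the block labels, and the unit names are all hypothetical, made up for illustration.

```python
import random

def block_assign(units_by_block, treatments, seed=None):
    """Randomly assign treatments separately within each block."""
    rng = random.Random(seed)
    assignment = {}
    for block, units in units_by_block.items():
        shuffled = list(units)
        rng.shuffle(shuffled)  # the chance process happens inside the block
        # Deal treatments out in turn so each block gets a balanced split.
        for i, unit in enumerate(shuffled):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment

# Hypothetical blocks of units expected to respond similarly.
blocks = {"freshmen": ["A", "B", "C", "D"], "seniors": ["E", "F", "G", "H"]}
result = block_assign(blocks, ["new method", "old method"], seed=0)
print(result)
```

With blocks of size 2 and two treatments, the same function gives one way to run a matched pairs design.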
random assignment and completely randomized designs
random assignment - using a chance process to create groups of experimental units that are roughly equivalent at the beginning of the experiment
if treatments are assigned to experimental units completely at random (no blocking), the result is a completely randomized design
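A completely randomized design can be sketched by shuffling all the units at once and splitting them into groups. The function name completely_randomized and the unit and treatment names are assumptions for illustration.

```python
import random

def completely_randomized(units, treatments, seed=None):
    """Shuffle all units together (no blocking), then split into treatment groups."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

# Hypothetical experimental units and treatments.
units = [f"plant_{i}" for i in range(1, 13)]
groups = completely_randomized(units, ["fertilizer", "placebo", "none"], seed=3)
for treatment, members in groups.items():
    print(treatment, members)
```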
simple random sample
an SRS of size n is chosen so that every group of n individuals in the population has an equal chance to be selected as the sample
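In Python, random.sample draws without replacement so that every group of n individuals is equally likely, which matches this definition. The population of 100 labeled students below is hypothetical.

```python
import random

# Hypothetical population of 100 labeled individuals.
population = [f"student_{i}" for i in range(1, 101)]

rng = random.Random(42)           # seeded only so the example is repeatable
srs = rng.sample(population, 10)  # SRS of size n = 10, no repeats

print(srs)
```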
bias
a statistical study shows bias if it is very likely to consistently underestimate or consistently overestimate the value you want to know
sources of bias - convenience sampling, voluntary response, undercoverage, nonresponse, and response bias
using a random digit table to select a sample
label all members of the population with the same number of digits
pick a starting row in the table and read groups of digits from left to right, skipping any repeated labels and any numbers that are not assigned to an individual
select the individuals whose labels you find
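The steps above can be sketched as a short function. The name select_from_digits and the row of digits are made up for illustration, and labels are assumed to run from 1 up to the population size.

```python
def select_from_digits(digits, population_size, n):
    """Read fixed-width labels left to right from a string of random digits.

    Assumes individuals are labeled 1..population_size, each written with the
    same number of digits (for example 01 to 40).
    """
    width = len(str(population_size))
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        label = int(digits[start:start + width])
        # Skip labels outside the population and labels already selected.
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
        if len(chosen) == n:
            break
    return chosen  # may hold fewer than n labels if the digits run out

# Made-up row of digits, read two at a time: 48 12 09 73 36 02 ...
row = "48120973360217558904"
print(select_from_digits(row, 40, 4))  # [12, 9, 36, 2]
```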
choosing a model
choose the model whose residual plot has the most random scatter
if more than one model has a randomly scattered residual plot, choose the model with the largest coefficient of determination, r^2
population, census, sample
the population in a statistical study is the entire group of individuals we want information about
a census collects data from every individual in the population
a sample is a subset of individuals from the population from which we collect data
experimental vs observational study
experimental study - researchers impose treatment(s) on the experimental units. well-designed experiments allow cause-and-effect conclusions to be made
observational study - researchers observe individuals and measure variables without imposing treatments, so the results cannot establish cause and effect
what is a chi square distribution
a chi square distribution is defined by a density curve that takes only nonnegative values and is skewed to the right
as df increases the chi square distributions become more variable, less skewed and centered at a larger value (mean = df)
the chi square test statistic measures how different the observed counts are from the expected counts
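The statistic is the sum of (observed - expected)^2 / expected over all categories. A small sketch follows; the counts are hypothetical.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for four categories (made up for illustration).
observed = [18, 22, 20, 40]
expected = [25, 25, 25, 25]

chi_sq = chi_square_statistic(observed, expected)
print(round(chi_sq, 2))  # 12.32
```

Larger values mean the observed counts are farther from what was expected.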
inference for regression
Linear - association between variables is linear
Independent - observations are independent; check the 10% condition if sampling without replacement
Normal - responses vary normally around the regression line for all x-values (or n > 30)
Equal SD - around the regression line for all x-values
Random - data from a random sample or randomized experiment
outlier rule
outliers > Q3 + 1.5(IQR)
outliers < Q1 - 1.5(IQR)
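A sketch of the rule in code. Quartile conventions differ between textbooks and software; this version assumes the median-of-halves method (Q1 is the median of the lower half, Q3 the median of the upper half, with the middle value left out when n is odd). The data values are made up.

```python
import statistics

def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves (assumed convention)."""
    s = sorted(data)
    half = len(s) // 2
    return statistics.median(s[:half]), statistics.median(s[-half:])

def find_outliers(data):
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

# Hypothetical data set: Q1 = 4.5, Q3 = 8.5, IQR = 4, fences at -1.5 and 14.5.
print(find_outliers([2, 4, 5, 6, 7, 8, 9, 30]))  # [30]
```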
what is a resistant measure
a resistant measure is not strongly affected by outliers
resistant measures: median, IQR, Q1, Q3
non resistant: mean, SD, range, correlation, equation of LSRL
Interpret a Z-score
“Jessica’s test score was 2.3 standard deviations below the mean”
z = -2.3
z - score formula
z = (value - mean) / standard deviation
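A quick sketch of the formula in code. The score of 54, mean of 77, and standard deviation of 10 are hypothetical numbers picked so that z comes out to -2.3.

```python
def z_score(value, mean, sd):
    """Number of standard deviations a value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# Hypothetical test: score 54, class mean 77, standard deviation 10.
z = z_score(54, 77, 10)
print(z)  # -2.3, meaning 2.3 standard deviations below the mean
```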
interpret standard deviation of residuals s
s measures the size of the typical residual
“The actual cost of a car typically varies by about $2375 from the cost predicted by the LSRL with x = years”
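For a least-squares line, s is usually computed as the square root of the sum of squared residuals divided by n - 2. A minimal sketch follows; the actual and predicted values are made up and simply stand in for the output of a fitted LSRL.

```python
import math

def residual_sd(actual, predicted):
    """s = sqrt(sum of squared residuals / (n - 2))."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    return math.sqrt(sum(r ** 2 for r in residuals) / (len(residuals) - 2))

# Hypothetical values (not from a real regression).
actual = [10, 12, 15, 19]
predicted = [11, 12, 14, 20]

s = residual_sd(actual, predicted)
print(round(s, 3))
```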
residual formula
residual = actual y - predicted y
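A sketch for a single point. The line cost = 20000 - 1500(years) and the actual cost of $15000 are hypothetical, chosen only to show the arithmetic.

```python
def predict(x, intercept, slope):
    """Predicted y from a line of the form y-hat = intercept + slope * x."""
    return intercept + slope * x

def residual(actual_y, predicted_y):
    return actual_y - predicted_y

# Hypothetical LSRL: predicted cost = 20000 - 1500 * years.
predicted_cost = predict(4, 20000, -1500)   # 14000 for a 4-year-old car
res = residual(15000, predicted_cost)       # actual cost was 15000

print(res)  # 1000: positive, so the actual cost is above the prediction
```

A negative residual would mean the line overpredicted the actual value.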
interpreting a residual plot
- if there is no leftover curvature the model used to make the plot is appropriate
- if there is leftover curvature the model used to make the plot is not appropriate