STAT 200 Final Flashcards
Population parameter
Fixed value that we usually don’t know
Sample statistic
Known value computed from a random sample taken from the population
Simple random sample
Sample in which every individual in the population has an equal chance of being chosen
A good experiment uses
simple random sample and random assignment
Random assignment
Assigning to groups (control or treatment) randomly
If no random assignment, then
NO causation
If sample was not randomly selected, then
NEVER generalize to population
Observational study
Researchers only observe or record data without assigning treatments (e.g., a survey)
Randomized experiment
Researchers randomly assign cases to treatments, deliberately changing the explanatory variable
Matched pairs study
Cases are paired (or each case is measured twice) and the two measurements within each pair are compared
More likely, tend to
association
Lead to, affect, cause
Causation
Proportion or risk
# with the outcome of interest / total # in the category
Odds (ex. 45 to 20)
# with outcome / # without outcome
Relative risk (ex. you are 5.2 times more likely to get lung cancer if you smoke compared to a non-smoker)
risk1/risk2 (proportion 1/proportion 2)
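The three formulas above can be sketched in Python. This is only an illustration: the function names and the counts (45 with the outcome and 20 without in group 1, 10 with and 55 without in group 2) are made up for the example.

```python
def risk(with_outcome, total):
    """Proportion (risk): # with the outcome / total # in the category."""
    return with_outcome / total


def odds(with_outcome, without_outcome):
    """Odds: # with the outcome / # without the outcome."""
    return with_outcome / without_outcome


def relative_risk(risk1, risk2):
    """Relative risk: risk in group 1 divided by risk in group 2."""
    return risk1 / risk2


# Hypothetical counts, chosen only for illustration.
group1_risk = risk(45, 45 + 20)   # 45 of 65 have the outcome
group1_odds = odds(45, 20)        # odds of 45 to 20
group2_risk = risk(10, 10 + 55)   # 10 of 65 have the outcome
rr = relative_risk(group1_risk, group2_risk)
```

Here `rr` reads as "group 1 is about `rr` times as likely to have the outcome as group 2."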
5 # summary
Min, Q1, Median, Q3, Max
IQR
Q3-Q1
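A minimal sketch of the 5 # summary and IQR using Python's standard library. The data are made up, and the quartile rule (`method="inclusive"`) is an assumption: textbooks and software use slightly different rules, so Q1 and Q3 may differ a little from hand calculations.

```python
import statistics


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    # Assumption: "inclusive" quartiles; other quartile rules can give
    # slightly different Q1 and Q3 values.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


values = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # made-up data
summary = five_number_summary(values)
iqr = summary[3] - summary[1]  # IQR = Q3 - Q1
```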
If median is closer to Q1
right-skewed
If median is closer to Q3,
left-skewed
Mean is greater than median
Right skewed
Mean is less than median
Left skewed
Large standard deviation
More variability and a wider, flatter graph
Smaller std dev
Less variability and narrow graph
Correlation does NOT imply
causation
Sampling distribution center
Population parameter
As sample size increases, SE
decreases
Confidence interval
Sample statistic ± (multiplier)(SE)
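The confidence interval formula can be sketched as a small Python function. The inputs (a sample proportion of 0.60, a multiplier of 2, and an SE of 0.05) are hypothetical values chosen for illustration.

```python
def confidence_interval(statistic, multiplier, se):
    """Return (lower, upper) = statistic -/+ multiplier * SE."""
    margin_of_error = multiplier * se
    return statistic - margin_of_error, statistic + margin_of_error


# Hypothetical inputs: sample proportion 0.60, multiplier 2, SE 0.05.
lower, upper = confidence_interval(0.60, 2, 0.05)
```

With these inputs the interval runs from about 0.50 to about 0.70.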
Bootstrap center
Sample statistic (ORIGINAL)
Type I error
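false positive (rejecting the null hypothesis when it is true)
Type II error
false negative (failing to reject the null hypothesis when it is false)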
Proportions use which distribution?
Z
Means use which distribution?
t
Central limit theorem for Z distribution (Proportion)
np and n(1 − p) are both at least 10
Central limit theorem for t distribution (mean)
n is at least 30, or the distribution looks approximately symmetric
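The two sample-size checks above can be written as quick Python helpers. This is a rough sketch: the function names are hypothetical, and treating the cutoffs as "at least 10" and "at least 30" is an assumption about how the course states them.

```python
def proportion_normal_ok(n, p):
    """Z (proportion) check: both np and n(1 - p) reach the cutoff of 10."""
    return n * p >= 10 and n * (1 - p) >= 10


def mean_t_ok(n, looks_symmetric=False):
    """t (mean) check: n reaches 30, or the data look roughly symmetric."""
    return n >= 30 or looks_symmetric


# Made-up examples.
big_sample = proportion_normal_ok(100, 0.5)    # 50 and 50, both fine
small_sample = proportion_normal_ok(20, 0.1)   # np = 2, too small
```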
Z multiplier is NOT the
Z statistic
Paired t test
Same group of individuals (2 measurements for each case)
Difference in means
Two independent groups (2 sample t-test)
Linear regression hypotheses
H0: β1 = 0
Ha: β1 ≠ 0
If b1 = 0,
no linear relationship
If b1 ≠ 0, b1 > 0, or b1 < 0
Some linear relationship, a positive relationship, or a negative relationship, respectively
R2 tells us
how closely the data points fall around the line of best fit
R2 interpretation
Approximately ___% of the variability in ___ (the response) can be explained by the predictor ___
Chi-Square Test for association
Expected counts must be greater than or equal to 5
Randomization center
null value
Larger sample size,
Smaller SE, Narrower CI
Smaller sample size,
Larger SE, Wider CI
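To see why a larger sample gives a smaller SE, here is a minimal Python sketch using the usual standard error formula for a sample proportion, sqrt(p(1 − p)/n). The proportion 0.5 and the sample sizes 100 and 400 are made-up values.

```python
import math


def se_proportion(p, n):
    """Standard error of a sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


# Made-up example: same proportion, two sample sizes.
se_small_n = se_proportion(0.5, 100)
se_large_n = se_proportion(0.5, 400)
```

Quadrupling the sample size cuts the SE in half, so the confidence interval built from it is half as wide.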
Random assignment alone CANNOT
justify generalizing to the population (that requires random sampling)
Explanatory variable on
x axis
Response variable on
y axis
A smaller standard deviation’s graph will
look narrower, with values clustered closer to the mean
1 man in 7 will be diagnosed with cancer
1 in 39 will die of prostate cancer
Individual risk
Residual is positive when
point is above the regression line (observed y > predicted y)
Residual is negative when
point is below the regression line (observed y < predicted y)
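A short Python sketch of the sign of a residual, taking residual = observed y − predicted y. The intercept, slope, and data points are hypothetical values used only for illustration.

```python
def predict(b0, b1, x):
    """Predicted y on the line: b0 + b1 * x."""
    return b0 + b1 * x


def residual(observed_y, predicted_y):
    """Residual = observed y - predicted y."""
    return observed_y - predicted_y


# Hypothetical line y-hat = 2 + 3x, evaluated at x = 4 (prediction is 14).
y_hat = predict(2, 3, 4)
above = residual(16, y_hat)   # observed point above the line
below = residual(11, y_hat)   # observed point below the line
```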
If the proportion of undergrad students who vape is 1/8, what are the odds that a student vapes?
1/7 (odds of 1 to 7)
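The conversion from a proportion to odds, odds = p / (1 − p), can be checked in Python. The function name is hypothetical, and `Fraction` is used so the arithmetic stays exact.

```python
from fractions import Fraction


def odds_from_proportion(p):
    """Convert a proportion p into odds: p / (1 - p)."""
    return p / (1 - p)


# Proportion from the card above: 1/8 of students vape.
vape_odds = odds_from_proportion(Fraction(1, 8))
```

Here 1/8 of students vape and 7/8 do not, so the odds are (1/8) / (7/8) = 1/7.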
Larger standard deviation,
larger range
z score
number of standard deviations from the mean
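As a quick illustration, the z score is (value − mean) / standard deviation. The numbers below (a value of 130, mean 100, standard deviation 15) are made up.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


# Made-up example: 130 is 30 units above a mean of 100, with sd 15.
z = z_score(130, 100, 15)
```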
You cannot use linear regression when
the relationship between the variables is not linear
Which distribution is used to construct a confidence interval?
Bootstrap distribution
Construct the 92% CI for the correlation. What percentiles will we use as cutoffs?
4th and 96th
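The cutoffs come from splitting the leftover percentage evenly between the two tails. A minimal Python sketch, with a hypothetical function name:

```python
def percentile_cutoffs(confidence_level):
    """Return (lower, upper) percentiles for a percentile-based CI."""
    tail = (100 - confidence_level) / 2  # leftover % in each tail
    return tail, 100 - tail


cutoffs_92 = percentile_cutoffs(92)
cutoffs_95 = percentile_cutoffs(95)
```

For 92%, the leftover 8% splits into 4% per tail, which gives the 4th and 96th percentiles.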
How does increasing the sample size change the p value and t value (all else being equal)?
p value decreases
t value gets larger (farther from 0)
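A rough sketch of why the t value grows with sample size, using the usual one-sample formula t = (sample mean − null value) / (s / sqrt(n)). The sample mean, null value, and standard deviation are made up, and the p value itself is not computed here.

```python
import math


def one_sample_t(sample_mean, null_mean, sd, n):
    """t statistic: (sample mean - null value) / (sd / sqrt(n))."""
    se = sd / math.sqrt(n)
    return (sample_mean - null_mean) / se


# Made-up example: same mean and sd, two sample sizes.
t_n25 = one_sample_t(52, 50, 10, 25)
t_n100 = one_sample_t(52, 50, 10, 100)
```

The larger sample shrinks the SE, which pushes the t value farther from 0, and a t value farther from 0 corresponds to a smaller p value.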