L18, L20, L22, L23 - biostats lectures Flashcards
what is a sampling distribution
the distribution of sample means: take all the data points, pair each one with itself and with every other point (sampling with replacement), and take the mean of each pair => n^2 samples of size 2 whose means follow a roughly normal curve
(mean stays the same, SD decreases)
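As a rough illustration (not from the lecture), a Python sketch with made-up data points: enumerate every pair drawn with replacement (n^2 samples of size 2) and compare the distribution of pair means with the original data.

    import itertools
    import statistics

    data = [2, 4, 6, 8]  # hypothetical data points
    # every ordered pair drawn with replacement -> n^2 samples of size 2
    pair_means = [(a + b) / 2 for a, b in itertools.product(data, repeat=2)]

    print(statistics.mean(data), statistics.pstdev(data))              # mean and SD of the raw data
    print(statistics.mean(pair_means), statistics.pstdev(pair_means))  # same mean, smaller SD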
what is the formula for standard error of the mean
SEM = σ / sqrt(n)
- measures variability in the mean
- decreases as sample size (n) increases
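A minimal sketch (hypothetical σ and sample sizes) showing the SEM shrinking as n grows:

    import math

    sigma = 10.0                     # hypothetical population SD
    for n in (4, 25, 100):
        sem = sigma / math.sqrt(n)   # SEM = σ / sqrt(n)
        print(n, sem)                # 5.0, 2.0, 1.0 -> SEM decreases as n increases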
Central Limit Theorem in terms of a non-normal population distribution
even when the population distribution is not normal, the sampling distribution of the mean becomes approximately normal as the sample size increases
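A quick simulation sketch (assumed exponential population, arbitrary seed) showing sample means from a skewed population piling up into a roughly normal shape:

    import numpy as np

    rng = np.random.default_rng(0)
    # population is deliberately skewed (exponential), not normal
    means = rng.exponential(scale=1.0, size=(10_000, 50)).mean(axis=1)

    print(means.mean())  # close to 1.0, the population mean
    print(means.std())   # close to 1 / sqrt(50), the SEM
    # a histogram of `means` looks approximately normal despite the skewed population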
calculation of degrees of freedom
df = n - 1
define t-distribution
normal distribution with fatter tails (more extreme values included)
-as df increases (n - 1 inc), tails get smaller and t-values approach Z-values (n = infinity)
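A small sketch using scipy (assumed available) comparing two-sided 5% cutoffs; the df values are arbitrary:

    from scipy import stats

    print(stats.norm.ppf(0.975))           # Z cutoff ≈ 1.96
    for df in (2, 10, 30, 1000):
        print(df, stats.t.ppf(0.975, df))  # 4.30, 2.23, 2.04, 1.96 -> approaches Z as df grows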
define random sample
randomness means everyone has an equal probability of being included
what does the Central Limit Theorem let us infer from a sample
allows us to infer population parameters (mean, SD) from a single sample of sufficient size
describe when it is better to use standard error OR standard deviation
SEM- how precisely the sample mean estimates the true mean
SD- how widely scattered the measurements are in the population
define the purpose of Confidence Interval
makes inferences about the true mean based on the mean and SD of the sample
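A hedged example of one common recipe, mean ± t × SEM, with made-up numbers:

    import numpy as np
    from scipy import stats

    x = np.array([5.1, 4.8, 5.6, 5.0, 4.7, 5.3])      # hypothetical sample
    mean, sem = x.mean(), stats.sem(x)                 # sample mean and SEM
    t_crit = stats.t.ppf(0.975, df=len(x) - 1)         # two-sided 95% t cutoff
    print(mean - t_crit * sem, mean + t_crit * sem)    # 95% CI for the true mean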
define null and alternative hypothesis
null- nothing unusual is happening, no relationship between exposure/disease
alternative- something unusual is happening, exposure and disease are related
what are the two ways to test a hypothesis
One-sided: alternative hypothesis specifies a direction (only better or only worse)
Two-sided: alternative hypothesis can go in either direction (either better or worse)
CI = (1)
Power = (2)
1- CI (confidence level) = 1 - α
2- Power = 1 - β
what is the Z value formula
Z = (X - µ) / (SD / sqrt(n)), where X is the sample mean and µ the population mean
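A worked sketch of the Z formula with made-up values:

    import math
    from scipy import stats

    x_bar, mu, sd, n = 105.0, 100.0, 15.0, 36      # hypothetical sample mean, µ, SD, n
    z = (x_bar - mu) / (sd / math.sqrt(n))         # Z = (X - µ) / (SD / sqrt(n)) = 2.0
    p = 2 * (1 - stats.norm.cdf(abs(z)))           # two-sided p-value ≈ 0.046
    print(z, p)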
define type I and type II errors in terms of null hypothesis
Type I (α error/FP)- null hypothesis is true, study rejects Ho
Type II (β error/FN)- Ho is false, study fails to reject Ho
what situations are bad to have Type I errors
(false positives)
- Tx is expensive, difficult
- cost of false alarm is high
- no effective Tx
what situations are bad to have Type II errors
(false negative)
- Tx is cheap, easy
- cost of false alarm is low
- disease responds to Tx only in early stages
Type I/α error can only occur if Ho is (T/F)
Type II/β error can only occur if Ho is (T/F)
1- (false positives), Ho is true
2- (false negatives), Ho is false
if one inc, the other dec
inc β will have what effect on α, σ, n, Δ
α- dec (type I error)
σ- inc (standard deviation)
n- dec (sample size)
Δ- dec (effect size)
list the hypothesis test needed when the ind. and dep. variables are categorical
chi-squared test (contingency tables)
list the hypothesis test needed when the ind. variable is categorical and the dep. variable is continuous
t-tests (2 groups)
ANOVA (3 or more groups)
list the hypothesis test needed when the ind. and the dep. variables are continuous
correlation regression
what are the assumptions of a chi-squared test
- categories are exclusive
- each category has an expected value of at least 5
what is the calculation for Chi-squared test
chi-squared = sum over all cells of (observed - expected)^2 / expected
what is the formula for the degrees of freedom in a chi-squared test
df = (# columns - 1) * (# rows - 1)
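A sketch of the chi-squared recipe on a made-up 2x2 contingency table (scipy assumed available; correction=False matches the plain formula above):

    import numpy as np
    from scipy import stats

    # hypothetical 2x2 table: exposure (rows) x disease (columns)
    table = np.array([[30, 20],
                      [10, 40]])
    chi2, p, df, expected = stats.chi2_contingency(table, correction=False)
    print(chi2, p, df)   # df = (2 - 1) * (2 - 1) = 1
    print(expected)      # each expected count should be at least 5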
what is the assumption made in a t-test
- assumes the sampled populations are normally distributed
- assumes the variances of both samples are the same
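A minimal two-group t-test sketch with made-up measurements:

    from scipy import stats

    group_a = [12.1, 11.8, 13.0, 12.4, 12.9]   # hypothetical measurements, group A
    group_b = [10.9, 11.2, 11.5, 10.7, 11.1]   # hypothetical measurements, group B
    print(stats.ttest_ind(group_a, group_b))                    # assumes equal variances
    print(stats.ttest_ind(group_a, group_b, equal_var=False))   # Welch's version drops that assumption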
define both values used from a Pearson correlation
r = strength of association / correlation between variables
r^2 = the proportion of variance in one variable explained by the other
list the r values and their associated correlation strengths
(note: use the absolute value of r)
|r| < 0.4 --> weak corr.
0.4 < |r| < 0.6 --> mod. corr.
0.6 < |r| < 0.8 --> strong corr.
|r| > 0.8 --> very strong corr.
list the assumptions of Pearson Correlation
- normality of variables (continuous normal distribution)
- linearity of associations (the strength of association doesn’t change at higher vs. lower values)
- oval scatterplot (not triangle)
what are some conditions that violate assumptions of Pearson Correlation
- extreme values
- multiple modes
- nonlinear / nonmonotone associations
- triangular scatterplot
describe Spearman’s correlation, what is the disadvantage
- correlates the ranks of the values, ignoring how far apart they are; used when Pearson cannot be used (ex. extreme values, nonlinear but monotonic data)
- disadvantage: less statistically powerful
Spearman is the (1) correlation
Pearson is the (2) correlation
1- Safe
2- Powerful
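A hedged sketch (made-up data with one extreme value) showing why Spearman is the "safe" choice and Pearson the "powerful" one:

    import numpy as np
    from scipy import stats

    x = np.array([1, 2, 3, 4, 5, 6, 100.0])   # one extreme value
    y = np.array([2, 3, 5, 6, 8, 9, 4.0])
    print(stats.pearsonr(x, y))    # dragged around by the outlier
    print(stats.spearmanr(x, y))   # rank-based, so the extreme value only counts as "largest"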
what is a correlation regression
a line that quantifies the correlation between the 2 variables
y = mx + b
describe the linear association regression formula
y = α + βx
what are residuals in linear regressions
- used for individual data points
- difference in actual value and predicted value
- y = α + βx + e (the residual)
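A small regression sketch (made-up x/y, using numpy's polyfit) computing the fitted line and its residuals:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # hypothetical predictor
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])    # hypothetical outcome
    beta, alpha = np.polyfit(x, y, deg=1)      # fits y = α + βx (slope is returned first)
    residuals = y - (alpha + beta * x)         # e = actual - predicted
    print(alpha, beta)
    print(residuals)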
how do assumptions and power of a test relate
- more assumptions made --> the stronger the power (continuous data, parametric tests)
- fewer assumptions made --> the weaker the power (categorical/ranked data, nonparametric tests)
list the 4 data types in order of increasing power
(in order of inc power and inc assumptions)
nominal < ordinal < interval < ratio
what is the Odds formula in terms of probability
Odds = probability / (1 - probability)
what is the Probability formula in terms of odds
Probability = odds / (1 + odds)
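The two conversions as a tiny sketch (function names are mine):

    def odds_from_probability(p):
        return p / (1 - p)

    def probability_from_odds(odds):
        return odds / (1 + odds)

    print(odds_from_probability(0.2))    # 0.25, i.e. 1:4 odds
    print(probability_from_odds(0.25))   # back to 0.2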
describe Bayes Theorem
1) pretest probability --> pretest odds
2) PostTest Odds = PreTest Odds x LR
3) posttest odds –> probability
(only used if there is pretest knowledge that can be used)
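A sketch of the three steps (hypothetical pretest probability and likelihood ratio; the function name is mine):

    def posttest_probability(pretest_prob, likelihood_ratio):
        pretest_odds = pretest_prob / (1 - pretest_prob)   # 1) probability -> odds
        posttest_odds = pretest_odds * likelihood_ratio    # 2) posttest odds = pretest odds x LR
        return posttest_odds / (1 + posttest_odds)         # 3) odds -> probability

    print(posttest_probability(0.10, 8.0))   # ≈ 0.47 with a hypothetical LR+ of 8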
what are the 4 steps for identifying statistical errors in literature
1) were the limits of the design and statistical approach acknowledged
2) were appropriate data and statistical tools applied for the hypothesis tested
3) were the assumptions of the approach violated
4) were the results interpreted correctly
what are the assumptions of the residuals in a linear regression
- normally distributed
- uncorrelated with the predictors (and the predicted values)
- uncorrelated with each other