Epi/Biostats Flashcards
Beta
Regression Coefficient, expected (average) change in Y when X (explanatory variable) changes by one unit and the other explanatory variables stay the same
Wald Statistic
Test whether regression coefficient of a variable is zero
Wald statistic = Beta squared / var(Beta)
p-value = P(chi-squared > Wald statistic)
If p is small, the variable associated with the regression coefficient is important (statistically significant)
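A minimal sketch of the Wald test, assuming a single coefficient (1 degree of freedom). The coefficient and standard error are hypothetical values chosen for illustration, and var(Beta) is taken as the squared standard error.

```python
import math

def wald_test(beta, se):
    """Return (Wald statistic, p-value) for H0: beta = 0 with 1 degree of freedom."""
    wald = beta ** 2 / se ** 2  # Beta squared over var(Beta)
    # For a chi-squared variable with 1 df, P(X > w) = erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(wald / 2))
    return wald, p_value

# Hypothetical estimate and standard error, for illustration only.
wald, p = wald_test(beta=0.8, se=0.3)
print(f"Wald = {wald:.2f}, p = {p:.4f}")
```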
Likelihood Ratio Test
Test to compare two models: one with q (“null”), the other with p variables with p>q (nested models)
If the test's p-value < 0.05, the group of p - q additional variables in the extended model is important (statistically significant)
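A rough sketch of the likelihood ratio test, assuming the usual statistic 2 x (log-likelihood of extended model - log-likelihood of null model) compared to a chi-squared distribution with p - q degrees of freedom. The log-likelihoods are made-up numbers, and the helper `chi2_sf_even_df` is a hypothetical name that only handles even degrees of freedom (where a simple closed form exists).

```python
import math

def chi2_sf_even_df(x, df):
    """P(chi-squared with `df` degrees of freedom > x); closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

def likelihood_ratio_test(loglik_null, loglik_full, q, p):
    """Compare nested models with q (null) and p (extended) variables, p > q."""
    stat = 2 * (loglik_full - loglik_null)
    df = p - q
    return stat, df, chi2_sf_even_df(stat, df)

# Hypothetical log-likelihoods, for illustration only.
stat, df, p_value = likelihood_ratio_test(-120.5, -115.2, q=2, p=4)
print(f"LR statistic = {stat:.2f}, df = {df}, p = {p_value:.4f}")
```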
Type I error
Probability of getting a significant result (rejecting the null) when the null is actually true - false positive
Type II error
Probability of failing to reject the null when it should be rejected - false negative
Sensitivity
Probability of a positive test given a case (A / (A+C)), where A = true positives and C = false negatives
Specificity
Probability of a negative test given a non-case (D / (B+D)), where D = true negatives and B = false positives
PPV
Positive Predictive Value: probability that a person is actually a case given a positive test result (A / (A+B))
NPV
Negative Predictive Value: probability that a person is truly a non-case given a negative test result (D / (C+D))
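A small sketch of the four test-accuracy measures, assuming the common 2x2 layout where A = true positives, B = false positives, C = false negatives, and D = true negatives. The counts are invented for illustration.

```python
def diagnostic_measures(a, b, c, d):
    """a = true positives, b = false positives, c = false negatives, d = true negatives."""
    return {
        "sensitivity": a / (a + c),  # positive test given case
        "specificity": d / (b + d),  # negative test given non-case
        "ppv": a / (a + b),          # case given positive test
        "npv": d / (c + d),          # non-case given negative test
    }

# Hypothetical counts: 100 cases and 900 non-cases.
m = diagnostic_measures(a=90, b=30, c=10, d=870)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```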
Residual variance in regression equation
Error term
Types of bias
- Confounding
- Selection Bias
- Information Bias
Propensity Score
Probability of a unit being assigned to a particular treatment or exposure given an observed set of covariates.
Used to reduce selection bias by equating groups on these covariates
When to use log-binomial
When risk or prevalence is >10%. In that case the risk odds ratio and prevalence odds ratio overestimate (exaggerate) the risk ratio and prevalence ratio, so use a log-binomial model to estimate the risk ratio or prevalence ratio directly
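A quick sketch of why the odds ratio is a poor stand-in for the risk ratio when the outcome is common. It assumes a 2x2 table with a, b = exposed with and without the outcome and c, d = unexposed with and without the outcome; the counts are invented.

```python
def risk_ratio(a, b, c, d):
    """Risk in the exposed divided by risk in the unexposed."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Odds in the exposed divided by odds in the unexposed."""
    return (a * d) / (b * c)

# Hypothetical tables: both have a true risk ratio of 2.
rare = (2, 98, 1, 99)      # risks 2% vs 1%
common = (40, 60, 20, 80)  # risks 40% vs 20%

for label, table in (("rare", rare), ("common", common)):
    print(f"{label}: RR = {risk_ratio(*table):.2f}, OR = {odds_ratio(*table):.2f}")
```

With the rare outcome the two measures nearly agree; with the common outcome the odds ratio is noticeably further from 1 than the risk ratio.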
Risk vs odds
Risk = probability of occurrence of an event or outcome
Odds = probability of occurrence of an event or outcome / probability of non-occurrence of the event or outcome
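A short sketch of converting between risk and odds, using the definitions above. The function names are illustrative only.

```python
def risk_to_odds(risk):
    """Odds = P(event) / P(no event)."""
    return risk / (1 - risk)

def odds_to_risk(odds):
    """Inverse conversion: risk = odds / (1 + odds)."""
    return odds / (1 + odds)

odds = risk_to_odds(0.20)  # a 20% risk corresponds to odds of 1 to 4
risk = odds_to_risk(odds)
print(f"odds = {odds:.2f}, risk = {risk:.2f}")
```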
P-value
Probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true. Comparing it to a preset significance level limits type I errors (false positives), which lead us to conclude there is an association that isn’t really there.
ICC
Intraclass correlation coefficient - the degree to which the variance between clusters explains the variance of the whole. ICC = between-cluster variance / total variance (between-cluster + within-cluster)
Vaccine effectiveness formula
(1 - adjusted OR) x 100%
Risk ratio formula
(a / (a+b)) / (c / (c+d))
Residuals
The difference between the observed outcome and the value predicted by the model (the group mean, when the model only contains group indicators)
Kappa statistic
Measures inter-rater agreement beyond what would be expected by chance
Kappa = (Po - Pc) / (1 - Pc), where Po = observed agreement and Pc = agreement expected by chance
> 0.8 is an almost perfect level of agreement beyond chance
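A minimal sketch of the kappa calculation for two raters, assuming a square table where `table[i][j]` counts subjects placed in category i by rater 1 and category j by rater 2. The counts are invented.

```python
def cohen_kappa(table):
    """Kappa = (Po - Pc) / (1 - Pc) for a square rater-by-rater count table."""
    k = len(table)
    n = sum(sum(row) for row in table)
    p_o = sum(table[i][i] for i in range(k)) / n  # observed agreement
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal totals.
    p_c = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return (p_o - p_c) / (1 - p_c)

# Hypothetical ratings of 100 subjects (positive/negative) by two raters.
ratings = [[40, 10],
           [5, 45]]
kappa = cohen_kappa(ratings)
print(f"kappa = {kappa:.2f}")
```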
Interaction coefficient
Measures how much an association between Y and one predictor (X1) differs across levels of another predictor (X2)
Marginal
Not conditional on other covariates: describes the population-averaged effect without including other covariates in the model
Structural model
Model for counterfactual outcome
Wilcoxon rank sum vs t-test
WRS is rank-based and compares the distributions of two groups (often interpreted as comparing medians). T-test compares means. WRS is more appropriate for data with outliers or skew.
Conditional / random effects model
Conditional on other covariates (or cluster-specific random effects) included in the model
Frequentist
Parameters are fixed, unknown true values
Bayesian
Parameters are random variables with a probability distribution
Covariance
Measure of the joint variability of two random variables. If both variables tend to be high at the same time, covariance is positive. If one tends to be high when the other is low, covariance is negative
The sign of covariance thus shows the tendency of the linear relationship
Correlation Coefficient
Normalized version of covariance. Shows the magnitude and thus the strength of the linear relationship
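A small sketch contrasting the two measures, assuming sample (n - 1) versions of both. The data are invented; rescaling y changes the covariance but not the correlation.

```python
import math

def covariance(x, y):
    """Sample covariance (divides by n - 1)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def correlation(x, y):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

# Hypothetical paired measurements.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_rescaled = [10 * v for v in y]  # same data in different units

print(covariance(x, y), covariance(x, y_rescaled))    # covariance depends on units
print(correlation(x, y), correlation(x, y_rescaled))  # correlation does not
```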
Survey Weight
Value assigned to each case to indicate how much each case will count in a statistical procedure
Problems with survey weights
Almost always increase standard errors
Design Effect
Variance estimate from the survey design / Variance estimate with SRS (simple random sampling)
AIC
Chooses the best model from a set using: AIC = -2(log-likelihood) + 2K, where K is the number of estimated parameters. Lower AIC is better
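A brief sketch of model selection by AIC, assuming K is the number of estimated parameters and that the model with the lowest AIC is preferred. The model names and log-likelihoods are hypothetical.

```python
def aic(log_likelihood, k):
    """AIC = -2(log-likelihood) + 2K."""
    return -2 * log_likelihood + 2 * k

# Hypothetical fitted models: name -> (log-likelihood, number of parameters).
models = {
    "age only": (-210.4, 2),
    "age + sex": (-205.1, 3),
    "age + sex + bmi": (-204.9, 4),
}
scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
best = min(scores, key=scores.get)  # lowest AIC wins
print(scores, best)
```

Here the third model has the highest log-likelihood, but the 2K penalty outweighs the small gain in fit.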
Variance
Average of the squared differences of observations from the sample mean
Normalize distribution
Subtract the mean from each value and divide by the sd: Z = (X - μ) / σ, where μ is the mean and σ is the sd
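A minimal sketch of standardizing values to z-scores, assuming the population sd (dividing by n) is wanted. The values are invented.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]     # hypothetical observations
mu = statistics.mean(values)          # mean
sigma = statistics.pstdev(values)     # population sd (divides by n)

z = [(x - mu) / sigma for x in values]  # Z = (X - mean) / sd
print(z)
```

After standardizing, the z-scores have mean 0 and sd 1.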
Central Limit Theorem
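If you have a population with mean mu and standard deviation sigma and repeatedly take sufficiently large random samples of size n, the distribution of the sample means will be approximately normal, with mean mu and standard deviation sigma / sqrt(n), whatever the shape of the population distribution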
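A simulation sketch of the central limit theorem, assuming a skewed population (exponential with mean 1 and sd 1). The sample size, number of repetitions, and seed are arbitrary choices.

```python
import math
import random
import statistics

random.seed(0)        # fixed seed so the run is reproducible
n, reps = 50, 5000    # sample size and number of repeated samples

# Draw many samples from a skewed population and keep each sample's mean.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
print(f"mean of sample means = {mean_of_means:.3f} (population mean is 1)")
print(f"sd of sample means = {sd_of_means:.3f} (sigma / sqrt(n) = {1 / math.sqrt(n):.3f})")
```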
CI
Confidence interval: a range of values that, under repeated sampling, would contain the true population parameter a specified proportion of the time (e.g., 95%)
F-statistic
Ratio of two sample variances, used to test whether the population variances are equal. P<0.05 suggests they are not equal
Horvitz-Thompson estimator
Inverse probability weighting applied to samples to account for differences between the sample and target population
Log-linear model
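Betas are the derivative of the log of the expected value of y with respect to x: B = d log(E(y|x)) / dx
The model is linear in the parameters, but the relationship between x and y isn’t
B*100 measures the approximate percentage change in y when x increases by one unit, keeping other variables constant (the approximation holds for small B)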
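A short sketch comparing the B*100 shortcut with the exact percentage change, (exp(B) - 1) x 100, which follows from the model being linear on the log scale. The coefficient values are arbitrary.

```python
import math

def pct_change_approx(beta):
    """Shortcut: B * 100."""
    return beta * 100

def pct_change_exact(beta):
    """Exact percentage change in E(y) for a one-unit increase in x."""
    return (math.exp(beta) - 1) * 100

for beta in (0.02, 0.10, 0.50):  # hypothetical coefficients
    print(f"B = {beta}: approx {pct_change_approx(beta):.1f}%, exact {pct_change_exact(beta):.1f}%")
```

The shortcut is close for small coefficients and drifts as B grows.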
Maximum Likelihood Estimation
MLE is a method that finds the parameter values (for a normal distribution, the mean, μ, and sd, σ) that make the observed data most likely, i.e., the curve that best fits the data
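A small sketch of maximum likelihood for a normal distribution, assuming the well-known closed-form estimates (sample mean, and sd computed with n in the denominator). The data are invented, and the check simply confirms that nearby parameter values give a lower log-likelihood.

```python
import math

def normal_loglik(data, mu, sigma):
    """Log-likelihood of the data under a normal(mu, sigma) distribution."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)
        for x in data
    )

data = [4.1, 5.3, 4.8, 6.0, 5.2, 4.6]  # hypothetical observations

# Closed-form maximum likelihood estimates for a normal distribution.
mu_hat = sum(data) / len(data)
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / len(data))

best = normal_loglik(data, mu_hat, sigma_hat)
print(f"mu_hat = {mu_hat:.3f}, sigma_hat = {sigma_hat:.3f}, log-likelihood = {best:.3f}")
```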
Bayesian Inference
The process of deducing properties about a population or probability distribution from data using Bayes theorem: P(A|B) = P(B|A)*P(A)/ P(B)
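A minimal sketch of Bayes theorem applied to a screening test, where A = having the disease and B = testing positive. The prevalence, sensitivity, and false positive rate are hypothetical inputs.

```python
def posterior(prior, p_b_given_a, p_b_given_not_a):
    """P(A|B) = P(B|A) * P(A) / P(B), with P(B) from the law of total probability."""
    p_b = p_b_given_a * prior + p_b_given_not_a * (1 - prior)
    return p_b_given_a * prior / p_b

# Hypothetical inputs: prevalence 1%, sensitivity 90%, false positive rate 5%.
p_disease_given_positive = posterior(prior=0.01, p_b_given_a=0.90, p_b_given_not_a=0.05)
print(f"P(disease | positive test) = {p_disease_given_positive:.3f}")
```

Even with a fairly accurate test, a low prior (prevalence) keeps the posterior probability of disease modest.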
Poisson Offset
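Log of the population or person-time at risk, included in the model with its coefficient fixed at 1 so that the model estimates rates rather than counts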
ANCOVA
Analysis of Covariance - tests whether means of a dependent variable are equal across levels of a categorical independent variable while adjusting for a continuous covariate. Also used to test for interaction or effect measure modification:
Generate an interaction term for the model
Test if the interaction term = 0
If the interaction term is not significant, reduce to MLR/simpler model
Correlation vs Covariance
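Two related measures of how two variables vary together. Covariance shows the direction of the linear relationship in the variables' original units; correlation is covariance divided by the product of the two standard deviations, so it is unitless, ranges from -1 to 1, and also shows the strength of the relationship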
Attributes of Surveillance System
Simplicity
Flexibility
Acceptability
Sensitivity
Positive Predictive Value
Representativeness
Timeliness
Sources of measurement error in surveys
- The tool
- The method of data collection
- The interviewer
- The respondent