Bias and Confounding Flashcards
Overarching categories of bias
Information bias
Selection bias
Selection bias is especially common in ___.
Selection bias is especially common in case-control studies.
Social desirability bias
People tend to systematically overreport things that make them look good, and underreport or underestimate things that make them look bad.
Rule of thumb for confounding
If the effect estimate changes by at least 10% when accounting for the potential confounding variable, it can be assumed that the variable is indeed confounding.
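The 10% change-in-estimate rule can be sketched as a simple comparison of crude and adjusted estimates; the numbers below are hypothetical odds ratios, not from any real study.

```python
# Hedged sketch of the 10% change-in-estimate rule for confounding.
# "crude" = effect estimate ignoring the variable; "adjusted" = estimate
# after accounting for it. Both values here are hypothetical.
def changes_by_ten_percent(crude, adjusted):
    """Return True if adjustment moves the estimate by at least 10%."""
    return abs(adjusted - crude) / crude >= 0.10

# Crude OR = 2.0, adjusted OR = 1.6 -> 20% change: suggests confounding.
print(changes_by_ten_percent(2.0, 1.6))   # True
# Crude OR = 2.0, adjusted OR = 1.95 -> 2.5% change: little evidence of it.
print(changes_by_ten_percent(2.0, 1.95))  # False
```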
Ways to account for confounding in study design
- Restriction (restrict to only one stratum, eliminating the confounding variable entirely)
- Matching (design a paired study and analyze with paired tests, e.g., paired t tests)
- Randomization of exposure
Effect modification
There is a different level of relationship between the exposure and outcome due to the presence of the effect modifier.
Can something be both an effect modifier and confounder?
Yes!
In this case, the stratum specific OR or RR are different from one another, AND different from the OR and RR overall, in the same direction.
Simple vs complex regression
Simple = 1 independent variable
Complex (multivariable) = 2 or more independent variables
Logistic regression
Used for binary dependent variables. Essentially, the model converts a linear combination of the independent variable(s) into the probability that binary outcome x occurs, given those independent variable(s).
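The conversion from a linear score to a probability can be sketched with the inverse-logit function; the intercept, slope, and exposure value below are hypothetical.

```python
import math

# Minimal sketch: a fitted logistic model turns a linear predictor
# (on the log-odds scale) into a probability of the binary outcome.
# The coefficients here are hypothetical, not fitted to real data.
def predicted_probability(intercept, slope, x):
    log_odds = intercept + slope * x        # linear predictor (log-odds)
    return 1 / (1 + math.exp(-log_odds))    # inverse logit -> probability

# e.g., intercept = -2.0, slope = 0.8, exposure level x = 3
p = predicted_probability(-2.0, 0.8, 3)
print(round(p, 3))  # 0.599
```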
The correlation coefficient
r
Ranges from -1 to 1. Absolute value determines strength of the relationship, sign determines direction.
Interpretation of thresholds of r magnitude
|r| > 0.6 implies a strong correlation
|r| > 0.8 implies a very strong correlation
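Pearson's r can be computed by hand on a small sample; the two lists below are hypothetical data chosen to show a strong positive correlation.

```python
# Sketch: computing Pearson's r for two small hypothetical samples.
# cov = sum of cross-products of deviations; sx, sy = root sums of squares.
def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]        # roughly increasing with x
r = pearson_r(x, y)
print(round(r, 2))         # 0.85: positive sign = direction, |r| = strength
```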
Format of an equation derived from linear regression
y = β0 + β1x + e
β0 = intercept
β1 = slope
e = error term / residuals
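Fitting this equation by ordinary least squares can be sketched in a few lines; the x and y values below are hypothetical.

```python
# Sketch: ordinary least squares for y = b0 + b1*x + e on hypothetical data.
# b1 = sum of cross-products / sum of squared x-deviations; b0 = my - b1*mx.
def simple_ols(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    b0 = my - b1 * mx
    return b0, b1

x = [1, 2, 3, 4]
y = [2.1, 3.9, 6.2, 7.8]
b0, b1 = simple_ols(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]  # e = observed - fitted
print(round(b0, 2), round(b1, 2))  # 0.15 1.94
```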
“Goodness of fit” measure
r²
When testing whether or not a relationship determined by linear regression is statistically significant, the null hypothesis is. . .
. . . that the predicted value of y should be the average value of y for all sample datapoints regardless of the value of x.
A simple linear regression model for a binary independent variable is effectively the same as . . .
. . . a two sample t test.
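The equivalence can be checked numerically: with a 0/1 predictor, the OLS slope equals the difference in group means, which is exactly the contrast a two-sample t test examines. The group data below are hypothetical.

```python
# Sketch: with a binary x (0/1), the OLS slope equals the difference in
# group means. Data are hypothetical.
group0 = [4.0, 5.0, 6.0]      # x = 0
group1 = [7.0, 8.0, 9.0]      # x = 1
xs = [0] * len(group0) + [1] * len(group1)
ys = group0 + group1

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
      / sum((x - mx) ** 2 for x in xs))

mean_diff = sum(group1) / len(group1) - sum(group0) / len(group0)
print(b1, mean_diff)  # both 3.0: slope == difference of means
```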
Nondifferential bias
The frequency of errors is approximately the same in the groups being compared.
In general, nondifferential misclassification tends to result in estimates of effect that are closer to “null” than the true effect.
hazard ratio
Expression of relative risk which quantifies the probability of an event (e.g. dying) during a particular time interval, given that a subject has survived until that time
Multivariate linear regression
y = intercept + b1x1 + b2x2 + residual error
Multivariate logistic regression
ln(p/(1−p)) = intercept + b1x1 + b2x2 + residual error
where p/(1−p) is the odds of condition y (the odds, not the odds ratio).
Multivariable models are useful for identifying . . .
Multivariable models are useful for identifying both confounding and effect modification
A correlation coefficient and an effect estimate from a simple linear model (i.e., beta) can both give information about . . .
A correlation coefficient and an effect estimate from a simple linear model (i.e., beta) can both give information about the strength and direction of a relationship between two continuous variables.
Preterm birth
When a baby is born before 37 completed weeks of gestation
1/10 infants in the US. Racial and ethnic differences are substantial.
A confounder should meet these three criteria
- It is associated with the exposure under study
- It is a cause or correlate of the outcome under study, independent of the exposure
- It is not a natural intermediate step between an exposure and outcome, nor is it naturally upstream of the exposure or downstream of the outcome.
If stratum-specific RRs/ORs are equal to each other AND equal to the crude RR/OR, . . .
If stratum-specific RRs/ORs are equal to each other AND equal to the crude RR/OR, then the suspect variable is neither a confounder nor an effect modifier
If stratum-specific RRs/ORs are equal to each other but are different from crude RR/OR, . . .
If stratum-specific RRs/ORs are equal to each other but are different from crude RR/OR, then the third variable is a confounder.
If stratum-specific RRs/ORs are different from each other, . . .
If stratum-specific RRs/ORs are different from each other, then effect modification is present
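The three rules above can be checked with stratum-specific odds ratios from 2×2 tables; the counts and strata labels below are hypothetical, chosen so the two strata sum to the crude table.

```python
# Sketch: crude vs stratum-specific odds ratios from hypothetical 2x2 tables.
# a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls.
def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)

crude = odds_ratio(60, 40, 40, 60)      # all subjects combined
stratum1 = odds_ratio(30, 10, 20, 20)   # e.g., men   (rows sum into the crude table)
stratum2 = odds_ratio(30, 30, 20, 40)   # e.g., women

# Strata differ from each other (3.0 vs 2.0) -> effect modification present.
print(round(crude, 2), round(stratum1, 2), round(stratum2, 2))  # 2.25 3.0 2.0
```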
Methods of correcting for confounding
- In the design stage:
- Stratification
- Restriction
- Randomization
- In the analysis stage:
- Stratification
- Statistical adjustment
The intent of a model can be primarily ___ or ___
The intent of a model can be primarily explanatory or predictive
As in all statistical tests, we are making ___.
As in all statistical tests, we are making inferences from a sample
Generally, when designing a multivariable model, we want to choose variables that. . .
- we know from other research to be important
- add to the ability of the model to explain or predict the outcome
- whose inclusion changes the parameter estimates of the main predictor(s) of interest substantially (a common rule of thumb is more than 10%), since this suggests that the additional variable is a confounder of the exposure-outcome relationship.
Effect modification won’t be apparent from a regression model unless . . .
Effect modification won’t be apparent from a regression model unless you look for it.
The simplest way is to stratify data and re-analyze.
With logistic regression and proportional hazards regression, the coefficients have a special meaning:
The antilogarithm of the coefficient equals the odds ratio (for logistic regression) and the relative hazard (for proportional hazards regression).
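Taking the antilogarithm is a one-liner; the coefficient value below is hypothetical.

```python
import math

# Sketch: exponentiating a logistic (or proportional hazards) coefficient.
# The coefficient value is hypothetical.
coef = 0.693                       # change in log-odds per unit of the predictor
print(round(math.exp(coef), 2))    # ~2.0: the odds ratio (or relative hazard)
```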
The underlying assumption of multiple linear regression is that . . .
as the independent variables increase (or decrease), the mean value of the outcome increases (or decreases) in a linear fashion.
The underlying assumption of multivariable logistic regression is that. . .
each one-unit increase in a predictor multiplies the odds of the outcome by a certain factor (the odds ratio of the predictor) and that the effect of several variables is the multiplicative product of their individual effects.
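The multiplicative assumption can be verified numerically: increasing two predictors by one unit each multiplies the odds by the product of their individual odds ratios. The coefficients below are hypothetical.

```python
import math

# Sketch of the multiplicative assumption: with hypothetical coefficients
# b1 and b2, raising x1 and x2 by one unit each multiplies the odds by
# exp(b1) * exp(b2) = exp(b1 + b2).
b0, b1, b2 = -1.0, 0.4, 0.7

def odds(x1, x2):
    return math.exp(b0 + b1 * x1 + b2 * x2)

combined = odds(1, 1) / odds(0, 0)
print(round(combined, 3), round(math.exp(b1) * math.exp(b2), 3))  # equal
```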
The underlying assumption of proportional hazards models is that. . .
the ratio of the hazard functions for persons with and without a given risk factor is the same over the entire study period
This assumption has a special name: the proportionality assumption.
If the hazard of death were higher with surgery at the beginning of the study (as is often the case with surgical interventions because of perioperative mortality) but lower with surgery later in the study (because persons who survived after surgery had a longer life expectancy as a result of the beneficial effects of carotid endarterectomy), this would . . .
. . . violate the proportionality assumption.
When the data do not support the proportionality assumption, proportional hazards analysis can still be performed by using . . .
When the data do not support the proportionality assumption, proportional hazards analysis can still be performed by using time-varying covariates.
Time-varying covariates
Independent variables whose values change over time. With time-varying covariates, the proportional hazards model can correctly account for hazard ratios that vary over the course of the study
A major study design advantage of proportional hazards analysis is that. . .
A major study design advantage of proportional hazards analysis is that it includes persons with varying lengths of follow-up.
Censored
A person who does not experience the outcome of interest by the end of the study is considered censored
Residual analysis
Method for determining goodness-of-fit. Residuals are the differences between the observed and the estimated values
Unfortunately, journals rarely print residual plots; readers must assume that the investigators reviewed them.
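Computing residuals is straightforward once a model is fitted; the data and fitted coefficients below are hypothetical.

```python
# Sketch of residual analysis on hypothetical data: residuals are observed
# minus fitted values. In a well-fitting linear model they scatter around 0
# with no systematic pattern (which is what a residual plot reveals).
x = [1, 2, 3, 4, 5]
y = [2.2, 3.9, 6.1, 8.0, 9.8]
b0, b1 = 0.1, 1.96                 # hypothetical fitted coefficients
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print([round(r, 2) for r in residuals])  # small values scattered around 0
```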
Automatic variable selection algorithms
Computer programs that systematically test the contribution of variables in different ways in order to eliminate irrelevant variables, or variables of questionable relevance, and arrive at a simplified equation.
Hosmer–Lemeshow goodness-of-fit test
Works for logistic regression models. Compares the estimated-to-observed likelihood of outcome for groups of persons. In a well-fitting model, the estimated likelihood will be similar to the observed likelihood.
The reliability of a model depends on . . .
The reliability of a model depends on its purpose
If the model is explanatory, reliability means that a different set of data would probably yield a model with the same variables and similar coefficients.
A reliable predictive model predicts outcomes equally well for settings or for data other than those for which it was developed
As a rule of thumb, to have confidence in the results, there should be at least . . .
As a rule of thumb, to have confidence in the results, there should be at least 20 persons for each independent variable eligible to be included in a linear regression model and at least 10 outcomes for each independent variable eligible to be included in a logistic regression or proportional hazards model
Even if a study has a large enough number of events per independent variable, the estimates of the association between a risk factor and an outcome may still be inaccurate if ___.
Even if a study has a large enough number of events per independent variable, the estimates of the association between a risk factor and an outcome may still be inaccurate if the risk factor is rare
Non-differential misclassification tends to bias relative risk estimates . . .
. . . towards 1
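The attenuation toward 1 can be shown with a worked example; the counts and the 80% sensitivity figure are hypothetical, and the same sensitivity is applied to cases and non-cases (which is what makes the misclassification nondifferential).

```python
# Sketch: nondifferential exposure misclassification pulls the risk ratio
# toward 1. Counts and the 80% sensitivity figure are hypothetical.
def risk_ratio(exp_cases, exp_total, unexp_cases, unexp_total):
    return (exp_cases / exp_total) / (unexp_cases / unexp_total)

true_rr = risk_ratio(40, 100, 20, 100)   # true RR = 2.0

# With 80% sensitivity, 20% of exposed subjects (8 cases, 12 non-cases) are
# misclassified as unexposed in both outcome groups, diluting the contrast.
mis_rr = risk_ratio(32, 80, 28, 120)
print(round(true_rr, 2), round(mis_rr, 2))  # 2.0 vs 1.71, closer to 1
```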
If you match on a characteristic, you can no longer . . .
If you match on a characteristic, you can no longer examine the association of this characteristic with the outcome.