Bias and Confounding Flashcards

1
Q

Overarching categories of bias

A

Information bias

Selection bias

2
Q

Selection bias is especially common in ___.

A

Selection bias is especially common in case-control studies.

3
Q

Social desirability bias

A

People tend to systematically overreport things that make them look good, and underreport or underestimate things that make them look bad.

4
Q

Rule of thumb for confounding

A

If the effect estimate changes by at least 10% when accounting for the potential confounding variable, it can be assumed that the variable is indeed confounding.
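
A minimal sketch of the 10% rule in Python. It assumes the change is measured relative to the crude (unadjusted) estimate; some texts divide by the adjusted estimate instead. The function names and the numbers are hypothetical.

```python
def percent_change(crude: float, adjusted: float) -> float:
    """Relative change (in percent) of the effect estimate after adjustment.

    Assumption: the crude estimate is used as the reference value.
    """
    return abs(crude - adjusted) / abs(crude) * 100.0


def suggests_confounding(crude: float, adjusted: float, threshold: float = 10.0) -> bool:
    """Apply the rule of thumb: a change of at least `threshold` percent."""
    return percent_change(crude, adjusted) >= threshold


# Hypothetical example: crude RR = 2.0, RR adjusted for the third variable = 1.6
print(percent_change(2.0, 1.6))
print(suggests_confounding(2.0, 1.6))
```

With these made-up numbers the estimate moves by about 20%, so the rule of thumb would flag the variable as a confounder.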

5
Q

Ways to account for confounding in study design

A
  1. Restriction (restrict to only one stratum, eliminating the confounding variable entirely)
  2. Matching (match subjects on the confounder and use a matched analysis, e.g. paired t tests)
  3. Randomization of exposure
6
Q

Effect modification

A

The strength (or direction) of the relationship between the exposure and outcome differs depending on the level of a third variable, the effect modifier.

7
Q

Can something be both an effect modifier and confounder?

A

Yes!

In this case, the stratum-specific ORs or RRs are different from one another AND all differ from the overall (crude) OR or RR in the same direction.

8
Q

Simple vs complex regression

A

Simple = 1 independent variable

Complex (more commonly called multiple regression) = 2 or more independent variables

9
Q

Logistic regression

A

Used for binary dependent variables. Essentially, the model converts the raw data into an estimated probability of the binary outcome given the value of an independent variable.

10
Q

The correlation coefficient

A

r

Ranges from -1 to 1. Absolute value determines strength of the relationship, sign determines direction.
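
As an illustration, Pearson's r can be computed from two lists of paired values using only the standard library. This is a sketch for study purposes; the function name and the data are made up.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: y rises with x, then a reversed copy that falls with x
xs = [1.0, 2.0, 3.0, 4.0]
rising = [2.0, 4.0, 6.0, 8.0]
falling = [8.0, 6.0, 4.0, 2.0]
print(pearson_r(xs, rising), pearson_r(xs, falling))
```

The sign of the result gives the direction and its absolute value gives the strength, matching the card.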

11
Q

Interpretation of thresholds of r magnitude

A

|r| > 0.6 implies a strong correlation

|r| > 0.8 implies a very strong correlation

12
Q

Format of an equation derived from linear regression

A

y = β0 + β1 x + e

β0 = intercept

β1 = slope

e = error term / residuals
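
The terms of this equation can be estimated by ordinary least squares. The sketch below uses the textbook closed-form formulas for one predictor; `fit_line` and the data are hypothetical names and values.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = b0 + b1*x + e.

    Returns the intercept b0, the slope b1, and the list of residuals e.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    return b0, b1, residuals


# Hypothetical data
b0, b1, residuals = fit_line([1.0, 2.0, 3.0], [2.0, 4.0, 9.0])
print(b0, b1, residuals)
```

The residuals are the observed values minus the fitted values, and with an intercept in the model they sum to zero (up to rounding).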

13
Q

“Goodness of fit” measure

A

r² (the coefficient of determination)
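
One common way to compute this measure is 1 minus the ratio of the residual sum of squares to the total sum of squares. A minimal sketch, assuming the predicted values come from a least-squares line fitted elsewhere (the numbers are made up):

```python
def r_squared(observed, predicted):
    """Proportion of the variance in the observed values explained by the model."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    ss_tot = sum((y - mean_observed) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed values and fitted values from a regression line
fit_quality = r_squared([2.0, 4.0, 9.0], [1.5, 5.0, 8.5])
print(fit_quality)
```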

14
Q

When testing whether or not a relationship determined by linear regression is statistically significant, the null hypothesis is. . .

A

. . . that the predicted value of y should be the average value of y for all sample datapoints regardless of the value of x.

15
Q

A simple linear regression model for a binary independent variable is effectively the same as . . .

A

. . . a two sample t test.
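
This equivalence can be checked numerically. The sketch below assumes the equal-variance (pooled) form of the two-sample t test and a predictor coded 0/1; the function names and data are hypothetical.

```python
import math


def slope_t_statistic(xs, ys):
    """t statistic for the slope of a simple linear regression of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    standard_error = math.sqrt(ss_res / (n - 2) / sxx)
    return b1 / standard_error


def pooled_t_statistic(group0, group1):
    """Two-sample t statistic with a pooled (equal-variance) estimate."""
    n0, n1 = len(group0), len(group1)
    m0 = sum(group0) / n0
    m1 = sum(group1) / n1
    ss0 = sum((y - m0) ** 2 for y in group0)
    ss1 = sum((y - m1) ** 2 for y in group1)
    pooled_variance = (ss0 + ss1) / (n0 + n1 - 2)
    return (m1 - m0) / math.sqrt(pooled_variance * (1 / n0 + 1 / n1))


# Hypothetical outcome values for the x = 0 and x = 1 groups
group0 = [4.0, 5.0, 6.0]
group1 = [7.0, 9.0, 8.0, 10.0]
xs = [0] * len(group0) + [1] * len(group1)
ys = group0 + group1
print(slope_t_statistic(xs, ys), pooled_t_statistic(group0, group1))
```

The slope equals the difference in group means, and the two t statistics agree.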

16
Q

Nondifferential bias

A

The frequency of errors is approximately the same in the groups being compared.

In general, nondifferential misclassification tends to result in estimates of effect that are closer to “null” than the true effect.
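
A small worked example of the pull toward the null, using expected counts in a hypothetical case-control study. Exposure is measured with the same sensitivity and specificity in cases and controls; all numbers are invented for illustration.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio from the four cells of a 2x2 table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected observed counts after imperfect exposure measurement."""
    observed_exposed = sensitivity * exposed + (1 - specificity) * unexposed
    observed_unexposed = (1 - sensitivity) * exposed + specificity * unexposed
    return observed_exposed, observed_unexposed


# True (hypothetical) counts
true_or = odds_ratio(60, 40, 30, 70)

# Same error rates in both groups, i.e. nondifferential misclassification
cases = misclassify(60, 40, sensitivity=0.8, specificity=0.9)
controls = misclassify(30, 70, sensitivity=0.8, specificity=0.9)
observed_or = odds_ratio(cases[0], cases[1], controls[0], controls[1])
print(true_or, observed_or)
```

Here the true odds ratio is 3.5, while the odds ratio computed from the misclassified counts is roughly 2.4, closer to 1.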

17
Q

Hazard ratio

A

Expression of relative risk that compares the hazards of two groups. The hazard quantifies the probability of an event (e.g. dying) during a particular time interval, given that a subject has survived until that time.

18
Q

Multivariable linear regression

A

y = intercept + b1 x1 + b2 x2 + residual error

19
Q

Multivariable logistic regression

A

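ln(p/(1-p)) = intercept + b1 x1 + b2 x2

where p is the probability of condition y and p/(1-p) is the odds of condition y. (Unlike linear regression, there is no residual error term on the log-odds scale.)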
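
To get from the log-odds scale back to a probability, invert the logit. A minimal sketch with made-up coefficients; `predicted_probability` is a hypothetical helper, not a library function.

```python
import math


def predicted_probability(intercept, coefficients, values):
    """Probability of the outcome implied by a fitted logistic model."""
    log_odds = intercept + sum(b * x for b, x in zip(coefficients, values))
    # Inverse of ln(p / (1 - p))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical model: intercept -1.0, b1 = 0.5, b2 = 0.2, with x1 = 2 and x2 = 1
p = predicted_probability(-1.0, [0.5, 0.2], [2.0, 1.0])
print(p, p / (1 - p))  # probability and the corresponding odds
```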

20
Q

Multivariable models are useful for identifying . . .

A

Multivariable models are useful for identifying both confounding and effect modification

21
Q

A correlation coefficient and an effect estimate from a simple linear model (i.e., beta) can both give information about . . .

A

A correlation coefficient and an effect estimate from a simple linear model (i.e., beta) can both give information about the strength and direction of a relationship between two continuous variables.

22
Q

Preterm birth

A

When a baby is born before 37 completed weeks of gestation

1/10 infants in the US. Racial and ethnic differences are substantial.

23
Q

A confounder should meet these three criteria

A
  • It is associated with the exposure under study
  • It is a cause or correlate of the outcome under study, independent of the exposure
  • It is not a natural intermediate step between an exposure and outcome, nor is it naturally upstream of the exposure or downstream of the outcome.
24
Q

If stratum-specific RRs/ORs are equal to each other AND equal to the crude RR/OR, . . .

A

If stratum-specific RRs/ORs are equal to each other AND equal to the crude RR/OR, then the suspect variable is neither a confounder nor an effect modifier

25
Q

If stratum-specific RRs/ORs are equal to each other but are different from crude RR/OR, . . .

A

If stratum-specific RRs/ORs are equal to each other but are different from crude RR/OR, then the third variable is a confounder.

26
Q

If stratum-specific RRs/ORs are different from each other, . . .

A

If stratum-specific RRs/ORs are different from each other, then effect modification is present
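
The three stratification rules above can be sketched as one decision function. The 10% tolerance used to decide whether two estimates count as "equal" is an assumption made for illustration, not part of the cards, and in practice the comparison also involves judgment and confidence intervals.

```python
def classify_third_variable(crude, stratum_estimates, tolerance=0.10):
    """Classify a third variable from crude and stratum-specific RRs/ORs.

    `tolerance` is a hypothetical relative difference below which two
    estimates are treated as equal.
    """
    def close(a, b):
        return abs(a - b) / abs(b) <= tolerance

    first = stratum_estimates[0]
    strata_equal = all(close(s, first) for s in stratum_estimates)
    if not strata_equal:
        return "effect modification"
    if all(close(s, crude) for s in stratum_estimates):
        return "neither"
    return "confounding"


# Hypothetical estimates
print(classify_third_variable(2.0, [2.0, 2.05]))
print(classify_third_variable(3.0, [2.0, 2.0]))
print(classify_third_variable(2.5, [1.2, 4.0]))
```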

27
Q

Methods of correcting for confounding

A
  • In the design stage:
    • Stratification
    • Restriction
    • Randomization
  • In the analysis stage:
    • Stratification
    • Statistical adjustment
28
Q

The intent of a model can be primarily ___ or ___

A

The intent of a model can be primarily explanatory or predictive

29
Q

As in all statistical tests, we are making ___.

A

As in all statistical tests, we are making inferences from a sample

30
Q

Generally, when designing a multivariable model, we want to choose variables that. . .

A
  1. we know from other research to be important
  2. add to the ability of the model to explain or predict the outcome
  3. substantially change the parameter estimates of the main predictor(s) of interest when included (a common rule of thumb is more than 10%), since this suggests that the additional variable is a confounder of the exposure-outcome relationship.
31
Q

Effect modification won’t be apparent from a regression model unless . . .

A

Effect modification won’t be apparent from a regression model unless you look for it.

The simplest way is to stratify data and re-analyze.

32
Q

With logistic regression and proportional hazards regression, the coefficients have a special meaning:

A

The antilogarithm of the coefficient equals the odds ratio (for logistic regression) and the relative hazard (for proportional hazards regression).
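
In code, the antilogarithm is the exponential function. The sketch below also shows the usual way a 95% confidence interval is carried over from the coefficient scale, assuming a normal approximation; the coefficient and standard error are hypothetical.

```python
import math


def ratio_with_ci(coefficient, standard_error, z=1.96):
    """Odds ratio (or relative hazard) and an approximate 95% interval.

    Assumes the coefficient is on the natural-log scale, as in logistic
    and proportional hazards regression.
    """
    estimate = math.exp(coefficient)
    lower = math.exp(coefficient - z * standard_error)
    upper = math.exp(coefficient + z * standard_error)
    return estimate, lower, upper


# Hypothetical coefficient ln(2) with standard error 0.25
estimate, lower, upper = ratio_with_ci(math.log(2.0), 0.25)
print(estimate, lower, upper)
```

A coefficient of ln(2), about 0.693, corresponds to a ratio of 2; a coefficient of 0 corresponds to a ratio of 1 (no association).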

33
Q

The underlying assumption of multiple linear regression is that . . .

A

as the independent variables increase (or decrease), the mean value of the outcome increases (or decreases) in a linear fashion.

34
Q

The underlying assumption of multivariable logistic regression is that. . .

A

each one-unit increase in a predictor multiplies the odds of the outcome by a certain factor (the odds ratio of the predictor) and that the effect of several variables is the multiplicative product of their individual effects.
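
A short illustration of the multiplicative assumption: coefficients add on the log-odds scale, so odds ratios multiply. The coefficients here are hypothetical and the model is assumed to contain no interaction terms.

```python
import math


def combined_odds_ratio(coefficients, changes):
    """Odds ratio for changing several predictors at once.

    `changes` holds the number of units each predictor is increased by.
    """
    return math.exp(sum(b * d for b, d in zip(coefficients, changes)))


# Hypothetical predictors with individual odds ratios of 2 and 3
b1, b2 = math.log(2.0), math.log(3.0)
both = combined_odds_ratio([b1, b2], [1, 1])
print(both, math.exp(b1) * math.exp(b2))
```

Increasing both predictors by one unit multiplies the odds by 2 x 3 = 6 under this assumption.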

35
Q

The underlying assumption of proportional hazards models is that. . .

A

the ratio of the hazard functions for persons with and without a given risk factor is the same over the entire study period

This one has a special name, the proportionality assumption

36
Q

If the hazard of death were higher with surgery at the beginning of the study (as is often the case with surgical interventions because of perioperative mortality) but lower with surgery later in the study (because persons who survived after surgery had a longer life expectancy as a result of the beneficial effects of carotid endarterectomy), this would . . .

A

. . . violate the proportionality assumption.

37
Q

When the data do not support the proportionality assumption, proportional hazards analysis can still be performed by using . . .

A

When the data do not support the proportionality assumption, proportional hazards analysis can still be performed by using time-varying covariates.

38
Q

Time-varying covariates

A

Independent variables whose values change over time. With time-varying covariates, the proportional hazards model can correctly account for hazard ratios that vary over the course of the study

39
Q

A major study design advantage of proportional hazards analysis is that. . .

A

A major study design advantage of proportional hazards analysis is that it includes persons with varying lengths of follow-up.

40
Q

Censored

A

A person who does not experience the outcome of interest by the end of the study is considered censored

41
Q

Residual analysis

A

Method for determining goodness-of-fit. Residuals are the differences between the observed and the estimated values

Unfortunately, journals rarely print residual plots; readers must assume that the investigators reviewed them.

42
Q

Automatic variable selection algorithms

A

Computer programs that systematically test the contribution of variables in different ways in order to eliminate irrelevant variables or variables of questionable relevance and arrive at a simplified equation.

43
Q

Hosmer–Lemeshow goodness-of-fit test

A

Works for logistic regression models. Compares the estimated-to-observed likelihood of outcome for groups of persons. In a well-fitting model, the estimated likelihood will be similar to the observed likelihood.
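
A rough sketch of how the test statistic is usually built: sort persons by predicted probability, split them into groups (often ten), and compare observed with expected event counts in each group. The function below returns only the statistic; the p-value is conventionally taken from a chi-square distribution with (number of groups - 2) degrees of freedom and is not computed here. The function name and data are hypothetical.

```python
def hosmer_lemeshow_statistic(predicted, observed, groups=10):
    """Hosmer-Lemeshow statistic from predicted probabilities and 0/1 outcomes."""
    pairs = sorted(zip(predicted, observed))
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        # Roughly equal-sized groups ordered by predicted probability
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected_events = sum(p for p, _ in chunk)
        observed_events = sum(y for _, y in chunk)
        statistic += (observed_events - expected_events) ** 2 / (
            expected_events * (1 - expected_events / size)
        )
    return statistic


# Hypothetical data: two risk groups of ten persons each
predicted = [0.2] * 10 + [0.8] * 10
well_fitting = [1] * 2 + [0] * 8 + [1] * 8 + [0] * 2
poorly_fitting = [1] * 5 + [0] * 5 + [1] * 8 + [0] * 2
print(hosmer_lemeshow_statistic(predicted, well_fitting, groups=2))
print(hosmer_lemeshow_statistic(predicted, poorly_fitting, groups=2))
```

A statistic near zero means the estimated and observed likelihoods agree closely; larger values indicate poorer fit.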

44
Q

The reliability of a model depends on . . .

A

The reliability of a model depends on its purpose

If the model is explanatory, reliability means that a different set of data would probably yield a model with the same variables and similar coefficients.

A reliable predictive model predicts outcomes equally well for settings or for data other than those for which it was developed

45
Q

As a rule of thumb, to have confidence in the results, there should be at least . . .

A

As a rule of thumb, to have confidence in the results, there should be at least 20 persons for each independent variable eligible to be included in a linear regression model and at least 10 outcomes for each independent variable eligible to be included in a logistic regression or proportional hazards model
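
The rule of thumb translates into a quick check of how many candidate variables a data set can support. This is a sketch of the arithmetic only; the function name and the model labels are hypothetical.

```python
def max_independent_variables(model_type, n_persons=0, n_outcomes=0):
    """Largest number of candidate independent variables under the rule of thumb."""
    if model_type == "linear":
        return n_persons // 20          # at least 20 persons per variable
    if model_type in ("logistic", "proportional_hazards"):
        return n_outcomes // 10         # at least 10 outcomes per variable
    raise ValueError(f"unknown model type: {model_type}")


# Hypothetical study: 1000 persons, 45 of whom experience the outcome
print(max_independent_variables("linear", n_persons=1000))
print(max_independent_variables("logistic", n_persons=1000, n_outcomes=45))
```

For logistic and proportional hazards models the limit depends on the number of outcomes, not the number of persons, so a large study with few events still supports only a few variables.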

46
Q

Even if a study has a large enough number of events per independent variable, the estimates of the association between a risk factor and an outcome may still be inaccurate if ___.

A

Even if a study has a large enough number of events per independent variable, the estimates of the association between a risk factor and an outcome may still be inaccurate if the risk factor is rare

47
Q

Non-differential misclassification tends to bias relative risk estimates . . .

A

. . . towards 1

48
Q

If you match on a characteristic, you can no longer . . .

A

If you match on a characteristic, you can no longer examine the association of this characteristic with the outcome.