Biostatistics Flashcards

1
Q

How do you calculate annual incidence?

A

Number of new cases in a given year/the number of people in that population at risk of developing that condition

Note: people who already have the condition or who are not able to get the condition are not included in the denominator

Note: if there are 200 new cases but all of them die before the end of the year the incidence would still use 200
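
A minimal Python sketch of the calculation above. The function name, its parameters, and the population figures are illustrative assumptions, not part of the original card:

```python
def annual_incidence(new_cases, population, existing_cases=0, not_at_risk=0):
    """New cases in the year divided by the population at risk."""
    # People who already have the condition, or who cannot get it,
    # are removed from the denominator.
    at_risk = population - existing_cases - not_at_risk
    return new_cases / at_risk

# Hypothetical town of 10,500 people, 500 of whom already have the condition
print(annual_incidence(200, 10_500, existing_cases=500))  # 0.02
```

The 200 new cases count in full even if some of those pts die before the end of the year.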

2
Q

Standardized mortality ratio

A

Observed number of deaths/expected number of deaths

Note: An SMR of 2 indicates that mortality in that population is twice that of the general population

3
Q

How do you calculate maternal mortality rate?

A

Number of maternal deaths/number of live births

Note: women who were pregnant but had miscarriages are not included in this calculation

4
Q

When would you use odds ratio vs relative risk?

A

You can only use relative risk in prospective studies that follow pts over time so that you can calculate the risk of developing a disease over a certain period of time

An odds ratio is used during case control studies where you record data at a single point in time (recording who has been exposed to the exposure of interest and who has developed the disease in question). You can’t calculate risk in this situation because you didn’t follow a group of people after being exposed, so you calculate the odds that a pt with the disease had the exposure vs the odds that a pt without the disease had the exposure.

5
Q

How do you calculate relative risk?

A

(Number of exposed pts who developed the disease/total number of exposed pts) divided by (number of unexposed pts who developed the disease/total number of unexposed pts)
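
As a rough sketch of the formula above, assuming simple counts from a follow-up study (the function name and the example counts are hypothetical):

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk in the exposed group divided by risk in the unexposed group."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed

# Hypothetical cohort: 30 of 60 exposed pts and 10 of 40 unexposed pts develop the disease
print(relative_risk(30, 60, 10, 40))  # 2.0
```

A result of 2.0 would mean exposed pts had twice the risk of unexposed pts over the follow-up period.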

6
Q

How do you calculate an odds ratio?

A

(Number of exposed people who have the disease/number of unexposed people with disease) divided by (number of exposed people without disease/number of unexposed people without disease)

Note: you are calculating the odds of exposure in disease pts and dividing that by the odds of exposure in pts without the disease
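
A short sketch of the same calculation, assuming a 2x2 table from a case-control study (the function name and counts are hypothetical):

```python
def odds_ratio(exposed_diseased, unexposed_diseased, exposed_healthy, unexposed_healthy):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    odds_exposure_cases = exposed_diseased / unexposed_diseased
    odds_exposure_controls = exposed_healthy / unexposed_healthy
    return odds_exposure_cases / odds_exposure_controls

# Hypothetical study: 50 of 100 cases and 20 of 100 controls were exposed
print(odds_ratio(50, 50, 20, 80))  # 4.0
```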

7
Q

Coefficient of determination

A

A value used to describe degree of correlation that expresses the percentage of variability in the outcome factor that is explained by the predictor factor

Note: If folic acid intake and plasma homocysteine levels have a coefficient of determination of 64%, then 64% of the variability in plasma homocysteine levels is explained by changes in folic acid intake

8
Q

How do you calculate the coefficient of determination?

A

It is the square of the correlation coefficient

E.g. two factors that have a correlation coefficient of -0.8 have a coefficient of determination of 0.64 (or 64%)
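
The example above can be checked in a couple of lines of Python (variable names are illustrative only):

```python
r = -0.8            # correlation coefficient from the example
r_squared = r ** 2  # coefficient of determination

# Squaring removes the sign, so r = -0.8 and r = 0.8 give the same result
print(round(r_squared, 2))  # 0.64
```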

9
Q

If 30 of 60 smokers develop lung cancer and 10 of 40 nonsmokers develop lung cancer, what is the attributable risk of smoking?

A

Risk in smokers is 0.5 and risk in nonsmokers is 0.25

Attributable risk = 0.5 - 0.25 = 0.25

Note: Attributable risk is the risk of developing disease in exposed pts minus the risk of developing disease in unexposed pts

10
Q

Attributable risk percent

A

The percentage of disease cases in exposed pts that can be attributed to that exposure

Note: if 20% of smokers develop lung cancer and 10% of nonsmokers develop lung cancer, then the attributable risk percent is 50% (50% of smokers who developed lung cancer got it because they were smokers)
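
A minimal sketch of the arithmetic in the note, assuming the usual formula (risk in exposed minus risk in unexposed, divided by risk in exposed). The function name is hypothetical:

```python
def attributable_risk_percent(risk_exposed, risk_unexposed):
    """Percent of risk among exposed pts that is attributable to the exposure."""
    return 100 * (risk_exposed - risk_unexposed) / risk_exposed

# 20% of smokers vs 10% of nonsmokers develop lung cancer
print(round(attributable_risk_percent(0.20, 0.10), 1))  # 50.0
```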

11
Q

Population attributable risk

A

The percentage of total cases that are attributable to a certain exposure

Note: If 30 of 60 smokers develop lung cancer and 10 of 40 nonsmokers develop lung cancer, the population attributable risk is 37.5% (because 50% of the smokers who got lung cancer, 15 people, got it because of smoking. Out of the 40 people who developed lung cancer, 15 can be attributed to smoking: 15/40 = 37.5%)
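
The same figure can be reached with an equivalent formula: (risk in the whole population minus risk in the unexposed) divided by risk in the whole population. The sketch below assumes that formula. The function name is hypothetical:

```python
def population_attributable_risk_percent(exposed_cases, exposed_total,
                                         unexposed_cases, unexposed_total):
    """Percent of all cases in the population attributable to the exposure."""
    risk_population = (exposed_cases + unexposed_cases) / (exposed_total + unexposed_total)
    risk_unexposed = unexposed_cases / unexposed_total
    return 100 * (risk_population - risk_unexposed) / risk_population

# 30 of 60 smokers and 10 of 40 nonsmokers develop lung cancer
par = population_attributable_risk_percent(30, 60, 10, 40)
print(round(par, 1))  # 37.5
```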

12
Q

How do you calculate the number needed to treat?

A

1/absolute risk reduction

Note: absolute risk reduction = risk of untreated pt developing outcome - risk of treated pt developing outcome

13
Q

25 of 50 pts treated with a new drug survive 5 years and 25 of 100 pts treated with standard therapy survive 5 years. What is the absolute risk reduction?

A

Mortality in those treated with standard therapy is 0.75 and mortality in those treated with the new drug is 0.5, so the absolute risk reduction is 0.75 - 0.5 or 0.25

Note: the number needed to treat is then 1/0.25 or 4
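
A small sketch of this worked example, converting survival counts to deaths first (function names are hypothetical):

```python
def absolute_risk_reduction(control_events, control_total, treated_events, treated_total):
    """Event rate in the control group minus event rate in the treated group."""
    return control_events / control_total - treated_events / treated_total

def number_needed_to_treat(arr):
    """Number of pts who must be treated to prevent one additional event."""
    return 1 / arr

# Deaths at 5 years: 75 of 100 on standard therapy, 25 of 50 on the new drug
arr = absolute_risk_reduction(75, 100, 25, 50)
nnt = number_needed_to_treat(arr)
print(arr, nnt)  # 0.25 4.0
```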

14
Q

Disability-adjusted life years

A

DALYs are a way to measure disease burden that estimates the total number of years of life lost due to that disease. It can be calculated by adding the years of life lost due to decreased quality of life PLUS the years of life lost due to premature mortality

If a pt develops depression at age 30 and commits suicide at age 50 (assuming life expectancy at age 50 is 84 and that the disability weight for depression is 0.35), then:

DALY = years of life lost (84 - 50 = 34) + years lived with disability (20 x 0.35 = 7) = 41 years (lost 7 years due to loss of quality of life during disability and lost 34 years due to premature mortality)

Note: DALYs should be minimized. Disability weights are similar to the time trade offs (TTOs) used for quality-adjusted life years but are standard values used for populations, whereas TTOs are an individual pt's self-reported number
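
A simplified sketch of the DALY example above, assuming a single disability period with a constant disability weight and no discounting or age weighting (the function name is hypothetical):

```python
def daly(age_onset, age_death, life_expectancy, disability_weight):
    """Years of life lost to early death plus weighted years lived with disability."""
    years_of_life_lost = life_expectancy - age_death
    years_lived_with_disability = (age_death - age_onset) * disability_weight
    return years_of_life_lost + years_lived_with_disability

# Depression from age 30, death at 50, life expectancy 84, disability weight 0.35
print(round(daly(30, 50, 84, 0.35), 2))  # 41.0
```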

15
Q

Quality-adjusted life years

A

QALYs are a way to measure the burden of a disease. The pt thinks about how many years with their current disease they would be willing to trade for 1 year of life at full health.

If a pt states that 5 years in their current state is equivalent to 1 year at full health, then the time trade off is 1/5 or 0.2, which can then be used to calculate QALYs. If this pt was healthy until age 30 and then had disease until the present at age 40, the quality-adjusted life years would be 32 (30 years at full health + 10 years at 0.2)

Note: The goal is to maximize QALYs through treatments that increase the time trade off factor (treatments that increase quality of life)
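
The QALY example can be sketched as a sum of years multiplied by a quality weight, where 1.0 is full health. The function name and the list-of-pairs input are assumptions made for illustration:

```python
def qaly(periods):
    """Sum of years lived, each weighted by quality of life (1.0 = full health)."""
    return sum(years * weight for years, weight in periods)

# 30 years at full health, then 10 years at a time trade off of 0.2
print(round(qaly([(30, 1.0), (10, 0.2)]), 2))  # 32.0
```

A treatment that raised the weight for those 10 years from 0.2 to 0.5 would add 3 QALYs under this simple model.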

16
Q

What graph is often used in survival analysis (e.g. determining whether peritoneal or hemodialysis prolongs survival)?

A

A Kaplan-Meier survival curve, which reports the proportion of subjects surviving at each time point through the study (the slower the decline in the curve the more likely those subjects were to survive)

Note: There is only statistical significance if the p-value reported from a log-rank test of the curves is < 0.05

17
Q

What does an odds ratio < 1 indicate?

A

The exposure being studied is associated with a lower odds of the outcome

18
Q

When would you use an odds ratio rather than relative risk?

A

In case control or cross sectional studies (where you already know who has the outcome and who doesn’t and you’re trying to figure out what exposures may be associated with higher odds of the outcome)

Note: Relative risk is used for observational or experimental follow-up studies where subjects are tracked through time to determine whether an outcome occurs

19
Q

Multiple linear regression vs multiple logistic regression

A

Multiple linear regression is used to evaluate associations between 1 quantitative dependent variable (a primary outcome with a numerical value such as LDL level) and 2 or more independent variables that can be either quantitative or qualitative

E.g. Use multiple linear regression to evaluate the association between statin use and LDL levels while also adjusting for BMI, creatinine clearance, and sex (dependent variable, LDL levels, is quantitative)

Multiple logistic regression is used to evaluate the association between 1 dichotomous dependent variable (primary outcome has only 2 options, such as alive or dead) and 2 or more independent variables

E.g. Use multiple logistic regression to evaluate the association between obesity and the presence of diabetes while adjusting for age and sex (dependent variable, presence of diabetes, is dichotomous because they either have or do not have diabetes)

20
Q

What is it called when a study is designed to evaluate multiple interventions and all possible combinations of those interventions?

A

Factorial study (or fully crossed design)

E.g. A study that randomized pts to treatment with placebo, glutamine alone, antioxidants alone, or glutamine + antioxidants

21
Q

Statistical power

A

The ability of a study design to identify a statistically significant difference between two groups if a difference actually does exist

Power = 1 - beta

Note: Beta is the rate of type II errors (failure to reject a false null hypothesis)

22
Q

Case fatality rate vs mortality rate

A

The case fatality rate is the proportion of known cases of a particular condition who end up dying from that condition

The mortality rate is the probability of dying from a particular disease in the general population

23
Q

Standardized mortality ratio

A

An epidemiological parameter used to determine whether there is an unusually high number of deaths in a given group by comparing it to how many deaths would be expected in a similar group of the general population (controlling for age, sex, etc)

It is calculated using the mortality rates for the general population of the disease in question to estimate the number of expected deaths, which are compared to the observed deaths to give the standardized mortality ratio

24
Q

How do you calculate specificity?

A

True negatives/(true negatives + false positives)

Note: This is the probability of a non-diseased person actually testing negative

25
Q

How do you calculate positive predictive value?

A

True positives/(true positives + false positives)

Note: This is the probability that a person who tests positive actually has the disease. The rarer the disease, the larger the share of positive results that are false positives, which reduces the positive predictive value
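
To illustrate how prevalence affects positive predictive value, the sketch below computes PPV from sensitivity, specificity, and prevalence rather than from raw counts. The function name and the test characteristics (90% sensitivity and specificity) are hypothetical:

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a pt with a positive result actually has the disease."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1 - specificity) * (1 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)

# Same test applied to populations with decreasing disease prevalence
for prevalence in (0.5, 0.1, 0.01):
    ppv = positive_predictive_value(0.9, 0.9, prevalence)
    print(prevalence, round(ppv, 3))
```

The PPV falls as prevalence falls, even though the test itself is unchanged.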

26
Q

Net clinical benefit

A

A measure of the possible benefit minus the possible harm from an intervention

27
Q

Intention to treat analysis

A

A method of evaluating subjects in a study based on which treatment group they were randomized to, regardless of which treatment they actually received

Note: It is important to do ITT analysis to preserve randomization and minimize the effects of crossover (e.g. pts in the placebo group actually receiving treatment) and dropout (pts not completing the treatment study)

28
Q

What is the best epidemiological parameter to compare the significance of negative and positive results for individual patients, regardless of disease prevalence?

A

Likelihood ratios (the positive likelihood ratio is probability of a positive result occurring in a pt with disease compared to the probability of a positive result in a pt without disease)

Positive likelihood ratio = sensitivity/(1 - specificity)

Note: This can be used to calculate the post-test odds of having the disease using that individual pt's pre-test odds (post-test odds = likelihood ratio x pre-test odds)
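
A short sketch of the pre-test to post-test conversion described above. Probabilities are converted to odds and back. The function names and the test characteristics are hypothetical:

```python
def positive_likelihood_ratio(sensitivity, specificity):
    return sensitivity / (1 - specificity)

def post_test_probability(pre_test_probability, likelihood_ratio):
    """Apply a likelihood ratio to a pre-test probability via odds."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = likelihood_ratio * pre_test_odds
    return post_test_odds / (1 + post_test_odds)

# Hypothetical test: sensitivity 0.9, specificity 0.8, pre-test probability 20%
lr_pos = positive_likelihood_ratio(0.9, 0.8)
print(round(lr_pos, 2))                               # 4.5
print(round(post_test_probability(0.2, lr_pos), 3))   # 0.529
```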

29
Q

How do you calculate the negative likelihood ratio for a test?

A

Negative likelihood ratio for a negative test result = (1 - sensitivity)/specificity

Note: You want a negative likelihood ratio to be very small, preferably < 0.1, to rule out the disease with a negative test result. The more sensitive the test, the less likely there are to be false negatives and the lower the negative likelihood ratio (strong evidence to rule out the disease). The more specific the test, the less likely there are to be false positives and the lower the negative likelihood ratio.
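
A brief sketch comparing two hypothetical tests against the 0.1 threshold mentioned in the note (the function name and the sensitivity/specificity values are assumptions):

```python
def negative_likelihood_ratio(sensitivity, specificity):
    return (1 - sensitivity) / specificity

# Highly sensitive test vs a less sensitive test, both with specificity 0.9
print(round(negative_likelihood_ratio(0.99, 0.9), 3))  # 0.011
print(round(negative_likelihood_ratio(0.70, 0.9), 3))  # 0.333
```

Only the first test would clear the < 0.1 bar for ruling out disease on a negative result.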

30
Q

What is a good value for a likelihood ratio?

A

A positive likelihood ratio greater than 10 suggests that a positive test result provides strong evidence to rule in the disease

A negative likelihood ratio less than 0.1 suggests that a negative test result provides strong evidence to rule out the disease

A likelihood ratio between 0.5 and 2 suggests that performing the test wouldn’t provide much evidence to change the likelihood that a pt has or does not have a disease (the pre-test probability is very similar to the post-test probability)

31
Q

Verification bias

A

A type of bias that occurs when only some test results are verified using a gold standard technique

Note: In a study evaluating a new screening method, some pts with negative test results in addition to pts with the positive result should undergo the gold-standard test (e.g. biopsy) to minimize verification bias

32
Q

Selection bias

A

Occurs when the study sample is not representative of the target population

33
Q

Ascertainment bias

A

(AKA sampling bias) A type of selection bias that occurs when the study population differs from the target population due to non-random selection methods

34
Q

Nonresponse bias

A

A type of selection bias that occurs when a high rate of nonresponders results in the study population differing from the target population (only occurs if the nonresponders are significantly different from the responders in some way)

35
Q

Berkson bias

A

A type of selection bias that occurs when a disease is only studied in hospital-based patients and so cannot be generalized to people outside the hospital

36
Q

Neyman bias

A

(AKA prevalence bias) A type of selection bias that occurs when an exposure of interest happens long before the disease is assessed, causing the study to miss pts who die or recover between exposure and evaluation

37
Q

Attrition bias

A

A type of selection bias that occurs when the people who are lost to follow-up differ significantly from the group that remains in the study to the end

38
Q

Detection bias

A

(AKA surveillance bias) A type of observational bias that occurs when a risk factor causes the exposed group to be monitored more closely for disease development than the non-exposed group

E.g. Pts with exposure to psychiatry classes may be more likely to self identify depression

39
Q

Type I error (alpha)

A

The probability of rejecting the null hypothesis when it is actually true (when a study identifies a significant difference between two groups when actually there is none)

40
Q

What is the multiplicity problem?

A

There is a higher probability of a type I/alpha error (finding significance when there is none) if evaluating for multiple secondary endpoints. This is because a study is designed to minimize type I and type II errors for the primary outcome and study adjustments are needed to effectively evaluate secondary endpoints.

Note: researchers can adjust for this by using a different alpha level (or p-value)
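
To make the problem concrete, the sketch below shows how the chance of at least one false positive grows with the number of endpoints tested. It assumes the tests are independent, which real endpoints often are not. The Bonferroni correction shown is one common adjustment and is not named in the original card. Function names are hypothetical:

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one type I error across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_alpha(alpha, n_tests):
    """Stricter per-test alpha that keeps the overall error rate near alpha."""
    return alpha / n_tests

print(round(familywise_error_rate(0.05, 10), 3))  # 0.401
print(round(bonferroni_alpha(0.05, 10), 4))       # 0.005
```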

41
Q

If a father has hemophilia A, what is the probability his son will also have hemophilia A?

A

Same as for the general population (in X-linked recessive diseases a father's genotype has no bearing on his son's, as he gives his Y chromosome to the son; however, every daughter will at least be a carrier because he gives them his X chromosome)

42
Q

What are the basic criteria to establish causality (not just correlation)?

A
  • analogy (known similar associations)
  • biological gradient (a dose-response relationship between cause and effect, the more smoking the higher the risk of lung cancer)
  • biological plausibility (known explanations for why the cause might have the effect)
  • coherence (association preferably does not contradict known facts)
  • consistency (cause is widely associated with the effect)
  • experimental evidence (effect is evidenced by experimental designs)
  • specificity (cause is uniquely associated with the effect)
  • strength of association (cause is associated with a substantive effect)
  • temporality (cause comes before the effect in time)
43
Q

What is the kappa statistic?

A

A quantitative measure of inter-rater reliability (the higher the kappa statistic, the less likely it is that agreement between different raters is due to chance alone)

Note: A kappa of 1 indicates that two raters agree 100% of the time, a kappa of -1 indicates that they disagree 100% of the time, and a kappa of 0 indicates that their agreement is solely due to random chance
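
The card does not give a formula. As an illustration, the sketch below uses Cohen's kappa for two raters making yes/no calls: (observed agreement minus agreement expected by chance) divided by (1 minus agreement expected by chance). The function name and counts are hypothetical:

```python
def cohens_kappa(both_yes, rater1_only_yes, rater2_only_yes, both_no):
    """Cohen's kappa for two raters and a binary rating."""
    total = both_yes + rater1_only_yes + rater2_only_yes + both_no
    observed_agreement = (both_yes + both_no) / total
    rater1_yes = (both_yes + rater1_only_yes) / total
    rater2_yes = (both_yes + rater2_only_yes) / total
    # Agreement expected if each rater answered independently at their own yes-rate
    expected_agreement = rater1_yes * rater2_yes + (1 - rater1_yes) * (1 - rater2_yes)
    return (observed_agreement - expected_agreement) / (1 - expected_agreement)

# Two raters agree on 80 of 100 films (40 yes/yes, 40 no/no)
print(round(cohens_kappa(40, 10, 10, 40), 2))  # 0.6
```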

44
Q

Receiver-operating characteristic (ROC) curves

A

Curves used to evaluate the diagnostic accuracy of a test by plotting sensitivity on the Y axis and (1 - specificity) on the x axis.

Note: The most accurate test will have the largest area under the curve

45
Q

How do you calculate the absolute risk reduction?

A

ARR = event rate in control group MINUS event rate in experimental group

Note: The number needed to treat is 1/ARR

46
Q

Sensitivity analysis

A

When certain criteria or variables are adjusted and then the primary analysis calculations are repeated to see how sensitive the results are to specific criteria/variables

Note: When sensitivity analysis suggests low sensitivity of the results to various criteria/parameters, that supports the robustness of the findings

47
Q

What is a non-inferiority study?

A

A study designed to show that an experimental treatment is NOT worse than a standard-of-care treatment by more than an acceptable margin. A non-inferiority margin is designated that is easier to achieve than the margin delineating inferiority/superiority. If the confidence interval for the experimental drug lies entirely to the right of the non-inferiority margin, then it is said to be non-inferior to the standard-of-care treatment (if it lies entirely to the left, then it is considered not non-inferior, and if the confidence interval includes the non-inferiority margin, then nothing is statistically significant either way)

48
Q

How do you calculate relative risk reduction?

A

RRR = (risk in control group - risk in treatment group)/risk in control group
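
A minimal sketch of the formula, using hypothetical mortality figures of 75% in the control group and 50% in the treatment group (the function name is an assumption):

```python
def relative_risk_reduction(risk_control, risk_treatment):
    """Proportion of the control group's risk that the treatment removes."""
    return (risk_control - risk_treatment) / risk_control

rrr = relative_risk_reduction(0.75, 0.5)
print(round(rrr, 3))  # 0.333
```

With these figures the absolute risk reduction is 0.25, but the relative risk reduction is about 33%, which is why the two measures should not be confused.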

49
Q

How can you tell whether statistical significance is due to a specific possible confounding variable?

A

Stratify the analysis by that confounding variable. If the significance disappears within both the group of people that have the confounding variable and the group that does not have it, then the original significance was due to the confounding variable

Note: Confounding decreases the internal validity of a study

50
Q

Univariate vs multiple regression analysis

A

Both forms of analysis evaluate for associations between two groups, but univariate analysis only looks at the association of one variable (does not account for possible confounding variables), whereas multiple regression analysis does adjust for confounding variables to produce an adjusted odds ratio

Note: Multiple regression analysis is always better suited to give real odds ratios because it adjusts for possible confounding variables

51
Q

Ecological fallacy bias

A

When population-level information is applied at an individual level

52
Q

What statistical test should be used to compare the mean values of a continuous variable in several groups of subjects, assuming normality and homoscedasticity of the data?

E.g. to evaluate the associations between mean left ventricular wall thickness of pts with a normal echo, borderline echo, or hypertrophic echo

A

Analysis of variance (ANOVA)

Note: The ANOVA test gives an F statistic based on the variation within and between the different groups that can be used to produce a p-value

53
Q

What statistical test should be used to evaluate the association between categorical variables?

E.g. to measure the association between treatment success vs failure in men compared to that in women

A

The Pearson chi-square test (provided the sample size is large enough)

Note: If the sample size is small, then you should use Fisher's exact test

54
Q

What is length-time bias?

A

Occurs when patients with more rapidly progressive and fatal disease aren’t detected because they die too quickly

E.g. length-time bias may cause you to believe that a screening test leads to prolonged lifespan with disease because the screening test mostly detects pts with more mild and slowly progressing disease

55
Q

What is a funnel plot?

A

A plot used to assess for publication bias in a meta-analysis by plotting each study's treatment effect (e.g. the log of the odds ratio) on the x axis and a measure of each study's size or precision (e.g. the standard error) on the y axis. If there’s no bias, 95% of studies should lie within the triangular region delineated by the standard error lines. Larger studies should be at the top, where they should fit in a narrower range due to increased precision

Note: Funnel plots should be symmetric in the absence of study heterogeneity and publication bias (lack of symmetry represents likely bias)