Biostatistics Flashcards

1
Q

distribution terms

A

mean
median
mode
skew

2
Q

mean

A

average value of a dataset
calculated by summing all values and dividing by the number of values

3
Q

mean limitations

A

misleading in skewed distributions or distributions with outliers

4
Q

median

A

middle value when a dataset is ordered from lowest to highest

5
Q

when is median ideal

A

skewed distributions as it is not influenced by outliers

6
Q

mode

A

the value that occurs most frequently in a dataset
not influenced by outliers; the only measure of central tendency that also works for categorical data

7
Q

skew

A

describes asymmetry in a distribution

8
Q

positive skew

A

the right tail (higher values) is longer
many low values and a few extremely high values
mean > median > mode
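
A minimal sketch of the ordering above, using Python's standard `statistics` module. The sample values are made up for illustration (many low values, a few very high ones).

```python
from statistics import mean, median, mode

# Hypothetical right-skewed sample: mostly low values plus a long right tail.
values = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]

sample_mean = mean(values)      # pulled upward by the high outliers
sample_median = median(values)  # middle of the ordered values
sample_mode = mode(values)      # most frequent value

print(sample_mean, sample_median, sample_mode)
```

In this sample the mean is the largest of the three and the mode the smallest, which is the pattern expected for positive skew.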

9
Q

negative skew

A

left tail (lower values) is longer
many high values and a few extremely low values
mean < median < mode

10
Q

incidence

A

number of new cases of a condition arising in a population at risk during a given period
useful for assessing risk and evaluating interventions aimed at preventing disease

11
Q

prevalence

A

total disease cases (new + pre-existing) in a population at one point in time divided by the total population
useful for planning health resource allocation and understanding disease burden
influenced by disease duration and survival (roughly, prevalence = incidence x average disease duration)

12
Q

point prevalence

A

percentage of people with the condition at one specific point in time
better reflects the burden of chronic conditions

13
Q

lifetime prevalence

A

percent of individuals that ever had the condition at some point in their life
higher than point prevalence for chronic conditions
sensitive to survivorship and disease duration

14
Q

key differences incidence vs prevalence

A

incidence assesses new case development over time
prevalence assesses existing disease cases at one time point
incidence excludes pre-existing cases, prevalence includes them
incidence assesses risk, while prevalence assesses burden
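
A small worked sketch of the contrast above. All counts are hypothetical, and the example assumes a closed population (no deaths, recoveries, or migration during the year) to keep the arithmetic simple.

```python
# Hypothetical town followed for one year; every number here is made up.
population = 10_000
existing_cases_at_start = 200   # pre-existing cases on day one
new_cases_during_year = 50      # cases that developed during follow-up

# Incidence: new cases among people who were free of disease (at risk) at the start.
at_risk = population - existing_cases_at_start
cumulative_incidence = new_cases_during_year / at_risk

# Point prevalence: all existing cases at one time point over the total population.
point_prevalence_start = existing_cases_at_start / population
point_prevalence_end = (existing_cases_at_start + new_cases_during_year) / population

print(f"incidence over the year: {cumulative_incidence:.4f}")
print(f"prevalence at start: {point_prevalence_start:.4f}, at end: {point_prevalence_end:.4f}")
```

Pre-existing cases are removed from the incidence denominator but counted in the prevalence numerator.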

15
Q

sensitivity vs specificity image

A
16
Q

sensitivity

A

proportion of people with the disease who test positive on the assessment
conceptualized as the true positive rate

17
Q

sensitivity formula

A

sensitivity = true positives / (true positives + false negatives)

18
Q

high sensitivity

A

correctly identifies a high proportion of people who actually have the disease (few false negatives)

19
Q

sensitivity example

A

Lyme disease screening test with 95% sensitivity would correctly identify 95% of people with Lyme disease

20
Q

specificity

A

defined as the proportion of people without the disease who test negative on the assessment
also conceptualized as the true negative rate

21
Q

specificity formula

A

specificity = true negatives / (true negatives + false positives)
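
The sensitivity and specificity formulas can be sketched in a few lines of Python. The function names and the 2x2 counts below are hypothetical, chosen only to illustrate the arithmetic.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """True positive rate: diseased people who test positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """True negative rate: disease-free people who test negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical screening results: 100 people with disease, 900 without.
tp, fn = 95, 5
tn, fp = 882, 18

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Both measures are computed within a column of the 2x2 table (diseased or disease-free), so they do not depend on how common the disease is.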

22
Q

high specificity

A

correctly rules out most people who do not have the disease (few false positives)

23
Q

specificity example

A

a cognitive screening test for dementia with 98% specificity would generate few false positive results, correctly identifying 98% of patients without dementia as testing negative

24
Q

positive predictive value (PPV)

A

defined as the probability that a person with a positive test result truly has the underlying disease

25
Q

positive predictive value depends on

A

sensitivity, specificity, and disease prevalence

26
Q

formula for positive predictive value

A

PPV = true positives/(true positives + false positives)

27
Q

high positive predictive value

A

high probability of reflecting the true presence of disease

28
Q

positive predictive value example

A

if a suicide risk screening test has a PPV of 90%, then 90% of patients screening positive are truly at high risk for suicide

29
Q

negative predictive value

A

probability that a person with a negative test result truly does NOT have the underlying disease

30
Q

negative predictive value depends on

A

sensitivity, specificity, and disease prevalence

31
Q

negative predictive value formula

A

NPV = true negatives / (true negatives + false negatives)
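
To see why PPV and NPV depend on prevalence as well as on sensitivity and specificity, one rough sketch is to build the expected 2x2 counts for a notional population and apply the formulas. The function name, the population size `n`, and the test characteristics are assumptions for illustration.

```python
def predictive_values(sens: float, spec: float, prevalence: float, n: int = 100_000):
    """Return (PPV, NPV) from expected counts in a notional population of size n."""
    diseased = n * prevalence
    healthy = n - diseased
    tp = sens * diseased   # diseased who test positive
    fn = diseased - tp     # diseased who test negative
    tn = spec * healthy    # healthy who test negative
    fp = healthy - tn      # healthy who test positive
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


# Same hypothetical test (95% sensitive, 98% specific) at two prevalences.
ppv_common, npv_common = predictive_values(0.95, 0.98, prevalence=0.10)
ppv_rare, npv_rare = predictive_values(0.95, 0.98, prevalence=0.01)

print(f"prevalence 10%: PPV = {ppv_common:.3f}, NPV = {npv_common:.3f}")
print(f"prevalence  1%: PPV = {ppv_rare:.3f}, NPV = {npv_rare:.3f}")
```

With the test held fixed, PPV falls and NPV rises as the disease becomes rarer.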

32
Q

high negative predictive value

A

a negative result reliably rules out the presence of disease

33
Q

negative predictive value example

A

if a screening test for CJD has an NPV of 97%, then 97% of patients screening negative truly do not have CJD; only 3% of negative results are false negatives

34
Q

case report/series

A

detailed description of a single clinical case or small group of cases
mainly descriptive with no comparisons to a control group
used to illustrate unique cases without evidence of causality
hypothesizes about ideas that can be investigated further with better quality research

35
Q

case report/series example

A

a report of an individual patient diagnosed with Wilson’s disease that describes their symptoms, diagnosis, and treatment response

36
Q

case-control study

A

compares cases (with an outcome) to controls (without outcome) to identify factors associated with the outcome

37
Q

case-control study design

A

retrospective design: starts with the outcome and then investigates exposures

38
Q

case-control study design useful for

A

studying rare diseases or outcomes with long latency periods

39
Q

case-control study primary statistics

A

odds ratios quantifying the level of association

40
Q

case-control study example

A

a study comparing the prevalence of chemical exposure at Camp Lejeune between patients diagnosed with Parkinson’s disease and healthy controls without the diagnosis

41
Q

cross-sectional study

A

analyzes the relationship between exposures and outcomes at a single point in time

42
Q

cross-sectional study useful for

A

disease prevalence and studying multiple outcomes

43
Q

cross-sectional study cannot determine

A

temporal sequence between exposure and outcome

44
Q

cross-sectional study primary statistics

A

prevalence ratios/odds ratios

45
Q

cross-sectional study example

A

a study surveying the prevalence of essential tremor in octogenarians at a single point in time

46
Q

cohort study

A

follows population prospectively to quantify outcome risk
groups are defined by exposure status

47
Q

cohort study establishes

A

temporal relationship between predictors and outcomes

48
Q

cohort study compared to cross-sectional study

A

more expensive and time-intensive

49
Q

cohort study primary statistics

A

risk ratios quantifying relative risk

50
Q

cohort study example

A

a multi-year study following a group of children into adulthood to track rates of diagnosis of multiple sclerosis and to identify predictive factors

51
Q

randomized control study

A

gold standard experimental study in which participants are randomly allocated to study groups
highest internal validity due to randomization minimizing bias

52
Q

randomized control study establishes

A

causality between intervention and outcome

53
Q

randomized control study primary statistics

A

risk ratios comparing outcomes between groups

54
Q

randomized control study example

A

a trial randomly assigning patients with amyotrophic lateral sclerosis to receive either a new medication or placebo, to compare treatment efficacy

55
Q

case report/series advantages

A

describe unique cases in detail

56
Q

case report/series disadvantages

A

no control group, limited generalizability, susceptible to bias

57
Q

case report/series statistics used

A

descriptive only

58
Q

case-control advantages

A

good for rare outcomes, retrospective

59
Q

case control disadvantages

A

prone to bias (recall, selection), doesn’t determine individual risk

60
Q

case control statistics used

A

odds ratio

61
Q

cross-sectional advantages

A

easy, provides snapshot of prevalence

62
Q

cross-sectional disadvantages

A

no temporal relationship between exposure and outcome

63
Q

cross-sectional statistics used

A

prevalence, chi-square

64
Q

cohort advantages

A

can determine individual risk and incidence

65
Q

cohort disadvantages

A

expensive, time-consuming, loss to follow-up

66
Q

cohort statistics used

A

risk ratios

67
Q

randomized controlled trial advantages

A

gold standard, minimizes bias

68
Q

randomized controlled trial disadvantages

A

very expensive, time intensive, may not reflect real world effectiveness

69
Q

randomized controlled trial statistics used

A

risk ratios, NNT, NNH

70
Q

efficacy trials

A

measure whether interventions produce the intended result under ideal/controlled conditions
tight inclusion criteria and close monitoring
maximize internal validity

71
Q

effectiveness trials

A

examine whether intervention works under real-world conditions
broader inclusion, more variability in delivery/adherence
prioritize generalizability and applicability

72
Q

crossover trials

A

participants receive a sequence of different treatments
useful when:
- disease course is stable
- treatment effects are short-term or reversible
- a smaller sample size is needed (each participant acts as their own control)
must account for treatment carryover effects

73
Q

naturalistic studies

A

investigate interventions under routine clinical practice conditions
broad inclusion criteria, less frequent monitoring
effectiveness findings complement efficacy data from controlled trials

74
Q

twin studies

A

compare trait frequency between identical vs fraternal twins
estimate genetic components of disease by parsing genetic versus environmental effects

75
Q

meta-analysis

A

statistically synthesizes data from multiple smaller studies to increase statistical power
can assess consistency or heterogeneity across studies
at risk for selection or publication bias

76
Q

meta-analysis example

A

a statistical analysis combining data from multiple studies examining the efficacy of antiplatelets for stroke prevention to determine the overall treatment effect size across trials

77
Q

association studies

A

correlate genetic variants and other biomarkers to disease states

78
Q

pragmatic trials

A

emphasize applicability by testing interventions in typical “real world” practice settings with more heterogeneous patients and conditions

79
Q

odds ratio

A

quantifies the association between an exposure and an outcome, comparing the odds of the outcome occurring in the exposed group to the odds of the outcome occurring in the unexposed group

80
Q

odds ratio formula

A

OR = (A/B) / (C/D)
A = cases exposed
B = controls exposed
C = cases unexposed
D = controls unexposed
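
A minimal sketch of the formula above, using the same A to D labels. The function name and the case-control counts are hypothetical.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """OR = (A/B) / (C/D).

    a = cases exposed,   b = controls exposed,
    c = cases unexposed, d = controls unexposed.
    """
    return (a / b) / (c / d)


# Hypothetical case-control data: 100 cases and 100 controls.
cases_exposed, controls_exposed = 30, 10
cases_unexposed, controls_unexposed = 70, 90

or_value = odds_ratio(cases_exposed, controls_exposed, cases_unexposed, controls_unexposed)
print(f"OR = {or_value:.2f}")
```

The same value comes from the cross-product form (A x D) / (B x C), here (30 x 90) / (10 x 70).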

81
Q

odds ratio interpretation

A

OR > 1 means exposure increases odds
OR < 1 means exposure decreases odds
OR = 1 means no association between exposure and outcome

82
Q

odds ratio use

A

in case-control studies as a proxy for relative risk
does not provide information about actual risk or incidence and does not imply causation

83
Q

relative risk

A

compares the risk of an outcome among an exposed group to the risk of an unexposed group
provides information about the actual likelihood of the outcome occurring

84
Q

relative risk formula

A

RR = incidence in exposed / incidence in unexposed
incidence = number who develop the disease / total number in the group (with + without disease)
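
A short sketch of the relative risk calculation, taking each group's incidence as the number who develop the disease divided by everyone in that group. Function names and cohort counts are hypothetical.

```python
def incidence(cases: int, group_total: int) -> float:
    """Risk of the outcome within one group over the follow-up period."""
    return cases / group_total


def relative_risk(exposed_cases: int, exposed_total: int,
                  unexposed_cases: int, unexposed_total: int) -> float:
    """RR = incidence in exposed / incidence in unexposed."""
    return incidence(exposed_cases, exposed_total) / incidence(unexposed_cases, unexposed_total)


# Hypothetical cohort: 100 exposed (20 develop disease), 100 unexposed (10 develop disease).
rr = relative_risk(20, 100, 10, 100)
print(f"RR = {rr:.2f}")
```

Dividing cases by non-cases instead would give the odds (20/80 here), not the risk (20/100).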

85
Q

relative risk interpretation

A

RR > 1: increased risk in the exposed group
RR < 1: decreased risk in the exposed group
RR = 1: equal risk in both groups

86
Q

relative risk use

A

directly approximates incidence risk
used frequently in cohort studies

87
Q

absolute risk reduction (ARR)

A

the difference in outcome rates between control and experimental groups

88
Q

absolute risk reduction (ARR) formula

A

ARR = control event rate - experimental event rate

89
Q

absolute risk increase (ARI)

A

the increase in event rates in the experimental group compared to control

90
Q

absolute risk increase (ARI) formula

A

ARI = experimental event rate - control event rate

91
Q

absolute risk reduction/increase info

A

provides a direct measure of the benefit or harm

92
Q

relative risk reduction (RRR)/increase (RRI)

A

expresses the absolute risk reduction or increase as a percentage of the control group's event rate
makes interpretation of efficacy easier clinically

93
Q

relative risk reduction (RRR)/increase (RRI) formula

A

RRR = |ARR| x 100 / control event rate
RRI = |ARI| x 100 / control event rate

94
Q

number needed to treat (NNT)/harm (NNH)

A

number of people needed to treat in order for one additional patient to benefit/experience harm

95
Q

number needed to treat (NNT)/harm (NNH) formula

A

NNT = 1/ARR
NNH = 1/ARI
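
A worked sketch tying ARR, RRR, and NNT together. The event counts are made up, and `Fraction` is used only to keep the arithmetic exact. Rounding NNT up to a whole patient is a common convention assumed here, not something the cards specify.

```python
from fractions import Fraction
from math import ceil

# Hypothetical trial: 20 of 100 controls and 15 of 100 treated patients have the event.
control_event_rate = Fraction(20, 100)
experimental_event_rate = Fraction(15, 100)

arr = control_event_rate - experimental_event_rate   # absolute risk reduction
rrr_percent = abs(arr) * 100 / control_event_rate    # relative risk reduction, in percent
nnt = ceil(1 / arr)                                  # number needed to treat, rounded up

print(f"ARR = {float(arr):.2f}, RRR = {float(rrr_percent):.0f}%, NNT = {nnt}")
```

NNH follows the same pattern with ARI (experimental event rate minus control event rate) in place of ARR.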

96
Q

hazard ratio

A

compares the hazard (rate of an event) between groups over time

97
Q

hazard ratio formula

A

HR = treatment hazard rate / control hazard rate

98
Q

hazard ratio interpretation

A

HR > 1: increased rate of outcome with exposure
HR < 1: decreased rate of outcome with exposure

99
Q

hazard ratio use

A

used in survival analysis

100
Q

attributable risks

A

used to determine how much disease burden in a population can be attributed to a risk factor

101
Q

attributable risk percent/proportion

A

proportion of disease in the exposed group attributable to the exposure

102
Q

population attributable risk percent

A

proportion of disease in the whole population attributable to the exposure

103
Q

hypothesis testing

A

in research, statistical analysis evaluates hypotheses about treatment effects. This involves stating a null and an alternative hypothesis

104
Q

null hypothesis (Ho)

A

asserts there is no true difference between groups or no effect of treatment
essentially the “status quo” scenario
default position unless evidence indicates otherwise

105
Q

null hypothesis (Ho) form

A

“there is no difference between treatment A and B”

106
Q

alternative hypothesis (H1)

A

what investigator hopes to prove with study data
asserts there is true difference or treatment effect
contradicts the null hypothesis

107
Q

alternative hypothesis (H1) form

A

“there is a difference between treatment A and B”

108
Q

p-value

A

probability of obtaining results at least as extreme as the observed effect if the null hypothesis is true

109
Q

low p-value

A

evidence against the null hypothesis; reject it when p falls below the chosen threshold (alpha)

110
Q

typical threshold for p-value

A

p ≤ 0.05

111
Q

p-value limitation

A

a statistically significant result does not necessarily imply clinical importance
in large samples, even tiny differences can be statistically significant yet lack clinical significance or practical importance

112
Q

p-value schematic

A
113
Q

confidence intervals

A

range of values expected to contain the true parameter
help assess clinical significance beyond statistical hypotheses

114
Q

confidence interval influenced by the size and variability of the sample

A

wider intervals -> less precision, less confidence in observed effect size
narrower intervals -> greater confidence in point estimate
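
A rough sketch of how sample size changes interval width. It uses a normal approximation for the 95% CI of a mean (an assumption; small samples would normally use the t distribution). The function name and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ci(sample, level: float = 0.95):
    """Normal-approximation confidence interval for the sample mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    half_width = z * stdev(sample) / sqrt(len(sample))
    centre = mean(sample)
    return centre - half_width, centre + half_width


small_sample = [4, 6, 5, 7, 3, 5]
large_sample = small_sample * 10  # same spread of values, ten times as many observations

small_low, small_high = mean_ci(small_sample)
large_low, large_high = mean_ci(large_sample)

print(f"n = {len(small_sample)}: ({small_low:.2f}, {small_high:.2f})")
print(f"n = {len(large_sample)}: ({large_low:.2f}, {large_high:.2f})")
```

Both intervals are centred on the same mean, but the larger sample gives a narrower interval.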

115
Q

95% CI

A

if the study were repeated many times, about 95% of the intervals built this way would contain the true value

116
Q

type 1 error

A

incorrectly concluding a difference/effect is real, when it is not

117
Q

type 1 error equivalent to

A

false positive result/false rejection of null hypothesis when it was true

118
Q

type 1 error probability

A

probability determined by alpha level (typically 0.05 or 5%)

119
Q

type 2 error

A

failing to detect a true effect or difference -> false negative finding
concluding there is no effect when one actually exists

120
Q

type 2 error determined by

A

the power of the study (probability of a type 2 error, beta = 1 - power), which depends on sample size, effect size, and variability

121
Q

type 3 error

A

asking the wrong research question entirely
no meaningful answer, regardless of the statistical findings
may represent a flawed study design or mismatched hypotheses

122
Q

regression analysis

A

models the relationship between one or more independent (predictor) variables and a dependent variable

123
Q

regression analysis determines

A

determines how strongly/weakly one variable predicts or influences another

124
Q

regression analysis quantifies

A

quantifies the effect size for each predictor

125
Q

regression analysis example

A

an analysis that would test which factors are significantly associated with a higher or lower likelihood of developing Guillain-Barre syndrome, after controlling for other variables

126
Q

chi-square test

A

compares observed and expected frequencies between categorical variables
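
A minimal sketch of the observed-versus-expected comparison, computing the Pearson chi-square statistic by hand for a contingency table (no continuity correction). The function name and the 2x2 counts are hypothetical.

```python
def chi_square_statistic(table):
    """Pearson chi-square: sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical 2x2 table: rows = exposed / unexposed, columns = cases / controls.
observed_table = [[30, 10],
                  [70, 90]]

chi2 = chi_square_statistic(observed_table)
print(f"chi-square = {chi2:.2f}")
```

For a 2x2 table (1 degree of freedom), a statistic above about 3.84 corresponds to p < 0.05.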

127
Q

chi-square test determines

A

determines whether the observed differences are likely to be due to chance alone

128
Q

T-test

A

compares means between two groups
can be paired or independent samples
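
A small sketch of the independent-samples case, computing only the t statistic by hand. It uses Welch's unequal-variance form, which is an assumption since the card does not name a variant. The data are hypothetical. Turning the statistic into a p-value needs the t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b) -> float:
    """t = difference in means / standard error of that difference."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Hypothetical scores for two independent groups.
group_a = [5, 6, 7, 8, 9]
group_b = [1, 2, 3, 4, 5]

t_stat = welch_t(group_a, group_b)
print(f"t = {t_stat:.2f}")
```

A paired design would instead take the per-subject differences and test whether their mean differs from zero.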

129
Q

T-test determines

A

determines whether the difference between the two group means is statistically significant

130
Q

ANOVA

A

compares means across more than two groups

131
Q

ANOVA determines

A

tests the null hypothesis that all group means are equal; a significant result means at least one group mean differs

132
Q

two-way ANOVA

A

determines the effects of two independent categorical variables on a continuous outcome, and any interaction between those variables