Biostatistics Flashcards

1
Q

distribution terms

A

mean
median
mode
skew

2
Q

mean

A

average value of a dataset
calculated by summing all values and dividing by the number of values

3
Q

mean limitations

A

misleading in skewed distributions or distributions with outliers

4
Q

median

A

middle value when a dataset is ordered from lowest to highest

5
Q

when is median ideal

A

skewed distributions as it is not influenced by outliers

6
Q

mode

A

the value that occurs most frequently in a dataset
not influenced by outliers; the only measure of central tendency that can be used for categorical (nominal) data

7
Q

skew

A

describes asymmetry in a distribution

8
Q

positive skew

A

the right tail (higher values) is longer
many low values and a few extremely high values
mean > median > mode
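
A minimal sketch of the ordering above, using Python's standard library. The data set is hypothetical and was chosen only to have many low values and a few very high ones.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed data: many low values, a few very high ones.
values = [1, 2, 2, 2, 3, 3, 4, 5, 9, 29]

avg = mean(values)     # pulled toward the long right tail
mid = median(values)   # middle of the ordered values
top = mode(values)     # most frequent value

print(f"mean={avg}, median={mid}, mode={top}")
print("mean > median > mode:", avg > mid > top)
```

The single extreme value (29) drags the mean upward while the median and mode stay near the bulk of the data.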

9
Q

negative skew

A

left tail (lower values) is longer
many high values and a few extremely low values
mean < median < mode

10
Q

incidence

A

number of new cases of a condition arising in an at-risk population over a given period
useful for assessing risk and evaluating interventions aimed at preventing disease

11
Q

prevalence

A

total disease cases (new + pre-existing) in a population at one point in time divided by the total population
useful for planning health resource allocation and understanding disease burden
influenced by disease duration and survival rates (longer duration or better survival raises prevalence)

12
Q

point prevalence

A

percentage of people with the condition at one specific point in time
better reflects the burden of chronic conditions

13
Q

lifetime prevalence

A

percent of individuals that ever had the condition at some point in their life
higher than point prevalence for chronic conditions
sensitive to survivorship and disease duration

14
Q

key differences incidence vs prevalence

A

incidence assesses new case development over time
prevalence assesses existing disease cases at one time point
incidence excludes pre-existing cases, prevalence includes them
incidence assesses risk, while prevalence assesses burden
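
The contrast can be made concrete with a small calculation. This is only a sketch: the town, the counts, and the simplifying assumption that nobody dies, recovers, or moves during the year are all hypothetical.

```python
# Hypothetical town followed for one year (all numbers are made up).
population = 10_000
existing_cases_at_start = 200   # already had the condition on day 1
new_cases_during_year = 50      # developed the condition during the year

# Incidence counts only new cases, among people who were disease-free at the start.
at_risk = population - existing_cases_at_start
incidence = new_cases_during_year / at_risk

# Point prevalence at year end counts new + pre-existing cases
# (assumes nobody died, recovered, or moved away).
prevalence = (existing_cases_at_start + new_cases_during_year) / population

print(f"one-year incidence: {incidence:.2%}")
print(f"point prevalence at year end: {prevalence:.2%}")
```

Pre-existing cases are removed from the incidence denominator because those people can no longer become new cases.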

15
Q

sensitivity vs specificity image

A
16
Q

sensitivity

A

proportion of people with the disease who test positive on the assessment
conceptualized as the true positive rate

17
Q

sensitivity formula

A

sensitivity = true positives / (true positives + false negatives)

18
Q

high sensitivity

A

correctly identifies a high proportion of people who actually have the disease (few false negatives)

19
Q

sensitivity example

A

Lyme disease screening test with 95% sensitivity would correctly identify 95% of people with Lyme disease

20
Q

specificity

A

defined as the proportion of people without the disease who test negative on the assessment
also conceptualized as the true negative rate

21
Q

specificity formula

A

specificity = true negatives / (true negatives + false positives)

22
Q

high specificity

A

correctly rules out most people who do not have the disease (few false positives)

23
Q

specificity example

A

a cognitive screening test for dementia with 98% specificity would generate few false positive results, correctly identifying 98% of patients without dementia as testing negative
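
The sensitivity and specificity formulas can be sketched in a few lines. The counts below are hypothetical and were picked to reproduce the 95% and 98% figures used in the examples; the function and parameter names are illustrative only.

```python
def sensitivity(true_pos, false_neg):
    """True positive rate: share of people WITH the disease who test positive."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """True negative rate: share of people WITHOUT the disease who test negative."""
    return true_neg / (true_neg + false_pos)

# Hypothetical screening results: 100 people with the disease, 900 without.
sens = sensitivity(true_pos=95, false_neg=5)
spec = specificity(true_neg=882, false_pos=18)

print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```

Each measure uses only one column of the 2x2 table (diseased or non-diseased), which is why neither changes with disease prevalence.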

24
Q

positive predictive value (PPV)

A

defined as the probability that a person with a positive test result truly has the underlying disease

25
positive predictive value depends on
sensitivity, specificity, and disease prevalence
26
formula for positive predictive value
PPV = true positives/(true positives + false positives)
27
high positive predictive value
high probability of reflecting the true presence of disease
28
positive predictive value example
if a suicide risk screening test has a PPV of 90%, then 90% of patients screening positive are truly at high risk for suicide
29
negative predictive value
probability that a person with a negative test result truly does NOT have the underlying disease
30
negative predictive value depends on
sensitivity, specificity, and disease prevalence
31
negative predictive value formula
NPV = true negatives / (true negatives + false negatives)
32
high negative predictive value
a negative result reliably rules out the presence of disease
33
negative predictive value example
if a screening test for CJD has an NPV of 97%, only 3% of patients screening negative actually have CJD (few false negatives among the negative results)
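
To see why PPV and NPV depend on prevalence as well as on sensitivity and specificity, the sketch below recomputes both for the same test at different prevalences. The 95%/98% test characteristics and the prevalence values are hypothetical; the function name is illustrative.

```python
def predictive_values(sens, spec, prevalence):
    """Return (PPV, NPV) for a test applied to a population with the given prevalence."""
    true_pos = sens * prevalence
    false_neg = (1 - sens) * prevalence
    true_neg = spec * (1 - prevalence)
    false_pos = (1 - spec) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Same hypothetical test (95% sensitive, 98% specific) at three prevalences.
for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.95, 0.98, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

As prevalence falls, false positives outnumber true positives and PPV drops, while NPV rises.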
34
case report/series
detailed description of a single clinical case or a small group of cases; mainly descriptive, with no comparison to a control group; used to illustrate unique cases without evidence of causality; generates hypotheses that can be investigated further with better-quality research
35
case report/series example
a report of an individual patient diagnosed with Wilson's disease that describes their symptoms, diagnosis, and treatment response
36
case-control study
compares cases (with an outcome) to controls (without outcome) to identify factors associated with the outcome
37
case-control study design
retrospective design: starts with the outcome and then investigates exposures
38
case-control study design useful for
studying rare diseases or outcomes with long latency periods
39
case-control study primary statistics
odds ratios quantifying the level of association
40
case-control study example
a study comparing the prevalence of chemical exposure at Camp Lejeune between patients diagnosed with Parkinson's disease and healthy controls without the diagnosis
41
cross-sectional study
analyzes the relationship between exposures and outcomes at a single point in time
42
cross-sectional study useful for
estimating disease prevalence and studying multiple outcomes
43
cross-sectional study cannot determine
temporal sequence between exposure and outcome
44
cross-sectional study primary statistics
prevalence ratios/odds ratios
45
cross-sectional study example
a study surveying the prevalence of essential tremor in octogenarians at a single point in time
46
cohort study
follows a population prospectively to quantify outcome risk; groups are defined by exposure status
47
cohort study establishes
temporal relationship between predictors and outcomes
48
cohort study compared to cross-sectional study
more expensive and time-intensive
49
cohort study primary statistics
risk ratios quantifying relative risk
50
cohort study example
a multi-year study following a group of children into adulthood to track rates of diagnosis of multiple sclerosis and to identify predictive factors
51
randomized controlled trial (RCT)
gold-standard experimental study in which participants are randomly allocated to study groups; highest internal validity because randomization minimizes bias
52
randomized controlled trial establishes
causality between intervention and outcome
53
randomized controlled trial primary statistics
risk ratios comparing outcomes between groups
54
randomized controlled trial example
a trial randomly assigning patients with amyotrophic lateral sclerosis to receive either a new medication or placebo, to compare treatment efficacy
55
case report/series advantages
describe unique cases in detail
56
case report/series disadvantages
no control group, limited generalizability, susceptible to bias
57
case report/series statistics used
descriptive only
58
case-control advantages
good for rare outcomes, retrospective
59
case-control disadvantages
prone to bias (recall, selection), doesn't determine individual risk
60
case-control statistics used
odds ratio
61
cross-sectional advantages
easy, provides snapshot of prevalence
62
cross-sectional disadvantages
no temporal relationship between exposure and outcome
63
cross-sectional statistics used
prevalence, chi-square
64
cohort advantages
can determine individual risk and incidence
65
cohort disadvantages
expensive, time-consuming, loss to follow-up
66
cohort statistics used
risk ratios
67
randomized controlled trial advantages
gold standard, minimizes bias
68
randomized controlled trial disadvantages
very expensive, time intensive, may not reflect real world effectiveness
69
randomized controlled trial statistics used
risk ratios, NNT, NNH
70
efficacy trials
measure whether interventions produce the intended result under ideal/controlled conditions; tight inclusion criteria and close monitoring maximize internal validity
71
effectiveness trials
examine whether an intervention works under real-world conditions; broader inclusion criteria and more variability in delivery/adherence; prioritize generalizability and applicability
72
crossover trials
participants receive a sequence of different treatments; useful when the disease course is stable and treatment effects are short-term or reversible; reduces the sample size required; must account for treatment carryover effects
73
naturalistic studies
investigate interventions under routine clinical practice conditions; broad inclusion criteria and less frequent monitoring; findings on effectiveness complement efficacy data
74
twin studies
compare trait frequency between identical and fraternal twins; estimate the genetic component of disease by separating genetic from environmental effects
75
meta-analysis
statistically synthesizes data from multiple smaller studies to gain statistical power; can assess consistency or heterogeneity across studies; at risk for selection or publication bias
76
meta-analysis example
a statistical analysis combining data from multiple studies examining the efficacy of antiplatelets for stroke prevention to determine the overall treatment effect size across trials
77
association studies
correlate genetic variants and other biomarkers to disease states
78
pragmatic trials
emphasize applicability by testing interventions in typical "real world" practice settings with more heterogeneous patients and conditions
79
odds ratio
quantifies the association between an exposure and an outcome, comparing the odds of the outcome occurring in the exposed group to the odds of the outcome occurring in the unexposed group
80
odds ratio formula
OR = (A/B) / (C/D) = AD/BC, where A = cases exposed, B = controls exposed, C = cases unexposed, D = controls unexposed
81
odds ratio interpretation
OR > 1: exposure is associated with increased odds of the outcome; OR < 1: exposure is associated with decreased odds; OR = 1: no association between exposure and outcome
82
odds ratio use
used in case-control studies as a proxy for relative risk; does not provide information about actual risk or incidence and does not imply causation
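
A short sketch of the odds ratio calculation, using the same A/B/C/D cell labels as the formula card. The counts are hypothetical and the function name is illustrative.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table.

    a = cases exposed,   b = controls exposed,
    c = cases unexposed, d = controls unexposed.
    """
    odds_exposed = a / b      # odds of being a case among the exposed
    odds_unexposed = c / d    # odds of being a case among the unexposed
    return odds_exposed / odds_unexposed

# Hypothetical case-control data.
result = odds_ratio(a=30, b=10, c=20, d=40)
print(f"OR = {result}")
```

Here the odds are 3.0 in the exposed and 0.5 in the unexposed, so the exposure is associated with six-fold higher odds of the outcome.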
83
relative risk
compares the risk of an outcome in an exposed group to the risk in an unexposed group; provides information about the actual likelihood of the outcome occurring
84
relative risk formula
RR = incidence in exposed / incidence in unexposed, where incidence in each group = number who develop the disease / total number in that group
85
relative risk interpretation
RR > 1: increased risk in the exposed group; RR < 1: decreased risk in the exposed group; RR = 1: equal risk in both groups
86
relative risk use
directly compares incidence (risk) between exposed and unexposed groups; used frequently in cohort studies
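
A minimal sketch of a relative risk calculation for a cohort, taking the incidence in each group as the number who develop the outcome divided by everyone in that group. The cohort counts are hypothetical and the function name is illustrative.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk ratio: incidence in the exposed divided by incidence in the unexposed."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed

# Hypothetical cohort: 200 exposed and 200 unexposed people followed over time.
rr = relative_risk(exposed_cases=50, exposed_total=200,
                   unexposed_cases=25, unexposed_total=200)
print(f"RR = {rr}")
```

The exposed group's risk (25%) is twice the unexposed group's risk (12.5%), giving RR = 2.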
87
absolute risk reduction (ARR)
the difference in outcome rates between control and experimental groups
88
absolute risk reduction (ARR) formula
ARR = control event rate - experimental event rate
89
absolute risk increase (ARI)
the increase in event rates in the experimental group compared to control
90
absolute risk increase (ARI) formula
ARI = experimental event rate - control event rate
91
absolute risk reduction/increase info
provides a direct measure of the benefit or harm
92
relative risk reduction (RRR)/increase (RRI)
expresses the absolute risk reduction or increase as a percentage of the control (baseline) event rate; makes clinical interpretation of efficacy easier
93
relative risk reduction (RRR)/increase (RRI) formula
RRR = |ARR| x 100 / control event rate; RRI = |ARI| x 100 / control event rate
94
number needed to treat (NNT)/harm (NNH)
number of people needed to treat in order for one additional patient to benefit/experience harm
95
number needed to treat (NNT)/harm (NNH) formula
NNT = 1/ARR; NNH = 1/ARI (with ARR and ARI expressed as proportions, not percentages)
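
The ARR, RRR, and NNT formulas chain together, which a worked example makes easier to follow. This is a sketch with hypothetical trial counts; the function name is illustrative, exact fractions are used to avoid floating-point rounding, and rounding NNT up to a whole patient is a common convention rather than something stated on the cards.

```python
from fractions import Fraction
from math import ceil

def risk_summary(control_events, control_n, treated_events, treated_n):
    """Return (ARR, RRR, NNT) for a trial where treatment lowers the event rate."""
    cer = Fraction(control_events, control_n)   # control event rate
    eer = Fraction(treated_events, treated_n)   # experimental event rate
    arr = cer - eer                             # absolute risk reduction
    rrr = arr / cer                             # relative risk reduction
    nnt = ceil(1 / arr)                         # assumes arr > 0; rounded up
    return arr, rrr, nnt

# Hypothetical trial: 40/200 events on placebo, 30/200 on treatment.
arr, rrr, nnt = risk_summary(40, 200, 30, 200)
print(f"ARR = {float(arr):.1%}, RRR = {float(rrr):.0%}, NNT = {nnt}")
```

A 5-point absolute reduction from a 20% baseline is a 25% relative reduction, and 20 patients must be treated to prevent one event.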
96
hazard ratio
compares the hazard (rate of an event) between groups over time
97
hazard ratio formula
HR = treatment hazard rate / control hazard rate
98
hazard ratio interpretation
HR > 1: increased rate of outcome with exposure; HR < 1: decreased rate of outcome with exposure; HR = 1: no difference in rate between groups
99
hazard ratio use
used in survival analysis
100
attributable risks
used to determine how much disease burden in a population can be attributed to a risk factor
101
attributable risk percent/proportion
proportion of disease in the exposed group attributable to the exposure
102
population attributable risk percent
proportion of disease in the whole population attributable to the exposure
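
The cards give no formulas for the two attributable risk measures. The sketch below uses the standard textbook definitions, AR% = (risk in exposed - risk in unexposed) / risk in exposed and PAR% = (risk in population - risk in unexposed) / risk in population, with hypothetical numbers and illustrative function names.

```python
def attributable_risk_percent(risk_exposed, risk_unexposed):
    """Share of disease among the EXPOSED that is attributable to the exposure."""
    return (risk_exposed - risk_unexposed) / risk_exposed * 100

def population_attributable_risk_percent(risk_population, risk_unexposed):
    """Share of disease in the WHOLE population attributable to the exposure."""
    return (risk_population - risk_unexposed) / risk_population * 100

# Hypothetical cohort: 100 exposed (30 cases) and 100 unexposed (10 cases).
risk_exposed = 30 / 100
risk_unexposed = 10 / 100
risk_population = (30 + 10) / (100 + 100)

ar_pct = attributable_risk_percent(risk_exposed, risk_unexposed)
par_pct = population_attributable_risk_percent(risk_population, risk_unexposed)
print(f"AR% = {ar_pct:.1f}, PAR% = {par_pct:.1f}")
```

PAR% is smaller than AR% here because only half of the population is exposed.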
103
hypothesis testing
in research, statistical analyses evaluate hypotheses about treatment effects; this involves stating a null and an alternative hypothesis
104
null hypothesis (H0)
asserts there is no true difference between groups or no effect of treatment; essentially the "status quo" scenario; the default position unless evidence indicates otherwise
105
null hypothesis (H0) form
"there is no difference between treatment A and B"
106
alternative hypothesis (H1)
what the investigator hopes to demonstrate with study data; asserts there is a true difference or treatment effect; contradicts the null hypothesis
107
alternative hypothesis (H1) form
"there is a difference between treatment A and B"
108
p-value
probability of obtaining results at least as extreme as the observed effect if the null hypothesis is true
109
low p-value
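the observed result would be unlikely if the null hypothesis were true; reject the null hypothesis when p falls below the chosen threshold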
110
typical threshold for p-value
p < 0.05
111
p-value limitation
a statistically significant result does not necessarily imply clinical importance; in large-sample studies, even tiny differences can be statistically significant yet lack clinical significance or practical importance
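
A p-value can be computed exactly in a simple case. The sketch below assumes a hypothetical crossover-style scenario: each of 10 patients tries treatments A and B and states a preference, and under the null hypothesis ("no difference between A and B") each preference is a fair coin flip. The function name is illustrative.

```python
from math import comb

def two_sided_binomial_p(successes, n):
    """Exact two-sided p-value for H0: probability of success = 0.5.

    Counts every outcome at least as far from n/2 as the one observed,
    which is what "at least as extreme" means for this symmetric null.
    """
    distance = abs(successes - n / 2)
    extreme = sum(comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= distance)
    return extreme / 2 ** n

# Hypothetical result: 9 of 10 patients prefer treatment A.
p = two_sided_binomial_p(9, 10)
print(f"p = {p:.4f}")
```

Only 22 of the 1,024 equally likely outcomes (0, 1, 9, or 10 preferences for A) are this lopsided, so p is about 0.021, below the conventional 0.05 alpha level.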
112
p-value schematic
113
confidence intervals
range of values expected to contain the true parameter; help assess clinical significance beyond statistical hypothesis testing
114
confidence interval influenced by the size and variability of the sample
wider intervals -> less precision, less confidence in the observed effect size; narrower intervals -> greater confidence in the point estimate
115
95% CI
if the study were repeated many times, about 95% of the intervals constructed this way would contain the true value
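
A rough sketch of how a confidence interval for a mean is built and how sample size affects its width. It uses the normal approximation (a t-based interval would be somewhat wider for small samples); the measurements and the function name are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_ci(sample, confidence=0.95):
    """Normal-approximation confidence interval for the mean of a sample."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)    # about 1.96 for 95%
    half_width = z * stdev(sample) / sqrt(len(sample))
    center = mean(sample)
    return center - half_width, center + half_width

# Hypothetical measurements.
small_sample = [4.1, 5.0, 5.6, 4.8, 5.3, 4.9, 5.2, 4.7]
large_sample = small_sample * 4    # same spread, four times as many values

for label, sample in (("n=8", small_sample), ("n=32", large_sample)):
    low, high = mean_ci(sample)
    print(f"{label}: 95% CI = ({low:.2f}, {high:.2f})")
```

With the same variability, the larger sample gives a visibly narrower interval, that is, a more precise estimate.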
116
type 1 error
incorrectly concluding a difference/effect is real, when it is not
117
type 1 error equivalent to
false positive result/false rejection of null hypothesis when it was true
118
type 1 error probability
probability determined by alpha level (typically 0.05 or 5%)
119
type 2 error
failing to detect a true effect or difference -> false negative finding; concluding there is no effect when one actually exists
120
type 2 error determined by
the power of the study (type 2 error probability beta = 1 - power), which depends on sample size, effect size, variability, and the alpha level
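
The link between sample size, power, and type 2 error can be sketched for the simplest case, a two-sided one-sample z-test. The standardized effect size of 0.5 and the sample sizes are hypothetical, and the function name is illustrative; real studies usually need a power calculation matched to their actual design.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test.

    effect_size is the true mean difference divided by the standard deviation.
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n)    # where the test statistic is centered under H1
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)

for n in (10, 30, 50):
    power = z_test_power(0.5, n)
    print(f"n={n}: power = {power:.2f}, type 2 error (beta) = {1 - power:.2f}")
```

When the true effect is zero, the function returns alpha itself, which is the type 1 error rate.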
121
type 3 error
asking the wrong research question entirely; yields no meaningful answer regardless of the statistical findings; may reflect a flawed study design or mismatched hypotheses
122
regression analysis
models the relationship between one or more independent (predictor) variables and a dependent (outcome) variable
123
regression analysis determines
how strongly or weakly one variable predicts or influences another
124
regression analysis quantifies
the effect size for each predictor
125
regression analysis example
an analysis that would test which factors are significantly associated with a higher or lower likelihood of developing Guillain-Barre syndrome, after controlling for other variables
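
For intuition, here is a minimal least-squares sketch with a single predictor. It is a simplification: the example on the card involves several predictors and would need multiple (or logistic) regression, which this does not implement. The data and function name are hypothetical.

```python
from statistics import mean

def simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx                    # effect size: change in y per unit of x
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical data lying exactly on the line y = 1 + 2x.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
slope, intercept = simple_linear_regression(x, y)
print(f"slope = {slope}, intercept = {intercept}")
```

The slope is the quantity a regression reports as the predictor's effect size.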
126
chi-square test
compares observed and expected frequencies between categorical variables
127
chi-square test determines
whether the observed differences are larger than would be expected from chance alone
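
A sketch of the chi-square calculation for a 2x2 table, comparing observed counts with the counts expected if the two variables were unrelated. The table is hypothetical, no continuity correction is applied, and the p-value shortcut used here is valid only for 1 degree of freedom.

```python
from math import erfc, sqrt

def chi_square_2x2(table):
    """Return (chi-square statistic, p-value) for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    p_value = erfc(sqrt(chi2 / 2))    # upper tail of chi-square with 1 df
    return chi2, p_value

# Hypothetical table: rows = exposed / unexposed, columns = disease / no disease.
chi2, p_value = chi_square_2x2([[30, 10], [20, 40]])
print(f"chi-square = {chi2:.2f}, p = {p_value:.5f}")
```

A table whose rows have identical proportions gives a statistic of 0 and a p-value of 1.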
128
T-test
compares means between two groups; can be paired or independent samples
129
T-test determines
whether the difference between the two group means is statistically significant, i.e., unlikely to be due to chance alone
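
The difference between independent and paired t-tests shows up in how the test statistic is formed. The sketch below computes only the t statistics (turning them into p-values requires the t distribution, which is omitted here). Welch's unequal-variance form is used for the independent case as one common choice; the data and function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(group_a, group_b):
    """t statistic for two independent samples (Welch's unequal-variance form)."""
    standard_error = sqrt(variance(group_a) / len(group_a)
                          + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / standard_error

def paired_t(before, after):
    """t statistic for paired samples, based on within-pair differences."""
    diffs = [a - b for a, b in zip(after, before)]
    return mean(diffs) / sqrt(variance(diffs) / len(diffs))

# Hypothetical scores.
treated = [5.1, 4.9, 5.6, 5.8, 5.3]
control = [4.2, 4.8, 4.5, 4.1, 4.6]
print(f"independent samples: t = {welch_t(treated, control):.2f}")

before = [10, 12, 9, 11]
after = [12, 13, 11, 14]
print(f"paired samples: t = {paired_t(before, after):.2f}")
```

The paired version removes between-subject variability by analyzing each subject's own change.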
130
ANOVA
compares means across more than two groups
131
ANOVA determines
whether at least one group mean differs from the others (tests the null hypothesis that all group means are equal)
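
A sketch of the one-way ANOVA F statistic, which compares variability between group means with variability within groups. The three groups are hypothetical, the function name is illustrative, and converting F to a p-value (which needs the F distribution) is left out.

```python
from statistics import mean

def one_way_anova_f(groups):
    """F statistic: between-group mean square over within-group mean square."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    k = len(groups)         # number of groups
    n = len(all_values)     # total number of observations
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical outcome scores in three treatment groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_anova_f(groups)
print(f"F = {f_stat:.1f}")
```

A large F means the group means differ by more than the spread within groups would explain.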
132
two-way ANOVA
determines the effects of two independent categorical variables and any interaction between those variables