Biostatistics and Research Design Flashcards

(105 cards)

1
Q

what test to run?

comparing blood pressure in patients before and after taking their meds

A

paired t-test
DV: BP (continuous)
IV: before/after meds (2 paired observations)
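
A minimal sketch of the paired t statistic. It uses made-up BP readings and a hypothetical helper `paired_t_statistic`. The test works on each patient's before/after difference, so between-patient variability drops out. The sketch stops at the t statistic. The p value would come from a t distribution with n - 1 degrees of freedom, for example via `scipy.stats.ttest_rel`.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Hypothetical helper: (t statistic, degrees of freedom) for a paired t-test."""
    if len(before) != len(after):
        raise ValueError("paired data needs one 'after' value per 'before' value")
    # Analyze within-patient differences, not the two columns separately
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    se_diff = statistics.stdev(diffs) / math.sqrt(n)  # sample SD of differences / sqrt(n)
    return statistics.mean(diffs) / se_diff, n - 1


# Made-up systolic BP (mmHg) for the same 5 patients before and after meds
before = [150, 142, 160, 155, 148]
after = [140, 138, 150, 149, 141]
t, df = paired_t_statistic(before, after)
print(f"t = {t:.2f}, df = {df}")
```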

2
Q

what test to run?

comparing blood pressure values with patients on vs off their medication

A

2-sample t-test
DV: BP (continuous)
IV: on/off medication (2 samples, binary)
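
In contrast to card 1, here the two groups contain different patients. A minimal sketch of Student's 2-sample t statistic follows. It uses made-up data and a hypothetical helper `two_sample_t_statistic`, and it assumes equal variances so it can pool them. Welch's version relaxes that assumption; it is available as `scipy.stats.ttest_ind(..., equal_var=False)`.

```python
import math
import statistics


def two_sample_t_statistic(group_a, group_b):
    """Hypothetical helper: Student's 2-sample t statistic with pooled (equal) variance."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / se
    return t, n_a + n_b - 2


# Made-up systolic BP (mmHg) in two different groups of patients
on_meds = [128, 132, 125, 130]
off_meds = [145, 150, 140, 148]
t, df = two_sample_t_statistic(on_meds, off_meds)
print(f"t = {t:.2f}, df = {df}")
```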

3
Q

what test to run?

comparing length of hospital stay to age

A

correlation
DV: length of stay (continuous)
IV: age (continuous)
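
A minimal sketch of Pearson's correlation coefficient r, written with a hypothetical helper `pearson_r` and made-up age/length-of-stay data. r runs from -1 to 1 and measures only linear association, not causation. Python 3.10+ also ships `statistics.correlation` for the same calculation.

```python
import math


def pearson_r(x, y):
    """Hypothetical helper: Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # How x and y vary together around their means
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    spread_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x))
    spread_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y))
    return cov / (spread_x * spread_y)


# Made-up data: patient age (years) and length of stay (days)
age = [30, 45, 60, 75, 90]
length_of_stay = [2, 3, 5, 6, 9]
print(round(pearson_r(age, length_of_stay), 3))
```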

4
Q

what test to run?

comparing length of hospital stay to age and also mobility

A

linear regression
DV: length of stay (continuous)
IV: age (continuous) and mobility (confounding factor)

5
Q

what test to run?

comparing mortality of hospital patients by age and also mobility

A

logistic regression
DV: mortality (binary)
IV: age (continuous) and mobility (confounding variable)

6
Q

what test to run?

comparing mortality of patients with high vs low blood glucose

A

chi-squared (2x2 table)
DV: mortality (binary)
IV: high vs low glucose (binary)

7
Q

what test to run?

comparing patient Hgb A1c levels across 3 types of diet

A

ANOVA
DV: Hgb A1c (continuous)
IV: type of diet (more than 2 samples)
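
A minimal sketch of the one-way ANOVA F statistic. It uses made-up Hgb A1c values and a hypothetical helper `one_way_anova_f`. F compares between-diet variation to within-diet variation. A large F only says that at least one diet mean differs; post-hoc tests are needed to say which. The p value comes from an F distribution, for example via `scipy.stats.f_oneway`.

```python
import statistics


def one_way_anova_f(*groups):
    """Hypothetical helper: (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    group_means = [statistics.mean(g) for g in groups]
    # Between groups: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within groups: spread of values around their own group mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Made-up Hgb A1c (%) values for patients on 3 diets
diet_a = [7.0, 7.2, 6.8]
diet_b = [6.5, 6.4, 6.6]
diet_c = [7.5, 7.8, 7.6]
f, df_b, df_w = one_way_anova_f(diet_a, diet_b, diet_c)
print(f"F = {f:.1f} on ({df_b}, {df_w}) df")
```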

8
Q

what test to run?
DV: binary
IV: binary

A

chi-squared (2x2 table)

9
Q

what test to run?
DV: binary
IV: continuous or categorical/binary + confounding variables

A

logistic regression

10
Q

what test to run?
DV: continuous
IV: 2 paired observations

A

paired t-test

11
Q

What test to run?
DV: continuous
IV: 2 samples (binary)

A

2-sample t-test

12
Q

what test to run?
DV: continuous
IV: continuous

A

correlation

13
Q

what test to run?
DV: continuous
IV: continuous or categorical/binary + confounding variables

A

linear regression

14
Q

what test to run?
DV: continuous
IV: more than 2 samples

A

ANOVA
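
The decision rules in cards 8-14 fit in a small lookup table. The sketch below uses a hypothetical helper `choose_test` whose category labels are made up for illustration. It covers only the combinations on these cards and raises an error for anything else, rather than guessing.

```python
def choose_test(dv, iv, adjust_for_confounders=False):
    """Hypothetical helper encoding cards 8-14.

    dv: "continuous" or "binary"
    iv: "paired" (2 paired observations), "binary" (2 independent samples),
        "3plus_groups" (more than 2 samples), or "continuous"
    adjust_for_confounders: True when confounding variables must also be modeled
    """
    if adjust_for_confounders:
        # Regression handles continuous or categorical IVs plus confounders
        regression = {"continuous": "linear regression", "binary": "logistic regression"}
        return regression[dv]
    rules = {
        ("continuous", "paired"): "paired t-test",
        ("continuous", "binary"): "2-sample t-test",
        ("continuous", "3plus_groups"): "ANOVA",
        ("continuous", "continuous"): "correlation",
        ("binary", "binary"): "chi-squared (2x2 table)",
    }
    if (dv, iv) not in rules:
        raise ValueError(f"no rule on these cards for DV={dv!r}, IV={iv!r}")
    return rules[(dv, iv)]


print(choose_test("continuous", "paired"))  # card 1: BP before/after meds
print(choose_test("binary", "continuous", adjust_for_confounders=True))  # card 5
```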

15
Q

type 1 error

A

false positive - reject the null hypothesis (detect a difference) when the null hypothesis is true

alpha: probability of false positive

16
Q

type 2 error

A

false negative - fail to reject null hypothesis when there is truly a difference

beta: probability of false negative

17
Q

how to calculate power?

A

1 - beta
aka
1 - the probability of a false negative

18
Q

what is the probability of a true negative in relation to alpha,
and the probability of a true positive in relation to beta?

A

probability of true negative = 1 - alpha (probability of false positive)

probability of true positive = 1 - beta (probability of false negative)

*false positive = type 1 error, false negative = type 2 error

19
Q

what does the p value represent?

A

probability of obtaining a result at least as extreme as the one observed (further out in the tails of the bell curve), assuming the null hypothesis is true
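
A minimal sketch of that definition for a z-test, using a hypothetical helper `two_sided_p_from_z`. It assumes the test statistic follows a standard normal distribution when the null hypothesis is true. The p value is then the area in both tails beyond |z|. Tests based on the t or chi-squared distributions follow the same idea with a different curve.

```python
from statistics import NormalDist


def two_sided_p_from_z(z):
    """Hypothetical helper: P(statistic at least as extreme as |z|, either tail) under H0."""
    upper_tail = 1 - NormalDist().cdf(abs(z))
    return 2 * upper_tail


for z in (1.0, 1.96, 2.58):
    print(z, round(two_sided_p_from_z(z), 3))
```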

20
Q

a study result is “statistically significant” if the p value is below a prespecified level of ____

A

alpha - probability of false positive

alpha cutoff is usually 0.05

21
Q

what is meant by power? what level of power is considered adequate? when is it especially important?

A

the probability of rejecting the null hypothesis (obtaining a p value below alpha, usually 0.05) when a true difference exists

power should be at least 0.80 (80% chance of rejecting the null hypothesis if a difference truly exists)

power matters most when an experiment fails to reject the null hypothesis: a low-powered study may simply have missed a meaningful difference that actually exists. A study with low power can still be statistically significant if p < 0.05
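
A minimal sketch of how power depends on effect size, variability, and sample size. It uses a hypothetical helper `approx_power_two_means` and a made-up BP scenario. It assumes a two-sided, two-sample z-test with a known common SD (a normal approximation). Exact t-based calculations give slightly different numbers.

```python
import math
from statistics import NormalDist


def approx_power_two_means(delta, sd, n_per_group, alpha=0.05):
    """Hypothetical helper: approximate power of a two-sided, two-sample z-test.

    delta: true difference in means to detect; sd: common SD (assumed known)
    """
    z = NormalDist()
    se = sd * math.sqrt(2 / n_per_group)  # SE of the difference in means
    z_crit = z.inv_cdf(1 - alpha / 2)     # about 1.96 when alpha = 0.05
    shift = abs(delta) / se               # where the statistic centers if the difference is real
    # Power = chance the statistic lands beyond either critical value
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


# Made-up scenario: detect a 5 mmHg BP difference when SD = 10 mmHg
for n in (20, 64, 100):
    print(n, round(approx_power_two_means(5, 10, n), 2))
```

With no true difference (delta = 0), the "power" collapses to alpha, the false-positive rate.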

22
Q

give an example of when negative studies might be used (looking for evidence to support null hypothesis)

A

testing side effects of a new drug compared to a conventionally used drug, hoping the new drug will not cause more adverse effects than the current drug

23
Q

contrast nominal categorical data to ordinal categorical data

A

nominal data: categories with no hierarchy (ex - ethnicity)

ordinal data: data has no numeric assignment, but rather falls into specific buckets with some rank or order (ex - level of schooling)

24
Q

define:
Bernoulli distribution
log-normal distribution
binomial distribution

A

Bernoulli distribution: proportions expected in a single binary outcome (think pie chart with 2 options), ex: infected v uninfected

log-normal distribution: continuous data that cannot be negative and whose logarithm is normally distributed (think right-skewed bell curve), ex: income

binomial distribution: counting up the "successes" across a fixed number of independent binary (Bernoulli) trials (think bell curve made of bars), ex: number of positive tests per batch of 100 tests
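
A minimal sketch of the Bernoulli and binomial probability functions. It uses hypothetical helpers `bernoulli_pmf` and `binomial_pmf` and a made-up 5% positive rate per test. The binomial version counts how many of n independent Bernoulli trials come out positive.

```python
import math


def bernoulli_pmf(k, p):
    """Hypothetical helper: P(X = k) for one binary trial with success probability p."""
    if k not in (0, 1):
        raise ValueError("a Bernoulli outcome is 0 or 1")
    return p if k == 1 else 1 - p


def binomial_pmf(k, n, p):
    """Hypothetical helper: P(exactly k successes in n independent Bernoulli(p) trials)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


# Made-up example: each test in a batch of 100 is positive with probability 0.05
print(round(bernoulli_pmf(1, 0.05), 2))      # one test positive
print(round(binomial_pmf(5, 100, 0.05), 3))  # exactly 5 positives in the batch
```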

25
if mean > median, what direction is the data skewed?
RIGHT (positive)
26
if mean < median, what direction is the data skewed?
LEFT (negative - tail towards left)
27
what is the golden rule of standard deviation?
68-95-99.7 (for normally distributed data): about 68% of data lies within 1 SD of the mean, about 95% within 2 SD, and about 99.7% within 3 SD
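
The rule can be checked against the normal curve itself. This sketch uses a hypothetical helper `share_within_k_sd` built on Python's `statistics.NormalDist`. Note that the exact 95% band is ±1.96 SD, so "2 SD = 95%" is a rounding.

```python
from statistics import NormalDist


def share_within_k_sd(k):
    """Hypothetical helper: fraction of a normal distribution within ±k SD of the mean."""
    standard_normal = NormalDist(mu=0, sigma=1)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within ±{k} SD: {share_within_k_sd(k):.1%}")
```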
28
what does a Z distribution show? what is the z cutoff for 95% CI?
the standard normal distribution: a bell curve in which each point on the x-axis is the number of SDs away from the mean (z score); z = 1.96 is the cutoff for a 95% CI
29
how does sample size and variance affect CI?
when n (sample size) increases, the CI gets narrower (more precise estimate); when variance (SD) increases, the CI gets wider (less precise estimate)
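
A minimal sketch of a 95% CI for a mean, mean ± 1.96 × SD / √n. It uses a hypothetical helper `mean_ci_95` and a made-up mean BP. The SD sits in the numerator and √n in the denominator, so the width grows with SD and shrinks with n. The sketch assumes a normal approximation; small samples would use a t critical value instead of 1.96.

```python
import math
from statistics import NormalDist


def mean_ci_95(mean, sd, n):
    """Hypothetical helper: normal-approximation 95% CI, mean ± z * SD / sqrt(n)."""
    z = NormalDist().inv_cdf(0.975)  # about 1.96, leaving 2.5% in each tail
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


# Made-up mean systolic BP of 120 mmHg
for n, sd in ((25, 15), (100, 15), (100, 30)):
    low, high = mean_ci_95(120, sd, n)
    print(f"n={n}, SD={sd}: {low:.1f} to {high:.1f}")
```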
30
when comparing 2 binary variables (2x2 contingency table), when should you use chi-squared statistic vs fisher’s exact test?
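chi-squared statistic: assumes n is large; Fisher’s exact test: small samples, i.e., any expected cell count below the cutoff (commonly < 5; stricter cutoffs such as < 10 are also used)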
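
Fisher's exact test avoids the large-sample approximation. It keeps the table's row and column totals fixed and computes exact (hypergeometric) table probabilities. Below is a minimal sketch with a hypothetical helper `fisher_exact_two_sided` and made-up counts. It uses one common two-sided definition: add up the probabilities of every table that is no more likely than the observed one. For real analyses, `scipy.stats.fisher_exact` exists.

```python
import math


def fisher_exact_two_sided(a, b, c, d):
    """Hypothetical helper: two-sided Fisher's exact p value for [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def table_prob(x):
        # Hypergeometric probability that the top-left cell is x, with all margins fixed
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    p_observed = table_prob(a)
    smallest, largest = max(0, col1 - row2), min(row1, col1)
    # Sum every possible table no more likely than the observed one
    # (the small tolerance keeps floating-point ties from being dropped)
    return sum(
        p for p in (table_prob(x) for x in range(smallest, largest + 1))
        if p <= p_observed * (1 + 1e-7)
    )


# Made-up small table: rows = high/low glucose, columns = died/survived
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))
```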
31
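what is ordinary least squares (OLS) in linear regression?
the method that fits the line (slope and intercept) with the least error, i.e., the smallest sum of squared vertical distances (residuals) between observed and predicted DV values, for a continuous IV and a continuous DV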
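
For a single predictor, OLS has a closed form. The slope is the sum of cross-deviations divided by the sum of squared x-deviations, and the line passes through the two means. This is a minimal sketch with hypothetical helpers `ols_fit` and `sum_squared_residuals` and made-up data. Several predictors, as in card 4 (age plus mobility), need the matrix form, for example `numpy.linalg.lstsq`.

```python
def ols_fit(x, y):
    """Hypothetical helper: simple linear regression by OLS, returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx                    # minimizes the sum of squared residuals
    intercept = mean_y - slope * mean_x  # fitted line passes through (mean_x, mean_y)
    return slope, intercept


def sum_squared_residuals(x, y, slope, intercept):
    """Hypothetical helper: the 'error' OLS minimizes (squared vertical gaps to the line)."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Made-up data: age (years) vs length of stay (days)
age = [30, 45, 60, 75, 90]
stay = [2, 3, 5, 6, 9]
slope, intercept = ols_fit(age, stay)
print(f"stay ≈ {intercept:.2f} + {slope:.4f} × age")
```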
32
contrast absolute risk difference with risk ratio (relative risk, rate ratio, RR)
absolute risk difference: probability 1 - probability 2; risk ratio: probability 1 / probability 2
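
A minimal sketch showing subtraction vs division on the same made-up trial. It uses a hypothetical helper `risk_comparison`. The helper also reports RR - 1, the relative change that card 60 uses.

```python
def risk_comparison(events_1, n_1, events_2, n_2):
    """Hypothetical helper: compare the risk of an outcome in group 1 vs group 2."""
    risk_1, risk_2 = events_1 / n_1, events_2 / n_2
    risk_ratio = risk_1 / risk_2
    return {
        "absolute_risk_difference": risk_1 - risk_2,  # subtraction
        "risk_ratio": risk_ratio,                      # division
        "relative_risk_change": risk_ratio - 1,        # -0.5 means a 50% relative reduction
    }


# Made-up trial: 10/100 treated patients vs 20/100 controls have the outcome
print(risk_comparison(10, 100, 20, 100))
```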
33
what does a larger chi-squared (x^2) signify?
a larger x^2 signifies a larger difference between expected and observed counts, i.e., the data fit the no-difference (null) assumption worse; each x^2 value (for a given number of degrees of freedom) corresponds to a p value
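
A minimal sketch of the Pearson chi-squared statistic for a 2x2 table. It uses a hypothetical helper `chi_squared_2x2` and made-up mortality counts. Each cell adds (observed - expected)² / expected. The p value uses the fact that a chi-squared with 1 degree of freedom is a squared standard normal. No continuity correction is applied here. Note that `scipy.stats.chi2_contingency` applies Yates' correction to 2x2 tables by default, so its result would differ.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Hypothetical helper: Pearson chi-squared (no continuity correction) and p value."""
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if rows and columns were unrelated
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # 2x2 table -> 1 degree of freedom; upper tail via the complementary error function
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Made-up data: high glucose 30 died / 70 survived; low glucose 15 died / 85 survived
chi2, p = chi_squared_2x2(30, 70, 15, 85)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```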
34
describe the Hawthorne Effect
people change their behavior because they are in a study; a form of subject bias; minimize via subject blinding (placebo)
35
give 4 points during an RCT at which blinding can occur
1. treatment allocation (randomization)
2. patient blinding (placebo)
3. clinician blinding (blind to which treatment is being provided)
4. outcome assessor blinding (investigator assessing the outcome is blinded to group assignment)
36
describe intention to treat vs explanatory/ “as treated” protocol and why this difference is important
intention to treat: analysis of outcome according to the treatment assigned, regardless of drop-out or non-compliance → preserves randomization
“as treated”: analysis of outcome according to the treatment actually received, irrespective of the group subjects were originally assigned to
37
describe the difference between “as treated” protocol and “per-protocol” in RCT
“as treated”: does not remove non-compliant subjects; “per-protocol”: removes non-compliant subjects from the results; in both cases, outcomes are analyzed according to the treatment actually received, regardless of original group assignment
38
describe these variations on RCT: parallel, stratified, crossover, cluster, non-inferiority
parallel: classic design, intervention group v control group
stratified: randomization within strata when a variable will likely have a large influence on outcome (such as stage of cancer, wealth, etc)
crossover: subjects receive the intervention, then the control after a wash-out period (or vice versa), so they serve as their own controls, which reduces variability and increases power; doesn’t work when the order matters or the intervention has a lasting effect
cluster: randomize care systems rather than individual patients (ex - testing 2 different disinfectants across a number of different emergency rooms); does not work if individual consent is required
non-inferiority: tests whether a new treatment is not unacceptably less effective than the current treatment; used when the new treatment is more favorable in other ways (cost, convenience, availability, etc)
39
what is therapeutic equipoise
genuine uncertainty (in the expert community) about which treatment is better; the ethical prerequisite for randomizing patients in an RCT
40
describe secondary, composite, and surrogate outcomes
secondary: outcomes of interest other than the primary outcome, ideally also designated a priori; require a more stringent p value to be considered significant (otherwise you can find a difference anywhere if you look hard enough)
composite: combining multiple outcomes into one (ex: a OR b OR c; experiencing any of these counts as the primary outcome event)
surrogate: an intermediate measure, often a lab value, that doesn’t necessarily reflect the patient’s experience or quality of life
41
describe internal vs external validity
internal validity: how well the results reflect reality for the patients in the study; external validity = generalizability (to patients outside the study)
42
describe the features of cohort studies, outcomes, and key strength
observational, from exposure to disease; exposures can be beneficial or harmful; can be prospective (“will x exposure cause disease?”) or retrospective (“did x exposure cause disease?”); outcomes: absolute risk (aka incidence), absolute risk difference (subtraction), and relative risk (risk ratio); key strength: best way to look at prognosis and incidence
43
“big data” studies are a subtype of what kind of study? | what are the strengths and weaknesses?
subtype of retrospective cohort study; strengths: can use very large populations, most practical method for looking for rare side effects/outcomes; weaknesses: data often gathered for purposes other than medicine (billing, etc), and important outcome data may not have been collected (a weakness of all retrospective cohort studies)
44
describe features and outcomes of case-control studies
working backwards to explore possible causes of a developed disease (“what exposure may have caused x disease?”), can be hypothesis generating; cases: newly incident cases of the outcome; controls: persons without the outcome who “had the opportunity” to be exposed; outcome measure: odds ratio
45
contrast retrospective cohort study to case control study
retrospective cohort: “did x exposure cause disease?”, outcomes: absolute risk (incidence), absolute risk difference, relative risk (risk ratio); case-control: “what exposure may have caused x disease?”, outcome: odds ratio
46
describe case series study and its strength
case-control without the controls (there may be no way to know who had the opportunity to be exposed); best for describing emergent diseases, rare outcomes, odd exposures
47
describe features of cross-sectional study and its strength
snapshot of individuals in a particular population at a particular time; best way to determine prevalence
48
describe features of ecological study
aka correlational study; snapshot of a population and/or environment; mostly hypothesis generating
49
what type of study is best for examining prognosis and incidence?
cohort study - follow group from exposure to disease
50
what type of study is most practical for looking for rare side effects?
“Big Data” studies
51
what type of study is best for describing emergent diseases, rare outcomes, and odd exposures
case series: case-control without the controls (appropriate controls cannot be determined)
52
what type of study is best for determining prevalence?
cross-sectional
53
lead-time bias (observational studies)
earlier detection (e.g., by screening) increases apparent survival time from diagnosis without affecting the course of the disease or the time of death
54
ascertainment bias
aka measurement bias: the way data is collected makes it more likely to include some members of a population than others, such as more intense surveillance or screening among exposed individuals than among unexposed individuals
55
what type of bias is present here? you are conducting cross-sectional study of last year’s urology clinic patients to determine the ability of elevated blood PSA to pick out patients with biopsy-proven prostate cancer
ascertainment/ measurement bias - people with high PSA are more likely to get the biopsy
56
what type of bias is present here? you are conducting case-control study of recently-diagnosed liver cancer patients compared to community controls, surveying them regarding if they ever used herbal supplements
recall bias - patients with liver cancer are probably thinking about what may have led to developing cancer and might remember their past supplement use better than the average control
57
describe modeling studies and the benefit of using modeling studies
incorporates all pertinent information about a medical decision into a computer model that simulates multiple patients; outcomes: overall mortality, cause-specific morbidity or mortality, utility (ex - QALYs or DALYs), cost; helpful for decision analysis of complicated decisions or for cost-effectiveness/cost-benefit analysis (cost-effectiveness compares cost to outcomes; cost-benefit represents all outcomes in money terms)
58
incidence vs prevalence (in ratio terms)
incidence: probability that unaffected people will develop the disease during a specific time period → new cases during the period / unaffected at-risk people at the beginning of the period; prevalence: proportion of people in a population who have the disease at a given time → affected persons / all persons
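
A minimal sketch of the two ratios, using hypothetical helpers `incidence` and `prevalence` and a made-up town. The key difference is the denominator: incidence counts only people who could still become new cases.

```python
def incidence(new_cases, at_risk_at_start):
    """Hypothetical helper: new cases during a period / people at risk at its start."""
    return new_cases / at_risk_at_start


def prevalence(current_cases, population):
    """Hypothetical helper: people with the disease at one time point / everyone."""
    return current_cases / population


# Made-up town of 10,000 people; 500 already have the disease on January 1
population, existing_cases = 10_000, 500
at_risk = population - existing_cases  # existing cases cannot become new cases
print(f"prevalence on Jan 1: {prevalence(existing_cases, population):.1%}")
print(f"1-year incidence: {incidence(190, at_risk):.1%}")  # 190 new cases during the year
```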
59
how do chronic illness, incidence, and migration of people affect prevalence?
chronic illness increases prevalence, even if incidence is small; an increase/decrease in incidence causes the same in prevalence (in acute illnesses, incidence and prevalence tend to track each other, which is not the case for chronic illness); in-migration can increase prevalence, out-migration can decrease it
60
what is the relative risk difference for a RR of 1.62? what about for RR 5?
relative risk difference = | RR - 1 |; RR 1.62 → 62% relative risk increase; RR 5 → 400% relative risk increase
61
why is absolute risk more important for a patient than relative risk?
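absolute risk shows the actual change in the patient’s probability of the outcome, which makes risks and benefits sound smaller; relative risk makes risks or benefits sound bigger, especially when the baseline risk is small (ex: a drop from 2% to 1% is a 50% relative reduction but only a 1 percentage point absolute reduction)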
62
convert probabilities of 1%, 25%, and 80% into odds
1% → 1:99; 25% = 1/4 → 1:3; 80% = 4/5 → 4:1 (odds = p / (1 - p))
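
The conversion odds = p / (1 - p) and its inverse can be sketched with exact fractions, so 1% comes out as exactly 1:99. The helpers `probability_to_odds`, `odds_to_probability` and `as_odds_string` are hypothetical. Probabilities are passed as strings so they are not rounded as floats first.

```python
from fractions import Fraction


def probability_to_odds(p):
    """Hypothetical helper: odds = p / (1 - p); pass p as a string like "0.25"."""
    p = Fraction(p)
    return p / (1 - p)


def odds_to_probability(odds):
    """Hypothetical helper: probability = odds / (1 + odds)."""
    odds = Fraction(odds)
    return odds / (1 + odds)


def as_odds_string(odds):
    """Hypothetical helper: format an exact fraction as 'a:b'."""
    return f"{odds.numerator}:{odds.denominator}"


for p in ("0.01", "0.25", "0.8"):
    print(p, "->", as_odds_string(probability_to_odds(p)))
```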
63
contrast NNT to absolute risk decrease
NNT = # people treated / one additional good outcome; absolute risk decrease = additional good outcomes / # of people treated; NNT is the INVERSE of absolute risk decrease (same goes for NNH and absolute risk increase)
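
A minimal sketch of NNT = 1 / absolute risk reduction, using a hypothetical helper `nnt_from_arr`. By convention, a fractional NNT is rounded up to a whole number of patients. The helper rounds off floating-point noise first, so that 0.20 - 0.15 still gives 20 rather than 21.

```python
import math


def nnt_from_arr(absolute_risk_reduction):
    """Hypothetical helper: NNT = 1 / ARR, rounded up to a whole number of patients."""
    if absolute_risk_reduction <= 0:
        raise ValueError("no risk reduction; for harm use 1 / absolute risk increase (NNH)")
    raw = 1 / absolute_risk_reduction
    # Strip float noise (e.g. 19.999999999999993) before rounding up
    return math.ceil(round(raw, 6))


print(nnt_from_arr(0.25))         # ARR of 25%
print(nnt_from_arr(0.20 - 0.15))  # made-up risks: control 20%, treated 15%
```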
64
what are Bradford Hill’s criteria for causation
1. strength
2. dose-response
3. specificity
4. alternative explanations (have been considered)
5. temporality (cause → effect) ***
6. reversibility
7. consistency
8. plausibility/coherence
9. analogy
65
which of the following are descriptive (hypothesis generating) or analytic (hypothesis testing) studies? case reports, case-control studies, retrospective cohort studies, case series, prospective cohort studies, cross-sectional studies, ecological studies, RCT, meta-analysis
descriptive/hypothesis generating: 1. case reports 2. case series 3. cross-sectional 4. ecological; analytic/hypothesis testing: 1. case-control 2. retrospective and prospective cohort studies 3. RCT 4. meta-analysis
66
what is ecologic fallacy
ascribing relationships observed for groups (in ecological studies) to individual members
67
which of these variables is influenced by population or patient: specificity, pre-test probability, sensitivity
pre-test probability: estimated probability of disease before you do the test, nearly synonymous with prevalence; specificity and sensitivity should not vary with population
68
NNT/NNH is the inverse of what?
NNT is the inverse of ARR (absolute risk reduction); NNH is the inverse of ARI (absolute risk increase)
69
given an absolute risk reduction (ARR) of 25%, what is the NNT?
NNT is the inverse of ARR: 1 / 0.25 = 4
70
describe predictive value and how it differs from sensitivity
predictive value: how likely is it that this test result is correct? PPV: among everyone with a positive test, what percent have the disease? NPV: among everyone with a negative test, what percent do not have the disease? sensitivity: # true positives / all with disease; PPV: # true positives / all positives
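
All four measures come from the same 2x2 table. The difference is which total you divide by: the disease column for sensitivity and specificity, the test-result row for PPV and NPV. A minimal sketch with a hypothetical helper `diagnostic_metrics` and made-up counts follows.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Hypothetical helper: sensitivity, specificity, PPV, NPV from a diagnostic 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives / all with disease
        "specificity": tn / (tn + fp),  # true negatives / all without disease
        "ppv": tp / (tp + fp),          # true positives / all positive tests
        "npv": tn / (tn + fn),          # true negatives / all negative tests
    }


# Made-up results for 1,000 people, 100 of whom have the disease
for name, value in diagnostic_metrics(tp=90, fp=45, fn=10, tn=855).items():
    print(f"{name}: {value:.3f}")
```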
71
pre-test probability (prevalence) alters predictive value of test -T or F
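TRUE: the same test has a lower PPV (and higher NPV) in a low-prevalence population than in a high-prevalence one, and prevalence differs between populations, for example when one population is screened beforehand (ex: HIV testing in blood donors)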
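
Holding sensitivity and specificity fixed and varying only prevalence shows the effect. A minimal sketch with Bayes' theorem, using a hypothetical helper `predictive_values` and a made-up test that is 99% sensitive and 99% specific:

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Hypothetical helper: (PPV, NPV) from test characteristics and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


# Same made-up test in populations with different prevalence
for prev in (0.001, 0.02, 0.20):
    ppv, npv = predictive_values(0.99, 0.99, prev)
    print(f"prevalence {prev:.1%}: PPV {ppv:.1%}, NPV {npv:.2%}")
```

At very low prevalence, most positives are false positives even with an excellent test.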
72
how do you express likelihood ratio for a positive and negative test?
LR+: likelihood of + test in diseased / likelihood of + test in non-diseased; LR-: likelihood of - test in diseased / likelihood of - test in non-diseased
73
contrast the ratios of LR+, sensitivity, and PPV
LR+ = likelihood of + test in diseased / likelihood of + test in non-diseased; sensitivity = # true positives / everyone with disease; PPV = # true positives / # all positives
74
how does LR+ relate to sensitivity and specificity?
LR+ = likelihood of + test in diseased / likelihood of + test in non-diseased; this is the same as saying LR+ = sensitivity / (1 - specificity), aka LR+ = (true positives out of everyone with disease) / (1 - [true negatives out of everyone without disease])
75
how does LR- relate to sensitivity and specificity
LR- = likelihood of - test in diseased / likelihood of - test in non-diseased, aka LR- = (1 - sensitivity) / specificity, aka LR- = (1 - [true positives out of everyone with disease]) / (true negatives out of everyone without disease)
76
what does a LR of 1 represent?
a likelihood ratio of 1 means that a test result is equally likely whether or not the patient has the disease, so the test is useless; large LR+ and small LR- values are associated with big changes between pre-test and post-test probabilities
77
given a test with an 85% sensitivity and 90% specificity, what is the LR+?
LR+ = sensitivity / (1 - specificity) = 0.85 / (1 - 0.9) = 0.85 / 0.1 = 8.5
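
The usual way to apply an LR is through odds. Convert the pre-test probability to odds, multiply by the LR, and convert back. A minimal sketch with hypothetical helpers `likelihood_ratios` and `post_test_probability` and a made-up 20% pre-test probability follows. An LR of 1 leaves the probability unchanged, matching card 76.

```python
def likelihood_ratios(sensitivity, specificity):
    """Hypothetical helper: LR+ = sens / (1 - spec), LR- = (1 - sens) / spec."""
    return sensitivity / (1 - specificity), (1 - sensitivity) / specificity


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Hypothetical helper: probability -> odds, multiply by LR, odds -> probability."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


lr_pos, lr_neg = likelihood_ratios(0.85, 0.90)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
# Made-up pre-test probability of 20%
print(f"after + test: {post_test_probability(0.20, lr_pos):.0%}")
print(f"after - test: {post_test_probability(0.20, lr_neg):.0%}")
```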
78
what are receiver operating characteristic (ROC) curves used for?
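portray the trade-off between sensitivity and specificity across various test cut-offs (sensitivity plotted against 1 - specificity); the diagonal line through the middle (slope of 1, area under the curve of 0.5) represents a test no better than chance, i.e., useless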
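
A minimal sketch of how an ROC curve is built. Sweep the cut-off over every observed value, record (1 - specificity, sensitivity) at each, then take the area with the trapezoid rule. The helpers `roc_points` and `area_under_curve` are hypothetical, the lab values are made up, and "score ≥ cut-off" is assumed to mean a positive test.

```python
def roc_points(scores_diseased, scores_healthy):
    """Hypothetical helper: (1 - specificity, sensitivity) at every cut-off (score >= cut-off = positive)."""
    cutoffs = sorted(set(scores_diseased) | set(scores_healthy), reverse=True)
    points = [(0.0, 0.0)]  # cut-off above every score: nobody tests positive
    for cutoff in cutoffs:
        sensitivity = sum(s >= cutoff for s in scores_diseased) / len(scores_diseased)
        false_pos_rate = sum(s >= cutoff for s in scores_healthy) / len(scores_healthy)
        points.append((false_pos_rate, sensitivity))
    return points  # the lowest cut-off calls everyone positive: (1.0, 1.0)


def area_under_curve(points):
    """Hypothetical helper: trapezoid-rule area under the ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up lab values: higher = more suspicious
print(area_under_curve(roc_points([5, 6, 7], [1, 2, 3])))  # perfect separation
print(area_under_curve(roc_points([1, 2, 3], [1, 2, 3])))  # no better than chance
```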
79
name 3 “gold” reference standards that can be used to study characteristics of a new diagnostic test (need something to compare to)
another test, taken near-simultaneously; clinical judgement, made simultaneously; clinical outcomes over time (requires follow-up)
80
define these types of outcome presentation in prognosis: case-fatality rate, x-year survival, median survival
case-fatality rate = # deaths / # cases (usually used for diseases that kill quickly); x-year survival: shown as a curve of survival probability spanning years, usually for longer-lasting diseases; median survival: time at which half of the cohort has died
81
describe censoring in prognosis studies
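patients who leave observation without having the event (competing mortality unrelated to the illness being studied, loss to follow-up, or the study ending) are censored: they are removed from the denominator of patients at risk from that point on, without being counted as events; this lowers the number of patients at risk but does not by itself lower the estimated survival rate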
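
The Kaplan-Meier estimator is one standard way censoring is handled. The sketch below uses a hypothetical helper `kaplan_meier` and a made-up six-patient cohort. Only deaths lower the survival estimate. Deaths and censored patients both leave the at-risk denominator afterward. Deaths at a given time are counted before censorings at that same time.

```python
def kaplan_meier(times, died):
    """Hypothetical helper: Kaplan-Meier estimates as (time, survival) at each death time.

    times: follow-up time per patient; died: True = event, False = censored
    """
    records = sorted(zip(times, died))
    at_risk = len(records)
    survival = 1.0
    curve = []
    i = 0
    while i < len(records):
        t = records[i][0]
        deaths = censored = 0
        # Tally everything that happens at this time point
        while i < len(records) and records[i][0] == t:
            if records[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk  # only deaths lower the estimate
            curve.append((t, survival))
        at_risk -= deaths + censored  # both leave the at-risk denominator
    return curve


# Made-up cohort of 6 patients (months of follow-up)
times = [2, 3, 3, 5, 6, 8]
died = [True, False, True, True, False, True]
for t, s in kaplan_meier(times, died):
    print(f"month {t}: {s:.2f}")
```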
82
describe: hazard ratios, stratified survival curves, clinical prediction rules for prognosis
hazard ratios: similar to relative risk, evaluate one prognostic factor at a time (compare the rate of events over time, roughly how steeply survival drops, in one group vs another, such as patients with varying levels of immunoglobulins); relative risk cares about the outcome at the end, the hazard ratio cares about events along the way to the end outcome
stratified survival curves: survival curves comparing subjects grouped by criteria
clinical prediction rules for prognosis: patients are “scored” by the number of criteria they meet to put them in a high/medium/low risk group (combining prognostic variables is more predictive than any one alone)
83
match: validity and reliability, precision and accuracy
accuracy = validity; precision = reliability
84
Gaussian distribution
aka normal distribution curve
85
why does running more tests on a patient decrease the likelihood that all the results will be in the 95% reference (normal) range?
the chance of any one result falling in the 95% range is 0.95, so for n independent tests the probability that all fall in range is 0.95 multiplied by itself n times (0.95^n), which gets smaller as n grows; in other words, with too many tests there is a higher chance you’ll “find” something wrong
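
A minimal sketch of 0.95^n, using a hypothetical helper `chance_all_in_range`. It assumes the results are independent and that the patient is healthy. Real lab values are often correlated, so this is an idealization.

```python
def chance_all_in_range(n_tests, p_each=0.95):
    """Hypothetical helper: probability that n independent results all fall in range."""
    return p_each ** n_tests


for n in (1, 5, 20):
    all_normal = chance_all_in_range(n)
    print(f"{n} tests: {all_normal:.0%} all normal, {1 - all_normal:.0%} at least one 'abnormal'")
```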
86
what are the “gold standard” comparisons for each below? x-ray / all imaging and blood tests / everything
x-ray → compare to CT/MRI; all imaging and blood tests → autopsy/surgical study (biopsy); everything → autopsy (“ultimate” gold standard)
87
carcinoma in situ
cancer that remains at its site of origin, not having broken through the basement membrane surrounding the primary tumor
88
cancer grade v stage
grade = severity under a microscope; stage = severity of physical extent (metastasis)
89
cytology vs pathology specimens, and 3 subtypes of each
cytology: cells not in their environment of origin; exfoliative (cells fall off), brush (cells brushed off), needle aspiration (cells pulled out with a syringe)
pathology: cells with some surrounding environment; excisional biopsy (small volume), surgical resection (large volume), autopsy
90
difference between medical autopsy and medical examiner autopsy?
medical: hospital or outpatient death, requires next-of-kin consent; medical examiner: unexpected/unusual death, possible legal issues involved (ex - homicide) or public health concern (outbreak), no consent needed
91
what is the “perfect” study for: intervention/exposure questions, prognosis questions, diagnostic test characteristics
intervention/exposure → RCT; prognosis → cohort; diagnostic test characteristics → cross-sectional if using a simultaneous reference, prospective cohort if using a clinical follow-up reference
92
for studies involving unknowns, important data not measured, or systematic differences in people (aka selection bias), what kind of study design is best and why?
RCT! randomization balances known and unknown confounders between groups, reducing bias from unmeasured factors and systematic differences between people
93
when is it important to know the odds ratio?
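when absolute or relative risk can’t be calculated, as in case-control studies (the investigator chooses how many cases and controls to include, so incidence is unknown); when the outcome is rare, the odds ratio approximates the relative risk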
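
A minimal sketch comparing the two measures on made-up tables. It uses hypothetical helpers `odds_ratio` and `risk_ratio`. When the outcome is rare, OR ≈ RR. When the outcome is common, the OR overstates the RR. The risk ratio is only meaningful when the rows reflect true risks, as in a cohort, not in a case-control sample.

```python
def odds_ratio(a, b, c, d):
    """Hypothetical helper: OR for [[a, b], [c, d]], rows = exposed/unexposed, cols = outcome yes/no."""
    return (a * d) / (b * c)


def risk_ratio(a, b, c, d):
    """Hypothetical helper: RR for the same table (needs true risks, e.g. a cohort)."""
    return (a / (a + b)) / (c / (c + d))


# Made-up tables
rare = (10, 990, 5, 995)   # outcome in about 1% of people
common = (50, 50, 25, 75)  # outcome in 25-50% of people
for name, table in (("rare", rare), ("common", common)):
    print(f"{name}: OR = {odds_ratio(*table):.2f}, RR = {risk_ratio(*table):.2f}")
```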
94
why are RCT studies often low-powered?
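non-adherence to treatment and crossover between groups dilute the difference between groups (especially under intention-to-treat analysis), making a true effect harder to detect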
95
you’re an epidemiologist and you notice that in the past 3 months there has been a pattern of patients being treated in Central America for a very strange and unique set of symptoms. What type of study should you employ to investigate this further?
case series (observational) hypothesis generating, like case-control but without controls because it is difficult to determine who the controls would be
96
you’re Dr. Fauci and you’ve been tasked with measuring how many people in the US currently have covid (it’s March 21, 2020 - good luck). what kind of study should you employ?
cross-sectional: best for establishing prevalence (how many people out of everyone currently have disease)
97
can you look for trends over time with a cross-sectional study?
actually, yes, but not with the same group of people: you compare snapshots of different groups of people at different points in time
98
you’re reading a study showing the prevalence of XYZ disease over time. the data shows in 1995, 0.1% of Americans had XYZ, in 2002, 0.12%, and in 2019, 0.14%. What kind of study is this most likely to be?
cross-sectional, comparing snapshots of populations over time to establish trends in prevalence
99
what type of study design is best for examining uncommon outcomes, risky or rare exposures, or outcomes and exposures not likely captured in routine care?
case-control: from disease to exposure (“what exposure might have caused x disease?”)
100
what type of study is best for examining diagnostic test characteristics (when the reference standard requires long-term follow-up), prognosis, and incidence?
prospective cohort study
101
you’re conducting a study that involves chart reviews of 500 patients who were known to be significantly exposed to radiation. you’re investigating the health effects of their radiation. What kind of study is this?
retrospective cohort study: from exposure to disease (“what disease did x exposure cause?”); best for diagnosis or prognosis questions based on information usually captured in routine care, risky or rare exposures, and long time-frame issues (such as radiation exposure)
102
case control studies and retrospective cohort studies (“chart reviews”) are similarly advantageous for studying risky or rare exposures, but differ in how they are advantageous in another way - what is different about their strengths?
case-control studies (from disease to exposure, searching for an exposure) are for examining outcomes or exposures not likely captured in routine care (you’re looking for an unstudied exposure)
retrospective cohort studies (from exposure to disease, searching for a disease) are for examining diagnosis or prognosis questions based on information usually captured in routine care (you’ll need this information to study the development of the disease)
103
what type of observational study is best for evaluating a test with simultaneous reference?
cross-sectional
104
your clinical research team wants to evaluate a new type of blood test for liver failure. every patient that is seen for liver failure is given the new test, and simultaneously the conventional blood test. what kind of study is this?
cross-sectional: the best [observational study] for evaluating a test with a simultaneous reference
105
what is the best study design for evaluating the efficacy of an intervention
RCT!