Stats Flashcards
Name 3 types of descriptive studies.
case report/series
ecological study-examines rates of disease on a population level
cross-sectional study-looks at exposure and disease at the same time- like a survey
what is equipoise?
in an RCT, you truly don’t know if your intervention is better than the standard of care, but you feel confident that withholding the intervention will not bring harm
What is an intention-to-treat analysis?
include all participants in the analysis according to the group they were assigned to, regardless of whether they complied with or received that treatment.
what is an efficacy or per protocol analysis?
only include participants who were compliant
What is an as treated analysis?
analyze participant data based on the treatment they actually received.
what are the pros and cons of retrospective vs prospective cohort study?
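they differ in cost, time, and quality of the data
retrospective: cheaper and faster, but limited to data that were already recorded
prospective: more expensive and slower, but you control how the data are collected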
What are some strengths of cohort studies?
- efficient for rare exposures
- clear temporal sequence between exposure and disease
- good information on exposures, confounders
- study the effect of exposure on multiple outcomes
What are some limitations of cohort studies?
-not good for rare disease or those with long latency
-not good for exposures that are expensive to determine
-large populations with long f/u time
-loss to follow up
-expensive and time consuming
how do you design a case-control study?
start from the source population, identify people who have the disease you want to study (cases) and people who do not (controls), then compare the odds of exposure in the cases and the controls.
individuals are chosen based on their outcome status and then exposure status is assessed
what are the benefits of case-control over a cohort study?
smaller, more efficient, with shorter follow up compared to cohort studies
Case-controls are great for when…
-exposure data is expensive or difficult to obtain
-long latent period
-disease is rare
-population is difficult to follow
-little is known about the disease
-want to evaluate many exposures
What is the definition of a control in a case control study?
a sample from the source population that produces the cases
so NOT just everyone who doesn’t have the disease
What is the purpose of the control group in a case-control study?
estimate exposure distribution in the source population that gave rise to cases.
What are some limitations of case-control studies?
-often limited to studying a single outcome
-inefficient for rare exposures
-more opportunity for bias
-temporal sequence between exposure and outcome can be unclear
-cannot calculate absolute measures of association
Define cumulative incidence.
Proportion of population at risk that develops the disease or outcome over a specified time period
example: number of new cases during the time period / the total population at risk at the start of the time period
risk of cervical cancer in 5 years is the number of new cases of cervical cancer in 5 years divided by the number of people with a cervix, but not cancer, in the population at the start of those 5 years.
define prevalence.
number of ppl with the disease divided by the entire population
ex: number of ppl with cervical cancer divided by the total population (not just people with a cervix)
Define incidence rate.
number of new cases of disease during the time period divided by the total person-time of observation in the population at risk
example- one person is followed for 3 years and another is followed for 4 years, that is 7 person-years
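A small sketch of the three frequency measures above, in case it helps to see the denominators side by side. The function names and the case counts are made up for illustration; only the 3 + 4 = 7 person-years example comes from the card.

```python
def cumulative_incidence(new_cases, at_risk_at_start):
    """New cases over the period / population at risk at the start."""
    return new_cases / at_risk_at_start


def prevalence(existing_cases, total_population):
    """People with the disease / entire population."""
    return existing_cases / total_population


def incidence_rate(new_cases, person_time):
    """New cases over the period / total person-time observed."""
    return new_cases / person_time


# Person-time from the card: one person followed 3 years, another 4 years.
follow_up_years = [3, 4]
person_years = sum(follow_up_years)  # 7 person-years

# Hypothetical count: suppose 1 new case occurred during that follow-up.
rate = incidence_rate(1, person_years)
print(f"{rate:.3f} cases per person-year")
```

Note that the rate has units (cases per person-year), while cumulative incidence and prevalence are unitless proportions.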
how do you calculate an absolute risk difference?
risk difference is the cumulative incidence in the exposed group minus the cumulative incidence in the unexposed group
how do you calculate an absolute rate difference?
Rate difference is the incidence rate in the exposed group minus the incidence rate in the unexposed group
how do you calculate risk ratio AKA relative risk?
cumulative incidence in the exposed group divided by the cumulative incidence in the unexposed group
how to calculate an Odds ratio?
numerator: the number of cases in the exposed group multiplied by the number of controls in the UNexposed group
denominator: the number of cases in the UNexposed group multiplied by the number of controls in the exposed group
define the odds of an event
Probability that an event will occur divided by the probability that it will not occur
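The measures of association on the last few cards can all be read off a 2x2 table. Here is a minimal sketch, assuming the usual cell labels (a = exposed with disease, b = exposed without, c = unexposed with disease, d = unexposed without). The labels and the counts are illustrative assumptions, not from the cards, and the risk-based measures only make sense when the table comes from a cohort or trial.

```python
def risk_difference(a, b, c, d):
    """Cumulative incidence in exposed minus cumulative incidence in unexposed."""
    return a / (a + b) - c / (c + d)


def risk_ratio(a, b, c, d):
    """Cumulative incidence in exposed divided by that in unexposed."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """(exposed cases x unexposed controls) / (unexposed cases x exposed controls)."""
    return (a * d) / (c * b)


def odds(p):
    """Probability the event occurs divided by the probability it does not."""
    return p / (1 - p)


# Hypothetical table: 20 of 100 exposed and 10 of 100 unexposed get the disease.
a, b, c, d = 20, 80, 10, 90
print(risk_difference(a, b, c, d))  # about 0.10
print(risk_ratio(a, b, c, d))       # about 2.0
print(odds_ratio(a, b, c, d))       # 2.25
```

With these numbers the odds ratio (2.25) is a bit further from 1 than the risk ratio (about 2.0); the two get closer as the disease gets rarer.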
Name two categories of error in epidemiologic research
random error (p value)
systematic error (bias, confounding)
Name the three necessary criteria for a variable to be a confounder.
- must be an independent predictor of the outcome, like a risk factor for the disease
- must be associated with exposure
- cannot be caused by the exposure
name three ways to reduce confounders during study DESIGN
- randomization
- restrict confounders through our exclusion criteria
- match confounders in study groups
Name 3 ways to address confounders during study ANALYSIS.
-Standardization-“among women over 65…”
-stratification and pooling - split the data into strata, then combine the stratum-specific estimates into one pooled estimate
-modeling- multivariable statistical models
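To make "stratify, then pool" concrete, here is a sketch using the Mantel-Haenszel pooled odds ratio. The cards do not name a pooling method, so Mantel-Haenszel is my choice of a common one, and the strata counts are invented for illustration.

```python
def mantel_haenszel_or(strata):
    """Pool stratum-specific 2x2 tables into one odds ratio.

    Each stratum is (a, b, c, d):
    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


# Hypothetical data, stratified by a suspected confounder (e.g. age group).
strata = [
    (10, 20, 5, 40),   # stratum 1: OR = 10*40 / (20*5) = 4.0
    (30, 10, 20, 25),  # stratum 2: OR = 30*25 / (10*20) = 3.75
]
pooled = mantel_haenszel_or(strata)
print(round(pooled, 2))
```

The pooled estimate lands between the two stratum-specific odds ratios, weighted toward the stratum carrying more information.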
What is internal validity?
is the study free of bias
What is external validity?
generalizability
name two ways to analyze data from an RCT?
intent to treat
per protocol
describe categorical data
describe ordinal data
describe nominal data
data fits into specific categories
ordinal data- categories that have a specific order to them, (mild, mod, severe)
nominal data- there is no rank to the categories
where is the tail on a positive skewed data?
where is the tail on a negative skewed data?
positive- tail is to the right
negative - tail is on the left
define mean, median, mode
mean-average
median-value in the middle of the data set
mode-value that appears most often in a data set
how do you present parametric vs non-parametric data?
parametric data is normally distributed, so you can use the mean and the standard deviation
non-parametric data is not normal or is skewed
use the median and interquartile range
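A quick illustration of why the summary statistic matters, using only Python's standard `statistics` module. The two data sets are made up: one roughly symmetric, one with a long right tail.

```python
import statistics


def summarize_parametric(values):
    """Mean and sample standard deviation."""
    return statistics.mean(values), statistics.stdev(values)


def summarize_nonparametric(values):
    """Median and interquartile range as (Q1, Q3)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return statistics.median(values), (q1, q3)


symmetric = [4, 5, 5, 6, 5, 4, 6, 5]  # hypothetical, roughly normal
skewed = [1, 1, 2, 2, 3, 3, 4, 40]    # hypothetical, positive skew

print(summarize_parametric(symmetric))
print(summarize_nonparametric(skewed))

# In the skewed set, one outlier drags the mean (7) far above the median (2.5).
print(statistics.mean(skewed), statistics.median(skewed))
```

Different software packages use slightly different quartile conventions, so the exact Q1 and Q3 can vary a little between tools.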
Which test should you use?
categorical data
independent data
small sample size
Fisher's exact test
Which test should you use?
categorical data
independent data
Large sample size
chi-square
Which test should you use?
categorical data
paired data- like pre and post intervention data
Large sample size
McNemar’s
Which test should you use?
categorical data
paired data- like pre and post intervention data
small sample size
McNemar’s exact test
Which test should you use?
continuous data
parametric
3 or more groups
ANOVA
Which test should you use?
continuous data
independent
parametric
2 groups comparing the means to see if there is a difference
t-test
Which test should you use?
continuous data
independent
parametric
2 continuous variables, assessing for a correlation between them
Pearson correlation
Which test should you use?
continuous data
independent
non-parametric
2 groups comparing the medians
Mann-Whitney U test AKA Wilcoxon rank-sum test
Which test should you use?
continuous data
independent
non-parametric
2 variables, assessing for a correlation between them
Spearman correlation
Which test should you use?
continuous data
independent
non-parametric
3+ groups comparing the medians
Kruskal-Wallis test
Which test should you use?
continuous data
paired
parametric
3+ groups comparing the means
repeated-measures ANOVA
Which test should you use?
continuous data
paired
parametric
2 groups comparing the means
paired t-test
Which test should you use?
continuous data
paired
non-parametric
2 groups comparing the medians
Wilcoxon signed-rank test
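The "which test" cards above boil down to a lookup table. Below is one way to write it, as a study aid only. The function name, argument names, and goal labels are my own, and combinations that do not appear on a card return `None` rather than a guess.

```python
# Transcribed from the cards; nothing is added beyond them.
CATEGORICAL_TESTS = {
    # (paired, small_sample) -> test
    (False, True): "Fisher's exact test",
    (False, False): "chi-square test",
    (True, False): "McNemar's test",
    (True, True): "McNemar's exact test",
}

CONTINUOUS_TESTS = {
    # (paired, parametric, goal) -> test
    (False, True, "compare 2 groups"): "t-test",
    # The ANOVA card does not say "independent"; assumed here because the
    # paired version has its own card.
    (False, True, "compare 3+ groups"): "ANOVA",
    (False, True, "correlation"): "Pearson correlation",
    (False, False, "compare 2 groups"): "Mann-Whitney U test",
    (False, False, "compare 3+ groups"): "Kruskal-Wallis test",
    (False, False, "correlation"): "Spearman correlation",
    (True, True, "compare 2 groups"): "paired t-test",
    (True, True, "compare 3+ groups"): "repeated-measures ANOVA",
    (True, False, "compare 2 groups"): "Wilcoxon signed-rank test",
}


def choose_test(data_type, paired, *, small_sample=None, parametric=None, goal=None):
    """Return the test named on the matching card, or None if there is no card."""
    if data_type == "categorical":
        return CATEGORICAL_TESTS.get((paired, small_sample))
    if data_type == "continuous":
        return CONTINUOUS_TESTS.get((paired, parametric, goal))
    return None


print(choose_test("categorical", paired=False, small_sample=True))
print(choose_test("continuous", paired=True, parametric=False, goal="compare 2 groups"))
```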
What is a Kaplan-Meier curve?
a graph that shows the probability of survival over time.
“ex: how long until the prolapse recurs? or how long until the patient stops the pessary”
What test is used to compare Kaplan-Meier curves?
log-rank test
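For intuition about where the curve comes from, here is a bare-bones sketch of the Kaplan-Meier product-limit estimate. The follow-up times and event flags are invented (think months until a pessary is discontinued), and real analyses would use a survival analysis package rather than this.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time an event occurs.

    times  : follow-up time for each participant
    events : 1 if the event happened at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Handle everyone who shares this time point together.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        # Events and censored participants both leave the risk set.
        at_risk -= n_leaving
    return curve


# Hypothetical data: 5 participants, 2 censored.
times = [2, 3, 3, 5, 8]
events = [1, 0, 1, 1, 0]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 2))
```

Censored participants never drop the curve themselves, but they shrink the number at risk, so later events cause bigger steps down.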
Let’s talk regressions
When do you use a linear regression and how do you report the result?
- continuous outcome
- mean difference
Let’s talk regressions
When do you use a logistic regression and how do you report the result?
- independent categorical outcomes
- Odds ratio
Let’s talk regressions
When do you use a log-binomial regression and how do you report the result?
- independent categorical outcomes
- risk ratio
Let’s talk regressions
When do you use a Poisson regression and how do you report the result?
- count data
- rate ratio
Let’s talk regressions
When do you use a Cox regression and how do you report the result?
- time to event data
- hazard ratio
What is a type 1 error?
alpha
probability of rejecting the null hypothesis when it is true
What is a type 2 error?
beta
probability of failing to reject the null hypothesis when it is false
What type of error is a false positive?
alpha
rejecting the null when it is true.
saying that there is a difference when there is none
What type of error is a false negative?
beta
failing to reject the null hypothesis when it is false
saying that there is no difference when there is one
Explain the 95% confidence interval
if a study were repeated 100 times, the 95% CIs from about 95 of those studies would contain the true value
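The "repeat the study many times" idea can be checked with a small simulation. This is a sketch under simplifying assumptions that are mine, not the card's: normally distributed data, a known standard deviation, and a z-based interval for the mean. All parameter values are arbitrary.

```python
import random
from statistics import NormalDist


def ci_coverage(n_studies=1000, n=50, true_mean=10.0, sd=2.0, seed=0):
    """Fraction of simulated studies whose 95% CI contains the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.975)    # about 1.96
    half_width = z * sd / n ** 0.5     # known-SD interval for the mean
    hits = 0
    for _ in range(n_studies):
        sample_mean = sum(rng.gauss(true_mean, sd) for _ in range(n)) / n
        # The CI is sample_mean +/- half_width; check if it covers the truth.
        if abs(sample_mean - true_mean) <= half_width:
            hits += 1
    return hits / n_studies


print(ci_coverage())
```

The printed fraction should sit close to 0.95, though not exactly on it, since the simulation itself has random error.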
What is statistical power?
1-beta
OR 1 minus the probability of committing a type 2 error
OR 1 minus the probability of failing to reject the null when the null is false.
typically 0.80, beta is 0.20
What variables do you need to calculate the sample size?
power
alpha
difference you want to see
buffer for 20% loss of participants, desired n/0.80
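Here is one way those inputs fit together, using the common normal-approximation formula for comparing two means. The formula choice and the standard deviation input are assumptions on my part (the card lists only power, alpha, the difference, and the dropout buffer), and the example numbers are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(difference, sd, alpha=0.05, power=0.80):
    """Participants per group to detect `difference` between two means.

    Normal approximation: n = 2 * ((z_alpha + z_beta) * sd / difference)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided alpha
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    return ceil(2 * ((z_alpha + z_beta) * sd / difference) ** 2)


def inflate_for_dropout(n, dropout=0.20):
    """Buffer for loss of participants: desired n / (1 - dropout)."""
    return ceil(n / (1 - dropout))


# Hypothetical: detect a 5-point difference when the SD is 10 points.
n = n_per_group(difference=5, sd=10)
print(n, inflate_for_dropout(n))
```

A smaller difference or a higher power pushes the required n up quickly, since n scales with the inverse square of the difference.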
Define sensitivity.
probability of testing positive, given that they have the disease
true positives/total population with disease
define specificity
probability of testing negative, given that they do not have the disease
true negatives/ total population without the disease
define PPV
probability of having the disease if they test positive
true positives/all positive test results
affected by prevalence of disease
define NPV
probability of NOT having the disease if they test negative
true negatives/ all negative test results
affected by prevalence of disease
If the prevalence of a disease increases, what happens to the following values?
Sensitivity
specificity
PPV
NPV
sens and spec stay the same
PPV increases
NPV decreases
Define positive likelihood ratio.
likelihood of positive result in someone with the disease vs someone who does not have the disease
sensitivity / (1 - specificity)
combo of sensitivity and specificity
higher the positive likelihood ratio, the better.
Define negative likelihood ratio.
likelihood of negative result in someone with the disease vs someone who does not have the disease
(1 - sensitivity) / specificity
the closer the result is to 0, the better.
What would be considered a high positive likelihood ratio?
> 10
What would be considered a low negative likelihood ratio?
<0.1
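Both likelihood ratios as a two-line sketch, using the standard definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The 95%/95% test is a made-up example.

```python
def positive_lr(sens, spec):
    """How much more likely a positive result is in someone with the disease."""
    return sens / (1 - spec)


def negative_lr(sens, spec):
    """How much more likely a negative result is in someone with the disease."""
    return (1 - sens) / spec


# Hypothetical test with 95% sensitivity and 95% specificity.
lr_pos = positive_lr(0.95, 0.95)
lr_neg = negative_lr(0.95, 0.95)
print(round(lr_pos, 1), round(lr_neg, 3))
```

This example test clears both thresholds from the cards: its positive likelihood ratio is above 10 and its negative likelihood ratio is below 0.1.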