Stats Flashcards
Q: When would you use an unpaired t-test vs a paired t-test?
Unpaired: when the two groups are independent (two different groups).
Paired: when the observations are dependent (e.g., the same group measured twice).
Q: Which correlation coefficient would you use for parametric (normally distributed) vs non-parametric data?
Parametric (normally distributed): Pearson’s coefficient.
Non-parametric: Spearman’s coefficient.
Q: What is selection bias?
An error in assigning individuals to groups, leading to differences that may influence the outcome.
Types:
Sampling bias: Subjects are not representative of the population (e.g., volunteer bias).
Non-responder bias: Non-responders may differ significantly from responders (e.g., poorer diets in non-responders to dietary surveys).
Prevalence/incidence bias (Neyman bias): Missed cases (e.g., early fatalities or silent cases) are omitted.
Admission bias (Berkson’s bias): Systematic differences in hospital-based studies, because both exposure and disease increase the likelihood of admission.
Healthy worker effect: Healthier individuals are more likely to be employed, skewing results.
Q: What is recall bias, and when is it a problem?
Recall bias is a difference in the accuracy of memories retrieved by study participants.
A patient with a disorder may recall exposure more thoroughly than controls.
Common in case-control studies.
Q: What is publication bias, and why is it important?
Failure to publish valid studies, often due to negative or uninteresting results.
Significant in meta-analyses, where excluding negative results distorts findings.
Q: What is work-up bias (verification bias)?
Occurs when comparing new diagnostic tests with gold standard tests.
Gold standard tests may be avoided if the new test is positive, especially if invasive (e.g., tissue biopsy).
Can distort sensitivity and specificity, requiring adjustment if unavoidable.
Q: What is expectation bias (Pygmalion effect)?
Observers subconsciously measure or report data to favor the expected outcome.
A problem in non-blinded trials.
Q: What is the Hawthorne effect?
Describes a group changing its behavior due to the knowledge that it is being studied.
Q: What is late-look bias?
Occurs when data is gathered at an inappropriate time.
Example: Studying a fatal disease years later, after some patients have died.
Q: What is procedure bias?
Occurs when subjects in different groups receive different treatments.
Q: What is lead-time bias?
Happens when a new test diagnoses a disease earlier than an existing test.
It appears to improve survival time but does not change the disease outcome.
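Worked example (a minimal sketch; the patient and all ages are invented for illustration): the new test moves the date of diagnosis, not the date of death, so survival measured from diagnosis looks longer.

```python
# Hypothetical patient: every number here is made up for illustration.
age_at_death = 70            # the same whichever test is used
age_dx_existing_test = 66    # age at diagnosis with the existing test
age_dx_new_test = 63         # the new test detects the disease earlier

# Survival is measured from diagnosis to death.
survival_existing = age_at_death - age_dx_existing_test
survival_new = age_at_death - age_dx_new_test
lead_time = age_dx_existing_test - age_dx_new_test

print(f"Survival from diagnosis: {survival_existing} vs {survival_new} years")
print(f"Lead time: {lead_time} years; age at death is still {age_at_death}")
```

The whole apparent survival gain equals the lead time.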
Q: What is the goal of a Phase 0 clinical trial?
Exploratory studies to assess how a drug behaves in the human body.
Focus on pharmacokinetics and pharmacodynamics.
Involves a very small number of participants.
Determines feasibility for further phases.
Q: What is the goal of a Phase I clinical trial?
Safety assessment.
Determines side effects before larger studies.
Usually conducted on healthy volunteers.
Q: What is the goal of a Phase II clinical trial?
Assess efficacy in patients affected by a specific disease.
Subdivisions:
Phase IIa: Assesses optimal dosing.
Phase IIb: Assesses efficacy.
Q: What is the goal of a Phase III clinical trial?
Assess effectiveness of a treatment.
Involves hundreds to thousands of participants.
Often conducted as randomized controlled trials.
Compares new treatments with established treatments.
Q: What is the goal of a Phase IV clinical trial?
Postmarketing surveillance.
Monitors long-term effectiveness and side effects.
Q: What does a 95% confidence interval mean?
If the study were repeated many times, 95% of the intervals calculated in this way would contain the true effect of the intervention.
Q: How is the standard error of the mean (SEM) defined?
A measure of the spread expected for the mean of the observations.
Indicates how “accurate” the sample mean is compared to the true population mean.
Q: What is the formula for SEM?
SEM = SD / √n
SD = standard deviation.
n = sample size.
As the sample size increases, the SEM becomes smaller.
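Worked example (a minimal sketch; the readings are invented, and the 1.96 multiplier assumes a normal approximation, which is not stated on the cards):

```python
import math
import statistics

def sem(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)

def approx_ci95(sample):
    """Approximate 95% confidence interval: mean +/- 1.96 * SEM."""
    mean = statistics.mean(sample)
    half_width = 1.96 * sem(statistics.stdev(sample), len(sample))
    return mean - half_width, mean + half_width

# Same SD, larger sample: the SEM shrinks.
print(sem(10, 25))    # SD 10, n = 25
print(sem(10, 100))   # SD 10, n = 100

readings = [118, 122, 125, 130, 121, 127, 124, 119]  # hypothetical values
print(approx_ci95(readings))
```

Quadrupling the sample size halves the SEM, because n sits under a square root.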
Q: What is confounding in statistics?
Confounding occurs when a variable is associated with both the exposure and the outcome in a study, producing a spurious association between them.
Q: Can you provide an example of confounding in a study?
In a case-control study looking at low-dose aspirin and colorectal cancer prevention:
If the case and control groups are not matched for age, age becomes a confounding factor.
Older people are both more likely to take aspirin and more likely to develop cancer, which skews the results.
Q: What is the difference between correlation and regression?
Correlation: Tests for an association between variables (e.g., salary and IQ).
Regression: Predicts values of a dependent variable from an independent variable; typically used once a correlation has been shown.
Q: What does the correlation coefficient (r) represent?
Indicates how closely points lie to a line drawn through plotted data.
Ranges from -1 to +1:
r = 1: Perfect positive correlation (e.g., systolic blood pressure always increases with age).
r = 0: No correlation (e.g., systolic blood pressure is unrelated to age).
r = -1: Perfect negative correlation (e.g., systolic blood pressure always decreases with age).
Q: How is correlation measured for parametric and non-parametric data?
Parametric variables: Pearson’s correlation coefficient (r).
Non-parametric variables: Spearman’s correlation coefficient (ρ or rs).
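Worked example (a minimal sketch in plain Python; the helper names and the data are hypothetical): Spearman’s coefficient can be computed as Pearson’s coefficient applied to the ranks of the data.

```python
import math

def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r on the ranked data."""
    return pearson_r(ranks(x), ranks(y))

age = [30, 40, 50, 60, 70]
sbp = [110, 118, 121, 135, 160]  # invented: rises with age, but not linearly

print(round(pearson_r(age, sbp), 2))
print(round(spearman_rho(age, sbp), 2))
```

Because the invented blood pressures always rise with age, the rank correlation is 1 even though the points do not lie exactly on a straight line.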
Q: What type of regression is used for different variables?
Logistic regression: For dichotomous variables.
Linear regression: For two continuous variables.
Multiple regression: For more than two continuous variables.
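Worked example (a minimal sketch of simple linear regression by least squares; `fit_line` and the data are hypothetical):

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return slope, intercept

age = [30, 40, 50, 60, 70]        # independent variable
sbp = [112, 118, 124, 130, 136]   # dependent variable (invented, exactly linear)

slope, intercept = fit_line(age, sbp)
predicted_at_55 = intercept + slope * 55  # predict the dependent variable

print(slope, intercept, predicted_at_55)
```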
Q: What is the primary purpose of a funnel plot?
To demonstrate the existence of publication bias in meta-analyses.
Q: What does a symmetrical, inverted funnel shape in a funnel plot indicate?
Publication bias is unlikely.
Q: What does an asymmetrical funnel in a funnel plot suggest?
A relationship between treatment effect and study size.
May indicate publication bias or systematic differences between smaller and larger studies (‘small study effects’).
Q: What is a forest plot (blobbogram)?
A graphical display of results from multiple studies, commonly used in meta-analyses.
Q: What is the purpose of the large vertical line on a forest plot?
Represents the line of no effect.
A confidence interval that crosses this line indicates a result that is not statistically significant.
Q: What is a Kaplan-Meier survival plot?
A plot of the Kaplan-Meier estimate of the survival function, showing decreasing survival with time.
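Worked example (a minimal sketch of the Kaplan-Meier estimate, assuming each patient is recorded as a follow-up time plus a flag for whether death was observed or the patient was censored; the function name and data are hypothetical):

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each time where a death is observed.

    times:  follow-up time for each patient
    events: 1 if the patient died at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients both leave the risk set
    return curve

times = [1, 2, 3, 4, 5]    # months of follow-up (invented)
events = [1, 0, 1, 1, 0]   # 1 = died, 0 = censored

for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```

The estimate only steps down at observed deaths; censored patients simply shrink the number at risk.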
Q: What is the hazard ratio (HR)?
A measure similar to relative risk, but used when risk is not constant over time.
Typically used when analyzing survival over time.
Q: What is incidence?
The number of new cases of a condition per population in a given time period.
Example: If 40 new cases of condition X occur per 1,000 people over the past 12 months, the annual incidence is 0.04 (4%).
Q: What is prevalence?
The total number of cases of a condition per population at a specific point in time.
Q: What are the two types of prevalence?
Point prevalence: Number of cases in a defined population / number of people in that population at the same time.
Period prevalence: Number of identified cases during a specified period of time / total number of people in that population.
Q: What is the relationship between prevalence and incidence?
Prevalence = Incidence × Duration of condition
In chronic diseases, prevalence is much greater than incidence.
In acute diseases, prevalence and incidence are similar (e.g., common cold).
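Worked example (a minimal sketch; the chronic-disease numbers are invented, and the relationship is treated as an approximation):

```python
def incidence(new_cases, population):
    """New cases per person over the time period studied."""
    return new_cases / population

def prevalence_estimate(incidence_per_year, duration_years):
    """Prevalence = Incidence x Duration of condition."""
    return incidence_per_year * duration_years

# 40 new cases per 1,000 people in 12 months.
print(incidence(40, 1000))

# Hypothetical chronic disease: 4 new cases per 1,000 per year, lasting 10 years.
chronic_incidence = incidence(4, 1000)
chronic_prevalence = prevalence_estimate(chronic_incidence, 10)
print(chronic_incidence, chronic_prevalence)
```

With a long duration, the prevalence ends up many times larger than the annual incidence.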
Q: What is intention to treat analysis?
A method of analysis for randomized controlled trials where all patients randomly assigned to a treatment are analyzed together, regardless of whether they completed or received the treatment.
Q: What is the number needed to treat (NNT)?
A measure indicating how many patients need an intervention to reduce the expected number of outcomes by one.
Q: How is the number needed to treat (NNT) calculated?
NNT = 1 / (Absolute risk reduction)
The result is rounded up to the next whole number.
Q: How is the experimental event rate (EER) calculated?
EER = (Number who had a particular outcome with the intervention) / (Total number who had the intervention)
Q: How is the control event rate (CER) calculated?
CER = (Number who had a particular outcome with the control) / (Total number who had the control)
Q: How is the absolute risk reduction (ARR) calculated?
ARR = CER - EER (for undesirable outcomes)
ARR = EER - CER (for desirable outcomes)
Note: For desirable outcomes, ARR is sometimes termed absolute benefit increase.
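Worked example (a minimal sketch for an undesirable outcome; the trial counts are invented, and exact fractions are used only to avoid floating-point rounding surprises):

```python
import math
from fractions import Fraction

def event_rate(events, total):
    """Number who had the outcome divided by the number in the group."""
    return Fraction(events, total)

# Hypothetical trial with an undesirable outcome (e.g., a stroke).
cer = event_rate(30, 200)   # control event rate
eer = event_rate(21, 200)   # experimental event rate

arr = cer - eer             # absolute risk reduction
nnt = math.ceil(1 / arr)    # rounded up to the next whole number

print(float(cer), float(eer), float(arr), nnt)
```

Here 1 / ARR is about 22.2, so the NNT is reported as 23.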
Q: What are odds?
The ratio of the number of people who incur a particular outcome to the number of people who do not incur the outcome.
Q: What is the odds ratio?
The ratio of the odds of a particular outcome with experimental treatment to the odds of that outcome with control.
Q: How do odds compare to probability?
Probability is the fraction of times you’d expect to see an event in many trials (between 0 and 1).
Odds is the ratio of favorable outcomes to unfavorable ones.
Example: Probability of rolling a six with a die = 1/6 (0.167), odds of rolling a six = 1/5 (0.2).
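Worked example (a minimal sketch; the function names are hypothetical): converting between probability and odds for the die.

```python
from fractions import Fraction

def odds_from_probability(p):
    """Odds = probability of the event / probability of no event."""
    return p / (1 - p)

def probability_from_odds(odds):
    """Probability = odds / (1 + odds)."""
    return odds / (1 + odds)

p_six = Fraction(1, 6)
odds_six = odds_from_probability(p_six)

print(p_six, odds_six)                   # 1/6 and 1/5
print(probability_from_odds(odds_six))   # back to 1/6
```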
Q: When are odds ratios typically used?
Odds ratios are the usual reported measure in case-control studies.
They approximate relative risk when the outcome of interest is rare.
Q: What is the power of a study?
The probability of correctly rejecting the null hypothesis when it is false (i.e. not making a Type II error).
Q: How is power calculated?
Power = 1 - beta, where beta is the probability of a Type II error.
A typical minimum acceptable power level is 0.80 (80%).
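Worked example (a rough sketch, assuming a two-sided z-test comparing the means of two equal-sized groups with a known common SD; the function name and all inputs are hypothetical):

```python
import math
from statistics import NormalDist

def power_two_sample_z(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test on two group means.

    effect_size: true difference in means divided by the common SD.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * math.sqrt(n_per_group / 2)
    # beta: probability the test statistic stays inside the non-rejection region
    beta = z.cdf(z_crit - shift) - z.cdf(-z_crit - shift)
    return 1 - beta

print(round(power_two_sample_z(0.5, 64), 2))    # about 0.8
print(round(power_two_sample_z(0.5, 20), 2))    # smaller sample, lower power
```

Under these assumptions, about 64 participants per group are needed to reach 80% power for a difference of half a standard deviation.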
Q: What is publication bias?
The tendency for only studies with positive results to be published, leading to bias in the overall results.
Q: What is relative risk (RR)?
The ratio of risk in the experimental group (experimental event rate, EER) to risk in the control group (control event rate, CER).
Q: How is relative risk (RR) calculated?
Relative risk (RR) = EER / CER.
EER = rate at which events occur in the experimental group
CER = rate at which events occur in the control group
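Worked example (a minimal sketch; both trials are invented): relative risk alongside the odds ratio, for a rare and a common outcome.

```python
from fractions import Fraction

def relative_risk(exp_events, exp_total, ctl_events, ctl_total):
    """RR = EER / CER."""
    eer = Fraction(exp_events, exp_total)
    cer = Fraction(ctl_events, ctl_total)
    return eer / cer

def odds_ratio(exp_events, exp_total, ctl_events, ctl_total):
    """OR = odds of the outcome with treatment / odds with control."""
    exp_odds = Fraction(exp_events, exp_total - exp_events)
    ctl_odds = Fraction(ctl_events, ctl_total - ctl_events)
    return exp_odds / ctl_odds

rare = (10, 1000, 20, 1000)       # outcome in 1% vs 2% of participants
common = (300, 1000, 600, 1000)   # outcome in 30% vs 60% of participants

for label, trial in (("rare", rare), ("common", common)):
    rr = relative_risk(*trial)
    oratio = odds_ratio(*trial)
    print(label, round(float(rr), 3), round(float(oratio), 3))
```

Both invented trials have a relative risk of 0.5, but the odds ratio is close to that value only when the outcome is rare.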
Q: What is a Type I error?
A Type I error occurs when the null hypothesis is rejected when it is actually true (false positive). Its probability is set by the chosen significance level (alpha).
Q: What is a Type II error?
A Type II error occurs when the null hypothesis is not rejected when it is false (false negative). The probability of making a Type II error is called beta, and it is influenced by sample size and the significance level (alpha).
Q: What is the power of a study?
The power of a study is the probability of correctly rejecting the null hypothesis when it is false (i.e., detecting a statistically significant difference). It is calculated as:
Power = 1 - beta (probability of Type II error)
Q: What is the Student’s t-test used for?
The Student’s t-test is used for comparing the means of two groups. It can be either paired (for the same group measured twice) or unpaired (for two different groups).
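Worked example (a minimal sketch that computes only the t statistics, not p-values; the function names and blood pressure readings are hypothetical, and the unpaired version assumes equal variances):

```python
import math
import statistics

def paired_t(before, after):
    """Paired t statistic: mean difference / standard error of the differences."""
    diffs = [a - b for b, a in zip(before, after)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

def unpaired_t(group_a, group_b):
    """Unpaired (pooled-variance) t statistic for two independent groups."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * statistics.variance(group_a)
                  + (nb - 1) * statistics.variance(group_b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / se

# Same five patients measured before and after an intervention (invented values).
before = [140, 152, 138, 147, 160]
after = [135, 148, 136, 141, 154]

t_paired = paired_t(before, after)
t_unpaired = unpaired_t(after, before)  # wrongly treats the data as independent

print(round(t_paired, 2), round(t_unpaired, 2))
```

On these invented readings the paired statistic is far larger in magnitude, because pairing removes the variation between patients.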
Q: What is the Pearson’s product-moment coefficient used for?
The Pearson’s product-moment coefficient is used to measure the correlation between two continuous variables.
Q: What is the Mann-Whitney U test used for?
The Mann-Whitney U test is a non-parametric test used to compare unpaired data, typically measured on ordinal, interval, or ratio scales.
Q: What is the Wilcoxon signed-rank test used for?
The Wilcoxon signed-rank test is a non-parametric test used to compare two sets of observations on a single sample (e.g., before and after an intervention).
Q: What is the chi-squared test used for?
The chi-squared test is used to compare proportions or percentages, such as comparing the percentage of patients who improved with two different interventions.
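Worked example (a minimal sketch for a 2x2 table without continuity correction; the counts are invented, and the p-value shortcut is valid only for 1 degree of freedom):

```python
import math

def chi_squared_2x2(a, b, c, d):
    """Chi-squared statistic and p-value for a 2x2 table.

    a, b: improved / not improved with intervention 1
    c, d: improved / not improved with intervention 2
    """
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical trial: 30 of 50 improved vs 20 of 50 improved.
chi2, p_value = chi_squared_2x2(30, 20, 20, 30)
print(round(chi2, 2), round(p_value, 3))
```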
Q: What are the Spearman and Kendall rank correlation tests used for?
The Spearman and Kendall rank correlation tests are non-parametric methods used to assess the strength and direction of association between two ranked variables.
Q: What is paired data?
Paired data refers to data obtained from a single group of patients, such as measuring the same group before and after an intervention.
Q: What are the key features of a cohort study?
Observational and prospective study design.
Two (or more) groups are selected based on their exposure to a particular agent (e.g., medicine, toxin) and followed up to see how many develop a disease or outcome.
The usual outcome measure is the relative risk.
Example: Framingham Heart Study.
Q: What are the key features of a case-control study?
Observational and retrospective study design.
Patients with a particular condition (cases) are matched with controls.
Data is collected on past exposure to a potential causal agent for the condition.
The usual outcome measure is the odds ratio.
Inexpensive and produces quick results.
Useful for studying rare conditions, but prone to confounding.
Q: What are the key features of a cross-sectional survey?
Provides a ‘snapshot’ of a population; also called a prevalence study.
Provides weak evidence of cause and effect.