Lecture 8 and 9 (Correlation, Regression, CIs) Flashcards
The dependent variable is on what axis?
Y-axis (vertical)
The independent variable is on what axis?
X-axis (horizontal)
When should you use a correlation analysis?
- examine relationship between variables
- estimate strength of association between variables
- when it is not clear which variable is independent and which is dependent
- when regression requirements are not met
A correlation coefficient of 0 means:
- there is no linear association between the two variables
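The Pearson correlation coefficient can be computed straight from its definition. Below is a minimal Python sketch; the function name `pearson_r` and the data are hypothetical. The third case shows why r = 0 means no linear association: Y depends perfectly on X there, but the relationship is U-shaped, so r comes out as 0.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations (proportional to the covariance)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sums of squared deviations (proportional to the variances)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
print(pearson_r(x, [2 * xi + 1 for xi in x]))  # perfect positive line: 1.0
print(pearson_r(x, [-3 * xi for xi in x]))     # perfect negative line: -1.0
print(pearson_r(x, [xi ** 2 for xi in x]))     # U-shaped relationship: 0.0
```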
A regression:
- fits a line to the data; r describes how well the data fit that line
- r-value close to 0 = no correlation
- r-value close to 1 or -1 = strong correlation
- r-squared tells you the proportion of the variation in Y that is explained by variation in X.
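A minimal sketch of simple linear regression by ordinary least squares, assuming a single X variable and hypothetical data. The function `fit_line` is illustrative, not from any library. It computes r-squared as 1 - (residual variation / total variation in Y), which is the proportion of the variation in Y explained by X.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept: the line passes through (mean_x, mean_y)
    predicted = [a + b * xi for xi in x]
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))  # unexplained variation
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)                   # total variation in Y
    r_squared = 1 - ss_res / ss_tot  # proportion of variation in Y explained by X
    return a, b, r_squared

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b, r2 = fit_line(x, y)
print(f"y = {a:.2f} + {b:.2f}x, r^2 = {r2:.2f}")  # y = 2.20 + 0.60x, r^2 = 0.60
```

Here r^2 = 0.60 means that 60% of the variation in Y is accounted for by the straight-line relationship with X.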
When should you use regression analysis?
- look for a trend in data between variables
- more than one X (independent) variable = multiple regression
- predict a dependent variable
- adjust for confounding variables
- curve fitting (pharmacokinetics)
- calibration and laboratory assays
- detect patterns in microarray data
Regression r-value close to 0:
no linear association
Regression r-value close to 1 or -1:
strong association (positive if close to 1, negative if close to -1)
Regression r-squared value tells you:
- the proportion of the variation in Y that is explained by variation in X.
Parametric test characteristics:
- assume variables are normally distributed with equal variances
- based on the mean and variance
- susceptible to outliers
- requires continuous variables
Non-parametric test characteristics:
- based on ranks
- do not assume a particular distribution, mean, or variance
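Spearman's rank correlation is a common non-parametric counterpart to Pearson's r: it is Pearson's r computed on the ranks instead of the raw values. In the sketch below (helper names and data are hypothetical), the first five points are perfectly negatively related, but one extreme point makes Pearson's r strongly positive, while the rank-based statistic is much less affected.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical data: the last point is an outlier
x = [1, 2, 3, 4, 5, 20]
y = [5, 4, 3, 2, 1, 20]
print(round(pearson_r(x, y), 2))     # 0.92
print(round(spearman_rho(x, y), 2))  # -0.14
```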
You can transform non-linear data to linear data by:
- taking logs
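A sketch of why logs linearize exponential data, assuming a hypothetical relationship y = 3·e^(0.5x). Taking natural logs gives ln(y) = ln(3) + 0.5x, which is a straight line, so an ordinary least-squares fit on the log scale recovers the slope 0.5 and, after exponentiating the intercept, the constant 3. The helper `fit_line` is illustrative.

```python
import math

# Hypothetical data following exponential growth: y = 3 * e^(0.5 x)
x = [0, 1, 2, 3, 4, 5]
y = [3 * math.exp(0.5 * xi) for xi in x]

def fit_line(x, y):
    """Least-squares intercept and slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b * mx, b

# ln(y) = ln(3) + 0.5 x, so the log-transformed data lie on a straight line
log_y = [math.log(v) for v in y]
intercept, slope = fit_line(x, log_y)
print(round(slope, 3), round(math.exp(intercept), 3))  # 0.5 3.0
```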
Three ways you can control for outliers:
- using a non-parametric test
- dropping the outlier(s)
- log transformation
Multivariate regression:
- more than one X (independent) variable (also called multiple regression)
- allows adjustment for confounders
- models interactions between variables by adding their product (e.g., X1 × X2) as an extra term
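A pure-Python sketch of regression with two X variables plus an interaction term, fitted by solving the least-squares normal equations. The helper names `solve` and `ols` and the noise-free data are hypothetical; a real analysis would use a statistics package. The interaction is modeled by adding the product X1*X2 as an extra column of the design matrix, and the fit recovers the coefficients used to generate the data.

```python
def solve(a, b):
    """Solve the linear system a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        beta[r] = (m[r][n] - sum(m[r][c] * beta[c] for c in range(r + 1, n))) / m[r][r]
    return beta

def ols(rows, y):
    """Least-squares coefficients via the normal equations (X^T X) beta = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 + 0.5*x1*x2 (no noise)
x1 = [0, 1, 2, 3, 0, 1, 2, 3]
x2 = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1 + 2 * a + 3 * b + 0.5 * a * b for a, b in zip(x1, x2)]

# Design matrix columns: intercept, X1, X2, and the interaction (product) term X1*X2
rows = [[1.0, a, b, a * b] for a, b in zip(x1, x2)]
coef = ols(rows, y)
print([round(c, 3) for c in coef])  # [1.0, 2.0, 3.0, 0.5]
```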
Stepwise regression:
- adds the top contributing variable first, then the second, then the third, etc., until adding more variables gives diminishing returns.
- i.e., it searches for a small set of variables that together explain most of the variation (a high collective r-squared value).
Multiple logistic regression:
- a multivariate analysis
- adjusts for confounding
- useful when outcome is dichotomous
- provides a direct estimate of the ODDS RATIO for each independent variable
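The link between logistic regression and the odds ratio can be checked with a single binary exposure: the exponentiated coefficient exp(b1) equals the odds ratio from the matching 2x2 table. The sketch below fits the model by Newton-Raphson, one standard fitting method; `fit_logistic` and the counts are hypothetical. A multiple logistic regression adds more X columns in the same way, and each exp(coefficient) is then an odds ratio adjusted for the other variables.

```python
import math

def fit_logistic(x, y, iterations=25):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Gradient (g) of the log-likelihood and entries of X^T W X (h)
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1 / (1 + math.exp(-(b0 + b1 * xi)))
            w = p * (1 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: solve [[h00, h01], [h01, h11]] @ step = [g0, g1]
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical 2x2 data: exposure x (1 = exposed), outcome y (1 = disease)
# exposed: 20 cases, 80 non-cases; unexposed: 10 cases, 90 non-cases
x = [1] * 100 + [0] * 100
y = [1] * 20 + [0] * 80 + [1] * 10 + [0] * 90

b0, b1 = fit_logistic(x, y)
print(round(math.exp(b1), 3))           # odds ratio from the model: 2.25
print(round((20 * 90) / (80 * 10), 3))  # odds ratio from the 2x2 table: 2.25
```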
When the distribution of your data is not normal, what type of test should you use?
- a non-parametric test
If you are analyzing more than one independent (X) variable, what type of analysis should you use?
multivariate regression
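Principal Component Analysis (PCA):
- takes many (often correlated) variables and reduces them to a smaller number of uncorrelated principal components (weighted combinations of the original variables)
- gives you the combinations of variables that best explain the variation in the data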
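A minimal sketch of PCA for two variables, assuming hypothetical data. PCA finds the eigenvectors of the covariance matrix, and each eigenvalue is the variance along that component. For a 2x2 covariance matrix the eigenvalues have a closed form, so no linear-algebra library is needed; `pca_2d` is an illustrative name.

```python
import math

def pca_2d(x, y):
    """PCA for two variables via the eigen-decomposition of their 2x2 covariance matrix.
    Returns (first component direction, proportion of total variance it explains)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    a = sum((v - mx) ** 2 for v in x) / (n - 1)                   # var(x)
    c = sum((v - my) ** 2 for v in y) / (n - 1)                   # var(y)
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / (n - 1)  # cov(x, y)
    # Eigenvalues of the symmetric matrix [[a, b], [b, c]]
    half_trace = (a + c) / 2
    root = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = half_trace + root, half_trace - root
    # Eigenvector for the largest eigenvalue (assumes cov(x, y) != 0)
    vx, vy = b, lam1 - a
    norm = math.hypot(vx, vy)
    return (vx / norm, vy / norm), lam1 / (lam1 + lam2)

# Hypothetical, strongly correlated data
x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 6]
direction, ratio = pca_2d(x, y)
print(round(ratio, 2))  # 0.99: the first component carries almost all the variation
```

When variables are measured on different scales, they are usually standardized before PCA so that no variable dominates simply because of its units.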
Risk factors are modifiable through:
primary prevention
Prognostic factors are modifiable through:
secondary prevention
Common prognosis endpoints:
- case fatality (proportion of patients with the disease who die of it)
- disease-specific mortality (number of people per 10,000 dying of a specific disease)
- response
- remission
- recurrence
Equipoise:
- a genuine lack of consensus in the medical community about a treatment or prognosis, and about how to treat
- allows for RCTs
Kaplan-Meier Analysis:
- most widely used survival analysis
- a graph of time to event
- every horizontal segment is a time period
- every vertical drop is an event (e.g., death); dropouts (censored patients) do not cause a drop and are usually marked with a tick
- the larger the sample size, the smoother the curve
Kaplan-Meier analysis truncation:
when a patient enters the study after it has already started
Kaplan-Meier analysis censoring:
when a patient leaves the study (drops out or is lost to follow-up), or the study ends, before the patient has had the event
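The Kaplan-Meier estimator multiplies, at each event time, the fraction of at-risk patients who survive: S(t) = product of (1 - d/n), where d is the number of events at that time and n the number still at risk. A sketch with hypothetical follow-up data; `kaplan_meier` is an illustrative name, and the sketch assumes every patient is followed from time 0 (no truncation). Censored patients leave the risk set without causing a drop in the curve.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.
    times: follow-up time per patient; events: 1 = event (e.g., death), 0 = censored.
    Returns a list of (time, survival probability) at each event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Count all events and censorings that happen at time t
        while i < len(data) and data[i][0] == t:
            if data[i][1] == 1:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths > 0:
            # The curve drops only at event times: S(t) = S(previous) * (1 - d / n)
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after time t
        at_risk -= deaths + censored
    return curve

# Hypothetical follow-up in months; event 0 = censored (e.g., dropped out)
times = [2, 3, 3, 5, 8, 8, 9, 12]
events = [1, 1, 0, 1, 1, 0, 0, 1]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
# 2 0.875
# 3 0.75
# 5 0.6
# 8 0.45
# 12 0.0
```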
Can a Kaplan-Meier analysis handle covariates?
- no; use a Cox regression for this
Cox regression:
- multivariate survival analysis
- can control for other factors
- calculates hazard ratio (interpreted similarly to a relative risk)
Equipoise allows for:
- randomized controlled trials to occur
- equipoise = genuine uncertainty in the medical community about which treatment is better
Variance =
measure of the spread/dispersion of values around the mean.
Standard deviation =
√v (v = variance)
- describes the spread of the data; it does not systematically shrink as sample size increases
Standard error of the mean (SEM) =
SD/ √n
- decreases as sample size increases
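A quick check of the variance, SD, and SEM formulas using Python's standard `statistics` module on hypothetical measurements. `statistics.variance` and `statistics.stdev` compute the sample versions, which divide by n - 1.

```python
import math
import statistics

# Hypothetical measurements
data = [4.0, 8.0, 6.0, 5.0, 3.0, 7.0, 9.0, 6.0]

variance = statistics.variance(data)  # sample variance (divides by n - 1)
sd = statistics.stdev(data)           # standard deviation = sqrt(variance)
sem = sd / math.sqrt(len(data))       # standard error of the mean = SD / sqrt(n)
print(round(variance, 3), round(sd, 3), round(sem, 3))  # 4.0 2.0 0.707
```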
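Central limit theorem posits:
- as sample size increases, the distribution of sample means approaches a normal distribution centered on the population mean, with spread equal to the SEM
- so the larger the sample size, the closer the study mean tends to be to the population mean
- i.e. narrower confidence interval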
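A small simulation illustrating the central limit theorem, assuming a skewed hypothetical population (exponential, mean 1, SD 1). Even though individual values are far from normal, the means of repeated samples cluster around the population mean, and their spread is close to SD/√n, so it shrinks as n grows. `sample_means` is an illustrative helper; exact results depend on the random seed.

```python
import random
import statistics

random.seed(1)

def sample_means(n, repeats=2000):
    """Draw `repeats` samples of size n from an exponential(mean 1) population; return their means."""
    return [statistics.fmean(random.expovariate(1.0) for _ in range(n)) for _ in range(repeats)]

for n in (5, 50):
    means = sample_means(n)
    # Mean of the sample means should be near 1; their SD near 1 / sqrt(n)
    print(n, round(statistics.fmean(means), 3), round(statistics.stdev(means), 3))
```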
Interquartile range:
- IQR contains 50% of the observations
- (from the 25th to the 75th percentile)
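Python's `statistics.quantiles` returns the quartile cut points directly, and half of the observations lie between the first and third quartiles. Several quartile conventions exist; this sketch assumes the `inclusive` method, so other software may give slightly different values on the same hypothetical data.

```python
import statistics

# Hypothetical data
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
# Cut points at the 25th, 50th, and 75th percentiles
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
print(q1, median, q3, q3 - q1)  # 3.0 5.0 7.0 4.0
```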
Confidence intervals describe:
- the uncertainty that surrounds an estimate (e.g., a sample mean) calculated from the data
- the larger the sample size, the narrower the CI = MORE PRECISE STUDY
Equation for 95% CI:
95% CI = mean +/- 1.96(SD/ √n)
- SD = standard deviation
- n = sample size
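The 95% CI formula as a small function, with hypothetical numbers (mean 120, SD 15). Quadrupling the sample size halves the width of the interval. `ci95` is an illustrative name; for small samples a t-distribution multiplier is normally used in place of 1.96.

```python
import math

def ci95(mean, sd, n):
    """95% confidence interval for a mean: mean +/- 1.96 * SD / sqrt(n) (normal approximation)."""
    margin = 1.96 * sd / math.sqrt(n)
    return mean - margin, mean + margin

for n in (25, 100):
    low, high = ci95(120, 15, n)
    print(n, round(low, 2), round(high, 2))
# 25 114.12 125.88
# 100 117.06 122.94
```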
For a correlation to be statistically significant, the confidence interval cannot contain:
0 (0 = no correlation)
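One common way to get a confidence interval for a Pearson correlation is Fisher's z-transformation: z = atanh(r) is approximately normal with standard error 1/√(n - 3). A sketch with hypothetical values of r and n (`correlation_ci95` is an illustrative name) shows the same r = 0.30 giving a CI that contains 0 with n = 20 but excludes 0 with n = 100.

```python
import math

def correlation_ci95(r, n):
    """Approximate 95% CI for a Pearson correlation using Fisher's z-transformation."""
    z = math.atanh(r)          # Fisher z = 0.5 * ln((1 + r) / (1 - r))
    se = 1 / math.sqrt(n - 3)  # approximate standard error of z
    return math.tanh(z - 1.96 * se), math.tanh(z + 1.96 * se)

for r, n in [(0.30, 20), (0.30, 100)]:
    lo, hi = correlation_ci95(r, n)
    excludes_zero = lo > 0 or hi < 0
    print(r, n, round(lo, 2), round(hi, 2), "significant" if excludes_zero else "not significant")
```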
For relative risks, hazard ratios, and odds ratios to be statistically significant, the confidence interval cannot contain: