Lecture 8 and 9 (Correlation, Regression, CIs) Flashcards
1
Q
The dependent variable is on what axis?
A
Y-axis
2
Q
The independent variable is on what axis?
A
X-axis
3
Q
When should you use a correlation analysis?
A
- to examine the relationship between two variables
- to estimate the strength of association between variables
- when the independent and dependent variables are not clearly distinguishable
- when the requirements for regression are not met
4
Q
A correlation coefficient of 0 means:
A
- there is no linear association between the two variables (a non-linear relationship may still exist)
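Worked example (a minimal sketch; the data and the helper name pearson_r are made up for illustration). It computes Pearson's r by hand and shows that a perfectly curved relationship (y = x squared on x values symmetric around 0) can still give r = 0, while a straight line gives r = 1.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: x values symmetric around zero.
xs = [-3, -2, -1, 0, 1, 2, 3]

# y depends completely on x, but along a curve, not a line.
r_curved = pearson_r(xs, [x ** 2 for x in xs])

# y is an exact straight-line function of x.
r_linear = pearson_r(xs, [2 * x + 1 for x in xs])

print(f"curved: r = {r_curved:.3f}, linear: r = {r_linear:.3f}")
```

This is why plotting the data is worth doing before reading anything into r = 0.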
5
Q
A regression is:
A
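- fitting a line to data to describe how Y changes with X; r measures how well the data fit that line
- r-value close to 0 = little or no linear correlation
- r-value close to 1 or -1 = strong positive or negative correlation
- r-squared tells you the proportion of the variation in Y that is explained by variation in X.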
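Worked example (a minimal sketch with made-up data). It fits a least-squares line, computes r, and checks that r squared equals the share of the variation in Y that the line accounts for (1 minus residual variation over total variation).

```python
import math

# Hypothetical measurements: y rises roughly 2 units per unit of x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Sums of squares and cross-products around the means.
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)   # total variation in Y
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

# Least-squares line: y = intercept + slope * x
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Correlation coefficient.
r = sxy / math.sqrt(sxx * syy)

# Variation in Y left over after fitting the line.
fitted = [intercept + slope * x for x in xs]
ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
r_squared_from_residuals = 1 - ss_res / syy

print(f"slope={slope:.3f} intercept={intercept:.3f}")
print(f"r={r:.4f} r^2={r ** 2:.4f} 1-SSres/SStot={r_squared_from_residuals:.4f}")
```

For a single X variable the two r-squared calculations agree, which is what the last bullet is describing.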
6
Q
When should you use regression analysis?
A
- look for a trend in data between variables
- more than one X (independent) variable = multiple regression
- predict a dependent variable
- adjust for confounding variables
- curve fitting (pharmacokinetics)
- calibration and laboratory assays
- detect patterns in microarray data
7
Q
Regression r-value close to 0:
A
little or no linear association
8
Q
Regression r-value close to 1:
A
strong positive linear association (close to -1 = strong negative association)
9
Q
Regression r-squared value tells you:
A
- the proportion of the variation in Y that is explained by variation in X.
10
Q
Parametric test characteristics:
A
- assume variables are normally distributed with equal variances
- based on the mean and variance
- sensitive to outliers
- require continuous variables
11
Q
Non-parametric test characteristics:
A
- based on ranks
- do not assume a particular distribution; the mean and variance of the raw values do not matter
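Worked example (a minimal sketch; the data and helper names are made up). One common rank-based measure is Spearman's correlation, which is Pearson's r computed on the ranks instead of the raw values. The single extreme value below pulls Pearson's r down but leaves the rank-based value at 1, because the ordering of the points is unchanged.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank from 1 (smallest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_r(xs, ys):
    """Spearman rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

# Hypothetical data: y always increases with x, but the last value is extreme.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 5, 7, 8, 200]

pearson = pearson_r(xs, ys)
spearman = spearman_r(xs, ys)
print(f"Pearson r = {pearson:.3f}, Spearman r = {spearman:.3f}")
```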
12
Q
You can transform non-linear data to linear data by:
A
- taking logarithms (log transformation) of one or both variables
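Worked example (a minimal sketch; the curve y = 3 * exp(0.5 * x) and the helper fit_line are assumptions chosen for illustration). Taking the log of Y turns an exponential curve into a straight line, log(y) = log(3) + 0.5 * x, so an ordinary least-squares line recovers the rate and the starting value.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical exponential data: y = 3 * exp(0.5 * x)
xs = [0, 1, 2, 3, 4, 5]
ys = [3 * math.exp(0.5 * x) for x in xs]

# Fit a straight line to (x, log y) instead of (x, y).
log_ys = [math.log(y) for y in ys]
slope, intercept = fit_line(xs, log_ys)

rate = slope                        # exponent in the original curve
starting_value = math.exp(intercept)  # back-transform the intercept

print(f"rate = {rate:.3f}, starting value = {starting_value:.3f}")
```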
13
Q
Three ways you can control for outliers:
A
- using a non-parametric (rank-based) test
- dropping the outlier(s)
- applying a log transformation
14
Q
Multivariate regression:
A
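- more than one X (independent) variable (strictly, this is multiple or multivariable regression; "multivariate" refers to more than one Y variable)
- allows adjustment for confounders
- models interactions between variables by including their product (X1 multiplied by X2) as an extra term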
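Worked example (a minimal sketch; the data, coefficients and helper names are made up, and the fit uses the textbook normal equations rather than any particular statistics package). Two X variables plus an interaction column, built by multiplying them together, are fitted at once. The data are generated without noise from y = 1 + 2*x1 + 3*x2 + 0.5*x1*x2, so the fit should return those four numbers.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_least_squares(rows, ys):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(xtx, xty)

# Hypothetical design: every combination of x1 in 0..3 and x2 in 0..2.
points = [(float(x1), float(x2)) for x1 in range(4) for x2 in range(3)]
ys = [1 + 2 * x1 + 3 * x2 + 0.5 * x1 * x2 for x1, x2 in points]

# Columns: intercept, x1, x2, and the interaction term x1 * x2.
rows = [[1.0, x1, x2, x1 * x2] for x1, x2 in points]
coefficients = fit_least_squares(rows, ys)

for name, value in zip(["intercept", "x1", "x2", "x1*x2"], coefficients):
    print(f"{name}: {value:.3f}")
```

The x2 coefficient here is the effect of x2 with x1 held fixed at 0; the interaction coefficient says how much that effect changes per unit of x1.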
15
Q
Stepwise regression:
A
- adds the top contributing variable, then the second, then the third, etc., until a point of diminishing returns is reached.
- i.e., looks for a small group of variables that together give a high r-squared value.
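Worked example (a minimal sketch of forward selection only; the data, the names a, b, c, and the stopping rule of "r-squared gain below 0.01" are all assumptions for illustration, and statistical packages typically use other criteria such as p-values or AIC). At each step the variable that raises r-squared the most is added, and the loop stops when the best remaining gain is too small.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def r_squared(columns, ys):
    """r-squared of a least-squares fit of ys on the given columns plus intercept."""
    n = len(ys)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(ys) / n
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def forward_stepwise(candidates, ys, min_gain=0.01):
    """Add variables one at a time until the r-squared gain drops below min_gain."""
    selected, best_r2 = [], 0.0
    remaining = dict(candidates)
    while remaining:
        scores = {
            name: r_squared([candidates[s] for s in selected] + [col], ys)
            for name, col in remaining.items()
        }
        name = max(scores, key=scores.get)
        if scores[name] - best_r2 < min_gain:
            break  # point of diminishing returns
        selected.append(name)
        best_r2 = scores[name]
        del remaining[name]
    return selected, best_r2

# Hypothetical candidates: y depends strongly on a, weakly on b, not at all on c.
candidates = {
    "a": [1, 1, 1, 1, -1, -1, -1, -1],
    "b": [1, 1, -1, -1, 1, 1, -1, -1],
    "c": [1, -1, 1, -1, 1, -1, 1, -1],
}
noise = [0.1, -0.1, -0.1, 0.1, -0.1, 0.1, 0.1, -0.1]
ys = [4 * a + b + e for a, b, e in zip(candidates["a"], candidates["b"], noise)]

selected, final_r2 = forward_stepwise(candidates, ys)
print(selected, round(final_r2, 4))
```

With this toy data the strongest variable enters first, the weaker one second, and the unrelated one is left out. A step-by-step search like this is not guaranteed to find the best possible subset.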