L11 to L13: Correlation and Regression Flashcards
Define Correlation
ASSUMING that relationship is linear, it QUANTIFIES degree to which 2 random variables are related
What is the correlation coefficient (r)?
QUANTITATIVE measure of the STRENGTH and DIRECTION of a linear relationship between two variables
Two types of correlation analysis, and when to use them?
- Pearson product-moment correlation (PPMC): Parametric test used when variables are continuous
- Spearman rank correlation (SRC): Non-parametric test (NPT), used when ≥1 variable is not normally distributed, OR data are ordinal
Assumptions of correlation analyses
- Observations (pairs of x and y) are independent of each other
- Pairs of observations (x,y) are randomly selected
- PPMC: underlying populations of both variables are normally distributed
Hypotheses of correlation analysis
H0: r = 0 (no correlation)
H1: r ≠ 0 (correlation present), or r > 0 / r < 0 if one-tailed
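For PPMC, one common way to test H0: r = 0 is a t statistic with n − 2 degrees of freedom. The symbol n (number of (x,y) pairs) is not in the original cards and is introduced here only for illustration:

```latex
t = r\,\sqrt{\frac{n-2}{1-r^{2}}}, \qquad \text{df} = n - 2
```

The larger |t| is, the smaller the p-value, so a strong r from very few pairs can still be non-significant.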
Advantage of SRC over PPMC
Decreased sensitivity to outliers since ranks are used (similar to other NPT)
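A minimal sketch of the outlier effect, in plain Python with made-up data. The helper names `pearson_r`, `ranks`, and `spearman_r` are hypothetical, and SRC is computed here as Pearson's r on the ranks (ties get the average rank):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank from 1 (smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_r(x, y):
    """Spearman rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Made-up data: y rises steadily with x, but the last y is an outlier.
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 5, 7, 9, 60]

print(round(pearson_r(x, y), 3))   # pulled well below 1 by the outlier
print(round(spearman_r(x, y), 3))  # 1.0, because the ranks agree perfectly
```

The outlier changes the value of y but not its rank, so SRC is unaffected while PPMC drops.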
When you receive data and want to check for correlation, the VERY FIRST STEP you should do
Construct scatter plot and roughly scan for linear relationship
This checks the assumption that the variables have a linear relationship, before quantifying how strong that relationship is
Distinguish between correlation and simple linear regression (SLiR)
- Correlation: Find out how strongly linear the relationship between x and y is, provided the scatter plot already suggests a linear relation. No independent/dependent variable is defined yet
- SLiR: Provided that correlation is SIGNIFICANT, give BEST-FIT LINE for DEFINED x and y (defined independent/dependent variable)
Purpose of SLiR
Estimate y for defined x using equation obtained from best fit line
One disadvantage of SLiR
Not suitable for extrapolation. Equation only applies WITHIN data range
The equation for SLiR, and what does each symbol mean?
y = a + Bx
y: Dependent variable
x: Independent variable
a: y-intercept
B: Slope, i.e. change in MEAN of y that corresponds to one unit change in x
Assumptions of SLiR
- Assume variables have linear relationship
- Observations are independent
- For any values of x, y is NORMALLY distributed
- For any x, the variance of y is equal (similar to other tests)
How does SLiR get its line of best fit?
Method of least squares
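A minimal sketch of least squares for y = a + Bx, assuming the usual closed-form solution that minimises the sum of squared vertical distances between the points and the line. The function name `least_squares_fit` and the data are made up for illustration:

```python
def least_squares_fit(x, y):
    """Return (a, B) for the line y = a + Bx that minimises squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up data lying exactly on y = 1 + 2x.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]

a, B = least_squares_fit(x, y)
print(a, B)  # 1.0 2.0
```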
Hypotheses of SLiR and tail?
H0: No effect by x on y (B = 0)
H1: B ≠ 0
- ALWAYS two-tailed
Given B = 1.657, a = 23.811, x is body weight (BW, in kg), and y is systolic blood pressure (SBP, in mmHg), p = 0.001, construct the regression equation and formulate a conclusion.
y = 23.811 + 1.657(BW)
Conclusion:
- For every 1kg increase in BW, the MEAN SBP increases by 1.657 mmHg.
- At a sig level of 0.05, there is a statistically significant effect of BW on SBP (p = 0.001)
(Remember both the worded explanation of the equation and the sig level)
(Remember units)
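A quick sketch of using the equation to estimate mean SBP. The 70 kg input is a made-up value, assumed to lie within the range of the original data (the cards do not give that range), and `predict_sbp` is a hypothetical helper name:

```python
INTERCEPT = 23.811  # y-intercept from the card
SLOPE = 1.657       # B from the card: mmHg per kg

def predict_sbp(bw_kg):
    """Estimated MEAN systolic blood pressure (mmHg) for a body weight in kg."""
    return INTERCEPT + SLOPE * bw_kg

print(round(predict_sbp(70), 3))                    # 139.801
print(round(predict_sbp(71) - predict_sbp(70), 3))  # 1.657, i.e. the slope
```

The second line shows the worded conclusion numerically: a 1 kg increase in BW raises the estimated mean SBP by 1.657 mmHg.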