Correlation -> Linear Regression Flashcards
1
Q
CORRELATION
A
- measures “degree of association” between 2 variables, each either scale (interval/ratio) or ordinal (ordered category)
2
Q
ASSOCIATIONS
A
- measured in “r” (parametric/Pearson’s) or “ρ” (rho; non-parametric/Spearman’s)
- unless both variables = normally distributed, Spearman’s MUST be used
POSITIVE - increase in 1 variable = associated w/increase in other
NEGATIVE - increase in 1 variable = associated w/decrease in other
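Sketch of how the two coefficients differ, assuming plain Python w/no stats library. The function names (pearson_r, ranks, spearman_rho) and the data are made up for illustration. Spearman’s is computed here as Pearson’s r on the ranks, w/tied values given their average rank.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r: co-variation of x and y scaled by the spread of each."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Ranks 1..n; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranks of x and y."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical data: y always rises with x, but along a curve, not a line
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
r = pearson_r(x, y)       # high, but below 1 (relation is not linear)
rho = spearman_rho(x, y)  # 1, because the rank orders match exactly
```

Both are positive here (increase in x goes w/increase in y); reversing the order of y would flip both signs.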
3
Q
CORRELATION (EXAMPLES)
A
- comparing 2 variables in degree of association terms (eg. attitude scales VS behavioural frequency)
- test statistic = r (parametric)/ρ (non-parametric); ranges from -1 to +1 (-1 = perfect negative correlation; 0 = zero correlation, points scattered randomly; +1 = perfect positive correlation)
4
Q
CORRELATION: HEIGHT VS WEIGHT
A
- strong positive correlation between height/weight
- can see how relationship works BUT cannot calculate one from other
- causal inference (?); aka. is 1 variable dependent on other? are both influenced independently by third variable?
- eg. if 120cm tall, how heavy (approx.)? correlation alone cannot answer this
5
Q
CORRELATION: SUMMARY
A
- association tests use “correlation coefficient” to assess how strongly/in what direction 2 continuous variables are associated
- CANNOT tell us anything about causal inferences aka. cause-effect direction of relation; logic only sometimes determines this
6
Q
CORRELATION: ANOVA EXAMPLE
A
- eg. one-way ANOVA w/contrasts
- tells us (eg.) if treatment has effect on symptom index BUT nothing about relation between amount of drug A/B given & symptoms experienced by patient
- correlation = strong & negative; shows relation but not predictions
7
Q
LINE OF BEST FIT
A
- allows us to describe relationship between variables more accurately
- can now predict specific values of 1 variable from knowledge of other
- all points should be close to it
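Sketch of fitting a best fit line by least squares and predicting from it, assuming plain Python. fit_line and predict are hypothetical helper names, and the height/weight figures are invented for illustration, not real measurements.

```python
def fit_line(x, y):
    """Least-squares line y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx  # line always passes through (mean x, mean y)
    return intercept, slope

def predict(intercept, slope, x_new):
    """Predicted value of the DV for a given value of the IV."""
    return intercept + slope * x_new

# Invented data: height in cm (IV) and weight in kg (DV)
height = [110, 120, 130, 140, 150]
weight = [19, 23, 27, 33, 38]

intercept, slope = fit_line(height, weight)
weight_at_125 = predict(intercept, slope, 125)
```

The slope is the predicted change in the DV per 1-unit increase in the IV. Predictions are only trustworthy inside the range of IV values the line was fitted on.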
8
Q
RESIDUAL VALUE
A
- can predict specific values of 1 variable from knowledge of another via simple regression/best fit line
- BUT predictions will not be perfectly accurate; size of error for each point = “residual value”
- residual = difference between observed value of DV (y-axis) & value predicted by regression equation
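Sketch of computing residuals as observed minus predicted, assuming plain Python and invented height/weight data. fit_line is a hypothetical helper that fits the least-squares line.

```python
def fit_line(x, y):
    """Least-squares line y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

# Invented data: height in cm (IV) and weight in kg (DV)
height = [110, 120, 130, 140, 150]
weight = [19, 23, 27, 33, 38]

intercept, slope = fit_line(height, weight)
predicted = [intercept + slope * h for h in height]

# residual = observed DV - predicted DV, one per data point
residuals = [obs - pred for obs, pred in zip(weight, predicted)]

# Sum of squared residuals: the quantity the least-squares line minimises
sse = sum(e ** 2 for e in residuals)
```

Positive residual = point lies above the line; negative = below. For a least-squares line w/an intercept, the residuals sum to zero (up to rounding).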
9
Q
GENERAL REGRESSION RULES
A
- DV should be measured on interval/ratio (continuous) scale variable
- ordinal usually good in practice so long as we have large enough category number (7+) & frequency distribution = normal
- IVs should be measured on interval/ratio scales
- BUT… most ordinal measurement = acceptable in practice (apply same rules as ordinal DVs)
- dichotomies/binary variables = also OK as IVs
- distributions of variables should be roughly normal; correlation/regression = sensitive to shape of frequency distribution of variables (unlike ANOVA)
- regression/associated techniques = robust BUT w/limits
- if not roughly normal can often be corrected via appropriate transformation (eg. taking logarithms of all measurements)
- 2-valued categorical variables (dichotomies/binary variables) can be used directly as regressors (ie. yes/no)
- categorical variables w/3+ categories are dealt w/via dummy variables
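Sketch of two rules above, assuming plain Python: dummy coding a 3-category variable and log-transforming a skewed one. dummy_code is a hypothetical helper, and the category labels and values are invented.

```python
from math import log10

def dummy_code(values, reference):
    """Code a categorical variable with k categories as k-1 columns of 0/1.

    The reference category gets 0 in every column, so each coefficient
    later compares one category against the reference.
    """
    levels = sorted(set(values) - {reference})
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows

# Invented 3-category IV
treatment = ["placebo", "drug_a", "drug_b", "drug_a"]
levels, rows = dummy_code(treatment, reference="placebo")

# Invented positively skewed measurements; logs pull in the long right tail
skewed = [1, 10, 100, 1000]
logged = [log10(v) for v in skewed]
```

A dichotomy (eg. yes/no) needs only a single 0/1 column, which is why it can be used directly as a regressor.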
10
Q
ANOVA VS MULTIPLE REGRESSION
A
DVs
- ANOVA/regression = continuous only
IVs
- ANOVA = category only
- regression = both continuous/category
11
Q
SUMMARY
A
1-WAY ANOVA
- continuous (DV) -> category (IV (1))
2-WAY ANOVA
- continuous (DV) -> category (IV (1)) + category (IV (2))
SIMPLE LINEAR REGRESSION
- continuous (DV) -> continuous (IV (1))
MULTIPLE LINEAR REGRESSION
- continuous (DV) -> continuous (IV (1)) + continuous (IV (2)) + …
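Sketch of multiple linear regression w/2 IVs, assuming plain Python and solving the normal equations directly (statistical software normally handles this step). solve and fit_multiple are hypothetical helper names. The data are invented and generated exactly from y = 1 + 2*x1 + 3*x2, so the fit should recover those coefficients.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_multiple(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] obtained from
    the normal equations (X'X) b = X'y."""
    X = [[1.0] + list(r) for r in rows]  # leading 1.0 carries the intercept
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Invented data: each row is (IV 1, IV 2); y = 1 + 2*x1 + 3*x2 exactly
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1, 3, 4, 6, 8]

intercept, b1, b2 = fit_multiple(rows, y)
```

Each coefficient is the predicted change in the DV per 1-unit increase in that IV w/the other IV held constant. Simple linear regression is the same calculation w/a single IV column.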