Multiple Regression Analysis (MRA) Flashcards
Process MRA
Objectives of MRA –> research design of MRA –> assumptions of MRA –> estimate regression model –> assess overall fit –> interpretation –> validation of results
MRA is
analyzing the relationship between IV(s) (predictor var) and DV (criterion var)
MRA =
- Assumes linear dependency between var
- Simple = 1 metric IV + 1 metric DV, multiple = several metric IVs + 1 metric DV (all metrically scaled)
- Applied in causes, forecast/predictions of impact, time-series (predicting trends)
- Uses plotting = line that minimizes the sum of squared residuals (so best line that fits all points at same time)
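A minimal sketch of the "best line" idea for simple regression (1 IV, 1 DV), using plain Python. The data and the name `fit_line` are made up for illustration; this is one common way to compute the least-squares line, not a full MRA routine.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = co-variation of X and Y divided by variation of X
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up data for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(xs, ys)

# Residual = observed DV minus predicted DV
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(b0, b1, sum(r ** 2 for r in residuals))
```

Any other line through these points would give a larger sum of squared residuals.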
MRA FORMULA
See summary
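The summary itself is not reproduced in these cards. For orientation, the general textbook form of a multiple regression model is sketched here; the symbols are an assumption and may differ from the notation in the summary.

```latex
Y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_n X_n + e
```

Here Y is the DV, X_1 to X_n are the IVs, b_0 is the intercept, b_1 to b_n are the unstandardized regression coefficients (B), and e is the residual (error).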
Step 1: MRA objectives [1]
- Predictions = which IV best predicts DV (maximize predictive value of DV)
- Explanation = how/why each IV affects DV based on rela (linear dependencies)
- If rela is nonlinear: other model to better reflect reality (that may also include polynomial terms)
- You need strong theory for picking IV & DV! Use reliable measurements and beware of measurement errors (esp in DV), because MRA assumes errors are random & avg out to 0.
Step 1: MRA objective [2, rules of thumb]
- Measurement error:
1. SEM can handle measurement errors directly
2. Summated scales + FA to reduce error
- Irrelevant vs omitted var:
> Better to include too many var & remove later
> Omitting important var can lead to bias
- Curvilinear rela’s: use squared/cubic terms if you think the rela is not a straight line
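A small sketch of the squared-term idea, assuming NumPy is available. The data are made up (a curve without noise) and `r_squared` is a hypothetical helper; the point is only that adding X squared as an extra IV lets a linear model follow a curved rela.

```python
import numpy as np

# Made-up data with a curved relationship: y = 1 + 2x + 0.5x^2 (no noise)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = 1 + 2 * x + 0.5 * x ** 2

def r_squared(X, y):
    """Fit by least squares and return (coefficients, R^2)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return coef, 1 - ss_res / ss_tot

# Model 1: straight line only (intercept + x)
X_linear = np.column_stack([np.ones_like(x), x])
# Model 2: add the squared term as an extra IV
X_quad = np.column_stack([np.ones_like(x), x, x ** 2])

_, r2_linear = r_squared(X_linear, y)
coef_quad, r2_quad = r_squared(X_quad, y)
print(r2_linear, r2_quad)
```

A cubic term would be added the same way, as one more column holding `x ** 3`.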
Step 2: MRA research design [1]
- Sample size gives power
- Use dummy var as IVs to make model simpler & more efficient (slightly improve statistical power)
- IVs can be fixed (by researcher) or random (natural)
> Random var need more statistical power to estimate
> Fixed var are easier (helps with small sample size)
Step 2: MRA research design [2, rules of thumb]
- Simple regression is ok with sample size of 20 (doesn’t need much power because it only has 1 IV)
- Multiple needs 50-100 (depending on complexity of model)
- Min ratio of observations to IVs is 5 to 1: better to have 15 to 1 or 20 to 1 (keep as simple as possible = parsimony, while also having enough people)
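The ratio rule of thumb can be written as a quick check. The function name and the labels it returns are made up for illustration; the cut-offs (5 and 15 observations per IV) are the ones from the card above.

```python
def sample_size_check(n_obs, n_ivs):
    """Classify the ratio of observations to IVs using the rules of thumb."""
    ratio = n_obs / n_ivs
    if ratio < 5:
        return "below minimum"   # fewer than 5 observations per IV
    if ratio < 15:
        return "meets minimum"   # at least 5 to 1, but under 15 to 1
    return "preferred"           # 15 to 1 or better

print(sample_size_check(40, 10))   # ratio 4 to 1
print(sample_size_check(100, 10))  # ratio 10 to 1
print(sample_size_check(100, 5))   # ratio 20 to 1
```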
Step 3: MRA assumptions [1]
- Theory should support linear rela IV and DV
- Constant variance (homoscedasticity) -> spread of error (residuals) should be roughly the same across all values of predictors
- Errors (residuals) should be independent (no connections/patterns)
- Appropriate sample size = most important!
Step 3: MRA assumptions [2, rules of thumb]
- Assumptions also count for variates
- To check how well regression model works, use graphs:
> Partial regression plots
> Residual plots (bivariate rela’s)
> Normal probability plots
- If variate is nonlinear –> modify IV
Step 3: MRA assumptions [3, residual plots]
Assumption of homoscedasticity and unbiasedness.
Interpret:
- You want residuals to spread out evenly with no clear pattern
- If there is a pattern: model likely misses important var –> omitted var bias
- If overlooked nonlinear rela: quadratic/cubic terms
- Heteroscedastic = variance is not constant
Remedies:
- Transform data to make linear
- Polynomial terms
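A rough numeric stand-in for reading a residual plot, assuming NumPy is available. The data are made up so that the error grows with X; comparing the residual spread for low vs high fitted values is just one simple check, not a formal test.

```python
import numpy as np

# Made-up data: the error grows with x, so the variance is not constant
x = np.arange(1, 21, dtype=float)
signs = np.where(np.arange(20) % 2 == 0, -1.0, 1.0)  # alternating -, +
y = 2 + 3 * x + 0.2 * x * signs

# Fit y = b0 + b1*x by least squares
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef
residuals = y - fitted

# Compare residual spread for the lower and upper half of the fitted values
order = np.argsort(fitted)
low, high = residuals[order[:10]], residuals[order[10:]]
spread_ratio = high.var() / low.var()
print(spread_ratio)  # well above 1 here, hinting at heteroscedasticity
```

With constant variance the ratio would stay close to 1; in a real analysis you would still look at the plot itself.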
Linearity
- Critical!
- To ensure this: descriptive statistics (skewness & kurtosis, and distribution)
- Bivariate rela’s & linearity: check via residual plots
Step 4+5: estimation + model fit
3 basic tasks:
- Select method for specifying regression model
1. Confirmatory (simultaneous) = add all IVs at once based on theory
2. Sequential (based on data)
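> Stepwise = add IVs one at a time (largest contribution first) and re-check at each step whether earlier IVs can be removed
> Forward inclusion = start with most important and add step-by-step (IVs that are added stay in)
> Backward elimination = start with all and remove least useful step-by-step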
> Hierarchical = choose step-by-step based on theory
3. Combinatorial (all-possible subsets) = test every combo
- Assess statistical sig
> Check overall model; think of practical sig
> Use Adjusted R2 (corrects for having too many var)
> Back up model with theory (most important!)
> 3 questions:
1. Statistical sig?
2. Does sample size affect sig?
3. Is effect practically sig aka is it meaningful irl? (Use theory to answer)
- Determine if there are any influential outliers
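The Adjusted R2 correction mentioned above can be sketched with the standard formula. The function name and the example numbers are made up for illustration.

```python
def adjusted_r2(r2, n_obs, n_ivs):
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_ivs - 1)

# Made-up example: R2 = 0.50 with 21 observations and 5 IVs
print(adjusted_r2(0.50, 21, 5))   # about 0.33, clearly lower than 0.50

# Same R2 with many more observations: the penalty almost disappears
print(adjusted_r2(0.50, 501, 5))
```

The more IVs relative to observations, the further Adjusted R2 drops below R2.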
Step 6: interpretation [issues]
See examples summary
Step 7: validate model
- Second sample/split-half reliability
- Compare with other models
- Predict future outcomes to see if it works in practice
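A minimal split-sample sketch, assuming NumPy is available. The data are simulated with a fixed seed and the helper `r_squared` is hypothetical; the idea is to estimate the model on one half and see how well it predicts the other half.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Made-up data: intercept column, two IVs, and a DV with random error
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=n)

# Split the sample into an estimation half and a holdout half
X_est, y_est = X[:100], y[:100]
X_hold, y_hold = X[100:], y[100:]

# Estimate the model on the first half only
coef, *_ = np.linalg.lstsq(X_est, y_est, rcond=None)

def r_squared(X, y, coef):
    """R2 of the given coefficients on the given data."""
    resid = y - X @ coef
    return 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

r2_est = r_squared(X_est, y_est, coef)
r2_hold = r_squared(X_hold, y_hold, coef)
print(r2_est, r2_hold)
```

If R2 drops sharply in the holdout half, the model probably does not generalize well.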
Moderating effects in MRA
Moderator = var that changes strength/direction of rela between IV and DV
> Shows effect under condition (Z) aka context/boundary conditions
To check if there is a moderating effect: create interaction term (XZ)
- If Z is a number: mean-center X and Z before multiplying (reduces multicollinearity)
- If Z is a category: Just do XZ
Test moderation:
- Check if R2 goes up if XZ is added (does model explain more?)
- Check if XZ is sig (is moderating effect real?)
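A sketch of the moderation check for a numeric Z, assuming NumPy is available. The data are simulated with a built-in interaction effect, and `fit` is a hypothetical helper. The significance test for XZ is left out; a statistics package would normally report it.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the sketch is repeatable

# Made-up data where the effect of X on Y depends on Z
n = 200
x = rng.normal(loc=5.0, size=n)
z = rng.normal(loc=3.0, size=n)
y = 1 + 2 * x + 1 * z + 1.5 * x * z + rng.normal(scale=0.5, size=n)

# Mean-center X and Z, then multiply to get the interaction term XZ
xc = x - x.mean()
zc = z - z.mean()
xz = xc * zc

def fit(X, y):
    """Fit by least squares and return (coefficients, R^2)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return coef, 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

ones = np.ones(n)
_, r2_main = fit(np.column_stack([ones, xc, zc]), y)           # without XZ
coef_int, r2_int = fit(np.column_stack([ones, xc, zc, xz]), y)  # with XZ

# Correlation between X and the interaction term, raw vs mean-centered
corr_raw = np.corrcoef(x, x * z)[0, 1]
corr_centered = np.corrcoef(xc, xz)[0, 1]

print(r2_main, r2_int)          # R2 goes up when XZ is added
print(coef_int[3])              # estimated coefficient of XZ
print(corr_raw, corr_centered)  # centering lowers the correlation
```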
Key overview [diff coefficients]
- Beta = standardized
- Regression coefficient = unstandardized (B)
- Partial correlation = how strong are X & Y related after removing effects of other var?
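How B and Beta relate can be sketched for the simple case of 1 IV, using only the Python standard library. The data are made up; the conversion Beta = B * sd(X) / sd(Y) is the standard one, and with a single IV Beta equals the correlation between X and Y.

```python
import statistics

# Made-up data for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

mean_x = statistics.mean(xs)
mean_y = statistics.mean(ys)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

# Unstandardized coefficient (B): change in Y per 1 unit of X
b = sxy / sxx

# Standardized coefficient (Beta): change in Y (in SDs) per 1 SD of X
beta = b * statistics.stdev(xs) / statistics.stdev(ys)

# Pearson correlation, for comparison with Beta
r = sxy / (sxx * syy) ** 0.5

print(b, beta, r)
```

B depends on the units of X and Y; Beta does not, which is why Beta is used to compare IVs measured on different scales.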
Key overview [logistic regression]
Use when DV is binary; nonmetric with two categories
IVs can be metric or categorical
Nonlinear
Nagelkerke R2 (pseudo R2, used to judge model fit)
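A sketch of the nonlinear (S-shaped) logistic function and of Nagelkerke's R2, using only the Python standard library. The data and the coefficients (0 and 1.5) are hypothetical and not estimated from anything; the formula used is the usual Cox & Snell R2 rescaled to a maximum of 1.

```python
import math

def logistic(z):
    """Turn any number into a probability between 0 and 1."""
    return 1 / (1 + math.exp(-z))

def log_likelihood(ys, probs):
    """Log-likelihood of binary outcomes given predicted probabilities."""
    return sum(y * math.log(p) + (1 - y) * math.log(1 - p)
               for y, p in zip(ys, probs))

def nagelkerke_r2(ll_null, ll_model, n):
    """Cox & Snell R2 divided by its maximum possible value."""
    cox_snell = 1 - math.exp(2 * (ll_null - ll_model) / n)
    max_cox_snell = 1 - math.exp(2 * ll_null / n)
    return cox_snell / max_cox_snell

# Made-up data: binary DV and one IV
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]
n = len(ys)

# Null model: same probability for everyone (share of 1s in the sample)
p_null = sum(ys) / n
ll_null = log_likelihood(ys, [p_null] * n)

# Model with hypothetical coefficients (intercept 0, slope 1.5)
probs = [logistic(0 + 1.5 * x) for x in xs]
ll_model = log_likelihood(ys, probs)

r2 = nagelkerke_r2(ll_null, ll_model, n)
print(r2)
```

Values closer to 1 mean the model predicts the binary DV much better than the null model.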