Multiple Regression Flashcards
What type of analysis does Multiple Regression come under?
Analysis of dependence = “in which one variable is identified for study and is then examined in terms of its dependence on others”
What is Multiple Regression, why is it useful?
It involves moving beyond the simple and often inappropriate reliance on a single variable (X) for predicting changes in Y.
It uses multiple variables to explain a single outcome, thus allowing the investigation of more complex phenomena.
What is the coefficient of explanation for a multiple regression?
Adjusted R-square = the proportion of variation in Y that can be accounted for by the model, given as a PERCENTAGE %. R-square lies between 0 and 1; the adjusted version can dip slightly below 0 when the model explains almost nothing. Unlike simple linear regression, here it is 'adjusted' to penalise the number of predictor variables.
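The adjustment can be sketched with the standard formula, assuming n observations and k predictors (a minimal illustration; the function name is ours, not from any particular package):

```python
def adjusted_r_square(r2: float, n: int, k: int) -> float:
    """Penalise R-square for the number of predictors k, given n
    observations: 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Example: R^2 = 0.80 with n = 30 observations and k = 3 predictors
print(round(adjusted_r_square(0.80, 30, 3), 3))  # 0.777
```

Note the adjusted value is always at or below the raw R-square, and the gap widens as more predictors are added.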
In what other ways are the independent variables called in this type of test?
‘Predictors’ or ‘regressors’ - the variables (X) that are being used to predict changes in Y
What is the general equation for this?
y = a + b1x1 + b2x2 + … + bnxn
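The equation above can be sketched as a plain Python function (the names `intercept`, `coeffs`, and `xs` are illustrative, not from any library):

```python
def predict(intercept: float, coeffs: list[float], xs: list[float]) -> float:
    """y = a + b1*x1 + b2*x2 + ... + bn*xn"""
    return intercept + sum(b * x for b, x in zip(coeffs, xs))

# a = 2.0, b1 = 0.5, b2 = -1.0, with predictor values x1 = 4, x2 = 3
print(predict(2.0, [0.5, -1.0], [4.0, 3.0]))  # 1.0
```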
How is the problem of visualising a multiple regression dealt with?
Uses a ‘best-fit plane’ (with two predictors; a hyperplane with more) fitted to the data points.
Using the adjusted R-square, we can compare ‘goodness of fit’ between similar models (but NOT different studies)
What is scale-dependency? Why does it occur?
Different variables are often measured on different scales, so their unstandardised (partial) regression coefficients cannot be compared directly — the relative influence of each variable on the outcome variable is impossible to determine from the raw coefficients alone.
How is scale-dependency dealt with?
Standardisation = conversion of partial regression coefficients into standardised coefficients, called BETA VALUES.
(all variables are converted into z-units based on the distribution of the data: subtract the mean of X from each X value, then divide by the standard deviation)
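The z-unit conversion above can be sketched with the standard library (a minimal illustration; `z_scores` is our own helper name):

```python
from statistics import mean, stdev

def z_scores(values: list[float]) -> list[float]:
    """Convert raw values to z-units: (x - mean) / standard deviation."""
    m, s = mean(values), stdev(values)
    return [(x - m) / s for x in values]

# standardised data always has mean 0 and (sample) SD 1,
# which is what makes beta values comparable across predictors
z = z_scores([2.0, 4.0, 6.0, 8.0])
```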
What are the two other model outcomes needed to report results (besides adjusted R-square)?
F-ratio = significance of the overall regression model (explained variance relative to unexplained variance); the larger the F-ratio, the more variance the model explains
t = the significance of each individual coefficient in explaining the variance (null hypothesis: B = 0)
Name the three problems that could stop the model from working effectively.
Multicollinearity
Heteroscedasticity
Autocorrelation
What is multicollinearity? How is it identified and how is it addressed?
In multiple regression, each predictor should be independent of the others.
Multicollinearity occurs when there is high correlation between two or more predictor variables, which makes it difficult to isolate the influence of each X variable on Y.
It is present when tolerance is lower than 0.20 and VIF scores exceed 5.0.
Addressed by removing one or more of the correlated variables, creating an interaction term, or reducing them using a factor/principal component analysis.
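With just two predictors, tolerance and VIF reduce to simple functions of their correlation r, which makes the thresholds above easy to illustrate (a minimal sketch; `pearson_r` and `vif_two_predictors` are our own helper names, not library calls):

```python
from statistics import mean

def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two variables."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def vif_two_predictors(x1: list[float], x2: list[float]) -> float:
    """With two predictors, the R^2 from regressing one on the other
    is r^2, so tolerance = 1 - r^2 and VIF = 1 / tolerance."""
    tolerance = 1 - pearson_r(x1, x2) ** 2
    return 1 / tolerance

# uncorrelated predictors -> tolerance = 1, VIF = 1 (no collinearity);
# the card's thresholds flag trouble at VIF > 5, i.e. tolerance < 0.20
print(vif_two_predictors([1.0, 2.0, 3.0, 4.0], [1.0, -1.0, -1.0, 1.0]))
```

With more than two predictors, each VIF comes from regressing that predictor on all the others, but the tolerance = 1/VIF relationship is the same.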
What is heteroscedasticity? How is it identified?
Residuals should be homoscedastic — their variance should be roughly constant across all levels of the predicted values — shown by an even scatter on the P-P plot, a Gaussian (normal) distribution of residuals, and a random residual scatterplot.
When the residual variance is uneven (e.g. fanning out as predicted values increase), these patterns break down and the model is heteroscedastic.
What is autocorrelation? How is it identified?
When you don’t have independence between residuals.
The Durbin-Watson test indicates whether the null (no autocorrelation) can be rejected; a statistic close to 2 suggests independent residuals. BUT, it is more of a problem to deal with if using time-series data.
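The Durbin-Watson statistic itself is a simple ratio over successive residuals and can be sketched directly (a minimal illustration, not a replacement for looking up the test's critical values):

```python
def durbin_watson(residuals: list[float]) -> float:
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2).
    Values near 2 suggest no first-order autocorrelation;
    near 0 suggests positive, near 4 negative autocorrelation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# strictly alternating residuals -> negative autocorrelation, DW well above 2
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0
```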
Name three approaches and outline their differences.
SIMULTANEOUS - “cauldron pot”; used when there is no strong theoretical consideration underpinning the importance of variables
HIERARCHICAL - predictors added in a particular order based on priority (often statistical significance), often following theoretical considerations based on past research; a logical order
STEP-WISE - forward or backward; adding one by one or taking out one by one; predictors are based only on their statistical significance
How is explained variance calculated?
Sum of (‘best estimate of Y’ minus ‘mean of Y’) squared, divided by k (the number of predictors) — i.e. the regression sum of squares over its degrees of freedom.
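The calculation above can be sketched as follows, assuming `y_hat` holds the model's fitted values (the 'best estimates of Y') and `y` the observed values; the function name is ours:

```python
from statistics import mean

def explained_mean_square(y_hat: list[float], y: list[float], k: int) -> float:
    """Regression sum of squares, sum((y_hat_i - mean(y))^2),
    divided by k, the number of predictors."""
    y_bar = mean(y)
    ss_regression = sum((yh - y_bar) ** 2 for yh in y_hat)
    return ss_regression / k

# fitted values [1.5, 2.0, 3.0, 3.5] against observed [1, 2, 3, 4], k = 2
print(explained_mean_square([1.5, 2.0, 3.0, 3.5], [1.0, 2.0, 3.0, 4.0], 2))  # 1.25
```

This explained mean square is the numerator of the F-ratio reported earlier in these cards.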