L2 Multiple Regression Flashcards
What is multivariate analysis?
Analysing the impact of multiple variables for predicting changes in Y.
What are the two types of multivariate analysis?
Analysis of dependence and analysis of independence
What are two analyses of dependence?
multiple regression and discriminant analysis
What are three analyses of independence?
cluster analysis, principal component analysis, factor analysis
What is the purpose of multiple regression?
It is designed to isolate the effects of each x-variable (predictor) upon the y-variable
What is a coefficient?
A numerical description of an x-variable’s effect on the y-variable
What is a regressor/predictor?
an independent variable
What is the model?
term used to describe the collection of regressors involved in predicting y
Give an example of a multiple regression case and 5 potential regressors?
Plant growth rate (y) is dependent upon 1) temperature, 2) radiation, 3) carbon dioxide, 4) nutrient supply, 5) water
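An example like this can be sketched numerically. Below is a minimal illustration in Python using NumPy's least-squares solver; the "plant growth" data and coefficient values are hypothetical, invented purely for the example:

```python
import numpy as np

# Hypothetical plant-growth data: rows are observations, columns are the
# five regressors (temperature, radiation, CO2, nutrients, water).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
# Invented "true" model: intercept 2.0 plus five partial coefficients.
y = 2.0 + X @ np.array([0.5, 1.0, 0.3, 0.8, 0.2]) + rng.normal(scale=0.1, size=30)

# Add an intercept column and solve the least-squares problem.
X1 = np.column_stack([np.ones(len(X)), X])
coefs, *_ = np.linalg.lstsq(X1, y, rcond=None)
print(coefs.round(2))  # intercept followed by the five partial coefficients
```

Each entry of `coefs` after the intercept isolates one regressor's effect on y while holding the others fixed, which is exactly what multiple regression is for.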
How and WHY does the line of best fit change for multiple regression?
Because there is more than one x-variable, the fit can no longer be drawn as a line in two dimensions; with two regressors the line becomes a plane of best fit (and with more regressors, a hyperplane)
How does the r^2 statistic differ in multiple regression?
it becomes R^2(adj), the adjusted r squared value
What is the adjusted r squared metric and why does it differ from its linear counterpart?
The linear r squared measures how much of Y is explained by a single predictor, whereas the adjusted version measures how well the whole model explains the variance in Y, with a penalty for the number of regressors included.
What happens to the adjusted r squared value if you add more regressors to the model and why?
The adjusted r squared can either increase or decrease. This is because the extra regressors may contribute to the understanding/explanation of Y, or they may not be relevant and so dilute the model’s overall explanation of Y.
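This behaviour follows from the usual adjustment formula, R^2(adj) = 1 - (1 - R^2)(n - 1)/(n - p - 1) for n observations and p regressors. A small sketch (the R^2 figures and sample sizes are hypothetical) showing how adding weak regressors can lower the adjusted value even while raw R^2 rises:

```python
def adjusted_r2(r2, n, p):
    """Adjusted R squared for n observations and p regressors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Adding five regressors that raise R^2 only from 0.80 to 0.81
# drags the adjusted value down rather than up.
print(adjusted_r2(0.80, 30, 3))  # 3-regressor model
print(adjusted_r2(0.81, 30, 8))  # 8-regressor model, barely better raw fit
```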
What is crucial before we compare adjusted r squared values?
Standardization
Why do we need to standardize in order to compare adjusted r squared values?
because different samples involve different data, scales and models, which makes direct comparison inappropriate
What does scale dependency mean and why is it important?
In multiple regression the regressors entered into the model have different units of measurement, so when you add another regressor the coefficients of the initial ones change, i.e. their scale is dependent upon the others. Importantly, the way they change is not necessarily meaningful, because the variables are measured in different units.
What is an example of scale dependency being important ?
Plant growth rate - water is measured in millilitres while carbon dioxide is usually measured in ppm, so their raw coefficients are not directly comparable.
What technique allows us to overcome the scale dependency?
standardization
What happens to the name of the coefficients as they are standardised?
they transform from partial regression coefficients in to beta regression coefficients
To overcome the different units of the different regressors what happens to the units?
They become z units
What is the t value?
a test of whether a coefficient differs significantly from zero, i.e. whether that regressor contributes to explaining the variance in Y
What is multicollinearity?
When the regressors in a model correlate with each other
Why is multicollinearity a problem for multiple regression?
Because multiple regression seeks to determine the specific effect of each regressor separately upon Y.
If multicollinearity is present, then how would the finding of one regressor impacting Y in a certain way be hampered?
Because that regressor correlates with another regressor, the impact of the first regressor upon Y would have to be shared between the two, and there would be no way of determining which one is the more important regressor. Furthermore, they may both be controlled by an underlying factor.
How do we look for multicollinearity?
using the variance inflation factor
What are the 3 ways we can address multicollinearity?
- Remove one or more of the correlated regressors and then re-run the test
- Where there is a logical relationship between the two variables we can create an interaction term
- If variables are related because of underlying factors then they can be reduced using factor analysis
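The variance inflation factor mentioned above is, for regressor j, 1/(1 - R^2_j), where R^2_j comes from regressing x_j on the other regressors. A minimal NumPy sketch on synthetic data (two nearly collinear variables and one independent one; the data are invented for illustration):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (no intercept column)."""
    n, p = X.shape
    out = []
    for j in range(p):
        others = np.delete(X, j, axis=1)          # all regressors except x_j
        A = np.column_stack([np.ones(n), others])  # intercept + other regressors
        fitted = A @ np.linalg.lstsq(A, X[:, j], rcond=None)[0]
        resid = X[:, j] - fitted
        r2 = 1 - resid.var() / X[:, j].var()       # R^2 of x_j on the others
        out.append(1 / (1 - r2))
    return np.array(out)

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)  # nearly collinear with x1
x3 = rng.normal(size=100)                  # independent of both
print(vif(np.column_stack([x1, x2, x3])))  # large for x1 and x2, near 1 for x3
```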
What does the f-test enable?
the determination of how taking a regressor out of the model affects the model’s explanation of Y
How does the f-test result work?
Look to see how much the f-test value changes between different models with different regressors included or excluded.
How do you interpret the f-test result?
If F changes a lot then it means the inclusion or exclusion of that regressor has a big impact upon Y. Furthermore, the f-value will also have a p value which indicates the statistical significance of that result.
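Comparing nested models this way is usually done with a partial F-test. A sketch under the standard formula F = ((RSS_reduced - RSS_full)/q) / (RSS_full/(n - p - 1)); the residual sums of squares and sample size here are hypothetical numbers:

```python
from scipy import stats

def partial_f_test(rss_reduced, rss_full, df_extra, n, p_full):
    """F statistic and p value for dropping df_extra regressors
    from a model with p_full regressors fitted to n observations."""
    df_resid = n - p_full - 1
    f = ((rss_reduced - rss_full) / df_extra) / (rss_full / df_resid)
    p = stats.f.sf(f, df_extra, df_resid)  # upper-tail probability
    return f, p

# Hypothetical residual sums of squares from two nested models:
# dropping one regressor raised RSS from 80 to 120 (n = 50, 4 regressors).
f, p = partial_f_test(rss_reduced=120.0, rss_full=80.0, df_extra=1, n=50, p_full=4)
print(f, p)
```

A large F (and small p) means the dropped regressor mattered, matching the interpretation above.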
What are 3 things to remember when we want to undertake a meaningful multiple regression analysis?
- Variable selection - don’t chuck in a load of variables and see what happens; instead select them and their order based on theory and previous findings.
- Do not rely on a single metric to assess, instead use multiple as they all have a specific and useful purpose.
- Parsimony - do not overcomplicate the model, seek coherence and simplicity.
What are the 3 methods for building the structure of the model’s regressors?
- Hierarchical: ordering the regressors individually, based on theoretical considerations with known predictors entered in first and in order of importance
- Step-wise: order based on mathematical reasoning, which means it is usually carried out by computers based on correlation strength (predictors that explain the most are entered first)
- Forced-entry/simultaneous: all regressors (known or unknown) are put into the model simultaneously, but all regressors have been considered beforehand
What is important to remember for the validation of a model?
Cross-validation
What is cross-validation?
Applying the model with the same predictors for a different sample or population
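A minimal sketch of this idea: fit the coefficients on one sample and check R^2 on a second sample generated by the same underlying (hypothetical) model. Both samples and the "true" coefficients are invented for the illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

def fit(X, y):
    """Least-squares coefficients (intercept first)."""
    A = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(A, y, rcond=None)[0]

def r2(X, y, coefs):
    """R squared of a fitted model evaluated on (possibly new) data."""
    pred = np.column_stack([np.ones(len(X)), X]) @ coefs
    return 1 - ((y - pred) ** 2).sum() / ((y - y.mean()) ** 2).sum()

# Two samples drawn from the same underlying model.
X_a, X_b = rng.normal(size=(60, 3)), rng.normal(size=(60, 3))
true = np.array([1.0, 0.5, -0.7])
y_a = X_a @ true + rng.normal(scale=0.3, size=60)
y_b = X_b @ true + rng.normal(scale=0.3, size=60)

coefs = fit(X_a, y_a)        # build the model on sample A
print(r2(X_b, y_b, coefs))   # validate it on sample B
```

If the model generalises, the held-out R^2 stays close to the in-sample value; a large drop suggests overfitting to sample A.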
What two things are important to consider about your model once you have produced it?
Does it fit the observed data well (or is it overly affected by a few regressors), and can the model be generalised to other samples
What is the standardization formula?
beta coefficient (standardized multiple regression coefficient) = partial regression coefficient x (standard deviation of the regressor / standard deviation of Y), i.e. beta = b x (s_x / s_y)
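The formula can be sketched in Python. The data here are hypothetical, with two regressors on deliberately different scales so the raw coefficients differ by a factor of about a thousand while the standardized betas come out comparable:

```python
import numpy as np

def beta_coefficients(X, y, partial_coefs):
    """Standardized betas: beta_j = b_j * (s_xj / s_y)."""
    return partial_coefs * X.std(axis=0, ddof=1) / y.std(ddof=1)

rng = np.random.default_rng(3)
# Two regressors on very different scales (think mL vs ppm).
X = rng.normal(size=(40, 2)) * np.array([1.0, 1000.0])
y = X @ np.array([2.0, 0.002]) + rng.normal(scale=0.5, size=40)

A = np.column_stack([np.ones(40), X])
b = np.linalg.lstsq(A, y, rcond=None)[0][1:]  # drop the intercept
print(b)                           # raw partial coefficients, wildly different scales
print(beta_coefficients(X, y, b))  # betas on a comparable z-unit scale
```

Both regressors contribute equally to y by construction, which the betas reveal and the raw coefficients hide.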
How do we determine multicollinearity in SPSS?
Look at the tolerance and VIF values. Ideally we want tolerance to be >0.2 and the VIF to be <5.
What do the tolerance and VIF metrics mean?
Tolerance: e.g. a tolerance of 0.3 means 30% of the variance of that predictor is not accounted for by the other predictors. We want it above 0.2 so we know the predictor is not redundant, i.e. largely explained by the other predictors. VIF is the reciprocal of tolerance (VIF = 1/tolerance), so a VIF above 5 indicates the presence of multicollinearity.
How do we determine heteroscedasticity in SPSS?
We look at the scatterplot, histogram and the p-p plot.
What is heteroscedasticity? Why does it matter?
When the variance of our residuals is not constant across the x-axis. If this occurs then we cannot be as confident in our regression analysis
How do we interpret the scatterplot, histogram and p-p plot in SPSS to determine heteroscedasticity?
Scatterplot = we want to see a random and wide spread for homoscedasticity
Histogram = we want to see a normal distribution (bell-shaped curve)
P-P plot = we want to see a fairly even alignment of points to the line
What is autocorrelation? Why is it a problem?
This is when our residuals are not independent of the residuals around them. Regression assumes that they are independent
How do we determine auto correlation in SPSS?
Look at the Durbin-Watson statistic.
What is the range of Durbin-Watson statistics where we would state there was no autocorrelation?
1.566 - 2.434
What is the range of Durbin-Watson statistics where we would state there is significant positive or negative autocorrelation?
Positive = 0-1.475, Negative = 2.525-4.0
What are the ranges of Durbin-Watson statistics where we would state there is indeterminate autocorrelation?
1.475-1.566 & 2.434-2.525
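The statistic behind these cut-offs is DW = sum((e_t - e_(t-1))^2) / sum(e_t^2), which sits near 2 for independent residuals, falls toward 0 with positive autocorrelation and rises toward 4 with negative. A minimal sketch on synthetic residuals:

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson statistic for a sequence of residuals."""
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(4)
independent = rng.normal(size=200)             # independent residuals
positive = np.cumsum(rng.normal(size=200))     # random walk: strong positive autocorrelation

print(durbin_watson(independent))  # near 2
print(durbin_watson(positive))     # well below the 1.475 positive-autocorrelation cut-off
```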
How does stepwise regression work in SPSS?
SPSS uses the correlation matrix to find the independent variable that has the largest significant Pearson correlation with Y, then the next largest, and enters them into the model in that order.