Lecture 2: Multiple Regression Flashcards

1
Q

What is multivariate analysis?

A

Analysing the impact of multiple x-variables in order to predict changes in Y.

2
Q

What are the two types of multivariate analysis?

A

Analysis of dependence and analysis of independence

3
Q

What are two analyses of dependence?

A

multiple regression and discriminant analysis

4
Q

What are three analyses of independence?

A

cluster analysis, principal component analysis, factor analysis

5
Q

What is the purpose of multiple regression?

A

It is designed to isolate the effect of each x-variable (predictor) upon the y-variable.

6
Q

What is a coefficient?

A

A numerical description of an x-variable’s relationship with the y-variable.

7
Q

What is a regressor/predictor?

A

an independent variable

8
Q

What is the model?

A

The term used to describe the collection of regressors involved in predicting Y.

9
Q

Can you give an example of a multiple regression case and 5 potential regressors?

A

Plant growth rate (y) is dependent upon 1) temperature, 2) radiation, 3) carbon dioxide, 4) nutrient supply, 5) water

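A minimal sketch of fitting such a model in Python with statsmodels (entirely made-up data; the five columns stand in for the five plant-growth regressors above):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 100
    X = rng.normal(size=(n, 5))   # temperature, radiation, CO2, nutrients, water
    y = X @ np.array([0.5, 0.3, 0.2, 0.4, 0.1]) + rng.normal(scale=0.5, size=n)

    model = sm.OLS(y, sm.add_constant(X)).fit()
    print(model.params)           # intercept plus one coefficient per regressor
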
10
Q

How and WHY does the line of best fit change for multiple regression?

A

Because there is more than one x-variable, the relationship can no longer be drawn as a single line in two dimensions: with two regressors the line of best fit becomes a plane of best fit (and with more, a hyperplane).

11
Q

How does the r^2 statistic differ in multiple regression?

A

It becomes r^2(adj), the adjusted r squared value.

12
Q

What is the adjusted r squared metric and why does it differ from its linear counterpart?

A

The linear r squared considers how much of Y is explained by a single predictor, whereas the adjusted version measures how well the whole model explains the variance in Y, adjusting for the number of regressors.

13
Q

What happens to the adjusted r squared value if you add more regressors to the model and why?

A

The adjusted r squared can either increase or decrease. The added regressors may contribute to the understanding/explanation of Y, or they may be irrelevant and so cloud out/decrease the model’s overall explanation of Y.

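A quick sketch of this behaviour (Python/statsmodels, synthetic data): plain r squared always rises when a regressor is added, but adjusted r squared can fall if the new regressor is pure noise:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 60
    x1 = rng.normal(size=n)
    noise = rng.normal(size=n)                 # an irrelevant regressor
    y = 2 * x1 + rng.normal(size=n)

    m1 = sm.OLS(y, sm.add_constant(x1)).fit()
    m2 = sm.OLS(y, sm.add_constant(np.column_stack([x1, noise]))).fit()
    print(m1.rsquared_adj, m2.rsquared_adj)    # m2's adjusted r^2 is typically lower
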
14
Q

What is crucial before we compare adjusted r squared values?

A

Standardization

15
Q

Why do we need to standardize in order to compare adjusted r squared values?

A

Because different samples involve different data, scales and models, it is inappropriate to compare the raw values directly.

16
Q

What does scale dependency mean and why is it important?

A

In multiple regression the regressors entered into the model have different units of measurement, so when you add another regressor the coefficients of the initial ones change, i.e. their scale is dependent upon the others. Importantly, however, the way they change is not necessarily meaningful because of the different units of measurement between the variables.

17
Q

What is an example of scale dependency being important?

A

Plant growth rate: water is measured in millilitres while carbon dioxide is usually measured in ppm, so their coefficients change in ways that cannot be compared directly.

18
Q

What technique allows us to overcome the scale dependency?

A

standardization

19
Q

What happens to the name of the coefficients as they are standardised?

A

They transform from partial regression coefficients into beta regression coefficients.

20
Q

To overcome the different units of the different regressors, what happens to the units?

A

They become z units (standard-deviation units).

21
Q

What is the t value?

A

The coefficient divided by its standard error, indicating the significance of the coefficient in explaining the variance.
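
In code, the t and p values come straight out of a fitted model. A small sketch (statsmodels, synthetic data) in which only the first regressor genuinely matters:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    X = rng.normal(size=(80, 2))
    y = X @ np.array([1.0, 0.0]) + rng.normal(size=80)

    fit = sm.OLS(y, sm.add_constant(X)).fit()
    print(fit.tvalues)   # each coefficient divided by its standard error
    print(fit.pvalues)   # x1 should come out significant, x2 should not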

22
Q

What is multicollinearity?

A

When the regressors in a model correlate with each other

23
Q

Why is multicollinearity a problem for multiple regression?

A

Because multiple regression seeks to determine the specific effect of each regressor separately upon Y.

24
Q

If multicollinearity is present, then how would the finding of one regressor impacting Y in a certain way be hampered?

A

Because the regressor correlates with another regressor, its apparent impact upon Y has to be shared between the two, and there is no way of determining which one is the more important regressor. Furthermore, both may be controlled by an underlying factor.

25
Q

How do we look for multicollinearity?

A

Using the variance inflation factor (VIF).

26
Q

What are the 3 ways we can address multicollinearity?

A
  1. Remove one or more of the correlated regressors and then re-run the test
  2. Where there is a logical relationship between the two variables we can create an interaction term
  3. If variables are related because of underlying factors then they can be reduced using factor analysis
27
Q

What does the f-test enable?

A

The determination of how taking a regressor out of the model affects the model’s explanation of Y.

28
Q

How does the f-test result work?

A

Look at how much the F value changes between different models with different regressors included or excluded.

29
Q

How do you interpret the f-test result?

A

If F changes a lot, then the inclusion or exclusion of that regressor has a big impact on the explanation of Y. Furthermore, the F value also has a p value, which indicates the statistical significance of that result.
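
A sketch of this comparison in statsmodels, fitting a full and a reduced model on the same synthetic data and testing the change with compare_f_test:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    X = rng.normal(size=(80, 3))
    y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=80)

    full = sm.OLS(y, sm.add_constant(X)).fit()
    reduced = sm.OLS(y, sm.add_constant(X[:, :2])).fit()   # third regressor dropped
    f_value, p_value, df_diff = full.compare_f_test(reduced)
    print(f_value, p_value)   # small F and large p => the dropped regressor added little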

30
Q

What are 3 things to remember when we want to undertake a meaningful multiple regression analysis?

A
  1. Variable selection - don’t chuck in a load of variables and see what happens; instead select them, and their order of entry, based on theory and previous findings.
  2. Do not rely on a single metric to assess the model; use several, as each has a specific and useful purpose.
  3. Parsimony - do not overcomplicate the model, seek coherence and simplicity.
31
Q

What are the 3 methods for building the structure of the model’s regressors?

A
  1. Hierarchical: ordering the regressors individually, based on theoretical considerations, with known predictors entered first and in order of importance
  2. Step-wise: order based on mathematical reasoning, which means it is usually carried out by computers based on correlations (predictors that explain the most are entered first)
  3. Forced-entry/simultaneous: all regressors (known or unknown) are put into the model simultaneously, but all have been considered beforehand
32
Q

What is important to remember for the validation of a model?

A

Cross-validation

33
Q

What is cross-validation?

A

Applying the model, with the same predictors, to a different sample or population.
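
A sketch of the same idea with scikit-learn’s k-fold cross-validation (synthetic data; with a genuinely new sample you would simply refit and re-score):

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(4)
    X = rng.normal(size=(100, 4))
    y = X @ np.array([1.0, -0.5, 0.3, 0.0]) + rng.normal(size=100)

    # Fit on four fifths of the data, score r^2 on the held-out fifth, five times
    scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
    print(scores.mean())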

34
Q

What two things are important to consider about your model once you have produced it?

A

Does it fit the observed data well (or is it unduly influenced by a small number of cases), and can the model be generalised to other samples?

35
Q

What is the standardization formula?

A

Beta coefficient (standardized multiple regression coefficient) = partial regression coefficient × (standard deviation of the regressor ÷ standard deviation of Y)
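
A sketch checking this formula numerically: the betas it produces match the slopes of a regression run on z-scored variables (synthetic data with deliberately mismatched scales):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(5)
    X = rng.normal(scale=[1.0, 50.0], size=(200, 2))   # very different units
    y = 0.8 * X[:, 0] + 0.01 * X[:, 1] + rng.normal(size=200)

    raw = sm.OLS(y, sm.add_constant(X)).fit()
    beta_formula = raw.params[1:] * X.std(axis=0) / y.std()

    Xz = (X - X.mean(axis=0)) / X.std(axis=0)          # z-score regressors and y
    yz = (y - y.mean()) / y.std()
    zfit = sm.OLS(yz, sm.add_constant(Xz)).fit()
    print(beta_formula, zfit.params[1:])               # the two sets agree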

36
Q

How do we determine multicollinearity in SPSS?

A

Look at the tolerance and VIF values. Ideally we want tolerance to be >0.2 and the VIF to be <5.

37
Q

What do the tolerance and VIF metrics mean?

A
Tolerance: e.g. a value of 0.3 means that 30% of the variance for that predictor is not accounted for by the other predictors. We want it above 0.2, so that we know the predictor is not redundant (i.e. largely explained by the other predictors).
VIF: a value above 5 indicates the presence of multicollinearity.
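
A sketch of the same check outside SPSS, using statsmodels’ variance_inflation_factor on deliberately collinear data (tolerance is just 1/VIF):

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    rng = np.random.default_rng(6)
    x1 = rng.normal(size=100)
    x2 = x1 + rng.normal(scale=0.1, size=100)   # nearly a copy of x1
    X = sm.add_constant(np.column_stack([x1, x2]))

    for i in (1, 2):                            # skip the constant column
        vif = variance_inflation_factor(X, i)
        print(f"regressor {i}: VIF = {vif:.1f}, tolerance = {1 / vif:.3f}")
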
38
Q

How do we determine heteroscedasticity in SPSS?

A

We look at the scatterplot, the histogram and the P-P plot.

39
Q

What is heteroscedasticity? Why does it matter?

A

When the variance of our residuals is not constant across the range of fitted values. If this occurs we cannot be as confident in our regression analysis.

40
Q

How do we interpret the scatterplot, histogram and p-p plot in SPSS to determine heteroscedasticity?

A

Scatterplot: we want to see a random and wide spread, indicating homoscedasticity.
Histogram: we want to see a normal distribution (bell-shaped curve).
P-P plot: we want to see a fairly even alignment of the points to the line.
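
A sketch of producing the equivalent three diagnostics outside SPSS with matplotlib and statsmodels (synthetic, homoscedastic data):

    import numpy as np
    import matplotlib.pyplot as plt
    import statsmodels.api as sm
    from statsmodels.graphics.gofplots import ProbPlot

    rng = np.random.default_rng(7)
    X = rng.normal(size=(100, 2))
    y = X @ np.array([1.0, 0.5]) + rng.normal(size=100)
    model = sm.OLS(y, sm.add_constant(X)).fit()

    fig, axes = plt.subplots(1, 3, figsize=(12, 3))
    axes[0].scatter(model.fittedvalues, model.resid)     # want a random, even spread
    axes[1].hist(model.resid, bins=20)                   # want a bell shape
    ProbPlot(model.resid).ppplot(line="45", ax=axes[2])  # want points near the line
    plt.show()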

41
Q

What is autocorrelation? Why is it a problem?

A

This is when our residuals are not independent of the others around them. Regression assumes that they are independent.

42
Q

How do we determine autocorrelation in SPSS?

A

Look at the Durbin-Watson statistic.

43
Q

What is the range of Durbin-Watson statistics where we would state there was no autocorrelation?

A

1.566-2.434

44
Q

What is the range of Durbin-Watson statistics where we would state there is significant positive or negative autocorrelation?

A
Positive = 0-1.475
Negative = 2.525-4.0
45
Q

What are the ranges of Durbin-Watson statistics where we would state there is indeterminate autocorrelation?

A

1.475-1.566 & 2.434-2.525
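
A sketch of computing the statistic outside SPSS (statsmodels; note that the exact cut-offs above come from tables and depend on the sample size and number of regressors):

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.stattools import durbin_watson

    rng = np.random.default_rng(8)
    X = rng.normal(size=(100, 2))
    y = X @ np.array([1.0, 0.5]) + rng.normal(size=100)
    model = sm.OLS(y, sm.add_constant(X)).fit()

    print(durbin_watson(model.resid))   # ~2 means no autocorrelation in the residuals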

46
Q

How does stepwise regression work in SPSS?

A

SPSS uses the correlation matrix to find the independent variable that has the largest significant Pearson correlation with Y, then the next largest, and enters them into the model in that order.
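
A sketch of that initial ordering step (numpy, synthetic data). Real stepwise entry then re-evaluates the remaining predictors after each one enters, so this only mimics the first pass:

    import numpy as np

    rng = np.random.default_rng(9)
    X = rng.normal(size=(100, 4))
    y = X @ np.array([0.0, 1.0, 0.3, 0.6]) + rng.normal(size=100)

    # Rank predictors by absolute Pearson correlation with y
    corrs = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    order = np.argsort(corrs)[::-1]
    print(order)   # column indices, strongest correlation with Y first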