Practical 7: Multiple linear regression Flashcards
How does multiple regression model plot two or more explanatory variables?
Rather than a regression line on a scatter plot - a regression plane is used
When is multiple linear regression used?
To see how a set of independent variables relates to one dependent variable
What is a partial regression coefficient?
Bi is the change in y when xi increases by one unit, with all other independent variables held constant
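A minimal sketch of this idea with numpy on hypothetical simulated data: fitting the plane by ordinary least squares, so each coefficient estimates the change in y per one-unit increase in its x, holding the other x constant.

```python
import numpy as np

# Hypothetical simulated data: y = 2 + 1.5*x1 - 0.5*x2 + noise
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 - 0.5 * x2 + rng.normal(scale=0.1, size=n)

# Design matrix with an intercept column; least squares gives b0, b1, b2
X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# b[1] estimates the change in y for a one-unit increase in x1,
# holding x2 constant (the partial regression coefficient for x1)
print(b)
```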
In MLR the independent variables (also called explanatory variables, predictors or covariates) x1, x2, ... can be continuous or categorical
y, the dependent variable, must be continuous for a linear model to be used. If y is categorical another model (e.g. logistic regression) is needed
What is the null hypothesis in MLR?
When holding all other variables constant there is no linear association between y and xi (i.e. Bi = 0)
If p < 0.05 we reject this and conclude that Bi is significantly different from 0 at the population level
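A sketch of the test behind that p-value, on hypothetical simulated data: each coefficient's t-statistic is b_i divided by its standard error, and |t| around 2 or more corresponds to p < 0.05 for moderate sample sizes.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# x1 has a strong true effect; x2 has a true coefficient of 0
y = 1.0 + 2.0 * x1 + 0.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

resid = y - X @ b
p = X.shape[1]                        # number of estimated coefficients
s2 = resid @ resid / (n - p)          # residual variance estimate
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
t = b / se                            # compare |t| to ~2 for p < 0.05
print(t)
```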
If a B1 is not significant what can we conclude?
That the independent variable in relation to B1 is not statistically significantly related to the dependent variable when the other variables are held constant
- we cannot generalise this variable to the population
How do we see if there is confounding?
Plot a scatterplot between independent variables –> is there a linear relationship between them?
Confounding is when one independent variable affects both another independent variable and the dependent variable
When not taken into account, confounders may bias the estimated slopes
How do we adjust for confounders?
Include them in the multiple linear regression model
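A sketch of why including the confounder works, on hypothetical simulated data where z affects both x1 and y: the crude slope for x1 is biased, while the slope from the model that also includes z recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
z = rng.normal(size=n)                                   # confounder
x1 = 0.8 * z + rng.normal(scale=0.5, size=n)             # z affects x1
y = 1.0 * x1 + 2.0 * z + rng.normal(scale=0.5, size=n)   # z affects y too

def ols(X, y):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

ones = np.ones(n)
b_crude = ols(np.column_stack([ones, x1]), y)      # z omitted -> biased slope
b_adj = ols(np.column_stack([ones, x1, z]), y)     # z included -> adjusted slope
print(b_crude[1], b_adj[1])                        # true effect of x1 is 1.0
```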
How can sample size restrict the number of variables included in a MLR?
Generally we need at least 10 observations per independent variable
i.e. with 100 observations we can include up to 10 independent variables
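The rule of thumb above as a tiny helper (the function name and default are illustrative, not a standard API):

```python
# Rule of thumb from the card above: ~10 observations per independent variable
def max_predictors(n_observations: int, obs_per_predictor: int = 10) -> int:
    """Upper bound on the number of predictors a sample can support."""
    return n_observations // obs_per_predictor

print(max_predictors(100))  # -> 10
```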
What does a R-squared value of 0 indicate?
An R-squared of 0 indicates the regression plane is horizontal and doesn't fit the data points –> the model doesn't explain the dependent variable at all
An R-squared of 1 would be the opposite (the model explains all the variance)
(R-squared is the proportion of variance in the dependent variable that is "explained" by the independent variables)
Does R-square indicate if one variable is explained by another?
No - R-squared just tells us how much of the variance of the dependent variable is accounted for by our model (the independent variables)
- Also doesn’t indicate if the correct model was used
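A sketch of the R-squared computation itself, on hypothetical simulated data: R² = 1 − SS_res / SS_tot, i.e. one minus the unexplained share of the variation in y.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x = rng.normal(size=n)
y = 3.0 * x + rng.normal(size=n)   # true R-squared is about 0.9 here

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ b

ss_res = np.sum((y - fitted) ** 2)       # variation left unexplained
ss_tot = np.sum((y - y.mean()) ** 2)     # total variation in y
r_squared = 1.0 - ss_res / ss_tot        # proportion explained by the model
print(r_squared)
```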
How does Adjusted R-square differ to R-square?
Adjusted R-squared accounts for the chance increase in R-squared seen whenever another variable is added to the model
Therefore it is a more robust estimate of how much of the variance in the dependent variable is explained by our model.
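The standard adjustment, with n observations and p predictors, is adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1). A small sketch showing the penalty for extra predictors:

```python
def adjusted_r_squared(r2: float, n: int, p: int) -> float:
    """Penalise R-squared for the number of predictors p given n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Same R-squared, more predictors -> lower adjusted R-squared
print(adjusted_r_squared(0.50, n=100, p=2))
print(adjusted_r_squared(0.50, n=100, p=20))
```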
What are the assumptions of MLR?
- The relationship between Y and each CONTINUOUS independent variable is linear
  - First plot a scatterplot of the residuals of the dependent variable against the residuals of each independent variable in turn (both variables are regressed separately on each x)
  - Partial residual plots show the relationship between y and a specific x with the effects of the other x's removed - this requires at least two independent variables
- Residuals (error terms) should be normally distributed - normality of residual error
  - Assess this by plotting a histogram of the error terms
  - Or by using a p-p plot –> this plots the data against a theoretical normal distribution (the points of a p-p plot should fall on a straight line)
  - i.e. the variation in the residual error of Y is not explained by the predictors
- Homoscedasticity (stability in the variance of the residuals)
  - The variance of the error terms doesn't depend on the value of x
  - Check a scatterplot of standardised residuals against standardised predicted values - there should be no pattern (i.e. as x increases the standardised residuals should not increase)
- Independent observations (i.e. no couples etc.)
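A sketch of the residual checks above on hypothetical simulated (homoscedastic) data: standardise the residuals, then look at whether their size trends with the fitted values (a crude numeric stand-in for inspecting the scatterplot).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)   # homoscedastic by construction

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ b
resid = y - fitted

# Standardised residuals: roughly mean 0, sd 1 if assumptions hold
z_resid = (resid - resid.mean()) / resid.std(ddof=1)

# Crude homoscedasticity check: |residuals| should not trend with the
# fitted values (a strong correlation here suggests heteroscedasticity)
corr = np.corrcoef(np.abs(z_resid), fitted)[0, 1]
print(round(corr, 3))
```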
How do we assess the assumptions of MLR in SPSS?
Analyse - regression - linear - define terms
PLOTS –> tick histogram + normal probability plot + produce all partial plots
- Put *ZRESID (standardised residuals) on the Y axis
- Put *ZPRED (standardised predicted values) on the X axis
How do we check assumption 1 for MLR (linearity between each continuous predictor and Y)?
#1 Check the partial residual plots (the scatterplots illustrating the relationship between the residuals of Y and the residuals of each X) –> for a variable to be included in the model, its scatterplot should show a linear relationship
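A numeric sketch of the idea behind these plots (added-variable / partial regression plots), on hypothetical simulated data: regress y on the other x's, regress x1 on the other x's, and relate the two sets of residuals. By the Frisch-Waugh-Lovell result, the slope between them equals x1's partial regression coefficient from the full model.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 300
x1 = rng.normal(size=n)
x2 = rng.normal(size=n) + 0.5 * x1       # predictors mildly correlated
y = 1.0 + 1.5 * x1 + 2.0 * x2 + rng.normal(size=n)

def resid(target, X):
    b, *_ = np.linalg.lstsq(X, target, rcond=None)
    return target - X @ b

others = np.column_stack([np.ones(n), x2])   # intercept + the other predictor
e_y = resid(y, others)       # part of y not explained by x2
e_x1 = resid(x1, others)     # part of x1 not explained by x2

# Slope of e_y on e_x1 equals x1's partial regression coefficient in the
# full model; plotting e_y against e_x1 should look linear if the
# linearity assumption holds for x1
slope = (e_x1 @ e_y) / (e_x1 @ e_x1)
print(slope)
```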