W1: Multiple Linear Regression Flashcards

1
Q

In multiple linear regression, there are:

A

Multiple independent variables (X) and one dependent variable (Y)

3
Q

The multiple R-squared value for a regression represents the proportion of the variation in the Y variable that can be explained by its regression on the X variables.

True or False?

A

True

4
Q

The assumptions which we need to check when we perform a multiple linear regression are (3):

A

Normality of the errors
Common variance of the errors
Independence of the errors

5
Q

For the Kolmogorov-Smirnov and Shapiro-Wilk tests of Normality, if p < 0.05 then we conclude that the Normality assumption has been satisfied.
True or False?

Multiple linear regression

A

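False. If p < 0.05 we have evidence against Normality; the assumption is treated as satisfied when p > 0.05.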

6
Q

If the p-value for a correlation coefficient was p = 0.036 then the correlation would be significant at

A

5% level

7
Q

We can use multiple linear regression to allow the use of several X-variables (predictors/IV) to predict the

A

response Y

8
Q

What is the multiple linear regression model equation?

A

Y = a + (b1 * X1) + (b2 * X2) + … + e
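The equation above can be sketched in a few lines of Python. This is only an illustration: the function name `fitted_value` and the numbers in the usage line are assumptions, not part of the course material. It computes the prediction a + b1*X1 + b2*X2 + … and leaves out the error term e, which cannot be known in advance.

```python
def fitted_value(a, coefficients, xs):
    """Return a + b1*X1 + b2*X2 + ... (the prediction, without the error term e)."""
    if len(coefficients) != len(xs):
        raise ValueError("need exactly one coefficient per predictor")
    return a + sum(b * x for b, x in zip(coefficients, xs))


# Illustrative numbers only: a = 1, b1 = 2, b2 = 3, X1 = 4, X2 = 5
print(fitted_value(1.0, [2.0, 3.0], [4.0, 5.0]))  # 1 + 2*4 + 3*5 = 24.0
```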

9
Q

What is the multiple linear regression model equation? - Y

A

Y is the response (DV)

10
Q

What is the multiple linear regression model equation? - X

A

X1, X2, … are the predictors (IVs)

11
Q

What is the multiple linear regression model equation? - b1/b2

A

b1 and b2 are the slopes (gradients) of X1 and X2

12
Q

What is the multiple linear regression model equation? - a

A

a is the constant (intercept)

13
Q

What is the multiple linear regression model equation? - e

A

e is the error term

14
Q

In multiple linear regression, each predictor variable (X) has its own

A

coefficient (b1/b2)

15
Q

Why is there an error term (e) in multiple linear regression?

A

Knowing the values of X1, X2, … does not allow us to predict the value of Y exactly

16
Q

What is a residual?

A

Difference between the observed Y-value and its prediction (fitted value) based on corresponding X-values

17
Q

How do you calculate a residual?

Multiple linear regression

A

Residual = Observation - Fitted Value

18
Q

If the scatterplot of residuals does not show independence and common variance (funnel-effect graph)

Multiple linear regression

A
19
Q

If the scatterplot of residuals does not show independence and common variance (funnel-effect graph)

The graph does not show independence

A
20
Q

To test the significance of each predictor, test the null and alternative hypotheses that:

Multiple linear regression

A

H0: b = 0 vs H1: b ≠ 0 (for each particular X variable)

21
Q

Generally, an R-Squared above 0.6 (2)

Multiple linear regression

A

Makes a model worth your attention
Means that most of the variability in the Y variable can be explained by the X variables in the multiple linear regression model

22
Q

Step 1 (In SPSS): Writing Regression Equation (2)

A

The regression equation is:
MRI Count = 237.598 + 55.236(Gender) + 1.280(PIQ) + 6.515(Height)

23
Q

Step 1 (In R): Writing Regression Equation (2)

A

The regression equation is:
Costs = -3085.657 - 86.774(Region) + 511.084(Sex) + 115.61(Age) - 2.62(Marital) + 51.16(Alcohol) + 138.00(Cigs) - 269.264(Exercise)

24
Q

How can you tell which Y and X variables are used in a multiple linear regression model in R? (4)

A
  • Y = Costs
  • X = Region, Sex, Age, Marit, Alco
  • The data come from ex.data
  • The fitted model is stored in a variable called model
25
Q

Step 2: Writing R^2 and Interpreting it (In R) - (2) where R^2 is less than 60% (11.3%)

Multiple linear regression

A

R^2 = 0.113, and so 11.3% of the variation in the Y variable (name it) can be explained by our multiple linear regression model using the X variables (e.g., using the X2 and X4 variables)
Most of the variation remains unexplained
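As a rough illustration of where the R^2 figure comes from, the sketch below computes it from observed Y values and their fitted values as 1 minus (residual sum of squares / total sum of squares). The function name `r_squared` is an assumption made for this example; R and SPSS report the value directly in their regression output.

```python
def r_squared(observed, fitted):
    """Proportion of the variation in Y explained by the fitted values."""
    if len(observed) != len(fitted) or not observed:
        raise ValueError("observed and fitted must be non-empty and the same length")
    mean_y = sum(observed) / len(observed)
    # Total variation of Y around its mean
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    if ss_total == 0:
        raise ValueError("Y has no variation, so R^2 is undefined")
    # Variation left unexplained: the sum of squared residuals
    ss_residual = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    return 1 - ss_residual / ss_total
```

A returned value of 0.113 would be read as "11.3% of the variation in Y is explained by the model", matching the wording on the card.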

26
Q

Step 2: Writing R^2 and Interpreting it (In SPSS)

Multiple linear regression

A

We see R^2 is 0.618 and so 61.8% of the variability in MRI count is explained by our multiple linear regression model

27
Q

Step 3 Rule: What p-value does a predictor need in order to be included?

Multiple linear regression

A

p ≤ 0.05 (p less than or equal to 0.05)

28
Q

Step 3 Rule: How to interpret significance in R? (5)

A
  • Anything with ' ' (blank) = p > 0.1 (not significant for multiple linear regression)
  • Anything with . = significant at the 10% level only (treated as not significant for multiple linear regression)
  • Anything with one * = significant at the 5% level
  • Anything with two ** = significant at the 1% level
  • Anything with three *** = significant at the 0.1% level
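The significance codes listed above can be expressed as a small lookup. This is a sketch, not R itself: `significance_code` is a hypothetical helper, and it assumes the cut-offs shown in R's legend (0.001, 0.01, 0.05, 0.1), with a p-value exactly on a cut-off taking the stronger code.

```python
def significance_code(p):
    """Return the symbol R prints beside a coefficient for p-value p."""
    if not 0 <= p <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    if p <= 0.001:
        return "***"  # significant at the 0.1% level
    if p <= 0.01:
        return "**"   # significant at the 1% level
    if p <= 0.05:
        return "*"    # significant at the 5% level
    if p <= 0.1:
        return "."    # 10% level only: treated as not significant here
    return " "        # not significant
```

For example, `significance_code(0.036)` returns one star, so a p-value of 0.036 is significant at the 5% level.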
29
Q

Step 3: Interpreting p-value of predictor and whether to include them (In R) - (3)

Multiple linear regression

A

The coefficient for X2 is significant at the 5% level (p = 0.0397)
whereas the coefficient for X4 is not significant (p = 0.123)
Only X2 should be kept in the model
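The keep-or-drop decision can be sketched as a filter on the p-values, using the p ≤ 0.05 rule from Step 3. The helper name `predictors_to_keep` is an assumption; the two p-values are the ones quoted on the card.

```python
def predictors_to_keep(p_values, alpha=0.05):
    """Return the predictors whose p-value is at or below the cut-off alpha."""
    return [name for name, p in p_values.items() if p <= alpha]


# p-values from the card: X2 is significant at the 5% level, X4 is not
print(predictors_to_keep({"X2": 0.0397, "X4": 0.123}))  # ['X2']
```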

30
Q

Step 3: Interpreting p-value of predictor and whether to include them (In SPSS) - (3)

A

The coefficients for Gender, PIQ and Height are all significant at the 5% level or greater, and so all can be kept in the model

31
Q

How to write B value?

A
32
Q

Step 4: Interpreting assumptions - histogram is normally distributed

Multiple linear regression

A
33
Q

Step 4: Interpreting assumptions - histogram is not normally distributed

Multiple linear regression

A
34
Q

Step 4: Interpreting assumptions - scatterplot shows random scatter

Multiple linear regression

A
35
Q

Step 4: Interpreting assumptions - scatterplot does not show random scatter

A
36
Q

Step 5: Making a prediction and finding the residual for the following squirrel
Time = 52.9382 + 21.6954(Mass) - 0.8899(Length) + 2.9466 + 0.5157(Distance) - (5)

Multiple linear regression

A

Input values into the equation
Time = 52.9382 + 21.6954(1.1) - 0.8899(17) + 2.9466(1.2) + 0.5157(42.37)
Time = 87.060969 (fitted value)
Residual = Observation - Fitted Value
Residual = 78 (from table) - 87.060969 = -9.060969
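The arithmetic for this squirrel can be checked with a short script. The coefficients, input values and observed time of 78 are taken from the card. The card does not name the predictor that goes with 2.9466, so the label `x3` below is only a placeholder.

```python
intercept = 52.9382

# (label, coefficient, value for this squirrel)
# "x3" is a placeholder: the source does not name this predictor.
terms = [
    ("Mass", 21.6954, 1.1),
    ("Length", -0.8899, 17),
    ("x3", 2.9466, 1.2),
    ("Distance", 0.5157, 42.37),
]

# Fitted value = intercept + sum of coefficient * value
fitted = intercept + sum(coef * value for _, coef, value in terms)

# Residual = observation - fitted value
observed = 78
residual = observed - fitted

print(round(fitted, 2), round(residual, 2))
```

The fitted time comes to about 87.06 and the residual to about -9.06. The negative residual means the model over-predicts the time for this squirrel.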

37
Q

The Kolmogorov-Smirnov test p-value should be greater than 0.05 so that the

A

assumption of Normality of the errors is satisfied

38
Q

Written assumptions (2)

Multiple linear regression where the assumptions are satisfied

A

The histogram of residuals and the normality tests (p = 0.749 and p = 0.182) suggest that we have no evidence against the assumption of normal errors

The scatterplot of predicted values against residuals does not show any pattern, suggesting that the independence and constant variance assumptions on the errors are reasonable.

39
Q

What would your next steps be in modelling confidence, based on the multiple regression analysis, given that the grade and income covariates are not significant? (4)

A

Try removing covariates from the regression

In a backward elimination strategy, we would remove the least significant covariate (income) and consider the effect on R^2 and on the significance of the remaining covariates

Following this, we could remove grade to see its effect on the regression model

The best regression model is the one with a high R^2 and the fewest covariates
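The backward elimination steps above can be sketched as a loop. This is a simplified illustration under stated assumptions: `backward_elimination` is a hypothetical helper, and `fit_p_values` stands for whatever refits the model in your software and returns a p-value for each covariate still in it. The sketch stops purely on p-values; the effect on R^2 after each removal still has to be checked by hand.

```python
def backward_elimination(covariates, fit_p_values, alpha=0.05):
    """Drop the least significant covariate, refit, and repeat.

    fit_p_values is supplied by the caller (hypothetical here): given a list
    of covariate names it refits the model and returns {name: p_value}.
    """
    kept = list(covariates)
    while kept:
        p_values = fit_p_values(kept)
        # The least significant covariate has the largest p-value
        worst = max(kept, key=lambda name: p_values[name])
        if p_values[worst] <= alpha:
            break  # every remaining covariate is significant
        kept.remove(worst)
    return kept
```

Refitting after each removal matters because the p-values of the remaining covariates can change once one is dropped.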