W1: Multiple Linear Regression Flashcards
In multiple linear regression, there are:
Multiple independent variables (X) and one dependent variable (Y)
The multiple R-squared value for a regression represents the proportion of the variation in the Y variable that can be explained by its regression on the X variables.
True or False?
True
The assumptions which we need to check when we perform a multiple linear regression are (3):
Normality of the errors
Common variance of the errors
Independence of the errors
For the Kolmogorov-Smirnov and Shapiro-Wilk tests of Normality, if p < 0.05 then we conclude that the Normality assumption has been satisfied.
True or False?
False
If the p-value for a correlation coefficient was p = 0.036 then the correlation would be significant at
5% level
We can use multiple linear regression to allow the use of several X-variables (predictors/IV) to predict the
response Y
What is the multiple linear regression model equation?
Y = a + (b1 * X1) + (b2 * X2) + … + e
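As a quick sketch, the model equation can be coded directly; the intercept, slopes, and predictor values below are made-up numbers for illustration only:

```python
# Y = a + b1*X1 + b2*X2 + ... (the error term e is what the model cannot predict)
def predict(a, slopes, xs):
    """Fitted value: constant a plus each slope b_i times its predictor X_i."""
    return a + sum(b * x for b, x in zip(slopes, xs))

# Illustrative values: a = 2.0, b1 = 0.5, b2 = -1.0, X1 = 4, X2 = 3
fitted = predict(2.0, [0.5, -1.0], [4, 3])
print(fitted)  # 2.0 + 0.5*4 - 1.0*3 = 1.0
```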
What is the multiple linear regression model equation? - Y
Y is the response (DV)
What is the multiple linear regression model equation? - X
X are the predictors (IVs)
What is the multiple linear regression model equation? - b1/b2
b1/b2 are the slopes/gradients
What is the multiple linear regression model equation? - a
a is the constant (intercept)
What is the multiple linear regression model equation? - e
e is the error term
In multiple linear regression, each predictor variable (X) has its own
coefficient (b1/b2)
Why is there an error term ( e ) in multiple linear regression?
Knowing the values of X1, X2, … does not allow us to predict the value of Y exactly
What is a residual?
Difference between the observed Y-value and its prediction (fitted value) based on corresponding X-values
How to calculate residual?
Residual = Observation - Fitted Value
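A minimal sketch of the residual calculation; the observed and fitted values here are example numbers only:

```python
# Residual = observation - fitted value
def residual(observation, fitted):
    return observation - fitted

# Example: observed Y = 78, fitted value = 87.06 -> negative residual,
# meaning the model over-predicted by about 9.06
print(residual(78.0, 87.06))
```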
If the scatterplot of residuals shows a funnel effect
The residuals do not show independence/common variance, so these assumptions are violated
Test significance of each predictor: test the null and alternative hypotheses that:
H0: b = 0 vs H1: b ≠ 0 (for each particular X variable)
Generally, an R-squared above 0.6 (2)
makes a model worth your attention
means that most of the variability in the Y variable can be explained by the X variables/the multiple linear regression model
Step 1 (In SPSS): Writing Regression Equation (2)
The regression equation is:
MRI Count = 237.598 + 55.236(Gender) + 1.280 (PIQ) + 6.515 (Height)
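To check how the equation reads, a prediction can be computed from it directly; the input values Gender = 1, PIQ = 100, Height = 68 are invented for illustration, not taken from the data:

```python
# Hypothetical inputs plugged into the SPSS regression equation above
gender, piq, height = 1, 100, 68
mri_count = 237.598 + 55.236 * gender + 1.280 * piq + 6.515 * height
print(round(mri_count, 3))
```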
Step 1 (In R): Writing Regression Equation (2)
The regression equation is:
Costs = -3085.657 - 86.774(Region) + 511.084(Sex) + 115.61(Age) - 2.62(Marital) + 51.16(Alcohol) + 138.00(Cigs) - 269.264(Exercise)
How can you tell the Y and X variables used in a multiple linear regression model in R? (4)
- Costs = Y
- X = Region, Sex, Age, Marit, Alco
- The data come from ex.data
- This is all stored in a variable called model
Step 2: Writing R^2 and Interpreting it (In R) - (2) where R^2 is less than 60% (11.3%)
R^2 = 0.113 and so 11.3% of the variation in the Y variable (name it) can be explained by our multiple linear regression model using the X variables (e.g., using the X2 and X4 variables)
Most of the variation remains unexplained
Step 2: Writing R^2 and Interpreting it (In SPSS)
We see R^2 is 0.618 and so 61.8% of the variability in MRI count is explained by our multiple linear regression model
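R^2 can also be computed by hand as 1 - SS_res/SS_tot; a sketch with made-up observed and fitted values:

```python
# R^2 = 1 - (residual sum of squares) / (total sum of squares)
def r_squared(observed, fitted):
    mean_y = sum(observed) / len(observed)
    ss_tot = sum((y - mean_y) ** 2 for y in observed)             # total variability of Y
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, fitted))  # unexplained part
    return 1 - ss_res / ss_tot

# Illustrative data: the fitted values track the observations closely,
# so R^2 comes out near 1 (most variability explained)
obs = [3.0, 5.0, 7.0, 9.0]
fit = [2.8, 5.3, 6.9, 9.2]
print(r_squared(obs, fit))
```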
Step 3 Rule: What P value to include or not?
p </= 0.05 (p less than or equal to 0.05)
Step 3 Rule: How to interpret significance in R? (5)
- Anything with ' ' is significant only at the 100% level (non-significant for multiple linear regression)
- Anything with . is significant at 10% (non-significant for multiple linear regression)
- Anything with one * is significant at 5%
- Anything with two ** is significant at 1%
- Anything with three *** is significant at 0.1%
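The significance codes above (R's "Signif. codes" line in summary() output) can be sketched as a small lookup; the example p-values match the ones used later in these cards:

```python
# Map a p-value to R's significance code:
# '***' p<=0.001, '**' p<=0.01, '*' p<=0.05, '.' p<=0.1, ' ' otherwise
def signif_code(p):
    if p <= 0.001:
        return '***'
    if p <= 0.01:
        return '**'
    if p <= 0.05:
        return '*'
    if p <= 0.1:
        return '.'
    return ' '

print(signif_code(0.0397))  # '*'  -> significant at the 5% level
print(signif_code(0.123))   # ' '  -> not significant
```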
Step 3: Interpreting p-value of predictor and whether to include them (In R) - (3)
The coefficient for X2 is significant at 5% level ( p = 0.0397)
whereas the coefficient for X4 is not significant (p = 0.123)
Only X2 should be kept in model
Step 3: Interpreting p-value of predictor and whether to include them (In SPSS) - (3)
The coefficients for Gender, PIQ and Height are all significant at the 5% level or greater, and so all can be kept in the model
How to write B value?
Step 4: Interpreting assumptions - histogram is normally distributed
Step 4: Interpreting assumptions - histogram is not normally distributed
Step 4: Interpreting assumptions - scatterplot shows random scatter
Step 4: Interpreting assumptions - scatterplot does not show random scatter
Step 5: Making a prediction and finding the residual for the following squirrel
Time = 52.9382 + 21.6954(Mass) - 0.8899(Length) + 2.9466 + 0.5157(Distance) - (5)
Input the values into the equation
Time = 52.9382 + 21.6954(1.1) - 0.8899(17) + 2.9466(1.2) + 0.5157(42.37)
Time = 87.060969 (Fitted value)
Residual = Observation - Fitted Value
Residual = 78 (from table) - Fitted
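The arithmetic above can be checked directly; the coefficient 2.9466, whose variable name is missing in the card, is applied to the value 1.2 exactly as given:

```python
# Step 5 check: plug the squirrel's values into the regression equation
fitted = 52.9382 + 21.6954 * 1.1 - 0.8899 * 17 + 2.9466 * 1.2 + 0.5157 * 42.37
residual = 78 - fitted  # observation taken from the table
print(round(fitted, 6))    # 87.060969
print(round(residual, 6))  # -9.060969
```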
The Kolmogorov-Smirnov test p-value should be greater than 0.05 so that the
assumption of normality of the errors is satisfied
Written assumptions (2)
For a multiple linear regression whose assumptions are satisfied:
The histogram of residuals and the normality tests (p = 0.749 and p = 0.182) suggest that we have no evidence against the assumption of normal errors
The scatterplot of predicted values against residuals doesn't show any pattern, suggesting the independence and constant variance assumptions on the errors are reasonable.
What would your next steps in modelling confidence based on the multiple regression analysis be? (4) Grade and income covariates not significant
Try to remove covariates from the regression
In a backward elimination strategy we would remove the least significant covariate (income) and consider its effect on R^2 and the significance of the remaining covariates
Following that, we could remove grade to see its effect on the regression model
The best regression model is one with a high R^2 and the fewest covariates
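One step of the backward elimination described above can be sketched as follows; the covariate names and p-values are invented for illustration:

```python
# Backward elimination step: find the covariate whose p-value is largest,
# and drop it if it is above the 0.05 threshold; otherwise keep the model.
def next_to_remove(p_values, alpha=0.05):
    name, p = max(p_values.items(), key=lambda item: item[1])
    return name if p > alpha else None

p_values = {'grade': 0.21, 'income': 0.47, 'hours': 0.003}
print(next_to_remove(p_values))  # income goes first (least significant)
```

After refitting without income, the same check would be repeated on the remaining covariates.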