Lecture 39 - Multiple Linear Regression Flashcards
What is the general idea of multiple linear regression as opposed to simple linear regression?
Simple linear regression allows us to assess the effect of a single explanatory variable (x) on a response variable (y).
Multiple linear regression is used when you have several explanatory variables (x's) to explain your outcome (y).
What is the model in multiple linear regression given by?
Y = β0 + β1x1 + ... + βkxk + e
k denotes the number of explanatory variables;
β0, β1, ..., βk are parameters (regression coefficients);
e is an error term following a N(0, σe²) distribution.
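A minimal R sketch of fitting such a model (simulated data; the variable names x1, x2 and the data frame d are hypothetical, not from the lecture):

    # Simulate data where y depends on two explanatory variables plus noise
    set.seed(1)
    n  <- 100
    x1 <- rnorm(n)
    x2 <- rnorm(n)
    y  <- 2 + 1.5 * x1 - 0.8 * x2 + rnorm(n)   # true β0 = 2, β1 = 1.5, β2 = -0.8
    d  <- data.frame(y, x1, x2)

    # Fit Y = β0 + β1*x1 + β2*x2 + e by least squares
    fit <- lm(y ~ x1 + x2, data = d)
    coef(fit)   # the estimated coefficients β̂0, β̂1, β̂2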
What do you remove from the multiple linear regression equation to get the mean response as predicted by the explanatory variables? What is this known as?
- Remove the error term e.
- What remains is the conditional mean of Y given the fixed values of the predictor variables.
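In symbols, the mean response (regression function) is:

    E[Y | x1, ..., xk] = β0 + β1x1 + ... + βkxk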
Why is being able to do multiple linear regression important, i.e. what are its applications?
- Adjusting for the effect of confounding variables.
- Establishing which variables are important in explaining the values of the response variable and which are just extra noise.
- Predicting values of the response variable.
- Describing the strength of the association between the response variable and the explanatory variables.
How do we visualize multiple regression?
With two explanatory variables the data exist in 3D, and a plane is fitted to pass through the points as closely as possible.
How is the fitted multiple regression model different to the model at the population level?
At the population level the model has parameters; in reality we are unlikely to know their actual values, so instead we estimate them from the data and put hats on the symbols (e.g. β̂0, β̂1) to denote estimates.
How do we measure the overall quality of predictions with our model?
Through the residual sum of squares (RSS), which adds up the squared differences between the model's predictions and the actual responses.
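In symbols, RSS = Σ (yi − ŷi)², summed over all n observations. For the hypothetical fit sketched earlier it can be computed as:

    rss <- sum(residuals(fit)^2)   # squared gaps between actual and fitted y
    rss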
What are the least squares estimates?
The parameter values that minimize the RSS (the discrepancy between the model's predictions and the actual values).
How do you use the residual sum of squares to estimate error variance?
σ̂e² = RSS / (n − k − 1)
n = number of observations; k = number of predictors
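For the hypothetical fit sketched earlier (n = 100, k = 2), the estimate is:

    sigma2_hat <- sum(residuals(fit)^2) / df.residual(fit)   # RSS / (n - k - 1)
    sigma2_hat
    sigma(fit)^2   # the same value, via R's residual standard error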
What happens as you add variables to the multiple linear regression?
- The RSS decreases (you explain more of the variation).
- However, you can reach a point of overfitting, where 'too much' of the data, including its random noise, is being explained.
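To illustrate with the hypothetical fit from earlier: adding a predictor that is pure noise still shrinks the RSS a little, even though it explains nothing real:

    d$noise <- rnorm(n)                       # a variable unrelated to y
    fit2 <- lm(y ~ x1 + x2 + noise, data = d)
    sum(residuals(fit)^2)    # RSS with the genuine predictors
    sum(residuals(fit2)^2)   # slightly smaller, despite 'noise' being useless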
How do you read the output from R to tell which explanatory variables are valuable in the model?
Look at the Estimate column and the rows after the intercept: each row is a different explanatory variable, with its p-value showing whether the variable is important to the model (less than 0.05) or whether it can/should be removed.
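For the hypothetical fit sketched earlier, the table in question comes from summary():

    summary(fit)
    # In the Coefficients table:
    #   Estimate   - the fitted β̂ for each row (intercept first)
    #   Std. Error - the uncertainty of that estimate
    #   t value    - Estimate divided by Std. Error
    #   Pr(>|t|)   - the p-value; below 0.05 suggests the variable is worth keeping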
What happens in R if there is a missing value?
- Observations containing a missing value are omitted from model fitting by default.
- This is a huge problem in lots of data sets.
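A small sketch of this default behaviour, reusing the hypothetical data frame from earlier:

    d2 <- d                                  # copy of the simulated data
    d2$x1[3] <- NA                           # introduce one missing value
    fit_na <- lm(y ~ x1 + x2, data = d2)     # the incomplete row is dropped
    nobs(fit_na)                             # 99 observations used instead of 100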
Substitute the parameter estimates from slide 747 into the multiple linear regression equation in order to make a prediction for Y.
Answers are on the slide.
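The slide's specific estimates are not reproduced here, but for the hypothetical fit from earlier the mechanics look like this (predictor values chosen arbitrarily):

    b <- coef(fit)
    b[1] + b[2] * 0.5 + b[3] * (-1)                        # β̂0 + β̂1(0.5) + β̂2(-1) by hand
    predict(fit, newdata = data.frame(x1 = 0.5, x2 = -1))  # the same prediction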
How do you interpret the coefficients for multiple regression?
-The intercept β0 is the predicted value of the response when all explanatory variables are zero.
-Other coefficients are specific to the associated explanatory variable.
For example, β2 is the change in the mean response when variable x2 is increased by one unit and all other explanatory variables remain unchanged.
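For instance (numbers hypothetical): if β̂2 = −0.8, increasing x2 by one unit while holding every other explanatory variable fixed changes the predicted mean response by −0.8 units.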
Interpret β̂1 for the example on slide 752?
The interpretation of β̂1 is that male students are estimated to be 15.08 cm taller than female students on average, having adjusted for father's height and age (all other variables remain the same).
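The slide's data are not shown here, but a self-contained R sketch (simulated data; all variable names are hypothetical) of how such a coefficient arises for a binary explanatory variable:

    # Hypothetical student data: height explained by sex, father's height and age
    set.seed(2)
    n <- 200
    students <- data.frame(
      sex           = factor(sample(c("Female", "Male"), n, replace = TRUE)),
      father_height = rnorm(n, mean = 175, sd = 7),
      age           = sample(18:25, n, replace = TRUE)
    )
    students$height <- 110 + 15 * (students$sex == "Male") +
      0.3 * students$father_height + 0.5 * students$age + rnorm(n, sd = 5)

    # R codes the factor as a 0/1 dummy, named sexMale here
    fit_h <- lm(height ~ sex + father_height + age, data = students)
    coef(fit_h)["sexMale"]   # estimated male-female height gap, adjusting
                             # for father's height and age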