Lecture 12 Flashcards
how many predictor and response variables are there in simple (univariate) linear regression?
one predictor variable and one response variable
how many predictor and response variables are there in multiple linear regression?
more than one predictor variable and one response variable
why is it called multiple linear regression?
because there are p > 1 predictor variables
why is the model called regression?
we are modelling a response variable (Y) as a function of the predictor variables (X1, …, Xp)
what is the relationship between Xj and Y in the multiple linear regression model?
the relationship is linear (see the model equation below)
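In symbols (standard notation, not taken verbatim from the lecture): Y = β0 + β1X1 + β2X2 + … + βpXp + ε, where the βj are regression coefficients and ε is a random error term.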
is collinearity amongst explanatory variables beneficial?
no, it can complicate or prevent the identification of an optimal set of explanatory variables for a statistical model
what could be the source of multicollinearity?
if two or more explanatory variables are highly correlated with one another
what is another way of detecting multicollinearity?
to estimate the multiple regression and then examine the output carefully, e.g. for large coefficient standard errors or a high R² combined with individually insignificant t-statistics
how is the VIF value linked to multicollinearity?
VIF (variance inflation factor) is calculated as an indicator of multicollinearity. The larger the value of VIFj, the more “troublesome” or collinear the variable Xj. As a rule of thumb, if the VIF of a variable exceeds 10, that variable is said to be highly collinear.
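For reference, VIFj = 1 / (1 - Rj²), where Rj² is the R² from regressing Xj on the remaining predictors. A minimal sketch of the computation with statsmodels (the toy data and column names are made up for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Toy data: x2 is deliberately built to be almost a copy of x1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)  # highly correlated with x1
x3 = rng.normal(size=100)
X = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

# VIF is computed per column of the design matrix (constant included).
exog = sm.add_constant(X)
for j, name in enumerate(exog.columns):
    if name != "const":
        print(name, variance_inflation_factor(exog.values, j))
# Expect VIFs well above 10 for x1 and x2, and near 1 for x3.
```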
what do degrees of freedom refer to?
the number of values involved in the calculations that are free to vary (e.g. with n observations, p predictors and an intercept, the residual degrees of freedom are n - p - 1)
What is the coefficient of determination in multiple regression?
The coefficient of multiple determination (R²) measures the proportion of variation in the dependent variable that can be predicted from the set of independent variables in a multiple regression equation. When the regression equation fits the data well, R² will be large (i.e., close to 1), and vice versa.
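In terms of sums of squares (standard definitions, not specific to these notes): R² = SSR/SST = 1 - SSE/SST, where SSR is the regression (explained) sum of squares, SSE the error (residual) sum of squares, and SST = SSR + SSE the total sum of squares.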
describe forward selection
‘forward’ - works in the opposite way to backward elimination: instead of starting from the whole list of variables, the model starts with the first variable and adds candidates one at a time. Before the next variable is added, its significance is checked, and insignificant variables are dropped rather than kept. (a sketch of both procedures follows the backward card below)
describe backward elimination
‘backward’ - the model starts with the full list of variables, checks which are insignificant, and drops the most insignificant one. This is repeated one variable at a time, so we end up with a list of significant variables only.
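A minimal sketch of both procedures in Python with statsmodels, using a p-value threshold as the stopping rule (the function names, the toy data and the alpha cut-off are illustrative assumptions; the lecture's exact rule, e.g. AIC/BIC-based, may differ):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def forward_select(y, X, alpha=0.05):
    """Add the most significant remaining predictor until none pass alpha."""
    chosen, remaining = [], list(X.columns)
    while remaining:
        # p-value of each candidate when added to the current model
        pvals = {c: sm.OLS(y, sm.add_constant(X[chosen + [c]])).fit().pvalues[c]
                 for c in remaining}
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha:
            break  # no remaining candidate is significant
        chosen.append(best)
        remaining.remove(best)
    return chosen

def backward_eliminate(y, X, alpha=0.05):
    """Start from the full model and drop the least significant predictor."""
    chosen = list(X.columns)
    while chosen:
        pvals = sm.OLS(y, sm.add_constant(X[chosen])).fit().pvalues.drop("const")
        worst = pvals.idxmax()
        if pvals[worst] <= alpha:
            break  # every remaining predictor is significant
        chosen.remove(worst)
    return chosen

# Illustrative use: y truly depends on x1 and x3 only.
rng = np.random.default_rng(1)
X = pd.DataFrame(rng.normal(size=(200, 3)), columns=["x1", "x2", "x3"])
y = 2 * X["x1"] - 1.5 * X["x3"] + rng.normal(size=200)
print(forward_select(y, X))      # expected: ['x1', 'x3'] (order may vary)
print(backward_eliminate(y, X))  # expected: ['x1', 'x3']
```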
what does AIC stand for?
Akaike information criterion
what does BIC stand for?
Bayesian information criterion
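Both criteria trade goodness of fit against model complexity, and lower values are preferred (standard definitions, not specific to these notes): AIC = 2k - 2 ln(L̂) and BIC = k ln(n) - 2 ln(L̂), where k is the number of estimated parameters, n the sample size, and L̂ the maximised value of the likelihood.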