Multiple regression and logistic regression Flashcards
Describe multiple regression
With a single explanatory variable we would carry out a simple linear regression.
With several explanatory variables we can use a more general form of regression model that includes more than one explanatory variable at a time.
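As a sketch, the model is usually written as below. The notation is an assumption introduced here, not taken from the cards: y is the response, x_1 to x_k are the explanatory variables, the betas are the coefficients and epsilon is the error term.

```latex
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + \varepsilon_i
```

With k = 1 this reduces to simple linear regression.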
What are the assumptions for multiple regression?
residuals are normally distributed
residuals have mean 0
residuals have constant variance
observations are independent
error-free measurement of predictors
What does the regression line indicate?
The regression line indicates the nature of the relationship between the explanatory variables and the response
What does the coefficient for each variable indicate?
It gives the expected change in the response for a one-unit increase in that variable, assuming that the other variables do not change
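A minimal sketch of this interpretation. The coefficients and the variable names `age` and `dose` are hypothetical, chosen only for illustration. Raising one variable by one unit while holding the other fixed changes the fitted response by exactly that variable's coefficient.

```python
# Hypothetical fitted coefficients, chosen only for illustration.
INTERCEPT = 2.0
COEFS = {"age": 0.5, "dose": -1.5}


def predict(values):
    """Return the fitted response for a dict of explanatory-variable values."""
    return INTERCEPT + sum(COEFS[name] * values[name] for name in COEFS)


base = predict({"age": 40, "dose": 3})
bumped = predict({"age": 41, "dose": 3})  # age up by one unit, dose unchanged
change = bumped - base  # equals the coefficient for age
```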
What can the coefficients’ standard errors be used for?
to create confidence intervals for the coefficients
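A rough sketch of such an interval, assuming a large sample so that a normal critical value can stand in for the t value (an exact interval would use the t distribution with the residual degrees of freedom). The function name and the example numbers are hypothetical.

```python
from statistics import NormalDist


def coefficient_ci(estimate, std_error, level=0.95):
    """Approximate confidence interval: estimate +/- z * standard error."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # normal critical value (large-sample assumption)
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


# Made-up coefficient estimate and standard error.
low, high = coefficient_ci(0.5, 0.1)
excludes_zero = low > 0 or high < 0
```

An interval that excludes 0 suggests the coefficient differs from 0.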
What is the null hypothesis (H0) in multiple regression?
each coefficient = 0
Why is the adjusted R squared used instead of the raw value in multiple regression?
The raw R squared never decreases, and almost always increases, when another variable is added to the model, regardless of whether the variable has any predictive ability
Describe adjusted R squared
most commonly used selection criterion
it adjusts the estimated proportion of variance explained by the model for the number of explanatory variables in the model
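The adjustment is usually written as 1 - (1 - R squared)(n - 1)/(n - k - 1), where n is the sample size and k the number of explanatory variables. A small sketch follows; the function name and the example values are made up for illustration.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R squared for a model with an intercept and n_predictors variables."""
    residual_df = n_obs - n_predictors - 1
    if residual_df <= 0:
        raise ValueError("need more observations than estimated parameters")
    return 1 - (1 - r_squared) * (n_obs - 1) / residual_df


# Made-up example: five extra variables raise the raw R squared only slightly.
small_model = adjusted_r_squared(0.80, n_obs=30, n_predictors=3)
large_model = adjusted_r_squared(0.81, n_obs=30, n_predictors=8)
```

Here the larger model has the higher raw R squared but the lower adjusted value, so the extra variables are not earning their place.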
What are the other selection criteria used in multiple regression?
predicted R squared
Mallows's C-p
Principle of parsimony
Describe the predicted R-squared
Based on cross-validation
Takes overfitting into account
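One common formulation, assumed here because the cards do not give one, is leave-one-out cross-validation: each observation is predicted from a model fitted without it, the squared prediction errors are summed (PRESS), and predicted R-squared is 1 - PRESS / total sum of squares. The sketch uses a single explanatory variable and made-up data to stay short.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for one explanatory variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predicted_r_squared(xs, ys):
    """1 - PRESS / total sum of squares, using leave-one-out refits."""
    mean_y = sum(ys) / len(ys)
    total_ss = sum((y - mean_y) ** 2 for y in ys)
    press = 0.0
    for i in range(len(xs)):
        # Refit without observation i, then predict it.
        intercept, slope = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (intercept + slope * xs[i])) ** 2
    return 1 - press / total_ss


# Made-up data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]
value = predicted_r_squared(xs, ys)
```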
Describe Mallows's C-p
Penalises for having too many variables; the model with the minimum C-p is preferred
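A sketch of the usual formula, C-p = SSE_p / s^2 - n + 2p, where SSE_p is the residual sum of squares of the candidate model, s^2 is the residual mean square of the full model, n is the sample size and p is the number of parameters in the candidate model including the intercept. The function name and numbers below are made up for illustration.

```python
def mallows_cp(sse_subset, mse_full, n_obs, n_params):
    """Mallows's C-p for a candidate model with n_params parameters (intercept included)."""
    return sse_subset / mse_full - n_obs + 2 * n_params


# Made-up example: the full model has 5 parameters and SSE = 50 on 30 observations.
n_obs = 30
mse_full = 50.0 / (n_obs - 5)  # residual mean square of the full model
cp_full = mallows_cp(50.0, mse_full, n_obs, n_params=5)
cp_subset = mallows_cp(54.0, mse_full, n_obs, n_params=3)
```

For the full model C-p always equals its own number of parameters, which makes a handy sanity check.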
Describe the principle of parsimony
Keep it simple: when models fit about equally well, prefer the one with fewer variables
What are model searching methods?
stepwise regression - usually based on forwards selection or backwards elimination
Best subsets regression - considers all potential models
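To see why best subsets gets expensive, the sketch below only enumerates the candidate models; in practice each one would then be fitted and scored with a selection criterion. The variable names are hypothetical.

```python
from itertools import combinations


def all_subsets(variables):
    """Every non-empty subset of the candidate explanatory variables."""
    subsets = []
    for size in range(1, len(variables) + 1):
        subsets.extend(combinations(variables, size))
    return subsets


# Hypothetical variable names.
candidates = all_subsets(["age", "dose", "weight", "height"])
n_models = len(candidates)  # 2**4 - 1 non-empty subsets
```

With k candidate variables there are 2^k - 1 models to consider, so the count roughly doubles with every variable added.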
What are the potential pitfalls of multiple regression?
overfitting - large number of variables and small sample size
Typically the sample size needs to be at least 10 times the number of variables considered
The final number of variables in the model should not be more than the square root of the sample size
Collinearity - when explanatory variables are highly correlated with each other. It is usually best to remove one of them before fitting the regression
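The two sample-size rules of thumb and a simple pairwise correlation check can be sketched as follows. The function names and data are made up, and a pairwise correlation is only one simple way to screen for collinearity.

```python
from math import isqrt, sqrt


def max_candidate_variables(n_obs):
    """Rule of thumb: sample size at least 10 times the number of variables considered."""
    return n_obs // 10


def max_final_variables(n_obs):
    """Rule of thumb: final number of variables no more than sqrt(sample size)."""
    return isqrt(n_obs)


def pearson_r(xs, ys):
    """Pearson correlation between two explanatory variables."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


candidate_limit = max_candidate_variables(50)
final_limit = max_final_variables(50)

# Made-up values for two explanatory variables that move almost in lockstep.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 8.1, 9.8]
r = pearson_r(x1, x2)
```

A correlation this close to 1 means the two variables carry nearly the same information, so one of them is a candidate for removal.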
Describe logistic regression
A logistic regression model is appropriate when we are interested in modelling a binary response (dependent) variable.
This variable can be related to one or more risk factors or covariates, which may be either categorical or continuous.
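A minimal sketch of how the model turns covariates into a probability: the linear combination of covariates is the log-odds of the outcome, and the logistic function maps it to a value between 0 and 1. The coefficients and covariate values are hypothetical.

```python
from math import exp


def predicted_probability(intercept, coefs, values):
    """Probability of the outcome from the logistic (inverse logit) function."""
    log_odds = intercept + sum(b * x for b, x in zip(coefs, values))
    return 1 / (1 + exp(-log_odds))


# Hypothetical coefficients: one continuous covariate and one 0/1 risk factor.
p = predicted_probability(-1.0, [0.5, 1.0], [2.0, 0.0])
odds_ratio = exp(1.0)  # odds ratio for the 0/1 risk factor
```

Each coefficient is a change in log-odds per unit of its covariate, so exponentiating it gives an odds ratio.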