Multiple regression and logistic regression Flashcards

1
Q

Describe multiple regression

A

With a single explanatory variable we would carry out a simple linear regression.
With several explanatory variables we use a more general form of regression model that allows more than one explanatory variable at a time.
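
A minimal sketch of fitting such a model in Python with statsmodels; the data, coefficients and variable names below are invented purely for illustration.

```python
# Minimal sketch: a multiple regression with two explanatory variables
# (data and names are invented purely for illustration).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
y = 2.0 + 1.5 * x1 - 0.8 * x2 + rng.normal(scale=0.5, size=50)

X = sm.add_constant(np.column_stack([x1, x2]))  # intercept + two predictors
model = sm.OLS(y, X).fit()
print(model.summary())  # coefficients, standard errors, R-squared, etc.
```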

2
Q

What are the assumptions for multiple regression?

A
residuals are normally distributed 
residuals have mean 0
residuals have constant variance 
observations are independent 
error-free measurement of predictors
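
A rough sketch of checking the residual assumptions listed above, assuming `model` is the fitted statsmodels OLS result from the sketch on the first card.

```python
# Sketch of basic residual checks for a fitted statsmodels OLS result `model`
# (continuing the illustrative example from the first card).
import matplotlib.pyplot as plt
import statsmodels.api as sm
from scipy import stats

resid = model.resid
fitted = model.fittedvalues

# Residuals vs fitted values: look for curvature or non-constant variance.
plt.scatter(fitted, resid)
plt.axhline(0, color="grey")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()

# Normality of the residuals: Shapiro-Wilk test and a Q-Q plot.
print(stats.shapiro(resid))
sm.qqplot(resid, line="s")
plt.show()
```
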
3
Q

What does the regression line indicate?

A

The regression line indicates the nature of the relationship between the explanatory variables and the response.

4
Q

What does the coefficient for each variable represent?

A

It gives the expected change in the response for a one-unit increase in that variable, assuming the other variables do not change. For example, a coefficient of 2.5 means the predicted response rises by 2.5 when that variable increases by one unit with the others held fixed.

5
Q

What can the coefficients' standard errors be used for?

A

to create confidence intervals for the coefficients
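
A small sketch of the calculation, using invented numbers; a fitted statsmodels result reports the same intervals via `conf_int()`.

```python
# Sketch: an approximate 95% confidence interval for one coefficient from its
# estimate and standard error (the numbers here are invented for illustration).
from scipy import stats

coef, se, df_resid = 1.5, 0.3, 47       # estimate, standard error, residual d.f.
t_crit = stats.t.ppf(0.975, df_resid)   # two-sided 95% critical value
print(coef - t_crit * se, coef + t_crit * se)
# A fitted statsmodels result gives these directly via model.conf_int().
```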

6
Q

What is the H0 in multiple regression?

A

Each coefficient = 0, i.e. that variable has no effect on the response once the other variables are in the model.

7
Q

Why is the adjusted R squared used instead of the raw value in multiple regression?

A

The raw R-squared increases whenever another variable is added to the model, regardless of whether that variable has any predictive ability.

8
Q

Describe adjusted R squared

A

The most commonly used selection criterion.

It adjusts the estimated proportion of variance explained by the model for the number of explanatory variables included.
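
A minimal sketch of the usual adjustment, with n observations and p explanatory variables (the intercept fitted in addition to the p variables).

```python
# Sketch of the usual adjustment: n observations, p explanatory variables,
# with the intercept fitted in addition to the p variables.
def adjusted_r_squared(r2: float, n: int, p: int) -> float:
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Example: a raw R-squared of 0.80 from 50 observations and 5 predictors.
print(adjusted_r_squared(0.80, 50, 5))  # about 0.777, lower than the raw value
```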

9
Q

What are the other selection criteria used in multiple regression?

A

Predicted R-squared
Mallows' Cp
The principle of parsimony

10
Q

Describe the predicted R-squared

A

Based on cross-validation: each observation is left out in turn and predicted from a model fitted to the remaining observations.

It takes overfitting into account.
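
One common way to compute it uses the PRESS (leave-one-out) statistic; the sketch below assumes `model` is a fitted statsmodels OLS result as in the earlier sketches.

```python
# Sketch of the leave-one-out idea behind predicted R-squared via the PRESS
# statistic, for a fitted statsmodels OLS result `model` (as in earlier sketches).
import numpy as np

influence = model.get_influence()
hat = influence.hat_matrix_diag                  # leverage of each observation
press = np.sum((model.resid / (1 - hat)) ** 2)   # leave-one-out squared errors
y = model.model.endog
predicted_r2 = 1 - press / np.sum((y - y.mean()) ** 2)
print(predicted_r2)  # falls well below R-squared when the model overfits
```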

11
Q

Describe Mallows' Cp

A

It penalises models for having too many variables; the model with the minimum Cp is considered the most effective.
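
A small sketch of one common form of the statistic, where p counts the parameters in the candidate model (including the intercept) and the error variance is estimated from the full model; the numbers in the example are invented.

```python
# Sketch of Mallows' Cp: sse_subset is the residual sum of squares of the
# candidate model with p parameters (including the intercept), mse_full is
# the mean squared error of the full model, n is the number of observations.
def mallows_cp(sse_subset: float, mse_full: float, n: int, p: int) -> float:
    return sse_subset / mse_full - (n - 2 * p)

# A well-fitting candidate model has Cp close to p; larger models are
# penalised through the 2 * p term.
print(mallows_cp(sse_subset=140.0, mse_full=2.5, n=60, p=4))  # gives 4.0
```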

12
Q

Describe the principle of parsimony

A

Keep it simple: prefer the model with the fewest explanatory variables that adequately explains the data.

13
Q

What are model searching methods?

A

Stepwise regression - usually based on forward selection or backward elimination (a rough sketch of backward elimination follows below)
Best subsets regression - considers all potential models
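
A rough sketch of backward elimination based on p-values, assuming `X` is a pandas DataFrame of candidate explanatory variables and `y` the response; the 0.05 threshold is just an illustrative choice, not a fixed rule.

```python
# Rough sketch of backward elimination on p-values with statsmodels.
# X is assumed to be a pandas DataFrame of candidate explanatory variables.
import statsmodels.api as sm

def backward_eliminate(y, X, threshold=0.05):
    """Repeatedly drop the least significant variable until all p-values < threshold."""
    X = sm.add_constant(X)
    while True:
        model = sm.OLS(y, X).fit()
        pvals = model.pvalues.drop("const")        # ignore the intercept
        if pvals.empty or pvals.max() < threshold:
            return model
        X = X.drop(columns=pvals.idxmax())         # drop the weakest variable
```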

14
Q

What are the potential pitfalls of multiple regression?

A

Overfitting - a large number of variables combined with a small sample size.
Typically the sample size needs to be at least 10 times the number of variables considered.
The final number of variables in the model should not be more than the square root of the sample size.
Collinearity - when explanatory variables are highly correlated with each other; it is usually best to remove one of them before fitting the regression (see the VIF sketch below).
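
A small sketch of screening for collinearity with variance inflation factors, assuming `X` is a pandas DataFrame of explanatory variables; the 5-10 rule of thumb is a common convention rather than a hard cutoff.

```python
# Sketch: variance inflation factors to spot collinear explanatory variables.
# X is assumed to be a pandas DataFrame of predictors.
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vifs(X):
    Xc = sm.add_constant(X)
    return {col: variance_inflation_factor(Xc.values, i)
            for i, col in enumerate(Xc.columns) if col != "const"}

# Variables with a VIF well above about 5-10 are strong candidates for removal
# before fitting the regression.
```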

15
Q

Describe logistic regression

A

A logistic regression model is appropriate when we are interested in modelling a binary response or dependent variable
This variable can be related to one or more risk factors or covariates that may either be categorical or continuous
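
A minimal sketch of fitting one in Python with statsmodels; the covariates, sample size and coefficients below are invented for illustration.

```python
# Minimal sketch: logistic regression for a binary response
# (data and names are invented purely for illustration).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
age = rng.uniform(20, 70, size=200)            # continuous covariate
smoker = rng.integers(0, 2, size=200)          # binary (categorical) covariate
lin = -4 + 0.05 * age + 1.0 * smoker
y = rng.binomial(1, 1 / (1 + np.exp(-lin)))    # binary response

X = sm.add_constant(np.column_stack([age, smoker]))
model = sm.Logit(y, X).fit()
print(model.summary())            # coefficients are on the log-odds scale
print(np.exp(model.params[1:]))   # odds ratios for age and smoker
```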

16
Q

Describe multiple logistic regression

A

Logistic regression can be extended to examine the influence of large numbers of explanatory variables.
This needs a large sample size.
Assumptions and error checking are tricky - seek help.