W4: GLM 1 Flashcards
What does the simple linear regression equation mean?
yi = b0 + b1 * xi + ei
- yi = outcome variable
- b0 = intercept (expected y when x = 0)
- b1 = slope of line (how much y is expected to change for a 1-unit change in x)
- xi = predictor / explanatory variable
- ei = residual / error term
In which direction does the line shift when the intercept is positive (b0 > 0)?
Up
In which direction does the line shift when the intercept is negative (b0 < 0)?
Down
Do changes to the intercept change the slope of the line?
No. The intercept only shifts the line up or down; the slope is a separate parameter
What is the line of best fit?
- Line that minimizes sum of squared residuals
- Gives estimates of b0 and b1
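A minimal sketch of how the line of best fit can be computed for one predictor, using the standard closed-form least-squares solution. The function names and the data are made up for illustration; statistical software would normally do this step for you.

```python
def fit_simple_ols(x, y):
    """Return (b0, b1) that minimize the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: how x and y vary together, divided by how much x varies.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through (mean_x, mean_y).
    b0 = mean_y - b1 * mean_x
    return b0, b1


def residuals(x, y, b0, b1):
    """Observed outcome minus predicted outcome, per observation."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def sum_squared(values):
    return sum(v ** 2 for v in values)


# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

b0, b1 = fit_simple_ols(x, y)
sse_best = sum_squared(residuals(x, y, b0, b1))
# Any other line (here: same slope, intercept nudged up) fits worse.
sse_other = sum_squared(residuals(x, y, b0 + 0.1, b1))
```

Here `sse_other` comes out larger than `sse_best`, which is what "minimizes the sum of squared residuals" means in practice.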
What are residuals?
Difference between observed and predicted outcomes
What is a multiple linear regression?
- Model with more than 1 predictor
- Each coefficient gives one predictor's association with the outcome, holding the other predictors constant
What does the multiple linear regression equation mean?
yi = b0 + b1 * x1i + … + bk * xki + ei
- b0 = intercept (expected y when all predictors are 0)
- b1 ... bk = slopes (how much y is expected to change for a 1-unit change in that predictor, holding all other predictors constant)
- ei = residual / error term
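A rough sketch of fitting the multiple regression equation by solving the normal equations (X'X)b = X'y in plain Python. The helper names and the data are hypothetical, and the data are generated with no error term so the recovered coefficients are easy to check.

```python
def solve_linear_system(a, b):
    """Solve a * beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution from the last row upwards.
    beta = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (m[i][n] - known) / m[i][i]
    return beta


def fit_ols(rows, y):
    """Return [b0, b1, ..., bk] for predictor rows [[x1, ..., xk], ...]."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 = intercept column
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Made-up data generated exactly from y = 1 + 2*x1 - 3*x2 (no error term).
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 2]]
y = [1 + 2 * x1 - 3 * x2 for x1, x2 in rows]

b0, b1, b2 = fit_ols(rows, y)
```

With these data the fit recovers b0 close to 1, b1 close to 2 and b2 close to -3: raising x1 by 1 while x2 stays fixed raises the expected y by about 2.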
What do simple and multiple linear regressions assume the outcome to be?
Continuous, with normally distributed residuals (i.e. normal around the predicted value)
What do simple and multiple linear regressions assume the association between explanatory variables and outcome to be?
Linear association
What are generalized linear models (GLMs) used to extend…
To extend the linear model to other types of outcome (e.g. binary or count)
What are 3 examples of GLMs?
- Linear regression (continuous)
- Logistic / probit regression (binary)
- Poisson / negative binomial regression (count)
How many parameters does the normal distribution have, and what does each control?
2 parameters
- Mean: controls the location (centre) of the distribution
- SD: controls the scale / spread of the distribution
- Written N(mean, SD)
- Standard normal distribution: N(0, 1)
What are the link function and inverse link function usually written as?
g() and g^-1()
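A small sketch of what g() and g^-1() do, using the two links most commonly paired with logistic and Poisson regression (logit and log). The coefficients below are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import math


def logit(p):
    """Link g() for a binary outcome: probability -> log-odds."""
    return math.log(p / (1 - p))


def inv_logit(eta):
    """Inverse link g^-1(): log-odds (linear predictor) -> probability."""
    return 1 / (1 + math.exp(-eta))


def log_link(mu):
    """Link g() for a count outcome: expected count -> log scale."""
    return math.log(mu)


def inv_log_link(eta):
    """Inverse link g^-1(): linear predictor -> expected count."""
    return math.exp(eta)


# Hypothetical coefficients and predictor value, for illustration only.
b0, b1 = -1.0, 0.5
x = 2.0
eta = b0 + b1 * x                    # linear predictor, here 0.0

probability = inv_logit(eta)         # logistic regression prediction
expected_count = inv_log_link(eta)   # Poisson regression prediction
```

The linear part b0 + b1 * x is the same in both models; only the inverse link that maps it back to the outcome scale differs.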
What is the difference between R^2 and adjusted R^2?
- R^2: proportion of variance in the outcome explained by the model; it never decreases when a predictor is added, even one with no real effect
- Adjusted R^2: R^2 penalized for the number of predictors, so it rises only when a new predictor improves the fit by more than would be expected by chance (USE FOR INTERPRETATION)
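A short sketch of the two quantities using their usual formulas, with n observations and k predictors. The outcome values and predictions are made up rather than taken from a fitted model.

```python
def r_squared(y, y_hat):
    """Proportion of outcome variance explained: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """R^2 penalized for k predictors fitted to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Made-up observed outcomes and model predictions, for illustration only.
y = [2.0, 4.0, 6.0, 8.0, 10.0]
y_hat = [2.5, 3.5, 6.0, 8.5, 9.5]

r2 = r_squared(y, y_hat)
adj_one_predictor = adjusted_r_squared(r2, n=len(y), k=1)
adj_two_predictors = adjusted_r_squared(r2, n=len(y), k=2)
```

For the same R^2, the adjusted value drops as k grows, so an extra predictor has to earn its place by improving the fit.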
What does the F-test / F-statistic do?
Tests whether the model as a whole is statistically significant, i.e. whether all predictors together explain more variance than an intercept-only model (all predictors tested simultaneously)
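As a sketch, the overall F-statistic can be computed from R^2, the number of observations n and the number of predictors k. The input values below are hypothetical, and turning F into a p-value (which needs the F distribution with these degrees of freedom) is left to statistical software.

```python
def f_statistic(r2, n, k):
    """Overall F for a model with k predictors and n observations.

    Returns (F, df_model, df_residual).
    """
    df_model = k
    df_residual = n - k - 1
    # Explained variance per predictor vs unexplained variance per residual df.
    f = (r2 / df_model) / ((1 - r2) / df_residual)
    return f, df_model, df_residual


# Hypothetical values, for illustration only.
f, df_model, df_residual = f_statistic(r2=0.5, n=23, k=2)
```

A larger F means the predictors jointly explain more variance relative to what is left unexplained.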
What should you do to categorical variables when including them in a linear regression?
Dummy code them so they become numeric predictors (0s and 1s)
E.g. 1 = female, 0 = male
How do you interpret the regression coefficient for sex that has been dummy coded?
It is the average difference in the outcome between females (1) and males (0, the reference group).
E.g. intercept = expected neuroticism when all predictors are 0 = mean score of male ppts
sex1 = difference between female and male ppts' mean scores (female minus male)
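A minimal sketch of this interpretation with made-up neuroticism scores: after dummy coding (1 = female, 0 = male), the fitted intercept equals the male mean and the slope equals the female mean minus the male mean. The function name and all values are illustrative only.

```python
def fit_simple_ols(x, y):
    """Return (b0, b1) that minimize the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Made-up data, for illustration only.
sex = ["male", "female", "male", "female", "male", "female"]
neuroticism = [10.0, 14.0, 12.0, 15.0, 11.0, 16.0]

# Dummy coding: 1 = female, 0 = male (male is the reference group).
female = [1 if s == "female" else 0 for s in sex]

b0, b_sex = fit_simple_ols(female, neuroticism)

male_scores = [v for v, d in zip(neuroticism, female) if d == 0]
female_scores = [v for v, d in zip(neuroticism, female) if d == 1]
male_mean = sum(male_scores) / len(male_scores)
female_mean = sum(female_scores) / len(female_scores)
```

Swapping the coding (1 = male, 0 = female) would flip the sign of the coefficient and make the intercept the female mean.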
What are 3 assumptions of linear regression that are checked with model diagnostics?
- Normality (distribution) of residuals
- Independence of observations
- Homogeneity of variance (spread/variance of residuals should be about equal)