W4: GLM 1 Flashcards
What does the simple linear regression equation mean?
yi = b0 + b1 * xi + ei
- yi = outcome variable
- b0 = intercept (expected y when x = 0)
- b1 = slope of the line (how much y is expected to change for a 1-unit change in x)
- xi = predictor / explanatory variable
- ei = residual / error term
In which direction does the line shift when the intercept is positive (b0 > 0)?
Up
In which direction does the line shift when the intercept is negative (b0 < 0)?
Down
Does changing the intercept change the slope of the line?
No. The intercept moves the whole line up or down; the slope can stay the same.
What is the line of best fit?
- Line that minimizes sum of squared residuals
- Gives estimates of b0 and b1
What are residuals?
Difference between observed and predicted outcomes
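A minimal Python sketch of the line of best fit and its residuals, using the closed-form least-squares formulas for one predictor. The data values and the helper names (fit_line, sse) are hypothetical, made up only for illustration.

```python
# Hypothetical toy data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_line(x, y):
    """Return (b0, b1) that minimise the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept
    return b0, b1


def sse(b0, b1, x, y):
    """Sum of squared residuals for the line y = b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


b0, b1 = fit_line(x, y)
predicted = [b0 + b1 * xi for xi in x]
# Residual = observed outcome minus predicted outcome.
residuals = [yi - pi for yi, pi in zip(y, predicted)]

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, SSE = {sse(b0, b1, x, y):.3f}")
```

Any other intercept or slope gives a larger sum of squared residuals on the same data, which is what "best fit" means here.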
What is a multiple linear regression?
- Model with more than 1 predictor
- Each coefficient shows a predictor's association with the outcome, independent of (holding constant) the other predictors
What does the multiple linear regression equation mean?
yi = b0 + b1 * x1i + … + bk * xki + ei
- b0 = intercept (expected y when all predictors are 0)
- b1 … bk = slopes (how much y is expected to change for a 1-unit change in x1 … xk, holding all other predictors constant)
- ei = residual / error term
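A short Python sketch of what "holding all other predictors constant" means for the slopes. The coefficient values and the predict helper are hypothetical.

```python
def predict(b0, slopes, xs):
    """Expected y for one observation: b0 + b1*x1 + ... + bk*xk."""
    return b0 + sum(b * x for b, x in zip(slopes, xs))


# Hypothetical coefficients for a model with two predictors.
b0 = 10.0
slopes = [2.0, -0.5]   # b1, b2

at_zero = predict(b0, slopes, [0.0, 0.0])   # all predictors 0 -> intercept
base = predict(b0, slopes, [3.0, 4.0])
bumped = predict(b0, slopes, [4.0, 4.0])    # x1 up by 1 unit, x2 unchanged

print(at_zero)         # the intercept b0
print(bumped - base)   # the slope b1
```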
What do simple and multiple linear regressions assume the outcome to be?
Continuous and normally distributed (conditional on the predictors)
What do simple and multiple linear regressions assume the association between explanatory variables and outcome to be?
Linear association
What are generalized linear models (GLMs) used for?
To extend the linear model to different types of outcomes
What are 3 examples of GLMs?
- Linear regression (continuous)
- Logistic / probit regression (binary)
- Poisson / negative binomial regression (count)
How many parameters does the normal distribution have, and what do they do?
2 parameters
- Mean: controls the location of the centre of the distribution
- SD: controls the scale/spread of the distribution
- Written as N(mean, SD)
- Standard normal distribution: N(0, 1)
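To see the two parameters at work, here is a small Python sketch using statistics.NormalDist from the standard library. The particular means and SDs are arbitrary illustration values.

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)   # N(0, 1)
shifted = NormalDist(mu=5, sigma=1)    # same spread, centre moved to 5
wide = NormalDist(mu=0, sigma=3)       # same centre, three times the spread

# Proportion of values within 1 SD of the mean (about 68% for any normal).
within_1sd_standard = standard.cdf(1) - standard.cdf(-1)
within_1sd_wide = wide.cdf(3) - wide.cdf(-3)

print(round(within_1sd_standard, 4), round(within_1sd_wide, 4))
print(shifted.cdf(5))   # half of the distribution lies below the mean
```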
What are the link function and inverse link function always called?
g() and g^-1()
What is the difference between R^2 and adjusted R^2?
- R^2: proportion of variance in the outcome explained by the model; it never decreases when predictors are added, even if they contribute nothing
- Adjusted R^2: penalizes R^2 for the number of predictors, so it only rises when a predictor actually improves the model; a better estimate of model fit (USE FOR INTERPRETATION)
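A Python sketch of the usual adjusted R^2 formula, 1 - (1 - R^2) * (n - 1) / (n - k - 1), where n is the number of observations and k the number of predictors. The input values are hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with k predictors fitted to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical example: the same R^2 with few versus many predictors.
few = adjusted_r2(r2=0.30, n=100, k=3)
many = adjusted_r2(r2=0.30, n=100, k=10)

print(round(few, 4), round(many, 4))   # more predictors -> bigger penalty
```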
What does F-test / F-statistic do?
Tests whether model is statistically significant overall or not (all predictors tested simultaneously)
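As a rough Python sketch, the overall F-statistic can be written in terms of R^2 as (R^2 / k) / ((1 - R^2) / (n - k - 1)). The numbers below are hypothetical, and the p-value (which needs the F distribution with k and n - k - 1 degrees of freedom) is not computed here.

```python
def f_statistic(r2, n, k):
    """Overall F for a model with k predictors and n observations."""
    explained_per_predictor = r2 / k
    unexplained_per_df = (1 - r2) / (n - k - 1)
    return explained_per_predictor / unexplained_per_df


# Hypothetical model: R^2 = 0.30, 100 participants, 3 predictors.
f_value = f_statistic(r2=0.30, n=100, k=3)
print(round(f_value, 2))
```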
What should you do to categorical variables when including them in linear regression?
Dummy code them so they become numeric predictors (0s and 1s)
E.g. 1 = female, 0 = male
How do you interpret the regression coefficient for sex that has been dummy coded?
It is the average difference in the outcome between males (0) and females (1).
E.g. the intercept = expected neuroticism for male ppts (when all predictors are 0)
sex1 = difference in expected neuroticism between female and male ppts
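A Python sketch of dummy coding with made-up neuroticism scores: with a single 0/1 predictor, the least-squares intercept equals the mean of the group coded 0 and the slope equals the difference between the two group means. All data and names are hypothetical.

```python
# Hypothetical scores; sex is dummy coded 0 = male, 1 = female.
sex = [0, 0, 0, 1, 1, 1]
neuroticism = [2.0, 4.0, 6.0, 5.0, 7.0, 9.0]


def fit_line(x, y):
    """Closed-form least-squares fit for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


b0, b_sex = fit_line(sex, neuroticism)

male_mean = sum(s for s, g in zip(neuroticism, sex) if g == 0) / sex.count(0)
female_mean = sum(s for s, g in zip(neuroticism, sex) if g == 1) / sex.count(1)

print(b0, male_mean)                    # intercept matches the male mean
print(b_sex, female_mean - male_mean)   # slope matches the group difference
```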
What 3 assumptions do linear regression model diagnostics check?
- Normality (distribution) of residuals
- Independent observations
- Homogeneity of variance (spread/variance of residuals should be about equal)
What is the difference between b and beta in linear regression equation?
b = unstandardized coefficients (raw change in units)
beta = standardized coefficients (change in SDs)
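A Python sketch of how b and beta relate for a single predictor: beta = b * SD(x) / SD(y), which is the same slope you get by z-scoring both variables before fitting. The data are hypothetical.

```python
import statistics


def slope(x, y):
    """Least-squares slope for one predictor."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def z_scores(values):
    """Rescale values to mean 0 and SD 1."""
    m = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - m) / sd for v in values]


# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

b = slope(x, y)                                        # raw units of y per unit of x
beta = b * statistics.stdev(x) / statistics.stdev(y)   # SDs of y per SD of x
beta_from_z = slope(z_scores(x), z_scores(y))          # same value, another route

print(round(b, 3), round(beta, 3), round(beta_from_z, 3))
```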
What does b1 in the linear regression equation show?
The ___ between 2 variables
The direction and strength of relationship (regression coefficient) between 2 variables
What does the inclusion of error term suggest for our data?
Real data will not fall perfectly on the regression line.
Perfect model: eta = b0 + b1 * xi (without ei)
What are 4 examples of linear models?
- T-test
- ANOVAs
- Pearson correlations
- Linear regressions
What is the type of y variable for linear regression?
Continuous and normally distributed (conditional on the predictors)
What is the type of y variable for logistic regression?
Binary (0 / 1, yes / no, T / F)
What is the type of y variable for Poisson regression?
Count (how many of something)
What is a probability distribution?
A description of how likely each possible outcome is
E.g. for a coin flip, prob distribution = 0.5 heads, 0.5 tails.
If y (the outcome) is assumed to be conditionally normal, what are its mean and SD?
What does this mean for the distribution of errors?
Mean = eta (the predicted value) and SD = sigma (the SD of the residuals)
i.e. y ~ N(eta, sigma)
This also means the errors are normally distributed with a mean of 0 and the same SD
i.e. e ~ N(0, sigma)
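A small Python simulation sketch of this idea: if each y is drawn from a normal distribution centred on its own eta with a fixed residual SD (called sigma in the code), then the errors y - eta are centred on 0 with that same SD. The coefficients, sigma, sample size and seed are all hypothetical choices.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical "true" model and residual SD.
b0, b1, sigma = 1.0, 0.5, 2.0

x = [random.uniform(0, 10) for _ in range(10_000)]
eta = [b0 + b1 * xi for xi in x]                  # predicted values
y = [random.gauss(e, sigma) for e in eta]         # each y drawn from N(eta, sigma)
errors = [yi - ei for yi, ei in zip(y, eta)]      # so errors follow N(0, sigma)

error_mean = statistics.fmean(errors)
error_sd = statistics.stdev(errors)
print(round(error_mean, 2), round(error_sd, 2))   # close to 0 and to sigma
```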
What does the acronym L.I.N.E represent for the assumptions of linear regression?
Linear relationship
Independent observations and errors (uncorrelated)
Normally distributed errors (with a mean of 0, random)
Equal variance of errors
What do GLMs do?
They use a link function to transform/link eta from linear space to outcome space
Is there a link function in linear regression?
No transformation is needed: the outcome is already in linear space, so the link is the identity function
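A Python sketch of common link functions g() and their inverses g^-1(). The identity, logit and log links are the standard choices for linear, logistic and Poisson regression respectively; the function names are hypothetical.

```python
import math


def identity(mu):
    """Identity link (linear regression): no transformation."""
    return mu


def logit(p):
    """Logit link (logistic regression): probability -> linear space."""
    return math.log(p / (1 - p))


def inv_logit(eta):
    """Inverse logit: linear space -> probability between 0 and 1."""
    return 1 / (1 + math.exp(-eta))


def log_link(mu):
    """Log link (Poisson regression): positive mean count -> linear space."""
    return math.log(mu)


def inv_log(eta):
    """Inverse log link: linear space -> positive mean count."""
    return math.exp(eta)


print(identity(2.5))               # unchanged
print(inv_logit(0.0))              # eta = 0 corresponds to probability 0.5
print(inv_logit(logit(0.8)))       # round trip recovers about 0.8
print(inv_log(log_link(3.0)))      # round trip recovers about 3.0
```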
What kind of model does lm() fit?
Linear model
lm(outcome ~ predictor, data = d), where the outcome is the dependent variable
Is the estimate of predictor variable from lm() output standardized?
No, it is the unstandardized coefficient.
What does the shaded region on visreg graphs show?
95% confidence intervals
What does the QQ plot / deviates plot from modelDiagnostics() test for?
Extreme outliers (solid black)
What does the density plot from modelDiagnostics() test for?
Normal distribution of residuals
What is the equation for the effect size, R^2?
variance explained / total variance
What is the equation of Cohen's f^2 (effect size) for linear regression models?
R^2 / (1 - R^2), where 1 - R^2 is the variance not explained
What is the equation of Cohen's f^2 (effect size) for multiple regression models (individual predictor)?
(R^2AB - R^2A) / (1 - R^2AB)
- (R^2AB - R^2A) = difference in the coefficient of determination (variance) between the full model (including all independent variables) and a reduced model (subset of independent variables)
- (1 - R^2AB) = unexplained variance/residual variance by the model
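A Python sketch of the three effect-size formulas from the cards above. The function names and all input numbers are hypothetical.

```python
def r_squared(variance_explained, total_variance):
    """R^2 = variance explained / total variance."""
    return variance_explained / total_variance


def cohens_f2(r2):
    """Cohen's f^2 for a whole model: R^2 / (1 - R^2)."""
    return r2 / (1 - r2)


def cohens_f2_predictor(r2_full, r2_reduced):
    """Cohen's f^2 for one predictor: (R^2_AB - R^2_A) / (1 - R^2_AB)."""
    return (r2_full - r2_reduced) / (1 - r2_full)


# Hypothetical values: the full model (AB) explains 30% of the variance,
# the reduced model without the predictor of interest (A) explains 20%.
r2_ab = r_squared(variance_explained=30.0, total_variance=100.0)
r2_a = 0.20

f2_model = cohens_f2(r2_ab)
f2_predictor = cohens_f2_predictor(r2_ab, r2_a)
print(round(f2_model, 3), round(f2_predictor, 3))
```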
How do you show the inclusion of main effects when including an interaction term in the lm() equation?
Example:
neuroticism = b0 + (b1 * stress) + (b2 * sex) + (b3 * stress * sex)
In lm(): neuroticism ~ stress * sex (R expands this to stress + sex + stress:sex)
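A Python sketch of what the interaction term does in this equation: the slope of stress is b1 when sex = 0 and b1 + b3 when sex = 1. The coefficient values are hypothetical.

```python
def predict_neuroticism(stress, sex, b0, b1, b2, b3):
    """Expected neuroticism with main effects plus a stress-by-sex interaction."""
    return b0 + (b1 * stress) + (b2 * sex) + (b3 * stress * sex)


# Hypothetical coefficients.
coefs = dict(b0=2.0, b1=0.5, b2=1.0, b3=0.25)

# Slope of stress within each group: change in prediction for +1 stress.
slope_sex0 = predict_neuroticism(1, 0, **coefs) - predict_neuroticism(0, 0, **coefs)
slope_sex1 = predict_neuroticism(1, 1, **coefs) - predict_neuroticism(0, 1, **coefs)

print(slope_sex0)   # b1
print(slope_sex1)   # b1 + b3
```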
When plotting a continuous moderator, what are the values used for breaks() in visreg?
breaks = c(mean - 1 SD, mean + 1 SD)
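A Python sketch of computing those two break values for a continuous moderator. The moderator scores are hypothetical; in practice the resulting numbers would be passed to the breaks argument.

```python
import statistics

# Hypothetical scores on a continuous moderator (e.g. stress).
moderator = [10.0, 12.0, 14.0, 16.0, 18.0]

m = statistics.fmean(moderator)
sd = statistics.stdev(moderator)   # sample SD

breaks = [m - sd, m + sd]          # "low" and "high" moderator values
print([round(b, 2) for b in breaks])
```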
If we have 2 predictors (multiple linear regression), what kind of best fit do we have?
A plane of best fit (3D); with more than 2 predictors it becomes a hyperplane
Nothing is truly linear. Regression models are a simplification of _____
reality
Normal distribution is also known as the ______ distribution
Gaussian
p-value can be used to determine effect size and magnitude (strength of relationship).
True or false?
False, just used to see if it’s above or below our determined threshold (significance)
If the data is derived from siblings or repeated measures, what assumption does it violate?
Assumption of independent observations and errors (because they would be correlated)
If the loess smooth line is not flat and close to 0, what does it indicate?
Systematic bias in residuals
What can you do if the assumption of homogeneity of variance is violated?
Remove extreme values