W4: GLM 1 Flashcards

Question 1

Q

What does each term in the simple linear regression equation mean?
yi = b0 + b1 * xi + ei

Answer

A

yi = outcome variable
b0 = intercept (expected y when x = 0)
b1 = slope of line (how much y is expected to change for a 1-unit change in x)
xi = predictor / explanatory variable
ei = residual / error term
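
A minimal sketch of how the terms fit together. The intercept, slope and observation are made-up values chosen for illustration, not taken from any dataset.

```python
def predict(b0, b1, x):
    """Expected outcome for predictor value x (the line itself, without the error term)."""
    return b0 + b1 * x

# Hypothetical intercept and slope
b0, b1 = 2.0, 0.5

# Hypothetical observation (x_i, y_i)
x_i, y_i = 4.0, 4.6

y_hat = predict(b0, b1, x_i)  # expected y at x_i
e_i = y_i - y_hat             # residual: observed minus predicted

print(y_hat, e_i)
```

Setting x to 0 returns b0, and moving x up by 1 unit changes the prediction by exactly b1.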

Question 2

Q

In which direction does the line shift when the intercept is positive (b0 > 0)?

Answer

A

Upward (the whole line sits higher on the y-axis)

Question 3

Q

In which direction does the line shift when the intercept is negative (b0 < 0)?

Answer

A

Downward (the whole line sits lower on the y-axis)

Question 4

Q

Does changing the intercept change the slope of the line?

Answer

A

No. The intercept only moves the line up or down; the slope is a separate parameter.

Question 5

Q

What is the line of best fit?

Answer

A

Line that minimizes sum of squared residuals
Gives estimates of b0 and b1
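
A rough sketch of how the least-squares estimates can be computed by hand for one predictor, using the standard closed-form formulas. The data are invented for illustration.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares estimates (b0, b1) for a single predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def sum_squared_residuals(xs, ys, b0, b1):
    """Sum of squared differences between observed and predicted y."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(xs, ys)
best = sum_squared_residuals(xs, ys, b0, b1)

# Nudging the intercept away from the estimate makes the fit worse
worse = sum_squared_residuals(xs, ys, b0 + 0.5, b1)
print(b0, b1, best < worse)
```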

Question 6

Q

What are residuals?

Answer

A

Difference between observed and predicted outcomes

Question 7

Q

What is a multiple linear regression?

Answer

A

Model with more than 1 predictor
Each coefficient describes that predictor's association with the outcome, holding the other predictors constant

Question 8

Q

What does each term in the multiple linear regression equation mean?
yi = b0 + b1 * x1i + … + bk * xki + ei

Answer

A

b0 = intercept (expected y when all predictors are 0)
b1 ... bk = slopes (how much y is expected to change for a 1-unit change in x1 ... xk, holding all other predictors constant)
ei = residual / error term

Question 9

Q

What do simple and multiple linear regressions assume the outcome to be?

Answer

A

Continuous and normally distributed (conditional on the predictors)

Question 10

Q

What do simple and multiple linear regressions assume the association between explanatory variables and outcome to be?

Answer

A

Linear association

Question 11

Q

What are generalized linear models (GLMs) used to extend…

Answer

A

To extend linear model to different outcomes

Question 12

Q

What are 3 examples of GLMs?

Answer

A

Linear regression (continuous)
Logistic / probit regression (binary)
Poisson / negative binomial regression (count)

Question 13

Q

How many parameters does the normal distribution have, and what does each one do?

Answer

A

2 parameters
Mean: controls location of the centre of the distribution
SD: controls scale/spread of the distribution
Written as N(mean, SD)
Standard normal distribution: N(0, 1)
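
As an illustration, Python's standard-library `statistics.NormalDist` takes the same two parameters. The specific means and SDs below are arbitrary.

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)   # N(0, 1)
shifted = NormalDist(mu=10, sigma=1)   # same spread, centre moved to 10
wider = NormalDist(mu=0, sigma=3)      # same centre, three times the spread

# Half of the probability lies below the mean, wherever the mean is
print(standard.cdf(0), shifted.cdf(10))

# Probability of falling within 1 SD of the mean is the same for any SD
within_1sd_standard = standard.cdf(1) - standard.cdf(-1)
within_1sd_wider = wider.cdf(3) - wider.cdf(-3)
print(within_1sd_standard, within_1sd_wider)
```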

Question 14

Q

What are the link function and the inverse link function always called?

Answer

A

g() and g^-1()
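
A small sketch of one link / inverse-link pair. The logit link used in logistic regression is chosen here as an example; the card itself does not name a specific link.

```python
import math

def logit(p):
    """Link g(): maps a probability in (0, 1) to the linear (log-odds) scale."""
    return math.log(p / (1 - p))

def inv_logit(eta):
    """Inverse link g^-1(): maps the linear predictor back to a probability."""
    return 1 / (1 + math.exp(-eta))

p = 0.3
eta = logit(p)         # probability -> linear space
back = inv_logit(eta)  # linear space -> probability
print(eta, back)
```

Applying the inverse link after the link recovers the starting probability, and any value of eta maps to a number strictly between 0 and 1.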

Question 15

Q

What is the difference between R^2 and adjusted R^2?

Answer

A

R^2: proportion of variance in the outcome explained by the model; it never decreases when a predictor is added, even a useless one
Adjusted R^2: better estimate of model fit (USE FOR INTERPRETATION),
penalizes the number of predictors, so it only increases when an added predictor actually improves the model
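
One common formula for adjusted R^2 is 1 - (1 - R^2) * (n - 1) / (n - k - 1), with n observations and k predictors. The sketch below uses hypothetical values of R^2, n and k.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with k predictors fitted to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical model: R^2 = 0.30, 50 participants
few = adjusted_r2(0.30, n=50, k=3)    # 3 predictors
many = adjusted_r2(0.30, n=50, k=10)  # same R^2 reached with 10 predictors
print(few, many)
```

For the same R^2, the model that needed more predictors gets the lower adjusted value.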

Question 16

Q

What does F-test / F-statistic do?

Answer

A

Tests whether model is statistically significant overall or not (all predictors tested simultaneously)
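
For reference, the overall F-statistic can be written in terms of R^2 as F = (R^2 / k) / ((1 - R^2) / (n - k - 1)), with k predictors and n observations. A sketch with hypothetical numbers:

```python
def f_statistic(r2, n, k):
    """Overall F for a linear model with k predictors and n observations."""
    explained_per_df = r2 / k
    unexplained_per_df = (1 - r2) / (n - k - 1)
    return explained_per_df / unexplained_per_df

# Hypothetical model: R^2 = 0.30, 50 participants, 3 predictors
f_value = f_statistic(0.30, n=50, k=3)
print(f_value)  # compared against an F distribution with (k, n - k - 1) df
```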

Question 17

Q

What should you do to categorical variables when including them in linear regression?

Answer

A

Dummy code them so they become numeric predictors (0s and 1s)
E.g. 1 = female, 0 = male

Question 18

Q

How do you interpret the regression coefficient for sex that has been dummy coded?

Answer

A

It is the average difference in the outcome score between females (1) and males (0, the reference group).
E.g. Expected value of neuroticism at the intercept (when predictors are 0) = male ppts' scores
sex1 = difference between female and male ppts (female minus male)
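
A quick check of this interpretation with invented scores: with a single 0/1 predictor, the least-squares intercept equals the mean of the group coded 0 and the slope equals the difference between the two group means.

```python
from statistics import mean

# Made-up data: 0 = male, 1 = female
sex = [0, 0, 0, 1, 1, 1]
neuroticism = [2.0, 3.0, 4.0, 4.0, 5.0, 6.0]

# Least-squares fit with the dummy-coded predictor
x_bar, y_bar = mean(sex), mean(neuroticism)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sex, neuroticism))
sxx = sum((x - x_bar) ** 2 for x in sex)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

male_mean = mean([y for x, y in zip(sex, neuroticism) if x == 0])
female_mean = mean([y for x, y in zip(sex, neuroticism) if x == 1])

print(b0, male_mean)                # intercept vs mean of group coded 0
print(b1, female_mean - male_mean)  # slope vs difference in group means
```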

Question 19

Q

What 3 assumptions do linear regression model diagnostics check?

Answer

A

Normality (distribution) of residuals
Independence of observations
Homogeneity of variance (spread/variance of residuals should be about equal)

Question 20

Q

What is the difference between b and beta in linear regression equation?

Answer

A

b = unstandardized coefficients (raw change in units)
beta = standardized coefficients (change in SDs)
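
A sketch of how the two relate for a single predictor: beta = b * SD(x) / SD(y), which is the same slope you get after converting both variables to z-scores. The variable names and data are hypothetical.

```python
from statistics import mean, stdev

def slope(xs, ys):
    """Least-squares slope for a single predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def z_scores(values):
    """Rescale values to mean 0 and SD 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Made-up data
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
score = [52.0, 55.0, 61.0, 64.0, 68.0]

b = slope(hours, score)                                 # raw units: points per hour
beta = b * stdev(hours) / stdev(score)                  # SD units
beta_from_z = slope(z_scores(hours), z_scores(score))   # same thing via z-scores
print(b, beta, beta_from_z)
```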

Question 21

Q

What does b1 in the linear regression equation show?
The ___ between 2 variables

Answer

A

The direction and strength of relationship (reg coeff) between 2 variables

Question 22

Q

What does the inclusion of error term suggest for our data?

Answer

A

Data is real and not going to fall perfectly on the regression line.
Perfect model: eta = b0 + b1 * xi (without ei)

Question 23

Q

What are 4 examples of linear models?

Answer

A

T-test
ANOVAs
Pearson correlations
Linear regressions

Question 24

Q

What is the type of y variable for linear regression?

Answer

A

Continuous and normally distributed (conditional on the predictors)

Question 25

Q

What is the type of y variable for logistic regression?

Answer

A

Binary (0 / 1, yes / no, T / F)

Question 26

Q

What is the type of y variable for Poisson regression?

Answer

A

Count (how many of something)

Question 27

Q

What is a probability distribution?

Answer

A

Distribution of probability of an outcome
E.g. for a coin flip, prob distribution = 0.5 heads, 0.5 tails.

Question 28

Q

Assumption of y (outcome) being conditionally normal has a mean and SD of what?
What does this mean for the distribution of errors?

Answer

A

Mean = eta (the linear predictor) and SD = sigma (the residual SD)
i.e. y ~ N(eta, sigma)
This also means the errors are normally distributed with a mean of 0 and the same SD
i.e. e ~ N(0, sigma)

Question 29

Q

What does the acronym L.I.N.E represent for the assumptions of linear regression?

Answer

A

Linear relationship
Independent observations and errors (uncorrelated)
Normally distributed errors (with a mean of 0, random)
Equal variance of errors

Question 30

Q

What do GLMs do?

Answer

A

Uses some function to transform/link eta from linear space to outcome space

Question 31

Q

Is there a link function in linear regression?

Answer

A

Effectively no: the outcome is already in linear space, so the link is the identity function (g(eta) = eta)

Question 32

Q

What kind of model does lm() fit?

Answer

A

Linear model
lm(outcome ~ predictor, data = d), where outcome is the dependent variable

Question 33

Q

Is the estimate of predictor variable from lm() output standardized?

Answer

A

No, it is the unstandardized coefficient.

Question 34

Q

What does the shaded region on visreg graphs show?

Answer

A

95% confidence intervals

Question 35

Q

What does the QQ plot / deviates plot from modelDiagnostics() test for?

Answer

A

Extreme outliers (solid black)

Question 36

Q

What does the density plot from modelDiagnostics test for?

Answer

A

Normal distribution of residuals

Question 37

Q

What is the equation for the effect size, R^2?

Answer

A

variance explained / total variance

Question 38

Q

What is the equation of Cohen’s f^2 (effect size) for linear regression models?

Answer

A

R^2 / (1 - R^2), where (1 - R^2) is the variance not explained
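
A worked example of the ratio of explained to unexplained variance, using a hypothetical R^2:

```python
def cohens_f2(r2):
    """Cohen's f^2 for a whole model: explained over unexplained variance."""
    return r2 / (1 - r2)

# Hypothetical model explaining 20% of the variance
f2 = cohens_f2(0.20)
print(f2)
```

Unlike R^2, f^2 is not capped at 1: a model with R^2 above 0.5 has f^2 above 1.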

Question 39

Q

What is the equation of Cohen’s f^2 (effect size) for multiple regression models (individual predictor)?

Answer

A

(R^2_AB - R^2_A) / (1 - R^2_AB)

(R^2_AB - R^2_A) = difference in the coefficient of determination (variance explained) between the full model (including all independent variables) and a reduced model (subset of independent variables)
(1 - R^2_AB) = variance left unexplained by the full model (residual variance)
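
A worked example for a single predictor's contribution. The two R^2 values are hypothetical: a reduced model without the predictor and a full model with it.

```python
def cohens_f2_predictor(r2_full, r2_reduced):
    """Cohen's f^2 for the predictor(s) added between the reduced and full model."""
    return (r2_full - r2_reduced) / (1 - r2_full)

# Hypothetical values: reduced model R^2 = 0.20, full model R^2 = 0.30
f2_local = cohens_f2_predictor(r2_full=0.30, r2_reduced=0.20)
print(f2_local)
```

If adding the predictor does not change R^2 at all, the result is 0.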

Question 40

Q

How do you show that main effects are included when adding an interaction term to the lm() equation?

Answer

A

Example:
neuroticism = b0 + (b1 * stress) + (b2 * sex) + (b3 * stress * sex)
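
A sketch of what the interaction term does in this equation: the slope of stress is b1 when sex = 0 and b1 + b3 when sex = 1. All coefficient values below are made up.

```python
# Hypothetical coefficients
b0, b1, b2, b3 = 1.0, 0.5, 0.25, 0.25

def predict(stress, sex):
    """Expected neuroticism with both main effects and the interaction."""
    return b0 + (b1 * stress) + (b2 * sex) + (b3 * stress * sex)

# Change in the prediction for a 1-unit increase in stress, within each group
slope_sex0 = predict(1, 0) - predict(0, 0)  # equals b1
slope_sex1 = predict(1, 1) - predict(0, 1)  # equals b1 + b3
print(slope_sex0, slope_sex1)
```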

Question 41

Q

When plotting a continuous moderator, what are the values used for breaks() in visreg?

Answer

A

breaks = c(mean - 1 SD, mean + 1 SD)
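
A sketch of computing those two values for a continuous moderator. The scores are invented, and the sample SD (n - 1 denominator) is assumed.

```python
from statistics import mean, stdev

# Made-up moderator scores
stress = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

m = mean(stress)
sd = stdev(stress)  # sample SD

# Low and high values of the moderator: 1 SD below and above the mean
breaks = [m - sd, m + sd]
print(breaks)
```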

Question 42

Q

If we have 2 predictors (multiple linear regression), what kind of best fit do we have?

Answer

A

A plane of best fit (3D); with more than 2 predictors it becomes a hyperplane

Question 43

Q

Nothing is truly linear. Regression models are a simplification of _____

Question 44

Q

Normal distribution is also known as the ______ distribution

Answer

A

Gaussian

Question 45

Q

p-value can be used to determine effect size and magnitude (strength of relationship).
True or false?

Answer

A

False. It only tells us whether the result falls above or below our chosen significance threshold.

Question 46

Q

If the data is derived from siblings or repeated measures, what assumption does it violate?

Answer

A

Assumption of independent observations and errors (because they would be correlated)

Question 47

Q

If the loess smooth line is not flat and about 0, what does it indicate?

Answer

A

Systematic bias in residuals

Question 48

Q

What can you do if the assumption of homogeneity of variance is violated?

Answer

A

Remove extreme values