L12: Linear regression Flashcards
How do we estimate the effect of an independent variable on a dependent variable?
Using regression analysis.
If the dependent variable (y) is continuous –> linear regression
y is ordinal –> ordinal regression
y is nominal (dichotomous) –> logistic regression
What is simple vs multiple regression?
Simple –> only 1 independent x variable
Multiple/multivariable –> >1 independent x variable
What is the difference between correlation and simple linear regression?
Correlation:
- quantifies the degree to which two variables are related, provided that the relationship is linear
- makes no distinction between the two variables (treated symmetrically)
Simple linear regression:
- determines the best fitting straight line to investigate the change in dependent variable y (continuous) that corresponds to a given change in independent variable x (continuous, ordinal or nominal), provided that there is significant correlation.
- the two variables are treated asymmetrically (x is used to predict y, not the other way round)
Can we extrapolate beyond the observed range of values?
No, do not extrapolate the regression line beyond the observed range of x; the fitted linear relationship may not hold outside it.
What is the general equation for simple linear regression model?
y = alpha + beta (x)
alpha = y intercept = mean value of y when x = 0
beta = slope = change in the MEAN value of y when there is a one-unit change in x
What does the simple linear regression model use?
Method of least squares (smallest residual sum of squares)
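A minimal sketch of the least squares method for y = alpha + beta (x), using the usual closed-form estimates (beta = Sxy / Sxx, alpha = mean of y minus beta times mean of x). The function name and the small data set are made up for illustration only.

```python
def fit_simple_linear_regression(x, y):
    """Return (alpha, beta) that minimise the residual sum of squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sxy: sum of cross-products of deviations from the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sxx: sum of squared deviations of x from its mean
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta = sxy / sxx                 # slope
    alpha = mean_y - beta * mean_x   # y intercept
    return alpha, beta


# Illustrative (made-up) data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
alpha, beta = fit_simple_linear_regression(x, y)
print(f"y = {alpha:.2f} + {beta:.2f} (x)")
```

For this data the slope is about 0.6 and the intercept about 2.2, i.e. the mean y rises by roughly 0.6 for each one-unit increase in x.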
How do we evaluate the goodness of fit of the simple linear regression model?
- using the coefficient of determination (R2)
- in simple linear regression, R2 = r2 (r is the Pearson product-moment correlation coefficient)
- R2: proportion of variability among the observed values of y that is explained by the linear regression model
- R2 ranges from 0 to 1 (because r ranges from -1 to 1)
- if R2 = 1 –> all data points lie exactly on the best fitting line
- if R2 = 0, there is no linear relationship between x and y
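As a rough check of the points above, this sketch computes R2 as 1 - (residual sum of squares / total sum of squares) and compares it with the square of Pearson's r. The function name and data are illustrative assumptions, not part of the original cards.

```python
import math


def r_squared_and_pearson_r(x, y):
    """Return (R2 of the least squares line, Pearson r) for paired data."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)   # total sum of squares of y
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    # Least squares line
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x

    # Residual sum of squares: variability NOT explained by the line
    ss_res = sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))

    r_squared = 1 - ss_res / syy
    r = sxy / math.sqrt(sxx * syy)
    return r_squared, r


# Illustrative (made-up) data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r_squared, r = r_squared_and_pearson_r(x, y)
print(f"R2 = {r_squared:.3f}, r = {r:.3f}, r squared = {r ** 2:.3f}")
```

Here R2 and r squared both come out at about 0.6, so roughly 60% of the variability in y is explained by the line.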
How do we check the statistical significance of the linear regression model?
Look at the p-value for beta (it tests the null hypothesis that beta = 0, i.e. no linear relationship)
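A sketch of where that p-value comes from, assuming the standard t test for the slope: t = beta / SE(beta), compared against a t distribution with n - 2 degrees of freedom. Only the t statistic is computed here; turning it into a p-value is left to statistical software. The function name and data are hypothetical.

```python
import math


def slope_t_statistic(x, y):
    """Return (beta, standard error of beta, t statistic) for H0: beta = 0."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x

    # Residual variance uses n - 2 degrees of freedom (alpha and beta estimated)
    ss_res = sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))
    residual_variance = ss_res / (n - 2)

    se_beta = math.sqrt(residual_variance / sxx)
    t_stat = beta / se_beta
    return beta, se_beta, t_stat


# Illustrative (made-up) data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
beta, se_beta, t_stat = slope_t_statistic(x, y)
print(f"beta = {beta:.3f}, SE = {se_beta:.3f}, t = {t_stat:.3f} on {len(x) - 2} df")
```

A large absolute t (small p-value) is evidence against beta = 0.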
What is multiple linear regression?
- linear regression with >1 independent variable –> the model has multiple betas (one per independent variable)
What does multiple linear regression use?
Method of least squares
When are dummy variables used?
Used when we have nominal independent variables
How many dummy variables do we need for a nominal variable with k categories?
k-1
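A small sketch of k-1 dummy coding: one category is chosen as the reference and gets all zeros, and each remaining category gets its own 0/1 column. The helper name, the blood group example and the choice of reference category are assumptions made for illustration.

```python
def dummy_code(values, reference):
    """Return (column names, rows of 0/1 indicators) with the reference category dropped."""
    categories = sorted(set(values))
    # k categories -> k - 1 dummy variables (everything except the reference)
    columns = [c for c in categories if c != reference]
    rows = [[1 if value == c else 0 for c in columns] for value in values]
    return columns, rows


# Nominal variable with k = 4 categories (hypothetical example)
blood_group = ["A", "B", "O", "AB", "O"]
columns, rows = dummy_code(blood_group, reference="O")

print(columns)  # 3 dummy variables for 4 categories
for value, row in zip(blood_group, rows):
    print(value, row)
```

Each beta for a dummy variable then compares the mean y in that category with the mean y in the reference category.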
How do we evaluate the goodness of fit of the multiple linear regression model?
- inspect the coefficient of determination (R2)
- R2 is the proportion of variability among the observed values of y that is explained by the linear regression model containing the set of independent variables
- range of values is 0 to 1
How do we compare between models that contain different numbers of independent variables?
Compare the adjusted R2
(adjusted R2 increases when including an independent variable improves the ability to predict y, and decreases when it does not)
- However, adjusted R2 cannot be directly interpreted as the proportion of variability among the observed values of y explained by the linear regression model.
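A sketch of the usual adjusted R2 formula, 1 - (1 - R2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of independent variables. The R2 values below are invented purely to show the behaviour.

```python
def adjusted_r_squared(r_squared, n, p):
    """Adjusted R2 for a model with p independent variables fitted to n observations."""
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)


# Hypothetical models fitted to the same n = 20 observations
two_predictors = adjusted_r_squared(0.60, n=20, p=2)
three_predictors = adjusted_r_squared(0.61, n=20, p=3)   # R2 barely improves
print(f"2 predictors: {two_predictors:.3f}")
print(f"3 predictors: {three_predictors:.3f}")

# Adjusted R2 can even be negative, so it is not a proportion of variability
weak_model = adjusted_r_squared(0.05, n=10, p=3)
print(f"weak model:   {weak_model:.3f}")
```

In this made-up case the third variable raises R2 slightly but lowers adjusted R2 (about 0.553 to about 0.537), which would favour the two-variable model.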
What does beta represent in a multiple linear regression model?
e.g.
For every 1 unit increase in x, the MEAN y will increase/decrease by ___, after controlling for other x variables (keeping them constant).
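To make the "keeping them constant" part concrete, this sketch takes a hypothetical fitted model with two independent variables and raises x1 by one unit while x2 stays fixed. The coefficients and variable values are invented for illustration, not estimated from data.

```python
def predict_mean_y(alpha, betas, xs):
    """Predicted mean y = alpha + beta1 * x1 + beta2 * x2 + ..."""
    return alpha + sum(b * x for b, x in zip(betas, xs))


# Hypothetical fitted coefficients
alpha = 50.0
betas = [2.0, -0.5]   # beta1 for x1, beta2 for x2

baseline = predict_mean_y(alpha, betas, [10, 30])
x1_plus_one = predict_mean_y(alpha, betas, [11, 30])   # x2 kept constant at 30

difference = x1_plus_one - baseline
print(f"Change in mean y for a 1-unit increase in x1: {difference}")
```

The difference equals beta1 whatever value x2 is held at, which is exactly what "after controlling for other x variables" means.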