Lectures 7 and up: Pearson's r, Simple & Multiple Linear Regressions, ANOVA, etc. Flashcards
If a correlation answers the questions of:
1) Are two variables related or associated?
2) If so, how strong is the relationship?
Then what is the “added value” of a simple linear regression (SLR)? (Two concepts)
3) What is the precise nature of the relationship, i.e. how much will Y change with a given change in X?
4) How much of the variation in Y is explained by X?
Explain what each variable represents in the equation for a line / linear regression:
Y=a + bX
Y = dependent/response variable
a = intercept (the value of Y when X = 0)
b = the regression coefficient (slope of the line)
X = independent/explanatory variable
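A minimal sketch of how a and b could be estimated by ordinary least squares, using only plain Python. The function name `fit_line` and the hours/score data are made up for illustration; they are not from the lectures.

```python
def fit_line(xs, ys):
    """Least-squares fit of Y = a + bX. Returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # b = covariation of X and Y divided by variation of X
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # a = value of Y the line predicts when X = 0
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data: hours studied (X) and exam score (Y)
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]

a, b = fit_line(hours, score)
print(f"intercept a = {a:.2f}, slope b = {b:.2f}")
```

Read the result as: each extra unit of X is associated with a change of b units in Y, starting from a when X = 0.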
In an SLR, what value tells us how much of the variation in Y is explained by X?
R-squared (the coefficient of determination)
This measures the “goodness of the fit” - how well the regression line fits the data (i.e. how close the points of a scatter plot are to the reg line). Always between 0 and 1, closer to 1 being a “stronger” fit.
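As a rough illustration of "goodness of fit", R-squared can be computed as 1 minus the ratio of unexplained variation (squared distances from the points to the line) to total variation in Y. The sketch below uses made-up data and a hypothetical helper name `r_squared`.

```python
def r_squared(xs, ys):
    """R-squared of the least-squares line Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    # Unexplained variation: squared vertical distances to the line
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation of Y around its mean
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data: points close to a line give R-squared near 1
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]
r2 = r_squared(hours, score)
print(f"R-squared = {r2:.3f}")
```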
True or False: When estimating the strength of association between two variables, the coefficient of correlation is symmetrical (i.e. it is the same for the corr of X with Y and Y with X)
True
In an SLR, is symmetry seen in the regression coefficient in the same way a coefficient of correlation is symmetrical?
No, therefore it is important to distinguish which is your independent variable and which is your dependent.
Regression coefficients and intercepts of
[reg var1 var2] are not equal to [reg var2 var1]
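The contrast between the two cards above can be checked numerically. This sketch (plain Python, made-up data, hypothetical helper names `pearson_r` and `slope`) computes the correlation both ways and the regression slope both ways.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def slope(xs, ys):
    """Slope b from regressing ys (dependent) on xs (independent)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data
var2 = [1, 2, 3, 4, 5]
var1 = [52, 55, 61, 64, 68]

r_12 = pearson_r(var1, var2)
r_21 = pearson_r(var2, var1)
b_1_on_2 = slope(var2, var1)   # like "reg var1 var2": var1 is the DV
b_2_on_1 = slope(var1, var2)   # like "reg var2 var1": var2 is the DV

print("correlations:", r_12, r_21)   # identical
print("slopes:", b_1_on_2, b_2_on_1) # different
```

The two slopes are linked, though: their product equals the squared correlation.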
In the SLR example below, what does R-squared tell us?
Investment Rates = y
Policy Uncertainty Index = x
R-squared = 0.4629
That about 46% of the variation in investment rates can be explained by policy uncertainty.
Why is it important to keep in mind your units of measurement when performing an SLR? e.g. population in millions vs. percentage of population
It will affect the size of your coefficient.
Example: A city that is 1 million people larger will experience a 1% rise in homelessness (larger coefficient)
VS.
A city that is 1 PERSON larger will experience a rise in homelessness of one-millionth of a percent (smaller coefficient)
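A small sketch of the same point: refitting the line with population measured in persons instead of millions shrinks the coefficient by a factor of one million, while the relationship itself is unchanged. All numbers here are invented for illustration.

```python
def slope(xs, ys):
    """Slope b from regressing ys on xs by least squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical cities: population in millions, homelessness in percent
pop_millions = [1, 2, 3, 4, 5]
homeless_pct = [2.0, 2.9, 4.1, 5.0, 6.0]

# Same populations, now measured in individual persons
pop_persons = [p * 1_000_000 for p in pop_millions]

b_millions = slope(pop_millions, homeless_pct)
b_persons = slope(pop_persons, homeless_pct)

print("per million people:", b_millions)
print("per person:        ", b_persons)
```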
A SLR examines the relationship between two or more variables: True or False
False: relationship between TWO and only two variables - not more
What does the standardized regression coefficient tell us and would it best be used for an SLR or MLR?
Tells us the “ranking of impact” among IVs, because it puts every coefficient on the same unit-free scale (standard deviations). Only useful when you have multiple IVs, thus it should be used only in MLRs.
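To make the "ranking" idea concrete, here is a sketch for the special case of exactly two IVs, where the standardized coefficients can be written in terms of the three pairwise correlations. The data and function names are hypothetical, and this shortcut formula is an assumption of the sketch, not something taken from the lectures.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def standardized_betas(x1, x2, y):
    """Standardized coefficients for an MLR of y on two IVs."""
    r_y1 = pearson_r(x1, y)
    r_y2 = pearson_r(x2, y)
    r_12 = pearson_r(x1, x2)
    denom = 1 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    return beta1, beta2

# Hypothetical data where y = 3*x1 + 1*x2 exactly
x1 = [-1, 1, -1, 1]
x2 = [-1, -1, 1, 1]
y = [3 * a + 1 * b for a, b in zip(x1, x2)]

beta1, beta2 = standardized_betas(x1, x2, y)

# Changing the units of x2 changes its raw coefficient,
# but not its standardized coefficient
x2_rescaled = [v * 1000 for v in x2]
beta1_r, beta2_r = standardized_betas(x1, x2_rescaled, y)

print("standardized betas:", beta1, beta2)
print("after rescaling x2:", beta1_r, beta2_r)
```

Here x1 ranks above x2 in impact, and that ranking survives a change of units.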
Why is the Adjusted R-squared always lower than the “regular” R-squared?
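The adjusted R-squared penalizes the model for the number of IVs in an MLR. Adding an IV can never lower the regular R-squared, but the adjusted R-squared goes up only if the new IV improves the fit by more than the penalty; otherwise it goes down.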
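A sketch of the usual adjustment formula, 1 - (1 - R²)(n - 1)/(n - k - 1), with n observations and k IVs. The R-squared value is the one from the investment-rate card; the sample size n = 50 is a made-up assumption, since the source does not give it.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

r2 = 0.4629   # R-squared from the SLR example
n = 50        # hypothetical sample size (not given in the source)

adj_1iv = adjusted_r_squared(r2, n, k=1)
# Same R-squared but more IVs: a bigger penalty
adj_5iv = adjusted_r_squared(r2, n, k=5)

print(f"k=1: {adj_1iv:.4f}")
print(f"k=5: {adj_5iv:.4f}")
```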
If you’re really just interested in the relationship between two variables, say education and income, why bother doing an MLR with the introduction of more IV’s?
Because it allows you to model reality more closely, as there are often multiple reasons that explain a particular phenomenon
What is Mirjam’s favorite word?
Eeeeeeeexactly….. exactly.
Does this mean we will all pass this bloody class???
Eeeeeeexactly :)
When would it make sense to specify an interaction between two variables in an MLR?
If the effect of one IV on the DV is likely to depend on the level of another IV.
An interaction “describes a situation in which the simultaneous influence of two variables on a third is not additive”
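One way to picture "not additive": with an interaction term, the model becomes Y = a + b1·X1 + b2·X2 + b3·X1·X2, so the effect of a one-unit change in X1 is b1 + b3·X2 rather than a fixed b1. The coefficients and variable names below are invented for illustration.

```python
# Hypothetical coefficients for: income = a + b1*educ + b2*exper + b3*educ*exper
A, B1, B2, B3 = 20, 5, 2, 1

def predict(educ, exper):
    """Predicted Y from a model with an interaction term."""
    return A + B1 * educ + B2 * exper + B3 * educ * exper

# Effect of one more unit of education at two levels of experience
effect_at_exper_0 = predict(1, 0) - predict(0, 0)
effect_at_exper_10 = predict(1, 10) - predict(0, 10)

print("effect of educ when exper = 0: ", effect_at_exper_0)
print("effect of educ when exper = 10:", effect_at_exper_10)
```

If B3 were 0, both effects would be the same and the two influences would simply add up.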