17: Regression with different data Flashcards
What is the Coefficient of Determination (R-square)?
Proportion of the variation in the dependent variable that is explained by the regression model
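As a minimal sketch of that definition, the snippet below fits a one-variable OLS line by hand and computes R-square as 1 minus the ratio of unexplained to total variation. The data values and variable names are invented for illustration only.

```python
# Hypothetical data, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS slope and intercept for y = a + b*x + error.
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
a = y_bar - b * x_bar

fitted = [a + b * xi for xi in x]
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained variation
ss_tot = sum((yi - y_bar) ** 2 for yi in y)                # total variation

r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 4))
```

An R-square close to 1 here only says the line fits these points well; it says nothing about whether the relationship is causal.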
Is R-square so important?
Not so much; it matters more to close all backdoor paths (e.g. not conditioning on colliders, addressing endogeneity)
What happens to R-square when additional independent variables are added?
It generally raises the R-square (in OLS it never falls), even if the added variable is irrelevant
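A rough illustration of this point: the sketch below fits OLS with and without an extra regressor that is unrelated to the outcome and compares the two R-square values. The helper `ols_r2`, the data, and the "irrelevant" variable `x2` are all hypothetical; the fit solves the normal equations with a small Gauss-Jordan routine rather than a statistics library.

```python
def ols_r2(columns, y):
    """R-squared from OLS of y on an intercept plus the given regressor columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations: (X'X | X'y).
    A = [
        [sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
        + [sum(X[i][r] * y[i] for i in range(n))]
        for r in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
    beta = [A[r][k] / A[r][r] for r in range(k)]
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data: y depends on x1; x2 is an unrelated sequence of numbers.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
y = [2.2, 3.9, 6.1, 8.3, 9.8, 12.1]

r2_one = ols_r2([x1], y)
r2_two = ols_r2([x1, x2], y)
print(r2_one, r2_two)
```

The second value cannot be lower than the first, which is why a higher R-square alone is weak evidence that an added variable belongs in the model.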
What is the difference between Pearson R and R-square?
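Pearson R is symmetrical (the correlation of x with y equals that of y with x); R-square is asymmetrical because it comes from a regression with a designated dependent variable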
In a normal regression table, can you compare regression coefficients for independent variables?
No, as they have different units
In what cases can we compare regression coefficients directly?
- Dummy variables (among themselves)
- Standardized coefficients
- Logged variables
Otherwise, compare the effects of a one standard deviation increase in each variable within the regression (multiply each coefficient by that variable's s.d.)
What is the purpose of standardizing regression coefficients?
To compare the relative importance of different variables in the regression
What do standardized coefficients indicate?
How a one standard deviation increase in each independent variable affects the dependent variable
What is a standardised coefficient?
The coefficient obtained after standardising all variables to have a mean of 0 and s.d. of 1
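A minimal sketch of standardisation for a one-variable regression, with invented data: it z-scores both variables, re-estimates the slope, and checks that the result matches the raw slope rescaled by sd(x)/sd(y). The variable names `income` and `score` are hypothetical.

```python
from statistics import mean, pstdev


def slope(x, y):
    """OLS slope of y on x (with an intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )


def standardise(values):
    """Rescale to mean 0 and standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


# Hypothetical data in very different units.
income = [20000.0, 35000.0, 50000.0, 65000.0, 80000.0]
score = [52.0, 60.0, 61.0, 70.0, 77.0]

b_raw = slope(income, score)                            # score points per currency unit
b_std = slope(standardise(income), standardise(score))  # s.d. of score per s.d. of income
b_rescaled = b_raw * pstdev(income) / pstdev(score)     # same quantity without refitting

print(b_raw, b_std, b_rescaled)
```

The raw slope is tiny only because income is measured in large units; the standardised slope is unit-free and so can be compared across variables.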
What is the normal linear model and the impact of a 1 unit increase in x?
y = a + b * x + error
1 unit increase in x leads to a b unit increase in y
What is the linear-log model and the impact of a 1% increase in x on y?
y = a + b * log(x) + error
1% increase in x leads to approximately a b/100 unit change in y,
since log(1) = 0 and log(1.01) is close to 1/100 (natural log)
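The approximation behind the b/100 rule can be checked numerically. The sketch below, which assumes natural logs and an arbitrary illustrative coefficient `b`, compares the exact change in y, b * log(1 + p/100), with the shortcut b * p/100 for several percentage changes p.

```python
import math

b = 10.0  # hypothetical linear-log coefficient, chosen only for illustration

results = {}
for pct in (1, 10, 50):
    exact = b * math.log(1 + pct / 100)  # exact change in y for a pct% rise in x
    approx = b * pct / 100               # rule-of-thumb change
    results[pct] = (exact, approx)
    print(pct, round(exact, 4), round(approx, 4))
```

The two figures agree closely for a 1% change and drift apart for large changes, so the shortcut is best reserved for small percentage changes in x.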
What is the log-linear model and the impact of a 1 unit increase in x?
log(y) = a + b * x + error
1 unit increase in x leads to approximately a b * 100% change in y
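As a small worked check, assuming natural logs: a 1 unit increase in x multiplies y by exp(b), so the exact percentage change is (exp(b) - 1) * 100, which is close to b * 100 only when b is small. The coefficient values below are arbitrary illustrations.

```python
import math

changes = {}
for b in (0.05, 0.5):  # hypothetical log-linear coefficients
    exact_pct = (math.exp(b) - 1) * 100  # exact % change in y per unit of x
    approx_pct = b * 100                 # rule-of-thumb % change
    changes[b] = (exact_pct, approx_pct)
    print(b, round(exact_pct, 2), round(approx_pct, 2))
```

For b = 0.05 the two numbers are nearly identical; for b = 0.5 the rule of thumb noticeably understates the change.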
What is the log-log model and what is the impact of a 1% increase in x?
log(y) = a + b * log(x) + error
1% increase in x leads to approximately a b% change in y
b is the elasticity of y with respect to x
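The elasticity reading can be verified with a quick calculation. In the sketch below the coefficient `b` is a made-up value; a 1% rise in x multiplies y by 1.01 ** b, and the resulting percentage change is compared with b itself.

```python
b = 0.8  # hypothetical log-log coefficient (elasticity), for illustration only

exact_pct = (1.01 ** b - 1) * 100  # exact % change in y when x rises by 1%
approx_pct = b                     # elasticity interpretation: about b percent

print(round(exact_pct, 4), approx_pct)
```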
What is a cross-sectional regression?
Regression with a number of units at one point in time
How do you know something is a cross-sectional regression?
The subscript i indexes different units, and there is no time subscript
Explain each component in this cross sectional regression
Interpret the impact of a 1 unit increase in Ln Population Density on IMR
Linear-log model, so:
1% increase in Pop. Density leads to an increase of about 0.151 (15.1/100) in infant mortality
What is a critique of cross-sectional regression?
Need to be aware of spatial autocorrelation; each spatial region is not independent
An effect that exists across groups may not predict change over time
Think carefully about potential omitted confounders
What does time series regression analyze?
Observations of one place or thing over time
What are lags and leads in time series regressions?
- Lags: values of a variable from a previous period (e.g. the previous year)
- Leads: values of a variable from a later period (e.g. the next year)
Lags sometimes allow us to avoid reverse causality (as today's Y cannot affect past X)
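A minimal sketch of how lagged and lead variables line up with the original series, using plain lists and made-up yearly values. `None` marks periods for which no lag or lead is available.

```python
# Hypothetical yearly series.
years = [2018, 2019, 2020, 2021, 2022]
x = [10, 12, 15, 11, 14]

x_lag1 = [None] + x[:-1]   # value of x in the previous year
x_lead1 = x[1:] + [None]   # value of x in the next year

for row in zip(years, x, x_lag1, x_lead1):
    print(row)
```

Regressing this year's y on `x_lag1` uses only past values of x, which is what rules out today's y feeding back into the regressor. The first observation is lost because it has no lag.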
What is a problem with time series regressions?
Spurious regressions:
- If the values of two variables both generally increase over time, we cannot be certain whether changes in x cause changes in y
- Both may simply be functions of time, which leads to a spurious regression
- We need series that are stationary over time
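To illustrate the problem, the sketch below builds two artificial series that share nothing except an upward trend, then compares their correlation in levels with their correlation in first differences. The series are constructed deterministically for illustration, and first differencing is used here as one common way to remove a trend, not as the only route to stationarity.

```python
def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    cov = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    var_u = sum((a - u_bar) ** 2 for a in u)
    var_v = sum((b - v_bar) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5


n = 40
# Two trending series with unrelated wiggles around the trend.
x = [t + (1 if t % 2 == 0 else -1) for t in range(n)]
y = [2 * t + (1 if t % 4 < 2 else -1) for t in range(n)]

# First differences remove the common time trend.
dx = [x[t] - x[t - 1] for t in range(1, n)]
dy = [y[t] - y[t - 1] for t in range(1, n)]

corr_levels = pearson(x, y)
corr_diffs = pearson(dx, dy)
print(round(corr_levels, 3), round(corr_diffs, 3))
```

The levels are almost perfectly correlated purely because both rise with time, while the differenced series show essentially no relationship.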
What is a panel regression?
Uses both cross-sectional and time series variation since panel data is data on multiple entities across multiple time periods
Panel variable is the unit, time variable is the time
What is the purpose of fixed effect regressions?
Controlling for all unit-level characteristics that do not change over time (time-invariant)
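One way to see what unit fixed effects do is the within transformation: subtract each unit's own mean from its x and y, then run the regression on the demeaned data. The sketch below uses a tiny invented panel in which both units share a true slope of 2 but have different time-invariant baselines; all names and numbers are hypothetical.

```python
def slope(x, y):
    """OLS slope of y on x (with an intercept)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum(
        (a - x_bar) ** 2 for a in x
    )


def demean(values):
    """Subtract the unit's own mean from each observation."""
    m = sum(values) / len(values)
    return [v - m for v in values]


# Hypothetical panel: y = baseline + 2 * x, with baseline 10 for A and 0 for B.
panel = {
    "A": ([1.0, 2.0, 3.0], [12.0, 14.0, 16.0]),
    "B": ([4.0, 5.0, 6.0], [8.0, 10.0, 12.0]),
}

pooled_x = [v for xs, _ in panel.values() for v in xs]
pooled_y = [v for _, ys in panel.values() for v in ys]
pooled_slope = slope(pooled_x, pooled_y)  # ignores unit differences

within_x = [v for xs, _ in panel.values() for v in demean(xs)]
within_y = [v for _, ys in panel.values() for v in demean(ys)]
fe_slope = slope(within_x, within_y)      # unit baselines are removed

print(round(pooled_slope, 3), round(fe_slope, 3))
```

The pooled slope even has the wrong sign because the unit baselines are correlated with x, while the within estimate recovers the slope of 2.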
What are time fixed effects used for in panel regressions? How are they implemented?
They control for variables related to time that are common across all units; they are implemented by including a dummy variable for each time period
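A minimal sketch of the idea on an invented balanced panel: each period has a shock that hits every unit equally, and subtracting the period mean from x and y removes it. In a balanced panel with only time effects, this demeaning gives the same slope as including a dummy for each period. The units, shocks, and true slope of 1 are all hypothetical.

```python
def slope(x, y):
    """OLS slope of y on x (with an intercept)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum(
        (a - x_bar) ** 2 for a in x
    )


# Hypothetical panel: y = 1 * x + shock_t, where shock_t is common to all units.
x = {"A": [1.0, 2.0, 3.0], "B": [2.0, 3.0, 4.0]}
shock = [0.0, 5.0, 10.0]
y = {u: [xi + s for xi, s in zip(xs, shock)] for u, xs in x.items()}

periods = range(len(shock))
x_mean_t = [sum(x[u][t] for u in x) / len(x) for t in periods]
y_mean_t = [sum(y[u][t] for u in y) / len(y) for t in periods]

pooled_x = [x[u][t] for u in x for t in periods]
pooled_y = [y[u][t] for u in x for t in periods]
pooled_slope = slope(pooled_x, pooled_y)  # contaminated by the common shocks

x_dm = [x[u][t] - x_mean_t[t] for u in x for t in periods]
y_dm = [y[u][t] - y_mean_t[t] for u in x for t in periods]
time_fe_slope = slope(x_dm, y_dm)         # common shocks removed

print(round(pooled_slope, 3), round(time_fe_slope, 3))
```

Because x happens to rise over the same periods in which the shocks grow, the pooled slope is far too large; removing period means brings it back to 1.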