17: Regression with different data Flashcards

1
Q

What is the Coefficient of Determination (R-square)?

A

The proportion of the variation in the dependent variable that is explained by the regression model

2
Q

Is R-square so important?

A

Not so much. The greater concern is closing all backdoor paths (avoiding colliders, dealing with endogeneity)

3
Q

What happens to R-square when additional independent variables are added?

A

Generally raises the R-square; in an OLS regression it never falls when a variable is added, even an irrelevant one
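
A minimal sketch of this behaviour, assuming ordinary least squares with an intercept. The data, the `junk` regressor and the helper names `solve` and `r_squared` are all made up for illustration; the point is only that adding a regressor cannot lower R-square.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest entry in this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(columns, y):
    """R-square of an OLS fit of y on an intercept plus the given columns."""
    X = [[1.0] + [c[i] for c in columns] for i in range(len(y))]
    k = len(X[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: y depends on x1; junk is unrelated by construction.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
junk = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]

r2_small = r_squared([x1], y)
r2_big = r_squared([x1, junk], y)
print(r2_small, r2_big)  # the second value is at least as large as the first
```

This is one reason a high R-square on its own says little about whether a model is well specified.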

4
Q

What is the difference between Pearson R and R-square?

A

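Pearson R is symmetrical (the correlation of x with y equals that of y with x); R-square comes from a regression, which is asymmetrical because one variable is treated as dependent. In a bivariate regression, R-square equals the square of Pearson R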
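
The contrast can be sketched with a small bivariate example. The data and the helper names `pearson_r` and `ols_slope` are hypothetical; the sketch shows that swapping x and y leaves the correlation unchanged but changes the regression slope, and that the product of the two slopes equals the squared correlation.

```python
from math import sqrt


def mean(v):
    return sum(v) / len(v)


def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / sqrt(saa * sbb)


def ols_slope(x, y):
    """Slope from regressing y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    return sxy / sxx


# Hypothetical data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0]

r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)           # same value: correlation is symmetric
slope_y_on_x = ols_slope(x, y)   # regression of y on x
slope_x_on_y = ols_slope(y, x)   # regression of x on y: a different slope
print(r_xy, r_yx, slope_y_on_x, slope_x_on_y)
```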

5
Q

In a normal regression table, can you compare regression coefficients for independent variables?

A

No, as they have different units

6
Q

In what cases can we compare regression coefficients directly?

A
  • Dummy variables (among themselves)
  • Standardized coefficients
  • Logged variables

Otherwise, compare effects within a regression by looking at one standard deviation increases (multiply each coefficient by the respective s.d.)

7
Q

What is the purpose of standardizing regression coefficients?

A

To compare the relative importance of different variables in the regression

8
Q

What do standardized coefficients indicate?

A

How a one standard deviation increase in each independent variable affects the dependent variable (measured in standard deviations of the dependent variable)

9
Q

What is a standardised coefficient?

A

The coefficient obtained after standardising all variables to have a mean of 0 and an s.d. of 1
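
As a rough illustration, the sketch below standardises a made-up x and y, reruns a one-variable regression, and checks that the result matches the raw slope rescaled by sd(x) / sd(y). The data and the helper names `standardise` and `ols_slope` are assumptions for this example only.

```python
from statistics import mean, pstdev


def ols_slope(x, y):
    """Slope from regressing y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    return sxy / sxx


def standardise(v):
    """Rescale a list to mean 0 and (population) s.d. 1."""
    m, s = mean(v), pstdev(v)
    return [(u - m) / s for u in v]


# Hypothetical data measured in very different units
x = [10.0, 20.0, 30.0, 40.0, 50.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]

b_raw = ols_slope(x, y)                            # y-units per x-unit
b_std = ols_slope(standardise(x), standardise(y))  # s.d. of y per s.d. of x
b_rescaled = b_raw * pstdev(x) / pstdev(y)         # same number as b_std
print(b_raw, b_std, b_rescaled)
```

With several regressors the same rescaling applies coefficient by coefficient, which is what makes standardised coefficients comparable across variables.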

10
Q

What is the normal linear model, and what is the impact of a 1 unit increase in x?

A

y = a + bx + error

A 1 unit increase in x leads to a b unit increase in y

11
Q

What is the linear-log model, and what is the impact of a 1% increase in x on y?

A

y = a + b * log(x) + error

A 1% increase in x leads to approximately a b/100 unit change in y

This is because log(1) = 0 and log(1.01) is close to 1/100
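
The approximation can be checked numerically. In this sketch the coefficient `b = 2.0` and the function name `linear_log_change` are hypothetical; natural logs are assumed.

```python
import math


def linear_log_change(b, pct):
    """Exact change in y when x rises by pct percent in y = a + b * log(x)."""
    return b * math.log(1 + pct / 100)


b = 2.0  # hypothetical coefficient
exact = linear_log_change(b, 1.0)  # b * log(1.01)
approx = b / 100                   # the rule of thumb from the card
print(exact, approx)               # the two agree to about three decimal places
```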

12
Q

What is the log-linear model, and what is the impact of a 1 unit increase in x?

A

log(y) = a + bx + error

A 1 unit increase in x leads to approximately a b * 100% change in y
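
The b * 100% reading is an approximation that works well for small b. A short sketch, with hypothetical coefficient values and a hypothetical function name, compares it with the exact percentage change 100 * (exp(b) - 1):

```python
import math


def log_linear_pct_change(b):
    """Exact % change in y for a 1 unit increase in x in log(y) = a + b * x."""
    return 100 * (math.exp(b) - 1)


small = log_linear_pct_change(0.05)  # close to the rule-of-thumb 5%
large = log_linear_pct_change(0.5)   # well above the rule-of-thumb 50%
print(small, large)
```

For b = 0.05 the exact figure is roughly 5.1%, so the shortcut is fine; for b = 0.5 it is roughly 65%, so the exact formula is the safer choice for large coefficients.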

13
Q

What is the log-log model, and what is the impact of a 1% increase in x?

A

log(y) = a + b * log(x) + error

A 1% increase in x leads to approximately a b% change in y

b is the elasticity of y with respect to x
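
The elasticity reading follows from differentiating the model. A short derivation, using the same a, b, x and y as the card and writing the error term as \varepsilon:

```latex
\log y = a + b \log x + \varepsilon
\quad\Longrightarrow\quad
b = \frac{\partial \log y}{\partial \log x}
  = \frac{\partial y / y}{\partial x / x}
  \approx \frac{\%\,\Delta y}{\%\,\Delta x}
```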

14
Q

What is a cross-sectional regression?

A

Regression with a number of units at one point in time

15
Q

How do you know something is a cross-sectional regression?

A

The variables carry only a subscript i, which denotes the different units (there is no time subscript)

16
Q

Explain each component in this cross sectional regression

17
Q

Interpret the impact of a 1 unit increase in Ln Population Density on IMR

A

Linear-log model, so:

A 1% increase in Pop. Density leads to a 0.15 unit (15.1/100) increase in infant mortality

18
Q

What is a critique of cross-sectional regression?

A

Need to be aware of spatial autocorrelation; each spatial region is not independent

An effect that exists across groups may not predict change over time

Think carefully about potential omitted confounders

19
Q

What does time series regression analyze?

A

Observations of one place or thing over time

20
Q

What are lags and leads in time series regressions?

A
  • Lags: Variables in the previous year
  • Leads: Variables in the next year

This sometimes allows us to avoid reverse causality (as today's Y cannot affect past X)
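
Building lagged and lead variables amounts to shifting a series by one or more periods. A minimal sketch with a made-up annual series; the function names `lag` and `lead` are illustrative, not from any particular library.

```python
def lag(series, k=1):
    """Value from k periods earlier; None where no earlier value exists."""
    if k <= 0:
        return list(series)
    return [None] * k + list(series[:-k])


def lead(series, k=1):
    """Value from k periods later; None where no later value exists."""
    if k <= 0:
        return list(series)
    return list(series[k:]) + [None] * k


# Hypothetical annual observations of x
years = [2000, 2001, 2002, 2003]
x = [5.0, 6.0, 7.5, 8.0]

x_lag1 = lag(x)    # last year's x, aligned with this year's row
x_lead1 = lead(x)  # next year's x, aligned with this year's row

for row in zip(years, x, x_lag1, x_lead1):
    print(row)
```

Regressing this year's y on `x_lag1` uses only past values of x, which is the sense in which lags can help with reverse causality. Rows containing `None` would be dropped before estimation.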

21
Q

What is a problem with time series regressions?

A

Spurious regressions:

  • If the values of two variables both generally increase over time, we cannot be certain whether changes in x cause changes in y
  • Both may be functions of time, which leads to a spurious regression
  • We need series that are stationary over time
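
A rough sketch of the problem, using two made-up series that share nothing except an upward trend. The wiggles are deterministic sine and cosine terms standing in for noise so that the example is reproducible; first-differencing is used here as one common way to remove a trend, not as the only route to stationarity.

```python
import math


def mean(v):
    return sum(v) / len(v)


def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)


def diff(v):
    """First differences: change from one period to the next."""
    return [q - p for p, q in zip(v, v[1:])]


# Two hypothetical series that both trend upward but are otherwise unrelated.
periods = range(100)
x = [1.0 * t + math.sin(t) for t in periods]
y = [0.5 * t + math.cos(1.7 * t) for t in periods]

r_levels = pearson_r(x, y)               # very high: driven by the shared trend
r_changes = pearson_r(diff(x), diff(y))  # close to zero once the trend is removed
print(r_levels, r_changes)
```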

22
Q

What is a panel regression?

A

Uses both cross-sectional and time series variation since panel data is data on multiple entities across multiple time periods

The panel variable identifies the unit; the time variable identifies the period

23
Q

What is the purpose of fixed effect regressions?

A

To control for all unit-level variables that do not change over time (time-invariant)

24
Q

What are time fixed effects used for in panel regressions? How are they implemented?

A

They control for variables related to time that are common across all units. They are implemented by adding a dummy variable for each time period

24
How do fixed effects regressors work?
Add a dummy variable for each unit in the regression. This removes unit differences in the average level of the dependent variable, so the regression retains only the changes within units over time. The unit-specific term captures time-invariant factors specific to each unit
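
A small sketch of the idea, using the equivalent "within" form: subtracting each unit's own mean from x and y gives the same slope as including a dummy for every unit. The three units, their intercepts and the true slope of -1 are all made up; the unit intercepts are deliberately correlated with x so that the pooled regression is misleading.

```python
def mean(v):
    return sum(v) / len(v)


def ols_slope(x, y):
    """Slope from regressing y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    return sxy / sxx


# Hypothetical panel: three units, three periods each.
TRUE_SLOPE = -1.0
unit_intercepts = {"A": 0.0, "B": 10.0, "C": 20.0}  # time-invariant unit effects
unit_x = {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0], "C": [7.0, 8.0, 9.0]}

xs, ys, xs_within, ys_within = [], [], [], []
for unit, x_vals in unit_x.items():
    y_vals = [unit_intercepts[unit] + TRUE_SLOPE * x for x in x_vals]
    xs += x_vals
    ys += y_vals
    # Within transformation: remove each unit's own average level.
    xs_within += [x - mean(x_vals) for x in x_vals]
    ys_within += [y - mean(y_vals) for y in y_vals]

pooled = ols_slope(xs, ys)                # ignores unit effects: wrong sign here
within = ols_slope(xs_within, ys_within)  # unit fixed effects: recovers -1
print(pooled, within)
```

In this constructed example the pooled slope is positive because units with higher x also have higher intercepts, while the within slope recovers the true negative relationship.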
25
What are the pros of panel regression?
  • Increases sample size
  • Allows for more variation in regression
  • Can control for omitted unobserved variables
26
What are the cons of panel regression?
  • Not all omitted variables are captured by fixed effects
  • Unit fixed effects do not account for time-varying variables
  • Time fixed effects do not account for unit-varying variables
  • Interactions: Rs may change over time
  • Parameter stability (coefficients should be the same across all subgroups)
  • Need a balanced panel
27
What is required for a balanced panel?
Same units for the same time periods with minimal missing values
28
What did Berger (2018) study in relation to slavery?
Whether slavery was a cause of low upward mobility in the United States today
29
What is the estimated relationship between historical slavery and upward mobility?
Stark negative correlation
30
What is the significance of control variables in the study by Berger (2018)?
Controls varying by commuting zone
31
What are the strengths of panel regressions compared to other types of data?
Allow us to control for a wide range of unobserved variables