Linear Regression Flashcards

1
Q

Interpret the constant and the coefficient

A

constant = when education equals 0, the predicted mean income is 457
coefficient = with every additional year of education, the mean income increases by 104
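
A minimal sketch of this reading, assuming made-up data constructed so that income is exactly 457 + 104 * education (the helper `fit_line` and the data are hypothetical, not from the course material):

```python
def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (constant, coefficient)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data built to reproduce the numbers on the card.
education = [8, 10, 12, 14, 16]
income = [457 + 104 * e for e in education]

constant, coefficient = fit_line(education, income)

# The constant is the prediction at education = 0.
predicted_at_0 = constant + coefficient * 0
# The coefficient is the change in the prediction per additional year.
one_year_difference = (constant + coefficient * 13) - (constant + coefficient * 12)
```

Note that education = 0 lies outside the hypothetical data range here, so the constant is an extrapolation.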

2
Q

What is a prediction error?

A
3
Q

Linear regression assumptions:

A
  1. Linear relationship
  2. Multivariate normality (strictly, it is the residuals rather than the variables themselves that should be approximately normally distributed –> when they are not, a non-linear transformation, e.g., a log-transformation, might fix the issue)
  3. No or little multicollinearity (if multicollinearity is found in the data, centering the data, that is, subtracting the mean of the variable from each score, might help to solve the problem. However, the simplest way to address the problem is to remove independent variables with high VIF values)
  4. No auto-correlation (autocorrelation occurs when the residuals are not independent of each other. For instance, this typically occurs in stock prices, where the price is not independent of the previous price)
  5. Homoscedasticity (a scatter plot of the residuals is a good way to check whether the data are homoscedastic, meaning the variance of the residuals is constant along the regression line)
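
A small sketch of the VIF and centering remark in point 3. It assumes the collinearity comes from including a variable together with its own square, which is the typical case where centering helps; the data and helper names are hypothetical. With exactly two predictors, the VIF reduces to 1 / (1 - r^2).

```python
def pearson_r(a, b):
    """Pearson correlation between two equally long lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    saa = sum((ai - mean_a) ** 2 for ai in a)
    sbb = sum((bi - mean_b) ** 2 for bi in b)
    return sab / (saa * sbb) ** 0.5

def vif_two_predictors(a, b):
    """With exactly two predictors, both have VIF = 1 / (1 - r^2)."""
    return 1 / (1 - pearson_r(a, b) ** 2)

# Hypothetical predictor and its square (e.g. age and age squared).
x = list(range(1, 11))
x_sq = [v ** 2 for v in x]

# Center first, then square.
mean_x = sum(x) / len(x)
xc = [v - mean_x for v in x]
xc_sq = [v ** 2 for v in xc]

vif_raw = vif_two_predictors(x, x_sq)          # far above the common cutoff of 10
vif_centered = vif_two_predictors(xc, xc_sq)   # about 1 for this symmetric x
```

Centering does not reduce collinearity between two genuinely different predictors that are correlated; in that case dropping or combining variables is the more usual remedy.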
4
Q

How does OLS work?

A
5
Q

What is r^2?

A
6
Q

How is R^2 calculated?

A
7
Q

What is the loss function for linear regression?

A
8
Q

What does a linear regression model predict?

A

The mean value of y for a given value of x
(No probabilities, it’s a model of the mean)

9
Q

How can we make a constant more meaningful? (1)

A

centering: usually mean-centered (e.g., subtract 12.5 years from years of education)
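
A minimal sketch of what centering does, assuming hypothetical data whose mean education is 12.5 years (the helper `fit_line` and all values are made up for illustration). The coefficient stays the same; only the constant changes, and it becomes the predicted income at mean education.

```python
def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (constant, coefficient)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data; mean education is 12.5 years.
education = [9, 11, 12, 13, 14, 16]
income = [1400, 1550, 1750, 1800, 1950, 2100]

mean_education = sum(education) / len(education)
mean_income = sum(income) / len(income)
education_centered = [e - mean_education for e in education]

constant_raw, slope_raw = fit_line(education, income)
constant_centered, slope_centered = fit_line(education_centered, income)
```

Here `constant_raw` is an extrapolation to 0 years of education, while `constant_centered` equals the mean income of the sample.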

10
Q

How can we make a constant more meaningful? (2)

A

standardizing: subtract the mean and divide by the SD

11
Q
A

1) For every 1-SD increase in education, mean income rises by 402
2) For every 1-SD increase in education, mean income rises by 0.3 SD of income
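
The two readings above correspond to standardizing only education versus standardizing both variables. A sketch with hypothetical data (it does not reproduce the 402 and 0.3 from the card; all helper names are made up):

```python
def mean(v):
    return sum(v) / len(v)

def sd(v):
    """Sample standard deviation."""
    m = mean(v)
    return (sum((vi - m) ** 2 for vi in v) / (len(v) - 1)) ** 0.5

def zscores(v):
    """Standardize: subtract the mean and divide by the SD."""
    m, s = mean(v), sd(v)
    return [(vi - m) / s for vi in v]

def slope(x, y):
    """OLS coefficient of y on a single predictor x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical data.
education = [9, 11, 12, 13, 14, 16]
income = [1400, 1550, 1750, 1800, 1950, 2100]

b_raw = slope(education, income)                       # income units per year
b_x_std = slope(zscores(education), income)            # income units per 1 SD of education
b_both_std = slope(zscores(education), zscores(income))  # SDs of income per 1 SD of education
r = pearson_r(education, income)
```

With a single predictor, the fully standardized coefficient equals the Pearson correlation, which is why it is free of the original scales.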

12
Q

Why would you want to standardize?

A

Allows comparison

13
Q

What are the standardized coefficients also called?

A
14
Q

What is true about correlations?

A

1) Standardizing gets rid of scale –> whole point
3) Perfect correlation = 0 error
4) A low correlation may just mean the relationship is not linear -> correlation is only a measure of linear relationships!
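
A short sketch of points 3) and 4), using hypothetical data: a perfectly linear relationship gives a correlation of 1, while a perfect but purely quadratic relationship on a symmetric range gives a correlation of 0.

```python
def pearson_r(a, b):
    """Pearson correlation between two equally long lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    saa = sum((ai - mean_a) ** 2 for ai in a)
    sbb = sum((bi - mean_b) ** 2 for bi in b)
    return sab / (saa * sbb) ** 0.5

x = [-3, -2, -1, 0, 1, 2, 3]
y_linear = [2 * v + 1 for v in x]   # exact straight line, no error
y_curved = [v ** 2 for v in x]      # exact, but not linear

r_linear = pearson_r(x, y_linear)
r_curved = pearson_r(x, y_curved)
```

In the second case y is fully determined by x, yet the correlation does not pick it up because there is no linear trend.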

15
Q

Why would we even need a regression, why not only calculate the conditional means?

A

1) reduce noise -> virtue of abstraction
2) prediction even for data that is not there
3) allows for more control i.e. mediation, moderation, controls, etc.

16
Q

Why do we square residuals in r^2?

A

1) prevent cancelling out
2) bigger penalty for large residuals
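
A worked sketch of both points on a small hypothetical data set: the raw OLS residuals sum to zero, so they cannot measure fit on their own, whereas the squared residuals give R^2 = 1 - SSR / SST.

```python
# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x

residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

sum_of_residuals = sum(residuals)                   # positive and negative errors cancel
ss_residual = sum(r ** 2 for r in residuals)        # squared: no cancelling
ss_total = sum((yi - mean_y) ** 2 for yi in y)      # variation around the mean of y
r_squared = 1 - ss_residual / ss_total

# Squaring penalizes large errors more: a residual twice as large counts four times.
penalty_ratio = (2 * 0.5) ** 2 / 0.5 ** 2
```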

17
Q
A
  • no threshold will do
  • similar rationale with alpha value
  • with highly noisy data in SS, how could we possibly achieve it? Or should we even want to?
18
Q

Is my R² too low?

A

Low R-Squared is often good BUT also a limitation

19
Q

Is my R² too high?

A

High R-Squared is often not good
BUT can be

20
Q

Why a Low R-Squared is often good

A
21
Q

Why a Low R-Squared is also a limitation

A
22
Q

Why a High R-Squared is often not good

A
23
Q

Why a High R-Squared can be good

A

Very accurate prediction if the model really captures the relationship

24
Q

What do we need to control for?

A
25
Q

How can parents’ SES influence education –> income?

A

Confounding as well as Mediation, interlaced

26
Q

Is income achieved or inherited?

A

The FISEI coefficient is reduced much more than the education coefficient –> income is rather achieved

27
Q

When do we need to put in a control variable?

A

1) not a good idea –> kitchen sink approach leads to overfitting (unless you want a really good in-sample prediction)
2) not enough - not clear in which direction the correlation works

28
Q

What is the collider bias?

A

A third variable that is independently caused by both an exposure and an outcome is a ‘collider’. Inappropriately controlling for a collider variable can induce a distorted association between the exposure and the outcome when in fact none exists.
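
A simulation sketch of this mechanism. All variables are hypothetical standard-normal draws; `exposure` and `outcome` are generated independently and `collider` is caused by both. The two-predictor coefficient is computed from covariances, which is equivalent to OLS with an intercept.

```python
import random

def cov(a, b):
    """Sample covariance."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)

def slope_simple(x, y):
    """Coefficient of x in a regression of y on x alone."""
    return cov(x, y) / cov(x, x)

def slope_controlled(x, z, y):
    """Coefficient of x in an OLS regression of y on x and z."""
    det = cov(x, x) * cov(z, z) - cov(x, z) ** 2
    return (cov(x, y) * cov(z, z) - cov(z, y) * cov(x, z)) / det

random.seed(0)
n = 5000
exposure = [random.gauss(0, 1) for _ in range(n)]
outcome = [random.gauss(0, 1) for _ in range(n)]  # independent of exposure
collider = [e + o + random.gauss(0, 1) for e, o in zip(exposure, outcome)]

b_without_collider = slope_simple(exposure, outcome)          # close to 0
b_with_collider = slope_controlled(exposure, collider, outcome)  # clearly negative
```

In this setup the induced coefficient is around -0.5 in expectation, although no effect of the exposure on the outcome was built into the data.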

29
Q
A
30
Q

What is happening here?

A

Expectation: the coefficient of a predictor will get smaller when other variables are added to the regression model BUT sometimes a coefficient gets larger when other variables are added –> special case of confounding: suppression

Usually occurs when there’s an inconsistency of signs:
- younger people more education (edu expansion)
- older people more income (curvilinear rel)
–> Age neg rel to edu BUT pos rel to income
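
A simulation sketch of suppression under assumed, made-up effect sizes: age lowers education but raises income, so leaving age out hides part of the education effect. The true education coefficient in this hypothetical setup is 100.

```python
import random

def cov(a, b):
    """Sample covariance."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)

def slope_simple(x, y):
    """Coefficient of x in a regression of y on x alone."""
    return cov(x, y) / cov(x, x)

def slope_controlled(x, z, y):
    """Coefficient of x in an OLS regression of y on x and z."""
    det = cov(x, x) * cov(z, z) - cov(x, z) ** 2
    return (cov(x, y) * cov(z, z) - cov(z, y) * cov(x, z)) / det

random.seed(1)
n = 5000
age = [random.gauss(0, 1) for _ in range(n)]                 # standardized age
education = [-0.5 * a + random.gauss(0, 1) for a in age]     # age negatively related to education
income = [100 * e + 50 * a + random.gauss(0, 20)             # age positively related to income
          for e, a in zip(education, age)]

b_education_alone = slope_simple(education, income)              # roughly 80
b_education_with_age = slope_controlled(education, age, income)  # roughly 100
```

Adding age makes the education coefficient larger, not smaller, because the omitted variable was pulling it down.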

31
Q
A
32
Q

Which of these var could be overcontrolled?

A

The first three could be mediators; gender needs to be controlled because it is a moderator

33
Q

Guidelines for selecting explanatory variables:

A
34
Q

Gender Pay Gap
- Does the argumentation make sense?

A

Conditioning on a mediator NOT a control
Why would you look at the gap after cancelling out reasons why it exists?

35
Q

What could be other mediators for gender –> income?

A
36
Q

How do you assess a mediator?

A
37
Q

How can you interpret the child effect theoretically?

A

Having a child may be related to belonging to a higher income group, as it is a costly decision
Also, people are usually older when having a child
Could also be that income is usually shared when having a child

38
Q

What does an interaction effect look like for gender –> income with having a child or not?

A

Symmetrical: the child effect differs by gender OR the gender effect differs by child status

39
Q
A

Main effects are defined for the interacting variable equalling zero
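
A sketch with a hypothetical gender-by-child model, income = b0 + b_female * female + b_child * child + b_interaction * female * child. With two dummies and their interaction, the coefficients can be read directly off the four group means; the means below are invented for illustration.

```python
# Hypothetical mean incomes per group, keyed by (female, child).
cell_means = {(0, 0): 3000, (0, 1): 3400, (1, 0): 2800, (1, 1): 2700}

b0 = cell_means[(0, 0)]                        # men without a child
b_female = cell_means[(1, 0)] - b0             # gender effect when child = 0
b_child = cell_means[(0, 1)] - b0              # child effect when female = 0
b_interaction = cell_means[(1, 1)] - b0 - b_female - b_child

def predict(female, child):
    return b0 + b_female * female + b_child * child + b_interaction * female * child

# The "main effect" of gender only holds for people without a child.
gender_gap_no_child = predict(1, 0) - predict(0, 0)   # equals b_female
gender_gap_child = predict(1, 1) - predict(0, 1)      # equals b_female + b_interaction

# Same interaction, read the other way round.
child_effect_men = predict(0, 1) - predict(0, 0)      # equals b_child
child_effect_women = predict(1, 1) - predict(1, 0)    # equals b_child + b_interaction
```

The difference between the two gender gaps and the difference between the two child effects are both equal to `b_interaction`, which is the symmetry of an interaction.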

40
Q
A
41
Q

How would this reg table look in a normal table?

A
42
Q

Why are margins larger at the beginning/end?

A

fewer people in sample with 0 or 30 years of education
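
A sketch of why this happens in a simple regression, assuming hypothetical data clustered around 12 to 13 years of education. The standard error of the predicted mean at x0 is s * sqrt(1/n + (x0 - mean_x)^2 / Sxx), so it grows with the distance from where most of the data lie.

```python
def se_of_mean_prediction(x, y, x0):
    """Standard error of the predicted mean of y at x0 (one-predictor OLS)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = (ss_residual / (n - 2)) ** 0.5   # residual standard error
    return s * (1 / n + (x0 - mean_x) ** 2 / sxx) ** 0.5

# Hypothetical data: nobody near 0 or 30 years of education.
education = [9, 11, 12, 12, 13, 13, 14, 16]
income = [1400, 1600, 1650, 1800, 1750, 1900, 1950, 2100]

mean_education = sum(education) / len(education)
se_at_mean = se_of_mean_prediction(education, income, mean_education)
se_at_0 = se_of_mean_prediction(education, income, 0)
se_at_30 = se_of_mean_prediction(education, income, 30)
```

The interval is narrowest at the mean of education and widens towards both ends of the axis.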

43
Q

Why is it different with standardizing?

A

more meaningful

44
Q
A
45
Q

Why is the effect more important than the statistical significance?

A

Can be not significant at 0 but highly significant at other ages

46
Q

Key takeaways from visualizations

A
47
Q

Three types of plots

A
  1. Coefficient plot
  2. Profile plot
  3. Conditional effect plot
48
Q

What is a coefficient plot?

A
49
Q

What is a profile plot?

A
50
Q

What is a conditional effects plot?

A

Often used for interactions