Linear Regression Flashcards
Interpret the constant & coefficient
constant = when education equals 0, the predicted mean income is 457
coefficient = with every additional year of education, the mean income increases by 104
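A minimal sketch on synthetic data (the card's 457 and 104 are used as the assumed true intercept and slope) showing how an OLS fit recovers the constant and coefficient; np.polyfit is used here for brevity:

```python
# Fit income ~ education with OLS and read off the constant (intercept)
# and coefficient (slope). Data are made up; 457/104 come from the card.
import numpy as np

rng = np.random.default_rng(0)
education = rng.uniform(8, 18, size=200)                        # years of education
income = 457 + 104 * education + rng.normal(0, 300, size=200)   # assumed true model

slope, intercept = np.polyfit(education, income, deg=1)
print(f"constant    = {intercept:.0f}  (mean income at 0 years of education)")
print(f"coefficient = {slope:.0f}  (change in mean income per extra year)")
```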
What is a prediction error?
The difference between the observed and the predicted value (the residual): y − ŷ
Linear regression assumptions:
- Linear relationship
- Multivariate normality (the residuals should be normally distributed -> when they are not, a non-linear transformation, e.g. a log-transformation, might fix the issue)
- No or little multicollinearity (if multicollinearity is found in the data, centering the variables, that is, subtracting the mean of the variable from each score, might help. The simplest fix, however, is to remove independent variables with high VIF values)
- No autocorrelation (autocorrelation occurs when the residuals are not independent of each other. This typically occurs in stock prices, where each price is not independent of the previous price)
- Homoscedasticity (a scatter plot is a good way to check whether the data are homoscedastic, meaning the residual variance is equal across the regression line)
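A minimal sketch, assuming synthetic data and placeholder names x1/x2, of two quick checks from the list above: a hand-rolled VIF for multicollinearity and a crude residual check for homoscedasticity:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)   # deliberately correlated with x1
y = 2 + 1.5 * x1 - 0.5 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

# VIF for x1: regress x1 on the other predictor(s); VIF = 1 / (1 - R^2)
X_other = np.column_stack([np.ones(n), x2])
b, *_ = np.linalg.lstsq(X_other, x1, rcond=None)
r2 = 1 - np.sum((x1 - X_other @ b) ** 2) / np.sum((x1 - x1.mean()) ** 2)
print(f"VIF(x1) = {1 / (1 - r2):.2f}   (rule of thumb: > 5-10 is worrying)")

# Homoscedasticity: residual spread should look constant across fitted values
fitted = X @ beta
print("corr(|residuals|, fitted) =",
      round(float(np.corrcoef(np.abs(residuals), fitted)[0, 1]), 3))
```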
How does OLS work?
It picks the intercept and slope(s) that minimize the sum of squared residuals.
What is R²?
The share of the variance in y that the model explains.
How is R² calculated?
R² = 1 − SS_res / SS_tot (residual sum of squares over total sum of squares); see the sketch below.
What is the loss function for linear regression?
The sum of squared errors (SSE), i.e. the sum of squared residuals.
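A minimal sketch on synthetic data computing the SSE loss and the R² formula above by hand:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=100)
y = 3 + 2 * x + rng.normal(size=100)

slope, intercept = np.polyfit(x, y, deg=1)
y_hat = intercept + slope * x

sse = np.sum((y - y_hat) ** 2)          # the loss OLS minimizes
ss_tot = np.sum((y - y.mean()) ** 2)    # total variation around the mean of y
r_squared = 1 - sse / ss_tot
print(f"SSE = {sse:.1f}, R² = {r_squared:.3f}")
```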
What does a lin regression model predict?
The mean value of y for a given value of x
(No probabilities, it’s a model of the mean)
How can we make a constant more meaningful? (1)
centering: usually mean-centered (subtract the mean, here 12.5 years, from years of education)
How can we make a constant more meaningful? (2)
standardizing: subtract the mean, then divide by the SD
1) For every 1-SD increase in education, the mean income rises by 402
2) For every 1-SD increase in education, the mean income rises by 0.3 SD of income
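A minimal sketch on synthetic data (the 12.5-year mean and the card's 457/104 numbers are assumptions) showing how centering and standardizing change what the constant and coefficient mean:

```python
import numpy as np

rng = np.random.default_rng(3)
education = rng.normal(12.5, 2.5, size=200)                     # assumed mean ~12.5 years
income = 457 + 104 * education + rng.normal(0, 300, size=200)

# Raw fit: constant = mean income at 0 years of education (often meaningless)
print(np.polyfit(education, income, 1))

# Centered fit: constant = mean income at *average* education
edu_centered = education - education.mean()
print(np.polyfit(edu_centered, income, 1))

# Fully standardized fit: coefficient = SDs of income per SD of education (a "beta")
edu_z = (education - education.mean()) / education.std()
inc_z = (income - income.mean()) / income.std()
print(np.polyfit(edu_z, inc_z, 1))
```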
Why would you want to standardize?
It allows comparison of coefficients across variables with different scales.
What are the standardized coefficients also called?
Beta coefficients (betas).
What is true about correlations?
1) Standardizing gets rid of the scale -> that is the whole point
2) Perfect correlation = 0 error
3) Correlation is just a measure of linear relationships! A perfect but non-linear relationship can still yield r ≈ 0 (see the sketch below).
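A minimal sketch illustrating point 3: a relationship that is perfectly deterministic but non-linear can still produce a Pearson r near 0:

```python
import numpy as np

x = np.linspace(-3, 3, 200)
y = x ** 2                      # perfectly determined by x, but not linearly

r = np.corrcoef(x, y)[0, 1]
print(f"Pearson r for y = x²: {r:.3f}")   # ~0: correlation misses the relationship
```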
Why would we even need regression? Why not just calculate the conditional means?
1) reduces noise -> the virtue of abstraction
2) prediction even for x-values that are not in the data (demonstrated in the sketch below)
3) allows for more control, e.g. mediation, moderation, control variables, etc.
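A minimal sketch on synthetic education/income data contrasting the noisy per-value conditional means with the single smooth regression rule, which also predicts at unobserved x:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.integers(8, 18, size=120).astype(float)   # years of education
y = 457 + 104 * x + rng.normal(0, 400, size=120)

# Conditional means: one (noisy) mean per observed x-value
for val in sorted(set(x)):
    print(f"x = {val:.0f}: conditional mean = {y[x == val].mean():.0f}")

# Regression: one smooth rule, works for any x (e.g. x = 13.5 was never observed)
slope, intercept = np.polyfit(x, y, 1)
print(f"regression prediction at x = 13.5: {intercept + slope * 13.5:.0f}")
```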
Why do we square residuals (in the loss and in R²)?
1) prevents positive and negative residuals from cancelling out (demonstrated in the sketch below)
2) gives a bigger penalty to large residuals
- no universal R² threshold will do
- similar rationale as with the alpha value
- with highly noisy data, as in the social sciences, how could we possibly achieve a high R²? And should we even want to?
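A minimal sketch on synthetic data showing point 1: with an intercept in the model, the raw OLS residuals sum to essentially zero, so only the squared sum measures total error:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, size=100)
y = 1 + 2 * x + rng.normal(size=100)

slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

print(f"sum of residuals         = {residuals.sum():.6f}   (cancels out)")
print(f"sum of squared residuals = {(residuals ** 2).sum():.2f}  (penalizes large errors more)")
```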
Is my R² too low?
A low R² is often good, BUT it is also a limitation.
Is my R² too high?
A high R² is often not good, BUT it can be.
Why a low R² is often good