Descriptive Analysis and Linear Regression Flashcards
Linear Regression Model
Yi = B1 + B2X2i + ... + BkXki + ui
Yi = dependent variable
Xi = explanatory/independent/regressor
B1 = intercept/constant (average value of Y when X = 0)
B2 = slope coefficient
ui
stochastic error term
average effect of all unobserved variables
objective of regression analysis
estimate values of Bs based on sample data
OLS
Ordinary Least Squares - used to estimate regression coefficients
finds the values of B1 and B2 (estimated as b1 and b2) that minimise the RSS
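A minimal Python sketch of this idea, using simulated data (variable names and the numpy dependency are illustrative assumptions, not part of the flashcards):

```python
import numpy as np

# Simulated data for illustration: true model Y = 2 + 0.5*X + u
rng = np.random.default_rng(0)
n = 100
x = rng.uniform(0, 10, n)
y = 2 + 0.5 * x + rng.normal(0, 1, n)

# Two-variable OLS formulas: b2 = cov(X, Y) / var(X), b1 = ybar - b2 * xbar
b2 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b1 = y.mean() - b2 * x.mean()

# These estimates minimise the residual sum of squares (RSS)
rss = np.sum((y - b1 - b2 * x) ** 2)
print(b1, b2, rss)
```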
OLS assumptions
- LRM is linear in its parameters
- regressors = fixed/non-stochastic
- exogeneity - expected value of error term = 0 given values of X
- homoscedasticity - constant variance of each u given values of X
- no multicollinearity - no linear relationship between regressors
- u follows normal distribution
OLS estimators are BLUE
best linear unbiased estimators
- estimators are linear functions of Y
- on average they are equal to the true parameter values i.e. unbiased
- they have minimum variance i.e. efficient
standard deviation of error term =
standard error
= sqrt(RSS/df) (the estimated error variance is RSS/df)
n-k
degrees of freedom
n = sample size
k = no. of regressors
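A short sketch computing the standard error of the regression from RSS and the degrees of freedom (statsmodels and the simulated data are assumptions for illustration):

```python
import numpy as np
import statsmodels.api as sm

# Simulated data (illustrative); k = 2 coefficients incl. the constant
rng = np.random.default_rng(1)
n = 100
x = rng.uniform(0, 10, n)
y = 2 + 0.5 * x + rng.normal(0, 1, n)

res = sm.OLS(y, sm.add_constant(x)).fit()
rss = res.ssr                    # residual sum of squares
df = res.df_resid                # degrees of freedom = n - k
sigma_hat = np.sqrt(rss / df)    # standard error of the regression
print(df, sigma_hat)
```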
hypothesis testing
construct H0 and Ha, e.g. H0: B2 = 0 and Ha: B2 ≠ 0
t = b2/se(b2)
if |t| > critical value from the t-table
reject null
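A hedged sketch of the t-test on a slope coefficient, again on simulated data (scipy/statsmodels assumed):

```python
import statsmodels.api as sm
import numpy as np
from scipy import stats

# Simulated data (illustrative): test H0: B2 = 0 against Ha: B2 != 0
rng = np.random.default_rng(2)
n = 100
x = rng.uniform(0, 10, n)
y = 2 + 0.5 * x + rng.normal(0, 1, n)

res = sm.OLS(y, sm.add_constant(x)).fit()
b2, se_b2 = res.params[1], res.bse[1]

t_stat = b2 / se_b2                                  # t = b2 / se(b2)
cv = stats.t.ppf(0.975, df=res.df_resid)             # two-sided 5% critical value
p_value = 2 * stats.t.sf(abs(t_stat), df=res.df_resid)

print(t_stat, cv, p_value)   # reject H0 if |t| > cv (equivalently, p < 0.05)
```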
type 1 error
incorrect rejection of true null
detecting an effect that is not present
type 2 error
failure to reject false null
failing to detect present effect
low p-value
suggests that the estimated coefficient is statistically significant
p-value < 0.01, 0.05, 0.1
statistically significant at 1%, 5%, 10% levels
dummy variables
0 = absence 1 = presence
e.g 1 if female, 0 if male
B2 measures the change in the dependent variable when going from male to female
b1 = estimated wage for men
b2 = estimated diff btw men and women
b1+b2 = estimated wage for women
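A minimal sketch of the wage/gender dummy example with simulated data (the numbers and variable names are made up for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated wage data (illustrative): female is a 0/1 dummy
rng = np.random.default_rng(3)
n = 200
female = rng.integers(0, 2, n)
wage = 20 - 3 * female + rng.normal(0, 2, n)
df = pd.DataFrame({"wage": wage, "female": female})

res = smf.ols("wage ~ female", data=df).fit()
b1, b2 = res.params["Intercept"], res.params["female"]
print(b1)        # estimated average wage for men (female = 0)
print(b2)        # estimated difference between women and men
print(b1 + b2)   # estimated average wage for women (female = 1)
```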
if exogeneity assumption doesn’t hold
leads to biased estimates, therefore we need to adjust for omitted variables
quadratic terms
capture increasing/decreasing marginal effects
have to generate a new variable and add it to regression
marginal effect
first derivative of the regression function wrt the variable of interest
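A sketch of a quadratic term and its marginal effect, assuming a simulated wage/experience example (names and numbers are illustrative):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data (illustrative): diminishing marginal effect of experience on wage
rng = np.random.default_rng(4)
n = 300
exper = rng.uniform(0, 40, n)
wage = 10 + 0.8 * exper - 0.01 * exper**2 + rng.normal(0, 2, n)

# Generate the new squared variable and add it to the regression
df = pd.DataFrame({"wage": wage, "exper": exper, "exper2": exper**2})
res = smf.ols("wage ~ exper + exper2", data=df).fit()
b2, b3 = res.params["exper"], res.params["exper2"]

# Marginal effect = d(wage)/d(exper) = b2 + 2*b3*exper, evaluated at exper = 10
print(b2 + 2 * b3 * 10)
```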
interaction variable
constructed by multiplying two regressors
allows the magnitude of the effect X has on Y to vary depending on the level of another X
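A sketch of an interaction between a continuous regressor and a dummy, on simulated data (the education/gender setup is an illustrative assumption):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data (illustrative): does the return to education differ by gender?
rng = np.random.default_rng(5)
n = 300
educ = rng.uniform(8, 20, n)
female = rng.integers(0, 2, n)
wage = 5 + 1.5 * educ - 2 * female - 0.5 * educ * female + rng.normal(0, 2, n)
df = pd.DataFrame({"wage": wage, "educ": educ, "female": female})

# The interaction variable is the product of the two regressors: educ * female
res = smf.ols("wage ~ educ + female + educ:female", data=df).fit()
b_educ = res.params["educ"]
b_int = res.params["educ:female"]

print(b_educ)          # effect of one more year of education for men
print(b_educ + b_int)  # effect of one more year of education for women
```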
interpreting
how does the regression function respond to a change in a variable
if the model is not linear (log-log)
transform into a log-log model so that it is linear in parameters
take logs of both sides and add an error term
log-lin model
dependent variable in logs – %
explanatory variables in levels – units
B2 measures relative change in output Q for an absolute change in input
lin-log model
estimates % growth in dependent variable for an absolute change in explanatory variable
lin-lin model
using a linear production function
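A sketch comparing the four functional forms on a simulated production example (output Q, input L; the data-generating process is an assumption for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated production data (illustrative): Q = 5 * L^0.7 * multiplicative error
rng = np.random.default_rng(6)
n = 200
L = rng.uniform(1, 100, n)
Q = 5 * L**0.7 * np.exp(rng.normal(0, 0.1, n))
df = pd.DataFrame({"Q": Q, "L": L, "lnQ": np.log(Q), "lnL": np.log(L)})

loglog = smf.ols("lnQ ~ lnL", data=df).fit()  # B2 = elasticity: % change in Q for a 1% change in L
loglin = smf.ols("lnQ ~ L", data=df).fit()    # B2 = relative change in Q for a unit change in L
linlog = smf.ols("Q ~ lnL", data=df).fit()    # B2/100 = absolute change in Q for a 1% change in L
linlin = smf.ols("Q ~ L", data=df).fit()      # B2 = absolute change in Q for a unit change in L

print(loglog.params["lnL"])   # should be close to the true elasticity of 0.7
```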
testing for linear combinations
se – t-stat – compare to critical value – create p-value – reject/don’t reject null
TSS
total sum of squares = ESS + RSS
sum of squared deviations from the sample mean = how well we could predict the outcome w/o any regressors
ESS
explained sum of squares = how much of that variation do our regressors predict
RSS
residual sum of squares = outcome variation that regressors don’t explain
R^2
ESS/TSS
overall measure of goodness-of-fit of the estimated regression line
how much of variation is explained by regressors
increases when you add more regressors
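A short sketch of the TSS = ESS + RSS decomposition and R^2, on simulated data (statsmodels attribute names are the assumed tooling, not part of the cards):

```python
import numpy as np
import statsmodels.api as sm

# Simulated data (illustrative)
rng = np.random.default_rng(7)
n = 100
x = rng.uniform(0, 10, n)
y = 2 + 0.5 * x + rng.normal(0, 1, n)

res = sm.OLS(y, sm.add_constant(x)).fit()

tss = res.centered_tss   # total sum of squares
ess = res.ess            # explained sum of squares
rss = res.ssr            # residual sum of squares

print(np.isclose(tss, ess + rss))   # TSS = ESS + RSS
print(ess / tss, res.rsquared)      # R^2 = ESS/TSS
```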
F-stat
tests the joint significance of all slope coefficients
(ESS/(k-1)) / (RSS/(n-k))
>critical value =reject null
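A sketch of the F-test computed by hand and checked against the packaged value (simulated data; scipy/statsmodels assumed):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

# Simulated data (illustrative) with two slope regressors, so k = 3 incl. the constant
rng = np.random.default_rng(8)
n = 100
X = rng.uniform(0, 10, size=(n, 2))
y = 1 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(0, 1, n)

res = sm.OLS(y, sm.add_constant(X)).fit()
k = res.df_model + 1                              # number of estimated coefficients

F = (res.ess / (k - 1)) / (res.ssr / (n - k))     # (ESS/(k-1)) / (RSS/(n-k))
cv = stats.f.ppf(0.95, dfn=k - 1, dfd=n - k)      # 5% critical value

print(F, res.fvalue)   # matches statsmodels' F-statistic
print(F > cv)          # True -> reject H0 that all slope coefficients are zero
```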
dummy variable trap
a situation of perfect multicollinearity
to distinguish btw m categories we can only have m-1 dummies
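A minimal sketch of keeping m-1 dummies (pandas is an assumed tool; the category names are made up):

```python
import pandas as pd

# Illustrative data: a categorical variable with m = 3 categories
df = pd.DataFrame({"region": ["north", "south", "west", "south", "north", "west"]})

# drop_first=True keeps m - 1 = 2 dummies; including all 3 dummies plus a
# constant would create perfect multicollinearity (the dummy variable trap)
dummies = pd.get_dummies(df["region"], drop_first=True)
print(dummies.head())
```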
perfect collinearity
perfect linear relationship between two or more regressors
one regressor can be exactly predicted from the others
imperfect collinearity
one explanatory variable approximately equals a linear combination of the other explanatory variables plus a small error term
consequences of multicollinearity in the data
larger standard errors – smaller t-ratio – wider CI – less likely to reject null
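A simulated illustration of the standard-error inflation (the data-generating process is an assumption chosen to make the point visible):

```python
import numpy as np
import statsmodels.api as sm

# Simulated illustration: highly correlated regressors inflate standard errors
rng = np.random.default_rng(9)
n = 200

def se_of_b2(noise_sd):
    x1 = rng.normal(0, 1, n)
    x2 = x1 + rng.normal(0, noise_sd, n)   # smaller sd -> stronger collinearity
    y = 1 + 0.5 * x1 + 0.5 * x2 + rng.normal(0, 1, n)
    X = sm.add_constant(np.column_stack([x1, x2]))
    return sm.OLS(y, X).fit().bse[1]       # standard error on x1's coefficient

print(se_of_b2(1.0))    # weak collinearity: smaller standard error
print(se_of_b2(0.05))   # near-perfect collinearity: much larger standard error
```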
homoscedasticity
assumption that the error term has the same variance for all observations (doesn't always hold)
heteroscedasticity
error terms have unequal variances for different observations
consequences of heteroscedasticity
- OLS still consistent and unbiased
- se either too large or too small so t-stats, F-stats, p-values etc will be wrong
- OLS no longer efficient
dealing with heteroscedasticity
- use log transformation
- keep using OLS and compute heteroscedasticity-robust standard errors
- weighted least squares
using a logarithmic transformation of the outcome variable
e.g. ln(wage) - these variables tend to have more variance at higher values
continuing to use OLS and computing heteroscedasticity-robust standard errors
regress y on x
corrects the se to allow for heteroscedasticity
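A sketch of robust standard errors on simulated heteroscedastic data (the HC1 covariance choice and the data are illustrative assumptions):

```python
import numpy as np
import statsmodels.api as sm

# Simulated heteroscedastic data (illustrative): error variance grows with x
rng = np.random.default_rng(10)
n = 200
x = rng.uniform(1, 10, n)
y = 2 + 0.5 * x + rng.normal(0, 0.3 * x, n)
X = sm.add_constant(x)

ols = sm.OLS(y, X).fit()                    # conventional standard errors
robust = sm.OLS(y, X).fit(cov_type="HC1")   # heteroscedasticity-robust standard errors

print(ols.bse[1], robust.bse[1])   # same coefficients, corrected standard errors
```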
weighted least squares
more efficient than OLS in the presence of heteroscedasticity
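A sketch of weighted least squares, assuming the error variance is known up to scale (a strong illustrative assumption):

```python
import numpy as np
import statsmodels.api as sm

# Simulated heteroscedastic data (illustrative): var(u|x) proportional to x^2
rng = np.random.default_rng(11)
n = 200
x = rng.uniform(1, 10, n)
y = 2 + 0.5 * x + rng.normal(0, 0.3 * x, n)
X = sm.add_constant(x)

# Weight each observation by the inverse of its (assumed) error variance: 1/x^2
wls = sm.WLS(y, X, weights=1 / x**2).fit()
ols = sm.OLS(y, X).fit()

print(ols.bse[1], wls.bse[1])   # WLS is more efficient when the weights are right
```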
omission of relevant variables
they’ll be captured by the error term
if they are correlated with the included regressors, then the parameter estimates are biased and the exogeneity assumption doesn't hold
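A simulated illustration of omitted variable bias (the education/ability setup and all numbers are assumptions for the sketch):

```python
import numpy as np
import statsmodels.api as sm

# Simulated illustration: omitting a regressor that is correlated with an included one
rng = np.random.default_rng(12)
n = 500
ability = rng.normal(0, 1, n)
educ = 12 + 2 * ability + rng.normal(0, 1, n)        # educ correlated with ability
wage = 10 + 1.0 * educ + 3.0 * ability + rng.normal(0, 1, n)

full = sm.OLS(wage, sm.add_constant(np.column_stack([educ, ability]))).fit()
short = sm.OLS(wage, sm.add_constant(educ)).fit()    # ability omitted -> absorbed by the error term

print(full.params[1])    # close to the true effect of 1.0
print(short.params[1])   # biased upward because ability is correlated with educ
```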