Midterm Flashcards

1
Q

Causal relationship

A

a change in one variable (action) CAUSES change in another variable (result)

2
Q

Correlation

A

X and Y move together (are statistically associated), but the relationship can partially be explained by other factors; correlation alone does not establish causation

3
Q

Error Term

A
  • Deviation of the observed Y from the true regression line
  • Represented by ε_i in the structural equation
  • A theoretical representation of unobserved factors that accounts for the variation in Y not explained by the model (omitted variables are absorbed by the error term)
4
Q

Residual

A
  • Deviation of the observed Y from the estimated line
  • Calculated as e_i = Y_i - Ŷ_i
  • Observed - Estimated
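
A minimal numpy sketch of the calculation (hypothetical numbers, not course data):

    import numpy as np

    # Hypothetical observed values and fitted values from an estimated line
    Y = np.array([3.0, 5.1, 7.2, 8.8])        # observed Y_i
    Y_hat = np.array([3.2, 5.0, 7.0, 9.0])    # estimated Ŷ_i

    residuals = Y - Y_hat                     # e_i = Y_i - Ŷ_i (observed - estimated)
    print(residuals)                          # [-0.2  0.1  0.2 -0.2]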
5
Q

R^2

A

Goodness of Fit

  • Ranges from 0 to 1
  • Closer to 1 = better fit
  • Adjusted R^2: includes a “penalty” for adding additional regressors
6
Q

null hypothesis

A

The null hypothesis states “no difference” or “no effect”

7
Q

alternative hypothesis

A

The alternative hypothesis states there is a difference/effect

8
Q

T-test

A
  • If the absolute value of the t-stat is bigger than the critical value (e.g. 1.96), we can reject the null hypothesis and accept the alternative that the true coefficient is not zero
  • The variable is statistically significant at the 5% level of significance
  • This also means the p-value is smaller than 5% (0.05)
9
Q

T-test formula

A

Divide the coefficient by the standard error to get the t-value
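
In generic notation (not tied to this course's symbols), the t-statistic for coefficient k is:

t_k = β̂_k / SE(β̂_k)

Compare |t_k| to the critical value (e.g. 1.96 at the 5% level in large samples).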

10
Q

F-test

A

Test a set of regression coefficients for joint significance

  • H0: β1 = β2 = β3= 0 (ALL coefficients = 0)
  • HA: β1 ≠ 0 OR β2 ≠ 0 OR β3 ≠ 0 (at least 1 coefficient NOT equal to 0)

F-stat > Critical Value = Reject the Null
(p-value of F lower than the level of significance)

11
Q

F-test formula

A

You want the F-stat high and its p-value low
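
One standard form of the F-statistic for overall significance (with ESS = explained sum of squares, matching this deck's R^2 card; K = number of slope coefficients, n = sample size):

F = (ESS / K) / (RSS / (n - K - 1))

Reject the null when F exceeds the critical value; equivalently, when the p-value of F is below the significance level.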

12
Q

Interpreting Coefficients:

Level-Level

A

Y = β1 X1

on average a one-unit increase in X is associated with a β1-unit increase in Y, holding all else constant

13
Q

Interpreting Coefficients:

Log-Level

A

lnY = β1 X1

on average, a one-unit increase in X is associated with approximately a (100 × β1)% change in Y, holding all else constant

14
Q

Interpreting Coefficients:

Level-Log

A

Y = β1 lnX1

on average, a 1% increase in X is associated with approximately a (β1/100)-unit change in Y, holding all else constant

15
Q

Interpreting Coefficients:

Log-Log

A

lnY = β1 lnX1

on average, a 1% increase in X is associated with a β1% increase in Y, holding all else constant
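
A worked example with a hypothetical coefficient β1 = 0.05: level-level, a one-unit increase in X is associated with a 0.05-unit increase in Y; log-level, a one-unit increase in X is associated with roughly a 5% increase in Y (100 × 0.05); level-log, a 1% increase in X is associated with roughly a 0.0005-unit increase in Y (0.05/100); log-log, a 1% increase in X is associated with roughly a 0.05% increase in Y.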

16
Q

Dummy/binary variable

A

Only has two possible values – e.g. X = 1 if female; X = 0 if male

Y = B0 + B1female

Ex: On average, being female is associated with a B1 difference in Y compared to male, holding all else constant

17
Q

Categorical Variable

A

A variable like “region” has multiple values (south, west, northeast, midwest) that should be transformed into individual dummy (0 or 1) variables

Y = B0 + B1south + B2west + B3northeast (the Midwest is the omitted reference category)

Ex: On average, living in the South is associated with a B1 change in Y compared to the Midwest, holding all else constant.

18
Q

Interaction term

A

An independent variable in a regression equation that is the product of two or more other independent variables. Each interaction term has its own regression coefficient.

Does the effect of work experience on salary differ between males and females?

Y = B0 + B1Experience + B2Female + B3(Experience*Female) + e

Ex: On average, a one-unit increase in experience has a B3 difference in Y for females compared to males, holding all else constant

This allows the effect of experience on income to vary by gender

B3 now measures the effect of an additional year of experience for females relative to males
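
A minimal statsmodels sketch of this regression, assuming a hypothetical DataFrame with columns salary, experience, and female (0/1); the data below are made up for illustration:

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical data, for illustration only
    df = pd.DataFrame({
        "salary":     [40, 45, 52, 41, 47, 56, 39, 44],
        "experience": [1, 3, 6, 2, 4, 8, 1, 3],
        "female":     [0, 0, 0, 1, 1, 1, 1, 0],
    })

    # "experience * female" expands to experience + female + experience:female,
    # so the coefficient on experience:female is B3 in the card's equation
    model = smf.ols("salary ~ experience * female", data=df).fit()
    print(model.params)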

19
Q

7 Classical Assumptions

A
  1. Regression model is linear (in B’s), correctly specified, and has an additive error term
  2. The error term has a population mean of zero
  3. The explanatory variables are not correlated with the error term
  4. Observations of the error term are not correlated
  5. The error term has a constant variance
  6. The regressors are uncorrelated with each other
  7. Error term is normally distributed
20
Q

Omitted Variable Bias

A

Y = β0 + β1X1 + e

The error term absorbs an omitted variable X2; if X2 is correlated with X1 and affects Y, the estimate of β1 is biased.
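
For the standard two-regressor case (true model Y = β0 + β1X1 + β2X2 + e with X2 omitted), the expected value of the estimate is:

E(β̂1) = β1 + β2 × [Cov(X1, X2) / Var(X1)]

so the bias disappears only if β2 = 0 or X1 and X2 are uncorrelated.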

21
Q

Variable Inclusion Criteria

A

Theory: is there sound justification for including the variable?

Bias: do the coefficients for other variables change noticeably when the variable is included?

T-Test: is the variable’s estimated coefficient statistically significant?

R-square: has the R-square (adjusted R-square) improved?

22
Q

First-order serial correlation

A

occurs when the value of the error term in one period is a function of its value in the previous period; the current error term is correlated with the previous error term.

23
Q

DW Test

A

compare the DW statistic (d) to the critical values (d_L, d_U): if d < d_L, reject the null of no positive serial correlation; if d > d_U, do not reject; if d_L ≤ d ≤ d_U, the test is inconclusive

24
Q

Newey-West Standard Errors

A

  • Designed to correct for the consequences of first-order serial correlation; they are technically still biased, but are more accurate than OLS standard errors, so they can be used for t-tests and other hypothesis tests
  • Newey-West SE > OLS SE
  • Larger standard errors produce lower t-scores, so coefficients won’t be as statistically significant
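
A minimal statsmodels sketch (simulated data; the HAC covariance option implements Newey-West, with maxlags chosen by the analyst):

    import numpy as np
    import statsmodels.api as sm

    # Simulated time series with serially correlated errors
    rng = np.random.default_rng(0)
    x = np.arange(100, dtype=float)
    y = 2.0 + 0.5 * x + rng.normal(size=100).cumsum()

    X = sm.add_constant(x)
    ols = sm.OLS(y, X).fit()                           # conventional OLS standard errors
    nw = sm.OLS(y, X).fit(cov_type="HAC",
                          cov_kwds={"maxlags": 4})     # Newey-West (HAC) standard errors
    print(ols.bse)
    print(nw.bse)                                      # typically larger than ols.bse here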

25
Q

Heteroskedasticity

A

occurs when the variance of the error term is not constant across observations. Upon visual inspection of the residuals, the tell-tale sign is that they tend to fan out (for example, over time or as a regressor grows).

26
Q

Pure Heteroskedasticity

A

occurs in correctly specified equations

27
Q

Impure Heteroskedasticity

A

arises due to model misspecification

28
Q

Multicollinearity

A

A state of very high intercorrelation among the independent variables. It is a type of disturbance in the data; when present, the statistical inferences made from the data may not be reliable.

29
Q

Perfect Multicollinearity

A

virtually always the result of a definitional relationship between the independent variables, and is solved by dropping variables from the regression.

30
Q

Imperfect Multicollinearity

A

describes the existence of a strong (but not exact) linear relationship between two or more independent variables that can significantly affect the estimates of coefficients.

31
Q

Multicollinearity

A

Multicollinearity exists in every equation, and its severity can change from sample to sample.

There are no generally accepted true statistical tests for multicollinearity.

A variance inflation factor (VIF) > 5 is a common rule of thumb for flagging it.
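
A minimal VIF check with statsmodels (simulated regressors; x2 is built to be nearly collinear with x1):

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    rng = np.random.default_rng(1)
    x1 = rng.normal(size=200)
    x2 = x1 + 0.1 * rng.normal(size=200)   # nearly collinear with x1
    x3 = rng.normal(size=200)

    X = sm.add_constant(np.column_stack([x1, x2, x3]))
    for i in range(1, X.shape[1]):         # skip column 0, the constant
        print(i, variance_inflation_factor(X, i))   # VIFs for x1, x2 come out well above 5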

32
Q

Outliers

A

A distinctly unusual observation or extreme value

33
Q

Unbiased

A

Parameter estimates are, on average, equal to the parameter’s true value in the population model

34
Q

Unbiased Equation

A

E(β̂) = β

The distribution of β̂ is centered around β

35
Q

Efficient

A

Has the lowest variance among unbiased estimators

36
Q

Multicollinearity

A

strong (but not exact) linear relationship between two or more regressors

37
Q

Best Linear Unbiased Estimator

A

If the first six classical assumptions are met, OLS is the Best Linear Unbiased Estimator (the Gauss-Markov theorem)

38
Q

OLS stands for

A

Ordinary least squares

39
Q

Most Common remedies for multicollinearity

A
  1. Do Nothing
  2. Drop a redundant variable
  3. Increase the sample size
40
Q

Impure Serial Correlation

A

Serial correlation that is caused by a specification error such as an omitted variable or an incorrect functional form

41
Q

Pure Serial Correlation

A

This type of serial correlation occurs when the error in one period is correlated with the errors in other periods. The model is assumed to be correctly specified.

42
Q

Best remedy for impure serial correlation

A

attempt to find the omitted variable or the correct functional form for the equation

43
Q

Stochastic Error Term

A

term that is added to a regression equation to capture all the variation in the dependent variable that cannot be explained by the included independent variables

Equation: Y = B0 + B1X + e

44
Q

Residual Error Term

A

The difference between the observed (actual) value of the dependent variable and its estimated value (observed - estimated)

Equation: e_i = Y_i - Ŷ_i

45
Q

Durbin-Watson d statistic test

A

Used to determine if there is first-order serial correlation in the error term of an equation by examining the residuals. Includes the critical values d_L (lower bound) and d_U (upper bound).
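
A minimal sketch of computing d from the residuals with statsmodels (simulated data):

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.stattools import durbin_watson

    rng = np.random.default_rng(0)
    x = np.arange(80, dtype=float)
    y = 1.0 + 0.3 * x + rng.normal(size=80)   # uncorrelated errors here

    results = sm.OLS(y, sm.add_constant(x)).fit()
    d = durbin_watson(results.resid)          # d near 2 -> no first-order serial correlation
    print(d)                                  # compare to the d_L, d_U table values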

46
Q

Durbin-Watson assumptions

A
  1. regression model includes an intercept term
  2. serial correlation is first-order in nature
  3. regression model does not include a lagged dependent variable as an independent variable
47
Q

What if the Durbin-Watson d-statistic is above the upper bound (d > d_U)?

A

We do not reject the null hypothesis of no autocorrelation, since there is no statistical evidence of first-order positive serial correlation

48
Q

White Test

A

Used to test for heteroskedasticity: regress the squared residuals on the regressors, their squares, and their cross-products, then compare n·R^2 from that auxiliary regression to a chi-squared critical value
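
A minimal sketch with statsmodels' het_white (simulated data where the error variance grows with x):

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import het_white

    rng = np.random.default_rng(0)
    x = rng.uniform(1, 10, size=200)
    y = 2.0 + 0.5 * x + x * rng.normal(size=200)   # heteroskedastic errors

    X = sm.add_constant(x)
    results = sm.OLS(y, X).fit()
    lm_stat, lm_pvalue, f_stat, f_pvalue = het_white(results.resid, X)
    print(lm_pvalue)   # small p-value -> reject the null of homoskedasticity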

49
Q

t-test formula

A

coefficient divided by standard error

50
Q

r2 formula

A

R^2 = ESS (explained sum of squares) divided by TSS (total sum of squares)

51
Q

Sign of omitted variable bias

A

Sign of bias = sign of the omitted variable's coefficient (β2) × sign of the correlation between the included and omitted variables

52
Q

5% significance level (two-tailed critical t-value)

A

1.96

53
Q

1% significance level (two-tailed critical t-value)

A

2.787

54
Q

Omitted Variable (issue)

A
Bias in the coefficient estimates of the included X's
#3 OLS classical assumption
55
Q

Omitted Variable (correction)

A
Include the omitted variable or a proxy
#3 OLS classical assumption
56
Q

Irrelevant variable

A

Inclusion of a variable that does not belong in the equation

57
Q

Incorrect Functional Form (issue)

A
The functional form is inappropriate
#1 OLS classical assumption
58
Q

Incorrect Functional Form (correction)

A
Transform the variable or the equation to a different functional form
#1 OLS classical assumption
59
Q

Multicollinearity (issue)

A
Some of the independent variables are highly (but imperfectly) correlated
#6 OLS classical assumption
60
Q

Multicollinearity (correction)

A
Drop the redundant variables, but often doing nothing is best
#6 OLS classical assumption
61
Q

Serial Correlation (issue)

A
Observations of the error term are correlated 
#4 OLS classical assumption
62
Q

Serial Correlation (correction)

A
If impure, fix the specification; consider Generalized Least Squares or Newey-West standard errors
#4 OLS classical assumption
63
Q

Heteroskedasticity (issue)

A
The variance of the error term is not constant for all observations 
#5 OLS classical assumption
64
Q

Heteroskedasticity (correction)

A

If impure, fix the specification; use heteroskedasticity-consistent (HC) standard errors or reformulate the variables
#5 OLS classical assumption