ECONOMETRICS Flashcards

1
Q

What is econometrics?

A

Econometrics is the branch of economics that applies statistical methods to estimate economic relationships. It helps in testing theories and making predictions.

2
Q

What is the difference between deterministic and stochastic relationships?

A

Deterministic: A relationship where the outcome is exactly determined by the inputs (e.g., Force = Mass × Acceleration).
Stochastic: A relationship where the outcome has some randomness (e.g., House Price = B1 + B2 × Size + error term).

3
Q

What is the population regression function (PRF)?

A

The PRF represents the true relationship between the dependent variable (Y) and independent variables (X’s) in the entire population:

Y_i = B1 + B2 X2_i + B3 X3_i + … + Bk Xk_i + u_i

4
Q

What is the sample regression function (SRF)?

A

The SRF is the estimated version of the PRF, based on sample data:

Y_i = b1 + b2 X2_i + b3 X3_i + … + bk Xk_i + û_i

Here, b1, b2, …, bk are the estimated coefficients and û_i is the residual.

5
Q

What is Y^ (Y-hat) in regression?

A

Ŷ (Y-hat) is the predicted value of Y from the estimated regression equation:

Ŷ_i = b1 + b2 X2_i + … + bk Xk_i

6
Q

What is the method of Ordinary Least Squares (OLS)?

A

OLS is a method used to estimate regression coefficients by minimizing the sum of squared errors (SSE).
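As a concrete illustration, here is a minimal numpy sketch of OLS on simulated data (the data and variable names are illustrative, not from the cards). The closed form b = (X'X)⁻¹X'Y is exactly the minimizer of the SSE defined on the next card:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (assumed for illustration): Y = 2 + 3*X2 + noise
n = 100
X2 = rng.uniform(0, 10, n)
Y = 2 + 3 * X2 + rng.normal(0, 1, n)

# Design matrix with a column of ones for the intercept b1
X = np.column_stack([np.ones(n), X2])

# OLS in closed form: b = (X'X)^(-1) X'Y minimizes the sum of squared errors
b = np.linalg.solve(X.T @ X, X.T @ Y)
print(b)  # roughly [2, 3]
```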

7
Q

What is the formula for the sum of squared errors (SSE)?

A

SSE = Σ (Y_i − Ŷ_i)^2

It measures how far off our predictions are from actual values.

8
Q

Why do we prefer OLS?

A

Under the classical assumptions, OLS gives unbiased coefficient estimates with the smallest variance among all linear unbiased estimators (the Gauss–Markov theorem).

9
Q

What is R-squared (R^2)?

A

R^2 measures how well the independent variables explain the variation in the dependent variable.
Formula:

R^2 = 1 - (SSE/SST)

It ranges from 0 to 1, where higher values indicate a better fit.
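A short sketch of computing R^2 from its definition on simulated data (variable names illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
X2 = rng.uniform(0, 10, 100)
Y = 2 + 3 * X2 + rng.normal(0, 1, 100)

X = np.column_stack([np.ones_like(X2), X2])
b = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = X @ b

SSE = np.sum((Y - Y_hat) ** 2)      # unexplained variation
SST = np.sum((Y - Y.mean()) ** 2)   # total variation in Y
print(1 - SSE / SST)                # R^2, close to 1 for this low-noise data
```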

10
Q

What is adjusted R^2?

A

Adjusted R^2 adjusts for the number of predictors in the model. Unlike R^2, it penalizes unnecessary variables.

11
Q

What is the ANOVA F-statistic used for?

A

The F-statistic tests whether at least one independent variable is significantly related to Y. If the p-value is small, at least one X variable significantly affects Y.

12
Q

What is the formula for the F-statistic?

A

F = (MSR / MSE)

MSR (Mean Square Regression) measures explained variance.
MSE (Mean Square Error) measures unexplained variance.

13
Q

How do you test if a regression coefficient is significant?

A

To test H0: B_k = 0, compute

t = b_k / SE(b_k)

Compare the t-statistic with the critical value, or use the p-value.
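In practice these t-statistics come straight out of a regression package; a minimal statsmodels sketch on simulated data (names illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
X2 = rng.uniform(0, 10, 100)
Y = 2 + 3 * X2 + rng.normal(0, 1, 100)

results = sm.OLS(Y, sm.add_constant(X2)).fit()

print(results.params / results.bse)  # t = b_k / SE(b_k), computed by hand
print(results.tvalues)               # same values from statsmodels
print(results.pvalues)               # corresponding p-values
```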

14
Q

How do you interpret regression coefficients?

A

If both Y and X are in natural logs: a 1% increase in X leads to a B% change in Y (an elasticity).
If only X is in natural logs: a 1% increase in X changes Y by about B/100 units.
If X is a dummy variable (0 or 1): B is the average difference in Y between the two groups.

15
Q

What is a confidence interval for a regression coefficient?

A

The confidence interval gives a range in which the true coefficient likely falls.

b_j ± t_critical × SE(b_j)
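A sketch of the interval, computed both by hand and with statsmodels' built-in conf_int (simulated data as before; names illustrative):

```python
import numpy as np
import scipy.stats as st
import statsmodels.api as sm

rng = np.random.default_rng(3)
X2 = rng.uniform(0, 10, 100)
Y = 2 + 3 * X2 + rng.normal(0, 1, 100)

results = sm.OLS(Y, sm.add_constant(X2)).fit()

# Manual 95% interval: b_j +/- t_critical * SE(b_j), with df = n - k
t_crit = st.t.ppf(0.975, df=results.df_resid)
print(results.params - t_crit * results.bse)  # lower bounds
print(results.params + t_crit * results.bse)  # upper bounds
print(results.conf_int(alpha=0.05))           # matches the manual computation
```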

16
Q

What are the assumptions needed for OLS to be BLUE?

A

BLUE = Best Linear Unbiased Estimator

Linearity: Y is a linear function of X.
No omitted variables: the model includes all relevant factors.
No perfect multicollinearity: the X variables are not perfectly correlated.
Zero mean error: the expected value of the residuals is zero.
Homoscedasticity: the errors have constant variance.
No autocorrelation: the residuals are not correlated with each other.

17
Q

What is the Chow test?

A

The Chow test checks whether two groups of data (e.g., different time periods) have the same regression coefficients.

18
Q

What is a restricted least squares F-test?

A

It compares a restricted model (fewer variables) to an unrestricted model (all variables) to see if removing variables significantly worsens the model.

19
Q

What is a subset F-test?

A

A subset F-test checks whether a group of independent variables (X’s) in a regression model can be removed without losing important information.

20
Q

How would you predict house prices using regression?

A

Collect data on house prices (Y) and predictors (X's) such as square footage, number of bedrooms, etc.
Estimate the regression coefficients using OLS.
Predict prices for new homes using the fitted equation: Ŷ = b1 + b2 X2 + … + bk Xk

21
Q

What is multiple regression?

A

Multiple regression extends simple regression by including more than one independent variable to explain the dependent variable.
Example:
Y_i = B1 + B2X2_i + B3X3_i + u_i

22
Q

How do we interpret coefficients in a multiple regression model?

A

B2: The change in Y, on average, for a one-unit increase in X2, holding X3 constant.
B3: The change in Y, on average, for a one-unit increase in X3, holding X2 constant.
B1: The predicted value of Y when all X’s are zero.

23
Q

Why is multiple regression useful?

A

It allows us to isolate the effect of one variable on Y while holding other factors constant.

24
Q

What is the difference between simple and multiple regression graphs?

A

Simple regression: A straight line on a scatterplot.
Multiple regression: A plane (if 2 X’s) or a hyperplane (more than 2 X’s), which is harder to visualize.

25
Q

What does R-squared (R^2) tell us in multiple regression?

A

It measures how much of the variation in Y is explained by the independent variables (X's).

R^2 = 1 - (SSE / SST)

A higher R^2 means a better fit, but adding more variables never decreases R^2, even when they add no real explanatory power.

26
Q

Why do we use adjusted R-squared instead of R-squared?

A

Adjusted R^2 corrects for the number of variables in the model to prevent overfitting.

Adjusted R^2 = 1 - (SSE / (n - k)) / (SST / (n - 1))
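A quick check of the formula against statsmodels' built-in value (simulated data; the irrelevant variable X3 is my own illustration). Note that statsmodels calls the sum of squared residuals `ssr`, the naming pitfall card 30 warns about:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 100
X2 = rng.uniform(0, 10, n)
X3 = rng.uniform(0, 5, n)               # deliberately irrelevant variable
Y = 2 + 3 * X2 + rng.normal(0, 1, n)

X = sm.add_constant(np.column_stack([X2, X3]))
results = sm.OLS(Y, X).fit()

k = X.shape[1]                          # number of estimated coefficients
SSE = results.ssr                       # here "ssr" = sum of squared residuals
SST = results.centered_tss
print(1 - (SSE / (n - k)) / (SST / (n - 1)))  # manual adjusted R^2
print(results.rsquared_adj)                   # same value from statsmodels
```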
27
Q

When should you add or remove variables based on adjusted R^2?

A

If adjusted R^2 increases, keep the new variable. If adjusted R^2 decreases, the variable adds noise and should be removed.

28
Q

What are the three types of sum of squares in regression?

A

Total Sum of Squares (SST): measures the total variation in Y.
Sum of Squares Regression (SSR): the variation in Y explained by the regression.
Sum of Squared Errors (SSE): the unexplained variation in Y.

Relationship: SST = SSR + SSE
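A minimal numerical check of the decomposition on simulated data (names illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
X2 = rng.uniform(0, 10, 100)
Y = 2 + 3 * X2 + rng.normal(0, 1, 100)

X = np.column_stack([np.ones_like(X2), X2])
b = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = X @ b

SST = np.sum((Y - Y.mean()) ** 2)       # total variation
SSR = np.sum((Y_hat - Y.mean()) ** 2)   # explained variation
SSE = np.sum((Y - Y_hat) ** 2)          # unexplained variation
print(np.isclose(SST, SSR + SSE))       # True
```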
29
Q

What does the F-statistic tell us in multiple regression?

A

It tests whether at least one independent variable significantly predicts Y.

F = MSR / MSE

If the p-value is low, the model is statistically significant.

30
Q

Why do we need to be careful with software when interpreting regression results?

A

Different books and software packages use different notations for regression terms. Example: SSR can mean Sum of Squares Regression or Sum of Squared Residuals, which are very different!

31
Q

What are the Classical Linear Regression Assumptions?

A

For OLS to be the Best Linear Unbiased Estimator (BLUE), the following assumptions must hold:

1. The number of observations (n) must be greater than the number of coefficients (k).
2. The independent variables (X's) must have independent variability.
3. The independent variables (X's) and the error term (u) must not be correlated.
4. The error term (u) follows a normal distribution.
5. The error term (u) has a mean of zero.
6. The error term (u) has constant variance (homoscedasticity).
7. The error terms are not correlated across observations (no autocorrelation).
32
Q

What does BLUE stand for in OLS?

A

Best: OLS provides the minimum variance among unbiased estimators.
Linear: the regression is linear in the coefficients.
Unbiased: the expected values of the estimated coefficients equal the true values.
Estimator: it provides numerical estimates for the coefficients.

33
Q

Why is the normality of error terms important?

A

While normality is not required for OLS to be BLUE, it is necessary for hypothesis testing, since test statistics like t and F assume normality in small samples.

34
Q

What are the steps for hypothesis testing in OLS?

A

Null hypothesis (H0): the coefficient equals zero (or some other value).
Alternative hypothesis (H1): the coefficient is not equal to zero.
Test statistic: t = (b_j - B_j) / SE(b_j)
Compare the test statistic with the critical value, or check the p-value.
35
Q

What factors influence the size of the standard error of a coefficient?

A

A smaller SSE leads to a smaller standard error.
A larger sample size (n) reduces the standard error.
Fewer independent variables (k) decrease the standard error.
Higher variation in X results in a smaller standard error.

36
Q

How do you decide whether to reject or fail to reject the null hypothesis?

A

If |test statistic| > critical t-value, reject the null hypothesis.
If the p-value < significance level (0.10, 0.05, 0.01), reject the null hypothesis.
The significance level is the probability of a Type I error (rejecting a true null hypothesis) we are willing to accept; the p-value is the probability, assuming the null is true, of observing a test statistic at least as extreme as the one obtained.
37
Q

What is the confidence interval for a regression coefficient?

A

b_j ± (t-critical value × standard error of b_j)

This gives a range in which we are 90%, 95%, or 99% confident the true coefficient lies.

38
Q

What does the F-statistic test in multiple regression?

A

The F-test checks whether at least one independent variable significantly predicts Y.
Null hypothesis (H0): all regression coefficients (except the intercept) are zero.

F = MSR / MSE = (SSR / (k - 1)) / (SSE / (n - k))

If the test statistic > critical value, reject H0, meaning at least one X is significant.
39
Q

How do you decide which independent variables to include in a model?

A

If theory suggests the variable is important, include it.
If the variable's t-statistic is significant, keep it.
If adjusted R-squared increases after adding the variable, keep it.
If adjusted R-squared decreases, the variable may be unnecessary.

40
Q

What is regression through the origin?

A

A model with no intercept (B1 = 0). Only used if there is a strong theoretical reason (e.g., consumption = 0 if income = 0).

41
Q

Why would you rescale data in regression?

A

Rescaling (e.g., measuring income in units of $10,000 instead of $1) makes coefficients easier to interpret. Standardizing variables (Z-scores) allows comparison of effect sizes across different variables.
42
Q

What is a quadratic regression model?

A

A model where X appears as both a linear and a squared term:

Y = B1 + B2 X + B3 X² + u

Used to model diminishing returns (e.g., study hours and GPA).
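A minimal sketch of fitting a quadratic, with diminishing returns built into the simulated data (variable names and true coefficients are illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
hours = rng.uniform(0, 10, 200)
# Diminishing returns built in: positive linear term, negative squared term
gpa = 2.0 + 0.4 * hours - 0.02 * hours**2 + rng.normal(0, 0.2, 200)

X = sm.add_constant(np.column_stack([hours, hours**2]))
results = sm.OLS(gpa, X).fit()
print(results.params)  # roughly [2.0, 0.4, -0.02]
```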
43
Q

What is a log-linear model, and why is it used?

A

Both the dependent and independent variables are in logarithmic form:

ln(Y) = B1 + B2 ln(X2) + B3 ln(X3) + u

The coefficients represent elasticities: B2 is the percentage change in Y for a 1% change in X2.

44
Q

What is a semi-log model?

A

Only the dependent variable (or only an independent variable) is in logarithmic form. Example:

ln(Y) = B1 + B2 X + B3 Z + u

Here 100 × B2 is the approximate percentage change in Y for a one-unit change in X.

45
Q

How do you interpret natural log transformations in regression?

A

If only the dependent variable (Y) is logged: a one-unit change in X changes Y by approximately 100 × B2 percent.
If both Y and X are logged: a 1% change in X leads to a B2 % change in Y (an elasticity).
46
Q

What is a dummy variable in regression?

A

A dummy variable takes values of 0 or 1 to represent different categories (e.g., male = 1, female = 0).

47
Q

How do you include dummy variables in a regression model?

A

If there are q categories, you include q - 1 dummy variables. Example: if there are 3 schools, you need 2 dummy variables, with the third school as the omitted category.
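A short pandas sketch of the q − 1 rule (the school names and data are illustrative): `drop_first=True` omits one category, which becomes the reference group and avoids the dummy variable trap described in card 49.

```python
import pandas as pd

df = pd.DataFrame({"school": ["A&S", "Business", "Jepson", "Business", "A&S"]})

# 3 categories -> 2 dummies; the dropped category ("A&S") is the omitted
# reference group
dummies = pd.get_dummies(df["school"], drop_first=True)
print(dummies)
```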
48
Q

How do you interpret dummy variable coefficients?

A

The coefficient of a dummy variable shows the difference in the dependent variable (Y) relative to the omitted category. Example: if female is the omitted group and the coefficient on male is 0.2, then males have a 0.2 higher GPA, on average, than females.

49
Q

What happens if you include all dummy variables instead of omitting one?

A

This causes the dummy variable trap (perfect multicollinearity), making the regression invalid.

50
Q

What is an interaction effect in regression?

A

An interaction effect occurs when the impact of one independent variable on Y depends on another variable. Example: the effect of study hours on GPA might differ depending on which school a student attends.
51
Q

How do you create an interaction effect?

A

Multiply two independent variables to create a new variable. Example:
New variable = Study Hours × Business School
New variable = Study Hours × Jepson School
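A minimal pandas sketch of building interaction terms by multiplication (column names and data are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "study_hours": [10, 5, 8, 12],
    "business": [1, 0, 1, 0],  # dummy: 1 = Business School
    "jepson": [0, 1, 0, 0],    # dummy: 1 = Jepson School
})

# Interaction terms let the effect of study hours differ by school
df["hours_x_business"] = df["study_hours"] * df["business"]
df["hours_x_jepson"] = df["study_hours"] * df["jepson"]
print(df)
```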
52
Q

How do you interpret an interaction coefficient?

A

Example: if the interaction term Study Hours × Business School has a coefficient of -0.05, then for Business students each additional study hour has a 0.05 smaller effect on GPA than it does for the reference group.

53
Q

What are the different types of functional forms in regression?

A

Linear model: Y = B1 + B2 X2 + u
Quadratic model: Y = B1 + B2 X2 + B3 X2² + u (captures non-linear relationships)
Log-linear model: ln(Y) = B1 + B2 ln(X2) + u (captures elasticities)
Semi-log model: ln(Y) = B1 + B2 X2 + u (percentage change in Y for a unit change in X)
54
Q

When should you use a quadratic term in regression?

A

When the relationship between Y and X is not linear. Example: GPA and study hours, where there are diminishing returns to studying.

55
Q

How do you interpret coefficients in log-linear and semi-log models?

A

Log-linear model (ln(Y) = B1 + B2 ln(X)): B2 is the percentage change in Y for a 1% change in X.
Semi-log model (ln(Y) = B1 + B2 X): 100 × B2 is the approximate percentage change in Y for a one-unit increase in X.

56
Q

How do you decide whether a variable should be included in a regression model?

A

If theory suggests the variable is important, include it.
If the t-statistic is significant, keep it.
If adjusted R-squared increases after adding the variable, keep it.
If adjusted R-squared decreases, the variable may not be necessary.
57
Q

What happens if an important variable is omitted from a regression model?

A

Omitted variable bias occurs, meaning the estimates of the other coefficients may be biased.

58
Q

What is a subset (partial) F-test?

A

A subset F-test checks whether a subset of independent variables significantly explains variation in Y.
Null hypothesis (H0): the subset of coefficients equals zero (e.g., B4 = B5 = 0).
If the test statistic > critical F-value, reject H0, meaning at least one of the subset variables is significant.
59
Q

How do you perform a subset F-test?

A

Run two regressions:
1. Unrestricted model (all variables included).
2. Restricted model (subset variables removed, forcing their coefficients to be zero).

Compute:

F = ((SSE_restricted - SSE_unrestricted) / number of restrictions) / MSE_unrestricted

If the F-statistic > F-critical, reject H0.
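A sketch of the two-regression procedure on simulated data (names illustrative), with the p-value taken from the F distribution:

```python
import numpy as np
import scipy.stats as st
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 200
X2, X3, X4 = rng.normal(size=(3, n))
Y = 1 + 2 * X2 + rng.normal(0, 1, n)   # X3 and X4 are truly irrelevant

unrestricted = sm.OLS(Y, sm.add_constant(np.column_stack([X2, X3, X4]))).fit()
restricted = sm.OLS(Y, sm.add_constant(X2)).fit()  # forces B3 = B4 = 0

q = 2                                  # number of restrictions
F = ((restricted.ssr - unrestricted.ssr) / q) / (unrestricted.ssr / unrestricted.df_resid)
p_value = st.f.sf(F, q, unrestricted.df_resid)
print(F, p_value)                      # large p-value here: fail to reject H0
```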
60
Q

What is restricted least squares (RLS)?

A

RLS imposes restrictions on coefficients to test whether a specific functional form is valid. Example: testing whether a production function exhibits constant returns to scale.

Unrestricted: ln(Q) = B1 + B2 ln(K) + B3 ln(L) + u
Restricted: ln(Q) - ln(K) = B1 + B3 (ln(L) - ln(K)) + u (forces B2 + B3 = 1)

Use an F-test to check whether the restriction is valid.

61
Q

What is the Chow test used for?

A

The Chow test determines whether regression coefficients are the same across two different groups or time periods. Example: testing whether the relationship between earnings and education differs before and after a policy change.
62
Q

How do you perform the Chow test?

A

Run three regressions:
1. On the full sample (restricted model).
2. On subsample 1 (first group).
3. On subsample 2 (second group).

Compute:

F = ((SSE_restricted - (SSE1 + SSE2)) / k) / ((SSE1 + SSE2) / (n - 2k))

If the F-statistic > F-critical, reject H0, meaning the coefficients differ between the groups.
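A sketch of the three-regression computation on simulated data that has a genuine break between groups (names illustrative):

```python
import numpy as np
import scipy.stats as st
import statsmodels.api as sm

rng = np.random.default_rng(8)
n1 = n2 = 100
x1, x2 = rng.normal(size=(2, n1))
y1 = 1 + 2.0 * x1 + rng.normal(0, 1, n1)  # group 1
y2 = 3 + 0.5 * x2 + rng.normal(0, 1, n2)  # group 2: different coefficients

pooled = sm.OLS(np.r_[y1, y2], sm.add_constant(np.r_[x1, x2])).fit()
fit1 = sm.OLS(y1, sm.add_constant(x1)).fit()
fit2 = sm.OLS(y2, sm.add_constant(x2)).fit()

k = 2                                     # coefficients per regression
n = n1 + n2
sse_sub = fit1.ssr + fit2.ssr
F = ((pooled.ssr - sse_sub) / k) / (sse_sub / (n - 2 * k))
print(F, st.f.sf(F, k, n - 2 * k))        # small p-value: coefficients differ
```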
63
Q

What happens if an independent variable (X) is correlated with the error term (u)?

A

OLS estimates become biased: the expected value of the coefficient, E(b), is not equal to the true value B. This is called omitted variable bias or an endogenous regressor problem.

64
Q

What are examples of omitted variable bias?

A

Example: estimating the effect of family income on test scores without controlling for parental education. If parental education affects both income and test scores, then the coefficient on income is biased.