Oral Exam Questions Flashcards
OLS. What does this abbreviation stand for? What does it mean?
Ordinary Least Squares. Method used in linear regression to find line of best fit for data points. Minimizes sum of squared differences between actual and predicted values on the line.
How do we test whether a coefficient is different from zero? [t-statistics]
We use the t-test: t = Beta^hat / SE(Beta^hat), to check whether Beta is significantly different from 0 or whether the estimate is just random chance. If the t-stat is large (roughly above 2 in absolute value at the 5% level), Beta is likely different from zero. If the t-stat is small, you can't confidently say Beta is different from zero.
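A minimal sketch of the t-statistic for the slope in a simple regression, using made-up illustrative data (pure Python, no libraries):

```python
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n

# OLS slope and intercept for y = alpha + beta*x
sxx = sum((x - mx) ** 2 for x in xs)
beta = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
alpha = my - beta * mx

# Residual variance uses n - 2 degrees of freedom (two estimated parameters)
ssr = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))
s2 = ssr / (n - 2)
se_beta = (s2 / sxx) ** 0.5

# Compare |t| with the critical value (about 2 at the 5% level)
t_stat = beta / se_beta
```

Here the slope is precisely estimated (small SE relative to beta), so the t-stat is far above 2 and we reject Beta = 0.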
How do we interpret the coefficient of interest? [e.g., test_score = α + β teacher_student_ratio + u]
The coefficient Beta (B) represents the impact of a one-unit change in the teacher-student ratio on the test score. B>0: positive impact; B<0: negative impact; B=0: no impact.
Economic significance. How do we measure it? Why do we need it?
Economic significance asks if an effect is big enough to matter in real life.
* Measured by the size of the coefficient in context.
* Needed because statistical significance alone doesn’t tell us if an effect is practically important.
How do we measure goodness of fit? [R-squared, intuition behind its construction, limitations]
We measure goodness of fit by the R-squared. It compares how well the model’s predictions match the actual data versus a baseline (mean of the data). Values range from 0 to 1 (higher is better).
* Doesn’t measure whether the model is correctly specified or whether variables are relevant
* Never decreases when more variables are added, even irrelevant ones
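A minimal sketch of the construction, R-squared = 1 - SSR/SST, with made-up actual vs. predicted values:

```python
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.8, 5.3, 6.9, 9.1]

mean_y = sum(actual) / len(actual)
# Total variation around the mean-only baseline
sst = sum((y - mean_y) ** 2 for y in actual)
# Variation left unexplained by the model
ssr = sum((y - p) ** 2 for y, p in zip(actual, predicted))

r_squared = 1 - ssr / sst  # close to 1 here: predictions track the data well
```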
Why do we include control variables in regressions?
We include control variables in regressions to account for factors that might affect the dependent variable. This helps isolate the effect of the main variable of interest by holding other factors constant, improving the accuracy of the coefficient estimates and reducing omitted variable bias.
How do we interpret the coefficient of interest in a multiple regression? [e.g., test_score = α +
β teacher_student_ratio + γ avg_f amily_income + u]
The coefficient Beta represents the change in test score for a one-unit increase in the teacher-student ratio, holding average family income constant.
What is adjusted R-squared? Why do we need it? [intuition behind its construction, limitations]
Adjusted R-squared accounts for the number of predictors in a model, penalizing excessive use of irrelevant variables. Provides a more accurate measure of goodness of fit but can be misleading if the model is incorrectly specified.
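A minimal sketch of the penalty, using the standard formula adj. R² = 1 - (1 - R²)(n - 1)/(n - k - 1) with illustrative values:

```python
def adjusted_r2(r2, n, k):
    """n = observations, k = number of predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Same raw R-squared, but more predictors -> lower adjusted R-squared
a_few = adjusted_r2(0.80, n=50, k=2)
a_many = adjusted_r2(0.80, n=50, k=10)
```

Because the penalty grows with k, adding an irrelevant variable can lower adjusted R-squared even though raw R-squared never falls.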
What is F-test? Why is t-test not enough?
The F-test is used to determine whether a group of variables collectively has a significant effect on the dependent variable. It complements the t-test by evaluating the overall fit of the model, allowing us to see whether at least one predictor significantly contributes to explaining variation in the dependent variable.
Which hypotheses can we test with F-test?
We can test:
* Overall model significance: whether at least one of the regression coefficients is significantly different from zero.
* Nested model comparison: whether additional predictors provide a significantly better fit than a simpler, restricted model.
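A minimal sketch of the F-statistic comparing a restricted model (q coefficients set to zero) against the unrestricted model; the SSR values below are illustrative:

```python
def f_stat(ssr_restricted, ssr_unrestricted, q, n, k):
    """q = number of restrictions tested, k = predictors in the unrestricted model."""
    return ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / (n - k - 1))

# Restrictions cost 20 units of fit across q=2 coefficients -> large F, reject H0
f = f_stat(ssr_restricted=120.0, ssr_unrestricted=100.0, q=2, n=103, k=2)
```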
OLS assumption 1: Zero conditional mean. What is it? What happens if it doesn’t hold?
Zero conditional mean assumption states that: The expected value of the error term is zero given the independent variables. If it doesn’t hold, it leads to biased and inconsistent coefficient estimates, making it difficult to determine the true effects of the independent variables.
OLS assumption 2: Random sampling. What is it? What happens if it doesn’t hold?
Random Sampling Assumption: data points are collected randomly from the population, ensuring that every individual has an equal chance of being selected. If it doesn’t hold, it can result in sampling bias, leading to non-representative data and biased estimates.
OLS assumption 3: Rare outliers. What is it? What happens if it doesn’t hold?
Rare Outliers Assumption: extreme values in the data are rare and do not unduly influence the regression results. If it doesn’t hold, outliers can skew the estimates and lead to misleading conclusions, inflated coefficients, and reduced model accuracy.
OLS assumption 4: No multicollinearity. What is it? What happens if it doesn’t hold?
No Multicollinearity: the independent variables in a regression model should not be highly correlated. If it doesn’t hold, it can lead to inflated standard errors, making it difficult to determine the individual effect of each predictor.
VIF. What is it? Why do we need it?
VIF (Variance Inflation Factor) measures how much the variance of an estimated regression coefficient increases because the independent variables are correlated.
We need VIF to detect multicollinearity. A high VIF (above 10) indicates problematic correlation among predictors, suggesting that the coefficient estimates may be unreliable.
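A minimal sketch of the formula VIF_j = 1/(1 - R²_j), where R²_j comes from regressing predictor x_j on the other predictors (the R² values below are illustrative):

```python
def vif(r2_j):
    """r2_j: R-squared from regressing predictor j on all other predictors."""
    return 1.0 / (1.0 - r2_j)

low = vif(0.20)   # little overlap with other predictors
high = vif(0.95)  # heavy multicollinearity: VIF = 20, well above the common cutoff of 10
```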
Which methods are used to deal with outliers?
Remove them if they are deemed irrelevant.
Trim the dataset, dropping observations beyond a chosen percentile.
Winsorize the data, replacing extreme values with the nearest values within a specified range.
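A minimal sketch of winsorizing at fixed cutoffs; unlike trimming, extreme values are clipped to the bounds rather than dropped (the bounds below are illustrative):

```python
def winsorize(values, lower, upper):
    # Clip each value into [lower, upper] instead of removing it
    return [min(max(v, lower), upper) for v in values]

data = [1, 2, 3, 4, 100]                     # 100 is an outlier
clipped = winsorize(data, lower=1, upper=10)  # the outlier becomes 10
```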
OLS property 1: Unbiased. What is it? Why is it important?
On average, the estimated coefficients equal the true population parameters. It is important because it ensures that the conclusions drawn from the regression are valid.
OLS property 2: Consistent. What is it? Why is it important?
As the sample size increases, the estimated coefficients converge to the true population parameters. It is important because it assures us that with enough data, our estimates will become more accurate and reliable.
OLS property 3: Normally distributed. What is it? Why is it important?
When the sample size is large, the sampling distribution of the coefficients approaches a normal distribution due to the central limit theorem.
Important because it enables valid hypothesis testing and construction of confidence intervals.
Heteroscedasticity of errors. What is it? Why should we care about it?
Occurs when the variance of the error terms in a regression is not constant across all levels of the independent variable. We care about it because it makes OLS estimates inefficient and the usual standard errors biased, invalidating t- and F-tests; heteroscedasticity-robust standard errors are a common fix.
Biases: Sample selection bias. What is it? How can we correct it?
Occurs when the sample used is not representative of the population due to a non-random selection process, which can lead to biased estimates and conclusions. Can be corrected with a Heckman two-step selection model or by collecting a more representative sample.
Biases: Omitted variable bias. What is it? How can we correct it?
Occurs when a relevant variable that affects the dependent variable is left out of the regression model.
Can lead to biased and inconsistent coefficient estimates. To remedy this, we can try to include the omitted variable.
Biases: Simultaneity bias. What is it? How can we correct it?
Simultaneity bias happens when the explanatory variable is correlated with the regression error term, ε, because causality runs both ways: X causes Y, but Y also causes X (reverse causality).
Correct with Instrumental Variable regression, 2SLS.
Biases: Attenuation bias. What is it? How can we correct it?
Occurs when an independent variable is measured with error, biasing the estimated effect toward zero (usually smaller coefficient estimates).
A larger sample does not remove the bias, since it persists asymptotically; correct with instrumental variables or better-measured data.
Polynomials. How do we interpret coefficients? [e.g., wage = α + β1 age + β2 age2 + u]
The quadratic term indicates how the effect of age on wage changes as age increases: the marginal effect of age is B1 + 2*B2*age. A positive B2 means the impact of age on wage grows as age rises; a negative B2 means it shrinks (an inverted-U wage profile).
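A minimal sketch of the age-dependent marginal effect in wage = a + B1*age + B2*age^2, with illustrative coefficients:

```python
# Illustrative (made-up) coefficients: positive linear term, negative quadratic term
b1, b2 = 5.0, -0.05

def marginal_effect(age):
    # d(wage)/d(age) = b1 + 2*b2*age: the slope depends on where you evaluate it
    return b1 + 2 * b2 * age

young = marginal_effect(25)  # positive: wage still rising with age
old = marginal_effect(60)    # negative: past the turning point at age -b1/(2*b2) = 50
```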