Lecture 3 (Chapter 4 in book) Flashcards
Explain what an F-test is
The F-test is the counterpart of the t-test for testing several restrictions (hypotheses involving more than one coefficient) jointly. For example, we might want to test whether Y is affected by X1 and X2 together, out of X1, X2 and X3.
F-test:
Suppose a researcher wants to test whether the return on a stock (Yt) shows unit sensitivity to two factors (X2 and X3) among the three considered (X2, X3 and X4). The regression is estimated on 144 monthly observations:
Yt = β1 + β2X2t + β3X3t + β4X4t + ut
-What are the restricted and unrestricted regressions?
-If the two RSS are 436.1 and 397.2 respectively, perform the test
The unrestricted regression is the equation as given; its residual sum of squares is the URSS:
Yt = β1 + β2X2t + β3X3t + β4X4t + ut
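The restricted regression imposes H0: β2 = 1 and β3 = 1 (unit sensitivity to both factors); its residual sum of squares is the RRSS:
Yt - X2t - X3t = β1 + β4X4t + ut
So we define a new dependent variable Zt = Yt - X2t - X3t and estimate Zt = β1 + β4X4t + ut by OLS.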
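The restricted/unrestricted construction can be sketched in plain Python. This is a minimal illustration, assuming simulated (hypothetical) data generated with β2 = β3 = 1, not the researcher's 144 observations; the helper names `ols` and `rss` are made up for the example. The restriction is imposed as in the notes, by moving X2t and X3t to the left-hand side and regressing Zt on a constant and X4t.

```python
import random

def ols(y, X):
    """OLS coefficients from the normal equations (X'X) b = X'y.

    X is a list of rows, each starting with 1.0 for the intercept. The system
    is solved by Gauss-Jordan elimination with partial pivoting.
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yt for row, yt in zip(X, y))] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def rss(y, X, b):
    """Residual sum of squares for coefficients b."""
    return sum((yt - sum(bj * xj for bj, xj in zip(b, row))) ** 2
               for yt, row in zip(y, X))

random.seed(0)
T = 144
x2 = [random.gauss(0, 1) for _ in range(T)]
x3 = [random.gauss(0, 1) for _ in range(T)]
x4 = [random.gauss(0, 1) for _ in range(T)]
# Hypothetical data generated with beta2 = beta3 = 1, i.e. H0 is true here
y = [0.5 + a + b + 0.3 * c + random.gauss(0, 1) for a, b, c in zip(x2, x3, x4)]

# Unrestricted: regress Y on a constant, X2, X3 and X4
X_u = [[1.0, a, b, c] for a, b, c in zip(x2, x3, x4)]
urss = rss(y, X_u, ols(y, X_u))

# Restricted: impose beta2 = beta3 = 1 by regressing Z = Y - X2 - X3 on a constant and X4
z = [yt - a - b for yt, a, b in zip(y, x2, x3)]
X_r = [[1.0, c] for c in x4]
rrss = rss(z, X_r, ols(z, X_r))

print(f"URSS = {urss:.2f}, RRSS = {rrss:.2f}")
```

Because the restricted model is a special case of the unrestricted one, RRSS can never be smaller than URSS; the F-test asks whether the gap is too large to be put down to chance.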
FOR F-TEST:
RRSS: 436.1
URSS: 397.2
T = 144 observations
k = 4 parameters in the unrestricted regression (including the intercept)
m = 2 restrictions
Test statistic = (RRSS-URSS)/URSS * (T-k)/m
Test statistic = (436.1-397.2)/397.2 * (144-4)/2 = 6.86
Critical value: F(m,T-k) = F(2,140)
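F(2,140) ≈ 3.07 at 5% and 4.79 at 1% (these are the F(2,120) table entries, the nearest tabulated row) –> Conclusion: reject H0 because 6.86 exceeds both critical values, so the data do not support unit sensitivity of the stock return to both X2 and X3.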
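The F-test arithmetic can be checked with a short sketch. `f_statistic` simply codes the test-statistic formula above. As an extra, and only because m = 2 here, the F(2, v) distribution has a closed-form tail probability, P(F > x) = (1 + 2x/v)^(-v/2), so exact critical values and the p-value can be computed without tables; for other values of m you would need tables or a statistics library. All function names are hypothetical.

```python
def f_statistic(rrss, urss, T, k, m):
    """F = (RRSS - URSS)/URSS * (T - k)/m, with m restrictions and k parameters."""
    return (rrss - urss) / urss * (T - k) / m

def f2_p_value(x, v):
    """P(F > x) for F ~ F(2, v); this closed form is valid only for numerator df = 2."""
    return (1 + 2 * x / v) ** (-v / 2)

def f2_critical_value(alpha, v):
    """Upper-alpha critical value of F(2, v), obtained by inverting f2_p_value."""
    return v / 2 * (alpha ** (-2 / v) - 1)

F = f_statistic(rrss=436.1, urss=397.2, T=144, k=4, m=2)
print(f"F = {F:.2f}")  # F = 6.86
for alpha in (0.05, 0.01):
    crit = f2_critical_value(alpha, 140)  # 3.06 at 5%, 4.76 at 1%
    print(f"{alpha:.0%} critical value of F(2,140): {crit:.2f}, reject H0: {F > crit}")
print(f"p-value = {f2_p_value(F, 140):.4f}")  # well below 0.01
```

The exact F(2,140) critical values (about 3.06 and 4.76) are slightly below the tabulated F(2,120) figures of 3.07 and 4.79, and the conclusion is unchanged: reject H0 at both levels.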
What is Goodness of Fit?
Goodness of fit measures how well the regression model actually fits the data.
Goodness of fit statistics test how well the sample regression function (SRF) fits the data, that is, how close the fitted regression line is to all of the data points taken together.
We measure this using R^2. R^2 is the square of the correlation between the values of the dependent variable and the corresponding fitted values from the model. Since a correlation coefficient must lie between -1 and 1, R^2 must lie between 0 and 1.
An R^2 close to 1 indicates that the model fits the data well (it captures much of the variability in the dependent variable).
Please provide the equation for R^2
R^2 = ESS/TSS = (TSS-RSS)/TSS = 1 - (RSS/TSS)
TSS: Total Sum of Squares
ESS: Explained Sum of Squares
RSS: Residual Sum of Squares
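A small sketch of the R² decomposition, using five made-up data points (purely illustrative). It fits y = a + b·x by OLS, computes TSS, ESS and RSS, and checks that ESS/TSS, 1 - RSS/TSS and the squared correlation between y and the fitted values all give the same number; the three coincide for OLS with an intercept.

```python
import math

def simple_ols(x, y):
    """Intercept a and slope b of y = a + b*x by OLS."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b

def sums_of_squares(y, y_hat):
    """Return (TSS, ESS, RSS) for observed y and fitted values y_hat."""
    y_bar = sum(y) / len(y)
    tss = sum((yi - y_bar) ** 2 for yi in y)              # total variation about the mean
    ess = sum((fi - y_bar) ** 2 for fi in y_hat)          # variation explained by the model
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual (unexplained) variation
    return tss, ess, rss

def corr(u, v):
    """Sample correlation coefficient between u and v."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    suv = sum((p - u_bar) * (q - v_bar) for p, q in zip(u, v))
    suu = sum((p - u_bar) ** 2 for p in u)
    svv = sum((q - v_bar) ** 2 for q in v)
    return suv / math.sqrt(suu * svv)

x = [1, 2, 3, 4, 5]  # hypothetical data, for illustration only
y = [2, 4, 5, 4, 5]
a, b = simple_ols(x, y)
y_hat = [a + b * xi for xi in x]
tss, ess, rss = sums_of_squares(y, y_hat)
print(round(ess / tss, 4), round(1 - rss / tss, 4), round(corr(y, y_hat) ** 2, 4))  # 0.6 0.6 0.6
```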
Suppose you’re analyzing the relationship between hours studied and exam scores for a group of students. You collect data on the number of hours each student studied and their corresponding exam score, and you fit a linear regression model to predict exam scores based on study hours.
After running the regression, you calculate an R² of 0.85.
Comment on what this means
An R² of 0.85 means that 85% of the variation in exam scores (around their mean) is explained by the regression on hours studied; the remaining 15% is due to factors not in the model. It does not mean that any one student's result is 85% determined by study time, and a high R² on its own does not prove that studying causes the higher scores.
What are some problems with R²?
A major issue with R² is that it never decreases when you add regressors (and usually increases), even if those variables have no meaningful relationship with the dependent variable (Yt).
As you add more predictors, even irrelevant ones, R^2 will typically increase because the model has more flexibility to fit the data, potentially “overfitting” it. Overfitting means the model is capturing noise or random fluctuations in the data rather than the true underlying relationships.
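To see this problem in action, the sketch below (simulated data, hypothetical helper names) regresses y on x, then adds a regressor that is pure random noise and unrelated to y by construction. R² cannot go down, because OLS could always set the new coefficient to zero and reproduce the smaller model's fit.

```python
import random

def ols_fitted(y, X):
    """Fitted values from OLS of y on X (each row of X starts with 1.0 for the intercept)."""
    k = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yt for row, yt in zip(X, y))] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]

def r_squared(y, y_hat):
    """R^2 = 1 - RSS/TSS."""
    y_bar = sum(y) / len(y)
    rss = sum((yt - ft) ** 2 for yt, ft in zip(y, y_hat))
    tss = sum((yt - y_bar) ** 2 for yt in y)
    return 1 - rss / tss

random.seed(1)
n = 30
x = [random.uniform(0, 10) for _ in range(n)]
y = [1 + 2 * xi + random.gauss(0, 3) for xi in x]
noise = [random.gauss(0, 1) for _ in range(n)]  # irrelevant regressor, unrelated to y

r2_small = r_squared(y, ols_fitted(y, [[1.0, xi] for xi in x]))
r2_big = r_squared(y, ols_fitted(y, [[1.0, xi, wi] for xi, wi in zip(x, noise)]))
print(f"R^2 without noise regressor: {r2_small:.4f}, with it: {r2_big:.4f}")
```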
Please explain and provide the formula for adjusted R^2
Unlike R^2, adjusted R^2 takes into account the number of predictors in the model. As such, it penalizes the inclusion of irrelevant variables that do not improve the model significantly.
Adjusted R^2 = 1 - [(1 - R²)(n - 1)/(n - p - 1)]
-n is the number of data points
-p is the number of predictors (independent variables)
Adjusted R² will decrease if the added predictors do not improve the model’s explanatory power significantly: if adding a predictor does not explain enough of the variation in the dependent variable to offset the degree of freedom it uses up, adjusted R² falls.
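A direct coding of the adjusted R² formula, as a minimal sketch. The numbers are hypothetical: n = 30 observations, R² = 0.85 with one predictor, then a second predictor that nudges R² up to only 0.851. Plain R² rises, but adjusted R² falls.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1); n = observations, p = predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

before = adjusted_r2(0.85, n=30, p=1)
after = adjusted_r2(0.851, n=30, p=2)  # tiny R^2 gain from an extra predictor
print(f"{before:.4f} -> {after:.4f}")  # 0.8446 -> 0.8400
```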
What is a dummy variable?
Qualitative data, such as gender, industry or ratings, cannot be used directly in a regression because the categories have no numerical values.
Therefore we code each category with a dummy (0/1) variable, e.g. male = 0, female = 1.
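A minimal sketch of dummy coding; the function names and example categories are hypothetical. For a two-category variable one 0/1 column is enough. For more than two categories, the usual practice is one dummy per category except a chosen base category, because including a dummy for every category alongside the intercept creates perfect collinearity (the "dummy variable trap").

```python
def dummy(values, one):
    """Binary dummy: 1 where the observation equals `one`, else 0."""
    return [1 if v == one else 0 for v in values]

def dummies(values, base):
    """One 0/1 column per category, omitting the base category."""
    cats = sorted(set(values) - {base})
    return {c: [1 if v == c else 0 for v in values] for c in cats}

gender = ["male", "female", "female", "male"]
print(dummy(gender, "female"))  # [0, 1, 1, 0]

industry = ["tech", "retail", "energy", "tech"]
print(dummies(industry, base="retail"))  # {'energy': [0, 0, 1, 0], 'tech': [1, 0, 0, 1]}
```

Each coefficient on an industry dummy is then read relative to the omitted base category (retail in this made-up example).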