Econometrics - Theory Flashcards
Root MSE in Stata stands for:
SER
Total SS =
TSS
Residual SS =
SSR
Model SS =
ESS
TSS = ___ + ____
ESS + SSR
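The decomposition can be checked numerically. The sketch below is a minimal illustration using made-up data and a simple regression with one regressor; the variable names are assumptions for this example, not Stata output.

```python
# Simple OLS of y on x with an intercept, standard library only.
# The data are invented purely for illustration.
from math import sqrt

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS slope and intercept
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

fitted = [intercept + slope * xi for xi in x]

tss = sum((yi - y_bar) ** 2 for yi in y)                # Total SS
ess = sum((fi - y_bar) ** 2 for fi in fitted)           # Model SS
ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # Residual SS

r_squared = ess / tss
ser = sqrt(ssr / (n - 2))  # Root MSE; n - 2 degrees of freedom with one regressor

print(tss, ess + ssr, r_squared, ser)
```

The first two printed numbers agree up to floating-point rounding, which is the identity TSS = ESS + SSR.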
When analyzing Stata output, what do you have to assume unless specified otherwise?
That all 3 Least Squares Assumptions hold
and homoskedastic errors
What is the range of R squared?
0 to 1
A stock with Beta > ____ is riskier than the market portfolio
1
A stock with Beta < ______ is less risky than the market portfolio
1
An empirical analysis is externally valid if _________
the conclusions can be generalized to other populations and other settings
Are results/studies regarding health in the United States externally valid?
Not necessarily. Health insurance coverage in the US is not universal, so results from the US cannot automatically be generalized to settings where everyone is insured.
An empirical analysis is internally valid when statistical inference _________
about causal effects is valid for the population being studied
For internal validity why should estimators be unbiased and consistent?
Because an unbiased estimator does not systematically skew results, so estimates are centered on the true population value, and a consistent estimator gets closer to the true value as the sample size increases. If estimators are biased or inconsistent, inference about the causal effect is unreliable.
The reason why we need the Large outliers are unlikely assumption is to derive that the OLS estimator is ____________
asymptotically normally distributed
We cannot calculate the OLS estimator if _________
there is perfect multicollinearity, that is, an exact linear relationship between explanatory variables
The first OLS assumption is not an assumption but a ___________
REQUIREMENT
List the threats to internal validity
- omitted variables
- functional form misspecification
- measurement error
- sample selection
- simultaneous causality
All of the threats to internal validity lead to a violation of: ________
OLS assumption #1, which states that the error term is not related to the explanatory variables
If there are important explanatory variables missing from the model then _______
our results are biased and inconsistent, and therefore internal validity is not ensured
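A small simulation can make the bias visible. This is a sketch under an assumed data-generating process (true coefficients 1 and 2, and an omitted regressor x2 that is correlated with x1); none of these numbers come from the flashcards.

```python
import random

random.seed(0)
n = 20000

# Hypothetical model: y = 1*x1 + 2*x2 + u, where x2 is correlated with x1
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.5 * a + random.gauss(0, 1) for a in x1]
u = [random.gauss(0, 1) for _ in range(n)]
y = [1.0 * a + 2.0 * b + e for a, b, e in zip(x1, x2, u)]


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


# Regressing y on x1 alone pushes x2 into the error term,
# so the error term is correlated with x1.
short_slope = ols_slope(x1, y)
print(short_slope)
```

In this setup the short-regression slope settles near 1 + 2 * Cov(x1, x2) / Var(x1) = 2 rather than the true value 1, and a larger sample does not remove the gap, which is what inconsistency means here.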
If a regressor correlates with the error term then it is _______
endogenous
If we omit an exogenous variable,
Because labour market experience has a non-linear relationship with wages, if we only include a linear term for experience we will be dealing with what problem?
Functional form misspecification
What is Sample Selection Bias?
Sample selection bias occurs when the process of selecting data is related to the dependent variable beyond its relationship with the regressors, leading to correlation between regressors and the error term, affecting OLS estimators’ consistency.
Can you explain how Sample Selection Bias manifests?
It arises when the selection process affecting data availability is tied to the dependent variable. For instance, in the 1936 polling example, selecting phone numbers of car owners introduced bias because car owners with phones were more likely to support a specific political party.
How can the Sample Selection Bias problem be described?
It can be viewed either as a consequence of nonrandom sampling or as a missing data issue. For instance, a random sample of car owners with phones isn’t the same as a random sample of voters.
What’s the optimal solution to address Sample Selection Bias?
The best solution is to design studies to avoid it. For instance, estimating the mean height of undergraduates should involve a random sample of all undergraduates, not just those entering a basketball court.
Simultaneity bias occurs if causality ______ in both directions
runs
Is internal validity an issue here: You want to investigate health costs in the Netherlands and you have a sample drawn from all customers of health insurance companies of the Netherlands.
Health insurance is compulsory in the Netherlands, so there is no problem with the selectivity of the sample if the sample is randomly drawn from all insurance companies.
Can confidence intervals be constructed in the usual way if the OLS estimator includes a measurement error, w, with finite fourth moment?
Assuming w_i is homoskedastic and the least squares assumptions (LSA) hold, the standard errors are calculated correctly, and therefore so are the confidence intervals.
To establish whether omitted variables have a genuine effect we must look at and evaluate _________
the t-values and p-values, and then the F-test comparing the unrestricted (UR) and restricted (R) models
Formula for the t-statistic
$t = \dfrac{\hat{\beta}_1 - \beta_{1,0}}{SE(\hat{\beta}_1)}$
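The t-statistic is the estimate minus the hypothesized value, divided by the standard error of the estimate. A minimal sketch, with a hypothetical helper name and invented numbers:

```python
def t_statistic(beta_hat, beta_null, se):
    """t = (estimate - hypothesized value) / standard error of the estimate."""
    return (beta_hat - beta_null) / se


# Hypothetical numbers: estimate 0.52, standard error 0.20, null hypothesis beta_1 = 0
t = t_statistic(0.52, 0.0, 0.20)

# Two-sided test at the 5% level using the large-sample normal critical value
reject_at_5_percent = abs(t) > 1.96
print(t, reject_at_5_percent)
```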
Is there a problem with internal validity here: You have a sample of adult males living in Amsterdam and you want to use this sample to estimate the average height of Dutch adult males.
Yes, because Amsterdam is not representative of the entire Dutch population: it has many students and expats, and young people tend to be taller. In addition, people from below the large rivers (Lek, Waal and Maas) are known to be shorter than those from above these rivers.
X under measurement error =
Real X + w (measurement error term)
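Why this matters for the slope can be illustrated with a simulation. The sketch assumes classical measurement error (w independent of the real X and of the regression error) and invented parameter values; it is an illustration, not a result from the cards.

```python
import random

random.seed(1)
n = 20000
beta = 2.0  # assumed true slope

x_true = [random.gauss(0, 1) for _ in range(n)]   # real X, variance 1
w = [random.gauss(0, 1) for _ in range(n)]        # measurement error, variance 1
x_obs = [a + b for a, b in zip(x_true, w)]        # observed X = real X + w
y = [beta * a + random.gauss(0, 1) for a in x_true]


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


slope_true = ols_slope(x_true, y)  # uses the correctly measured regressor
slope_obs = ols_slope(x_obs, y)    # uses the mismeasured regressor
print(slope_true, slope_obs)
```

Under these assumptions the slope on the mismeasured regressor is pulled toward zero, to roughly beta * Var(X) / (Var(X) + Var(w)) = 1, while the slope on the real X stays near 2.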
A low p-value on the F-test in the White test suggests
strong evidence for heteroskedasticity
When dealing with a measurement error, how do you know if the confidence interval can be constructed in the usual way?
If the measurement error term, w, is homoskedastic, and if the LSA conditions hold
What happens when a redundant explanatory variable is added and it’s correlated with other variables in a model?
When a redundant variable is added and it’s correlated with other variables, it leads to inefficiency in the model. For instance, if the added variable is negatively correlated with one variable (let’s say ‘Jap’), it might be positively correlated with another variable (‘Time’). As a consequence, the standard deviations of the coefficient estimators for ‘Jap’ and ‘Time’ increase, making these estimators less accurate. This means that the t-ratios move towards 0 or the standard errors become larger.
For our instrument to be valid we need to make sure that :
the covariance between x and z is unequal to zero and the covariance between z and error term is equal to zero
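Why both conditions are needed can be seen from the IV estimator with one regressor and one instrument. The expression below is the standard textbook result for the simple model $Y_i = \beta_0 + \beta_1 X_i + u_i$, which is an assumption here since the cards do not write the model out.

```latex
\hat{\beta}_1^{IV} = \frac{s_{ZY}}{s_{ZX}}
\;\xrightarrow{\;p\;}\;
\frac{\mathrm{Cov}(Z_i, Y_i)}{\mathrm{Cov}(Z_i, X_i)}
= \beta_1 + \frac{\mathrm{Cov}(Z_i, u_i)}{\mathrm{Cov}(Z_i, X_i)}
```

The ratio is only defined when $\mathrm{Cov}(Z_i, X_i) \neq 0$ (relevance), and it equals $\beta_1$ only when $\mathrm{Cov}(Z_i, u_i) = 0$ (exogeneity).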
Why are instruments usually different from exogenous variables in IV regression?
Instruments need to satisfy two critical conditions: exogeneity (uncorrelated with the error term) and relevance (correlated with the endogenous variable). Exogenous variables, by definition, are uncorrelated with the error term, but they might not satisfy the relevance condition required for a valid instrument.
What are the conditions for a valid instrument?
Exogeneity –> uncorrelated with the error term
Relevance –> correlated with the explanatory variable
The instrument cannot be a part of the initial regression model
Explain the difference between exogenous and endogenous variables?
Exogenous variables are independent, and endogenous variables are dependent. Therefore, if the variable does not depend on variables within the model, it’s an exogenous variable. If the variable depends on variables within the model, though, it’s endogenous.
Explain this Stata command: ivreg S (T = TF TM) SP IP
This command runs an instrumental variable regression where S is the dependent variable, T is the endogenous regressor, TF and TM are the instruments for T, while SP and IP are exogenous variables. It’s specifying that T is endogenous and should be instrumented by TF and TM.
Write the Stata command that runs an instrumental variable regression:
ivreg S (T = TM TF) SP IP
where S is the dependent variable, T is the endogenous regressor instrumented by TM and TF, and SP and IP are the exogenous variables
Define endogeneity and explain why it’s a concern in regression analysis.
Endogeneity refers to a situation where an independent variable is correlated with the error term, leading to biased and inconsistent regression estimates due to omitted variable bias or simultaneous causation.
What does the first stage regression in 2SLS aim to accomplish?
The first stage regression in 2SLS aims to predict the potentially endogenous variable using instrumental variables, thereby creating adjusted values that aren’t correlated with the error term.
Which variables are used as instruments in the first stage of 2SLS, and what’s their role?
Instruments in the first stage of 2SLS are variables chosen for their lack of correlation with the error term but correlation with the potentially endogenous variable. For instance, TF and TM might be instruments for predicting T.
Why do we save predicted values in 2SLS regression, and what variable contains these values?
Predicted values are saved in the first stage to create a new variable (TFIT) that contains the predicted values of the potentially endogenous variable (T).
Describe the objective of the second stage regression in 2SLS.
The second stage regression in 2SLS estimates the relationship between the dependent variable (S) and the predicted values (TFIT) of the potentially endogenous variable (T), while controlling for the exogenous variables.
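The two stages can be sketched by hand. The example below is a simplified illustration with one endogenous regressor, one instrument and no additional exogenous controls; the data-generating process and all names are assumptions. Here x plays the role of T, z the role of an instrument such as TF or TM, and x_fit the role of TFIT.

```python
import random

random.seed(2)
n = 20000

# Hypothetical data: z is the instrument; x is endogenous because it shares
# the shock v with the error term u.
z = [random.gauss(0, 1) for _ in range(n)]
v = [random.gauss(0, 1) for _ in range(n)]
x = [zi + vi for zi, vi in zip(z, v)]
u = [vi + random.gauss(0, 1) for vi in v]
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]  # assumed true slope is 2


def ols(x, y):
    """Return (intercept, slope) from a simple regression of y on x."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# First stage: regress the endogenous regressor on the instrument, save fitted values
a0, a1 = ols(z, x)
x_fit = [a0 + a1 * zi for zi in z]

# Second stage: regress the dependent variable on the fitted values
b0, b1_2sls = ols(x_fit, y)

# Naive OLS for comparison
_, b1_ols = ols(x, y)
print(b1_ols, b1_2sls)
```

In this setup the naive OLS slope lands near 2.5, while the two-stage slope is close to the assumed true value of 2. Running the second stage by hand gives the right coefficient but not the right standard errors, which is one reason to use a dedicated IV command in practice.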