Econometrics - Theory Flashcards

1
Q

Root MSE in Stata output stands for:

A

SER (the standard error of the regression)

2
Q

Total SS =

A

TSS

3
Q

Residual SS =

A

SSR

4
Q

Model SS =

A

ESS

5
Q

TSS = ___ + ____

A

ESS + SSR
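To see how these quantities fit together, here is a minimal Python sketch (not Stata) for a simple regression with one regressor and made-up data. It computes the entries of Stata's ANOVA table: Total SS (TSS), Model SS (ESS) and Residual SS (SSR). It also computes R² = ESS/TSS and Root MSE (the SER, using n − k − 1 degrees of freedom). The data and the function name `ols_anova` are illustrative assumptions.

```python
def ols_anova(x, y):
    """Simple OLS of y on x (with intercept) and the sums of squares Stata reports."""
    n = len(y)
    k = 1  # number of regressors, excluding the intercept
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    tss = sum((yi - y_bar) ** 2 for yi in y)                 # Total SS
    ess = sum((fi - y_bar) ** 2 for fi in fitted)            # Model SS
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # Residual SS
    ser = (ssr / (n - k - 1)) ** 0.5                         # Root MSE
    return {"b0": b0, "b1": b1, "TSS": tss, "ESS": ess, "SSR": ssr,
            "R2": ess / tss, "SER": ser}


# Made-up data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = ols_anova(x, y)
print(r)
```

With these data, TSS is about 6, ESS about 3.6 and SSR about 2.4. So TSS = ESS + SSR and R² is about 0.6.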

6
Q

When analyzing Stata output, what do you have to assume unless specified otherwise?

A

That all three least squares assumptions hold
and that the errors are homoskedastic

7
Q

What is the range of R squared?

A

0 to 1

8
Q

A stock with Beta > ____ is riskier than the market portfolio

A

1

9
Q

A stock with Beta < ______ is less risky than the market portfolio

A

1

10
Q

An empirical analysis is externally valid if _________

A

the conclusions can be generalized to other populations and other settings

11
Q

Are results/studies regarding health in the United States externally valid?

A

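No, not necessarily: health insurance is not universal in the United States, and a substantial share of people have no health insurance. Results from the US therefore cannot simply be generalized to other settings, for example countries where health insurance is compulsory.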

12
Q

An empirical analysis is internally valid when statistical inference _________

A

about the causal effects is valid for the population

13
Q

For internal validity why should estimators be unbiased and consistent?

A

Because unbiased estimators do not systematically skew results: on average they equal the population value being estimated. Consistency implies that as the sample size increases, the estimator converges to the true value, so the estimates become more reliable. If the estimators are biased or inconsistent, the statistical inference about the causal effect is not valid for the population.

14
Q

The reason why we need the "large outliers are unlikely" assumption is to derive that the OLS estimator is ____________

A

asymptotically normally distributed

15
Q

We cannot calculate the OLS estimator if _________

A

there is perfect multicollinearity among the explanatory variables (one regressor is an exact linear function of the others); hence the assumption that there is no perfect multicollinearity

16
Q

The first OLS assumption is not an assumption but a ___________

A

REQUIREMENT

17
Q

List the threats to internal validity

A
  1. omitted variables
  2. functional form misspecification
  3. measurement error
  4. sample selection
  5. simultaneous causality
18
Q

All of the threats to internal validity lead to a violation of: ________

A

OLS assumption #1, which states that the error term is not related to the explanatory variables (its conditional mean given the regressors is zero)

19
Q

If there are important explanatory variables missing from the model then _______

A

our results are biased and inconsistent, and therefore internal validity is not ensured

20
Q

If a regressor correlates with the error term then it is _______

A

endogenous

21
Q

If we omit an exogenous variable,

A
22
Q

Because labour market experience has a non-linear relationship with wages, if we only include experience linearly, what problem will we be dealing with?

A

Functional form misspecification

23
Q

What is Sample Selection Bias?

A

Sample selection bias occurs when the process of selecting data is related to the dependent variable beyond its relationship with the regressors, leading to correlation between regressors and the error term, affecting OLS estimators’ consistency.

24
Q

Can you explain how Sample Selection Bias manifests?

A

It arises when the selection process affecting data availability is tied to the dependent variable. For instance, in the 1936 polling example, selecting phone numbers of car owners introduced bias because car owners with phones were more likely to support a specific political party.

25
Q

How can the Sample Selection Bias problem be described?

A

It can be viewed either as a consequence of nonrandom sampling or as a missing data issue. For instance, a random sample of car owners with phones isn’t the same as a random sample of voters.

26
Q

What’s the optimal solution to address Sample Selection Bias?

A

The best solution is to design studies to avoid it. For instance, estimating the mean height of undergraduates should involve a random sample of all undergraduates, not just those entering a basketball court.

27
Q

Simultaneity bias occurs if causality ______ in both directions

A

runs

28
Q

Is internal validity an issue here: You want to investigate health costs in the Netherlands and you have a sample drawn from all customers of health insurance companies of the Netherlands.

A

Health insurance is compulsory in the Netherlands, so there is no problem with the selectivity of the sample if the sample is randomly drawn from all insurance companies.

29
Q

Can confidence intervals be constructed in the usual way if the model includes a measurement error, w, with finite fourth moment?

A

Yes: assuming w_i is homoskedastic and the least squares assumptions still hold, the standard errors are calculated correctly, and therefore so are the confidence intervals.

30
Q

To establish whether omitted variables have a genuine effect, we must look at and evaluate _________

A

their t-values and p-values, and then the F test comparing the unrestricted (UR) and restricted (R) models

31
Q

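Formula for the t-statistic

A

t = (β̂1 − β1,0) / SE(β̂1), where β̂1 ("beta-hat") is the estimated coefficient and β1,0 is its value under the null hypothesis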
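A minimal sketch of this formula in Python, with hypothetical numbers: an estimate of 1.5 with standard error 0.5, tested against β1,0 = 0. The function name `t_stat` is illustrative.

```python
def t_stat(beta_hat, beta_null, se_beta_hat):
    """t = (beta_hat - beta_null) / SE(beta_hat)."""
    return (beta_hat - beta_null) / se_beta_hat


t = t_stat(1.5, 0.0, 0.5)                 # hypothetical estimate, null value and SE
reject_5pct_two_sided = abs(t) > 1.96     # two-sided test at the 5% level
print(t, reject_5pct_two_sided)
```

Here t = 3 exceeds the two-sided 5% critical value of 1.96, so the null hypothesis would be rejected.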

32
Q

Is there a problem with internal validity here: You have a sample of adult males living in Amsterdam and you want to use this sample to estimate the average height of Dutch adult males.

A

Yes, because Amsterdam is not representative of the entire Dutch population: there are many students and expats, and young people tend to be taller. Furthermore, people from south of the large rivers (Lek, Waal and Maas) are known to be shorter than those from north of these rivers.

33
Q

X under measurement error =

A

Real X + w (measurement error term)

34
Q

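A low p-value of the F test in the White test suggests

A

strong evidence of heteroskedasticity (equivalently, a large F statistic rejects the null hypothesis of homoskedasticity)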

35
Q

When dealing with a measurement error, how do you know if the confidence interval can be constructed in the usual way?

A

If the measurement error term, w, is homoskedastic, and if the LSA conditions hold

36
Q

What happens when a redundant explanatory variable is added and it’s correlated with other variables in a model?

A

When a redundant variable is added that is correlated with other variables in the model, the model becomes less efficient. For instance, the added variable might be negatively correlated with one variable (‘Jap’) and positively correlated with another (‘Time’). As a consequence, the standard deviations of the coefficient estimators for ‘Jap’ and ‘Time’ increase, making these estimators less precise. The standard errors become larger, so the t-ratios move towards 0.
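One way to quantify this, sketched under simplifying assumptions, uses the formula Var(β̂1) = σ² / (SST1 (1 − r²)). The assumptions are: two regressors, homoskedastic errors, and a redundant variable whose true coefficient is zero, so the error variance σ² is unchanged. Here r is the correlation between the two regressors. Adding the redundant variable therefore multiplies SE(β̂1) by 1/√(1 − r²). The function name `se_inflation` is hypothetical.

```python
import math


def se_inflation(r):
    """Factor by which SE(beta_1_hat) grows when a redundant regressor whose
    correlation with x1 is r is added (two regressors, homoskedastic errors,
    unchanged error variance assumed)."""
    return 1.0 / math.sqrt(1.0 - r ** 2)


factors = {r: se_inflation(r) for r in (0.0, 0.5, 0.8, 0.95)}
for r, f in factors.items():
    print(f"corr = {r:.2f}: SE multiplied by {f:.2f}")
```

For a given coefficient estimate, the t-ratio shrinks by the same factor.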

37
Q

For our instrument to be valid we need to make sure that :

A

the covariance between x and z is nonzero (relevance) and the covariance between z and the error term is zero (exogeneity): cov(x, z) ≠ 0 and cov(z, u) = 0
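A small simulation can illustrate why both conditions matter. The data-generating process is hypothetical, not taken from the course data. In it, x and the error share a common shock v, so x is endogenous, while z moves x but is unrelated to the error. With a single instrument, the IV estimator is cov(z, y) / cov(z, x).

```python
import random


def cov(a, b):
    """Sample covariance of two equally long lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


random.seed(1)
n = 20_000
z = [random.gauss(0, 1) for _ in range(n)]          # instrument
v = [random.gauss(0, 1) for _ in range(n)]          # shock shared by x and u
x = [zi + vi for zi, vi in zip(z, v)]               # relevance: cov(x, z) = 1
u = [vi + random.gauss(0, 1) for vi in v]           # exogeneity: cov(z, u) = 0, but cov(x, u) = 1
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]   # true slope = 2

beta_ols = cov(x, y) / cov(x, x)   # inconsistent: converges to 2 + 1/2 = 2.5
beta_iv = cov(z, y) / cov(z, x)    # consistent: converges to 2
print(f"OLS: {beta_ols:.3f}, IV: {beta_iv:.3f}")
```

OLS is pulled towards about 2.5 by cov(x, u)/var(x) = 0.5. The IV estimate stays close to the true slope of 2.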

38
Q

Why are instruments usually different from exogenous variables in IV regression?

A

Instruments need to satisfy two critical conditions: exogeneity (uncorrelated with the error term) and relevance (correlated with the endogenous variable). Exogenous variables are, by definition, uncorrelated with the error term, but they might not satisfy the relevance condition required for a valid instrument. Moreover, exogenous regressors that are already part of the regression model cannot serve as instruments.

39
Q

What are the conditions for a valid instrument?

A

Exogeneity –> uncorrelated with the error term
Relevance –> correlated with the explanatory variable
The instrument cannot be a part of the initial regression model

40
Q

Explain the difference between exogenous and endogenous variables?

A

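Exogenous variables are determined outside the model, while endogenous variables are determined within it. Therefore, if a variable does not depend on other variables within the model, it is exogenous; if it does depend on variables within the model, it is endogenous. In a regression, this means an endogenous regressor is correlated with the error term, while an exogenous regressor is not.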

41
Q

Explain this Stata command: ivreg S (T = TF TM) SP IP

A

This command runs an instrumental variable regression where S is the dependent variable, T is the endogenous regressor, TF and TM are the instruments for T, while SP and IP are exogenous variables. It’s specifying that T is endogenous and should be instrumented by TF and TM.

42
Q

Write the Stata command that runs an instrumental variable regression:

A

ivreg S (T = TM TF) SP IP
where S is the dependent variable, T is the endogenous regressor instrumented by TM and TF, and SP and IP are the exogenous variables

43
Q

Define endogeneity and explain why it’s a concern in regression analysis.

A

Endogeneity refers to a situation where an independent variable is correlated with the error term. It is a concern because it makes the regression estimates biased and inconsistent. It can arise, for example, from omitted variable bias, measurement error or simultaneous causality.

44
Q

What does the first stage regression in 2SLS aim to accomplish?

A

The first stage regression in 2SLS predicts the potentially endogenous variable using the instrumental variables, thereby creating fitted values that are not correlated with the error term.

45
Q

Which variables are used as instruments in the first stage of 2SLS, and what’s their role?

A

Instruments in the first stage of 2SLS are variables chosen for their lack of correlation with the error term but correlation with the potentially endogenous variable. For instance, TF and TM might be instruments for predicting T.

46
Q

Why do we save predicted values in 2SLS regression, and what variable contains these values?

A

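The predicted values are saved so that they can be used as the regressor in the second stage. The first stage creates a new variable (TFIT) containing the predicted values of the potentially endogenous variable (T), and this variable replaces T in the second-stage regression.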

47
Q

Describe the objective of the second stage regression in 2SLS.

A

The second stage regression in 2SLS regresses the dependent variable on the predicted values of the potentially endogenous variable (from the first stage), while controlling for the exogenous variables.
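The two stages can be sketched by hand in Python. This minimal illustration assumes one endogenous regressor, one instrument and no exogenous controls; SP and IP are left out to keep it short. `x_fit` plays the role of TFIT, and the simulated data are hypothetical. The standard errors from a manually run second stage are not the correct 2SLS standard errors, which is one reason to use a dedicated command such as ivreg.

```python
import random


def ols(x, y):
    """Intercept and slope of a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
          / sum((xi - mx) ** 2 for xi in x))
    return my - b1 * mx, b1


# Hypothetical data-generating process (true slope = 2)
random.seed(2)
n = 20_000
z = [random.gauss(0, 1) for _ in range(n)]          # instrument
v = [random.gauss(0, 1) for _ in range(n)]          # common shock -> endogeneity
x = [zi + vi for zi, vi in zip(z, v)]               # endogenous regressor (like T)
y = [1.0 + 2.0 * xi + vi + random.gauss(0, 1)       # dependent variable (like S)
     for xi, vi in zip(x, v)]

# First stage: regress the endogenous regressor on the instrument
# and save the fitted values (the role TFIT plays in the cards)
a0, a1 = ols(z, x)
x_fit = [a0 + a1 * zi for zi in z]

# Second stage: regress the dependent variable on the fitted values
b0_2sls, b1_2sls = ols(x_fit, y)
print(f"first-stage slope: {a1:.3f}, 2SLS slope: {b1_2sls:.3f}")
```

With a single instrument, this two-step slope equals the simple IV ratio cov(z, y)/cov(z, x).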

48
Q

How does the second stage regression address endogeneity in the model?

A

By using the predicted values of the potentially endogenous variable from the first stage, the second stage regression removes the endogeneity problem. This provides consistent estimates of the effect of the potentially endogenous variable on the dependent variable (2SLS is consistent, but it can be biased in small samples).

49
Q

What does the use of instrumental variables achieve in the context of endogeneity?

A

Instrumental variables isolate the part of the variation in the potentially endogenous variable that is uncorrelated with the error term. This allows estimation of causal relationships in the presence of endogeneity.

50
Q

Explain the difference between endogenous and exogenous variables in a regression model.

A

Endogenous variables are correlated with the error term, causing potential bias, while exogenous variables are not correlated with the error term and aren’t influenced by other variables in the model.

51
Q

How does the 2SLS method contribute to obtaining consistent estimates in regression analysis?

A

The 2SLS method obtains consistent estimates in two stages. In the first stage, it predicts the potentially endogenous variable using the instruments. In the second stage, it uses these predicted values to estimate the relationship between the variables, which addresses the endogeneity problem.

52
Q

What is the purpose of testing the strength of instruments in a 2SLS regression?

A

The purpose of testing instrument strength in a 2SLS regression is to assess whether the chosen instruments (TF and TM) are sufficiently correlated with the potentially endogenous variable (T). That is why T is regressed on TF and TM: to check for a nonzero covariance.

53
Q

How can researchers assess the strength of the instruments (TF and TM) in Stata?

A

Researchers can assess instrument strength in Stata by using the regress command to estimate the first stage regression and then employing the test command to check the joint significance of the instruments.

54
Q

What does the command regress T TF TM SP IP accomplish in assessing instrument strength?

A

The command regress T TF TM SP IP runs a regression where T is regressed on TF, TM, SP, and IP, evaluating the relationship between the potentially endogenous variable and its instruments along with exogenous variables.

55
Q

How does the F-statistic obtained from the test help evaluate instrument strength?

A

The F-statistic obtained from the test command helps evaluate the joint significance of the instruments. A larger F-statistic indicates greater explanatory power of the instruments in predicting T.

56
Q

Why is an F-test used instead of a t-test when assessing the strength of two instruments (TF and TM)?

A

An F-test is used to assess joint significance because it checks whether both instruments together significantly contribute to explaining the variation in the potentially endogenous variable, unlike a t-test that examines individual coefficients.

57
Q

What does a larger F-value suggest in the context of instrument strength testing?

A

A larger F-value suggests that the instruments (TF and TM) are stronger and more relevant in predicting the potentially endogenous variable T, providing more support for their validity in addressing endogeneity.

58
Q

How does a significant F-statistic influence the credibility of instruments in 2SLS regression analysis?

A

A significant F-statistic strengthens the credibility of the instruments in 2SLS regression, indicating that they are sufficiently strong and relevant for predicting the potentially endogenous variable, thereby enhancing the reliability of the instrumental variable approach in addressing endogeneity.

59
Q

When do we have over-identified models?

A

When the number of instruments exceeds the number of endogenous variables

60
Q

When the model is over-identified we may not want to use all of the _____

A

instruments because the more instruments the larger the variance of the estimator, thereby it is less efficient

61
Q

Formula for the degree of overidentification

A

number of instruments minus the number of endogenous regressors

62
Q

Formula for the J statistic:

A

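J = m·F, where m is the number of instruments and F is the F-statistic testing that the coefficients on all instruments are zero in a regression of the TSLS residuals on the instruments and the exogenous regressors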

63
Q

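The asymptotic distribution of the J-statistic is chi-squared with how many degrees of freedom?

A

m − k, where m is the number of instruments and k is the number of endogenous regressors (the degree of overidentification)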

64
Q

When is it impossible to statistically test the hypothesis that the instruments are exogenous?

A

It is impossible to statistically test the hypothesis of instrument exogeneity when there are exactly as many instruments as endogenous regressors, i.e. when the coefficients are exactly identified. In short: if m = k.

65
Q

Why is it that, if the coefficients are overidentified, we can test the assumption that the instruments are exogenous?

A

If the coefficients are overidentified, it is possible to test the overidentifying restrictions. That is, we can test the hypothesis that the "extra" instruments are exogenous, under the maintained assumption that there are enough valid instruments to identify the coefficients of interest.

66
Q

What does relevance mean ( one of the two validity conditions)?

A

The instrument should be correlated with the endogenous regressor

67
Q

What does exogeneity mean (it is one of the two validity conditions)?

A

The instrument should be uncorrelated with the error term u. In other words, the instrument should have no effect on the dependent variable Y through u (the error term).

68
Q

When using maximum likelihood estimation, we have to make assumptions about the ________ of some variables

A

distribution

69
Q

The F test is applicable when the number of instruments is _______ to the number of endogenous variables

A

at least equal

70
Q

t critical value for significance level 5% (for one sided)

A

1.645

71
Q

t critical value for significance level 5% (for two sided)

A

1.96

72
Q

If we have a one - sided test with significance level 5%, then we should use a ____% confidence interval

A

90
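The critical values above can be checked with the standard normal distribution (the large-sample approximation of the t distribution) using Python's `statistics.NormalDist`. The same check confirms the link between a one-sided 5% test and a 90% confidence interval. The variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal

one_sided_5pct = z.inv_cdf(1 - 0.05)        # one-sided test at 5%
two_sided_5pct = z.inv_cdf(1 - 0.05 / 2)    # two-sided test at 5%
ci_90 = z.inv_cdf(1 - 0.10 / 2)             # critical value of a 90% confidence interval

print(round(one_sided_5pct, 3), round(two_sided_5pct, 3), round(ci_90, 3))
```

The one-sided 5% value and the 90% interval value coincide at about 1.645, because both leave 5% in a single tail.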

73
Q

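When explaining why a measurement error causes correlation with the error term:

A

Remember to show this mathematically as well. If the observed T equals the true value plus a measurement error ν, then S = β0 + β1(T − ν) + β2SP + β3IP + u = β0 + β1T + β2SP + β3IP + (u − β1ν), so the new error term u − β1ν is correlated with T.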

74
Q

Examples of unordered discrete variables

A

type of credit card, choice of streaming program

75
Q

Examples of ordered discrete variables

A

schooling level

76
Q

examples of binary variables

A

employed, having savings

77
Q

Why might a linear model not be ideal for modeling probabilities?

A

A linear model isn’t ideal for probabilities because it can predict values below 0 or above 1, which are the limits for probabilities. This leads to unrealistic predictions such as probabilities greater than 1 or less than 0. So fitted/predicted values might lie outside the interval [0, 1].
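A quick illustration with hypothetical coefficients b0 = −0.2 and b1 = 0.3. Using the linear index b0 + b1·x directly as a probability (LPM) goes outside [0, 1]. Passing the same index through the logistic function, as the logit model does, always stays strictly between 0 and 1.

```python
import math

b0, b1 = -0.2, 0.3          # hypothetical coefficients
xs = [0, 1, 2, 3, 4, 5]

lpm = [b0 + b1 * x for x in xs]                                # linear probability model
logit = [1.0 / (1.0 + math.exp(-(b0 + b1 * x))) for x in xs]   # logistic transformation

for x, p_lin, p_log in zip(xs, lpm, logit):
    print(f"x = {x}: LPM {p_lin:+.2f}, logit {p_log:.2f}")
```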

78
Q

What are the implications of LPM generating probabilities outside [0,1]?

A

Predicted probabilities outside this range can be nonsensical (less than 0 or greater than 1), challenging the fundamental laws of probability. This can lead to unrealistic interpretations of event likelihoods.

79
Q

Why is the binary nature of the error term an issue in the linear probability model?

A

Because Y only takes the values 0 and 1, the error term can only take two values for given X. It therefore cannot be normally distributed, and a normal distribution is a poor approximation. Its variance also depends on X (it is heteroskedastic), so least squares is not efficient.

80
Q

Linear probability models, logit models and probit models are all models where the ________ is a _______ variable

A

dependent, binary (thus dummy)

81
Q

What is the advantage of logit model over LPM?

A

The predicted probabilities of the logit model always lie within the bounded range (0, 1), so its predictions are more sensible than those of the LPM, whose fitted values can fall outside this range.

82
Q

What makes interpreting marginal effects straightforward in the LPM? (Interpreting marginal effects is a benefit)

A

The LPM’s coefficients directly represent how the probability of an event (binary dependent variable) changes for every one-unit change in an independent variable, making it easy to understand and communicate the impact of regressors

83
Q

Why is the linear approximation of the LPM considered advantageous?

A

The LPM’s assumption of a linear relationship between independent variables and the probability of the dependent variable simplifies modeling and interpretation in scenarios where this linear approximation adequately captures the relationship.

84
Q

Under what conditions are estimators from the LPM unbiased and consistent?

A

Under certain conditions, such as no omitted variable bias, no perfect multicollinearity and no endogeneity, the LPM estimators are unbiased, meaning that on average they equal the true population parameters. They are also consistent, becoming more precise as the sample size grows.

85
Q

What benefits does the simplicity of the LPM’s structure offer?

A

The model’s straightforward linear structure simplifies analysis and comprehension, making it accessible for those seeking a basic but interpretable approach to studying relationships between variables.

86
Q

Solutions to fitted values being outside the interval (0,1) in the Linear probability model:

A

Using maximum likelihood estimation of a logit or probit model, whose predicted probabilities always lie between 0 and 1

87
Q

What is the main reason why researchers apply MLE instead of LPM

A

The shortcoming of the LPM in that the predicted values can be outside the (0,1) interval/bound.

88
Q

In regular OLS estimation, when we want to look at the marginal effect on the expectation of y:

A

we look at the derivative of E(y|x) with respect to the explanatory variables (in a linear model, this is simply the slope coefficient)

89
Q

The marginal effects in the logit and probit models are not _____ because of the nonlinear functional form, but their sign is equal to the sign of the corresponding ___.

A

constant; β (the estimated slope coefficient). The nonlinear nature of these models means that the marginal effects change depending on the values of the variables involved. However, the direction of the impact, whether positive or negative, matches the sign of the coefficient.
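For a single regressor, the logit marginal effect is β·Λ(xβ)(1 − Λ(xβ)), where Λ is the logistic CDF. The probit marginal effect is β·φ(xβ), where φ is the standard normal density. The sketch below uses hypothetical coefficients b0 = −2 and b1 = 0.5. It shows that the effect varies with x while its sign always matches the sign of b1.

```python
import math
from statistics import NormalDist


def logit_marginal_effect(b0, b1, x):
    """dP(y=1|x)/dx = b1 * Lambda(b0 + b1 x) * (1 - Lambda(b0 + b1 x))."""
    p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
    return b1 * p * (1.0 - p)


def probit_marginal_effect(b0, b1, x):
    """dP(y=1|x)/dx = b1 * phi(b0 + b1 x), with phi the standard normal density."""
    return b1 * NormalDist().pdf(b0 + b1 * x)


b0, b1 = -2.0, 0.5   # hypothetical estimates
for x in (0, 4, 8):
    print(x, round(logit_marginal_effect(b0, b1, x), 4),
          round(probit_marginal_effect(b0, b1, x), 4))
```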

90
Q

The logit and probit models produce very similar _____

A

curves (S-shaped probability curves that are nearly identical in practice)

91
Q

In the LPM, we should use _____ standard errors

A

(heteroskedasticity-)robust

92
Q

The coefficients of the LPM remain the same regardless of whether we use _____ standard errors or not

A

robust; only the standard errors change!

93
Q

Why use robust regression in Stata for linear probability estimation?

A

In a linear probability model the dependent variable is binary (0 or 1), so the error term is necessarily heteroskedastic: Var(u|X) = p(X)(1 − p(X)). Heteroskedasticity-robust standard errors are therefore needed for valid t-tests and confidence intervals. The robust option changes only the standard errors; the OLS coefficients stay the same.

94
Q

What command in Stata is used for robust regression in linear probability estimation?

A

regress y x1 x2, robust
In this syntax, y is the binary dependent variable, while x1 and x2 are the independent variables. The robust option makes Stata compute heteroskedasticity-robust standard errors; the OLS coefficients themselves are unchanged.
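To see why the option matters in an LPM, recall that Var(u|X) = p(X)(1 − p(X)) varies with X. The Python sketch below uses a single regressor and made-up binary data. It computes both the homoskedasticity-only standard error and a heteroskedasticity-robust (HC1) one. HC1 scales the sandwich estimator by n/(n − k), which is our understanding of the robust option for regress; treat that correspondence as an assumption. The coefficients are computed once and are the same under both.

```python
def lpm_standard_errors(x, y):
    """OLS of binary y on one regressor x: coefficients plus conventional and HC1 SEs."""
    n, k = len(y), 2   # k = number of estimated coefficients (intercept + slope)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Homoskedasticity-only standard error
    s2 = sum(e * e for e in resid) / (n - k)
    se_conventional = (s2 / s_xx) ** 0.5
    # Heteroskedasticity-robust (HC1) standard error: sandwich formula times n / (n - k)
    meat = sum((xi - x_bar) ** 2 * e * e for xi, e in zip(x, resid))
    se_robust = (n / (n - k) * meat / s_xx ** 2) ** 0.5
    return b0, b1, se_conventional, se_robust


# Made-up binary outcome data
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 1, 0, 1]
b0, b1, se_conv, se_rob = lpm_standard_errors(x, y)
print(f"slope {b1:.4f}, conventional SE {se_conv:.4f}, robust SE {se_rob:.4f}")
```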

95
Q

Logit and Probit models are both inherently _________

A

homoskedastic

96
Q

How can iterations in logistic regression be understood? (real life example)

A

Think of tuning a radio station for a clear signal.

97
Q

Why is the probit model inherently homoskedastic?

A

In the probit model, the error term of the underlying latent-variable model is assumed to follow a standard normal distribution. Its variance is therefore fixed (normalized to 1) and does not change across levels of the predictors.

98
Q
A
99
Q

What’s the analogy for the second iteration in logistic regression? (fine tuning radio example)

A

It’s like fine-tuning the radio to reduce static. The model fine-tunes how predictors affect outcomes, reducing “noise” and improving the understanding of what’s happening in the data.

100
Q

How does log likelihood relate to these iterations?

A

Log likelihood values measure how much clearer the signal (or model fit) gets with each adjustment. The goal is to adjust until further changes don’t significantly improve the model’s clarity, indicating convergence.
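A minimal sketch of maximum likelihood iterations for a logit with one regressor, using Newton-Raphson on made-up data. Each pass records the log likelihood and then updates the coefficients. The loop stops once the log likelihood barely changes (convergence). This sketch starts from both coefficients at zero, so its first value is not the same starting point Stata's iteration log uses; it illustrates the idea rather than replicating Stata. The function name `logit_newton` is hypothetical.

```python
import math


def logit_newton(x, y, max_iter=25, tol=1e-10):
    """Maximum likelihood for P(y=1|x) = Lambda(b0 + b1 x) via Newton-Raphson.

    Returns the estimates and the log likelihood recorded at each iteration."""
    b0, b1 = 0.0, 0.0
    history = []
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        ll = sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
                 for yi, pi in zip(y, p))
        history.append(ll)
        # Score (gradient of the log likelihood)
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix (minus the Hessian), a symmetric 2x2 matrix
        w = [pi * (1 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
        # Stop when the log likelihood no longer improves noticeably
        if len(history) > 1 and abs(history[-1] - history[-2]) < tol:
            break
    return b0, b1, history


# Made-up data (no perfect separation, so the maximum exists)
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0_hat, b1_hat, ll_history = logit_newton(x, y)
for i, ll in enumerate(ll_history):
    print(f"Iteration {i}: log likelihood = {ll:.6f}")
```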

101
Q

What does the Stata output Prob > chi2 for logit and probit models mean?

A

It is the probability of obtaining a chi-square statistic at least as large as the observed one, given that the null hypothesis is true. In other words, it is the probability of obtaining a chi-square statistic of 71.05 or larger if the independent variables, taken together, in fact have no effect on the dependent variable. So a small Prob > chi2 is evidence against the null.
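The statistic behind Prob > chi2 is the likelihood ratio LR = 2(ln L_full − ln L_restricted), where the restricted model contains only a constant. Below is a sketch for the case of one regressor (1 degree of freedom), with hypothetical log likelihoods. It uses the fact that a χ²(1) variable is the square of a standard normal; more degrees of freedom would need a chi-squared CDF, which the Python standard library does not provide. The function name is illustrative.

```python
from statistics import NormalDist


def lr_test_one_df(ll_restricted, ll_full):
    """Likelihood ratio statistic and p-value for one restriction (1 df)."""
    lr = 2.0 * (ll_full - ll_restricted)
    # A chi2(1) variable is Z^2, so P(chi2_1 > lr) = 2 * (1 - Phi(sqrt(lr)))
    p_value = 2.0 * (1.0 - NormalDist().cdf(lr ** 0.5))
    return lr, p_value


lr, p = lr_test_one_df(ll_restricted=-50.0, ll_full=-40.0)   # hypothetical values
print(f"LR chi2(1) = {lr:.2f}, Prob > chi2 = {p:.6f}")
```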

102
Q
A
103
Q

When explaining how there is measurement error, expand the model with the measurement error and then compare it to the original one

A
104
Q

Researchers can check whether instruments are strong enough by doing a ______ test

A

A (first-stage) F test. The condition here is that the number of instruments is at least equal to the number of endogenous regressors. As a rule of thumb, the instruments are considered strong if the F value is greater than 10.

105
Q

Can it be tested whether the two instruments are exogenous? If yes explain how, if no explain why

A

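Yes. With two instruments and one endogenous regressor, the coefficient is overidentified (there are more instruments than endogenous regressors), so the overidentifying restriction can be tested with the J test. The test is carried out under the maintained assumption that at least one of the instruments is valid, which is enough to identify the coefficient.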

106
Q

Steps for J test for exogeneity of instruments:

A

(i) Regress the IV-residuals on all exogenous variables TF TM SP IP.
(ii) Calculate the partial F statistic (F) of removing TF and TM from the regression.
(iii) Calculate J = mF = 2F.
(iv) If J > χ²[df = m − k = 2 − 1 = 1; α = 0.05] = 3.84, then reject exogeneity of the instruments.
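The decision rule in step (iv) can be sketched in Python. The F value passed in is hypothetical. The sketch only handles one overidentifying restriction (m − k = 1), where the χ²(1) critical value equals 1.96² ≈ 3.84. The function name `j_test` is illustrative.

```python
from statistics import NormalDist


def j_test(f_partial, m, k, alpha=0.05):
    """J = m * F compared with the chi2(m - k) critical value.

    This sketch only covers m - k = 1, where chi2(1) is the square of a
    standard normal, so the critical value can be computed with the stdlib."""
    if m - k != 1:
        raise ValueError("sketch only handles one overidentifying restriction")
    crit = NormalDist().inv_cdf(1 - alpha / 2) ** 2   # about 3.84 for alpha = 0.05
    j = m * f_partial
    return j, crit, j > crit


# Hypothetical partial F from step (ii): two instruments (TF, TM), one endogenous regressor (T)
j, crit, reject = j_test(f_partial=2.5, m=2, k=1)
print(f"J = {j:.2f}, critical value = {crit:.2f}, reject exogeneity: {reject}")
```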

107
Q

When they ask you which regression model do you prefer, what Test should you use?

A

The F test comparing the restricted and unrestricted models, because it allows you to test a joint hypothesis

108
Q

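Degrees of freedom for the F test =

A

q (the number of restrictions) in the numerator, and n − k − 1 in the denominator, where k is the number of regressors in the unrestricted model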
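A minimal sketch of the homoskedasticity-only F statistic for q restrictions, F = [(SSR_r − SSR_ur)/q] / [SSR_ur/(n − k_ur − 1)], with hypothetical sums of squared residuals. The function name `f_stat` is illustrative.

```python
def f_stat(ssr_r, ssr_ur, q, n, k_ur):
    """Homoskedasticity-only F statistic for q restrictions.

    ssr_r: SSR of the restricted model; ssr_ur: SSR of the unrestricted model;
    k_ur: number of regressors in the unrestricted model (excluding the intercept).
    """
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k_ur - 1))


F = f_stat(ssr_r=120.0, ssr_ur=100.0, q=2, n=103, k_ur=2)   # hypothetical values
print(F)
```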

109
Q

What happens when we leave out a regressor that is positively correlated with the dependent variable?

A

Suppose the omitted regressor has a positive effect on the dependent variable. If we leave it out, another regressor that is positively correlated with the omitted variable will be overestimated. Part of that regressor's estimated effect is then not due to the regressor itself, but to the omitted variable. To draw this conclusion, we need to establish a connection between the omitted variable and the other regressor.

110
Q
A
111
Q

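Formula for the F test within one sample

A

Remember that the numerator is divided by k (the number of restrictions tested) and the denominator by its degrees of freedom, n − k − 1, rather than just N (the two are close in large samples)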

112
Q

Formula for the strength of the instrument when we only have one instrument :

A

F = t squared
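A sketch checking F = t² for a first stage with one instrument, on made-up data. The t-statistic of the instrument, once squared, equals the F-statistic comparing the first stage with an intercept-only model. Here F = 4.5 < 10, so by the rule of thumb this hypothetical instrument would be weak. The function name is illustrative.

```python
def first_stage_t_and_f(z, x):
    """Regress x on a single instrument z (with intercept); return the
    t-statistic on z and the F-statistic against the intercept-only model."""
    n = len(z)
    mz, mx = sum(z) / n, sum(x) / n
    s_zz = sum((zi - mz) ** 2 for zi in z)
    a1 = sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x)) / s_zz
    a0 = mx - a1 * mz
    ssr_ur = sum((xi - a0 - a1 * zi) ** 2 for zi, xi in zip(z, x))
    ssr_r = sum((xi - mx) ** 2 for xi in x)         # restricted model: intercept only
    se_a1 = (ssr_ur / (n - 2) / s_zz) ** 0.5        # homoskedasticity-only SE
    t = a1 / se_a1
    f = ((ssr_r - ssr_ur) / 1) / (ssr_ur / (n - 2))  # q = 1 restriction
    return t, f


# Made-up data: z is the instrument, x the endogenous regressor
z = [1, 2, 3, 4, 5]
x = [2, 4, 5, 4, 5]
t, f = first_stage_t_and_f(z, x)
print(f"t^2 = {t ** 2:.3f}, F = {f:.3f}, strong (F > 10)? {f > 10}")
```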

113
Q

What two conditions need to hold for omitted variable bias to arise?

A

The omitted variable must be correlated with another independent variable in the restricted model AND the omitted variable must be a determinant of the dependent variable
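A simulation sketch of both conditions, with made-up parameters. Here x2 is a determinant of y (β2 = 2) and is correlated with x1 (δ = cov(x1, x2)/var(x1) = 0.5). If x2 is omitted, the slope on x1 converges to β1 + β2·δ = 1 + 2·0.5 = 2 instead of the true value 1, so it is overestimated. The names `x1`, `x2` and `slope` are illustrative.

```python
import random


def slope(x, y):
    """OLS slope of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))


random.seed(4)
n = 20_000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.5 * a + random.gauss(0, 1) for a in x1]    # correlated with x1 (delta = 0.5)
y = [1.0 + 1.0 * a + 2.0 * b + random.gauss(0, 1)  # x2 determines y (beta2 = 2)
     for a, b in zip(x1, x2)]

b1_short = slope(x1, y)   # x2 omitted: converges to 1 + 2 * 0.5 = 2
print(f"true beta1 = 1, estimate with x2 omitted = {b1_short:.3f}")
```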

114
Q

How can I test if an omitted variable is correlated with another independent variable in Stata?

A

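To check the correlation itself, regress the omitted variable on the other independent variable(s) (or use correlate) and look at the t- or F-statistic. To check whether the omitted variable matters for the dependent variable, use an F-test comparing two regression models: one with the omitted variable and another without it. Run the unrestricted model with all relevant variables, including the omitted one, then run the restricted model without the omitted variable. Finally, use the test command to assess the joint significance of the omitted variable in the unrestricted model.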

115
Q

Instruments that explain little variation in the endogenous regressor X are called __________

A

weak instruments

116
Q

If the F-statistic is less than 10, the instruments are weak such that the TSLS estimate of the coefficient on X is ____ and no _______statistical inference about its true value can be made.

A

biased, valid

117
Q

When does imperfect multicollinearity occur?

A

When an explanatory variable is highly (but not perfectly) correlated with another explanatory variable

118
Q

Exogeneity is tested using the ____ test

A

J

119
Q
A