Econometrics - Theory Flashcards

Question

How can the Sample Selection Bias problem be described?

Answer 1

It can be viewed either as a consequence of nonrandom sampling or as a missing data issue. For instance, a random sample of car owners with phones isn’t the same as a random sample of voters.

Answer 2

The best solution is to design studies to avoid it. For instance, estimating the mean height of undergraduates should involve a random sample of all undergraduates, not just those entering a basketball court.

Answer 3

Health insurance is compulsory in the Netherlands, so there is no problem with the selectivity of the sample if the sample is randomly drawn from all insurance companies.

Answer 4

Assuming a homoskedastic wi and since the LSA conditions hold, the standard errors are calculated correctly and therefore also the confidence intervals.

Answer 5

Look at the t-values and p-values first, then at the F-test comparing the unrestricted (UR) and restricted (R) models.

Answer 6

t = (β̂1 − β1,0) / SE(β̂1)

Answer 7

Yes, because Amsterdam will not be representative of the entire Dutch population as there are a lot of students and expats. Furthermore, young people tend to be taller. Furthermore, people from below the large rivers (Lek, Waal and Maas) are known to be shorter than those from above these rivers.

Answer 8

Real X + w (measurement error term)

Answer 9

strong evidence for heteroskedasticity

Answer 10

If the measurement error term, w, is homoskedastic, and if the LSA conditions hold

Answer 11

When a redundant variable is added and it is correlated with other regressors, the model becomes less efficient. For instance, if the added variable is negatively correlated with one variable (say 'Jap'), it might be positively correlated with another ('Time'). As a consequence, the standard deviations of the coefficient estimators for 'Jap' and 'Time' increase, making these estimators less precise. In the output this shows up as larger standard errors, and therefore t-ratios that move towards 0.

Answer 12

the covariance between x and z is unequal to zero and the covariance between z and error term is equal to zero

Answer 13

Instruments need to satisfy two critical conditions: exogeneity (uncorrelated with the error term) and relevance (correlated with the endogenous variable). Exogenous variables are, by definition, uncorrelated with the error term, so they meet the exogeneity condition; whether they also meet the relevance condition has to be checked separately.

Answer 14

Exogeneity: the instrument is uncorrelated with the error term. Relevance: the instrument is correlated with the endogenous explanatory variable. In addition, the instrument cannot be part of the initial regression model.

Answer 15

Exogenous variables are independent, and endogenous variables are dependent. Therefore, if the variable does not depend on variables within the model, it's an exogenous variable. If the variable depends on variables within the model, though, it's endogenous.

Answer 16

This command runs an instrumental variable regression where S is the dependent variable, T is the endogenous regressor, TF and TM are the instruments for T, while SP and IP are exogenous variables. It's specifying that T is endogenous and should be instrumented by TF and TM.

Answer 17

ivreg S (T=TM TF) SP IP, wherein S is the dependent variable, T is the endogenous regressor, which is instrumented by TM and TF, and SP and IP are the exogenous variables.

Answer 18

Endogeneity refers to a situation where an independent variable is correlated with the error term, for example because of an omitted variable or simultaneous causation. It makes the OLS estimates biased and inconsistent.

Answer 19

The first stage regression in 2SLS aims to predict the potentially endogenous variable using instrumental variables, thereby creating adjusted values that aren't correlated with the error term.

Answer 20

Instruments in the first stage of 2SLS are variables chosen for their lack of correlation with the error term but correlation with the potentially endogenous variable. For instance, TF and TM might be instruments for predicting T.

Answer 21

Predicted values are saved in the first stage to create a new variable (TFIT) that contains the predicted values of the potentially endogenous variable (T).

Answer 22

The second stage regression in 2SLS estimates the relationship between the dependent variable and the predicted values of the potentially endogenous variable, while controlling for the exogenous variables.

Answer 23

By using the predicted values of the potentially endogenous variable from the first stage, the second stage regression drops the part of that variable that is correlated with the error term. This provides consistent estimates of the effect of the potentially endogenous variable on the dependent variable.
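
The two stages described above can be sketched on simulated data. This is only an illustration, not the course's Stata code: the data-generating process, the single instrument TF, the absence of SP and IP, and the helper simple_ols are all assumptions made to keep the example short. The true effect of T on S is set to 1.5.

```python
import random

def simple_ols(x, y):
    """Return (intercept, slope) from OLS of y on a single regressor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

random.seed(42)
n = 5000

# Hypothetical data: TF is a valid instrument, T is endogenous because
# its error v also enters the structural error u.
TF = [random.gauss(0, 1) for _ in range(n)]
v = [random.gauss(0, 1) for _ in range(n)]
u = [0.8 * vi + random.gauss(0, 1) for vi in v]
T = [tf + vi for tf, vi in zip(TF, v)]
S = [2.0 + 1.5 * t + ui for t, ui in zip(T, u)]  # true effect of T is 1.5

# Naive OLS of S on T: inconsistent because T is correlated with u.
_, b_ols = simple_ols(T, S)

# Stage 1: regress T on the instrument and save the predicted values.
a1, b1 = simple_ols(TF, T)
TFIT = [a1 + b1 * tf for tf in TF]

# Stage 2: regress S on the predicted values from stage 1.
_, b_2sls = simple_ols(TFIT, S)

print("OLS slope: ", round(b_ols, 3))
print("2SLS slope:", round(b_2sls, 3))
```

The OLS slope should land well above 1.5, while the 2SLS slope should be close to it. Note that running the second stage by hand like this gives the right coefficient but the wrong standard errors; a dedicated IV command corrects them.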

Answer 24

Instrumental variables isolate the part of the variation in the potentially endogenous variable that is uncorrelated with the error term, allowing estimation of causal relationships in the presence of endogeneity.

Answer 25

Endogenous variables are correlated with the error term, causing potential bias, while exogenous variables are not correlated with the error term and aren't influenced by other variables in the model.

Answer 26

The 2SLS method yields consistent estimates by first predicting the potentially endogenous variable using instruments in the first stage, then using these predicted values in the second stage to estimate the relationship between the variables, which addresses the endogeneity concern.

Answer 27

The purpose of testing instrument strength in 2SLS regression is to assess whether the chosen instruments (TF and TM) are sufficiently correlated with the potentially endogenous variable (T). That is why T is regressed on TF and TM: to check for non-zero covariance.

Answer 28

Researchers can assess instrument strength in Stata by using the regress command to estimate the first stage regression and then employing the test command to check the joint significance of the instruments.

Answer 29

The command regress T TF TM SP IP runs a regression where T is regressed on TF, TM, SP, and IP, evaluating the relationship between the potentially endogenous variable and its instruments along with exogenous variables.

Answer 30

The F-statistic obtained from the test command helps evaluate the joint significance of the instruments. A larger F-statistic indicates greater explanatory power of the instruments in predicting T.

Answer 31

An F-test is used to assess joint significance because it checks whether both instruments together significantly contribute to explaining the variation in the potentially endogenous variable, unlike a t-test that examines individual coefficients.

Answer 32

A larger F-value suggests that the instruments (TF and TM) are stronger and more relevant in predicting the potentially endogenous variable T. It supports the relevance condition only; it says nothing about whether the instruments are exogenous.
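
The first-stage F-test can be sketched as a comparison of an unrestricted first stage (T on TF, TM, SP, IP) with a restricted one that drops the instruments (T on SP, IP). The simulated data, the coefficient values and the helper ols_ssr are assumptions for illustration only; the F formula is the homoskedasticity-only version.

```python
import random

def ols_ssr(rows, y):
    """Return the sum of squared residuals from OLS of y on rows (intercept added)."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    # Normal equations (X'X) b = X'y stored as an augmented matrix.
    A = [[sum(xi[a] * xi[b] for xi in X) for b in range(p)]
         + [sum(xi[a] * yi for xi, yi in zip(X, y))] for a in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [v - f * w for v, w in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return sum((yi - sum(b * v for b, v in zip(beta, xi))) ** 2
               for xi, yi in zip(X, y))

random.seed(0)
n = 500
TF = [random.gauss(0, 1) for _ in range(n)]
TM = [random.gauss(0, 1) for _ in range(n)]
SP = [random.gauss(0, 1) for _ in range(n)]
IP = [random.gauss(0, 1) for _ in range(n)]
# Hypothetical first stage: both instruments matter for T.
T = [1 + 0.5 * tf + 0.5 * tm + 0.3 * sp + 0.2 * ip + random.gauss(0, 1)
     for tf, tm, sp, ip in zip(TF, TM, SP, IP)]

ssr_u = ols_ssr(list(zip(TF, TM, SP, IP)), T)  # unrestricted first stage
ssr_r = ols_ssr(list(zip(SP, IP)), T)          # restricted: instruments dropped

q = 2  # number of restrictions (coefficients on TF and TM set to zero)
k = 4  # number of regressors in the unrestricted model
F = ((ssr_r - ssr_u) / q) / (ssr_u / (n - k - 1))
print("First-stage F:", round(F, 1), "| exceeds 10:", F > 10)
```

With instruments this strong the F-value comes out far above the rule-of-thumb threshold of 10.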

Answer 33

A significant F-statistic strengthens the credibility of the instruments in 2SLS regression, indicating that they are relevant for predicting the potentially endogenous variable. As a rule of thumb, the first-stage F should exceed 10 for the instruments to count as strong, which makes the instrumental variable approach more reliable.

Answer 34

When the number of instruments exceeds the number of endogenous variables

Answer 35

instruments because the more instruments the larger the variance of the estimator, thereby it is less efficient

Answer 36

number of instruments minus the number of endogenous regressors

Answer 37

J = m × F, where m is the number of instruments

Answer 38

It becomes impossible to statistically test the hypothesis of instrument exogeneity when there are as many instruments as there are endogenous regressors, that is, when the model is exactly identified. In short: when m = k.

Answer 39

If the coefficients are overidentified, it is possible to test the overidentifying restrictions, that is, to test the hypothesis that the “extra” instruments are exogenous under the maintained assumption that there are enough valid instruments to identify the coefficients of interest.

Answer 40

The instrument should be correlated with the endogenous regressor

Answer 41

the instrument should be uncorrelated with error term u, or in other words, there should be no direct effect of the instrument on the dependent variable Y through u (the error term).

Answer 42

distribution

Answer 43

at least equal

Answer 44

Remember to show this mathematically as well: S = β0 + β1(T − ν) + β2SP + β3IP + u*
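
One way to write out the steps behind that equation, assuming ν denotes the first-stage residual (so the first-stage fitted value is T − ν) and u* collects everything left over:

```latex
\begin{align*}
S &= \beta_0 + \beta_1 T + \beta_2 SP + \beta_3 IP + u
   && \text{(structural equation)} \\
T &= \hat{T} + \nu
   && \text{(first stage: fitted value plus residual)} \\
S &= \beta_0 + \beta_1 (T - \nu) + \beta_2 SP + \beta_3 IP + u^{*},
   \qquad u^{*} = u + \beta_1 \nu
\end{align*}
```

The last line is the second-stage regression: S is regressed on T − ν (the fitted value), and the part of T that may be correlated with u has been moved into the new error term u*.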

Answer 45

type of credit card, choice of streaming program,

Answer 46

schooling level,

Answer 47

employed, having savings

Answer 48

A linear model isn't ideal for probabilities because it can predict values beyond the bounds of 0 and 1, which are the limits for probabilities. This can lead to unrealistic predictions such as probabilities greater than 1 or less than 0. In short: fitted (predicted) values might lie outside the interval [0, 1].
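
A small made-up data set shows the problem. The ten observations below are hypothetical and chosen only so that the arithmetic is easy to follow; the fit is plain OLS of the binary outcome on one regressor.

```python
# Hypothetical data: y switches from 0 to 1 halfway through the range of x.
x = list(range(10))
y = [0] * 5 + [1] * 5

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# OLS slope and intercept of the linear probability model.
slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
         / sum((a - mx) ** 2 for a in x))
intercept = my - slope * mx

# Fitted "probabilities" for every observation.
fitted = [intercept + slope * xi for xi in x]

# Observations whose fitted value is not a valid probability.
outside = [(xi, round(p, 3)) for xi, p in zip(x, fitted) if p < 0 or p > 1]
print(outside)
```

At the low end of x the fitted value is negative and at the high end it exceeds 1, even though every observed y is 0 or 1.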

Answer 49

Predicted probabilities outside this range can be nonsensical (less than 0 or greater than 1), challenging the fundamental laws of probability. This can lead to unrealistic interpretations of event likelihoods.

Answer 50

This is because, for a given X, the error term can take on only two values (one for Y=1 and one for Y=0). It therefore cannot be normally distributed, so a normal distribution is a poor approximation here, and least squares is not efficient.

Answer 51

dependent, binary (thus dummy)

Answer 52

Because the logit model keeps predicted probabilities inside the (0, 1) range, it gives more sensible results than the LPM.

Answer 53

The LPM's coefficients directly represent how the probability of an event (binary dependent variable) changes for every one-unit change in an independent variable, making it easy to understand and communicate the impact of regressors

Answer 54

The LPM's assumption of a linear relationship between independent variables and the probability of the dependent variable simplifies modeling and interpretation in scenarios where this linear approximation adequately captures the relationship.

Answer 55

Assuming certain conditions, like no omitted variable bias, no perfect multicollinearity, and no endogeneity, estimators in the LPM are unbiased, meaning they are, on average, equal to the true population parameters. Additionally, they are consistent: they converge to the true values as the sample size grows.

Answer 56

The model's straightforward linear structure simplifies analysis and comprehension, making it accessible for those seeking a basic but interpretable approach to studying relationships between variables.

Answer 57

Using Maximum Likelihood (

Answer 58

The shortcoming of the LPM is that the predicted values can be outside the (0,1) interval.

Answer 59

We look at the derivative of y with respect to the explanatory variables.
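
For reference, a sketch of what that derivative looks like in the three binary-outcome models, assuming a single index z = β0 + β1X1 + … + βkXk (notation introduced here for illustration):

```latex
\begin{align*}
\text{LPM:}    \quad & \frac{\partial P(Y=1 \mid X)}{\partial X_j} = \beta_j \\
\text{Logit:}  \quad & \frac{\partial P(Y=1 \mid X)}{\partial X_j}
                       = \Lambda(z)\,\bigl(1 - \Lambda(z)\bigr)\,\beta_j,
                       \qquad \Lambda(z) = \frac{1}{1 + e^{-z}} \\
\text{Probit:} \quad & \frac{\partial P(Y=1 \mid X)}{\partial X_j}
                       = \phi(z)\,\beta_j
\end{align*}
```

In the LPM the effect is constant, while in logit and probit it depends on the values of the regressors through z, so it has to be evaluated at a chosen point (for example at the sample means).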

Answer 60

constant, B (estimated slope)

Answer 61

Robust or not robust, the coefficient estimates are the same; only the standard errors change.

Answer 62

The robust option in Stata is valuable for linear probability models because a binary outcome (0 or 1) makes the error term heteroskedastic by construction. The option replaces the conventional standard errors with heteroskedasticity-robust ones, so t-statistics and confidence intervals are more reliable. The coefficient estimates themselves do not change, and the option does not deal with outliers.

Answer 63

regress y x1 x2, robust. In this syntax, y represents the binary dependent variable, while x1 and x2 denote independent variables. Adding robust prompts Stata to report heteroskedasticity-robust standard errors; the coefficient estimates are unchanged.

Answer 64

homoskedastic

Answer 65

Think of tuning a radio station for a clear signal.

Answer 66

In the probit model, the error term follows a normal (Gaussian) distribution. The normal distribution is characterized by constant variance, which means the spread or dispersion of the errors remains the same across various levels of the predictors.

Answer 67

It's like fine-tuning the radio to reduce static. The model fine-tunes how predictors affect outcomes, reducing "noise" and improving the understanding of what's happening in the data.

Answer 68

Log likelihood values measure how much clearer the signal (or model fit) gets with each adjustment. The goal is to adjust until further changes don't significantly improve the model's clarity, indicating convergence.

Answer 69

This is the probability of obtaining the chi-square statistic given that the null hypothesis is true. In other words, this is the probability of obtaining a chi-square statistic at least as large as this one (71.05) if there is in fact no effect of the independent variables, taken together, on the dependent variable. So a small Prob > chi2 is evidence against the null hypothesis.

Answer 70

The F-test, under the condition that the number of instruments is at least equal to the number of endogenous regressors. It is also worth adding that, as a rule of thumb, the instruments are considered strong when the F-value is greater than 10.

Answer 71

We can check instruments for exogeneity when there are more instruments than endogenous regressors. In that case the coefficients are overidentified, and it is possible that only some of the instruments are valid.

Answer 72

(i) Regress the IV-residuals on all exogenous variables TF TM SP IP. (ii) Calculate the partial F statistic (F) of removing TF and TM from the regression. (iii) Calculate J = mF = 2F. (iv) If J > χ²[df = m − k = 2 − 1 = 1; α = 0.05] = 3.84, then reject exogeneity of the instruments.
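
The four steps can be written compactly as follows. The γ coefficients and the error e are notation introduced here; the residual û is the IV residual from step (i), computed with the actual T rather than its first-stage prediction.

```latex
\begin{align*}
\hat{u}^{IV} &= \gamma_0 + \gamma_1 TF + \gamma_2 TM + \gamma_3 SP + \gamma_4 IP + e \\
H_0 &: \gamma_1 = \gamma_2 = 0 \quad \text{(tested with the partial } F\text{)} \\
J &= mF = 2F, \qquad J \sim \chi^2_{m-k} = \chi^2_{1} \ \text{under } H_0 \\
\text{Reject exogeneity if } & J > 3.84 \quad (\alpha = 0.05)
\end{align*}
```

Here m = 2 is the number of instruments and k = 1 the number of endogenous regressors, so the test has one degree of freedom. The chi-square approximation relies on homoskedastic errors.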

Answer 73

The F-test comparing the restricted and unrestricted models; it allows you to test a joint hypothesis.

Answer 74

q (the number of restrictions)

Answer 75

Since the regressor has a positive effect on the dependent variable, leaving it out means that the coefficient of another regressor will be overestimated: part of that regressor's estimated effect is not due to the regressor itself but to the omitted variable. To make this argument we need to establish a connection between the omitted variable and the other regressor.

Answer 76

Remember that the numerator is divided by k and the denominator by its degrees of freedom, N − k − 1 (which is close to N in large samples).

Answer 77

F = t squared (when a single restriction is tested)

Answer 78

Another independent variable in the restricted model should be correlated with the omitted variable AND the omitted variable should be a determinant of the dependent variable.

Answer 79

You can test this using an F-test comparing two regression models: one with the omitted variable and another without it. Run the unrestricted model with all relevant variables, including the omitted one, then run the restricted model without the omitted variable. Finally, use the test command to assess the joint significance of the omitted variable in the unrestricted model.

Answer 80

weak instruments

Answer 81

biased, valid

Answer 82

When a variable is correlated with another explanatory variable