Dick butt (week 1-7) Flashcards

1
Q

What are the assumptions of MLR?

A
MLR 1: Linear in parameters
MLR 2: Random sampling 
MLR 3: No perfect collinearity
MLR 4: Zero conditional mean
MLR 5: Homoscedasticity
2
Q

What does MLR1-4 ensure?

A

Unbiasedness of the OLS estimators

3
Q

In a regression y = beta0 + beta1*x1 + beta2*x2 + u, if x2 is omitted, which of the following are correct?

A) When beta2 > 0 and corr(x1, x2) > 0, there is a positive bias
B) When beta2 < 0 and corr(x1, x2) > 0, there is a negative bias
C) When beta2 > 0 and corr(x1, x2) < 0, there is a positive bias
D) When beta2 < 0 and corr(x1, x2) < 0, there is a negative bias
E) A and B are correct
F) All of the above are correct

A

E) A and B are correct
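The sign logic in options A and B can be checked with a small simulation. A minimal pure-Python sketch (the function name `ovb_demo`, the coefficient values, and the 0.5 loading of x2 on x1 are illustrative assumptions, not from the deck): with beta2 > 0 and corr(x1, x2) > 0, the short regression's slope estimate lands above the true beta1.

```python
import random

random.seed(0)

def ovb_demo(n=20000, beta1=1.0, beta2=2.0):
    # x2 loads positively on x1, so corr(x1, x2) > 0; with beta2 > 0
    # omitting x2 should bias the slope on x1 upward
    x1 = [random.gauss(0, 1) for _ in range(n)]
    x2 = [0.5 * a + random.gauss(0, 1) for a in x1]
    y = [beta1 * a + beta2 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]
    # short regression of y on x1 alone (x2 omitted)
    mx, my = sum(x1) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x1, y))
    var = sum((a - mx) ** 2 for a in x1)
    return cov / var

slope = ovb_demo()
print(slope)  # well above the true beta1 = 1.0 (positive bias)
```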

4
Q

Define the causal effect of x on y

A

How does variable y change if variable x is changed but all other relevant factors are held constant?

5
Q

What is cross-sectional data?

A

Data collected by observing many subjects at one point or period in time

6
Q

What is time series data?

A

Observations of a variable or several variables over time

7
Q

Time series observations are typically…

A

Serially correlated

8
Q

What is pooled cross-sectional data?

A

Two or more cross sections combined into one data set

9
Q

Cross sections in pooled cross-sectional data are…

A

Drawn independently of each other

10
Q

What is panel or longitudinal data?

A

The same cross-sectional units are followed over time

11
Q

What are 3 attributes of panel data?

A

1) Has both cross-sectional and time series dimensions
2) Can be used to account for time-invariant unobservables
3) Can be used to model lagged responses

12
Q

What does the error term ‘u’ capture?

A

1) Randomness in behaviour
2) Variables left out of the model
3) Deviations from linearity
4) Errors in measurement

13
Q

What is the key assumption about the error term in the regression model?

A

u is mean independent of x: E(u|x) = E(u), i.e. knowing x does not tell us anything about the average of u. Combined with the normalization E(u) = 0, the zero conditional mean assumption is E(u|x) = E(u) = 0

14
Q

What does the zero conditional mean imply about the expected value of the dependent variable?

A

This means that the average value of the dependent variable y across the population can be expressed as a linear function of the explanatory variable x

15
Q

What are regressions?

A

Linear functions with a constant and slope coefficients which illustrate how y changes as x changes

16
Q

Which of the following statements about the Zero Conditional Mean Assumption are true?

A) It can be written as E(u|x) = E(u) = 0
B) The error is always centered in our prediction.
C) By calculating the expected value (average) of the disturbance term given the value(s) of x, it must equal the average of u, where the avg. of u = 0.
D) u does not vary with x on average.
E) All of the above

A

E) All of the above

17
Q

How are OLS estimates obtained?

A

1) Fitting a line through the sample points
2) Minimizing the residual sum of squares (RSS)
3) The minimizers are the least squares estimates

18
Q

How do you derive the OLS estimator?

A

1) Define fitted values for y and residuals
2) Choose parameters to minimize sum of squares
3) Take derivatives with respect to the parameters and set them equal to 0, leading to first order conditions
4) Solve for the intercept
5) Then solve for estimated coefficient by substituting the solutions for the intercept

19
Q

What are the functions of a multiple regression model?

A
  • Explains variable y in terms of variables x1 to xk
  • Incorporates more explanatory factors into the model
  • Explicitly holds fixed factors that otherwise would be within the disturbance term → makes the conditional mean independence more likely to hold
  • Allows for more flexibility in analysis → can hold certain variables fixed to analyse the impact of one particular variable on y
  • In a simple regression model, there would be a biased estimate where one factor would inherently include the impact of another factor that has not been included
20
Q

Logarithmic models show the elasticities between y and x, while still possibly being linear in parameters

True
False

A

True

21
Q

How do you interpret a multiple regression model?

A

the dependent variable changes if the nth independent variable is increased by one unit, holding all other independent variable and the error term constant (ceteris paribus)

22
Q

Linear in parameters

A

In the population, the relationship between y and x is linear

23
Q

Random Sampling

A

The data is a random sample drawn from the population

24
Q

No perfect collinearity

A

None of the explanatory variables are constant and there are no exact linear relationships among the explanatory variables

25
Q

Zero conditional mean

A

The values of the explanatory variables must contain no information about the mean of the unobserved factors, so the regressors must be exogenous

26
Q

Homoscedasticity

A

The value of the explanatory variables must contain no information about the variance of the unobserved factors.

27
Q

Random (or Stochastic) Variable

A

A measurable function from a set of possible outcomes to a measurable space.

28
Q

Static Model

A

A contemporaneous relationship between y and z.

29
Q

Dynamic Model

A

A model where the past changes can affect the future.

30
Q

Temporary change in z

A

Suppose that z is equal to c in all time periods before time t. At time t, z increases by one unit to c + 1 and then reverts to its previous level at time t + 1.

31
Q

Normality

A

The error is independent of the explanatory variables and is normally distributed with zero mean and variance sigma^2.

32
Q

BLUE

A

Best Linear Unbiased Estimator: under the Gauss-Markov assumptions (MLR.1–MLR.5), OLS is BLUE, i.e. it has the smallest variance among all linear unbiased estimators.

33
Q

The equation E(u|x) = E(u) = 0 implies what about the error?

A) The error is always centered in our prediction
B) The error is usually centered in our prediction
C) The error cannot be predicted

A

A) The error is always centered in our prediction

34
Q

What does OLS aim to do?

A

It aims to find the best possible fit for the regression. That means errors/residuals are as small as possible

35
Q

How are sample estimates of u (regression residuals) found?

A

Sample estimates of u (regression residuals) are found by taking a sample of y values indexed by i (from 1 to n), then subtracting the fitted (predicted) value of y from each observed y

36
Q

For OLS estimators, how do we find the slope coefficient?

A

The sample covariance of x and y divided by the sample variance of x

37
Q

For OLS estimators, how do we find the constant (intercept)?

A

The average of the y values minus the estimated slope coefficient multiplied by the average of the x values.
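The two formulas from this card and the previous one (slope = sample cov(x, y)/var(x); intercept = ȳ − slope·x̄) can be sketched in a few lines of pure Python; the data here are made up so the points lie exactly on y = 2 + 3x and the formulas recover those values:

```python
# x and y chosen to lie exactly on y = 2 + 3x, so the OLS formulas
# should recover slope 3 and intercept 2
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 + 3.0 * xi for xi in x]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
# slope: sample covariance of x and y divided by sample variance of x
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
      / sum((xi - x_bar) ** 2 for xi in x)
# intercept: mean of y minus slope times mean of x
intercept = y_bar - slope * x_bar
print(slope, intercept)  # 3.0 2.0
```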

38
Q

How is the OLS estimator derived?

A

Step 1: Define fitted values for y and residuals
Step 2: Choose parameters to minimize sum of squares
Step 3: Take derivatives with respect to the parameters and set them equal to 0, leading to first order conditions
Step 4: Solve for estimated constant (intercept)
Step 5: Then solve for estimated coefficient parameter by substituting the solution for the intercept

39
Q

What are 5 attributes of a multiple linear regression model?

A

1) Explains variable y in terms of variables x1 to xk
2) Incorporates more explanatory factors into the model
3) Explicitly holds fixed factors that otherwise would be within the disturbance term → makes the conditional mean independence more likely to hold
4) Allows for more flexibility in analysis → can hold certain variables fixed to analyse the impact of one particular variable on y
5) In a simple regression model, there would be a biased estimate where one factor would inherently include the impact of the other that has not been included. In multiple linear regression, this is minimized.

40
Q

The model has to be linear in parameters, not in the variables. Thus logarithmic models can still be linear in parameters.

True
False

A

True

41
Q

In a semi-logarithmic model, how do we interpret the regression?

A

If the regression has log(y) as the dependent variable, the interpretation of the slope coefficient changes: 100 times the coefficient gives the approximate percentage change in y if x is increased by one unit, given that x is non-logarithmic
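As a quick numeric check (the coefficient values below are assumed for illustration), in a log-level model a one-unit increase in x changes y by roughly 100·b1 percent:

```python
import math

# hypothetical log-level model: log(y) = b0 + b1 * x, with b1 = 0.05
b0, b1 = 1.0, 0.05

def y_at(x):
    return math.exp(b0 + b1 * x)

# exact percentage change in y when x goes from 2 to 3
pct_change = (y_at(3) - y_at(2)) / y_at(2) * 100
print(pct_change)  # ~5.13%, close to the approximation 100 * b1 = 5
```

The exact change is 100·(e^b1 − 1), so the 100·b1 approximation is good only for small coefficients.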

42
Q

In a log-log model, how do we interpret the regression?

A

Now it is an elasticity → percentage change in y/percentage change in x

43
Q

Why do we introduce non-linearities? 3 reasons…

A

1) To estimate different relationships
2) Introducing logarithms may provide a more accurate/relevant interpretation of the true relationship between the variables
3) Fits the data better

44
Q

The sample average of residuals is always = 0

True
False

A

True

45
Q

The sample covariance between each independent variable and the OLS residuals = 0

True
False

A

True

46
Q

Which of the following are correct?

A) Sample averages of y and x’s lie on the regression line
B) Sum of squared residuals of y and x’s lie on the regression line
C) The standard errors for each measurement lie on the regression line

A

A) Sample averages of y and x’s lie on the regression line
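The three algebraic properties from the last few cards (residuals average zero, zero sample covariance between x and the residuals, and the point of means lying on the regression line) all follow from the OLS first order conditions. A small pure-Python check on simulated data (the coefficients and sample size are arbitrary assumptions):

```python
import random

random.seed(1)
n = 200
x = [random.uniform(0, 10) for _ in range(n)]
y = [1.0 + 0.5 * xi + random.gauss(0, 1) for xi in x]

xb, yb = sum(x) / n, sum(y) / n
b1 = sum((a - xb) * (b - yb) for a, b in zip(x, y)) / sum((a - xb) ** 2 for a in x)
b0 = yb - b1 * xb
resid = [b - (b0 + b1 * a) for a, b in zip(x, y)]

mean_resid = sum(resid) / n                                    # first FOC
cov_x_resid = sum((a - xb) * u for a, u in zip(x, resid)) / n  # second FOC
gap = (b0 + b1 * xb) - yb                                      # point of means on the line
print(mean_resid, cov_x_resid, gap)  # all (numerically) zero
```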

47
Q

Jeopardy- The proportion of the variation in the dependent variable that is explained by the explanatory variable

A

R^2

48
Q

A high R^2 does not necessarily indicate that the regression has a causal interpretation

True
False

A

True

49
Q

What are the 3 measures of variation?

A

1) Total Sum of Squares (TSS)
2) Explained Sum of Squares (ESS)
3) Residual Sum of Squares (RSS)

50
Q

What is the decomposition of the total variation?

A

TSS = ESS + RSS

Total variation = explained part + unexplained part

51
Q

Which of the following are true?
A) SSR never increases when we add additional explanatory variables to the model, thus R^2 will never decrease if another explanatory variable is added
B) An increase in R^2 is not a good tool for deciding if an additional variable should be included
C) Even if the R^2 is small, the regression may still provide good estimates of ceteris paribus effects
D) All of the above
E) None of the above

A

D) All of the above

52
Q

R^2 is equal to the squared correlation coefficient between the actual and fitted value of the independent variable

True
False

A

False

R^2 is equal to the squared correlation coefficient between the actual and fitted value of the DEPENDENT variable
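The equivalence in this card can be verified numerically. A pure-Python sketch (simulated data, arbitrary coefficients) computing R² three ways: as ESS/TSS, as 1 − RSS/TSS, and as the squared correlation between the actual and fitted values of the dependent variable:

```python
import math
import random

random.seed(2)
n = 300
x = [random.uniform(0, 10) for _ in range(n)]
y = [2.0 + 0.8 * xi + random.gauss(0, 2) for xi in x]

xb, yb = sum(x) / n, sum(y) / n
b1 = sum((a - xb) * (b - yb) for a, b in zip(x, y)) / sum((a - xb) ** 2 for a in x)
b0 = yb - b1 * xb
yhat = [b0 + b1 * a for a in x]

tss = sum((b - yb) ** 2 for b in y)               # total sum of squares
ess = sum((f - yb) ** 2 for f in yhat)            # explained sum of squares
rss = sum((b - f) ** 2 for b, f in zip(y, yhat))  # residual sum of squares

r2_a = ess / tss
r2_b = 1 - rss / tss
# correlation between actual y and fitted y (mean of yhat equals yb)
corr = sum((b - yb) * (f - yb) for b, f in zip(y, yhat)) / math.sqrt(tss * ess)
print(r2_a, r2_b, corr ** 2)  # all three agree
```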

53
Q

What is the difference between the estimator and the estimate?

A

The estimator is a rule that assigns a value of the parameter θ to each possible outcome of the sample, whereas the estimate is the numerical value taken on by an estimator in a particular sample of data

54
Q

Jeopardy- An estimator whose expected value equals the population value

A

Unbiased estimator

55
Q

What are the 5 assumptions of simple linear regression

A

Assumption SLR.1 → Linear in parameters
Assumption SLR.2 → Random sampling
Assumption SLR.3 → Sample variation in the explanatory variable; the values of the explanatory variable are not all the same (otherwise it would be impossible to study how different values of x lead to different values of y)
Assumption SLR.4 → Zero conditional mean
Assumption SLR.5 → Homoskedasticity: the value of the explanatory variable must contain no information about the variability of the unobserved factors

56
Q

On average, estimated coefficients will be equal to…

A

The values that characterise the true relationship between y and x in the population

57
Q

Estimates will have different sampling variabilities, how is this measured?

A

This is measured by the variances of the estimators of constant and slope coefficients

58
Q

If the variability is very large, it could indicate that the results are insignificant, why?

A

With larger variability, standard errors will increase, this impacts the value of the inferences we can make about the data.

59
Q

The sampling variability of the estimated regression coefficients will be…

  1. (Larger/Smaller) for a higher variability in unobserved influences
  2. (Larger/Smaller) for a higher variability in explanatory variables
A
  1. Larger for a higher variability in unobserved influences
  2. Smaller for a higher variability in explanatory variables

60
Q

If unbiased, expected value of the estimated variance is equal to the true population variance

True
False

A

True

61
Q

What are standard errors and what do they do?

A

They are estimated standard deviations of the regression coefficients. They measure how precisely the regression coefficients are estimated.

62
Q

What do we mean when we say that OLS is unbiased under MLR.1-MLR.4?

A

We mean that the procedure by which the OLS estimates are obtained is unbiased when that procedure is applied across all possible random samples.

63
Q

MLR.3 - No perfect collinearity

This only rules out perfect collinearity between explanatory variables; imperfect correlation is allowed

True
False

A

True

64
Q

Should we eliminate an explanatory variable that is a perfect linear combination of another explanatory variable?

Yes
No

A

Yes

65
Q

Why are constant variables ruled out from the regression based on MLR.3 no perfect collinearity?

A

Because they are collinear with the intercept

66
Q

What does it mean if a variable is endogenous?

A

Explanatory variables that are correlated with the error term

67
Q

What are endogenous variables in violation of and why?

A

MLR.4 Zero Conditional Mean: the assumption requires the explanatory variables to be uncorrelated with u, so that E(u|x) = 0, whereas endogenous variables are by definition correlated with the error term u.

68
Q

What does it mean if a variable is exogenous?

A

Explanatory variables that are uncorrelated with the error term are called exogenous

69
Q

TSS automatically increases as n increases

True
False

A

True

70
Q

What is a random variable?

A

A random variable is a variable that takes on a set of all possible numerical values that are determined probabilistically

  • Since a random variable is a collection of possibilities, only a realization of a random variable is observed.
  • Uppercase letters denote RVs and lowercase letters denote their realizations
71
Q

What are continuous random variables?

A

A continuous random variable takes on any particular real value with zero probability; probabilities are instead assigned to it taking a value within a range (interval)

72
Q

How do we show discrete random variables?

A

Via the cumulative distribution function: the CDF of a discrete random variable is the sum of its PDF over all values of xi such that xi ≤ x

73
Q

What is inference?

A

A conclusion reached on the basis of evidence and reasoning

74
Q

What is the purpose of statistical inference?

A

To obtain information about a population from information contained in a sample

75
Q

What is random sampling?

A

Each individual in the population is selected randomly, and the observations are independently and identically distributed

76
Q

What is stratified sampling?

A

Divide the population into non-overlapping groups (strata), then do a simple random sampling from each group

77
Q

What is cluster sampling?

A

The population is divided into groups (clusters), some groups are randomly selected, then all individuals within the selected groups are measured

79
Q

The t-stat will be used to test the null hypothesis → the farther the estimated coefficient is away from zero, the less likely it is that the null hypothesis holds true

True
False

A

True

80
Q

What does the t-stat measure?

A

the t-stat measures how many estimated SDs the estimated coefficient is away from zero

81
Q

How would you discuss statistical significance?

A
  1. If a variable is statistically significant, discuss the magnitude of the coefficient to get an idea of its economic or practical importance
  2. The fact that a coefficient is statistically significant does not necessarily mean it is economically or practically significant!
82
Q

How would you discuss economic significance?

A
  1. If a variable is statistically and economically important but has the "wrong" sign, the regression model might be misspecified … OR this might be the truth!
  2. If the sample size is small, effects might be imprecisely estimated so that things which are economically important may still be statistically insignificant.
83
Q

Define p-value

A

The p-value is the smallest significance level at which the null hypothesis would be rejected → an alternative to the classical approach to hypothesis testing

84
Q

What is testing for exclusion restrictions?

A

Testing, with an F-test of multiple linear restrictions, whether a group of variables has no effect on the dependent variable and can therefore be excluded from the model

85
Q

After testing multiple linear restrictions, we find that none of the variables are statistically significant individually… What do we do?

A

We can test how the model would fit if these variables were dropped from the regression (testing for exclusion of multiple variables) → restricted model
  i. Check the RSS → if it increases, test whether this increase is statistically significant

86
Q

An F-distributed variable only takes on positive values

True
False

A

True

87
Q

What are the properties of OLS that hold for any sample/sample size?

A

Expected values/unbiasedness under MLR.1 – MLR.4
Variance formulas under MLR.1 – MLR.5
Gauss-Markov Theorem under MLR.1 – MLR.5
Exact sampling distributions/tests under MLR.1 – MLR.6

88
Q

What are the properties of OLS that hold in large samples?

A
  1. Consistency under MLR.1 – MLR.4
  2. Asymptotic normality/tests under MLR.1 – MLR.5 (without assuming normality of the error term)

89
Q

What do we mean by consistency?

A

An estimator is consistent if the estimate converges in probability to the true population value

90
Q

How do we interpret consistency?

A

Consistency means that:

  • The probability that the estimate is arbitrarily close to the true population value can be made arbitrarily high by increasing the sample size
  • Consistency is a minimum requirement for sensible estimators
  • As the sample size grows large, it is more and more unlikely for an estimator to be far away from the true values
  • With a larger sample size, we have more information and the estimator should get closer and closer (in a probability sense) to its true value
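Consistency can be illustrated with a quick simulation: the OLS slope estimate concentrates around the true value as n grows. A pure-Python sketch (the true slope 1.5 and the sample sizes are illustrative assumptions):

```python
import random

random.seed(3)

def ols_slope(n, beta1=1.5):
    # simulate y = beta1 * x + u and return the OLS slope estimate
    x = [random.gauss(0, 1) for _ in range(n)]
    y = [beta1 * a + random.gauss(0, 1) for a in x]
    xb, yb = sum(x) / n, sum(y) / n
    return sum((a - xb) * (b - yb) for a, b in zip(x, y)) / \
           sum((a - xb) ** 2 for a in x)

for n in (50, 500, 50000):
    print(n, ols_slope(n))  # estimates concentrate around 1.5 as n grows
```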
91
Q

If an estimator is not consistent then it does not help us learn about the parameter of interest, even with an unlimited amount of data

True
False

A

True

92
Q

Unbiased estimators are not necessarily consistent, but those whose variances shrink to zero as the sample size grows are consistent.

True
False

A

True

93
Q

In large samples, the standardized estimates are normally distributed

True
False

A

True

94
Q

What are discrete dummy variables?

A

Discrete = takes on two values (e.g. yes/no; male/female)

95
Q

What are categorical dummy variables?

A

Categorical = takes on a limited number of values (e.g states)

96
Q

What do dummy variables do?

A

Dummy (or Indicator) Variables are qualitative measures indicating the presence or absence of an attribute or category.

97
Q

What are 3 important impacts of adding a dummy variable to a regression?

A
  1. The inclusion of a dummy variable allows us to estimate separate intercepts, but the same slope, for different groups
  2. The intercept depends on whether d=0 or d=1: β0 is the intercept for the category assigned 0 (the base category), and (β0 + δ0) is the intercept for the category assigned 1
  3. The dummy variable coefficient δ0 measures the difference in the intercepts between the two groups
98
Q

What is the dummy variable trap?

A

If a dummy variable is included for every category and the model has a constant term, the dummies sum to one for every observation, which duplicates the constant term (the constant is an x variable which takes the value 1 for all observations). This leads to perfect collinearity, and the model cannot be estimated.

99
Q

How do we avoid the dummy variable trap?

A

Omit one category (the base category): for gender, which has 2 categories, either male or female must be omitted. In general, if we have a categorical variable with m categories, include (m − 1) dummy variables to avoid the DVT.

E.g., for states, which has 6 categories, only include 5 dummy variables
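The (m − 1)-dummies rule can be sketched directly. Here a hypothetical state variable with 3 categories gets only 2 dummies, with "NSW" as the omitted base category:

```python
# hypothetical categorical 'state' variable with m = 3 categories
states = ["NSW", "VIC", "QLD", "VIC", "NSW"]
# include only m - 1 = 2 dummies; "NSW" is the omitted base category
dummies_for = ["VIC", "QLD"]
# each design row: [constant, d_VIC, d_QLD]
rows = [[1] + [1 if s == c else 0 for c in dummies_for] for s in states]
print(rows)
```

Adding a d_NSW column as well would make the three dummies sum to the constant column in every row, which is exactly the perfect collinearity of the trap.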

100
Q

How do you test for differences across groups? Say you want to test whether females are different from males in the model.

A
  1. Create unrestricted and restricted models
  2. Null hypothesis → all interaction effects are zero, that is the same regression coefficients apply to men and women
  3. To test the interaction effects as a group, use an F-test

OR

  1. Run separate regressions for men and for women; the unrestricted SSR is given by the sum of the SSR of these two regressions
  2. Run regression for the restricted model and store SSR
  3. If the test is computed in this way it is called the Chow-Test

Important: Test assumes a constant error variance across groups

101
Q

What is a linear probability model (LPM)?

A

Linear regression when the dependent variable is binary

102
Q

What are the disadvantages of the LPM?

A
  1. Predicted probabilities may be larger than one or smaller than zero.
  2. Marginal probability effects are sometimes logically impossible.
  3. The linear probability model is necessarily heteroskedastic so heteroskedasticity consistent standard errors need to be computed.
103
Q

What are the advantages of the LPM?

A
  1. Easy estimation and interpretation
  2. Estimated effects and predictions are often reasonably good in practice

104
Q

What are the types of specification errors?

A
  1. Choice of independent variables
    - Over-specification & sampling variance effects
    - Omitted variables & Endogeneity
  2. Heteroskedasticity
  3. Functional Form
  4. Measurement Error
  5. Missing Data, Non-random Sampling & Outliers
105
Q

What are some issues with over-specifying a model and why?

A

Even though the model satisfies MLR.1-MLR.4:

  1. x3 may be correlated with x1 and x2
  2. x3 has no effect on y after we control for x1 and x2

Inclusion of x3 has no cost in terms of bias in the estimates of any of the parameters, because E(β̂3) = β3 = 0
- However, including irrelevant variables may increase the sampling variance

106
Q

What happens if we omit a relevant variable?

A

Omitting a relevant variable causes bias when the omitted variable is correlated with any of the other explanatory variables in the model. The estimators are also inconsistent.

107
Q

Including an irrelevant variable does not cause bias, but omitting a relevant variable does.

True
False

A

True

108
Q

What are the sources of endogeneity?

A
  1. Omitted Variables
    - In many cases important characteristics cannot be observed AND these are often correlated with observed explanatory information.
  2. Measurement error: variables are measured with error
  3. Simultaneity: two or more variables are simultaneously determined
    - X causes Y but Y also causes X, X is jointly determined with Y
    - Quantity and price by demand and supply
    - Investment And Productivity
    - Sales and advertising
109
Q

What is the result of endogeneity?

A

The OLS estimator is biased and inconsistent

110
Q

What are solutions for endogeneity?

A
  • Proxy variables method for omitted regressors (W 9.2)
  • IV is the most well-known method to address endogeneity problems
  • Fixed effects methods if 1) panel data is available, 2) endogeneity is time-constant, and 3) regressors are not time-constant
  • Random effects methods 1) again need panel data; 2) requires stronger assumptions
111
Q

What are proxy variables and why do we use them?

A

These are variables that are used instead of the variable of interest when that variable of interest cannot be measured directly.

112
Q

What are the assumptions necessary for the proxy variable method to be valid?

A
  1. We hope there is at least an imperfect linear relationship between the proxy and the unobserved variable
  2. The error is uncorrelated with all the explanatory variables (𝑥1,𝑥2 and 𝑥3∗) AND uncorrelated with the proxy 𝑥3
    a) ZCM assumption for all variables used in the model
    b) In other words, the proxy is “just a proxy” for the omitted variable, it does not belong into the population regression and it is uncorrelated with the population regression error
  3. The proxy variable is a "good" proxy for the omitted variable:
    a) It is correlated with the omitted variable
    b) Using other variables in addition will not help to predict the omitted variable
113
Q

What is functional form misspecification?

A

The regression model suffers from functional form misspecification when it does not properly account for the relationship between the dependent variable and the (observed) explanatory variables
E.g. a key variable has been omitted & that variable is a function of the other variable(s) in the model

114
Q

How do you test for functional form misspecification?

A

Do the RESET test

115
Q

What are the limitations of the RESET Test?

A
  • It does not provide direction on how to proceed if a model is rejected. Just tells you the current one is misspecified.
  • You might use it to decide between two possible models – but the test may not provide a clear winner – could accept both or reject both…
116
Q

What are nested vs. non-nested tests?

A
  1. Nested tests → use F-tests for exclusion restrictions

2. Non-nested tests → where the alternative model has different explanatory variables

117
Q

How do you conduct a non-nested test?

A
  1. Obtain the fitted values from the alternative model, and include as one of the explanatory variables in the null model.
  2. If the null model is correct, the coefficient on the fitted value from the other model should be insignificant; if not, reject the null model.
118
Q

What is measurement error?

A

Sometimes we have the variable we want, but it may be measured with error.

119
Q

What are the consequences of Measurement Error in the dependent variable?

A
  1. If e0 and the xj (as well as the xj and u) are uncorrelated, OLS is unbiased and consistent
  2. While unbiased, variances are larger than with no measurement error
  3. The new composite error is u + e0; if E(e0) ≠ 0, then b0 will be biased – not particularly worrying
120
Q

What are the consequences of Measurement Error in the independent variable?

A
  1. Under CEV assumption, OLS is biased and inconsistent because the mismeasured variable is endogenous
  2. The effect of the mismeasured variable suffers from attenuation bias, i.e. the magnitude of the effect will be attenuated towards zero
  3. In addition, if it is multivariate regression, the effects of the other explanatory variables will be biased and inconsistent due to the measurement error in 𝑥1∗, unless they are uncorrelated with 𝑥1
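Attenuation bias is easy to see in a simulation. A pure-Python sketch under the classical errors-in-variables setup (the true slope 2.0 and the unit variances are assumptions; with var(x*) = var(e) = 1 the attenuation factor is 1/2, so the slope on the mismeasured regressor should land near 1.0):

```python
import random

random.seed(4)
n, beta1 = 50000, 2.0
x_true = [random.gauss(0, 1) for _ in range(n)]
y = [beta1 * a + random.gauss(0, 1) for a in x_true]
# classical errors-in-variables: we observe x* plus independent noise
x_obs = [a + random.gauss(0, 1) for a in x_true]

def slope(x, y):
    m = len(x)
    xb, yb = sum(x) / m, sum(y) / m
    return sum((a - xb) * (b - yb) for a, b in zip(x, y)) / \
           sum((a - xb) ** 2 for a in x)

# attenuation factor: var(x*) / (var(x*) + var(e)) = 1/2 here
print(slope(x_true, y), slope(x_obs, y))  # ~2.0 vs ~1.0
```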
121
Q

What is missing data and what are the consequences?

A

Missing data is a special case of sample selection, such as non-random sampling. If the data is missing at random, it is just as though you have a smaller sample: your findings will just be less precisely estimated. If the data is missing in a non-random way, then we violate our Random Sampling assumption.

122
Q

If the sample selection is based on independent variables there is no problem, because the regression conditions on the independent variables

True
False

A

True

123
Q

In general, sample selection is not a problem if it is uncorrelated with the error term of a regression = exogenous sample selection

True
False

A

True

124
Q

Sample selection is a problem, if it is based on the dependent variable or on the error term = endogenous sample selection

True
False

A

True

125
Q

What are the issues with having extreme values and outliers?

A
  1. Extreme values and outliers may be a particular problem for OLS because the method is based on squaring deviations
  2. If outliers are the result of mistakes that occurred when keying in the data, one should just discard the affected observations
  3. If outliers are the result of the data generating process, the decision whether to discard the outliers is not so easy
126
Q

What is least absolute deviations estimation (LAD)?

A

The least absolute deviations estimator minimizes the sum of absolute deviations (instead of the sum of squared deviations, i.e. OLS)
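The simplest case makes the contrast concrete: in an intercept-only model, the OLS fit is the sample mean while the LAD fit is the sample median (the toy numbers below are made up):

```python
import statistics

# intercept-only model: OLS minimizes squared deviations -> the mean;
# LAD minimizes absolute deviations -> the median
y = [1.0, 1.1, 0.9, 1.05, 0.95, 50.0]  # one extreme outlier
print(statistics.mean(y))    # dragged far upward by the outlier
print(statistics.median(y))  # barely affected
```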

127
Q

What are the advantages of LAD?

A
  1. It may be more robust to outliers as deviations are not squared
  2. The LAD estimator estimates the parameters of the conditional median (instead of the conditional mean as with OLS)
  3. It is a special case of quantile regression, which estimates parameters of conditional quantiles
128
Q

When should we use LAD?

A

Use LAD when you have outliers but still want to use every observation → because OLS squares deviations, its estimates are heavily influenced by outliers, so use LAD instead

129
Q

What are the disadvantages of using LAD?

A
  1. More computationally intensive than OLS
  2. All statistical inference involving the LAD estimators is justified only as the sample size grows
  3. It does not always consistently estimate the parameters appearing in the conditional mean function
  4. LAD is intended to estimate the effects on the conditional median. Generally, the mean and median are the same only when the distribution of y given the covariates x1, …, xk is symmetric about β0 + β1x1 + … + βkxk (equivalently, the population error term u is symmetric about zero)
  5. When LAD and OLS are applied to cases with asymmetric distributions, the estimated partial effect of, say, x1, obtained from LAD can be very different from the partial effect obtained from OLS