Econometrics (Weeks 1-7) Flashcards
What are the assumptions of MLR?
MLR 1: Linear in parameters MLR 2: Random sampling MLR 3: No perfect collinearity MLR 4: Zero conditional mean MLR 5: Homoscedasticity
What does MLR1-4 ensure?
Unbiasedness of the OLS estimators
In a regression y = beta0 + beta1*x1 + beta2*x2 + u, if x2 is omitted, which of the following are correct?
A) When beta2 > 0 and corr(x1, x2) > 0, there is a positive bias
B) When beta2 < 0 and corr(x1, x2) > 0, there is a negative bias
C) When beta2 > 0 and corr(x1, x2) < 0, there is a positive bias
D) When beta2 < 0 and corr(x1, x2) < 0, there is a negative bias
E) A and B are correct
F) All of the above are correct
E) A and B are correct
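The answer above can be checked with a small simulation (all numbers here are made up for illustration): generate data with beta2 > 0 and corr(x1, x2) > 0, omit x2, and observe that the short-regression slope overshoots the true beta1.

```python
import numpy as np

# Illustrative sketch: sign of omitted-variable bias when x2 is left out
# of y = b0 + b1*x1 + b2*x2 + u. Parameter values are arbitrary.
rng = np.random.default_rng(0)
n = 100_000

x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(size=n)   # corr(x1, x2) > 0
beta1, beta2 = 1.0, 2.0              # beta2 > 0
y = 3.0 + beta1 * x1 + beta2 * x2 + rng.normal(size=n)

# Short regression of y on x1 only: slope = cov(x1, y) / var(x1)
b1_short = np.cov(x1, y)[0, 1] / np.var(x1, ddof=1)
print(b1_short)  # noticeably above the true beta1 = 1.0 -> positive bias
```

Flipping the sign of beta2 (or of the x1-x2 correlation) flips the sign of the bias, matching cases A and B.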
Define the causal effect of x on y
How does variable y change if variable x is changed but all other relevant factors are held constant?
What is cross-sectional data?
Data collected by observing many subjects at one point or period in time
What is time series data?
Observations of a variable or several variables over time
Time series observations are typically…
Serially correlated
What is pooled cross-sectional data?
Two or more cross sections combined into one data set
Cross sections in pooled cross-sectional data are…
Drawn independently of each other
What is panel or longitudinal data?
The same cross-sectional units are followed over time
What are 3 attributes of panel data?
1) Has both cross-sectional and time series dimensions
2) Can be used to account for time-invariant unobservables
3) Can be used to model lagged responses
What does the error term ‘u’ capture?
1) Randomness in behaviour
2) Variables left out of the model
3) Deviations from linearity
4) Errors in measurement
What is the key assumption about the error term in the regression model?
u is mean independent of x: E(u|x) = E(u), i.e. knowing x tells us nothing about u. Combined with the normalization E(u) = 0, the zero conditional mean assumption is E(u|x) = E(u) = 0
What does the zero conditional mean imply about the expected value of the dependent variable?
This means that the average value of the dependent variable y across the population can be expressed as a linear function of the explanatory variable x
What are regressions?
Linear functions with a constant (intercept) and slope coefficients that describe how y changes as x changes
Which of the following statements about the Zero Conditional Mean Assumption are true?
A) It can be written as E(u|x) = E(u) = 0
B) The error is always centered in our prediction.
C) By calculating the expected value (average) of the disturbance term given the value(s) X, it must equal to the average of u, where the avg. of u = 0.
D) u does not vary with x on average.
E) All of the above
E) All of the above
How are OLS estimates obtained?
1) Fitting a line through the sample points
2) Choosing the line that minimizes the residual sum of squares (RSS)
3) Hence the name "least squares"
How do you derive the OLS estimator?
1) Define fitted values for y and residuals
2) Choose parameters to minimize sum of squares
3) Take derivatives with respect to the parameters and set them equal to 0, leading to the first order conditions
4) Solve for the intercept
5) Then solve for estimated coefficient by substituting the solutions for the intercept
What are the main uses of a multiple regression model?
- Explains variable y in terms of variables x1 to xk
- Incorporates more explanatory factors into the model
- Explicitly holds fixed factors that otherwise would be within the disturbance term → makes the conditional mean independence more likely to hold
- Allows for more flexibility in analysis → can hold certain variables fixed to analyse the impact of one particular variable on y
- In a simple regression model, there would be a biased estimate where one factor would inherently include the impact of another factor that has not been included
Logarithmic models can show elasticities between y and x while still being linear in parameters
True
False
True
How do you interpret a multiple regression model?
The coefficient on the nth independent variable gives the change in the dependent variable if that variable is increased by one unit, holding all other independent variables and the error term constant (ceteris paribus)
Linear in parameters
In the population, the relationship between y and x is linear
Random Sampling
The data is a random sample drawn from the population
No perfect collinearity
None of the explanatory variables are constant and there are no exact linear relationships among the explanatory variables
Zero conditional mean
The values of the explanatory variables must contain no information about the mean of the unobserved factors, i.e. the regressors must be exogenous: E(u|x1, ..., xk) = 0
Homoscedasticity
The value of the explanatory variables must contain no information about the variance of the unobserved factors.
Random (or Stochastic) Variable
A measurable function from a set of possible outcomes to a measurable space.
Static Model
A contemporaneous relationship between y and z.
Dynamic Model
A model where the past changes can affect the future.
Temporary change in z
Suppose that z is equal to c in all time periods before time t. At time t, z increases by one unit to c + 1 and then reverts to its previous level at time t + 1.
Normality
The error is independent of the explanatory variables and is normally distributed with zero mean and variance sigma^2.
BLUE
Best Linear Unbiased Estimator: under the Gauss-Markov assumptions (MLR 1-5), OLS is the linear unbiased estimator with the smallest variance.
The equation E(u|x) = E(u) = 0 implies what about the error?
A) The error is always centered in our prediction
B) The error is usually centered in our prediction
C) The error cannot be predicted
A) The error is always centered in our prediction
What does OLS aim to do?
It aims to find the best possible fit for the regression line, making the errors/residuals as small as possible by minimizing the sum of squared residuals
How are sample estimates of u (regression residuals) found?
Sample estimates of u (regression residuals) are found by taking a sample of y values indexed by i (from 1 to n) and subtracting the fitted (predicted) value of y from each observed value
For OLS estimators, how do we find the slope coefficient?
The sample covariance of x and y divided by the sample variance of x
For OLS estimators, how do we find the constant (intercept)?
The average of the y values minus the estimated slope coefficient multiplied by the average of the x values
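The two formulas above can be sketched directly in code (the data here are made up; `np.polyfit` is used only as a cross-check):

```python
import numpy as np

# Minimal sketch of the simple-regression OLS formulas:
# slope = sample cov(x, y) / sample var(x); intercept = ybar - slope * xbar.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
intercept = y.mean() - slope * x.mean()
print(slope, intercept)

# Cross-check against numpy's least-squares fit (returns slope, intercept)
b, a = np.polyfit(x, y, 1)
assert np.isclose(slope, b) and np.isclose(intercept, a)
```

Note the matching degrees of freedom: both the covariance and the variance use the n-1 normalization, so the ratio is the same as with the n normalization.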
How is the OLS estimator derived?
Step 1: Define fitted values for y and residuals
Step 2: Choose parameters to minimize sum of squares
Step 3: Take derivatives with respect to the parameters and set them equal to 0, leading to the first order conditions
Step 4: Solve for estimated constant (intercept)
Step 5: Then solve for estimated coefficient parameter by substituting the solution for the intercept
What are 5 attributes of a multiple linear regression model?
1) Explains variable y in terms of variables x1 to xk
2) Incorporates more explanatory factors into the model
3) Explicitly holds fixed factors that otherwise would be within the disturbance term → makes the conditional mean independence more likely to hold
4) Allows for more flexibility in analysis → can hold certain variables fixed to analyse the impact of one particular variable on y
5) In a simple regression model, there would be a biased estimate where one factor would inherently include the impact of the other that has not been included. In multiple linear regression, this is minimized.
The model has to be linear in parameters, not in the variables. Thus logarithmic models can still be linear in parameters.
True
False
True
In a semi-logarithmic model, how do we interpret the regression?
If the dependent variable enters as log(y), the interpretation of the regression coefficient changes: multiplied by 100, it gives the approximate percentage change in y when x is increased by one unit, given that x is non-logarithmic
In a log-log model, how do we interpret the regression?
Now it is an elasticity: the percentage change in y for a one percent change in x
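A quick sketch of the log-log interpretation (made-up, noise-free data, so the slope recovers the elasticity exactly):

```python
import numpy as np

# Log-log model: regressing log(y) on log(x) gives the elasticity of y
# with respect to x. Here y = 5 * x^0.7, so the true elasticity is 0.7.
x = np.linspace(1.0, 10.0, 50)
y = 5.0 * x ** 0.7

logx, logy = np.log(x), np.log(y)
b1 = np.cov(logx, logy, ddof=1)[0, 1] / np.var(logx, ddof=1)
print(b1)  # the elasticity, 0.7
```

Because log y = log 5 + 0.7 log x holds exactly here, the fitted slope equals the elasticity up to floating-point error; with noisy data it would be an estimate.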
Why do we introduce non-linearities? 3 reasons…
1) To estimate different relationships
2) Introducing logarithms may provide a more accurate/relevant interpretation of the true relationship between the variables
3) Fits the data better
The sample average of residuals is always = 0
True
False
True
The sample covariance between each independent variable and the OLS residuals = 0
True
False
True
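Both algebraic properties above (residuals average to zero; each regressor has zero sample covariance with the residuals) follow from the OLS first order conditions, and can be verified on made-up data:

```python
import numpy as np

# Sketch: fit simple-regression OLS and check the two algebraic properties.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(size=200)

slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
intercept = y.mean() - slope * x.mean()
residuals = y - (intercept + slope * x)

print(residuals.mean())                    # ~0 up to floating-point error
print(np.cov(x, residuals, ddof=1)[0, 1])  # ~0 as well
```

These hold exactly (not just approximately) whenever the model includes an intercept, since they are the first order conditions themselves.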
Which of the following are correct?
A) Sample averages of y and x’s lie on the regression line
B) Sum of squared residuals of y and x’s lie on the regression line
C) The standard errors for each measurement lie on the regression line
A) Sample averages of y and x’s lie on the regression line
Jeopardy- The proportion of the variation in the dependent variable that is explained by the explanatory variable
R^2
A high R^2 does not necessarily indicate that the regression has a causal interpretation
True
False
True
What are the 3 measures of variation?
1) Total Sum of Squares (TSS)
2) Explained Sum of Squares (ESS)
3) Residual Sum of Squares (RSS)
What is the decomposition of the total variation?
TSS = ESS + RSS
Total variation = explained part + unexplained part
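The decomposition above can be confirmed numerically on toy data, along with the R^2 card earlier (R^2 = ESS/TSS = 1 - RSS/TSS):

```python
import numpy as np

# Sketch: verify TSS = ESS + RSS for a simple OLS fit on made-up data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.5, 5.5, 8.0, 10.0])

slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
intercept = y.mean() - slope * x.mean()
fitted = intercept + slope * x

tss = ((y - y.mean()) ** 2).sum()   # total variation
ess = ((fitted - y.mean()) ** 2).sum()  # explained variation
rss = ((y - fitted) ** 2).sum()     # unexplained variation

print(tss, ess + rss)               # equal up to rounding
print(ess / tss, 1 - rss / tss)     # both give R^2
```

The identity relies on the residuals being uncorrelated with the fitted values, which holds by construction for OLS with an intercept.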
Which of the following are true?
A) RSS never increases when we add additional explanatory variables to the model, thus R^2 will never decrease if another explanatory variable is added
B) An increase in R^2 is not a good tool for deciding if an additional variable should be included
C) Even if the R^2 is small, the regression may still provide good estimates of ceteris paribus effects
D) All of the above
E) None of the above
D) All of the above