Metrics Flashcards
Multiple linear regression (MLR) assumptions
-MLR1. Linear in parameters: the relationship between the explanatory variables and the explained variable is linear in the parameters, plus an additive error term.
-MLR2. Random sampling: a random sample from the population is used to estimate the population parameters.
-MLR3. No perfect collinearity: the independent/explanatory variables cannot be perfectly correlated, e.g. one a multiple of another. Imperfect multicollinearity does not bias the estimators but increases their variance.
-MLR4. Zero conditional mean: there is no correlation between the independent variables and the unobserved factors captured by the error term u, i.e. E(u|x1, x2, …, xk) = E(u) = 0. MLR4 holds if all independent variables are exogenous.
-MLR5. Homoskedasticity: the variance of the error term is constant, Var(u|x1, …, xk) = σ^2. If this is violated, heteroskedasticity exists within the model.
-The assumptions above are typically known as the Gauss-Markov assumptions; MLR1-MLR4 are sufficient to show the unbiasedness of the OLS estimators.
-If all five hold, the Gauss-Markov Theorem states that the OLS estimators are the Best Linear Unbiased Estimators (BLUE).
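A minimal simulation sketch (hypothetical data, NumPy only) of what unbiasedness means: averaged over many random samples generated to satisfy MLR1-MLR4, the OLS estimates centre on the true coefficients.

```python
# Unbiasedness sketch: simulate many samples satisfying MLR1-MLR4 and
# check that OLS estimates average out to the true parameters.
# (All data here are hypothetical, generated for illustration.)
import numpy as np

rng = np.random.default_rng(0)
true_beta = np.array([1.0, 2.0, -0.5])  # intercept, slope on x1, slope on x2

estimates = []
for _ in range(2000):
    n = 200
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    u = rng.normal(size=n)  # drawn independently of x1, x2: E(u|x) = 0 (MLR4)
    y = true_beta[0] + true_beta[1] * x1 + true_beta[2] * x2 + u
    X = np.column_stack([np.ones(n), x1, x2])
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    estimates.append(beta_hat)

mean_est = np.array(estimates).mean(axis=0)
print(mean_est)  # close to [1.0, 2.0, -0.5]
```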
Describe heteroskedasticity and its consequences
-Heteroskedasticity occurs when the variance of the error term is not constant across observations in a regression model.
-Consequences:
-Heteroskedasticity does not bias the estimated coefficients, but the estimates are inefficient (no longer minimum variance among linear unbiased estimators).
-Incorrect statistical inference: the presence of heteroskedasticity leads to the standard errors being incorrectly estimated/interpreted, so t and F statistics are unreliable and H0 is more likely to be mistakenly rejected.
-Hypothesis tests based on the usual standard errors are therefore invalid, leading to incorrect conclusions about potential causal relationships between variables.
Interpretation of R^2
-R^2 is a measure of goodness of fit: it measures the proportion of the variance in the dependent variable explained by the independent variables/the OLS regression, i.e. how well the estimated regression fits the data.
-R^2 never decreases as more independent variables are added.
Eg) An R^2 of 0.66 means 66% of the variance in the dependent variable is accounted for by the independent variables in the estimated regression.
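A small sketch (simulated data, NumPy only) of the definition R^2 = 1 - SSR/SST:

```python
# R^2 by hand: one minus the ratio of residual to total sum of squares.
import numpy as np

rng = np.random.default_rng(7)
n = 200
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)  # simulated data

X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
fitted = X @ beta_hat

ss_res = np.sum((y - fitted) ** 2)    # sum of squared residuals
ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
r2 = 1.0 - ss_res / ss_tot
print(r2)  # proportion of the variance in y explained by the regression
```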
Describe the OLS method, i.e. how OLS estimators are obtained
-The method of OLS is used to find sample estimates of the population coefficients on the independent variables, producing a linear relationship between those variables.
-A residual is defined as the difference between the actual value recorded and the estimated/fitted value predicted by the model: û_i = y_i − ŷ_i.
-The OLS method minimises the sum of squared residuals, min Σ û_i^2 = Σ (y_i − b0 − b1·x_i1 − … − bk·x_ik)^2; taking the first-order conditions and rearranging gives the coefficient estimates.
-OLS therefore produces the regression line that minimises the sum of squared residuals, yielding the estimates of the intercept and slope coefficients.
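The minimisation can be sketched directly (simulated data, NumPy only): the first-order conditions give the normal equations X'Xb = X'y, and the solution makes the residuals orthogonal to every regressor.

```python
# OLS via the normal equations: b = (X'X)^(-1) X'y.
import numpy as np

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 0.5 + 1.5 * x1 - 2.0 * x2 + rng.normal(size=n)  # simulated data

X = np.column_stack([np.ones(n), x1, x2])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # solve the normal equations

residuals = y - X @ beta_hat
print(X.T @ residuals)  # first-order conditions: all (near) zero
```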
Causes of heteroskedasticity
-Aggregated data such as industry/country level data.
-Omitted variables: the error term contains unobserved independent variables which do not necessarily have constant variance.
-Misspecification of the functional form of the regression: the error term may be multiplicative rather than additive, for example when its impact is proportional to the level of the variables.
-Extreme outliers
Proxy variable
-A proxy variable is an easily measured variable included in a model to stand in for a variable that cannot be easily measured or for which no data exist. A proxy variable is one that is hypothesised to be linearly related to the missing variable.
-e.g. IQ as a proxy for ability
Why would a natural log of the dependent variable be taken
To reduce the influence of outliers
To interpret coefficients in percentage terms (semi-elasticities; elasticities if the regressors are also in logs)
To improve the distribution of the residuals
Linearise an economic model
Which of the following is of greater concern (multiple answers possible):
A. Missing data in the dependent variable
B. Missing data in the independent variable
C. Measurement error in the dependent variable
D. Measurement error in the independent variable
-Missing data in the dependent variable, as this can lead to biased results through endogenous sample selection.
-Measurement error in the independent variable, as the mismeasured regressor becomes correlated with the error term, so the zero conditional mean assumption (MLR4) is violated and the estimates are biased.
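A simulation sketch (hypothetical data, NumPy only) of the classical measurement-error case: noise in the regressor pulls the estimated slope toward zero (attenuation bias).

```python
# Attenuation bias: classical measurement error in x biases the slope toward 0.
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
x_true = rng.normal(size=n)
y = 2.0 * x_true + rng.normal(size=n)  # true slope is 2.0
x_obs = x_true + rng.normal(size=n)    # observed x = true x + noise

slope_true = np.cov(x_true, y, ddof=1)[0, 1] / np.var(x_true, ddof=1)
slope_obs = np.cov(x_obs, y, ddof=1)[0, 1] / np.var(x_obs, ddof=1)
# slope_obs is attenuated by the factor var(x) / (var(x) + var(noise))
print(slope_true, slope_obs)  # roughly 2.0 vs roughly 1.0
```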
Interpretation of the standardised (BETA) coefficient, labelled beta in Stata
-A one standard deviation increase in the independent variable leads to a BETA standard deviation increase in the dependent variable.
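A quick sketch (simulated data, NumPy only): the standardised coefficient equals the slope from regressing z-scored y on z-scored x, or equivalently the raw slope rescaled by sd(x)/sd(y).

```python
# Standardised (beta) coefficient two ways: rescaling vs z-scoring.
import numpy as np

rng = np.random.default_rng(4)
n = 1000
x = rng.normal(loc=5.0, scale=3.0, size=n)
y = 10.0 + 0.8 * x + rng.normal(size=n)  # simulated data

def zscore(v):
    return (v - v.mean()) / v.std(ddof=1)

raw_slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
beta_std = raw_slope * x.std(ddof=1) / y.std(ddof=1)          # rescaled raw slope
beta_std_direct = np.cov(zscore(x), zscore(y), ddof=1)[0, 1]  # slope on z-scores
print(beta_std, beta_std_direct)  # the two values agree
```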
What are unstandardised and standardised (BETA) used for
-Unstandardised used to interpret effect of x on y
-Standardised used to compare the effects of different x's on y (e.g. which has a bigger effect).
Consequence of using a poor proxy in a regression model
A poor proxy has no impact on the bias of the estimator for the independent variable, but increases its standard errors.
Advantages of including proxy variables
Unbiased coefficients for the variable of interest
Valid standard errors for the variable of interest
Correct statistical inference (e.g., t-tests, F-tests)
Precise R^2
-Instrumental variable (IV) estimation can be used under three specific conditions:
-An instrumental variable z is a variable that is related to an endogenous explanatory variable x but does not directly affect the dependent variable y, only indirectly through x (z affects x → x affects y). It is used to remove endogeneity in a regression.
-The IV must be correlated with the explanatory variable it is instrumenting: Cov(z,x) != 0 (relevance).
-The IV must be uncorrelated with the error term: Cov(z,u) = 0 (exogeneity).
-It must not be in the main regression equation already.
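A simulation sketch (hypothetical data, NumPy only) of the simple IV estimator Cov(z,y)/Cov(z,x): when x is endogenous, OLS is biased but the IV estimate recovers the true effect.

```python
# IV vs OLS under endogeneity: x is correlated with the error u, z is not.
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
z = rng.normal(size=n)                # instrument: Cov(z,u) = 0, Cov(z,x) != 0
u = rng.normal(size=n)                # structural error
x = 0.8 * z + u + rng.normal(size=n)  # x endogenous: correlated with u
y = 1.0 * x + u                       # true effect of x on y is 1.0

ols_slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
iv_slope = np.cov(z, y, ddof=1)[0, 1] / np.cov(z, x, ddof=1)[0, 1]
print(ols_slope, iv_slope)  # OLS biased upward; IV close to 1.0
```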
Breusch-Pagan and White's test method
-Breusch-Pagan test: estimate the model by OLS, obtain the residuals, regress the squared residuals on the original regressors, and test their joint significance (LM statistic = n·R^2 from the auxiliary regression, distributed chi-squared under H0: homoskedasticity).
-White's test: same idea, but the auxiliary regression also includes the squares and cross-products of the regressors, so it detects more general forms of heteroskedasticity.