Metrics Flashcards

1
Q

Multivariate linear regression assumptions

A

-MLR1. Linear in parameters: the relationship between the explanatory variables and the explained variable is linear, plus an error term.

-MLR2. Random sampling: a random sample from the population is used to estimate the population parameters.

-MLR3. No perfect collinearity: independent/explanatory variables cannot be perfectly correlated, e.g. exact multiples of one another. Imperfect multicollinearity does not bias the estimators but increases their variance.

-MLR4. Zero conditional mean: there is no correlation between the independent variables and the unobserved factors captured by the error term u, i.e. E(u|x1, x2, …, xk) = E(u) = 0. MLR4 holds if all independent variables are exogenous.

-MLR5. Homoskedasticity: the variance of the error term is constant, Var(u|x1, …, xk) = σ². If this is violated, heteroskedasticity exists within the model.

-MLR1–MLR4 are used to show the unbiasedness of the OLS estimators; MLR1–MLR5 together are typically known as the Gauss-Markov assumptions.

-If all five hold, the Gauss-Markov theorem states that the OLS estimators are the Best Linear Unbiased Estimators (BLUE).
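A minimal sketch of MLR3 (toy data, all numbers made up): if one regressor is an exact multiple of another, the cross-product matrix X'X in the OLS normal equations is singular, so no unique coefficient estimates exist.

```python
# Toy illustration: perfect collinearity (MLR3 violated) makes X'X singular.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 4.0, 6.0, 8.0]   # x2 = 2*x1 -> perfectly collinear

# Entries of the 2x2 cross-product matrix X'X for the two regressors
a = sum(v * v for v in x1)               # sum of x1^2
b = sum(u * v for u, v in zip(x1, x2))   # sum of x1*x2
d = sum(v * v for v in x2)               # sum of x2^2

det = a * d - b * b  # determinant of X'X
print(det)           # 0.0 -> X'X is not invertible, OLS has no unique solution
```

With imperfect multicollinearity the determinant is small but nonzero, which is why estimates remain obtainable but have inflated variance.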

2
Q

Describe heteroskedasticity and its consequences

A

-Heteroskedasticity occurs when the variance of the error term is not constant across observations in a regression model.

-Consequences:
-Heteroskedasticity does not bias the estimated coefficients, but the estimates are inefficient (higher SEs; OLS is no longer BLUE).
-Incorrect statistical inference: the usual standard errors are incorrectly estimated, so tests based on them are invalid.
-T and F statistics may be inflated, making it more likely that H0 is mistakenly rejected.
-Hypothesis tests based on the usual standard errors are therefore biased, leading to incorrect conclusions about potential causal relationships between variables.
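The non-constant error variance itself is easy to see in a small simulation (simulated toy data, pure Python): errors of the form u = x·e have a spread that grows with x, violating MLR5.

```python
import random
import statistics

# Simulated toy data: Var(u|x) = x^2, so the error spread grows with x.
random.seed(1)
n = 2000
x = [random.uniform(1, 10) for _ in range(n)]
u = [xi * random.gauss(0, 1) for xi in x]            # heteroskedastic errors
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]    # outcome (unused below)

# Compare the error spread for small-x vs large-x observations
low  = [ui for xi, ui in zip(x, u) if xi < 5]
high = [ui for xi, ui in zip(x, u) if xi >= 5]
sd_low, sd_high = statistics.stdev(low), statistics.stdev(high)
print(round(sd_low, 1), round(sd_high, 1))  # spread is much larger for high x
```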

3
Q

Interpretation of R^2

A

-R^2 is a measure of goodness of fit: the proportion of the variance in the dependent variable explained by the independent variables/OLS regression. It measures how well the estimated regression fits the data.

-R^2 never decreases as more independent variables are added.

Eg) an R^2 of 0.66 means 66% of the variance in the dependent variable is accounted for by the estimated regression/independent variables (add context from the question).
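The definition R^2 = 1 − SSR/SST can be computed by hand on toy data (all numbers made up):

```python
# Toy data: fit a simple OLS line, then compute R^2 = 1 - SSR/SST.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

xbar = sum(x) / len(x)
ybar = sum(y) / len(y)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar

yhat = [b0 + b1 * xi for xi in x]
ssr = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # residual sum of squares
sst = sum((yi - ybar) ** 2 for yi in y)               # total sum of squares
r2 = 1 - ssr / sst
print(round(r2, 3))  # 0.997 -> the line explains ~99.7% of the variance in y
```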

4
Q

Describe OLS method, ie how OLS estimators are obtained

A

-The method of OLS is used to find sample estimates of the population coefficients on the independent variables, producing a linear relationship between these variables.

-A residual is defined as the difference between the actual value observed and the estimated/fitted value (the value predicted by the model).

-The OLS method minimises the sum of squared residuals, Σ(yi − ŷi)^2; the resulting first-order conditions are rearranged to solve for the coefficient estimates.

-OLS produces the regression line that minimises the sum of the squared residuals, and thereby produces estimates of the coefficients and intercept.
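The minimisation above can be sketched on toy data (all numbers made up): the first-order conditions of min Σ(yi − b0 − b1·xi)^2 imply the residuals sum to zero and are uncorrelated with x, and rearranging them gives the closed-form slope cov(x, y)/var(x).

```python
# Toy data: closed-form simple OLS from minimising the sum of squared residuals.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.5, 8.5]

xbar, ybar = sum(x) / len(x), sum(y) / len(y)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)   # slope = cov(x, y) / var(x)
b0 = ybar - b1 * xbar                    # intercept

resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]  # actual - fitted

# First-order conditions of the minimisation: both sums are ~0
print(sum(resid))                                 # ≈ 0
print(sum(xi * r for xi, r in zip(x, resid)))     # ≈ 0
```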

5
Q

Causes of heteroskedasticity

A

-Aggregated data such as industry/country level data.

-Omitted variables: the error term contains unobserved independent variables which do not necessarily have constant variance.

-Misspecification of the form of the regression: the error term may be multiplicative rather than additive, for example if impacts scale with the level of the variables.

-Extreme outliers

6
Q

Proxy variable

A

-Proxy variables are easily measured variables included in a model to stand in for a variable that cannot be easily measured or has no available data. A proxy variable is one that is hypothesised to be linearly related to the missing variable.

-eg IQ as a proxy for ability

7
Q

Why would the natural log of the dependent variable be taken?

A

To reduce the influence of outliers
To interpret marginal effects as elasticities
To improve the distribution of the residuals
To linearise an economic model
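The elasticity interpretation can be sketched with simulated toy data: in a log-log model, the slope on ln(x) is read as the % change in y for a 1% change in x.

```python
import math
import random

# Simulated toy data: true model y = 3 * x^0.5 * exp(u), so the elasticity
# of y with respect to x is 0.5. Regressing ln(y) on ln(x) recovers it.
random.seed(2)
n = 5000
x = [random.uniform(1, 100) for _ in range(n)]
y = [3.0 * xi ** 0.5 * math.exp(random.gauss(0, 0.2)) for xi in x]

lx = [math.log(v) for v in x]
ly = [math.log(v) for v in y]
mx, my = sum(lx) / n, sum(ly) / n
b1 = sum((a - mx) * (b - my) for a, b in zip(lx, ly)) / \
     sum((a - mx) ** 2 for a in lx)
print(round(b1, 2))  # close to the true elasticity of 0.5
```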

8
Q

Which of the following is of greater concern (multiple answers possible):
A. Missing data in the dependent variable
B. Missing data in the independent variable
C. Measurement error in the dependent variable
D. Measurement error in the independent variable

A

-A: missing data in the dependent variable, as this can lead to biased results through endogenous sample selection.
-D: measurement error in the independent variable, as this leads to potential correlation with the error term, violating the zero conditional mean (ZCM) assumption.

9
Q

Interpretation of the standardised BETA coefficient (labelled "beta" in Stata)

A

A one standard deviation increase in the independent variable leads to a BETA standard deviation increase in the dependent variable.
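The standardised coefficient is the raw slope rescaled by sd(x)/sd(y), sketched here on toy data (all numbers made up); in a simple regression this equals the x–y correlation.

```python
import statistics

# Toy data: standardised BETA = b1 * sd(x) / sd(y),
# read in standard-deviation units of x and y.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.5, 3.6, 5.1, 7.4, 9.0]

xbar, ybar = statistics.mean(x), statistics.mean(y)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)   # unstandardised slope

beta = b1 * statistics.stdev(x) / statistics.stdev(y)
print(round(beta, 3))  # a 1-SD rise in x predicts a beta-SD rise in y
```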

10
Q

What are unstandardised and standardised (BETA) coefficients used for?

A

-Unstandardised: used to interpret the effect of x on y.
-Standardised: used to compare the effects of different x's on y (eg which has a bigger effect).

11
Q

Consequence of using a poor proxy in a regression model

A

A poor proxy has no impact on the bias of, but increases the standard errors of, the estimator for the independent variable.

12
Q

Advantages of including proxy variables

A

Unbiased coefficients for the variable of interest
Valid standard errors for the variable of interest
Correct statistical inference (e.g., t-tests, F-tests)
Precise R^2

13
Q

-Instrumental variable estimation can be used under three specific conditions:

A

-An instrumental variable (IV) is a variable z that is related to an explanatory variable x but does not directly affect the dependent variable y, only indirectly through x (z affects x → affects y). It is used to remove endogeneity in a regression.

-The IV must be correlated with the explanatory variable it is instrumenting: Cov(z, x) ≠ 0 (relevance).
-The IV must be uncorrelated with the error term: Cov(z, u) = 0 (exogeneity).
-It must not already be in the main regression equation.
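The conditions above can be sketched with simulated toy data: when x is endogenous (correlated with u), OLS is biased, but the simple IV estimator Cov(z, y)/Cov(z, x) recovers the true slope.

```python
import random

# Simulated toy data: true slope is 2.0, but x is built from u, so
# Cov(x, u) != 0 and OLS is biased; z satisfies both IV conditions.
random.seed(7)
n = 20000
z = [random.gauss(0, 1) for _ in range(n)]   # instrument: relevant, exogenous
u = [random.gauss(0, 1) for _ in range(n)]   # structural error
x = [zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]  # endogenous regressor
y = [2.0 * xi + ui for xi, ui in zip(x, u)]

def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

beta_ols = cov(x, y) / cov(x, x)   # biased upward (plim ≈ 2.33 here)
beta_iv  = cov(z, y) / cov(z, x)   # consistent, close to the true 2.0
print(round(beta_ols, 2), round(beta_iv, 2))
```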

14
Q

Breusch-Pagan and White's test method

A

-Breusch-Pagan test: estimate the model by OLS and save the residuals; regress the squared residuals on the original independent variables; test their joint significance with an F test or the LM statistic, LM = n·R^2 from the auxiliary regression, distributed chi-squared. A significant statistic indicates heteroskedasticity.

-White's test: same idea, but the auxiliary regression also includes the squares and cross-products of the independent variables (or, in the special-case version, the fitted values and their squares), detecting more general forms of heteroskedasticity.
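A sketch of the Breusch-Pagan LM statistic on simulated toy data (pure Python, one regressor): regress the squared OLS residuals on x and compute LM = n·R^2, which is chi-squared(1) under homoskedasticity.

```python
import random

# Simulated toy data with strong heteroskedasticity: Var(u|x) grows with x,
# so the Breusch-Pagan LM statistic should be large and reject MLR5.
random.seed(3)
n = 1000
x = [random.uniform(1, 10) for _ in range(n)]
y = [1.0 + 2.0 * xi + xi * random.gauss(0, 1) for xi in x]

def ols(xs, ys):
    """Simple OLS of ys on xs; returns (intercept, slope)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b1 = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / \
         sum((a - mx) ** 2 for a in xs)
    return my - b1 * mx, b1

# Step 1: main regression, save squared residuals
b0, b1 = ols(x, y)
u2 = [(yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)]

# Step 2: auxiliary regression of u^2 on x; its R^2 drives the LM statistic
a0, a1 = ols(x, u2)
fit = [a0 + a1 * xi for xi in x]
m = sum(u2) / n
r2 = 1 - sum((v - f) ** 2 for v, f in zip(u2, fit)) / \
         sum((v - m) ** 2 for v in u2)

lm = n * r2
print(round(lm, 1))  # far above the chi2(1) 5% critical value of 3.84
```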