Pensum Flashcards

1
Q

What are the steps in empirical analysis?

A
  1. Careful formulation of each research question (RQ)
  2. Construct an economic model
  3. Turn it into an econometric model
    You now have an econometric model. The model is an outcome of your variable choices, hypothesis development, data gathering, and estimation of the model parameters.
2
Q

What is Cross-Sectional data?

A

Cross-sectional data is a sample of different entities (for example firms, households, companies, cities, states, and countries) observed at a given point in time or in a given period.

3
Q

What is Time-series data?

A

Data for a single entity (firms, households, companies, cities, states, countries) collected at multiple time periods.

4
Q

What is Panel Data?

A

Also called longitudinal data: data for multiple entities in which each entity is observed at two or more periods.

Panels can be balanced (no missing periods) or unbalanced.

Pooled OLS: used if no individual effect exists. It does not take time-specific effects or variation across entities into account.

Fixed Effects: control for 1) unobserved variables that vary across entities but not over time, and 2) time-specific effects that do not vary across entities.

Entity fixed effects can control for biases that vary across entities but not over time. For example, if you are analyzing Norwegian exports to the EU region, such an effect can control for French price sensitivity, which might differ from Poland's.

You can also control for time-specific effects. Continuing the example: if the EU issues a law, it affects all of the buyers (it does not vary across entities).

Random Effects: can be used when the unobserved, entity-specific effects are uncorrelated with the regressors.

5
Q

What is the difference between cross-sectional, time-series and panel data?

A

Cross-sectional data consists of multiple entities observed at a single time period.
Time-series data consists of a single entity observed at multiple time periods.
Panel data consists of multiple entities observed over two or more time periods.

6
Q

Describe a simple regression model

A

OLS chooses the regression coefficients so that the estimated regression line is as close as possible to the observed data, where closeness is measured by the sum of squared mistakes made in predicting Y given X.

The linear regression model with a single regressor relates y, the dependent variable, to x, the independent variable or regressor. The first part of the equation, B0 + B1*x, is the population regression line or population regression function. This is the relationship that holds between y and x on average over the population.

The intercept β_0 and the slope β_1 are the coefficients of the population regression line, also known as the parameters of the population regression line. u is the error term: the difference between y and its predicted value.

B1 MEASURES THE MARGINAL EFFECT ON Y FOR A UNIT CHANGE IN X
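
In symbols, the model the card describes (standard textbook form):

y_i = \beta_0 + \beta_1 x_i + u_i, \qquad i = 1, \dots, n

where \beta_0 + \beta_1 x_i is the population regression line and u_i is the error term.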

7
Q

How do you estimate the coefficients in OLS?

A

OLS is about finding the coefficient estimates that minimize the total squared prediction mistakes for Y. The rule that produces them is called an estimator: a function of a sample of data drawn randomly from a population. Given estimates β̂_0, β̂_1 of β_0, β_1, we can predict y with ŷ.
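
A compact statement of the OLS problem and the resulting estimators (standard formulas):

\min_{b_0, b_1} \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2

\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}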

8
Q

What is a linear model?

A

A linear model means that the effect on y of a one-unit change in x does not depend on the level of x.

9
Q

Explain Standard Deviation and Variance

A

Both standard deviation and variance measure the "spread" of a probability distribution. The variance is measured in squared units, while the standard deviation is the square root of the variance.
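
The two measures written out (μ_Y is the mean of Y):

\mathrm{Var}(Y) = \sigma_Y^2 = E\big[(Y - \mu_Y)^2\big], \qquad \sigma_Y = \sqrt{\mathrm{Var}(Y)}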

10
Q

What is a Causal Effect?

A

Causality means that a specific action leads to a specific, measurable consequence. For example, there might be a correlation between people eating apples and car accidents. The correlation is probably coincidental: eating apples will probably not reduce the chance of getting into a car accident. Correlation does not imply causation.

11
Q

What is the difference between Experimental data and observational data

A

Experimental data comes from an experiment designed to investigate a causal effect. Observational data is obtained by measuring actual behaviour outside of an experiment.

12
Q

Sample Space and events

A

The sample space is the set of all possible outcomes. An event is a subset of the sample space, i.e. a set of one or more outcomes.

13
Q

Probability Distribution of a random variable

A

The probability distribution lists all possible values for the variable and the probability that each value will occur. These probabilities sum to 1.

14
Q

What is Joint probability and distribution

A

Joint probability is the probability of two events happening together (think Venn diagram). The joint distribution gives the probability that X and Y take on each pair of values. Let's say that X is 1 when it is raining and 0 when it is not, and Y is 1 when it is more than 10 degrees outside and 0 otherwise. The joint distribution gives the probabilities of the 4 possible outcome combinations. Each combination has a probability, and summed together they equal 1.

15
Q

Marginal probability distribution

A

Just another name for the probability distribution of Y. The term is used to distinguish the distribution of Y alone from the joint distribution of Y and another variable.
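
It can be computed from the joint distribution by summing over the other variable:

\Pr(Y = y) = \sum_{x} \Pr(X = x, Y = y)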

16
Q

Conditional Distribution

A

The distribution of a random variable Y conditional on another variable X taking on a specific value.
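
In formula form (valid when Pr(X = x) > 0):

\Pr(Y = y \mid X = x) = \frac{\Pr(X = x, Y = y)}{\Pr(X = x)}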

17
Q

Conditional Expectation

A

The mean of the conditional distribution of Y given X

18
Q

Law of iterated Expectations

A

The mean height of adults is the weighted average of the mean height of men and the mean height of women, weighted by the proportions of men and women. In general: the mean of Y is the weighted average of the conditional expectation of Y given X, weighted by the probability distribution of X.
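
As an equation:

E[Y] = E\big[E[Y \mid X]\big] = \sum_{x} E[Y \mid X = x]\,\Pr(X = x)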

19
Q

Covariance

A

A measure of the extent to which two random variables move together.
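
The definition, together with the unit-free correlation derived from it:

\mathrm{Cov}(X, Y) = E\big[(X - \mu_X)(Y - \mu_Y)\big], \qquad \mathrm{Corr}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}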

20
Q

What is the Standard Error in a regression?

A

The standard error of a regression estimates the standard deviation of the error term in the regression. It does this by measuring the typical distance between the observed values and the values on the regression line.

21
Q

Kurtosis

A

Kurtosis measures how much mass a distribution has in its tails, and is therefore a measure of how much of the variance of Y arises from extreme values. An extreme value of Y is called an outlier. The greater the kurtosis of a distribution, the more likely it is to have outliers.
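
The standard formula (μ the mean and σ the standard deviation of Y; the normal distribution has kurtosis 3):

\mathrm{Kurt}(Y) = \frac{E\big[(Y - \mu)^4\big]}{\sigma^4}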

22
Q

Skewness

A

Skewness quantifies the extent to which a distribution is asymmetric, i.e. departs from the symmetry of a normal distribution. A normal distribution has a skew of zero, with equal weight in each tail.
If you are measuring height, you might get a mean of 172 with the tails being equally weighted.

If you are measuring income for people working full time, few people will have an income under 300K. From 300K to 600K there will probably be a steep increase. From 600K onwards there will be fewer and fewer people, and the curve will be less and less steep. This gives a "long tail" on the right side. A long tail on the right side is called a "positive skew", so we say that the distribution is positively skewed.
If we have an easy exam and a lot of people get A's or B's, we will have a negative skew. The long tail will be on the left side, rising slowly until it hits C or B, and from there it goes steeply up.
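
The corresponding formula (zero for a symmetric distribution such as the normal):

\mathrm{Skew}(Y) = \frac{E\big[(Y - \mu)^3\big]}{\sigma^3}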

23
Q

I.I.D

A

Independent and Identically Distributed.

Independent: the result of one draw has no impact on any other draw. If you roll two dice, the result of the first die does not affect the result of the second.

Identically distributed: each draw comes from the same distribution. If you flip a coin (heads/tails), each flip gives you a 50/50 chance; the probability does not change over time.

24
Q

Chi-Squared

A

DISTRIBUTION:
The chi-squared distribution is asymmetrical and positively skewed. With k degrees of freedom it is the distribution of a sum of k squared standard normal variables, and has mean k and variance 2k. Chi-squared tests are applied to categorical variables, i.e. variables where each observation falls into exactly one category (male vs. female, etc.).
Chi-squared tests can be used when we:
1) need to estimate how closely an observed distribution matches an expected one
2) need to estimate if two random variables are independent.

GOODNESS OF FIT:
Used when you have one categorical variable and want to compare an observed frequency distribution to a theoretical one. For example: is there a relation between age and car accidents?
H0: no relation between age and car accidents
HA: there is a relation between age and car accidents
A chi-squared value greater than the critical value implies that there is a relation between age and car accidents, hence we reject the null hypothesis. It means that there most likely is a relation, but it does not tell us how large that relation is.
Another example: flip a coin 100 times. You would expect roughly 50/50 heads/tails. The further from 50/50 the outcome is, the worse the goodness of fit.
The test measures how well a sample of data matches the known characteristics of the larger population the sample is trying to represent. For example, the chi-squared statistic tells us how well the actual results of 100 coin flips compare to the theoretical model that assumes 50/50. The further from 50/50, the worse the goodness of fit (and the more likely we conclude that the coin is not fair).

TEST FOR INDEPENDENCE:
Used with categorical data on two variables when you want to see if there is an association between them.
Does gender have any significance for driving test outcomes? Is there a relation between student gender and course choice? Researchers collect data and compare the frequencies at which male and female students select the different classes. The chi-squared test for independence tells us how likely it is that random chance can explain the observed difference.
P-value smaller than 0.05 (chi-squared value bigger than the critical value): there is some relation between the two variables. Reject H0.
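
A minimal R sketch of both uses; the counts and category names are made up for illustration:

# Goodness of fit: are 100 coin flips consistent with a fair coin?
flips <- c(heads = 58, tails = 42)            # observed counts
chisq.test(flips, p = c(0.5, 0.5))            # H0: 50/50

# Test for independence: gender vs. course choice
tab <- matrix(c(30, 20, 10, 40), nrow = 2,
              dimnames = list(gender = c("male", "female"),
                              course = c("econ", "arts")))
chisq.test(tab)                               # H0: the two variables are independent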

25
Q

Normal Distribution

A

The normal distribution is defined by its bell shape, with both sides equally weighted (it is symmetrical). The standard normal distribution has a mean of 0 and a standard deviation of 1. Every normal distribution has a kurtosis of 3 and no skewness.

  • bell shape
  • both sides equally weighted (symmetric)
  • standard normal: mean of 0, std of 1
  • kurtosis of 3
  • no skewness
26
Q

Student t

A

Similar to the normal distribution, which has [.....]. The difference is that the Student t has heavier tails, or in other words a greater kurtosis. This means more of the variance of Y comes from outliers.

27
Q

F Distribution

A

If you take one chi-squared distributed variable divided by its degrees of freedom, and divide it by another chi-squared variable divided by its degrees of freedom, the ratio has an F-distribution. The F-distribution is a probability distribution used especially in analysis of variance: it is the distribution of the ratio of two independent chi-squared random variables, each divided by its number of degrees of freedom.

An intuitive explanation (one-way ANOVA):

Let's say you want to study a new vaccine. Group A gets 10 mg, B gets 5 mg, and C gets a placebo.

The mean square error (MSE) measures the variation within the groups: take the variance within each group and average the three, which gives the mean within-group variance for the whole sample.

The mean square between groups (MSB) measures the variation between the groups: it is based on how far each group's sample mean lies from the overall mean, weighted by the number of observations in each group.

The F-statistic is then the ratio of these two variances: F = MSB/MSE. A large F suggests the group means differ by more than within-group noise can explain.
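
The general definition behind this (W and V independent chi-squared variables):

F = \frac{W / m}{V / n} \sim F_{m,n}, \qquad W \sim \chi^2_m, \; V \sim \chi^2_n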

28
Q

The central limit theorem

A

The central limit theorem states that, under general conditions, the sampling distribution of the sample mean approximates a normal distribution as the sample size becomes larger. The individual variables in the sample can have almost any distribution, but the larger the sample, the closer the distribution of the sample mean gets to normal, and the closer the sample gets to representing the population distribution.

With higher n you have a higher probability of having normality and consistency. Consistency means that the probability that the estimate is close to the true value increases as the sample grows.

29
Q

Heteroscedasticity

A

When the error term does not have a constant variance.
The error term u is homoskedastic if the variance of the conditional distribution of u given X is constant and does not depend on X. Otherwise, the error term is heteroskedastic.

Homoskedastic: the error has a constant variance.
Heteroskedastic: the error does not have a constant variance.

Think of the distribution of the errors u for various values of X: imagine one plot where the spread is large and one where it is small and compact.
The coefficients remain unbiased. HETEROSKEDASTICITY LEADS TO BIASED STANDARD ERRORS, NOT BIASED COEFFICIENTS. Because of that, we use heteroskedasticity-robust (e.g. clustered) standard errors.

30
Q

Multicollinearity

A

When an independent variable is highly correlated with one or more of the other independent variables in a multiple regression equation. This can distort B1, B2, etc., so we don't get their real values.

Use the Variance Inflation Factor (VIF) to test for multicollinearity. A rule of thumb is that there is multicollinearity if VIF > 10. A simple solution is to drop the offending variable.

INFLATES THE STANDARD ERRORS; THE COEFFICIENTS REMAIN UNBIASED

31
Q

P-value

A

P-value is the smallest significance level at which the null hypothesis could be rejected.

The p-value is the probability that we observe test results as extreme as the
ones we have observed, given that the null hypothesis is true

32
Q

Binary dependent variable

A

Linear probability model (LPM): the predicted value is the predicted probability that y = 1. A dummy variable is the dependent variable. One disadvantage is that the LPM says that the change in the probability given a change in x is the same for all values of x.

PROBIT:
Models the probability that y = 1 using the standard normal cumulative distribution function. So we get a marginal effect that depends on the value of X, and a predicted probability between 0 and 1.

LOGIT:
Models the probability that y = 1 using the cumulative standard logistic distribution. The betas will differ from the probit's. The coefficients are given in log-odds form, which is the logarithm of the odds ratio.
For both models it is common to look at the marginal effect, which is the effect on the dependent variable of a small change in the regressor.

MAXIMUM LIKELIHOOD (MLE):
The likelihood function is the conditional density of y1,...,yn given x1,...,xn, treated as a function of the unknown parameters alpha and beta. We maximize the probability of observing the data given the assumed model. The MLE is the value of alpha, beta1,...,betan that best describes the full distribution of the data. In large samples, the MLE is consistent, normally distributed, and efficient, as it has the smallest variance of all estimators.

MEASURES OF FIT (logit, probit):
R2 makes no sense here, not even for the LPM. So we use two other measures:
1. The fraction correctly predicted = the fraction of y's for which the predicted probability is > 50% when y is 1, or < 50% when y is 0.
2. The pseudo-R2 (McFadden R2) measures the improvement in the value of the log likelihood relative to having no x's.

HYPOTHESIS TESTING:
Use the usual t-tests and confidence intervals.
For joint hypotheses, use the likelihood ratio test, which compares the likelihood functions of the restricted and unrestricted models.
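
A minimal R sketch of the three models, assuming a data frame df with a 0/1 outcome y and regressor x (all names hypothetical):

lpm    <- lm(y ~ x, data = df)                                       # linear probability model
probit <- glm(y ~ x, data = df, family = binomial(link = "probit"))
logit  <- glm(y ~ x, data = df, family = binomial(link = "logit"))

# Predicted probabilities that y = 1:
head(predict(probit, type = "response"))

# Fraction correctly predicted (0.5 cutoff):
mean((predict(logit, type = "response") > 0.5) == df$y)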

33
Q

What are the differences between a one-sided and a two-sided test?

A

Only use a one-sided test when there is a clear reason for doing so, for example from economic theory or empirical evidence. A one-sided test has more statistical power to detect an effect in one direction than a two-tailed test. It is used when effects can only exist in one direction, or if the researchers only care about one direction (not recommended, though).
The difference lies in the alternative hypothesis. In one case you are testing whether B1 is only greater than (or only lower than) 0. In the other case, you are testing with the possibility of both scenarios.
Same null hypothesis, different alternative hypothesis. Construction of the t-statistic is the same; the only difference is how you interpret it.

34
Q

What is a two-tailed test and how do you perform it?

A

If we want to test whether the mean is statistically equal to some value x, we can do a two-sided hypothesis test. For a slope, we test whether B1 = 0. That gives us:
H0: B1 = 0
HA: B1 is not 0
Think of how the normal distribution looks. If we use a significance level of 0.05 (alpha = 0.05), the two-tailed test places alpha = 0.025 in each tail.

  1. First compute the standard error of Y, which is an estimator of the standard deviation of the sampling distribution of Y.
  2. Compute the t-statistic.
  3. Compute the p-value. The p-value is the smallest significance level at which the null hypothesis could be rejected.
  4. Or use the t-statistic directly: reject H0 if the absolute value of the t-statistic is larger than 1.96.

Alternatively to the third step, you can compare the t-statistic to the critical value appropriate for the test's significance level, which is 1.96 in absolute value if you are testing at 5%. Reject H0 if |t| > 1.96.
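
The t-statistic behind steps 2-4, written out (under H0: β1 = 0):

t = \frac{\hat{\beta}_1 - 0}{SE(\hat{\beta}_1)}, \qquad \text{reject } H_0 \text{ at the 5\% level when } |t| > 1.96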

35
Q

What is a one-sided test and how to perform it?

A

In a one-sided test, the alternative hypothesis is that B1 is either lower than, or higher than, some value (for example 0), but not both. A one-sided test should only be used when there is a clear reason for doing so; this reason can come from economic theory, prior knowledge, etc. You now test with an alpha of 0.05 in one tail, not 0.025 in each tail.

36
Q

Confidence interval for a regression coefficient

A

For B1: the set of values that cannot be rejected using a two-sided hypothesis test at the 5% significance level. Equivalently, an interval constructed so that it has a 95% probability of containing the true value of B1.

If we run regressions on repeated samples and build a 95% confidence interval each time, the interval would contain the true population value 95% of the time.
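
The standard construction at the 95% level:

\hat{\beta}_1 \pm 1.96 \times SE(\hat{\beta}_1)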

37
Q

When is it appropriate to do a Two-Sided Hypothesis?

A
  1. Testing hypotheses about the population mean
  2. Testing hypotheses about the slope B1
  3. Reporting regression equations and applications to test scores
38
Q

How to test when X is a Binary/Dummy variable

A

Because D can only take two values, there is no "line", so it makes no sense to talk about a slope. Therefore we refer to B1 as the coefficient on D. The best way to read the regression is to compare the case D = 0 to the case D = 1. The tests are the same as with ordinary regressions: we can test the null hypothesis that B1 (the coefficient on D) is 0. We divide B1 by its standard error to get its t-statistic. If |t| is higher than 1.96, we conclude that B1 is not 0.

39
Q

Explain R2, ESS, SSR

A

The R-squared (R^2) represents the proportion of the variance of the dependent variable explained by the regressors. It ranges between 0 and 1, but will usually fall somewhere between the extremes. When you add a regressor, R2 will always (weakly) increase. Adjusted R2 will not always increase, as it penalizes adding more regressors.

If β_1 = 0, X explains nothing of the variance in Y, and R-squared should be 0.

The Explained Sum of Squares (ESS) is the sum of the squared differences between the predicted values and the mean of the dependent variable.

The Sum of Squared Residuals (SSR) is the sum of the squared differences between the observed values and the predicted values. We can interpret the residual as the remaining, or unexplained, part.

The Total Sum of Squares (TSS) is the sum of the squared differences between the observed dependent variable and its mean.
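
How the three pieces fit together (ŷ are the predicted values, ȳ the sample mean):

TSS = \sum_i (y_i - \bar{y})^2 = ESS + SSR, \qquad R^2 = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS}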

40
Q

Explain what the standard error of the regression is

A

The standard error of the regression (SER), also known as the standard error of the estimate, estimates the standard deviation of the regression error u. It does this by measuring the typical distance between the observed values and the values on the regression line.

The standard error of the regression can be used to construct confidence intervals.

41
Q

What are the Least Squares Assumptions?

A

Assumption 1: The error term has a conditional mean of zero

No matter which value we choose for X, the error term u must not show any systematic pattern and must have a mean of 0. In other words, the errors will on average equal zero. This assumption still allows for over- and underestimations of Y, but the OLS estimates will fluctuate around Y's actual value.

Assumption 2: All n observations are independently and identically distributed

This is a statement of how the sample is drawn. All n observations need to be independently distributed: the outcome of one value in the sample cannot affect another. The draws are random, and none of the values affect one another; they are all individually independent. Identically distributed means that the probability of any specific outcome is the same for every draw. For example, if you flip a coin 100 times, the probability of heads is always 50/50 and does not change throughout the experiment.

Assumption 3: Large outliers are unlikely

X and Y have finite kurtosis, as several outliers can give wrong estimations.

42
Q

What is heteroscedasticity and homoscedasticity?

A

The error term u is homoskedastic if the variance of the conditional distribution of u given X is constant and does not depend on X. Otherwise, the error term is heteroskedastic.

Homoskedastic: the error has a constant variance.
Heteroskedastic: the error does not have a constant variance.

Think of the distribution of the errors u for various values of X: imagine one plot where the spread is large and one where it is small and compact.

43
Q

What is goodness of fit/measures of fit?

A

When you have estimated a linear regression, you will wonder how well the regression line describes the data. The R2 and the standard error of the regression measure how well the OLS regression line fits the data.

The R-squared (R^2) represents the proportion of the variance of the dependent variable explained by the regressors. It ranges between 0 and 1, but will usually fall somewhere between the extremes.

If β_1 = 0, X explains nothing of the variance in Y, and R-squared should be 0.

The Explained Sum of Squares (ESS) is the sum of the squared differences between the predicted values and the mean of the dependent variable.

The Sum of Squared Residuals (SSR) is the sum of the squared differences between the observed values and the predicted values. We can interpret the residual as the remaining, or unexplained, part.

The Total Sum of Squares (TSS) is the sum of the squared differences between the observed dependent variable and its mean.

44
Q

What is the Standard error of the regression?

A

The Standard Error of the Regression estimates the standard deviation of the regression error u. It does this by measuring the typical distance between the observed values and the values on the regression line.
The Standard Error of the Regression can be used to construct confidence intervals.

SER = √( SSR/(n-2) )

n − 2 because it corrects for the two regression coefficients that were estimated (β_1 and β_0).
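
Written out in full (û_i are the OLS residuals):

SER = \sqrt{\frac{SSR}{n - 2}} = \sqrt{\frac{1}{n - 2}\sum_{i=1}^{n} \hat{u}_i^2}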

45
Q

What are the Least squares assumptions?

A

Assumption 1: The error term has a conditional mean of zero

No matter which value we choose for X, the error term u must not show any systematic pattern and must have a mean of 0. This assumption still allows for over- and underestimations of Y, but the OLS estimates will fluctuate around Y's actual value.
In American football, the score is given by: Score = 6*Touchdowns + 1*Extra points + 3*Field goals + 2*Safeties.
If you ran the regression Score = b1*Touchdowns + b2*Field goals + e, b1 would be larger than the true value of 6, because the omitted scoring types end up in the error term and bias the estimate.

Assumption 2: All n observations are independently and identically distributed

This is a statement of how the sample is drawn.
All n observations need to be independently distributed: the outcome of one value in the sample cannot affect another. The draws are random and none of the values affect one another; they are all individually independent.
Identically distributed means that the probability of any specific outcome is the same for every draw. For example, if you flip a coin 100 times, the probability of heads is always 50/50 and does not change throughout the experiment.
If the sampling is random, then it is representative of the population. As an example, you don't only go to Texas if you want to research the average American income.

Assumption 3: Large outliers are unlikely

X and Y have finite kurtosis, as several outliers can give wrong estimations. Large outliers will mess up our distribution and make OLS misleading.

46
Q

How can you get rid of outliers?

A

Winsorizing: transforming extreme values (capping them at a chosen percentile).
Trimming: deleting extreme values.

For example, you can winsorize/trim at the 5% level. This affects 2.5% of the empirical distribution on each side.
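
A minimal R sketch of both approaches for a hypothetical numeric vector x, using the cutoffs from the example above:

p <- quantile(x, probs = c(0.025, 0.975))   # 2.5% cutoffs on each side

x_wins <- pmin(pmax(x, p[1]), p[2])         # winsorize: cap values at the cutoffs
x_trim <- x[x >= p[1] & x <= p[2]]          # trim: drop values outside the cutoffs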

47
Q

Properties of Beta0 and Beta1

A

UNBIASEDNESS:
The expectation of the estimated beta0 is the real beta0, and the expectation of the estimated beta1 is the real beta1. An estimated coefficient might come out smaller or larger, much depending on the sample. However, on average the estimates equal the true values (the true relationship between x and y). If 10 people estimate B1, they will all probably miss by some amount. If 100 more do it, the average gets closer to the real value, and so on.

CONSISTENCY:
The probability that the estimate is close to the true population value increases as the sample size increases.

NORMALITY:
If n is large enough, we can use critical values from the normal distribution.

48
Q

What are the assumptions in multiple regression?

A

1: Error Term has a conditional mean of zero
2: I.I.D
3: Large outliers unlikely
4: No perfect multicollinearity

Perfect multicollinearity is when one of the regressors is an exact linear function of the other regressors. This happens, for example, when a variable is included twice in the regression, or in the dummy variable trap. In reality we rarely see two regressors that are perfectly correlated, which is why it most often arises from the dummy trap or from including the same regressor twice. Use the Variance Inflation Factor (VIF) to test for multicollinearity; a rule of thumb is that there is multicollinearity if VIF > 10. A simple solution is to drop the offending variable.

49
Q

What is multicollinearity, and how can we test for it?

A

Perfect multicollinearity occurs if two or more regressors are perfectly correlated. In reality we rarely see two regressors that are perfectly correlated, which is why it most often arises from the dummy trap or from including the same regressor twice. Use the Variance Inflation Factor (VIF) to test for multicollinearity; a rule of thumb is that there is multicollinearity if VIF > 10. A simple solution is to drop the offending variable.
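
A minimal R sketch using vif() from the car package (model and variable names hypothetical):

library(car)

m <- lm(y ~ x1 + x2 + x3, data = df)
vif(m)    # rule of thumb: VIF > 10 signals problematic multicollinearity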

50
Q

What are the problems and solutions with heteroscedasticity?

A

PROBLEM:
- Coefficients are still unbiased and consistent
- Standard errors are biased
- The OLS t-statistic does not follow a t-distribution
- We (fail to) reject H0 too often or not often enough

SOLUTION:
- Use heteroskedasticity-robust standard errors
- It is prudent to assume the errors are heteroskedastic unless there is a compelling reason not to
For implementation, see the lab example (a sketch follows below).
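
A minimal R sketch of the robust-standard-error fix using the lmtest and sandwich packages (model and data names hypothetical):

library(lmtest)
library(sandwich)

m <- lm(y ~ x, data = df)
coeftest(m, vcov = vcovHC(m, type = "HC1"))   # heteroskedasticity-robust t-tests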

51
Q

What is a dummy trap?

A

The dummy trap occurs when you add a dummy variable for every category, making one redundant. For example, adding a dummy variable for both male and female: since the value of the male dummy can be perfectly predicted from the female dummy (and the constant), including both creates perfect multicollinearity. The fix is to exclude one category as the reference.

52
Q

What's the difference between biases and heteroskedasticity + multicollinearity?

A

Biases such as omitted variables or simultaneity lead to a violation of LS.1, hence the coefficients are biased. In contrast, heteroskedasticity and multicollinearity lead to biased standard errors, not biased coefficients.

53
Q

What is omitted variable bias?

A

Does NHH provide students with the skills that are rewarded in the labour market? We have two variables, salary and NHH. BUT: salary also depends on intelligence. NHH has no influence on a person's IQ, but NHH selects students based on IQ. Hence, IQ is an omitted variable that biases the estimate.

In the population, both X and Z determine Y, but we only estimate the model with X. That means the influence of Z ends up in our error term.

As long as there is a non-zero covariance between X and Z, the zero conditional mean assumption is violated and the coefficient on X is biased.

54
Q

Simultaneity bias

A

Happens when one or more of the independent variables are jointly determined with the dependent variable. Supply and demand is a good example.

Quantity and price
Investments and productivity
Sales and advertising
This leads to a violation of LS.1, hence our coefficient is biased.

55
Q

Sample Selection bias

A

A type of bias that arises from choosing non-random data for statistical analysis, for example when people volunteer for a study: those who volunteer might share the same characteristics.

For example, you want to study the relationship between veganism and undergraduate students. You send out a survey to the students in an art and culture class. Because this is not a random sample, it is not representative of the target population; these students might be more liberal, etc.

56
Q

Measurement error in independent variable

A

Data is often measured with error. For example, this can be
Reporting error
Coding error
Estimation error

57
Q

Explain a non-linear function

A

The effect of one unit of x on y depends on the value of x. In other words, its marginal effect is not constant.

58
Q

What types of interactions between independent variables do we have?

A

Interaction between two binary variables
Interaction between binary and continuous variable
Interaction between two continuous variables

59
Q

Explain interactions between two binary variables

A

D1 and D2 are dummy variables. B1 is the effect of changing D1 from 0 to 1. In the base specification, this effect does not depend on the value of D2.
To allow the effect of changing D1 to depend on D2, include the "interaction term" D1 * D2 as a regressor.

So if our regression is y = b0 + b1*D1 + b2*D2 + b3*D1*D2 + u:
b3 is the incremental effect of changing D1 from 0 to 1 when D2 = 1.

In short: with y = b0 + b1*D1 + b2*D2, the effect of D1 depends only on D1 itself. What if it in reality also depends on the value of D2? That is why we extend it to:
y = b0 + b1*D1 + b2*D2 + b3*D1*D2 + u
b3 now captures the extra effect, i.e. how D1 and D2 interact.

60
Q

Interactions between binary and continuous variables

A

y = b0 + b1x + b2D + u
y = b0 + b1x + b2D + b3D*x + u
To allow the effect of changing x to depend on D, include the “interaction term” D * x as a regressor


62
Q

Binary and continuous interactions: two regression lines

A

y = β0 + β1x + β2D + β3D × x + u
When Di = 0: y = β0 + β1x + u.
When Di = 1: y = (β0 + β2) + (β1 + β3)x + u

By playing with specification of the regression equation we can have regression lines that are different: Both in slopes and intercepts (current specification). In slopes only (drop β2D from regression equation). In intercepts only (drop β3D × x from regression equation).

We are now allowing the effect of x to depend on D. We get the different alternatives:
b0 + b1x + b2D: allows for a different intercept, but the same slope
b0 + b1x + b2D + b3(x * D): allows for a different intercept and a different slope
b0 + b1x + b2(x * D): same intercept, allows for a different slope

Steps:

  1. Test if the two lines are in fact the same. This is done with an F-test of the joint hypothesis that b2 = 0 and b3 = 0. If we cannot reject it, we are left with b0 + b1x: the same slope and the same intercept for both groups.
  2. Test if the two lines have the same slope.

If so, b3(x * D) = 0 because b3 = 0.
So divide b3 by its standard error and look at its t-statistic.
  3. Test if they have the same intercept:
y = b0 + b1x + b2D + b3(x * D)
y = b0 + b1x + b2D
To have the same intercept, b2D must equal 0, leaving us with intercept b0 instead of b0 + b2. Find b2's t-statistic by dividing its value by its standard error.
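
A minimal R sketch of estimating and testing the two lines (data frame df with y, x, and dummy D hypothetical):

m <- lm(y ~ x * D, data = df)     # expands to x + D + x:D
summary(m)                        # t-tests on D (intercepts) and x:D (slopes)

# Joint test that the two lines are identical (b2 = b3 = 0),
# comparing against the restricted model:
anova(lm(y ~ x, data = df), m)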

63
Q

Interpreting coefficients and Hypothesis test

A

y = β0 + β1x + β2D + β3D × x + u
Compute y before and after a change in x:
Before: y = β0 + β1x + β2D + β3D × x
After: y + Δy = β0 + β1(x + Δx) + β2D + β3D × (x + Δx)
Subtract "before" from "after":
Δy = β1Δx + β3D × Δx, so Δy/Δx = β1 + β3D
The effect of x depends on D.
β3 = increment to the effect of x when D = 1.

y = β0 + β1x + β2D + β3D × x + u
The two regression lines have the same slope: H0: β3 = 0.
The two regression lines have the same intercept: H0: β2 = 0.
The two regression lines are the same: H0: β2 = β3 = 0.

64
Q

Interactions between two continuous variables

A

y = β0 + β1x1 + β2x2 + u
x1 and x2 are continuous variables.
As specified, the effect of x1 does not depend on x2.
As specified, the effect of x2 does not depend on x1.
To allow the effect of x1 to depend on x2, include the “interaction term” x1 × x2 as a regressor:
y = β0 + β1x1 + β2x2 + β3x1 × x2 + u

65
Q

Explain the interpretation of the different log equations

A

Lin-log: a 1% increase in x gives a 0.01 * b1 change in y.
Log-lin: a one-unit increase in x gives a 100 * b1 % change in y.
Log-log: a 1% increase in x gives a b1 % change in y (b1 is an elasticity).
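
The three specifications, written out:

\text{lin-log: } y = \beta_0 + \beta_1 \ln(x) + u, \quad \text{log-lin: } \ln(y) = \beta_0 + \beta_1 x + u, \quad \text{log-log: } \ln(y) = \beta_0 + \beta_1 \ln(x) + u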

66
Q

What are nonlinear functions?

A

Regressions where the impact on y of a one-unit change in x differs depending on the value of x. In other words, the marginal effect is not constant; the relation between x and y is not linear. A one-unit change from 5 to 6 will have a different impact than one from 10 to 11.

67
Q

What is a general strategy for modelling a nonlinear function

A

1. Use your economic knowledge.
2. Estimate a regression using OLS.
3. Test if the non-linear specification is better than the linear one. Can be done with t-tests and F-tests.
4. Plot the nonlinear function. Does the regression describe the data well? Does it fit the scatterplot?
5. Estimate the effect on Y of a change in x.

68
Q

What are polynomials?

A

Polynomials are made by including powers of x, most usually up to the 2nd, 3rd, or 4th power. A power of 2 makes a quadratic function, while a power of 3 makes a cubic function. The degree can be denoted r; a polynomial of degree r allows for r − 1 bends.

69
Q

What are the steps with polynomials?

A

Find out how many powers are suitable for your regression by hypothesis testing. Sometimes you know which powers are most suitable from economic theory or just by looking at the scatterplot. A sketch in R follows after this list.

  1. Pick a maximum value for r, let's say 3.
  2. Test non-linear versus linear. You can test several powers at once using the F-statistic. The null hypothesis is that b2 = 0, b3 = 0, etc.; the alternative is that at least one is not 0.
  3. Then test if the 3rd power is needed. Use the t-statistic to see if its beta coefficient is 0. H0: b = 0, HA: b is not 0.
  4. If you do not reject H0, eliminate that power.
  5. Continue until the highest remaining power is statistically significant.
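The sketch referenced above, in R (data names hypothetical):

linear <- lm(y ~ x, data = df)
cubic  <- lm(y ~ x + I(x^2) + I(x^3), data = df)

anova(linear, cubic)    # F-test of H0: b2 = b3 = 0 (linear vs. cubic)
summary(cubic)          # t-test on I(x^3): cubic vs. quadratic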
70
Q

How to test cubic vs. linear and cubic vs quadratic?

A

Cubic versus linear: F-test

Cubic versus quadratic: t-test

71
Q

Explain the quadratic model

A

The quadratic model has a power of 2. That means it has one turning point, which gives a maximum (or minimum) value. Results beyond that point should not be interpreted, as they are not reliable; we are limited to a specific range.

72
Q

Linear probability model

A

The natural starting point is the linear regression model with a single regressor. In the LPM, the predicted value of y is interpreted as the predicted probability that y = 1, and β is the change in that predicted probability for a unit increase in x.

y = α + βx + u
The LPM models Prob(y = 1|x) as a linear function of x.

73
Q

Why is the linear regression model sometimes called the linear probability model?

A

When y is binary, the linear regression model y = α + βx + u is called the linear probability model because Prob(y = 1|x) = α + βx.

74
Q

Advantages and disadvantages with LPM?

A

Advantages:
Simple to estimate and interpret.
Inference is the same as for multiple regression (note: the LPM is inherently heteroskedastic).

Disadvantages:
The LPM says that the change in the predicted probability for a given change in x is the same for all values of x, which does not always make sense.
Also, the LPM can predict probabilities that are < 0 or > 1.

Overall:
We need a non-linear model: probit or logit regression.

75
Q

Logit versus Probit

A

The main reason for choosing one over the other is historical: logit was computationally faster and easier, but that does not matter nowadays.

In practice, logit and probit are very similar.
Empirical results typically should not hinge on the logit versus probit choice.

76
Q

How do we interpret the marginal effects in this chapter? (logit, probit)

A

Marginal effect: the effect on the dependent variable that results from changing an independent variable by a small amount.

Probit and logit are non-linear functions.
Hence, the effect of a one-unit change in a regressor x on the predicted probability differs for different values of x.

It is common to report marginal (or partial) effects instead of coefficients:
the marginal effect at the means, or the average marginal effect.
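
A minimal R sketch of the average marginal effect of x in a probit (data names hypothetical; it uses the fact that the probit AME of x is the sample average of the normal density at the linear index, times the coefficient):

probit <- glm(y ~ x, data = df, family = binomial(link = "probit"))

# AME of x: average of phi(Xb) * beta_x over the sample
mean(dnorm(predict(probit, type = "link"))) * coef(probit)["x"]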

77
Q

Maximum Likelihood

A
The likelihood function is the conditional density of y1, ..., yn given x1, ..., xn, treated as a function of the unknown parameters α, β.
MLE maximizes the probability of observing the data given the assumed model.
For probit (with one explanatory variable), the likelihood is the product over i of Φ(α + βx_i)^{y_i} · [1 − Φ(α + βx_i)]^{1 − y_i}.

The maximum likelihood estimator (MLE) is the value of α, β1, ..., βk that maximizes the likelihood function.
The MLE is the value of α, β1, ..., βk that best describes the full distribution of the data.
In large samples, the MLE is: consistent, normally distributed, and efficient (it has the smallest variance of all estimators).

78
Q

Measures of Fit for Logit, Probit

A

The R2 and adjusted R2 do not make sense here (even for the LPM). So, two other specialised measures are used:
1. The fraction correctly predicted = the fraction of y's for which the predicted probability is > 50% when yi is 1, or < 50% when yi is 0.
2. The pseudo-R2 (McFadden R2), which measures the improvement in the log likelihood relative to a model with no regressors.

79
Q

Hypothesis testing for Logit/Probit

A

Usual t-tests and confidence intervals can be used, testing H0: b = 0.

For joint hypotheses, use the likelihood ratio test, which compares the likelihood functions of the restricted and unrestricted models. A joint hypothesis is, for example, H0: B0 = 0 and B1 = 0; several restrictions are tested at once, so a single t-test is not designed for it. You estimate the model under the null hypothesis (restricted) and compare its fit to the unrestricted model. If the unrestricted model fits sufficiently better, reject H0.

80
Q

What is panel data

A

Panel data is suitable if you have data on different entities observed over time. Pooled OLS ignores the individual dimension of each entity and the time dimension; it assumes there is no entity- or time-specific variation in the data. Random effects can be used when the unobserved entity effects are random and uncorrelated with the regressors; this model allows each entity to have its own specific effect. Balanced panels have no missing periods, while unbalanced panels do.

81
Q

Why are panel data useful

A

With panel data we can control for factors that (1) vary across entities but do not vary over time, (2) could cause omitted variable bias if they are omitted, and (3) are unobserved or unmeasured, and therefore cannot be included in the regression using multiple regression.

82
Q

What can the fixed effects model control for

A

Z varies across entities but not over time. Z is supposed to correct the model for omitted variable biases from such factors. For example, Z can be the local attitude towards drinking: it can be different in Texas compared to California, but within each entity the effect is assumed not to change over time (the attitude changes neither in Texas nor in California).

Another example: when studying the prices at which Norway can export to different countries, Z can cover France's price sensitivity, which might differ from China's. Note that France's or China's price sensitivity does not change in this setup, because Z does not take time into account.

S takes care of time-specific effects that do not vary across entities. This can e.g. be national laws or safer cars nationwide. In the study of Norway's exports, it could for example control for the COVID-19 lockdown.

83
Q

What are the assumptions?

A

Same as OLS with some small adjustments:

  1. The error term must have a conditional mean of zero: there shall be no systematic relationship between x and u. A good profit one year must not carry information about the error the next year.
  2. I.I.D.: this one is not the same as in a regular OLS. The usual standard errors are not appropriate; we use robust, clustered standard errors instead. This allows for both heteroskedasticity and autocorrelation WITHIN an entity. The autocorrelation makes sense, because an entity's performance one day can affect its performance the next day. Across entities we still assume independence: Firm A's performance does not affect Firm B's.
84
Q

How can the fixed regression be done?

A
  1. Binary (dummy) regression
  2. Entity-demeaned regression
  3. First-difference specification

The first-difference specification only works when T = 2. In this case you subtract one period's regression from the other, for example the 1988 values minus the 1982 values, and assume that z has not changed.
1 and 2 give the same results. In 1 you include n − 1 dummy variables, one for each entity except one, because you don't want to fall into the dummy trap.
Number 2 is the best alternative: easiest and most appropriate in a large dataset, where handling all the dummies from number 1 becomes impractical.

85
Q

How can you do entity-demeaned regression in R

A

Use the plm package with the "within" model. Next use "coeftest" with "vcovHC". This corrects for heteroskedasticity and handles the autocorrelation (clustered standard errors).
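
A minimal sketch, assuming a panel data frame df with columns id, year, y, and x (all hypothetical):

library(plm)
library(lmtest)

fe <- plm(y ~ x, data = df, index = c("id", "year"), model = "within")
coeftest(fe, vcov = vcovHC(fe, type = "HC1", cluster = "group"))  # clustered SEs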

86
Q

Why are panel data useful?

A

With panel data you can control for factors that: (1) vary across entities but do not vary over time, (2) could cause omitted variable bias if they are omitted, (3) are unobserved or unmeasured – and therefore cannot be included in the regression using multiple regression.
The key idea: if an omitted variable does not change over time, then any changes in y over time cannot be caused by the omitted variable

87
Q

Describe the differences and similarities between the regressions if you have an entity fixed effects regression

A

The intercept is unique for each entity, but the slope is the same for all.

Recall that shifts in the intercept can be represented using binary regressors

88
Q

What is Time Fixed Effects regression, and when can it be used?

A

An omitted variable might vary over time but not across entities. This can for example be safer cars or changes in national laws. These produce intercepts that change over time. We use S to capture the combined effect of variables that change over time but are the same for each entity.

89
Q

Why do we use clustered standard error?

A

The usual OLS standard errors will in general be wrong, because they assume that the error term is not autocorrelated. The solution is to use clustered standard errors, which allow for autocorrelation WITHIN entities. They are also robust to heteroskedasticity within and across entities.

90
Q

What are instrument variables?

A

X is correlated with the error term. Think of X as having two parts: one part that is correlated with the error term and one that is not. We use instruments to isolate the part that is NOT correlated with the error term, and use only that part for estimation.

One study from the USA looked at how serving in the military affected future income. One can plausibly believe that people who volunteer for the military come from poor neighborhoods, which is correlated with lower future income. There is also good reason to believe that people from wealthy neighborhoods have more money and power, and hence tend not to join the military. To fix this bias, the draft was used as an instrument: being drafted to the military is essentially random. By using this as an instrument, one could split X into two parts and eliminate the selection effect that people in the military tend to come from poor neighborhoods.

91
Q

Endogeneity and Exogeneity

A

Endogenous: variables that are correlated with the error term.
Exogenous: variables that are not correlated with the error term (determined outside the model).

92
Q

What are the conditions for a valid instrument?

A

RELEVANT and EXOGENOUS are the two conditions for a valid instrument:

  1. Instrument relevance: variation in the instrument is related to variation in X.
  2. Instrument exogeneity: Z is correlated with Y solely through its correlation with X.
    Relevant: the IV actually affects X.
    Exogenous: the IV affects Y only through X (it is uncorrelated with the error term).
93
Q

How do we use IV?

A

Let's call the instrument Z. If it satisfies the two conditions of relevance and exogeneity, we can estimate B1 using an IV estimator called two-stage least squares (TSLS). TSLS is calculated in two stages. The first stage splits X into two parts: one problematic part that might be correlated with the error term, and one problem-free part. The second stage uses the problem-free part to estimate B1.

In the first stage you regress X on its instrument(s), which gives the fitted values X-hat.

You then put X-hat into the second-stage regression in place of X.

From our example, the intuition is that we now only use the variation in military service driven by the draft, thus eliminating the bias that volunteers tend to come from poorer neighborhoods.
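
A minimal R sketch of TSLS with ivreg() from the AER package, mapping the draft example (names hypothetical: y future income, x military service, z draft status):

library(AER)

iv <- ivreg(y ~ x | z, data = df)   # regressors left of |, instruments right
summary(iv)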

94
Q

How many instruments can we have? What do we call these models?

A

It is UNDERIDENTIFIED if it has fewer IVs than endogenous variables. The model cannot be computed.

It is EXACTLY IDENTIFIED if it has the same number of IVs as endogenous variables. It can be computed, but instrument validity cannot be tested. Hence one needs good storytelling and economic knowledge to be confident that the instrument is valid.

It is OVERIDENTIFIED if it has more instruments than endogenous variables. It can now be tested whether the instruments are RELEVANT and EXOGENOUS.

95
Q

How do we test the instrument variables?

A

1. First step is to test for relevance:

First we run a regression of X on all of its IVs.

H0: the instruments do not have any effect on X. If H0 is rejected, the instruments are relevant. A common rule of thumb is to look at the first-stage F-statistic: if it is greater than 10, we reject H0 and conclude that the instruments are relevant. If it is lower than 10, the instruments are weak.

2. Second step is to test for exogeneity (requires overidentification):

We use a J-test on the error term to see if the instruments are exogenous. If they are exogenous, all of our IVs are uncorrelated with the error term.

The null hypothesis of the J-test is that all of our instruments are exogenous and have no relation to the error term. The test statistic has a chi-squared distribution with (m − k) degrees of freedom, where m is the number of instruments used and k is the number of endogenous variables. We compute the test and look at the p-value. WE WANT A P-VALUE HIGHER THAN 0.05 (OR 0.10), BECAUSE THIS TELLS US THAT OUR IVs ARE EXOGENOUS. SO WE WANT TO KEEP H0, I.E. FAIL TO REJECT IT.
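
A minimal R sketch of both diagnostics via the AER package, for an overidentified model with two hypothetical instruments z1 and z2:

library(AER)

iv <- ivreg(y ~ x | z1 + z2, data = df)
summary(iv, diagnostics = TRUE)   # reports the weak-instruments F and the Sargan (J) test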

96
Q

What is the problem of testing joint hypotheses with t-tests?

A

If we were to run separate t-tests and reject the joint null if any one of them turned out significant, the size of the test would depend on the correlation between t1 and t2, and we would reject a true null too often.

97
Q

What is meant by the size of a test?

A

The size of a test is the actual rejection rate under the null hypothesis

98
Q

What is the point of the F-statistic?

A

It lets us run joint hypothesis tests. The test measures how much the sum of squared errors is reduced by adding the variables. If the sum of squared errors is reduced enough, we reject H0 (that all the tested coefficients are equal to zero).

99
Q

What is a p-value?

A

The p-value is the probability that we observe test results as extreme as the ones we have observed, given that the null hypothesis is true.

It is the probability of a type-1 error (the rejection of a true null hypothesis). We usually use a 5% significance level, meaning that 1 in 20 times we could randomly get our value even though H0 is true.

100
Q

What is meant by unbiasedness of an estimator?

A

An estimator whose expected value (the mean of its sampling distribution) is equal to the population value. Of course, we need a large enough sample to be confident that the sample mean is actually close to that value.

101
Q

What is meant by estimator consistency

A

An estimator converges in probability to the correct population value as the sample size increases (the sampling distribution gets tighter and tighter).

102
Q

What is meant by estimator efficiency?

A

A more efficient estimator needs fewer observations in order to achieve a given performance. If we can get smaller standard deviations, our estimator becomes more efficient.

103
Q

What is meant by asymptotic efficiency?

A

For consistent estimators, with asymptotically normal distributions, the
estimator with the smallest asymptotic variance

104
Q

What is meant by asymptotic normality?

A

The sampling distribution of a properly normalized estimator converges to the
standard normal distribution

105
Q

Central limit theorem:

A

The distribution of sample means approximates a normal distribution (a "bell curve") as the sample size becomes larger. Given a sufficiently large sample size from a population with a finite level of variance, the mean of all samples from the same population will be approximately equal to the mean of the population. Furthermore, the sample means will follow an approximately normal distribution, with variance approximately equal to the variance of the population divided by each sample's size.
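
A tiny R simulation illustrating the theorem with a skewed (exponential) population; the numbers are arbitrary:

means <- replicate(10000, mean(rexp(n = 50, rate = 1)))  # 10,000 sample means, n = 50
hist(means)    # approximately bell-shaped around the population mean of 1
var(means)     # close to population variance / n = 1 / 50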

106
Q

Degrees of freedom:

A

The number of elements that can vary

107
Q

What are the tests for coefficients and means?

A

T-test and F-test

108
Q

What are the tests for Normality?

A

Jarque-Bera
Shapiro-Wilk
Kolmogorov-Smirnov

109
Q

What are the tests for heteroscedasticity?

A

Goldfeld-Quandt
Breusch-Pagan
White

110
Q

What are the tests for serial correlation

A

Durbin-Watson

Breusch-Godfrey

111
Q

What are the tests for multicollinearity?

A

Variance Inflation Factor (VIF)

112
Q

What are the tests for endogeneity?

A

Hausman test

J-test (IV)

113
Q

What are the tests for binary dependent variables?

A

Likelihood ratio test
Wald test
Score test (Lagrange Multiplier test)

114
Q

What are the tests for stationarity?

A

Dickey-Fuller (unit root)
Augmented Dickey-Fuller
Dickey-Fuller test with time trend

115
Q

What are the tests for instrument validity?

A

F-test for relevance

J-test for exogeneity

116
Q

What are the key assumptions of a regression analysis?

A

The main purpose of regression is to take samples from a population and produce estimates with which we can perform inference on the population. For the inference to be valid, certain assumptions must be satisfied.

  1. Normality
  2. Consistency: as the sample size increases, the estimates produced by the estimator "converge" to the true value of the parameter being estimated. Increasing the sample size works because it brings n closer and closer to the true population size.
  3. Unbiasedness: a statement about the expected value of the sampling distribution of the estimator. It is not affected by sample size. Only satisfied when assumptions SR/MR 1-5 hold (Gauss-Markov theorem, BLUE estimator). Unbiased corresponds to the Norwegian term "forventningsrett".
  4. Efficiency: an estimator is efficient if it gets closer to the true parameter more often than other estimators (i.e. it has lower variance around the true parameter) -> BLUE. If BLUE, there exists no other linear unbiased estimator that better explains the true population.
117
Q

Panel Data:

A

Random effects, fixed effects and first differences control for unobserved variables that (1) vary across entities but not over time (entity fixed effects) or (2) vary over time but not across entities (time fixed effects).

Random effects can be applied if the unobserved factors are uncorrelated with the independent variables, but only then. It is less restrictive than FE and FD, so it can be an appropriate choice on some occasions.

Fixed effects solves the unobserved heterogeneity problem and is consistent regardless of whether the covariance between the unobserved effects and the independent variables is zero or not. Since this covariance is very rarely zero, we often must use FE/FD instead of random effects.

All other assumptions are the same across the three approaches.

A disadvantage of FD, however, is that we lose one period T (or entity i) of observations by taking the first difference to get rid of the constant term, and with FE we cannot estimate coefficients on variables that are constant over time (or across entities). These are the stricter approaches.
FD and FE are nevertheless equivalent approaches, yielding the exact same result, when T = 2.

Unobserved heterogeneity: the existence of unmeasured/unobserved differences between (1) entities or (2) time periods that are correlated with the independent variables. If not corrected for, this violates the zero conditional mean assumption through endogeneity and leads to bias. A sketch of the within (demeaning) fixed-effects estimator is shown below.
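A minimal sketch of the within (entity-demeaning) fixed-effects estimator, assuming a simulated panel with hypothetical columns entity/year/y/x:

```python
# Entity fixed effects via the within transformation (demeaning).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(4)
entities, years = 50, 6
df = pd.DataFrame({
    "entity": np.repeat(np.arange(entities), years),
    "year": np.tile(np.arange(years), entities),
})
alpha = rng.normal(size=entities)                    # unobserved entity effect
df["x"] = rng.normal(size=len(df)) + alpha[df["entity"]]  # x correlated with alpha
df["y"] = 2.0 * df["x"] + alpha[df["entity"]] + rng.normal(size=len(df))

# Within transformation: subtract entity means, wiping out the entity effect
demeaned = df[["y", "x"]] - df.groupby("entity")[["y", "x"]].transform("mean")
fe = sm.OLS(demeaned["y"], demeaned["x"]).fit()
print("FE estimate of beta:", fe.params["x"])  # close to 2 despite the correlated effect
```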

118
Q

Binary dependent variable:

A

In models with binary dependent variables, the regression function is interpreted as a
conditional probability function of the binary dependent variable. Three choices: (1) LPM, (2)
probit and (3) logit. Probit and logit models allow for a non-linear relationship between the
regressors and the dependent variable.
Assumptions for probit and logit:
• Linear in parameters
• Random sampling
• No perfect multicollinearity
• Zero conditional mean of errors
• Homoskedasticity
Assumptions on parameters:
• Consistency: as the sample size increases, the estimated B converges to the true B.
• Unbiasedness: the expected value of the estimated B equals the true B.
R2 has no meaningful interpretation when the dependent variable is binary and the regressors are continuous, because the regression line can never fit the data perfectly and R2 relies on a linear relationship between X and Y. Use instead the correctly predicted proportion (“hit rate”) or the pseudo-R2 (McFadden), which compares the maximum likelihood with X to the likelihood without X. Maximum likelihood estimators are normally distributed in large samples, so we can do inference! ML estimates the unknown parameters by choosing them such that the likelihood of drawing the sample we observe is maximized (hence, it estimates the optimal alpha and betas).

Using robust standard errors is imperative, as the residuals in a linear probability model are always heteroskedastic.

Interpretation of probit coefficients:
For standard independent variables: a one-unit change in X is associated with a B1 change in z.
For log-transformed variables: a one-unit change in log X is associated with a B1 change in z.
z is subsequently interpreted as an associated probability drawn from the standard normal cumulative distribution function (CDF).

Because probit coefficients lack a straightforward economic interpretation, it is more common to report the marginal effects. One feature of the probit is that each x will have a different effect on z (marginal effects differ for different X).

The benefit of the S-shape is that it predicts conditional probabilities in the interval 0 to 1! t-statistics and confidence intervals can be used thanks to the large-sample approximation (CLT).

Virtually the only difference between probit and logit is the distribution function, which also has implications for the interpretation: logit uses the standard logistic distribution function, with a log-odds interpretation (the log of probability of success / probability of failure) that is harder to comprehend. A probit sketch with marginal effects is shown below.
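A minimal probit sketch with statsmodels, reporting average marginal effects rather than raw coefficients; the data-generating process and variable names are made up for illustration:

```python
# Probit: coefficients live on the z-scale; marginal effects are on P(y=1).
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(5)
n = 1000
x = rng.normal(size=n)
z = -0.5 + 1.2 * x                        # latent index
y = (rng.uniform(size=n) < norm.cdf(z)).astype(int)

X = sm.add_constant(x)
probit = sm.Probit(y, X).fit(disp=0)
print(probit.params)                      # coefficients on the z-scale (~ -0.5, 1.2)
print(probit.get_margeff().summary())     # average marginal effects on P(y=1)
```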

119
Q

What are the threats to Internal Validity for Idealized Experiments?

A
  1. Failure to Randomize
    If the treatment is not assigned randomly but instead is based on characteristics or preferences of the subjects, the experimental outcome reflects both the effect of the treatment and the effect of the nonrandom assignment.
    For example, if you split a group of individuals systematically into one control group and one treatment group based on their last names, there might be ethnic differences in the last names, and differences between ethnicities might lead to a bias. So a correlation can arise between X and u.
    You can test for random receipt of treatment: regress X on W1…Wn and compute the F-statistic for the hypothesis that the coefficients on the W’s are all 0.
  2. Failure to Follow the Treatment Protocol / Partial Compliance
    People in the experiment do not do what they are told. This is called partial compliance with the treatment protocol: some controls get treatment, and some “treated” subjects act as controls. This failure leads to bias in the OLS estimator.
    - 10% of students switched groups because of behavior problems
  3. Attrition
    Some subjects drop out. If the reason for attrition is related to the treatment itself, the attrition can result in bias in the OLS estimator.
    - Students move out of the district
    - Students leave for other schools
  4. Experimental Effects
    Subjects’ behaviour might be affected by being in an experiment (the Hawthorne effect).
  5. Small Sample Sizes
    Experiments on humans can be expensive, which can result in a small sample. A small sample means that the causal effect is estimated imprecisely and threatens the validity of confidence intervals and hypothesis tests.

120
Q

What are the threats to External Validity for Idealized Experiments?

A
  1. Nonrepresentative Sample
    The population studied and the population of interest might differ.
    - A study will often use volunteers, who are often motivated and might share a similar type of personality, resulting in an estimated average treatment effect that is not informative for the whole population.
  2. Nonrepresentative Program or Policy
    The policy or program of interest must be sufficiently similar to the program studied to permit generalizing the results.
    An experimental program is often small-scale and tightly monitored. The quality of the actual program, when widely implemented, might therefore be lower than that of the experimental program.
  3. General Equilibrium Effects
    Turning a small experimental program into a widespread, permanent program might change the economic environment.
    Sometimes causal relationships only hold when they are applied to some people.
    - Imagine an economic training program in 10 villages in Zimbabwe that leads to a 40% increase in wages for those included in the study. If it were done nationwide, more people would become skilled, wages would fall, and the wage gains would shrink.

121
Q

What is Quasi-Experiments / Natural Experiments?

A

Randomness is introduced by variations in individual circumstances that make it appear “as if” the treatment is randomly assigned.

Two types of quasi-experiments:

  1. Whether an individual (entity) receives treatment is “as if” randomly assigned, possibly conditional on certain characteristics.

Treatment (d) is “as if” randomly assigned.

  • For example, a new policy measure that is implemented in one area but not in another, where the implementation is “as if” randomly assigned.

  • Does immigration reduce wages? Economic theory suggests that if the supply of labor increases, wages will fall. However, immigrants tend to go to cities with high labor demand, so the OLS estimator of the effect of immigration on wages will be biased. A quasi-experiment was done on Cubans who moved to Miami (Card’s Mariel boatlift study): the causal effect on wages of an increase in immigration was estimated by comparing the change in wages of low-skilled workers in Miami to the change in wages of similar workers in comparable U.S. cities. He found no effect.

  2. Whether an individual receives treatment is partially determined by another variable that is “as if” randomly assigned.

A variable (z) that influences treatment (d) is “as if” randomly assigned: use IV regression.

  • The variable that is “as if” randomly assigned can then be used as an instrumental variable in a 2SLS regression analysis.

122
Q

What is the Difference-in-Difference estimator?

A

Change in Y = Y(after) – Y(before)
Change in Y = a + B1*G + u
The group dummy G equals 1 for the treatment group and 0 for the control group. B1 is the DID estimator.

PANEL DATA FORMULATION:
y = a + B1*(D*G) + B2*D + B3*G + u
D – equals 1 after treatment and 0 before
G – equals 1 for the treatment group and 0 for the control group
D*G – interaction term: the effect of being in the treatment group after treatment was received
B1 – the DID estimator

A sketch of this regression is shown below.
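A minimal sketch of the panel-data DID regression above, estimated as the coefficient on the interaction D:G; the simulated data and the treatment effect of 3.0 are arbitrary illustrations:

```python
# DID as OLS with an interaction term: y ~ D + G + D*G.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n = 500
df = pd.DataFrame({
    "G": np.repeat([0, 1], n),            # control vs treatment group
    "D": np.tile([0, 1], n),              # before vs after treatment
})
df["y"] = 1.0 + 0.5 * df["D"] + 2.0 * df["G"] \
          + 3.0 * df["D"] * df["G"] + rng.normal(size=len(df))

did = smf.ols("y ~ D + G + D:G", data=df).fit()
print(did.params["D:G"])                  # DID estimate, close to 3.0
```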

123
Q

What is the parallel trend assumption?

A

The parallel trend assumption requires that, in the absence of treatment, the difference between the treatment and control groups would have stayed the same over time (the two groups would have followed parallel trends).

We cannot test this directly, but if the treatment and control firms look similar before the treatment, the assumption is more plausible.

124
Q

What is the best possible way to measure causal effect?

A

Conceptually, the way to estimate a causal effect is an ideal randomized controlled experiment, but performing experiments in economic applications can be unethical, impractical, or too expensive.

125
Q

What is stationarity? Why is it important?

A

A time series is stationary if its mean, variance and autocorrelation structure are constant over time.

A time series can contain trend and seasonality. When analyzing time series, you do not want your results to be driven by these factors. To remove them, we use the first difference, and sometimes the first difference of the logarithm.

Tests (see the sketch below):
Dickey-Fuller
Augmented Dickey-Fuller
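A minimal sketch of the (augmented) Dickey-Fuller test with statsmodels, run on a simulated random walk in levels and in first differences:

```python
# ADF test: fail to reject a unit root in levels, reject after differencing.
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(7)
walk = np.cumsum(rng.normal(size=500))    # random walk: non-stationary

stat, p, *_ = adfuller(walk)
print(f"levels: p={p:.3f}")               # large p => cannot reject a unit root
stat, p, *_ = adfuller(np.diff(walk))
print(f"first difference: p={p:.3f}")     # small p => stationary after differencing
```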

126
Q

What are the Gauss-Markov Conditions?

A
  1. Linear in parameters
  2. Zero conditional mean of errors
    a. E[u_t | X_jk] = 0: the error term and all the independent variables have to be
    independent.
  3. No perfect collinearity
  4. Homoskedasticity
    a. The variance of the error term, given all observations of the independent variables,
    has to be constant.
  5. No serial correlation
    a. The covariance between two error terms, given all independent variables, has
    to be zero.