Pensum Flashcards
What are the steps in empirical analysis?
- Careful formulation of the research question (RQ)
- Construct economic model
- Turn into econometric model
You now have an econometric model. The model is an outcome of your variable choices, hypothesis development, data gathering, and estimation of the model parameters.
What is Cross-Sectional data?
Cross-sectional data is a sample of different entities, for example firms, households, cities, states, and countries, observed at a given point in time or in a given period.
What is Time-series data?
Data for a single entity (firms, households, companies, cities, states, countries) collected at multiple time periods.
What is Panel Data?
Also called longitudinal data: data for multiple entities in which each entity is observed at two or more time periods.
A panel is balanced if every entity is observed in every period, and unbalanced if some observations are missing.
Pooled OLS: Used if no individual effect exists. It does not take time-specific effects or variation across entities into account.
Fixed Effects: 1) Control for unobserved variables that vary across entities but not over time, and 2) time specific effects that don’t vary across entities.
Can control for biases that vary across entities but not over time. For example, if you are analyzing Norwegian exports to the EU region, entity fixed effects can control for French price sensitivity, which might differ from Poland's.
You can also control for time-specific effects. Continuing the example: if the EU issues a law, it affects all of the buyers at once (it does not vary across entities).
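A minimal numeric sketch of the within (fixed effects) estimator on simulated data; the data-generating process and all numbers are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_entities, n_periods = 50, 10
true_beta = 2.0

# Entity effects that are constant over time and correlated with x,
# so pooled OLS (which ignores them) is biased.
alpha = rng.normal(0, 3, n_entities)
x = alpha[:, None] + rng.normal(0, 1, (n_entities, n_periods))
y = alpha[:, None] + true_beta * x + rng.normal(0, 1, (n_entities, n_periods))

# Pooled OLS slope on the stacked data.
pooled = np.cov(x.ravel(), y.ravel())[0, 1] / np.var(x.ravel(), ddof=1)

# Fixed effects (within estimator): demeaning each entity's data over
# time removes the entity effect; then run OLS on the demeaned data.
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
fe_beta = (xd * yd).sum() / (xd ** 2).sum()

print(pooled, fe_beta)  # pooled is biased upward; fe_beta is close to 2.0
```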
Random Effects: Treats the unobserved entity-specific effect as random and assumes it is uncorrelated with the regressors. More efficient than fixed effects when that assumption holds; the Hausman test is commonly used to choose between the two.
What is the difference between cross-sectional, time-series and panel data?
Cross-sectional data consists of multiple entities observed at a single time period.
Time-series data consists of a single entity observed at multiple time periods.
Panel data consists of multiple entities observed over two or more time periods.
Describe a simple regression model
OLS chooses the regression coefficients so that the estimated regression line is as close as possible to the observed data, where closeness is measured by the sum of the squared mistakes made in predicting Y given X. The resulting estimates define the fitted linear regression line.
In the linear regression model with a single regressor, y is the dependent variable and x is the independent variable, or the regressor. The first part of the equation, β_0 + β_1*x, is the population regression line, or the population regression function. This is the relationship that holds between y and x on average over the population.
The intercept β_0 and the slope β_1 are the coefficients of the population regression line, also known as the parameters of the population regression line. u is the error term. In context, u is the difference between y and its predicted value.
β_1 measures the marginal effect on y of a one-unit change in x.
How do you estimate the coefficients in OLS?
Finding the OLS estimates means finding the values that minimize the total squared estimation mistakes; the rule that produces them is called an estimator. An estimator is a function of a sample of data drawn randomly from a population. Given estimates β̂_0, β̂_1 of β_0, β_1, we can predict y with ŷ = β̂_0 + β̂_1*x.
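A minimal sketch of these estimates in closed form on made-up data, using the textbook formulas β̂_1 = Σ(x−x̄)(y−ȳ) / Σ(x−x̄)² and β̂_0 = ȳ − β̂_1·x̄:

```python
import numpy as np

# Made-up sample of five (x, y) observations.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

xbar, ybar = x.mean(), y.mean()
beta1 = ((x - xbar) * (y - ybar)).sum() / ((x - xbar) ** 2).sum()
beta0 = ybar - beta1 * xbar

y_hat = beta0 + beta1 * x   # predicted values
u_hat = y - y_hat           # residuals (estimated error terms)

print(beta0, beta1)  # 0.05 1.99 for this sample
```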
What is a linear model?
Linear means that the effect on y of a one-unit change in x is constant: it does not depend on the level of x.
Explain Standard Deviation and Variance
Both the standard deviation and the variance measure the “spread” of a probability distribution. The variance is measured in squared units, while the standard deviation is the square root of the variance.
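A quick sketch of both measures on a made-up sample:

```python
import numpy as np

# Variance: average squared deviation from the mean (in squared units).
# Standard deviation: its square root (back in the original units).
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
var = ((data - data.mean()) ** 2).mean()  # population variance
sd = var ** 0.5

print(var, sd)  # 4.0 2.0
```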
What is a Causal Effect
Causality means that a specific action leads to a specific measurable consequence. For example, there might be a correlation between eating apples and car accidents. The correlation is probably coincidental: eating apples will probably not reduce the chance of getting into a car accident.
What is the difference between Experimental data and observational data
Experimental data comes from an experiment that is designed to investigate a causal effect. Observational data is obtained by measuring actual behaviour outside of an experiment.
Sample Space and events
The sample space is the set of all possible outcomes. An event is a subset of the sample space, i.e. a set of one or more outcomes.
Probability Distribution of a random variable
The probability distribution lists all possible values for the variable and the probability that each value will occur. These probabilities sum to 1.
What is Joint probability and distribution
Joint probability is the probability of two events happening together (think Venn diagram). The joint distribution is the probability that X and Y take on certain values. Let's say that X is 1 when it's raining and 0 when it's not, and Y is 1 when it is more than 10 degrees outside and 0 otherwise. The joint distribution gives the probabilities of the four possible combinations of these two scenarios. Each outcome has a probability, and summed together they equal 1.
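The rain/temperature example as a 2x2 joint distribution; the probabilities here are made up for illustration:

```python
# Joint distribution of X (rain) and Y (more than 10 degrees).
# The four probabilities must sum to 1.
joint = {
    (1, 1): 0.10,  # rain, >10 degrees
    (1, 0): 0.20,  # rain, <=10 degrees
    (0, 1): 0.40,  # no rain, >10 degrees
    (0, 0): 0.30,  # no rain, <=10 degrees
}
total = sum(joint.values())

# Marginal probability of rain: sum the joint distribution over Y.
p_rain = joint[(1, 1)] + joint[(1, 0)]

print(total, p_rain)
```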
Marginal probability distribution
Just another name for a variable's probability distribution. The term is used to distinguish the distribution of Y alone from the joint distribution.
Conditional Distribution
The distribution of a random variable Y conditional on another variable X taking on a specific value.
Conditional Expectation
The mean of the conditional distribution of Y given X
Law of iterated Expectations
The mean height of adults is the weighted average of the mean height for men and the mean height for women, weighted by the proportions of men and women. In general, the mean of Y is the weighted average of the conditional expectation of Y given X: E[Y] = E[E[Y|X]].
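The height example can be checked numerically; the proportions and group means below are illustrative assumptions, not real statistics:

```python
# E[Y] = E[E[Y|X]]: the unconditional mean is the weighted average of
# the conditional means, weighted by the probabilities of each group.
p_men, p_women = 0.5, 0.5            # proportions (the weights)
mean_men, mean_women = 179.0, 166.0  # conditional means E[height | group]

mean_height = p_men * mean_men + p_women * mean_women
print(mean_height)  # 172.5
```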
Covariance
A measure of the extent to which two random variables move together.
What is the Standard Error in a regression?
The standard error of the regression estimates the standard deviation of the error term in the regression. It is computed from the residuals: roughly, the typical distance between the observed values and the regression line.
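A sketch of the computation on made-up data, dividing by n − 2 because two coefficients were estimated:

```python
import numpy as np

# Made-up sample; fit OLS, then summarize the residual spread.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

beta1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
beta0 = y.mean() - beta1 * x.mean()
residuals = y - (beta0 + beta1 * x)

ssr = (residuals ** 2).sum()            # sum of squared residuals
ser = (ssr / (len(x) - 2)) ** 0.5       # standard error of the regression

print(ser)
```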
Kurtosis
Kurtosis is how much mass the distribution has in its tails, and is therefore a measure of how much of the variance of Y that arises from extreme values. Extreme values are called outliers. The greater the kurtosis of a distribution is, the more likely it is to have outliers.
Skewness
Skewness quantifies the extent to which a given distribution deviates from a normal (symmetric) distribution. A normal distribution has a skew of zero, with equal weight in each tail.
If you are measuring height, you might get a mean of 172 with the tails being equally weighted.
If you are measuring income for people working full time, few people will have an income under 300K. From 300K to 600K there will probably be a steep increase, and from 600K onwards there will be fewer and fewer people, so the curve flattens out. This means we get a “long tail” on the right side. A long tail on the right side is called a “positive skew”, so we say that the distribution is positively skewed.
If we have an easy exam and a lot of people get A's or B's, we will have a negative skew. The long tail will be on the left side, slowly increasing until it hits C or B, and from there it will go steeply up.
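Skewness and kurtosis (from the two cards above) can be computed as standardized third and fourth moments; the data here is simulated, with a lognormal sample standing in for the long-tailed income example:

```python
import numpy as np

def skewness(x):
    # Standardized third moment: 0 for a symmetric distribution.
    z = (x - x.mean()) / x.std()
    return (z ** 3).mean()

def kurtosis(x):
    # Standardized fourth moment: approximately 3 for a normal distribution.
    z = (x - x.mean()) / x.std()
    return (z ** 4).mean()

rng = np.random.default_rng(1)
normal = rng.normal(0, 1, 100_000)           # skew ~ 0, kurtosis ~ 3
income_like = rng.lognormal(0, 1, 100_000)   # long right tail: positive skew

print(skewness(normal), kurtosis(normal))
print(skewness(income_like))  # clearly positive
```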
I.I.D
Independent and Identically distributed
Independent: The result of one event does not have any impact on the other event. So if you roll two dice, the result of the first roll does not affect the result of the second.
Identically: if you flip a coin (heads/tails) each throw gives you a 50/50 chance. The probability does not change over time.
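A quick simulation of i.i.d. draws: each roll is generated the same way (identical 1/6 probabilities) and independently of the others, so each face shows up about one sixth of the time.

```python
import random

random.seed(42)
# 60,000 independent rolls of a fair six-sided die.
rolls = [random.randint(1, 6) for _ in range(60_000)]

freq_six = rolls.count(6) / len(rolls)
print(freq_six)  # close to 1/6
```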
Chi-Squared
DISTRIBUTION:
The chi-squared distribution is asymmetric, takes only non-negative values, and is positively skewed; with k degrees of freedom it has mean k and variance 2k. It is used to test categorical variables, i.e. variables whose values fall into distinct categories (male vs. female, etc.).
Chi-squared tests can be used when we:
1) Need to estimate how closely an observed distribution matches an expected one
2) Need to test whether two random variables are independent.
GOODNESS OF FIT:
When you have one categorical variable and you want to compare an observed frequency distribution to a theoretical one. For example, is there a relation between age and car accidents?
H0: There is no relation between age and car accidents
HA: There is a relation between age and car accidents
A chi-squared value greater than our critical value implies that there is a relation between age and car accidents, so we reject the null hypothesis. This tells us that there most likely is a relation, but not how large that relation is.
Another example is if you flip a coin 100 times. You would expect it to get 50/50 head/tails. The further away from 50/50, the less goodness of fit.
Tests how well a sample of data matches the known characteristics of the larger population that the sample is trying to represent. For example, the χ^2 statistic tells us how well the actual results from 100 coin flips compare to the theoretical model that assumes 50/50. The further away from 50/50, the worse the goodness of fit (and the more likely we are to conclude that this is not a fair coin).
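A sketch of the coin-flip goodness-of-fit computation; the observed counts are hypothetical:

```python
# Chi-squared goodness-of-fit: sum((observed - expected)^2 / expected).
observed = [62, 38]  # hypothetical heads/tails from 100 flips
expected = [50, 50]  # fair-coin model

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Critical value for 1 degree of freedom at the 5% level is 3.841.
print(chi2, chi2 > 3.841)  # 5.76 True -> reject the fair-coin hypothesis
```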
TEST FOR INDEPENDENCE:
Categorical data for two variables, and you want to see if there is an association between them.
Does gender have any significance for driving test outcomes? Is there a relation between student gender and course choice? Researchers collect data and compare the frequencies at which male and female students select the different classes. The χ^2 test for independence tells us how likely it is that random chance can explain the observed difference.
P-value smaller than 0.05 (chi-squared value greater than the critical value): there is some relation between gender and driving test scores. Reject H0.
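A sketch of the test for independence on a made-up 2x2 table (gender in rows, driving-test pass/fail in columns); expected counts come from the independence assumption:

```python
# Hypothetical contingency table of observed counts.
table = [[60, 40],   # group 1: pass, fail
         [45, 55]]   # group 2: pass, fail

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Expected count under independence: row total * column total / n.
chi2 = 0.0
for i, row in enumerate(table):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / n
        chi2 += (obs - exp) ** 2 / exp

# df = (rows - 1) * (cols - 1) = 1; critical value at 5% is 3.841.
print(chi2, chi2 > 3.841)
```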