Syllabus Flashcards
What are the steps in empirical analysis?
- Careful formulation of each research question (RQ)
- Construct economic model
- Turn into econometric model
You now have an econometric model. The model is an outcome of your variable choices, hypothesis development, data gathering, and estimation of the model parameters.
What is Cross-Sectional data?
Cross-sectional data is a sample of different entities (for example firms, households, cities, states, and countries) observed at a given point in time or in a given period.
What is Time-series data?
Data for a single entity (firms, households, companies, cities, states, countries) collected at multiple time periods.
What is Panel Data?
Also called longitudinal data, are data for multiple entities in which each entity is observed at two or more periods.
Panel data can be balanced (every entity is observed in all time periods) or unbalanced (some entities are missing in some periods).
Pooled OLS: appropriate if no individual (entity-specific) effect exists. Does not take time-specific effects or variation across entities into account.
Fixed Effects: control for 1) unobserved variables that vary across entities but not over time, and 2) time-specific effects that do not vary across entities.
Entity fixed effects can control for biases that vary across entities but not over time. For example, if you are analyzing Norwegian exports to the EU region, they can absorb French price sensitivity, which might be different from Poland's.
Time fixed effects control for time-specific effects. Continuing the example, if the EU issues a law, it will affect all of the buyers (it does not vary across entities).
Random Effects: treat the entity-specific effect as a random variable that is uncorrelated with the regressors. More efficient than fixed effects when that assumption holds; a Hausman test is commonly used to choose between the two.
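A minimal sketch of the difference between pooled OLS and fixed effects in Python, using a tiny made-up panel (the column names country, year, price and exports are illustrative, not from the syllabus); fixed effects are implemented here with entity and time dummies (the least-squares dummy variable approach):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Tiny synthetic panel: one row per country-year observation (numbers are made up)
rng = np.random.default_rng(0)
countries = ["FR", "PL", "DE"]
years = [2018, 2019, 2020, 2021]
rows = []
for c in countries:
    for t in years:
        price = rng.normal(10, 2)
        # entity effect differs by country, time effect differs by year
        exports = 5 - 0.8 * price + countries.index(c) * 3 + (t - 2018) + rng.normal()
        rows.append({"country": c, "year": t, "price": price, "exports": exports})
df = pd.DataFrame(rows)

# Pooled OLS: ignores entity- and time-specific effects
pooled = smf.ols("exports ~ price", data=df).fit()

# Fixed effects via dummies: C(country) absorbs effects that vary across entities
# but not over time, C(year) absorbs effects that vary over time but not across entities
fe = smf.ols("exports ~ price + C(country) + C(year)", data=df).fit()

print(pooled.params["price"], fe.params["price"])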
What is the difference between cross-sectional, time-series and panel data?
Cross sectional data consists of multiple entities observed at a single time period.
Time-series data consists of a single entity observed at multiple time-periods
Panel data consists of multiple entities over two or more time-periods
Describe a simple regression model
OLS chooses the regression coefficients so that the estimated regression line is as close as possible to the observed data, where closeness is measured by the sum of squared mistakes made in predicting Y given X. The same idea extends to regressions with more than one regressor.
The model y = β_0 + β_1·x + u is the linear regression model with a single regressor, in which y is the dependent variable and x is the independent variable, or regressor. The first part of the equation, β_0 + β_1·x, is the population regression line, or population regression function. This is the relationship that holds between y and x on average over the population.
The intercept β_0 and the slope β_1 are the coefficients, or parameters, of the population regression line. u is the error term; it is the difference between y and the value predicted by the population regression line.
β_1 MEASURES THE MARGINAL EFFECT ON Y OF A UNIT CHANGE IN X.
How do you estimate the coefficients in OLS?
Finding the OLS estimators means finding the coefficient values that minimize the total squared prediction mistakes. Such a rule is called an estimator: a function of a sample of data drawn randomly from a population. Given estimates β̂_0 and β̂_1 of β_0 and β_1, we can predict y with ŷ = β̂_0 + β̂_1·x.
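The resulting estimators are the standard OLS formulas (x̄ and ȳ are the sample means):
β̂_1 = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)²
β̂_0 = ȳ − β̂_1·x̄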
What is a linear model?
A linear model means that the effect on y of a one-unit change in x is constant; it does not depend on the level of x.
Explain Standard Deviation and Variance
Both the standard deviation and the variance measure the "spread" of a probability distribution. The variance is measured in squared units, while the standard deviation is the square root of the variance.
What is a Causal Effect
Causality means that a specific action leads to a specific measurable consequence. For example, there might be a correlation between people eating apples and car accidents. The correlation is probably coincidental, and eating apples will probably not reduce the chance of getting in a car accident.
What is the difference between Experimental data and observational data
Experimental data comes from an experiment that is designed to investigate the causal effect. Observational data is obtained by measuring actual behaviour outside of an experiment.
Sample Space and events
Sample space is the set of all possible outcomes. An event is a subset of the sample space, that is, a set of one or more of those outcomes.
Probability Distribution of a random variable
The probability distribution lists all possible values for the variable and the probability that each value will occur. These probabilities sum to 1.
What is Joint probability and distribution
Joint probability is the probability of two events happening together (think Venn diagram). The joint distribution is the probability that X and Y take on certain values. Let's say that X is 1 when it is raining and 0 when it is not, and Y is 1 when it is more than 10 degrees outside and 0 otherwise. The joint distribution gives the probabilities of the four possible combinations of these two variables. Each outcome has a probability, and summed together the probabilities equal 1.
Marginal probability distribution
The marginal probability distribution of Y is just another name for its probability distribution. The term is used to distinguish the distribution of Y alone from the joint distribution of Y and another variable.
Conditional Distribution
The distribution of a random variable Y conditional on another variable X taking on a specific value.
Conditional Expectation
The mean of the conditional distribution of Y given X
Law of iterated Expectations
The mean of Y is the weighted average of the conditional expectation of Y given X, weighted by the probability distribution of X. For example, the mean height of adults is the weighted average of the mean height of men and the mean height of women, weighted by the proportions of men and women.
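In symbols: E[Y] = E[E[Y|X]] = Σ_x E[Y | X = x]·Pr(X = x).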
Covariance
A measure of the extent to which two random variables move together: Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)].
What is the Standard Error in a regression?
The standard error of a regression estimates the standard deviation of the error term in the regression. It does this by measuring the average distance between the observed values and the values on the regression line.
Kurtosis
Kurtosis measures how much mass the distribution has in its tails and is therefore a measure of how much of the variance of Y arises from extreme values. Extreme values are called outliers. The greater the kurtosis of a distribution, the more likely it is to have outliers.
Skewness
Skewness measures the asymmetry of a distribution, that is, the extent to which it departs from a symmetric (for example normal) distribution. A normal distribution has a skewness of zero, with equal weight in each tail.
If you are measuring height, you might get a mean of 172 with the tails being equally weighted.
If you are measuring income for people working 100%, few people will have an income under 300K. From 300K to 600K, there will probably be a steep increase. From 600K and to infinity, there will be fewer and fewer people, and the curve will be less and less steep. This means that we get the “long tail” on the right side. “long tail” on right side can be called a “positive skew”, so we can say that the distribution is positively skewed.
If we have an easy exam and a lot of people get A's or B's, we will have a negative skew. The long tail will be on the left side and rise slowly until it reaches C or B; from there the curve goes steeply up.
I.I.D
Independent and Identically distributed
Independent: the result of one event does not have any impact on the other event. So if you roll two dice, the result you got on the first die does not affect the result you will get on the second.
Identically distributed: if you flip a coin (heads/tails), each flip gives you a 50/50 chance. The probability distribution does not change from draw to draw.
Chi-Squared
DISTRIBUTION:
The chi-squared distribution is the distribution of the sum of m squared independent standard normal variables, where m is the degrees of freedom. It is asymmetric and positively skewed, takes only non-negative values, and has a mean equal to its degrees of freedom. It can be used to test categorical variables, i.e. variables whose values fall into a fixed set of categories (male vs. female, etc.).
Chi-squared tests can be used when we:
1) Need to estimate how closely an observed distribution matches an expected one
2) Need to estimate whether two random variables are independent.
GOODNESS OF FIT:
Used when you have one categorical variable and want to compare an observed frequency distribution to a theoretical one. For example, do age and car accidents have a relation?
H0: no relation between age and car accidents
HA: There is a relation between age and car accidents
A chi-squared value greater than our critical value implies that there is a relation between age and car accidents, hence we reject the null hypothesis. It tells us that there most likely is a relation, but not how large that relation is.
Another example is if you flip a coin 100 times. You would expect to get 50/50 heads/tails. The further away from 50/50, the worse the goodness of fit.
The test measures how well a sample of data matches the known characteristics of the larger population that the sample is trying to represent. For example, the χ² statistic tells us how well the actual results from 100 coin flips compare to the theoretical model which assumes 50/50. The further away from 50/50, the worse the goodness of fit (and the more likely we are to conclude that this is not a fair coin).
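A minimal sketch of the coin-flip example in Python (the 55/45 split is made up for illustration):

from scipy.stats import chisquare

# Observed: 55 heads and 45 tails out of 100 flips; expected 50/50 under H0
observed = [55, 45]
expected = [50, 50]
stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(stat, p_value)  # a small p-value would suggest the coin is not fair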
TEST FOR INDEPENDENCE:
Used with categorical data for two variables, when you want to see if there is an association between them.
Does gender have any effect on driving test outcomes? Is there a relation between student gender and course choice? Researchers collect data and compare the frequencies at which male and female students select the different classes. The χ² test for independence tells us how likely it is that random chance can explain the observed difference.
If the p-value is smaller than 0.05 (equivalently, the chi-squared value is bigger than the critical value), there is some relation between gender and driving test scores, and we reject H0.
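A minimal sketch of a test for independence in Python (the 2x2 table is made up for illustration):

import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical contingency table: rows = gender, columns = pass/fail on the driving test
table = np.array([[43, 7],
                  [38, 12]])
stat, p_value, dof, expected = chi2_contingency(table)
print(stat, p_value)  # p < 0.05 would lead us to reject independence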
Normal Distribution
The normal distribution is defined by its bell shape, with both sides equally weighted: it is symmetric, so it has no skewness, and its kurtosis is 3. The standard normal distribution has a mean of 0 and a standard deviation of 1.
- bell shape
- both sides equally weighted (symmetric, no skewness)
- kurtosis of 3
- the standard normal has a mean of 0 and a std of 1
Student t
Similar in shape to the normal distribution (bell-shaped and symmetric). The difference is that the Student t distribution has heavier tails, or in other words a greater kurtosis, so more of the variance of Y arises from outliers. As the degrees of freedom grow large, it approaches the standard normal distribution.
F Distribution
If you divide one chi-squared distributed variable by another independent chi-squared distributed variable, each divided by its degrees of freedom, you get an F-distribution. The F-distribution is a probability density function used especially in the analysis of variance: it describes the ratio of two independent random variables that each have a chi-squared distribution divided by its degrees of freedom.
An intuitive explanation:
Let's say that you want to study a new vaccine. Group A is put on 10 mg, B on 5 mg and C on a placebo.
The Mean Square Error (MSE) measures the variation within the groups: compute the variance within each group and average the three of them.
The Mean Square Between the groups (MSB) measures the variation between the groups: it is based on how much each group's sample mean varies around the grand mean, weighted by the number of items in each group.
The F-statistic is then the ratio of these two variances: F = MSB/MSE.
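A minimal sketch of the vaccine example in Python with made-up numbers; scipy's one-way ANOVA computes the same F = MSB/MSE ratio:

from scipy.stats import f_oneway

# Hypothetical outcomes for the three vaccine groups (numbers are illustrative)
group_a = [2.1, 2.5, 1.9, 2.4, 2.2]   # 10 mg
group_b = [2.9, 3.1, 2.7, 3.0, 2.8]   # 5 mg
group_c = [3.8, 4.0, 3.6, 4.1, 3.9]   # placebo
f_stat, p_value = f_oneway(group_a, group_b, group_c)
print(f_stat, p_value)  # a large F suggests the group means differ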
The central limit theorem
The central limit theorem states that, under general conditions, the distribution of the standardized sample average approximates a normal distribution as the sample size becomes larger. The individual observations can have almost any distribution, but the more observations we average, the closer the distribution of the sample average gets to a normal distribution.
With a higher n you also get consistency: the probability that the estimate is close to the true population value approaches 1 as the sample size increases.
Heteroscedasticity
When the error term doesn’t have a constant variance.
The error term u is homoscedastic if the variance of the conditional distribution of u given X is constant and does not depend on X. Otherwise, the error term is heteroscedastic.
Homosced: the error has a constant variance.
Heterosced: the error does not have a constant variance.
Picture the distribution of the errors u at various values of X: imagine one plot where the variance is large and spread out and one where it is small and compact; under homoscedasticity the spread is the same for every value of X.
The coefficients remain unbiased. HETEROSCEDASTICITY LEADS TO BIASED STANDARD ERRORS, NOT BIASED COEFFICIENTS. Because of that, we use heteroscedasticity-robust (or clustered) standard errors.
Multicollinearity
When an independent variable is highly correlated with one or more of the other independent variables in a multiple regression equation. This makes the estimates of β_1, β_2, etc. imprecise, so it is hard to pin down their separate effects.
We can use the Variance Inflation Factor (VIF) to test whether there is multicollinearity. A rule of thumb is that there is multicollinearity if VIF > 10. The usual solution to this problem is simply to drop one of the collinear variables.
MULTICOLLINEARITY INFLATES THE STANDARD ERRORS; THE COEFFICIENTS REMAIN UNBIASED.
P-value
P-value is the smallest significance level at which the null hypothesis could be rejected.
The p-value is the probability that we observe test results as extreme as the ones we have observed, given that the null hypothesis is true.
Binary dependent variable
Linear probability model (LPM): a linear regression in which our dummy variable is the dependent variable, and the fitted value is the predicted probability that y = 1. One disadvantage is that the LPM says that the change in the probability given a change in x is the same for all values of x (and predicted probabilities can fall outside 0 and 1).
PROBIT:
Standard normal cumulative distribution: models the probability that y = 1 using the cumulative normal distribution. The predicted probability always lies between 0 and 1, and the marginal effect depends on the value of X.
LOGIT:
Models the probability that y = 1 using the cumulative standard logistic distribution. The beta will be different from the probit beta. The coefficient is given in log-odds form, which is the logarithm of the odds ratio.
It is common to look at both these models marginal effect, which is the effect on the dependent variable given a small change in the regressor.
MAXIMUM LIKELIHOOD (MLE): the likelihood function is the conditional density of y_1, ..., y_n given x_1, ..., x_n, treated as a function of the unknown parameters (the intercept and the slopes). We maximize the probability of observing the data given the assumed model. The MLE is the set of parameter values that best describes the full distribution of the data. In large samples, the MLE is consistent, normally distributed and efficient, in the sense that it has the smallest variance of all estimators.
MEASURES OF FIT
Logit, Probit.
R² does not have a natural interpretation here, not even for the LPM, so we use two other measures:
1. The fraction correctly predicted: the fraction of observations for which the predicted probability is > 50% when y = 1, or < 50% when y = 0.
2. The pseudo R² (McFadden R²): measures the improvement in the value of the log-likelihood relative to a model with no X's.
HYPOTHESIS TESTING
Use usual t-test and confidence intervals
For joint hypotheses, use the likelihood ratio test, which compares the likelihood functions of the restricted and unrestricted models.
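A minimal sketch of logit and probit estimation in Python on made-up data (variable names and numbers are illustrative):

import numpy as np
import statsmodels.api as sm

# Made-up data: y is binary, x is a single regressor
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = (0.5 * x + rng.normal(size=200) > 0).astype(int)
X = sm.add_constant(x)

logit = sm.Logit(y, X).fit(disp=0)
probit = sm.Probit(y, X).fit(disp=0)

print(logit.params, probit.params)     # coefficients differ in scale between the two models
print(logit.get_margeff().summary())   # average marginal effects
print(logit.prsquared)                 # McFadden pseudo R2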
What are the differences between a one-sided and a two-sided test?
Only use a one-sided test when there is a clear reason for doing so, for example from economic theory or empirical evidence. A one-sided test has more statistical power to detect an effect in one direction than a two-tailed test. It is appropriate when the effect can only exist in one direction, or when the researchers only care about one direction (though this is not recommended).
The difference lies in the alternative hypothesis. In one case you are testing if B1 is only greater or only lower than 0. In the other case, you are testing with the possibility of both scenarios.
Same null hypothesis, different alternative hypothesis. The construction of the t-statistic is the same; the only difference is how you interpret the t-statistic (which rejection region and critical value you use).
What is a two-tailed test and how do you perform it?
If we want to test whether a coefficient is significantly different from some value, we can do a two-sided hypothesis test. For example, to test whether B1 = 0, that gives us:
H0: B1 = 0
HA: B1 is not 0
Think of how the normal distribution looks. If we use a significance level of 0.05 (alpha = 0.05), the two-tailed test places an alpha of 0.025 in each tail.
- First compute the standard error of the estimator (for example SE(β̂_1)), which estimates the standard deviation of its sampling distribution.
- Compute the t-statistic.
- Compute the p-value. The p-value is the smallest significance level at which the null hypothesis could be rejected; reject H0 if it is below your significance level.
- Alternatively, compare the t-statistic to the critical value appropriate for the significance level of the test: at the 5% level, reject H0 if the absolute value of the t-statistic is larger than 1.96.
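In symbols, for the null hypothesis that β_1 equals a hypothesized value β_1,0:
t = (β̂_1 − β_1,0) / SE(β̂_1)
Reject H0 at the 5% level (two-sided) if |t| > 1.96.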
What is a one-sided test and how to perform it?
In a one-sided test, the alternative hypothesis is that B1 is either only lower or only higher than, for example, 0. A one-sided test should only be used when there is a clear reason for doing so; this reason can come from economic theory, prior knowledge, etc. You now test with the whole alpha of 0.05 in one tail, not 0.025 in each tail.
Confidence interval for a regression coefficient
For B1: the set of values that cannot be rejected using a two-sided hypothesis test at the 5% significance level; equivalently, an interval that has a 95% probability of containing the true value of B1.
If we repeatedly drew samples and constructed 95% confidence intervals, the true population value would lie inside the interval 95% of the time.
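For a regression coefficient, the 95% confidence interval is constructed as β̂_1 ± 1.96·SE(β̂_1).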
When is it appropriate to do a Two-Sided Hypothesis?
- Testing hypotheses about the population mean
- Testing hypotheses about the slope B1
- Reporting regression equations and application to test scores
How to test when X is a Binary/Dummy variable
Because D can only take two values, there is no "line", so it makes no sense to talk about a slope. Therefore, we refer to B1 as the coefficient on D. The best way to interpret the regression is to compare the case D = 0 with the case D = 1: B1 is the difference between the two group means. The hypothesis tests are the same as with an ordinary regressor. For example, we can test the null hypothesis that B1 is 0 by dividing B1 by its standard error to get the t-statistic; if its absolute value is higher than 1.96, we conclude that B1 is not 0.
Explain R2, ESS, SSR
The R-squared (R²) represents the proportion of the variance of the dependent variable that is explained by the regressors. It ranges between 0 and 1, but will usually fall somewhere between the extremes. When you add a regressor, R² will never decrease. The adjusted R² will not always increase, as it penalizes adding more regressors.
If β_1=0, X explains nothing of the variance in Y, and R-squared should be 0.
The Explained Sum of Squares (ESS) is the sum of the squared differences between the predicted values and the mean of the dependent variable.
The Sum of Squared Residuals (SSR) is the sum of the squared differences between the observed values and the predicted values. We can interpret the residual as the remaining, or unexplained, part.
The Total Sum of Squares (TSS) is the sum of the squared differences between the observed dependent variable and its mean.
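In formulas, with ŷ_i the predicted value and ȳ the sample mean of y:
ESS = Σ(ŷ_i − ȳ)²
SSR = Σ(y_i − ŷ_i)²
TSS = Σ(y_i − ȳ)² = ESS + SSR
R² = ESS/TSS = 1 − SSR/TSS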
Explain what the standard error of the regression is
The standard error of the regression (SER), also known as the standard error of the estimate, estimates the standard deviation of the regression error u. It measures the average distance that the observed values fall from the regression line.
The standard error of the regression can be used to construct confidence intervals.
What are the Least Squares Assumptions?
Assumption 1: The Error Term has Conditional Mean of Zero
No matter which value we choose for X, the error term u must not show any systematic pattern and must have a conditional mean of 0. In other words, the error is on average zero for every value of X. This assumption still allows for over- and under-predictions of Y, but the predictions will fluctuate around Y's actual value.
Assumption 2: (X_i, Y_i), i = 1, ..., n, are Independently and Identically Distributed
This is a statement about how the sample is drawn. All n observations need to be independently distributed: the outcome of one value in the sample cannot affect another. The draws are random, and none of the values affect one another. Identically distributed means that every observation is drawn from the same distribution, so the probability of any specific outcome is the same for each draw. For example, if you flip a coin 100 times, the probability of heads is always 50/50 and does not change throughout the experiment.
Assumption 3: Large Outliers Are Unlikely
X and Y have finite fourth moments (finite kurtosis), because large outliers can give misleading estimates.
What is heteroscedasticity and homoscedasticity?
The error term u is homoscedastic if the variance of the conditional distribution of u given X is constant and does not depend on X. Otherwise, the error term is heteroscedastic.
Homosced: the error has a constant variance.
Heterosced: the error does not have a constant variance.
Picture the distribution of the errors u at various values of X: imagine one plot where the variance is large and spread out and one where it is small and compact; under homoscedasticity the spread is the same for every value of X.
What is goodness of fit/measures of fit?
When you have estimated a linear regression, you will wonder how well the regression line describes the data. The R² and the standard error of the regression measure how well the OLS regression line fits the data.
The R-squared (R²) represents the proportion of the variance of the dependent variable that is explained by the regressors. It ranges between 0 and 1, but will usually fall somewhere between the extremes.
If β_1 = 0, X explains nothing of the variance in Y, and R-squared should be 0.
The Explained Sum of Squares (ESS) is the sum of the squared differences between the predicted values and the mean of the dependent variable.
The Sum of Squared Residuals (SSR) is the sum of the squared differences between the observed values and the predicted values. We can interpret the residual as the remaining, or unexplained, part.
The Total Sum of Squares (TSS) is the sum of the squared differences between the observed dependent variable and its mean.
What is the Standard error of the regression?
The Standard Error of the Regression (SER) estimates the standard deviation of the regression error u. It does this by measuring the average distance between the observed values and the values on the regression line.
The Standard Error of the Regression can be used to construct confidence intervals.
SER = √(SSR/(n − 2))
The n − 2 is a degrees-of-freedom correction for the two coefficients that were estimated (β_0 and β_1).
What are the Least squares assumptions?
Assumption 1: The Error Term has Conditional Mean of Zero
No matter which value we choose for X, the error term u must not show any systematic pattern and must have a conditional mean of 0. In other words, the error is on average zero for every value of X. This assumption still allows for over- and under-predictions of Y, but the predictions will fluctuate around Y's actual value.
In American football, the score is given by: Score = 6·Touchdowns + 1·Extra points + 3·Field goals + 2·Safeties.
If you ran the regression Score = b1·Touchdowns + b2·Field goals + e, the estimate of b1 would be larger than 6: the omitted scoring types end up in the error term, and because extra points are correlated with touchdowns, the error term no longer has a conditional mean of zero and b1 is biased.
Assumption 2: (X_i, Y_i), i = 1, ..., n, are Independently and Identically Distributed
This is a statement about how the sample is drawn.
All n observations need to be independently distributed. This means that the outcome of one value in the sample cannot affect another: the draws are random, and none of the values affect one another.
Identically distributed means that every observation is drawn from the same distribution, so the probability of any specific outcome is the same for each draw. For example, if you flip a coin 100 times, the probability of heads is always 50/50 and does not change throughout the experiment.
If the sampling is random, the sample is representative of the population. As an example, you don't sample only in Texas if you want to research the average American income.
Assumption 3: Large Outliers Are Unlikely
X and Y have finite fourth moments (finite kurtosis), because outliers can give wrong estimates. Large outliers distort the distribution and can make OLS misleading.
How can you get rid of outliers?
Winsorizing: capping extreme values at a chosen percentile (transforming rather than deleting them).
Trimming: deleting extreme values.
For example, you can winsorize/trim at the 5% level. This affects 2.5% of the empirical distribution in each tail (the far right and the far left).
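A minimal sketch of both approaches in Python on a made-up sample (the 5% cut-off matches the example above):

import numpy as np

# Made-up sample with one extreme value
x = np.array([1.2, 0.8, 1.1, 0.9, 1.0, 25.0])

# Winsorize at the 5% level: values outside the 2.5th / 97.5th percentile are capped
lo, hi = np.percentile(x, [2.5, 97.5])
x_winsorized = np.clip(x, lo, hi)

# Trim at the 5% level: the same extreme observations are dropped instead
x_trimmed = x[(x >= lo) & (x <= hi)]
print(x_winsorized, x_trimmed)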
Properties of Beta0 and Beta1
UNBIASED:
The expectation of β̂_0 is the true β_0 and the expectation of β̂_1 is the true β_1. The estimated coefficients might be smaller or larger than the true values, depending on the sample, but on average they will equal their true values (the true relationship between x and y). If 10 people estimate β_1, they will all probably miss by some amount; if 100 more do it, the average gets closer to the real value, and so on.
CONSISTENCY:
The probability that the estimate is close to the true population value increases as the sample size increases.
NORMALITY:
If n is large enough, the estimators are approximately normally distributed, so we can use critical values from the normal distribution.
What are the assumptions in multiple regression?
1: Error Term has a conditional mean of zero
2: I.I.D
3: Large outliers unlikely
4: No perfect multicollinearity
Perfect multicollinearity is when one of the regressors is an exact linear function of the other regressors. This includes including a variable twice in the regression, or the dummy variable trap. Perfect multicollinearity occurs if two or more regressors are perfectly correlated; in reality we will not often see that, so it most often arises from the dummy trap or from including the same regressor twice. Imperfect (high) multicollinearity can be detected with the Variance Inflation Factor (VIF); a rule of thumb is that multicollinearity is a problem if VIF > 10. The usual solution is simply to drop one of the collinear variables.
What is multicollinearity, and how can we test for it?
Perfect multicollinearity occurs if two or more regressors are perfectly correlated. In reality we will not often see two regressors that are perfectly correlated, so it most often arises from the dummy trap or from including the same regressor twice. We can use the Variance Inflation Factor (VIF) to test for multicollinearity; a rule of thumb is that there is multicollinearity if VIF > 10. The usual solution is simply to drop one of the collinear variables.
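A minimal sketch of the VIF calculation in Python, on made-up regressors where x2 is deliberately (but not perfectly) correlated with x1:

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Made-up regressors; x2 is highly correlated with x1, x3 is independent
rng = np.random.default_rng(2)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)
x3 = rng.normal(size=100)
X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

# VIF for each regressor (skip the constant); VIF > 10 suggests multicollinearity
for i, name in enumerate(X.columns):
    if name != "const":
        print(name, variance_inflation_factor(X.values, i))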
What are the problems and solutions with heteroscedasticy?
PROBLEM:
- Coefficients are unbiased and consistent
- Standard errors are biased
- OLS t statistic does not follow a t distribution
- (Fail to) reject H0 too often or not often enough
SOLUTION
- Use heteroskedasticity robust standard errors
- Prudent to assume errors are heteroskedastic unless there is a compelling reason not to
For implementation, see the lab example; a minimal sketch follows below.
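A minimal sketch (not the course lab) of heteroskedasticity-robust standard errors in Python on made-up data:

import numpy as np
import statsmodels.api as sm

# Made-up data where the error variance grows with x (heteroskedasticity)
rng = np.random.default_rng(3)
x = rng.uniform(1, 10, size=200)
y = 2 + 0.5 * x + rng.normal(scale=x, size=200)
X = sm.add_constant(x)

ols = sm.OLS(y, X).fit()                    # default (homoskedasticity-only) standard errors
robust = sm.OLS(y, X).fit(cov_type="HC1")   # heteroskedasticity-robust (HC1) standard errors

print(ols.bse, robust.bse)  # coefficients are identical; only the standard errors change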