Pensum Flashcards
What are the steps in empirical analysis?
- Careful formulation of the research question (RQ)
- Construct economic model
- Turn into econometric model
You now have an econometric model. The model is an outcome of your variable choices, hypothesis development, data gathering, and estimation of the model parameters.
What is Cross-Sectional data?
Cross-sectional data is a sample of different entities, for example firms, households, cities, states, and countries, observed at a given point in time or in a given period.
What is Time-series data?
Data for a single entity (firms, households, companies, cities, states, countries) collected at multiple time periods.
What is Panel Data?
Also called longitudinal data: data for multiple entities in which each entity is observed at two or more time periods.
A panel is balanced if every entity is observed in every period, and unbalanced if some observations are missing.
Pooled OLS: Used if no individual effect exists. It does not take time-specific effects or variation across entities into account.
Fixed Effects: 1) Control for unobserved variables that vary across entities but not over time, and 2) time specific effects that don’t vary across entities.
Can control for biases that vary across entities but not over time. For example, if you are analyzing Norwegian exports to the EU region, entity fixed effects can control for French price sensitivity, which might differ from Poland's.
You can also control for time-specific effects. Continuing the example: if the EU issues a law, it affects all of the buyers at once (it does not vary across entities).
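A minimal numeric sketch of the within (fixed effects) estimator on simulated data; the data-generating process and all numbers are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_entities, n_periods = 50, 10
true_beta = 2.0

# Entity effects that are constant over time and correlated with x,
# so pooled OLS (which ignores them) is biased.
alpha = rng.normal(0, 3, n_entities)
x = alpha[:, None] + rng.normal(0, 1, (n_entities, n_periods))
y = alpha[:, None] + true_beta * x + rng.normal(0, 1, (n_entities, n_periods))

# Pooled OLS slope on the stacked data.
pooled = np.cov(x.ravel(), y.ravel())[0, 1] / np.var(x.ravel(), ddof=1)

# Fixed effects (within estimator): demeaning each entity's data over
# time removes the entity effect; then run OLS on the demeaned data.
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
fe_beta = (xd * yd).sum() / (xd ** 2).sum()

print(pooled, fe_beta)  # pooled is biased upward; fe_beta is close to 2.0
```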
Random Effects: Treats the unobserved entity-specific effect as random and assumes it is uncorrelated with the regressors. More efficient than fixed effects when that assumption holds; the Hausman test is commonly used to choose between the two.
What is the difference between cross-sectional, time-series and panel data?
Cross-sectional data consists of multiple entities observed at a single time period.
Time-series data consists of a single entity observed at multiple time periods.
Panel data consists of multiple entities observed over two or more time periods.
Describe a simple regression model
OLS chooses the regression coefficients so that the estimated regression line is as close as possible to the observed data, where closeness is measured by the sum of the squared mistakes made in predicting Y given X. The resulting estimates define the fitted linear regression line.
In the linear regression model with a single regressor, y is the dependent variable and x is the independent variable, or the regressor. The first part of the equation, β_0 + β_1*x, is the population regression line, or the population regression function. This is the relationship that holds between y and x on average over the population.
The intercept β_0 and the slope β_1 are the coefficients of the population regression line, also known as the parameters of the population regression line. u is the error term. In context, u is the difference between y and its predicted value.
β_1 measures the marginal effect on y of a one-unit change in x.
How do you estimate the coefficients in OLS?
Finding the OLS estimates means finding the values that minimize the total squared estimation mistakes; the rule that produces them is called an estimator. An estimator is a function of a sample of data drawn randomly from a population. Given estimates β̂_0, β̂_1 of β_0, β_1, we can predict y with ŷ = β̂_0 + β̂_1*x.
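A minimal sketch of these estimates in closed form on made-up data, using the textbook formulas β̂_1 = Σ(x−x̄)(y−ȳ) / Σ(x−x̄)² and β̂_0 = ȳ − β̂_1·x̄:

```python
import numpy as np

# Made-up sample of five (x, y) observations.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

xbar, ybar = x.mean(), y.mean()
beta1 = ((x - xbar) * (y - ybar)).sum() / ((x - xbar) ** 2).sum()
beta0 = ybar - beta1 * xbar

y_hat = beta0 + beta1 * x   # predicted values
u_hat = y - y_hat           # residuals (estimated error terms)

print(beta0, beta1)  # 0.05 1.99 for this sample
```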
What is a linear model?
Linear means that the effect on y of a one-unit change in x is constant: it does not depend on the level of x.
Explain Standard Deviation and Variance
Both the standard deviation and the variance measure the “spread” of a probability distribution. The variance is measured in squared units, while the standard deviation is the square root of the variance.
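A quick sketch of both measures on a made-up sample:

```python
import numpy as np

# Variance: average squared deviation from the mean (in squared units).
# Standard deviation: its square root (back in the original units).
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
var = ((data - data.mean()) ** 2).mean()  # population variance
sd = var ** 0.5

print(var, sd)  # 4.0 2.0
```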
What is a Causal Effect
Causality means that a specific action leads to a specific measurable consequence. For example, there might be a correlation between eating apples and car accidents. The correlation is probably coincidental: eating apples will probably not reduce the chance of getting into a car accident.
What is the difference between Experimental data and observational data
Experimental data comes from an experiment that is designed to investigate a causal effect. Observational data is obtained by measuring actual behaviour outside of an experiment.
Sample Space and events
The sample space is the set of all possible outcomes. An event is a subset of the sample space, i.e. a set of one or more outcomes.
Probability Distribution of a random variable
The probability distribution lists all possible values for the variable and the probability that each value will occur. These probabilities sum to 1.
What is Joint probability and distribution
Joint probability is the probability of two events happening together (think Venn diagram). The joint distribution is the probability that X and Y take on certain values. Let's say that X is 1 when it's raining and 0 when it's not, and Y is 1 when it is more than 10 degrees outside and 0 otherwise. The joint distribution gives the probabilities of the four possible combinations of these two scenarios. Each outcome has a probability, and summed together they equal 1.
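The rain/temperature example as a 2x2 joint distribution; the probabilities here are made up for illustration:

```python
# Joint distribution of X (rain) and Y (more than 10 degrees).
# The four probabilities must sum to 1.
joint = {
    (1, 1): 0.10,  # rain, >10 degrees
    (1, 0): 0.20,  # rain, <=10 degrees
    (0, 1): 0.40,  # no rain, >10 degrees
    (0, 0): 0.30,  # no rain, <=10 degrees
}
total = sum(joint.values())

# Marginal probability of rain: sum the joint distribution over Y.
p_rain = joint[(1, 1)] + joint[(1, 0)]

print(total, p_rain)
```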
Marginal probability distribution
Just another name for a variable's probability distribution. The term is used to distinguish the distribution of Y alone from the joint distribution.
Conditional Distribution
The distribution of a random variable Y conditional on another variable X taking on a specific value.
Conditional Expectation
The mean of the conditional distribution of Y given X
Law of iterated Expectations
The mean height of adults is the weighted average of the mean height for men and the mean height for women, weighted by the proportions of men and women. In general, the mean of Y is the weighted average of the conditional expectation of Y given X: E[Y] = E[E[Y|X]].
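The height example can be checked numerically; the proportions and group means below are illustrative assumptions, not real statistics:

```python
# E[Y] = E[E[Y|X]]: the unconditional mean is the weighted average of
# the conditional means, weighted by the probabilities of each group.
p_men, p_women = 0.5, 0.5            # proportions (the weights)
mean_men, mean_women = 179.0, 166.0  # conditional means E[height | group]

mean_height = p_men * mean_men + p_women * mean_women
print(mean_height)  # 172.5
```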
Covariance
A measure of the extent to which two random variables move together.
What is the Standard Error in a regression?
The standard error of the regression estimates the standard deviation of the error term in the regression. It is computed from the residuals: roughly, the typical distance between the observed values and the regression line.
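A sketch of the computation on made-up data, dividing by n − 2 because two coefficients were estimated:

```python
import numpy as np

# Made-up sample; fit OLS, then summarize the residual spread.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

beta1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
beta0 = y.mean() - beta1 * x.mean()
residuals = y - (beta0 + beta1 * x)

ssr = (residuals ** 2).sum()            # sum of squared residuals
ser = (ssr / (len(x) - 2)) ** 0.5       # standard error of the regression

print(ser)
```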
Kurtosis
Kurtosis is how much mass the distribution has in its tails, and is therefore a measure of how much of the variance of Y that arises from extreme values. Extreme values are called outliers. The greater the kurtosis of a distribution is, the more likely it is to have outliers.
Skewness
Skewness quantifies the extent to which a given distribution deviates from a normal (symmetric) distribution. A normal distribution has a skew of zero, with equal weight in each tail.
If you are measuring height, you might get a mean of 172 with the tails being equally weighted.
If you are measuring income for people working full time, few people will have an income under 300K. From 300K to 600K there will probably be a steep increase, and from 600K onwards there will be fewer and fewer people, so the curve flattens out. This means we get a “long tail” on the right side. A long tail on the right side is called a “positive skew”, so we say that the distribution is positively skewed.
If we have an easy exam and a lot of people get A's or B's, we will have a negative skew. The long tail will be on the left side, slowly increasing until it hits C or B, and from there it will go steeply up.
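Skewness and kurtosis (from the two cards above) can be computed as standardized third and fourth moments; the data here is simulated, with a lognormal sample standing in for the long-tailed income example:

```python
import numpy as np

def skewness(x):
    # Standardized third moment: 0 for a symmetric distribution.
    z = (x - x.mean()) / x.std()
    return (z ** 3).mean()

def kurtosis(x):
    # Standardized fourth moment: approximately 3 for a normal distribution.
    z = (x - x.mean()) / x.std()
    return (z ** 4).mean()

rng = np.random.default_rng(1)
normal = rng.normal(0, 1, 100_000)           # skew ~ 0, kurtosis ~ 3
income_like = rng.lognormal(0, 1, 100_000)   # long right tail: positive skew

print(skewness(normal), kurtosis(normal))
print(skewness(income_like))  # clearly positive
```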
I.I.D
Independent and Identically distributed
Independent: The result of one event does not have any impact on the other event. So if you roll two dice, the result of the first roll does not affect the result of the second.
Identically: if you flip a coin (heads/tails) each throw gives you a 50/50 chance. The probability does not change over time.
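A quick simulation of i.i.d. draws: each roll is generated the same way (identical 1/6 probabilities) and independently of the others, so each face shows up about one sixth of the time.

```python
import random

random.seed(42)
# 60,000 independent rolls of a fair six-sided die.
rolls = [random.randint(1, 6) for _ in range(60_000)]

freq_six = rolls.count(6) / len(rolls)
print(freq_six)  # close to 1/6
```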
Chi-Squared
DISTRIBUTION:
The chi-squared distribution is asymmetric, takes only non-negative values, and is positively skewed; with k degrees of freedom it has mean k and variance 2k. It is used to test categorical variables, i.e. variables whose values fall into distinct categories (male vs. female, etc.).
Chi-squared tests can be used when we:
1) Need to estimate how closely an observed distribution matches an expected one
2) Need to test whether two random variables are independent.
GOODNESS OF FIT:
When you have one categorical variable and you want to compare an observed frequency distribution to a theoretical one. For example, is there a relation between age and car accidents?
H0: There is no relation between age and car accidents
HA: There is a relation between age and car accidents
A chi-squared value greater than our critical value implies that there is a relation between age and car accidents, so we reject the null hypothesis. This tells us that there most likely is a relation, but not how large that relation is.
Another example is if you flip a coin 100 times. You would expect it to get 50/50 head/tails. The further away from 50/50, the less goodness of fit.
Tests how well a sample of data matches the known characteristics of the larger population that the sample is trying to represent. For example, the χ^2 statistic tells us how well the actual results from 100 coin flips compare to the theoretical model that assumes 50/50. The further away from 50/50, the worse the goodness of fit (and the more likely we are to conclude that this is not a fair coin).
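A sketch of the coin-flip goodness-of-fit computation; the observed counts are hypothetical:

```python
# Chi-squared goodness-of-fit: sum((observed - expected)^2 / expected).
observed = [62, 38]  # hypothetical heads/tails from 100 flips
expected = [50, 50]  # fair-coin model

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Critical value for 1 degree of freedom at the 5% level is 3.841.
print(chi2, chi2 > 3.841)  # 5.76 True -> reject the fair-coin hypothesis
```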
TEST FOR INDEPENDENCE:
Categorical data for two variables, and you want to see if there is an association between them.
Does gender have any significance for driving test outcomes? Is there a relation between student gender and course choice? Researchers collect data and compare the frequencies at which male and female students select the different classes. The χ^2 test for independence tells us how likely it is that random chance can explain the observed difference.
P-value smaller than 0.05 (chi-squared value greater than the critical value): there is some relation between gender and driving test scores. Reject H0.
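A sketch of the test for independence on a made-up 2x2 table (gender in rows, driving-test pass/fail in columns); expected counts come from the independence assumption:

```python
# Hypothetical contingency table of observed counts.
table = [[60, 40],   # group 1: pass, fail
         [45, 55]]   # group 2: pass, fail

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Expected count under independence: row total * column total / n.
chi2 = 0.0
for i, row in enumerate(table):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / n
        chi2 += (obs - exp) ** 2 / exp

# df = (rows - 1) * (cols - 1) = 1; critical value at 5% is 3.841.
print(chi2, chi2 > 3.841)
```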