OLS Flashcards
What is OLS? Describe a simple regression model
OLS
- Chooses the regression coefficients so that the fitted line is as close as possible to the observed data
- If you imagine a scatterplot of the data, OLS draws the regression line that gives the smallest sum of squared residuals
β0 + β1X = the population regression line, the relationship that holds between Y and X on average
β0 and β1 = the coefficients of the population regression line
β1 measures the marginal effect on Y of a one-unit change in X
u = the error term, the difference between Y and the population regression line (the part of Y the regression does not explain)
How do you estimate the coefficients in OLS?
Finding the OLS estimates means finding the values of the coefficients that minimize the total squared estimation mistakes; the rule that produces them is called an estimator. An estimator is a function of a sample of data drawn randomly from a population. Given estimates β̂0 and β̂1 of β0 and β1, we can predict Y with Ŷ = β̂0 + β̂1X.
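A minimal sketch of that minimization in Python (the numbers are made up for illustration): the closed-form OLS formulas give the coefficients that minimize the sum of squared estimation mistakes, and numpy's built-in least-squares fit agrees.

import numpy as np

# Hypothetical sample (x could be class size, y a test score)
x = np.array([23.0, 19.0, 30.0, 22.0, 26.0, 18.0, 27.0, 21.0])
y = np.array([640.0, 662.0, 611.0, 645.0, 630.0, 668.0, 622.0, 651.0])

# OLS estimators: slope = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2),
#                 intercept = ybar - slope * xbar
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()

y_hat = beta0_hat + beta1_hat * x      # predicted values
residuals = y - y_hat                  # estimation mistakes
print(beta0_hat, beta1_hat, np.sum(residuals ** 2))
print(np.polyfit(x, y, deg=1))         # same slope and intercept from numpy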
What is a linear model?
A linear model means that the effect on Y of a one-unit change in X is constant: it does not depend on the level of X.
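A tiny illustrative check (made-up coefficients, not from the source): with a linear model the predicted change in Y from a one-unit increase in X is the same whether X is small or large.

import numpy as np

beta0, beta1 = 2.0, 0.5                       # hypothetical coefficients
x = np.array([1.0, 2.0, 10.0, 11.0, 100.0, 101.0])
y = beta0 + beta1 * x

# One-unit increases starting at x = 1, 10 and 100 all change y by beta1 = 0.5
print(np.diff(y)[::2])                        # [0.5 0.5 0.5]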
What are the Least Squares Assumptions?
Assumption 1: The Error Term has Conditional Mean of Zero
- The error term must not show any systematic pattern
- There must be no omitted variable bias
Assumption 2: (Xi, Yi), i = 1, ..., n are Independently and Identically Distributed
- Independently: the observations are independent of each other and carry no information about one another. If you roll two dice, the value of the first die does not affect the value of the second.
- Identically distributed: every observation has the same probability distribution. With a full deck of cards, the probability of drawing the king of diamonds is 1 in 52, and every participant drawing from a full deck has the same 1 in 52 chance.
Main idea: if you flip a coin 100 times, each flip is 50/50 regardless of the earlier flips (the coin has no memory), so the flips are "independent"; the probability is the same on every flip, so they are "identically distributed".
Assumption 3: Large Outliers Are Unlikely
X and Y have finite kurtosis (finite fourth moments); a few large outliers can give badly distorted estimates.
What is a type 2 error?
Failure to reject the null when the alternative is true
What is meant by asymptotic normality?
The sampling distribution of a properly normalized estimator converges to the standard normal distribution.
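A hedged simulation sketch of the idea (my own illustration, not from the source): sample means from a skewed population, once properly normalized, behave approximately like a standard normal variable when n is large.

import numpy as np

rng = np.random.default_rng(0)
n, reps = 500, 10_000
pop_mean, pop_sd = 1.0, 1.0                        # exponential(1): skewed, mean 1, sd 1

samples = rng.exponential(scale=1.0, size=(reps, n))
z = (samples.mean(axis=1) - pop_mean) / (pop_sd / np.sqrt(n))   # normalized estimator

print(z.mean(), z.std())                           # roughly 0 and 1
print(np.mean(np.abs(z) > 1.96))                   # roughly 0.05, as for N(0, 1)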
What is meant by asymptotic efficiency?
Among consistent estimators with asymptotically normal distributions, the asymptotically efficient one is the estimator with the smallest asymptotic variance.
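A small simulation sketch (illustrative assumption: normally distributed data): both the sample mean and the sample median are consistent, asymptotically normal estimators of the centre of a normal distribution, but the mean has the smaller asymptotic variance and is therefore the more efficient of the two.

import numpy as np

rng = np.random.default_rng(1)
n, reps = 200, 10_000
draws = rng.normal(loc=5.0, scale=2.0, size=(reps, n))

means = draws.mean(axis=1)
medians = np.median(draws, axis=1)

# Both are centred on the true value 5, but the mean varies less;
# asymptotically the median's variance is about pi/2 times larger.
print(means.mean(), medians.mean())
print(means.var(), medians.var())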
Underlying assumptions of regression analysis
Key assumptions:
- Consistency: as the sample size grows, the estimates converge to the true parameter value (see the separate card below)
- Unbiasedness: the sampling distribution of the estimator is centred on the true parameter value (see below)
- Efficiency: among comparable estimators, it is the one with the smallest variance; when SR/MR 1-5 hold, OLS is BLUE by the Gauss-Markov theorem (see below)
- Linearity: the model is linear in the parameters
Normality
A normality test is used to determine whether a data set is well modelled by a normal distribution, and to compute how likely it is that the random variable underlying the data is normally distributed.
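As a hedged illustration (assuming scipy is available; the data are simulated), the Shapiro-Wilk test is one such normality test:

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
normal_data = rng.normal(size=200)
skewed_data = rng.exponential(size=200)

# Shapiro-Wilk: H0 = the data come from a normal distribution.
# A small p-value is evidence against normality.
print(stats.shapiro(normal_data).pvalue)   # typically large -> do not reject normality
print(stats.shapiro(skewed_data).pvalue)   # typically tiny  -> reject normality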
Consistency
As the sample size increases, the estimates produced by the estimator converge to the true value of the parameter being estimated. Increasing the sample size helps because it brings n closer and closer to the size of the true population.
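A minimal simulation sketch of consistency (illustrative numbers only): as n grows, the estimate of the mean settles down around the true value.

import numpy as np

rng = np.random.default_rng(3)
true_mean = 10.0

for n in [10, 100, 10_000, 1_000_000]:
    sample = rng.normal(loc=true_mean, scale=5.0, size=n)
    print(n, sample.mean())                # the estimates close in on 10 as n grows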
Unbiasedness
A statement about the expected value of the sampling distribution of the estimator: on average, across repeated samples, the estimator equals the true parameter (it is correct in expectation). Unbiasedness does not depend on the sample size. It is only guaranteed when assumptions SR/MR 1-5 are satisfied, in which case the Gauss-Markov theorem makes OLS the BLUE estimator.
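A small sketch of the same statement (hypothetical data-generating process): across many repeated samples the average of the OLS slope estimates is essentially the true slope, even though each individual sample is small.

import numpy as np

rng = np.random.default_rng(4)
beta0, beta1, n, reps = 1.0, 2.0, 20, 20_000   # deliberately small n

slopes = np.empty(reps)
for r in range(reps):
    x = rng.normal(size=n)
    u = rng.normal(size=n)                     # error with conditional mean zero
    y = beta0 + beta1 * x + u
    slopes[r] = np.polyfit(x, y, deg=1)[0]

# Individual estimates scatter around 2, but their average (expected value) is 2:
print(slopes.mean(), slopes.std())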
Efficiency
An estimator is efficient if it tends to be closer to the true parameter than other estimators, i.e. it has lower variance around the true parameter (BLUE). If an estimator is BLUE, no other linear unbiased estimator estimates the true population parameter with smaller variance.
What are the Least squares assumptions?
Assumption 1: The Error Term has Conditional Mean of Zero
No matter which value we choose for X, the error term u must not show any systematic pattern and must have a mean of zero. In other words, the errors average out to zero: OLS may still over- or underestimate Y for individual observations, but the estimates fluctuate around Y's actual value.
In American football, the score is given by: Score = 6·Touchdowns + 1·Extra points + 3·Field goals + 2·Safeties.
If you ran the regression Score = b1·Touchdowns + b2·Field goals + e, b1 would come out larger than 6: extra points are omitted and end up in the error term, and because extra points are correlated with touchdowns the error term no longer has conditional mean zero, so b1 is biased (a small simulation sketch follows below).
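A hedged simulation of the football example (the scoring rule is as above; the distributions are invented for illustration): because touchdowns and extra points are highly correlated, leaving extra points out of the regression pushes the touchdown coefficient above 6.

import numpy as np

rng = np.random.default_rng(5)
n = 5_000

touchdowns = rng.poisson(3.0, size=n)
extra_points = rng.binomial(touchdowns, 0.95)   # almost every touchdown is converted
field_goals = rng.poisson(2.0, size=n)
safeties = rng.poisson(0.1, size=n)

score = 6 * touchdowns + 1 * extra_points + 3 * field_goals + 2 * safeties

# Regress score on touchdowns and field goals only (extra points omitted):
X = np.column_stack([np.ones(n), touchdowns, field_goals])
coef, *_ = np.linalg.lstsq(X, score, rcond=None)
print(coef)   # touchdown coefficient comes out near 6.95 rather than 6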
Assumption 2: (Xi, Yi), i = 1, ..., n are Independently and Identically Distributed
This is a statement about how the sample is drawn.
All observations need to be independently distributed. This means that the outcome of one value in the sample cannot affect another. The draws are random, and no value carries information about any other – each observation is independent.
Identically distributed means that the probability of any specific outcome is the same for every draw. For example, if you flip a coin 100 times, the probability of heads is always 50/50 and does not change throughout the experiment.
If the sampling is random, the sample is representative of the population. For example, you would not sample only Texas if you wanted to estimate the average American income.
Assumption 3: Large Outliers Are Unlikely
X and Y have finite kurtosis (finite fourth moments). Large outliers will mess up our distribution, distort the estimates and make OLS misleading.
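A brief sketch of what a single large outlier can do (fabricated numbers): the fitted slope moves far away from the true value once one extreme point, e.g. a data-entry error, is added.

import numpy as np

rng = np.random.default_rng(6)
x = rng.uniform(0, 10, size=50)
y = 1.0 + 2.0 * x + rng.normal(size=50)        # true slope is 2

print(np.polyfit(x, y, deg=1))                 # slope close to 2

# Add one extreme outlier:
x_out = np.append(x, 10.0)
y_out = np.append(y, 500.0)
print(np.polyfit(x_out, y_out, deg=1))         # slope pulled far away from 2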
What are the assumptions in multiple regression?
1: Error Term has a conditional mean of zero
2: I.I.D
3: Large outliers unlikely
4: No perfect multicollinearity
Perfect multicollinearity is when one of the regressors is an exact linear function of the other regressors. This includes including the same variable twice in the regression, or falling into the dummy variable trap. Perfect multicollinearity occurs if two or more regressors are perfectly correlated. In practice we will rarely see two regressors that are perfectly correlated, so the problem most often comes from the dummy trap or from including the same regressor twice. The Variance Inflation Factor (VIF) can be used to test for multicollinearity; a rule of thumb is that there is a multicollinearity problem if VIF > 10. The solution is simply to drop one of the offending variables.
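A hedged sketch of the VIF check (assuming statsmodels is installed; the variables are simulated, with x2 built to be nearly collinear with x1):

import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(7)
n = 500
x1 = rng.normal(size=n)
x2 = 0.98 * x1 + 0.02 * rng.normal(size=n)     # almost a copy of x1
x3 = rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2, x3])  # include a constant column

# Rule of thumb: VIF > 10 signals a multicollinearity problem.
for i, name in zip([1, 2, 3], ["x1", "x2", "x3"]):
    print(name, variance_inflation_factor(X, i))   # huge for x1 and x2, near 1 for x3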
What are the Gauss-Markov assumptions?
- The model is linear in the parameters
- The observations are IID
- No perfect multicollinearity
- The error term has zero conditional mean
- Homoskedasticity: the error term has constant variance
- No autocorrelation in the error term
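As a hedged illustration of checking the last two assumptions (assuming statsmodels; the data are simulated with deliberately non-constant error variance): the Breusch-Pagan test looks for heteroskedasticity and the Durbin-Watson statistic for autocorrelation in the residuals.

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(8)
n = 300
x = rng.uniform(1, 10, size=n)
u = rng.normal(scale=0.5 * x, size=n)          # error variance grows with x: heteroskedastic
y = 3.0 + 1.5 * x + u

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()

lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(res.resid, X)
print(lm_pvalue)                               # small p-value -> evidence of heteroskedasticity
print(durbin_watson(res.resid))                # close to 2 -> no sign of autocorrelation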