Session 3 Flashcards
mean of the probability distribution
μ (mu) is the true (population) mean; the sample mean is written x̄ (x-bar).
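A minimal sketch of the two quantities on this card. μ is computed as the probability-weighted average of a discrete distribution's outcomes. x̄ is computed as the plain average of observed data. The fair die and the five rolls are hypothetical illustrations.

```python
def distribution_mean(outcomes, probs):
    """Population mean mu of a discrete distribution: sum of x * P(X = x)."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(outcomes, probs))


# Hypothetical example: a fair six-sided die.
mu = distribution_mean([1, 2, 3, 4, 5, 6], [1 / 6] * 6)
print(round(mu, 6))  # 3.5

# Sample mean x-bar of a few hypothetical observed rolls.
rolls = [2, 6, 3, 5, 4]
x_bar = sum(rolls) / len(rolls)
print(x_bar)  # 4.0
```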
law of large numbers
As the number of observations n increases, the sample mean x̄ gets as close to the population mean μ as you specify, with probability approaching 1.
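A small simulation can illustrate the law of large numbers. It assumes a hypothetical Bernoulli variable with μ = 0.78, chosen only for illustration. It draws samples of growing size and reports how far x̄ lands from μ. The gap tends to shrink as n grows, although any single run can fluctuate.

```python
import random

MU = 0.78  # hypothetical population mean of a Bernoulli(0.78) variable


def sample_mean(n, rng):
    """Sample mean x-bar of n i.i.d. Bernoulli(MU) draws."""
    return sum(1 if rng.random() < MU else 0 for _ in range(n)) / n


rng = random.Random(0)  # fixed seed so the run is reproducible
for n in (10, 100, 1_000, 100_000):
    x_bar = sample_mean(n, rng)
    print(f"n = {n:>7}   x-bar = {x_bar:.4f}   |x-bar - mu| = {abs(x_bar - MU):.4f}")
```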
standard deviation =
the square root of the variance
variance
the average squared deviation of the values of a variable from their mean
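The two definitions above can be computed directly. This sketch uses the population version, which divides by n. The sample variance divides by n − 1 instead. The data values are illustrative, and the results are cross-checked against Python's `statistics.pvariance` and `statistics.pstdev`.

```python
import math
import statistics


def variance(values):
    """Population variance: the average squared deviation from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def std_dev(values):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(variance(values))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values with mean 5
print(variance(data))  # 4.0
print(std_dev(data))   # 2.0

# Cross-check against the standard library's population versions.
assert math.isclose(variance(data), statistics.pvariance(data))
assert math.isclose(std_dev(data), statistics.pstdev(data))
```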
The 3 least squares assumptions for the linear regression model under which OLS is an appropriate estimator
1. The conditional distribution of Ui given Xi has a mean of zero. The other factors contained in Ui are unrelated to Xi, so given a value of Xi, the mean of the distribution of these other factors is 0. In other words, the error term Ui has conditional mean zero given Xi: E(Ui|Xi) = 0.
2. (Xi, Yi), i = 1, …, n, are independently and identically distributed (i.i.d.).
3. Large outliers are unlikely.
In short: 1. the error term has conditional mean 0, which is what makes the OLS estimator unbiased; 2. the pairs (Xi, Yi) are i.i.d. across observations; 3. large outliers are unlikely.
If the least squares assumptions hold, then the OLS estimators of the slope and intercept are: _____, _____, and have a ______.
If the least squares assumptions hold, then the OLS estimators of the slope and intercept are unbiased, consistent, and have a sampling distribution with a variance that is inversely proportional to the sample size n. Or: the OLS estimators are unbiased, consistent, and approximately normally distributed when the sample is large.
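A minimal sketch of the closed-form OLS estimators for a simple regression, followed by a simulation. The simulation draws data that satisfy the three assumptions: i.i.d. draws, E(u|X) = 0, and thin-tailed normal errors. The true values β0 = 1 and β1 = 2 are hypothetical. The estimates should settle near the true values as n grows, which illustrates consistency.

```python
import random


def ols(x, y):
    """Closed-form OLS estimates (b0_hat, b1_hat) for y = b0 + b1*x + u."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def simulate(n, rng, beta0=1.0, beta1=2.0):
    """Draw n i.i.d. (X, Y) pairs with E(u|X) = 0 and normal errors, then fit OLS."""
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [beta0 + beta1 * xi + rng.gauss(0, 1) for xi in x]
    return ols(x, y)


rng = random.Random(42)
for n in (20, 200, 20_000):
    b0_hat, b1_hat = simulate(n, rng)
    print(f"n = {n:>6}   b0_hat = {b0_hat:.3f}   b1_hat = {b1_hat:.3f}")
```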
A binary variable is also called a ___ or a ____.
An indicator variable or a dummy variable. Examples: male or female, urban or rural.
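A minimal sketch of building a dummy (indicator) variable from a categorical column. The observation is coded 1 when it belongs to the chosen category and 0 otherwise. The `make_dummy` helper and the sample data are hypothetical.

```python
def make_dummy(values, category):
    """Binary (dummy/indicator) variable: 1 if the observation equals `category`, else 0."""
    return [1 if v == category else 0 for v in values]


area = ["urban", "rural", "urban", "urban", "rural"]  # hypothetical observations
urban = make_dummy(area, "urban")
print(urban)  # [1, 0, 1, 1, 0]
```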
The two conditions for omitted variable bias (OVB) are:
The two conditions for omitted variable bias (OVB) are:
1. The omitted variable X2 (e.g., priGPA) is a determinant of Y.
2. X2 (priGPA) is correlated with the included regressor X1 (attend).
The formula for bias
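The formula for bias: $\alpha_1 - \beta_1 = \gamma_1 \beta_2$, where $\alpha_1$ is the slope on X1 in the short regression that omits X2, $\beta_1$ and $\beta_2$ are the slopes on X1 and X2 in the full regression, and $\gamma_1$ is the slope from regressing X2 on X1.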
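A simulation sketch of the omitted variable bias formula. All coefficient values here are hypothetical (β1 = 0.5, β2 = 2, γ1 = 0.8). The data are generated from a full model with X1 and X2, where X2 depends on X1. Y is then regressed on X1 alone. The gap between the short-regression slope and β1 should land close to γ1·β2. Both OVB conditions hold here: X2 affects Y, and X2 is correlated with X1.

```python
import random


def slope(x, y):
    """Simple-regression OLS slope of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(1)
n = 50_000
beta1, beta2 = 0.5, 2.0  # hypothetical slopes in the full regression
gamma1 = 0.8             # hypothetical slope of X2 on X1

x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [gamma1 * a + rng.gauss(0, 1) for a in x1]  # X2 is correlated with X1
y = [1.0 + beta1 * a + beta2 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

alpha1 = slope(x1, y)  # short regression omits X2
print(f"alpha1 = {alpha1:.3f}")
print(f"alpha1 - beta1 = {alpha1 - beta1:.3f}   gamma1 * beta2 = {gamma1 * beta2:.3f}")
```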
A high R² means that the regressors ___. A high R² does not mean that you have eliminated ___. A high R² does not mean that you have an unbiased ___. A high R² does not mean that the included variables are ___.
A high R² means that the regressors explain much of the variation in Y. A high R² does not mean that you have eliminated omitted variable bias. A high R² does not mean that you have an unbiased estimator of a causal effect (1). A high R² does not mean that the included variables are statistically significant; this must be determined using hypothesis tests.
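A minimal sketch of how R² is computed: R² = 1 − SSR/TSS, the share of the sample variation in Y captured by the fitted values. Nothing in this calculation checks bias, causality, or significance, which is why a high R² says nothing about those. The data and fitted values are hypothetical. The fitted values are the OLS line 1.1 + 1.6x for x = 0, 1, 2, 3.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SSR/TSS: share of the sample variation in Y explained by the fit."""
    y_bar = sum(y) / len(y)
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # sum of squared residuals
    tss = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
    return 1 - ssr / tss


y = [1.0, 3.0, 4.0, 6.0]
y_hat = [1.1 + 1.6 * x for x in range(4)]  # OLS fitted values for x = 0, 1, 2, 3
print(round(r_squared(y, y_hat), 4))  # 0.9846
```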
Linear-log, log-linear, log-log
Linear-log: a 1% change in X is associated with a change in Y of 0.01β1.
Log-linear: a 1-unit change in X is associated with a 100β1% change in Y.
Log-log: a 1% change in X is associated with a β1% change in Y (β1 is the elasticity of Y with respect to X).
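A quick numeric check of the three interpretation rules, using a hypothetical slope b1 = 0.05 in each model. Each line computes the exact change implied by the model and compares it with the rule of thumb. The rules are approximations that work well when b1 and the change in X are small.

```python
import math

b1 = 0.05  # hypothetical small slope, used in all three models

# Linear-log: Y = b0 + b1*ln(X). Raising X by 1% changes Y by b1*ln(1.01).
linear_log_dy = b1 * math.log(1.01)
# Log-linear: ln(Y) = b0 + b1*X. Raising X by 1 unit multiplies Y by e^b1.
log_linear_pct = 100 * (math.exp(b1) - 1)
# Log-log: ln(Y) = b0 + b1*ln(X). Raising X by 1% multiplies Y by 1.01^b1.
log_log_pct = 100 * (1.01 ** b1 - 1)

print(f"linear-log: change in Y = {linear_log_dy:.6f}   rule 0.01*b1 = {0.01 * b1:.6f}")
print(f"log-linear: {log_linear_pct:.3f}% change in Y   rule 100*b1 = {100 * b1:.3f}%")
print(f"log-log:    {log_log_pct:.5f}% change in Y   rule b1 = {b1:.5f}%")
```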
OLS Assumptions
- E(u|X = x) = 0. The conditional distribution of u given X has mean zero.
- (Xi,Yi), i =1,…,n, are i.i.d.
- X and u have finite fourth moments (large outliers are unlikely).
- Assumption 1 is violated: it requires the average of the residuals to be 0 within any slice of X, which does not hold here.
- Assumption 2 is violated: the residuals follow a predictable cyclical pattern, likely from seasonal data, so the observations are not independent.
- Assumption 1 is violated.
- Assumption 2 is violated: the data show a trend, so the observations are not independently and identically distributed.
- Assumption 1 is violated.
Correction: this is likely a large outlier (Assumption 3). Taking a log transformation of X or Y could reduce the effect of the outlier.
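A sketch of the "average residual within each slice of X" check behind Assumption 1. The plots these answers refer to are not included, so both datasets here are hypothetical. One is truly linear. The other is curved (Y = 1 + X²) but fitted with a straight line, so E(u|X) ≠ 0 for the linear model. The slice averages should stay near 0 for the linear data and move clearly away from 0 for the curved data.

```python
import random


def ols(x, y):
    """Closed-form OLS (intercept, slope) for a simple regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def slice_residual_means(x, y, n_slices=3, x_max=3.0):
    """Fit a line, then average the residuals within equal-width slices of X on [0, x_max]."""
    b0, b1 = ols(x, y)
    sums, counts = [0.0] * n_slices, [0] * n_slices
    for xi, yi in zip(x, y):
        k = min(int(xi / x_max * n_slices), n_slices - 1)  # slice index for this X
        sums[k] += yi - (b0 + b1 * xi)
        counts[k] += 1
    return [s / c for s, c in zip(sums, counts)]


rng = random.Random(3)
x = [rng.uniform(0, 3) for _ in range(30_000)]
y_linear = [1 + 2 * xi + rng.gauss(0, 1) for xi in x]   # assumption 1 holds
y_curved = [1 + xi ** 2 + rng.gauss(0, 1) for xi in x]  # straight-line fit misses the curve

print("linear:", [round(m, 3) for m in slice_residual_means(x, y_linear)])
print("curved:", [round(m, 3) for m in slice_residual_means(x, y_curved)])
```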