Regression Review Flashcards
What is the standard error?
The estimated standard deviation of the sampling distribution of the slope parameter, which tells us how precise our estimate is.
What is the p-value?
The smallest significance level at which we would reject the null hypothesis.
What is a confidence interval?
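A range of values, built from the sample, that is designed to capture the true population parameter. Over repeated sampling, we would expect 95% of the 95% confidence intervals constructed in this manner to contain the true population parameter.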
What does it mean for a coefficient to be unbiased?
If an estimator is unbiased, then the mean of the sampling distribution of the estimator equals the true population parameter: E(beta1_hat) = beta1_population.
Definition of beta1 coefficient?
Beta1_hat is the slope of y with respect to x when all other regressors are held constant, or fixed. A one-unit change in X is associated with a beta1_hat change in Y, holding all else constant.
What is an endogenous regressor?
A regressor that is correlated with the error term, or correlated with Y through the error term.
What is an exogenous regressor?
A regressor that is uncorrelated with the error term and has a direct impact on Y. It should be included to avoid omitted variable bias (OVB).
What is consistency?
Betahat is a consistent estimator of betapop if, as N approaches infinity, betahat converges in probability to betapop (a large-sample property, often loosely described as asymptotic unbiasedness).
What is the law of large numbers?
As the sample size grows, our estimates of the population mean and variance converge in probability to the true population parameters.
What is the CLT?
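As N approaches infinity, the sampling distribution of the standardized sample mean (and of estimators such as the OLS slope) approaches a normal distribution, whatever the shape of the population distribution (given finite variance).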
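A small simulation can make the CLT card concrete. This is only a sketch: the exponential population, the sample size of 200, the 10,000 replications, and the seed are all illustrative assumptions, not part of the original deck.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

n, reps = 200, 10_000
# Exponential(1) has mean 1 and standard deviation 1, and is strongly skewed.
draws = rng.exponential(scale=1.0, size=(reps, n))

# Standardize each sample mean: (xbar - mu) / (sigma / sqrt(n)).
z = (draws.mean(axis=1) - 1.0) / (1.0 / np.sqrt(n))

# If the normal approximation is good, roughly 95% of z values
# should fall within +/- 1.96.
share_within = np.mean(np.abs(z) <= 1.96)
print(round(z.mean(), 3), round(z.std(), 3), round(share_within, 3))
```

Even though the population is skewed, the standardized sample means should have a mean near 0, a standard deviation near 1, and close to 95% of their mass inside the usual normal critical values.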
What are the Gauss-Markov assumptions for MLR, and which are needed for unbiasedness and consistency? Which are needed to be BLUE?
Gauss-Markov assumptions for MLR (1-4 for unbiasedness and consistency, 1-5 for BLUE, the best linear unbiased estimator):
- Linear in parameters
- Random sampling (independent and identically distributed random variables)
- No perfect collinearity (no regressor is an exact linear function of the others; |r| < 1 between any pair of regressors)
- Zero conditional mean (the expected value of U conditional on all Xs is equal to zero)
- Homoskedasticity (the variance of U conditional on all Xs is equal to the variance of U, sigma-squared)
Probability of type 1 error?
alpha
Probability of type 2 error?
beta
What is r-squared?
Proportion of the sample variation in Y that is explained by X
Factors affecting sampling variances of OLS slope estimators
1) Error variance: Take more out of the error (make σ^2 smaller) by adding more explanatory variables. As the error variance in the population decreases, Var(β_j hat) gets smaller.
2) Total sample variation: It is easier to estimate how x_j affects y if we see more variation in x_j (a larger SST_j); increase SST_j by increasing the sample size.
3) Linear relationships among the regressors: As R_j^2 gets bigger, so does Var(β_j hat). If x_j is unrelated to all other independent variables, it is easier to estimate its ceteris paribus effect on y.
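The three factors above all enter the formula Var(β_j hat) = σ^2 / (SST_j (1 - R_j^2)). The sketch below plugs in made-up numbers (the values of σ^2, SST_j, and R_j^2 are hypothetical, chosen only to show the direction of each effect).

```python
def slope_variance(sigma2, sst_j, r2_j):
    """Var(beta_j hat) = sigma^2 / (SST_j * (1 - R_j^2))."""
    return sigma2 / (sst_j * (1.0 - r2_j))

# Hypothetical baseline: sigma^2 = 4, SST_j = 100, x_j unrelated to other regressors.
baseline = slope_variance(sigma2=4.0, sst_j=100.0, r2_j=0.0)

# 1) Smaller error variance lowers the sampling variance.
less_noise = slope_variance(sigma2=2.0, sst_j=100.0, r2_j=0.0)

# 2) More total variation in x_j lowers the sampling variance.
more_variation = slope_variance(sigma2=4.0, sst_j=200.0, r2_j=0.0)

# 3) A larger R_j^2 (x_j well explained by the other regressors) raises it.
collinear = slope_variance(sigma2=4.0, sst_j=100.0, r2_j=0.75)

print(baseline, less_noise, more_variation, collinear)
```

The factor 1 / (1 - R_j^2) is often called the variance inflation factor; with R_j^2 = 0.75 it equals 4, so the variance is four times the baseline.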
What happens when you have heteroskedasticity?
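The usual variance formulas for OLS are invalid, so the usual t- and F-tests cannot be relied on unless robust standard errors are used. The beta coefficient estimates themselves are not affected.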
In small samples, ^B1 = B1 + ^cov(x,u)/^var(x)
What can we do to show that zero conditional mean assumption holds here?
Show that the covariance of x and u is equal to 0 in our sample.
Formula for B1 coefficient?
cov(x,y)/var(x)
Formula for B0 coefficient?
E(y)-B1*E(x)
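As a quick check of the two formulas above, the sketch below computes the slope and intercept by hand and compares them with NumPy's built-in least-squares fit. The five data points are invented for illustration.

```python
import numpy as np

# Hypothetical toy data, chosen only for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Slope: sample covariance of x and y divided by the sample variance of x.
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# Intercept: mean of y minus the slope times the mean of x.
b0 = y.mean() - b1 * x.mean()

# Cross-check against NumPy's degree-1 least-squares polynomial fit,
# which returns the coefficients from highest degree to lowest.
slope_check, intercept_check = np.polyfit(x, y, 1)

print(b1, b0)
```

In the sample, E(y) and E(x) are replaced by the sample means, and the covariance and variance must use the same divisor (here n - 1) so that it cancels in the ratio.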
Formula for SST_j (total sample variation in x_j)?
∑(x_ij – x̄_j)^2, summed over observations i
Formula for var(Bj), SLR?
σ^2/SSTx
Formula for var(Bj), MLR? (expressed in terms of sigma squared)
σ^2/(SST_j *(1-R_j^2))
Formula for standard error of Bj, SLR?
√(SSR/(n-2))/√SSTx
Formula for standard error of Bj, MLR?
√(SSR/df)/√(SST_j (1-R_j^2))
Variance of regression?
SSR/(df)
RMSE (standard error of regression)?
√(SSR/df)
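The variance of the regression, the RMSE, and the SLR standard error of the slope can be computed step by step. The sketch below uses invented data and assumes simple regression, so df = n - 2.

```python
import numpy as np

# Hypothetical toy data, chosen only for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
n = len(y)

# Fit the simple regression y = b0 + b1*x by least squares.
b1, b0 = np.polyfit(x, y, 1)
residuals = y - (b0 + b1 * x)

ssr = np.sum(residuals ** 2)           # sum of squared residuals
df = n - 2                             # two estimated parameters in SLR
sigma2_hat = ssr / df                  # variance of the regression
rmse = np.sqrt(sigma2_hat)             # standard error of the regression
sst_x = np.sum((x - x.mean()) ** 2)    # total sample variation in x

se_b1 = rmse / np.sqrt(sst_x)          # standard error of the slope
print(ssr, sigma2_hat, rmse, se_b1)
```

For multiple regression the same pattern applies with df = n - k - 1 and with SST_j (1 - R_j^2) in place of SST_x.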
How to calculate residual?
yi - ^yi (actual y – predicted y)
How to calculate SST?
∑(actual y – avg y)^2
How to calculate SSE?
∑(predicted y – avg y)^2
How to calculate SSR?
∑(actual y – predicted y)^2, or ∑u^2
How to calculate R squared?
SSE/SST or 1 – SSR/SST
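The sketch below checks that both R-squared formulas agree and that SST = SSE + SSR. It follows the deck's naming, where SSE is the explained sum of squares and SSR is the residual sum of squares; the data are invented for illustration.

```python
import numpy as np

# Hypothetical toy data, chosen only for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Fitted values from a simple OLS regression with an intercept.
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)      # total: actual y minus average y
sse = np.sum((y_hat - y.mean()) ** 2)  # explained: predicted y minus average y
ssr = np.sum((y - y_hat) ** 2)         # residual: actual y minus predicted y

r2_from_sse = sse / sst
r2_from_ssr = 1.0 - ssr / sst
print(sst, sse, ssr, r2_from_sse, r2_from_ssr)
```

The decomposition SST = SSE + SSR relies on the regression including an intercept; without one, the two R-squared formulas need not match.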
How to calculate correlation given covariance and variances?
(Cov(X,Y))/√(Var(X)Var(Y))
Formula for F statistic (SS form)?
[(SSRr - SSRur)/q] / [SSRur/(n-k-1)]
Formula for F statistic (R-squared form)?
[(R^2ur - R^2r)/q] / [(1 - R^2ur)/(n-k-1)]
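The two forms of the F statistic give the same number when the restricted and unrestricted models share the same dependent variable. The sketch below uses made-up sums of squares; the function names and all numeric inputs are hypothetical.

```python
def f_stat_ssr(ssr_r, ssr_ur, q, n, k):
    """F = [(SSR_r - SSR_ur)/q] / [SSR_ur/(n-k-1)]; k counts unrestricted slopes."""
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))

def f_stat_r2(r2_r, r2_ur, q, n, k):
    """F = [(R2_ur - R2_r)/q] / [(1 - R2_ur)/(n-k-1)]."""
    return ((r2_ur - r2_r) / q) / ((1.0 - r2_ur) / (n - k - 1))

# Hypothetical inputs: q = 2 restrictions, n = 104 observations, k = 3 regressors.
sst = 200.0
ssr_r, ssr_ur = 120.0, 100.0
q, n, k = 2, 104, 3

f_from_ssr = f_stat_ssr(ssr_r, ssr_ur, q, n, k)
# Convert each SSR to an R-squared using R2 = 1 - SSR/SST.
f_from_r2 = f_stat_r2(1.0 - ssr_r / sst, 1.0 - ssr_ur / sst, q, n, k)
print(f_from_ssr, f_from_r2)
```

Note the order of subtraction: the restricted model has the larger SSR but the smaller R-squared, so the numerator is SSR_r minus SSR_ur in one form and R2_ur minus R2_r in the other.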
Change in y with respect to change in x when you have a quadratic? (y = B0 + B1x + B2x^2 + u)
change in y/change in x ≈ B1 + 2*B2*x
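With a quadratic term, the slope depends on where x is evaluated. The sketch below uses hypothetical coefficients (B0 = 1, B1 = 3, B2 = -0.5) and compares the formula with a small finite-difference change in the fitted value.

```python
# Hypothetical coefficients, chosen only for illustration.
b0, b1, b2 = 1.0, 3.0, -0.5

def fitted(x):
    """Fitted value from y = b0 + b1*x + b2*x^2 (error term omitted)."""
    return b0 + b1 * x + b2 * x ** 2

def marginal_effect(x):
    """Approximate change in y per unit change in x: b1 + 2*b2*x."""
    return b1 + 2.0 * b2 * x

# Compare the formula with a small finite-difference slope at x = 2.
h = 1e-6
numeric_slope = (fitted(2.0 + h) - fitted(2.0 - h)) / (2.0 * h)

# The effect changes sign at the turning point x* = -b1 / (2*b2).
turning_point = -b1 / (2.0 * b2)
print(marginal_effect(2.0), numeric_slope, turning_point)
```

Here the effect of x is positive below x = 3 and negative above it, which is why a quadratic is used to model diminishing or reversing effects.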
What are advantages of using LPM (OLS) for a binary DV?
Easy estimation and interpretation; the coefficient estimates are reasonably good.
Disadvantages of using LPM (OLS) for a binary dv?
Predicted probabilities may be greater than 1 or less than 0 (marginal probability effects are sometimes logically impossible)
Partial effects of explanatory variables are constant
LPM is heteroskedastic, so use OLS with heteroskedasticity-robust standard errors
Errors are not normally distributed (conditional on x, they take only two values)
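The first disadvantage is easy to see numerically. The sketch below fits a linear probability model to eight invented observations and then predicts at two hypothetical x values outside the observed range.

```python
import numpy as np

# Hypothetical binary-outcome data, chosen only for illustration.
x = np.array([0, 1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([0, 0, 1, 0, 1, 1, 0, 1], dtype=float)

# LPM: ordinary least squares of the 0/1 outcome on x.
b1, b0 = np.polyfit(x, y, 1)

def predicted_probability(x_new):
    """Fitted 'probability' from the linear probability model."""
    return b0 + b1 * x_new

# The partial effect of x is the constant b1, so the fitted line
# eventually leaves the [0, 1] interval.
high = predicted_probability(10.0)
low = predicted_probability(-3.0)
print(b1, b0, high, low)
```

Because the slope is the same at every x, nothing stops the fitted value from exceeding 1 or dropping below 0 for large or small enough x.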
True or false: Partial effects from logit models are nonlinear and depend on the level of x
True
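For a logit with one regressor, the partial effect of x on P(y = 1) is B1 * p * (1 - p), where p is the predicted probability at that x. The sketch below uses hypothetical coefficients (B0 = -1, B1 = 0.5) to show how the effect varies with x.

```python
import numpy as np

def logistic(z):
    """Logistic CDF: maps any real index z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def logit_partial_effect(x, b0, b1):
    """dP(y=1)/dx = b1 * p * (1 - p), with p = logistic(b0 + b1*x)."""
    p = logistic(b0 + b1 * x)
    return b1 * p * (1.0 - p)

# Hypothetical coefficients, chosen only for illustration.
b0, b1 = -1.0, 0.5

effect_mid = logit_partial_effect(2.0, b0, b1)   # index is 0, so p = 0.5
effect_high = logit_partial_effect(6.0, b0, b1)  # index is 2, so p is near 0.88
print(effect_mid, effect_high)
```

The effect is largest where p = 0.5 and shrinks as the predicted probability approaches 0 or 1, unlike the constant partial effect in the LPM.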
What values do researchers typically hold other covariates at when generating marginal effects?
At their means (which might not make sense for dummy variables).
What does MLE do?
Uses a likelihood function, which gives us the likelihood of the data given a set of proposed parameters.
The computer keeps running iterations of the above until the improvements in the likelihood are small; reaching that point is called "convergence".
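The iterate-until-convergence idea can be sketched for a logit model using Newton-Raphson updates. This is one common way to maximize the likelihood, not necessarily what any particular package does; the function name, tolerance, and data below are all hypothetical.

```python
import numpy as np

def fit_logit(X, y, tol=1e-8, max_iter=100):
    """Maximize the logit log-likelihood with Newton-Raphson steps."""
    beta = np.zeros(X.shape[1])          # proposed parameters, starting at zero
    prev_ll = -np.inf
    for iteration in range(1, max_iter + 1):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        # Log-likelihood of the data given the proposed parameters.
        ll = np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
        # Convergence: stop once the improvement in the log-likelihood is small.
        if ll - prev_ll < tol:
            return beta, ll, iteration
        prev_ll = ll
        gradient = X.T @ (y - p)
        hessian = -(X.T * (p * (1.0 - p))) @ X
        beta = beta - np.linalg.solve(hessian, gradient)
    raise RuntimeError("did not converge within max_iter iterations")

# Hypothetical data: an intercept column plus one regressor.
x = np.array([0, 1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([0, 0, 1, 0, 1, 1, 0, 1], dtype=float)
X = np.column_stack([np.ones_like(x), x])

beta_hat, loglik, n_iter = fit_logit(X, y)
print(beta_hat, loglik, n_iter)
```

At convergence the gradient of the log-likelihood is approximately zero, which is the first-order condition for a maximum.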
What distribution does logit model use for hypothesis testing? What kind of stat will you get?
Use normal distribution; report z-statistics
How can you measure goodness of fit from logit model?
Percent correctly predicted
Pseudo R-squared
Chi-square test (like the F-test, tests the null that all coefficients are zero)