Reg review Flashcards

Question 1

Q

what is the standard error?

Answer

A

The estimated standard deviation of the sampling distribution of an estimator (e.g., the slope estimate beta1_hat), which tells us how precise our estimate is.

Question 2

Q

what is the p-value?

Answer

A

The smallest significance level at which we would reject the null hypothesis.

Question 3

Q

What is a confidence interval?

Answer

A

A range of values computed from the sample. For a 95% confidence interval: over repeated sampling, we would expect 95% of intervals constructed in this manner to contain the true population parameter.
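
To make the repeated-sampling idea concrete, the sketch below draws many samples and counts how often the interval covers the true value. It assumes a normal population with known σ, so each interval is x̄ ± 1.96·σ/√n rather than a regression interval built from a t distribution; `ci_coverage` and its default values are illustrative choices.

```python
import random
from statistics import NormalDist, fmean

def ci_coverage(mu=5.0, sigma=2.0, n=30, reps=2000, level=0.95, seed=1):
    """Share of intervals xbar +/- z * sigma / sqrt(n) that contain mu.

    Assumes a normal population with known sigma (a z-interval).
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps

print(ci_coverage())  # should land near 0.95
```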

Question 4

Q

What does it mean for a coefficient to be unbiased?

Answer

A

An estimator is unbiased if the mean of its sampling distribution equals the true population parameter (E(beta1_hat) = beta1_population).
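
A quick Monte Carlo check of unbiasedness, assuming a hypothetical model y = 1 + 2x + u with u drawn independently of x (so the zero conditional mean assumption holds). Averaging beta1_hat over many samples should give a value close to the true beta1 = 2; the function names and parameter values are made up for illustration.

```python
import random
from statistics import fmean

def ols_slope(x, y):
    """Simple-regression OLS slope: sum of cross-deviations over SST_x."""
    xbar, ybar = fmean(x), fmean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx

def mean_of_slopes(beta0=1.0, beta1=2.0, n=50, reps=1000, seed=42):
    """Average beta1_hat over many samples from y = beta0 + beta1*x + u."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        x = [rng.uniform(0, 10) for _ in range(n)]
        # u has mean zero and is drawn independently of x, so E(u|x) = 0
        y = [beta0 + beta1 * xi + rng.gauss(0, 1) for xi in x]
        estimates.append(ols_slope(x, y))
    return fmean(estimates)

print(mean_of_slopes())  # should be close to the true beta1 = 2.0
```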

Question 5

Q

Definition of beta1 coefficient?

Answer

A

Beta1_hat is the slope of y with respect to x when all other regressors are held constant: a one-unit change in x is associated with a beta1_hat change in y, holding all else constant.

Question 6

Q

What is an endogenous regressor?

Answer

A

A regressor that is correlated with the error term (Cov(x, u) ≠ 0), i.e., correlated with Y through the error term.

Question 7

Q

What is an exogenous regressor?

Answer

A

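A regressor that is uncorrelated with the error term (Cov(x, u) = 0). A variable that has a direct impact on Y and is correlated with the included regressors should be included to avoid omitted variable bias (OVB).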

Question 8

Q

What is consistency?

Answer

A

Betahat is a consistent estimator of betapop if, as N approaches infinity, betahat converges in probability to betapop (plim betahat = betapop). Consistency is a large-sample (asymptotic) property.

Question 9

Q

What is the law of large numbers?

Answer

A

As N approaches infinity, the sample mean (and, similarly, the sample variance) of i.i.d. draws converges in probability to the true population mean (and variance).
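
A minimal illustration of the law of large numbers, assuming i.i.d. Uniform(0, 1) draws (population mean 0.5); the helper `uniform_sample_mean` is hypothetical. The sample mean typically moves closer to 0.5 as n grows.

```python
import random
from statistics import fmean

def uniform_sample_mean(n, seed=0):
    """Mean of n Uniform(0, 1) draws; the population mean is 0.5."""
    rng = random.Random(seed)
    return fmean(rng.random() for _ in range(n))

for n in (10, 1_000, 100_000):
    print(n, round(uniform_sample_mean(n), 4))
```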

Question 10

Q

What is the CLT?

Answer

A

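As N approaches infinity, the sampling distribution of the (standardized) sample mean approaches a normal distribution, whatever the shape of the population distribution (provided it has a finite variance).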
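
A rough simulation of the CLT, assuming Exponential(1) draws (a skewed population with mean 1 and standard deviation 1). If the standardized sample mean is approximately normal, about 95% of the standardized means should fall within ±1.96; the function and its defaults are illustrative.

```python
import random
from statistics import fmean

def share_within_196(n=50, reps=5000, seed=7):
    """Share of standardized sample means of Exponential(1) draws within +/-1.96."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(reps):
        xbar = fmean(rng.expovariate(1.0) for _ in range(n))
        z = (xbar - 1.0) / (1.0 / n ** 0.5)  # population mean 1, sd 1
        if abs(z) <= 1.96:
            inside += 1
    return inside / reps

print(share_within_196())  # should be close to 0.95 if the normal approximation is good
```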

Question 11

Q

What are the Gauss-Markov assumptions for MLR, and which are needed for unbiasedness and consistency? Which are needed for OLS to be BLUE?

Answer

A

Gauss-Markov assumptions for MLR (1–4 for unbiasedness, and also consistency; 1–5 for BLUE, best linear unbiased estimator):

1) Linear in parameters
2) Random sampling (independent and identically distributed observations)
3) No perfect collinearity (no regressor is constant or an exact linear function of the other regressors; in particular, |r| < 1 between every pair of regressors)
4) Zero conditional mean (the expected value of U conditional on all Xs is equal to zero)
5) Homoskedasticity (the variance of U conditional on all Xs is constant, equal to sigma-squared)

Question 12

Q

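Probability of a Type I error?

Answer

A

α, the significance level (the probability of rejecting the null hypothesis when it is true).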

Question 13

Q

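Probability of a Type II error?

Answer

A

1 - power of the test (the probability of failing to reject the null hypothesis when it is false), often written β.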

Question 14

Q

What is r-squared?

Answer

A

Proportion of the sample variation in Y that is explained by X

Question 15

Q

Factors affecting sampling variances of OLS slope estimators

Answer

A

1) Error variance: as the population error variance σ^2 decreases, Var(^Bj) gets smaller. We can take more out of the error (make σ^2 smaller) by adding more explanatory variables.
2) Total sample variation: it is easier to estimate how xj affects y if we see more variation in xj (larger SST_j); SST_j increases with the sample size.
3) Linear relationships among the regressors: as R_j^2 (the R-squared from regressing xj on all other regressors) gets bigger, so does Var(^Bj). If xj is unrelated to all other independent variables, it is easier to estimate its ceteris paribus effect on y.

Question 16

Q

What happens when you have heteroskedasticity?

Answer

A

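The usual OLS variance formulas (and so the usual standard errors) are invalid, although the beta coefficients remain unbiased; the usual t and F tests are invalid unless we use heteroskedasticity-robust standard errors, and OLS is no longer BLUE.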
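
One way to see the fix for heteroskedasticity is to compare the usual slope standard error with a heteroskedasticity-robust one. The sketch assumes a simple regression and uses the HC0 (White) formula Σ(x_i - x̄)^2·û_i^2 / SST_x^2; statistical software often applies an extra degrees-of-freedom adjustment, so its robust standard errors may differ slightly. The function name and toy data are hypothetical.

```python
from statistics import fmean

def slope_se_usual_and_robust(x, y):
    """Return (usual se, HC0 robust se) of the OLS slope from regressing y on x."""
    n = len(x)
    xbar, ybar = fmean(x), fmean(y)
    sst_x = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sst_x
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Usual formula: assumes Var(u|x) = sigma^2 for every observation
    sigma2_hat = sum(u ** 2 for u in resid) / (n - 2)
    usual = (sigma2_hat / sst_x) ** 0.5
    # HC0: weights each squared residual by its own (x_i - xbar)^2
    robust = (sum((xi - xbar) ** 2 * u ** 2 for xi, u in zip(x, resid)) / sst_x ** 2) ** 0.5
    return usual, robust

print(slope_se_usual_and_robust([-1, 0, 1], [0, 1, 0]))
```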

Question 17

Q

In any given sample, ^B1 = B1 + ^cov(x,u)/^var(x)

What can we do to show that the zero conditional mean assumption holds here?

Answer

A

The covariance of x and u equals 0 in our sample, so the second term drops out and ^B1 = B1.
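
The decomposition in the question is an algebraic identity that holds in every sample, which a simulation can confirm when the true errors are known. This sketch assumes a made-up model y = 1 + 2x + u; the 1/(n - 1) factors in the sample covariance and variance cancel, so the code works with sums. The function name is hypothetical.

```python
import random
from statistics import fmean

def slope_decomposition(beta0=1.0, beta1=2.0, n=25, seed=3):
    """Return (beta1_hat, beta1 + sample cov(x, u) / sample var(x)) for one sample."""
    rng = random.Random(seed)
    x = [rng.uniform(0, 5) for _ in range(n)]
    u = [rng.gauss(0, 1) for _ in range(n)]  # true errors, known only in a simulation
    y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]
    xbar, ybar, ubar = fmean(x), fmean(y), fmean(u)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    beta1_hat = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    cov_over_var = sum((xi - xbar) * (ui - ubar) for xi, ui in zip(x, u)) / sxx
    return beta1_hat, beta1 + cov_over_var

hat, decomposed = slope_decomposition()
print(hat, decomposed)  # equal up to floating-point rounding
```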

Question 18

Q

Formula for B1 coefficient?

Answer

A

cov(x,y)/var(x)

Question 19

Q

Formula for B0 coefficient?

Answer

A

E(y)-B1*E(x)
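
The two formulas translate directly into code for a simple regression, using sample averages in place of E(·). This is a sketch with a hypothetical helper name, stdlib only; in the usage line the data lie exactly on y = 3 + 2x, so OLS recovers those values.

```python
from statistics import fmean

def slr_coefficients(x, y):
    """Return (b0, b1) for the simple regression of y on x.

    b1 = sample cov(x, y) / sample var(x); the n - 1 divisors cancel.
    b0 = avg y - b1 * avg x (the sample analogue of E(y) - B1*E(x)).
    """
    xbar, ybar = fmean(x), fmean(y)
    b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
          / sum((xi - xbar) ** 2 for xi in x))
    b0 = ybar - b1 * xbar
    return b0, b1

print(slr_coefficients([1, 2, 3, 4], [5, 7, 9, 11]))  # (3.0, 2.0)
```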

Question 20

Q

Formula for SST?

Answer

A

SST_j = ∑(actual xj – avg xj)^2, the total sample variation in xj

Question 21

Q

Formula for var(Bj), SLR?

Answer

A

σ^2/SSTx

Question 22

Q

Formula for var(Bj), MLR? (expressed in terms of sigma squared)

Answer

A

σ^2/(SST_j *(1-R_j^2))
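
To see how the three factors enter, the formula can be evaluated with hypothetical values of σ^2 and SST_j while R_j^2 rises; the term 1/(1 - R_j^2) is often called the variance inflation factor. The helper name and numbers are illustrative assumptions.

```python
def var_bj(sigma2, sst_j, r2_j):
    """Var(^Bj) = sigma^2 / (SST_j * (1 - R_j^2)), valid under assumptions 1-5."""
    return sigma2 / (sst_j * (1 - r2_j))

# Hypothetical values: same sigma^2 and SST_j, increasingly collinear x_j
for r2 in (0.0, 0.5, 0.75, 0.9):
    print(r2, var_bj(sigma2=4.0, sst_j=100.0, r2_j=r2))
```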

Question 23

Q

Formula for standard error of Bj, SLR?

Answer

A

√(SSR/(n-2))/√SSTx
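
A sketch of the SLR standard error computed from raw data, stdlib only, with a hypothetical function name; the intermediate `rmse` is the standard error of the regression, √(SSR/(n - 2)), since df = n - 2 when there is one regressor.

```python
from statistics import fmean

def slr_slope_se(x, y):
    """se(^B1) = sqrt(SSR / (n - 2)) / sqrt(SST_x) for a simple regression."""
    n = len(x)
    xbar, ybar = fmean(x), fmean(y)
    sst_x = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sst_x
    b0 = ybar - b1 * xbar
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    rmse = (ssr / (n - 2)) ** 0.5  # standard error of the regression (sigma hat)
    return rmse / sst_x ** 0.5

print(slr_slope_se([-1, 0, 1], [0, 1, 0]))
```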

Question 24

Q

Formula for standard error of Bj, MLR?

Answer

A

√(SSR/df)/√(SST_j (1-R_j^2)), where df = n-k-1

Question 25

Q

Variance of regression?

Answer

A

SSR/df = SSR/(n-k-1), the estimate of the error variance σ^2 (the square of the RMSE)

Question 26

Q

RMSE (standard error of regression)?

Answer

A

√(SSR/df)

Question 27

Q

How to calculate residual?

Answer

A

yi - ^yi (actual y – predicted y)

Question 28

Q

How to calculate SST?

Answer

A

∑(actual y – avg y)^2

Question 29

Q

How to calculate SSE?

Answer

A

∑(predicted y – avg y)^2

Question 30

Q

How to calculate SSR?

Answer

A

∑(actual y – predicted y)^2 or ∑û^2 (sum of squared residuals)

Question 31

Q

How to calculate R squared?

Answer

A

SSE/SST or 1 – SSR/SST
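
The sums of squares and both R-squared formulas can be checked on a tiny made-up data set. The sketch assumes the regression includes an intercept, which is what makes SST = SSE + SSR hold, so SSE/SST and 1 - SSR/SST agree; the function name is hypothetical.

```python
from statistics import fmean

def sums_of_squares(x, y):
    """Fit y on x by OLS (with intercept) and return (SST, SSE, SSR, R^2)."""
    xbar, ybar = fmean(x), fmean(y)
    b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
          / sum((xi - xbar) ** 2 for xi in x))
    b0 = ybar - b1 * xbar
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - ybar) ** 2 for yi in y)                # actual y - avg y
    sse = sum((yh - ybar) ** 2 for yh in y_hat)            # predicted y - avg y
    ssr = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # residuals
    return sst, sse, ssr, sse / sst

print(sums_of_squares([1, 2, 3, 4], [2, 4, 5, 8]))
```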

Question 32

Q

How to calculate correlation given covariance and variances?

Answer

A

(Cov(X,Y))/√(Var(X)Var(Y))
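
A one-line version with hypothetical moments. The same covariance appears in the SLR slope, Cov(X, Y)/Var(X), so the correlation and the slope always share the same sign.

```python
def correlation(cov_xy, var_x, var_y):
    """Corr(X, Y) = Cov(X, Y) / sqrt(Var(X) * Var(Y)); always between -1 and 1."""
    return cov_xy / (var_x * var_y) ** 0.5

# Hypothetical moments: Cov(X, Y) = 6, Var(X) = 4, Var(Y) = 25
print(correlation(6.0, 4.0, 25.0))  # 6 / sqrt(4 * 25)
print(6.0 / 4.0)                    # SLR slope Cov(X, Y) / Var(X), same sign
```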

Question 33

Q

Formula for F statistic (SS form?)

Answer

A

F = [(SSR_r - SSR_ur)/q] / [SSR_ur/(n-k-1)]

Question 34

Q

Formula for F statistic (R squared form?)

Answer

A

F = [(R^2_ur - R^2_r)/q] / [(1 - R^2_ur)/(n-k-1)]
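
Both forms of the F statistic, with hypothetical inputs chosen so they describe the same pair of models (SST = 200, q = 2 restrictions, n = 53, k = 3 regressors in the unrestricted model). The R-squared form gives the same answer only when both models share the same dependent variable, and therefore the same SST.

```python
def f_stat_ssr(ssr_r, ssr_ur, q, n, k):
    """F = [(SSR_r - SSR_ur)/q] / [SSR_ur/(n - k - 1)]."""
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))

def f_stat_r2(r2_r, r2_ur, q, n, k):
    """F = [(R2_ur - R2_r)/q] / [(1 - R2_ur)/(n - k - 1)]."""
    return ((r2_ur - r2_r) / q) / ((1 - r2_ur) / (n - k - 1))

sst = 200.0  # hypothetical total sum of squares, shared by both models
print(f_stat_ssr(ssr_r=120.0, ssr_ur=100.0, q=2, n=53, k=3))
print(f_stat_r2(r2_r=1 - 120.0 / sst, r2_ur=1 - 100.0 / sst, q=2, n=53, k=3))
```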

Question 35

Q

Change in y with respect to change in x when you have a quadratic? (y = B0 + B1x + B2x^2 + u)

Answer

A

change in y/change in x ≈ B1 + 2*B2*x (the derivative, so the effect depends on the level of x)
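
A sketch of the quadratic's marginal effect with hypothetical estimates B1 = 0.30 and B2 = -0.006. Because B1 + 2*B2*x is a derivative, it approximates the change in y for a small change in x; setting it to zero gives the turning point x* = -B1/(2*B2). The function names are illustrative.

```python
def quadratic_marginal_effect(b1, b2, x):
    """Slope of y = B0 + B1*x + B2*x^2 at a given x: B1 + 2*B2*x."""
    return b1 + 2 * b2 * x

def turning_point(b1, b2):
    """x at which the marginal effect is zero: x* = -B1 / (2*B2)."""
    return -b1 / (2 * b2)

# Hypothetical estimates with diminishing returns (B2 < 0)
for x in (0, 10, 25, 40):
    print(x, quadratic_marginal_effect(0.30, -0.006, x))
print(turning_point(0.30, -0.006))
```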

Question 36

Q

What are advantages of using LPM (OLS) for a binary DV?

Answer

A

Easy estimation and interpretation of coefficients (as changes in the probability that y = 1), which are often reasonably good.

Question 37

Q

Disadvantages of using LPM (OLS) for a binary dv?

Answer

A

Predicted probabilities may be greater than 1 or less than 0 (so some implied probability changes are logically impossible)

Partial effects of explanatory variables are constant

The LPM is heteroskedastic, so we have to use OLS with heteroskedasticity-robust standard errors

Errors are not normally distributed (given x, u can take only two values)

Question 38

Q

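True or false: Partial effects from logit models are nonlinear and depend on the level of x

Answer

A

True. The partial effect of xj is g(B0 + B1x1 + ... + Bkxk)*Bj, where g is the logistic density, so it varies with the values of x.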

Question 39

Q

What values do researchers typically hold other covariates at when generating marginal effects?

Answer

A

holds all other covariates at their means (which might not make sense for dummy variables)
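
Partial effect at the average for a logit, as a sketch: with P(y = 1|x) = Λ(B0 + B1x1 + ... + Bkxk), the effect of xj is Λ(index)(1 - Λ(index))·Bj, evaluated here with the covariates at hypothetical means. Changing another covariate's value changes the effect, which is why logit partial effects depend on the level of x. The coefficients, means, and function names are made up.

```python
import math

def logistic(z):
    """Logistic CDF: Lambda(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def logit_partial_effect(coefs, x_values, j):
    """Partial effect of regressor j (j = 1 for the first regressor) in a logit.

    coefs = [B0, B1, ..., Bk]; x_values = [x1, ..., xk].
    Effect = Lambda(index) * (1 - Lambda(index)) * Bj.
    """
    index = coefs[0] + sum(b * x for b, x in zip(coefs[1:], x_values))
    p = logistic(index)
    return p * (1 - p) * coefs[j]

# Hypothetical estimates [B0, B1, B2] and hypothetical covariate means
coefs = [-1.0, 0.8, 0.05]
print(logit_partial_effect(coefs, [0.5, 12.0], j=1))  # x1 effect at the means
print(logit_partial_effect(coefs, [0.5, 20.0], j=1))  # x1 effect with x2 raised to 20
```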

Question 40

Q

What does MLE do?

Answer

A

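Uses a likelihood function, which gives us the likelihood of the data given a set of proposed parameters, and chooses the parameters that make that likelihood as large as possible

The computer keeps running iterations of the above, updating the proposed parameters, until the improvements in the likelihood are small; this process is called “convergence”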
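
A minimal MLE sketch, assuming the simplest logit with only an intercept, P(y = 1) = Λ(b0), and using Newton-Raphson as the iteration rule (one common choice; software may use other algorithms). For this model the MLE has the closed form log(ȳ/(1 - ȳ)), which makes the result easy to check; it exists only if y contains both 0s and 1s. The function name is hypothetical.

```python
import math

def logit_intercept_mle(y, tol=1e-10, max_iter=50):
    """Maximize the log-likelihood of P(y=1) = Lambda(b0) by Newton-Raphson,
    stopping once the update is tiny ("convergence")."""
    n = len(y)
    b0 = 0.0  # starting value
    for iteration in range(1, max_iter + 1):
        p = 1.0 / (1.0 + math.exp(-b0))
        score = sum(y) - n * p       # first derivative of the log-likelihood
        hessian = -n * p * (1 - p)   # second derivative (always negative)
        step = -score / hessian
        b0 += step
        if abs(step) < tol:
            return b0, iteration
    raise RuntimeError("did not converge")

print(logit_intercept_mle([1, 1, 1, 0]))  # compare with math.log(0.75 / 0.25)
```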

Question 41

Q

What distribution does logit model use for hypothesis testing? What kind of stat will you get?

Answer

A

Use normal distribution; report z-statistics
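
A sketch of the z test for a single logit coefficient, assuming a hypothetical estimate of 0.42 with standard error 0.15: z = estimate/se, and the two-sided p-value comes from the standard normal distribution.

```python
from statistics import NormalDist

def logit_z_test(coef, se):
    """Return (z, two-sided p-value) using the standard normal distribution."""
    z = coef / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

print(logit_z_test(coef=0.42, se=0.15))  # hypothetical estimate and standard error
```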

Question 42

Q

How can you measure goodness of fit from logit model?

Answer

A

Percent correctly predicted

Pseudo R-squared

Likelihood ratio (chi-square) test (like the F-test; tests the null that all slope coefficients are zero)
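
A sketch of these fit measures, assuming you already have 0/1 outcomes and fitted probabilities from a logit that includes an intercept (the values below are made up). Percent correctly predicted uses a 0.5 cutoff; the pseudo R-squared shown is McFadden's, 1 - ln L_ur/ln L_0; and the likelihood ratio statistic 2(ln L_ur - ln L_0) is compared with a chi-square distribution whose degrees of freedom equal the number of slope coefficients. The function name is hypothetical.

```python
import math

def logit_fit_measures(y, p_hat):
    """Return (percent correctly predicted, McFadden pseudo R^2, LR statistic)."""
    n = len(y)
    # Log-likelihood of the fitted (unrestricted) model
    ll_ur = sum(math.log(p) if yi == 1 else math.log(1 - p) for yi, p in zip(y, p_hat))
    # Intercept-only model: every fitted probability equals the share of ones
    ybar = sum(y) / n
    ll_0 = sum(math.log(ybar) if yi == 1 else math.log(1 - ybar) for yi in y)
    pct_correct = sum((p >= 0.5) == (yi == 1) for yi, p in zip(y, p_hat)) / n
    pseudo_r2 = 1 - ll_ur / ll_0
    lr_stat = 2 * (ll_ur - ll_0)
    return pct_correct, pseudo_r2, lr_stat

print(logit_fit_measures([1, 0, 1, 0], [0.8, 0.3, 0.6, 0.4]))
```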