EXAM 2 Flashcards
How do we draw a line through data to estimate the population slope?
Ordinary Least Squares
What is the OLS Estimator
The OLS estimator minimizes the average squared difference b/w the actual values of Y and the predictions based on the estimated regression line
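A minimal sketch of this in code (the numbers are made up for illustration); the closed-form formulas below are the intercept and slope values that minimize the sum of squared residuals:

```python
import numpy as np

# Hypothetical data (illustration only)
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.9, 4.4, 5.8, 8.4, 9.6])

# Closed-form OLS solution: these values minimize the sum of squared residuals
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()

y_hat = beta0_hat + beta1_hat * x   # predictions based on the estimated line
u_hat = y - y_hat                   # OLS residuals: actual minus predicted

print(beta0_hat, beta1_hat)
```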
Regression R^2
1. R^2=0 means ESS=0
2. R^2=1 means ESS=TSS
3. 0<=R^2<=1
- for regression w/ a single X, R^2 is equal to the square of the correlation coefficient b/w X&Y
Measures the fraction of the variance of Y that is explained by X
1. is unitless
2. between 0 (no fit) and 1 (perfect fit)
Standard Error of the Regression
Measures the magnitude of a typical regression residual in the units of Y
- measures the spread of the distribution of u
- has the units of u, which are the units of Y, and measures the average size of the OLS residuals
TSS=ESS+SSR(sum of squared residuals)
R^2=ESS/TSS (explained sum of squares/total sum of squares)
R^2=1-SSR/TSS
Root mean squared error (RMSE ) is closely related to the SER
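A small sketch (hypothetical data, numpy assumed) tying TSS, ESS, and SSR to R^2, SER, and RMSE; the n - 2 degrees-of-freedom correction in the SER is for the single-regressor case:

```python
import numpy as np

# Hypothetical data (illustration only)
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.9, 4.4, 5.8, 8.4, 9.6])
n = len(y)

beta1, beta0 = np.polyfit(x, y, 1)        # OLS fit of y on x
y_hat = beta0 + beta1 * x

ssr = np.sum((y - y_hat) ** 2)            # sum of squared residuals
ess = np.sum((y_hat - y.mean()) ** 2)     # explained sum of squares
tss = np.sum((y - y.mean()) ** 2)         # total sum of squares; TSS = ESS + SSR

r2   = ess / tss                          # equals 1 - SSR/TSS
ser  = np.sqrt(ssr / (n - 2))             # SER: typical residual size, in units of Y
rmse = np.sqrt(ssr / n)                   # RMSE: closely related, no df correction

print(r2, ser, rmse)
```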
The Least Squares Assumptions
1. The conditional distribution of ui, given Xi, has a mean of zero
2. (Xi,Yi), i=1,2,3,…n are iid
3. Large outliers in X and/or Y are rare
- The conditional distribution of ui, given Xi, has a mean of zero
Failure of this leads to omitted variable bias
- means that if an omitted variable is correlated with the included regressor X, the condition fails and there is OV bias
Is equivalent to assuming that the population regression line is the conditional mean of Yi given Xi
- Because X is assigned randomly, all other individual characteristics–the things that make up u– are distributed independently of X , so u and X are independent.
- Thus, in an ideal randomized controlled experiment, E [ui |Xi ] = 0
In actual experiments, or with observational data, we will need to think hard about whether E [ui |Xi] = 0 holds.
- (Xi,Yi), i=1,2,3,…n are iid
Holds automatically under simple random sampling from the same population
- entities are selected at random, so the values (Xi, Yi) are independently and identically distributed
We can expect to encounter non-i.i.d. data when information is
recorded over time for the same entity (panel data and time series
data)
- Large Outliers are rare
assuming that Xi and Yi have nonzero finite fourth moments, i.e., 0 < E[Xi^4] < ∞ and 0 < E[Yi^4] < ∞; in other words, the distributions of Xi and Yi have finite kurtosis
- outliers are often data glitches (coding or recording problems); sometimes they are observations that really shouldn't be in your data set
R^2
Measures the fraction of the variance of Y that is explained by X
SER- standard error of the regression
Measures the magnitude of a typical regression residual in the units of Y.
t = (estimator - hypothesized value) / (SE of estimator)
e.g., for the sample mean: t = (sample average of Y - hypothesized mean of Y) / (sY/sqrt(n))
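A minimal sketch of the t-statistic for a hypothesized population mean (the sample and the hypothesized value are made up for illustration):

```python
import numpy as np

# Hypothetical sample of Y and hypothesized mean under the null (illustration only)
y = np.array([12.1, 9.8, 11.3, 10.4, 12.6, 9.1, 10.9, 11.7])
mu_0 = 10.0

n = len(y)
y_bar = y.mean()              # sample average of Y (the estimator)
s_y = y.std(ddof=1)           # sample standard deviation

se = s_y / np.sqrt(n)         # SE of the estimator
t = (y_bar - mu_0) / se       # t = (estimator - hypothesized value) / SE

print(t)
```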
95% Confidence Interval
- the set of points that cannot be rejected at the 5% significance level;
- a set-valued function of the data (an interval that is a function of the
data) that contains the true parameter value 95% of the time in
repeated samples.
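A sketch of the 95% confidence interval as estimate ± 1.96 × SE (1.96 is the large-sample 5% critical value); the estimate and SE here are hypothetical:

```python
# Hypothetical coefficient estimate and its standard error (illustration only)
beta1_hat = 2.28
se_beta1 = 0.52

# 95% CI: the set of values that cannot be rejected at the 5% significance level
ci_lower = beta1_hat - 1.96 * se_beta1
ci_upper = beta1_hat + 1.96 * se_beta1

print((ci_lower, ci_upper))
```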
Homoskedastic
If variance of the conditional distribution of u given X doesn’t depend on X
Var(ui | Xi = x) is constant (does not change with x)
Heteroskedastic
If variance of the conditional distribution of u given X does depend on X
V [ui |Xi = x] changes with x
Homoskedasticity-only SEs are valid only if the errors are homoskedastic
- the homoskedasticity-only and heteroskedasticity-robust formulas differ, so they give different SEs
- in practice, the usual choice is heteroskedasticity-robust SEs, because they're valid whether or not the errors are heteroskedastic (see the sketch after this card)
Advantage of homoskedasticity-only SEs
- the formula is simpler
Disadvantage:
- the formula is correct only if the errors are homoskedastic
Homoskedasticity-only SEs
- the default setting in most regression software
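A sketch of the practical difference, assuming the statsmodels package; the plain fit() uses homoskedasticity-only SEs (the software default), while cov_type="HC1" requests heteroskedasticity-robust SEs:

```python
import numpy as np
import statsmodels.api as sm

# Simulated data where the error variance grows with x (illustration only)
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0, 10, n)
u = rng.normal(0, 0.5 + 0.3 * x)          # Var(u | X = x) depends on x: heteroskedastic
y = 1.0 + 2.0 * x + u

X = sm.add_constant(x)
fit_default = sm.OLS(y, X).fit()                 # homoskedasticity-only SEs
fit_robust  = sm.OLS(y, X).fit(cov_type="HC1")   # heteroskedasticity-robust SEs

print(fit_default.bse)   # valid only if the errors are homoskedastic
print(fit_robust.bse)    # valid whether or not the errors are heteroskedastic
```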
Population vs Sample Parameter
A parameter is a measure that describes the whole population. A statistic is a measure that describes the sample.
Slope in population regression line
Expected effect on Y of a unit change in X
Regression Error
consists of omitted factors that affect Y, plus error in the measurement of Y
Omitted Variable Bias
Bias in the OLS estimator that arises when an omitted variable is both (1) a determinant of Y and (2) correlated with the included regressor X
Causality
effect measured in ideal randomized controlled experiment
IDEAL
RANDOMIZED
CONTROLLED
EXPERIMENT
Ideal: subjects all follow the treatment protocol –perfect compliance,
no errors in reporting, etc.
Randomized: subjects from the population of interest are randomly
assigned to a treatment or control group (so there are no confounding
factors).
Controlled: permits measuring the differential
effect of the treatment.
Experiment: the treatment is assigned as part of the experiment: the
subjects have no choice, so there is no “reverse causality” in which
subjects choose the treatment they think will work best.
Three ways to overcome omitted variable bias
- Run a randomized controlled experiment in which the regressor of interest (e.g., STR) is randomly assigned
- Adopt the cross-tabulation approach: data tables that present results for the entire group of respondents as well as for subgroups, so the omitted factor is held approximately constant within each cell (see the sketch after this list)
- Use a regression in which the omitted variable is no longer omitted
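A sketch of the cross-tabulation idea using pandas (the data frame and column names are hypothetical): average Y is compared across groups of X within each subgroup of the would-be omitted factor W, so W is held roughly constant within each cell:

```python
import pandas as pd

# Hypothetical data; column names are made up for illustration
df = pd.DataFrame({
    "Y":       [650, 660, 640, 630, 655, 645, 635, 625],
    "X_group": ["low", "low", "high", "high", "low", "low", "high", "high"],
    "W_group": ["A", "B", "A", "B", "A", "B", "A", "B"],
})

# Mean of Y in each (X group, W group) cell: comparing across X within a W
# column holds the omitted factor approximately constant
table = df.pivot_table(values="Y", index="X_group", columns="W_group", aggfunc="mean")
print(table)
```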
Adjusted R^2
Penalizes you for including another regressor
- this is because R^2 always increases when adding another regressor
———————-
adjusted R^2 < R^2
the two values are close when n is large
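A sketch of the adjusted R^2 formula with hypothetical numbers (k is the number of regressors, excluding the intercept):

```python
# Hypothetical values (illustration only)
r2 = 0.42    # ordinary R^2
n = 420      # sample size
k = 3        # number of regressors

# Adjusted R^2 penalizes adding regressors; it is always below R^2,
# and the gap shrinks as n gets large
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(adj_r2)
```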
- There is no perfect multicollinearity
- Perfect multicollinearity is when one of the regressors is an exact linear function of the other regressors
- e.g., accidentally including the same variable twice
Imperfect multicollinearity occurs when two or more regressors are very
highly correlated.
- If two regressors are very highly correlated, their scatterplot will look almost like a straight line (they are "collinear"), but unless the correlation is exactly 1 or −1, that collinearity is imperfect.
Imperfect multicollinearity implies that one or more of the regression
coefficients will be imprecisely estimated
imperfect multicollinearity
- results in large standard errors
for one or more of the OLS coefficients
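A sketch of imperfect multicollinearity in action, assuming numpy and statsmodels; x2 is simulated to be almost (but not exactly) a copy of x1, so the coefficients on x1 and x2 get large standard errors:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(0, 1, n)
x2 = x1 + rng.normal(0, 0.05, n)       # very highly correlated with x1, but not exactly
y = 1.0 + 0.5 * x1 + 0.5 * x2 + rng.normal(0, 1, n)

# With both (nearly collinear) regressors, the individual coefficients are
# imprecisely estimated, so their SEs are large ...
fit_both = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
print(fit_both.bse)

# ... compared with a regression on x1 alone
fit_one = sm.OLS(y, sm.add_constant(x1)).fit()
print(fit_one.bse)
```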
p value > alpha
FAIL TO REJECT NULL HYPOTHESIS
pvalue<alpha
REJECT NULL HYPOTHESIS
Joint Hypothesis
specifies a value for 2 or more coefficients; imposes a restriction on 2+ coefficients
How to test a joint hypothesis
F-STATISTIC
F-statistic
- large when t1 and/or t2 is large
- when n is large and t1 and t2 are uncorrelated, F = 0.5(t1^2 + t2^2)
Chi Squared Distribution
the distribution of the sum of q squared independent standard normal random variables; q is its degrees of freedom. In large samples, q × F is distributed chi-squared with q degrees of freedom
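A sketch tying the F-statistic and chi-squared cards together, assuming scipy; the simple formula applies when the two t-statistics are uncorrelated, and in large samples q × F follows a chi-squared distribution with q degrees of freedom:

```python
from scipy.stats import chi2

# Hypothetical t-statistics for the two restrictions in a joint hypothesis
t1, t2 = 2.1, 1.6
q = 2                              # number of restrictions

F = 0.5 * (t1**2 + t2**2)          # valid when t1 and t2 are uncorrelated
p_value = chi2.sf(q * F, df=q)     # large-sample p-value: q*F ~ chi-squared with q df

print(F, p_value)
```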
Control Variable W
A variable that is correlated with, and controls for, an omitted causal factor in the regression of Y on X, but that does not itself necessarily have a causal effect on Y
Three interchangeable statements about effective control variable
1. when included in the regression, it makes the error term uncorrelated with the variable of interest
2. holding constant the control variable(s), the variable of interest is "as if" randomly assigned
3. given the control variable(s), the variable of interest is uncorrelated with the omitted determinants of Y
When control variables are included, LSA #1, E[ui | X1,i, ..., XK,i] = 0, need not hold; instead we assume conditional mean independence (next card)
Conditional mean independence
Given the control variable, the mean of ui doesn't depend on the variable of interest
E [ui |Xi , Wi ] = E [ui |Wi ]