EXAM 2 Flashcards

1
Q

How do we draw a line through data to estimate the population slope?

A

Ordinary Least Squares

2
Q

What is the OLS estimator?

A

The OLS estimator minimizes the average squared difference b/w the actual values of Y and the predictions based on the estimated line
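As a minimal sketch, the OLS slope and intercept can be computed directly from the least-squares formulas (the data values here are made up for illustration):

```python
import numpy as np

# Hypothetical illustration data (not from the course)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# OLS picks b0, b1 to minimize the sum of squared prediction errors
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
```

The same numbers fall out of `np.polyfit(x, y, 1)`, which fits by least squares.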

3
Q

Regression R^2
1. R^2=0 means ESS=0
2. R^2=1 means ESS=TSS
3. 0<=R^2<=1

  • for regression w/ a single X, R^2 is equal to the square of the correlation coefficient b/w X&Y
A

Measures the fraction of the variance of Y that is explained by X
1. is unitless
2. between 0 (no fit) and 1 (perfect fit)

4
Q

Standard Error of the Regression

A

Measures the magnitude of a typical regression residual in the units of Y
- measures the spread of the distribution of u
- has the units of u, which are the units of Y, and measures the average size of the OLS residual
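A numeric sketch of the SER on made-up data, using the n − 2 degrees-of-freedom convention for a regression with one regressor:

```python
import numpy as np

# Hypothetical illustration data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(y)

# Fit the OLS line, then measure the typical residual size
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
u_hat = y - (b0 + b1 * x)            # OLS residuals

SER = np.sqrt(np.sum(u_hat ** 2) / (n - 2))   # degrees-of-freedom corrected
RMSE = np.sqrt(np.mean(u_hat ** 2))           # divides by n instead
```

The two differ only by the factor sqrt(n/(n − 2)), which vanishes as n grows.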

5
Q

TSS=ESS+SSR(sum of squared residuals)

R^2=ESS/TSS (explained sum of squares/total sum of squares)

A

R^2=1-SSR/TSS
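These identities can be checked numerically; as a sketch on made-up data:

```python
import numpy as np

# Hypothetical illustration data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

TSS = np.sum((y - y.mean()) ** 2)       # total sum of squares
ESS = np.sum((y_hat - y.mean()) ** 2)   # explained sum of squares
SSR = np.sum((y - y_hat) ** 2)          # sum of squared residuals

r2 = ESS / TSS                          # equals 1 - SSR/TSS
```

With a single regressor, r2 also equals the squared correlation between X and Y.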

6
Q

Root mean squared error (RMSE) is closely related to the SER

A

Both measure the typical size of the OLS residuals: the SER divides the sum of squared residuals by n − 2 (a degrees-of-freedom correction), while the RMSE divides by n, so for large n the two are essentially identical.
7
Q

The Least Squares Assumptions
1. The conditional distribution of ui, given Xi, has a mean of zero
2. (Xi,Yi), i=1,2,3,…n are iid
3. Large outliers in X and/or Y are rare

A
8
Q
  1. The conditional distribution of ui, given Xi, has a mean of zero

Failure of this leads to omitted variable bias
- if an omitted variable both affects Y and is correlated with a regressor, the condition fails and there is OV bias

A

Is equivalent to assuming that the population regression line is the conditional mean of Yi given Xi

  1. Because X is assigned randomly, all other individual characteristics–the things that make up u– are distributed independently of X , so u and X are independent.
  2. Thus, in an ideal randomized controlled experiment, E [ui |Xi ] = 0

In actual experiments, or with observational data, we will need to think hard about whether E [ui |Xi] = 0 holds.

9
Q
  1. (Xi,Yi), i=1,2,3,…n are iid
A

This assumption holds automatically if the entities are drawn by simple random sampling from the same population
- because entities are selected at random, the values are independently distributed

We can expect to encounter non-i.i.d. data when information is
recorded over time for the same entity (panel data and time series
data)

10
Q
  1. Large Outliers are rare
A

Assumes that Xi and Yi have nonzero finite fourth moments, i.e., 0 < E[Xi^4] < ∞ and 0 < E[Yi^4] < ∞; in other words, the distributions of Xi and Yi have finite kurtosis.
- outliers are often data glitches (coding or recording problems); sometimes they are observations that really shouldn't be in your data set

11
Q

R^2

A

Measures the fraction of the variance of Y that is explained by X

12
Q

SER- standard error of the regression

A

Measures the magnitude of a typical regression residual in the units of Y.

13
Q

t= (estimator- hypothesized value)/
(SE of estimator)

A

t = (sample average of Y − hypothesized mean of Y) / (Sy/sqrt(n))
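As a sketch with a made-up sample, testing the hypothesized mean E[Y] = 5:

```python
import numpy as np

y = np.array([5.2, 4.8, 6.1, 5.5, 5.9, 4.7, 5.3, 6.0])  # hypothetical sample
mu_0 = 5.0                                               # hypothesized mean
n = len(y)

se = y.std(ddof=1) / np.sqrt(n)   # standard error of the sample mean
t = (y.mean() - mu_0) / se
```

`ddof=1` gives the sample standard deviation (dividing by n − 1), which is what Sy denotes.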

14
Q

95% Confidence Interval

A
  • the set of points that cannot be rejected at the 5% significance level;
  • a set-valued function of the data (an interval that is a function of the
    data) that contains the true parameter value 95% of the time in
    repeated samples.
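A minimal sketch, using the large-sample 1.96 critical value and a hypothetical sample:

```python
import numpy as np

y = np.array([5.2, 4.8, 6.1, 5.5, 5.9, 4.7, 5.3, 6.0])  # hypothetical sample
n = len(y)
se = y.std(ddof=1) / np.sqrt(n)

# 95% CI: point estimate +/- 1.96 standard errors
lo = y.mean() - 1.96 * se
hi = y.mean() + 1.96 * se
```

Any hypothesized mean outside [lo, hi] would be rejected at the 5% significance level.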
15
Q

Homoskedastic

A

If the variance of the conditional distribution of u given X doesn't depend on X

var[ui | Xi = x] is constant (the same for all x)

16
Q

Heteroskedastic

A

If the variance of the conditional distribution of u given X does depend on X
var[ui | Xi = x] changes with x

17
Q

Homoskedasticity-only SEs are valid only if the errors are homoskedastic

A

The homoskedasticity-only and heteroskedasticity-robust formulas differ, so in general they give different SEs

18
Q

In practice we usually use heteroskedasticity-robust SEs, because they are valid whether or not the errors are heteroskedastic

A
19
Q

Advantages of homoskedasticity-only standard errors

A

The formula is simpler

Disadvantage:
- the formula is only correct if the errors are homoskedastic

20
Q

Homoskedasticity

A
  • homoskedasticity-only SEs are the default setting in regression software
21
Q

Population vs Sample Parameter

A

A parameter is a measure that describes the whole population. A statistic is a measure that describes the sample.

22
Q

Slope in population regression line

A

Expected effect on Y of a unit change in X

23
Q

Regression Error

A

consists of omitted factors that affect Y, and error in measuring Y

24
Q

Omitted Variable Bias

A

Bias in the OLS estimator that arises when an omitted variable both affects Y and is correlated with a regressor

25
Q

Causality

A

The effect measured in an ideal randomized controlled experiment

26
Q

IDEAL
RANDOMIZED
CONTROLLED
EXPERIMENT

A

Ideal: subjects all follow the treatment protocol (perfect compliance, no errors in reporting, etc.)

Randomized: subjects from the population of interest are randomly
assigned to a treatment or control group (so there are no confounding
factors).

Controlled: having a control group permits measuring the differential effect of the treatment.

Experiment: the treatment is assigned as part of the experiment: the
subjects have no choice, so there is no “reverse causality” in which
subjects choose the treatment they think will work best.

27
Q

Three ways to overcome omitted variable bias

A
  1. Run a randomized controlled experiment in which the variable of interest (e.g., STR) is randomly assigned
  2. Adopt cross tabulation approach
    - data tables that present the results of the entire group of respondents, as well as results from subgroups of survey respondents.
  3. Use regression in which omitted variable is no longer omitted
28
Q

Adjusted R^2

A

Penalizes you for including another regressor
- this is because R^2 always increases when adding another regressor
———————-
adjusted R^2 < R^2
the values are close when n is large
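The penalty can be sketched as a small helper (the function name is ours; the formula assumes n observations and k regressors, excluding the intercept):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k regressors (excluding intercept)."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)

# For a fixed R^2, adding regressors (larger k) lowers the adjusted R^2,
# while for large n the adjustment becomes negligible.
```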

29
Q
  1. There is no perfect multicollinearity
    - Perfect multicollinearity is when one of the regressors is an exact linear function of the other regressors
    - e.g., accidentally including the same variable twice
A

Imperfect multicollinearity occurs when two or more regressors are very highly correlated.

  • If two regressors are very highly correlated, their scatterplot will pretty much look like a straight line (they are "collinear"), but unless the correlation is exactly 1 or −1, that collinearity is imperfect.
  • Imperfect multicollinearity implies that one or more of the regression coefficients will be imprecisely estimated.
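A quick sketch of the perfect case (same variable included twice, with made-up data): the X'X matrix becomes singular, so the OLS normal equations have no unique solution.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Design matrix: intercept, x, and x again -> perfectly collinear columns
X = np.column_stack([np.ones_like(x), x, x])

# X'X is singular: its rank is less than the number of columns
rank = np.linalg.matrix_rank(X.T @ X)
```

Regression software typically detects this and drops one of the offending regressors.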
30
Q

imperfect multicollinearity
- results in large standard errors
for one or more of the OLS coefficients

A
31
Q

p-value > alpha
p-value < alpha

A

p-value > alpha:
FAIL TO REJECT the null hypothesis

p-value < alpha:
REJECT the null hypothesis

32
Q

Joint Hypothesis

A

specifies a value for 2 or more coefficients; imposes a restriction on 2+ coefficients

33
Q

How to test a joint hypothesis

A

Use the F-statistic

34
Q

F-statistics

A
  1. Large when t1^2 and/or t2^2 is large

when n is large and the t-statistics are uncorrelated:
F = 0.5(t1^2 + t2^2)
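As a sketch (the helper name is ours), the large-sample F-statistic for a joint hypothesis on two coefficients, given the two t-statistics and the estimated correlation rho between them; the uncorrelated case reduces to 0.5(t1^2 + t2^2):

```python
def f_stat_two(t1, t2, rho=0.0):
    """Large-sample F-statistic for a joint hypothesis on two coefficients.

    t1, t2: the two t-statistics; rho: estimated correlation between them.
    With rho = 0 this reduces to 0.5 * (t1**2 + t2**2).
    """
    return 0.5 * (t1**2 + t2**2 - 2 * rho * t1 * t2) / (1 - rho**2)
```

Note that testing the two hypotheses one at a time with separate t-tests is not equivalent and rejects too often; the F-statistic handles the joint test correctly.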

35
Q

Chi Squared Distribution

A

The distribution of the sum of q squared independent standard normal random variables; q is its degrees of freedom. In large samples, the F-statistic is distributed as chi-squared(q)/q.

36
Q

Control Variable W

A

A variable that is correlated with, and controls for, an omitted causal factor in the regression of Y on X, but does not itself have a causal effect on Y

37
Q

Three interchangeable statements about effective control variable

A
  1. When included in the regression, it makes the error term uncorrelated with the variable of interest.
  2. Holding constant the control variable(s), the variable of interest is "as if" randomly assigned.
  3. Holding constant the control variable(s), the variable of interest is uncorrelated with the omitted determinants of Y.

38
Q

When control variables are included, the LSA (1), E[ui | X1,i, ..., XK,i] = 0, need not hold

A
39
Q

Conditional mean independence

A

Given the control variables, the mean of ui doesn't depend on the variable of interest:
E[ui | Xi, Wi] = E[ui | Wi]