Lecture 2 Flashcards

1
Q

SR: how to draw a sample

A

population -> draw a sample -> the sample gives us our statistic, aka the estimator -> use this to estimate the parameter -> our parameter of interest (μ)

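A minimal sketch of this chain in Python (numpy assumed; the population and sample size are invented for illustration):

import numpy as np

rng = np.random.default_rng(0)
population = rng.normal(loc=70, scale=10, size=100_000)  # population with true mean mu = 70
sample = rng.choice(population, size=50, replace=False)  # draw a sample
x_bar = sample.mean()                                    # the statistic, aka estimator (X-bar)
print(x_bar)                                             # our estimate of the parameter of interest, mu
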
2
Q

X-bar

A

sample mean; the estimator we use for the parameter of interest (μ), not the parameter itself

3
Q

what is X-bar the same thing as

A

E(X); the sample mean is the sample counterpart of E(X), since E(X̄) = E(X) = μ

4
Q

what is E(X)

A

the expected value: the mean μ of a discrete random variable X

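A worked example (a fair six-sided die; my own illustration, not from the lecture): E(X) = Σ x·P(X = x) = (1 + 2 + ... + 6)/6 = 3.5. In Python:

import numpy as np

x = np.arange(1, 7)    # values the discrete random variable X can take (a fair die)
p = np.full(6, 1 / 6)  # P(X = x) for each value
print(np.sum(x * p))   # E(X) = sum of x * P(X = x) = 3.5, the mean mu of X
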
5
Q

instead of the population mean, we are interested in the

A

population conditional mean

6
Q

population conditional mean

A

the mean of a conditional distribution: the average of one variable conditional on another, e.g. wage conditional on gender, or the expected value of college GPA conditional on high school GPA

7
Q

conditional distribution

A

The probability distribution of one random variable given that another random variable takes on a particular value

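A sketch of a conditional mean in Python (the wage and gender values are invented for illustration):

import numpy as np

wage = np.array([20.0, 25.0, 30.0, 22.0, 28.0, 35.0])
female = np.array([1, 1, 1, 0, 0, 0])  # the conditioning variable

# means of the two conditional distributions of wage given gender
print(wage[female == 1].mean())        # E(wage | female = 1)
print(wage[female == 0].mean())        # E(wage | female = 0)
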
8
Q

true line

A
  • we argue that this line exists
  • also called the population regression line
  • very few observations lie exactly on it
  • Y = β₀ + β₁X + u
9
Q

u

A

the error term; it allows for discrepancies between Y and the line
u = Y - β₀ - β₁X

10
Q

u on a regression

A

the vertical distance from the point to the line

11
Q

in regression of Y on X

A

Y is on the left side, X is on the right

12
Q

Terminology when Y is the dependent variable

A

X = independent variable
u = error term

13
Q

Terminology when Y is the explained variable

A

X = explanatory variable
u = unobserved heterogeneity

14
Q

Terminology when Y is the regressand

A

X = regressor
u = disturbance

15
Q

Terminology when Y is the left-hand side variable

A

X = right-hand side variable
u =

16
Q

use the equation: Y = β₀ + β₁X + u to represent

A

the conditional expectation of Y

17
Q

the conditional expectation of Y

A

E(Y|X) = β₀ + β₁X

18
Q

what do we need to know to find the expected value of Y given X

A

need to know β₀ and β₁

19
Q

we use the sample mean as an estimator of the population mean, so likewise we use

A

estimators for β₀ and β₁

20
Q

where do we get estimators for Bo and B1

A

draw a sample from the population, then compute the estimates from the sample data (via OLS)

21
Q

β̂₀

A

estimated value of the y-intercept in a linear regression model
- predicted value of Y when all independent variables are zero

22
Q

β̂₁

A

estimated slope of a regression model
- the change in Y for a one-unit increase in X

23
Q

adding a subscript to our regression

A

reflects that we now have data; the subscript i indexes the individual observations in the sample

25
Q

population regression line

A

the expected value of the dependent variable conditional on one or more independent variables

26
Q

the true line: E(Yᵢ|Xᵢ) = β₀ + β₁Xᵢ; what does the left side tell us?

A

this true line is what we want to know, as it tells us the population mean of Y conditional on X

27
Q

Yᵢ = β₀ + β₁Xᵢ + uᵢ; what is uᵢ?

A

the distance of an observation from the true line, known as the error term

28
Q

Ŷᵢ = Ê(Yᵢ|Xᵢ) = β̂₀ + β̂₁Xᵢ; what is Ê(Yᵢ|Xᵢ)?

A

our best guess of what the population mean of Y should be for a given X, also known as the fitted or predicted value

29
Q

Yᵢ = β̂₀ + β̂₁Xᵢ + ûᵢ; what is ûᵢ?

A

the distance of the observation from the estimated line, known as the residual

30
Q

the residual is

A

the difference between the observed Y and the fitted Y

31
Q

equation for the difference between the observed Y and the fitted Y

A

ûᵢ = Yᵢ - Ŷᵢ = Yᵢ - β̂₀ - β̂₁Xᵢ
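
A sketch of fitted values and residuals in Python (data invented; numpy's polyfit stands in as the least-squares fitter):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
b1_hat, b0_hat = np.polyfit(x, y, 1)  # least-squares slope and intercept

y_hat = b0_hat + b1_hat * x  # fitted values: best guess of Y for each X
u_hat = y - y_hat            # residuals: observed Y minus fitted Y
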
32
Q

how do we estimate β₀ and β₁

A

we want a method that makes the size of the residuals as small as possible, on average

33
Q

what minimizing method do we choose

A

ordinary least squares (OLS): minimize the sum of squared residuals, where ûᵢ = Yᵢ - β̂₀ - β̂₁Xᵢ

34
Q

residuals

A

the difference between the predicted values and the actual values; how far the observations are from the estimated line

35
Q

OLS estimator for β̂₁

A

β̂₁ = s_XY / s²_X: the sample covariance of X and Y divided by the sample variance of X

36
Q

OLS estimator for β̂₀

A

β̂₀ = Ȳ - β̂₁X̄
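
These two formulas can be checked numerically in Python (invented data; numpy assumed):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.8, 4.1, 5.9, 8.2, 9.9])

b1_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)  # sample cov(X, Y) / sample var(X)
b0_hat = y.mean() - b1_hat * x.mean()                    # Y-bar minus b1-hat times X-bar

print(b0_hat, b1_hat)
print(np.polyfit(x, y, 1))  # numpy's least squares returns the same slope and intercept
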
37
Q

interpretation of β₀

A

the intercept parameter, or the constant term

38
Q

interpretation of β₁

A

the slope parameter: the marginal effect of X on Y, with no causality necessarily implied

39
Q

_cons

A

the intercept: when X is set at 0, the predicted value of Y

40
Q

STATA: variables listed

A

the slope coefficients: as the listed variable goes up by one unit, how much Y changes

41
Q

example of how to interpret β₀ and β₁

A

β₀ = y-intercept (_cons): for an individual with a HS GPA of zero grade points, expected college GPA is estimated to be 1.815 grade points
β₁ = slope (coefficient on HS GPA): as HS GPA goes up by one grade point, expected college GPA is estimated to rise by 0.482 grade points
the line: GPA-hat = 1.81 + 0.48 · HSGPA

42
Q

how to estimate for an individual with a regression line

A

plug the individual's data into X to get the prediction. Ex: GPA-hat = 1.81 + 0.48 · HSGPA; for HSGPA = 2.6, GPA-hat = 1.81 + 0.48(2.6) = 3.058
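
The same plug-in prediction as a tiny Python check (coefficients taken from the card above):

b0_hat, b1_hat = 1.81, 0.48      # estimated intercept and slope
hs_gpa = 2.6                     # the individual's X
print(b0_hat + b1_hat * hs_gpa)  # predicted college GPA: 3.058
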
43
Q

goodness of fit: sum of squared residuals (SSR)

A

- a measure of how well a regression model fits the data
- comes from the OLS residuals

44
Q

how did we find the OLS regression line

A

by minimizing the sum of squared residuals

45
Q

the smaller we are able to get the SSR

A

the better the fit of the regression line

46
Q

what is the SSR a measure of

A

how much variability in Y was left unexplained by the regressor

47
Q

formula for SSR / RSS

A

SSR = Σ ûᵢ² = Σ (Yᵢ - Ŷᵢ)²: the sum of squared gaps between the observed values and the fitted values

48
Q

if we collect twice as much data, what happens to the SSR? what is the standard error of the regression?

A

SSR roughly doubles, since it is a sum rather than an average; to correct for sample size we use the standard error of the regression: SER = √(SSR / (N - 2))
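
A sketch computing SSR and SER in Python (same invented-data pattern as above):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.8, 4.1, 5.9, 8.2, 9.9])
b1_hat, b0_hat = np.polyfit(x, y, 1)

u_hat = y - (b0_hat + b1_hat * x)  # residuals
ssr = np.sum(u_hat ** 2)           # SSR: sum of squared residuals
ser = np.sqrt(ssr / (len(y) - 2))  # SER = sqrt(SSR / (N - 2))
print(ssr, ser)
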
49
Q

TSS (total sum of squares)

A

the total variability of the dependent variable: TSS = Σ (Yᵢ - Ȳ)², a total (as opposed to the average)

50
Q

ESS (explained sum of squares)

A

how much variability in Y is explained by the regressor

51
Q

regressor

A

a variable used in a regression model to predict the outcome

52
Q

TSS = ESS + SSR

A

total variation = explained variation + residual variation

53
Q

formula for R²

A

R² = ESS/TSS = 1 - SSR/TSS

54
Q

what is R²

A

the proportion of the total sample variation in Y that is explained by X
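
Checking TSS = ESS + SSR and both R² formulas numerically in Python (invented data, numpy assumed):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.8, 4.1, 5.9, 8.2, 9.9])
b1_hat, b0_hat = np.polyfit(x, y, 1)
y_hat = b0_hat + b1_hat * x

tss = np.sum((y - y.mean()) ** 2)      # total variation in Y
ess = np.sum((y_hat - y.mean()) ** 2)  # variation explained by the regressor
ssr = np.sum((y - y_hat) ** 2)         # variation left unexplained

print(np.isclose(tss, ess + ssr))      # TSS = ESS + SSR
print(ess / tss, 1 - ssr / tss)        # the two R^2 formulas agree
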
55
Q

if we are regressing Y on X but X tells us nothing about Y (no order to the points), what are the slope, intercept, and regression line

A

slope = 0; intercept = Ȳ; regression line: Ŷ = Ȳ

56
Q

no order to the points, so the regression line is Ȳ; what are ESS and R²

A

ESS = 0 and R² = 0 (since R² = ESS/TSS = 0/TSS)

57
Q

if there is a perfect relationship between X and Y, what will each residual be

A

0; there is no variation from the estimated line

58
Q

if there is a perfect relationship between X and Y, what are the residuals, the SSR, and the ESS

A

every residual is 0
SSR = 0: nothing is unexplained
ESS = TSS: all variation in Y is explained by X
so R² = ESS/TSS = 1

59
Q

0 ≤ R² ≤ 1; ex: R² = 0.37

A

- multiply by 100 and treat it as a percentage
- 37% of the sample variation in Y has been explained by X

60
Q

what is being looked for when interpreting R²

A

a context-specific interpretation; ex: 37% of the variation in college GPA can be explained by HS GPA

61
Q

how to decide between questions based on R²

A

not by R² alone; a project where very few of the points lie on the line may pose a more interesting question than one where the points sit exactly on the line

62
Q

where does R² get its name

A

it is a reference to the correlation coefficient, r

63
Q

relationship between r and R²

A

r is the square root of R² (up to sign: r carries the sign of the slope, while R² is never negative)

64
Q

two primary statistical measures for how well a regression line fits the data

A

SER & R²

65
Q

SER

A

standard error of the regression
- measures the variability of the regression residuals
- measured in the units of Y
- s_û = √(SSR / (n - 2))

66
Q

R²

A

R² of the regression
- measures the fraction of the sample variation of Y that is explained by X
- unit-free; ranges between 0 (no fit) and 1 (perfect fit)
- R² = ESS/TSS

67
Q

R² for a simple regression

A

equal to the sample correlation coefficient squared
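
The simple-regression claim that R² = r² can be verified directly in Python (invented data):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.8, 4.1, 5.9, 8.2, 9.9])

r = np.corrcoef(x, y)[0, 1]  # sample correlation coefficient
b1_hat, b0_hat = np.polyfit(x, y, 1)
y_hat = b0_hat + b1_hat * x
r2 = np.sum((y_hat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)  # R^2 = ESS/TSS

print(np.isclose(r ** 2, r2))  # True: R^2 equals r squared in a simple regression
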
68
Q

goodness of fit on a STATA output

A

SS column, Model row (top #) = ESS
SS column, Residual row (bottom #) = SSR
SS column, Total row = TSS
Root MSE = SER
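
A rough Python analogue of that Stata output (assuming the statsmodels library; the attribute names below are statsmodels', not Stata's):

import numpy as np
import statsmodels.api as sm

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.8, 4.1, 5.9, 8.2, 9.9])
res = sm.OLS(y, sm.add_constant(x)).fit()

print(res.ess)                 # model SS (ESS)
print(res.ssr)                 # residual SS (SSR)
print(res.centered_tss)        # total SS (TSS)
print(np.sqrt(res.mse_resid))  # root MSE (SER)
print(res.rsquared)            # R-squared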