Week 5 Flashcards

1
Q

What is a model that is even simpler than the univariate regression model?

A

It is called the intercept-only model, which is a regression model without any predictor.

Yi = β0 + ϵi

2
Q

Can you guess what β0 is equal to? What is the best prediction for Yi if you don’t know anything else?

A

Answer: β0 = µy
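A quick numerical check of this card (a Python/numpy sketch added for illustration; the data values are made up):

```python
import numpy as np

# Hypothetical sample, for illustration only.
y = np.array([3.0, 5.0, 4.0, 8.0])

# Intercept-only model: Yi = beta0 + e_i.
# The design matrix is just a column of ones.
X = np.ones((len(y), 1))
beta0_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Least squares recovers the sample mean as the estimate of beta0.
print(np.isclose(beta0_hat[0], y.mean()))  # True
```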

3
Q

For the intercept-only model, we can only test what hypothesis?

A

H0 : β0 = 0
H1 : β0 ≠ 0

4
Q

In a one-sample t-test, what is this equivalent to?
H0 : β0 = 0
H1 : β0 ≠ 0

A

H0 : µy = 0
H1 : µy ≠ 0

5
Q

Univariate Model:

Equation?

What does B1 mean?

What tests?

A

yi = β0 + β1x1 + ϵi

§ β1: for a one-unit increase in x1, there is a β1-unit increase in Y.

§ The t-test for the regression coefficient, the t-test for the correlation coefficient, and the F-test for the overall model fit (or R-squared) are all equivalent.
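For one predictor, the t-test/F-test equivalence can be checked numerically. A Python/numpy sketch with simulated data (the coefficients 2.0 and 0.5 are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 2.0 + 0.5 * x + rng.normal(size=n)  # simulated data

# Fit y = b0 + b1*x by least squares.
X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b
ss_res = resid @ resid
ss_tot = ((y - y.mean()) ** 2).sum()
ss_reg = ss_tot - ss_res

# t statistic for the slope b1.
sigma2 = ss_res / (n - 2)
se_b1 = np.sqrt(sigma2 / ((x - x.mean()) ** 2).sum())
t_b1 = b[1] / se_b1

# F statistic for the overall model (1 regression df).
F = (ss_reg / 1) / (ss_res / (n - 2))

# With a single predictor, t^2 equals F exactly.
print(np.isclose(t_b1 ** 2, F))  # True
```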

6
Q

Bivariate Model:

Equation?

What does B1 mean?

What tests?

A

yi = β0 + β1x1 + β2x2 + ϵi

β1: holding x2 constant, for a one-unit increase in x1, there is a β1-unit increase in Y.

The t-test for a partial regression coefficient is different from the F-test for the overall model (or R-squared).

7
Q

the F-test for the univariate and bivariate regression tests whether…

A

the variance explained in the criterion variable can be significantly accounted for by all the predictors jointly.

H0 : ρ²yŷ = 0
H1 : ρ²yŷ > 0

8
Q

ρ²yŷ equals what at the population level?

A

ρ²yŷ = SSregression / SStotal

Another way of looking at the F-test is that it is a ratio comparing the current model against the intercept-only model: SStotal is the residual sum of squares of the intercept-only model, so ρ²yŷ is the proportion of that baseline variability explained by the current model.
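The sum-of-squares decomposition behind this card can be sketched with simulated data (Python/numpy, illustration only; the coefficients are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
x = rng.normal(size=n)
y = 1.0 + 0.8 * x + rng.normal(size=n)  # simulated data

# Fit the current model y = b0 + b1*x.
X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
ss_res = ((y - X @ b) ** 2).sum()

# SS_total is the residual SS of the intercept-only model
# (which predicts y.mean() for every case).
ss_tot = ((y - y.mean()) ** 2).sum()

# R^2 = SS_regression / SS_total = 1 - SS_residual / SS_total.
r2 = 1 - ss_res / ss_tot
print(0.0 <= r2 <= 1.0)  # True
```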

9
Q

In the intercept-only model, can you do an F test?

A

No. With no predictors there is no SSregression, so there is no explained variance to test against the residual variance.

10
Q

Why is unadjusted R^2 not good?

A

Because the sample R-squared r²yŷ is a biased estimator of the population R-squared ρ²yŷ.

§ Over repeated studies, the sample R-squared r²yŷ tends to be higher than ρ²yŷ.
§ The sample R-squared r²yŷ tends to increase as the number of predictors (denoted by p) increases.
§ As p increases, the model tends toward overfitting.
§ An overfitted model is very unstable; the estimates vary widely across repeated samples (the fitted line hugs the sample points too closely).
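The "R² only goes up with p" behaviour can be demonstrated by regressing pure noise on pure-noise predictors (Python/numpy sketch, simulated data):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
y = rng.normal(size=n)          # criterion: pure noise
Z = rng.normal(size=(n, 10))    # ten irrelevant predictors

def r_squared(X, y):
    """Unadjusted R^2 of a least squares fit with an intercept."""
    Xd = np.column_stack([np.ones(len(y)), X])
    b = np.linalg.lstsq(Xd, y, rcond=None)[0]
    ss_res = ((y - Xd @ b) ** 2).sum()
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1 - ss_res / ss_tot

# Adding predictors one at a time: R^2 never decreases,
# even though every predictor is noise.
r2s = [r_squared(Z[:, :p], y) for p in range(1, 11)]
print(r2s)
```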

11
Q

Bias-Variance Trade Off

Define bias and variance

Explain how they influence

A

For any statistical modelling, there is a bias-variance tradeoff.

Bias: how well the model fits the current data.
§ Less bias means smaller residuals.
§ Observed and predicted values are similar.
§ More variance in the criterion variable can be explained by the predictors.
§ In regression, usually, as you add more predictors, you get less bias.

Variance: how variable your estimates are across repeated samples.
§ Large variance implies large standard errors and more prediction error.
§ In regression, usually, as you add more predictors, you get larger standard errors and more prediction error.
§ Recall multicollinearity.

12
Q

Underfitted Model - bias and variance?

A

High bias; low variance

13
Q

Overfitted Model: - bias and variance?

A

Low bias; high variance.

14
Q

Both underfitted and overfitted models have _________ prediction error

A

large

15
Q

The unadjusted R2 tends to favor ______________ models even
though they are not good.

A

overfitted

16
Q

The goal of the adjusted R2 is…

A

to provide a more balanced evaluation of the fit relative to the number of predictors.

17
Q

Unadjusted R^2 formula

A

r²yŷ = SSregression / SStotal = 1 − (SSresidual / SStotal)

18
Q

adjusted R^2 formula

A

adjusted r²yŷ = 1 − (SSresidual / dfresidual) / (SStotal / dftotal)
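A sketch of how the adjustment pulls R² downward when the predictors explain nothing real (Python/numpy, simulated pure-noise data):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 25, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)  # noise criterion: any apparent fit is spurious

Xd = np.column_stack([np.ones(n), X])
b = np.linalg.lstsq(Xd, y, rcond=None)[0]
ss_res = ((y - Xd @ b) ** 2).sum()
ss_tot = ((y - y.mean()) ** 2).sum()

r2 = 1 - ss_res / ss_tot
# Adjusted R^2 divides each SS by its degrees of freedom.
adj_r2 = 1 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))
print(r2 > adj_r2)  # True: the adjustment is downward
```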

19
Q

As the number of predictors, relative to sample size, increases, the R-squared is adjusted how?

A

downward more.

In short, adjusted R squared adjusts the unadjusted R-squared downward to provide a better evaluation of fit.

20
Q

We know that in the population model, the error term is…

what is the notation

A

a random variable with a normal distribution

ϵi ∼ N(0, σ²)

21
Q

deterministic view

A

The deterministic view assumes that the variability of the criterion variable can be fully accounted for by a list of predictors at the population level; therefore, there is no error term in the population model.

22
Q

stochastic view

A

The stochastic view assumes that the variability of the criterion variable CANNOT be fully accounted for by a list of predictors at the population level; therefore, there should be an error term in the population model.

23
Q

Modern statistics takes which view?

A

stochastic view.

24
Q

There are two fundamentally different interpretations of the
regression coefficients.

A
  1. Descriptively as an empirical association
  2. Causally as a structural relation
25
Q

There are two fundamentally different interpretations of the
regression coefficients.

  1. Descriptively as an empirical association.
A
  1. Descriptively as an empirical association.
    § Treat predictors as correlates with the criterion variable.
    § Population regression coefficients simply describe the relationship between predictors and the criterion variable at the population level.
    § Error term represents all the correlates that we didn’t measure but correlate with the criterion variable.
    § Relevant when you only focus on prediction.
26
Q

There are two fundamentally different interpretations of the
regression coefficients.

  1. Causally as a structural relation.
A
  1. Causally as a structural relation.
    § Treat predictors as causes of the criterion variable.
    § Population regression coefficients try to model the underlying causal relationship between the predictors and the criterion variable at the population level.
    § Error term represents non-systematic causes of the criterion variable.
    § Relevant when you want to study the underlying causal relationship.
27
Q

In the empirical association view, omitting a relevant predictor for the criterion variable _________________the inference.

A

does not affect

But the prediction may not be good if you omit an important predictor

28
Q

What happens when you omit a predictor in the structural relation view?

A

On the other hand, if we hold the structural relation view, then we can talk about a possible bias produced by omitting a predictor in the population.

29
Q

Structural Relation View – Example

Suppose in the population, the true model is
yi = β0 + β1x1 + β2x2 + ϵi.

Under the structural relation view, the true model means

A

x1 and x2 are the only causes of Y. Usually (not always), cor(x1, x2) ≠ 0.
§ ϵ is the variance in Y that can’t be accounted for by any other variables.
§ Thus, cor(x1, ϵ) = 0 and cor(x2, ϵ) = 0 at the population level.

30
Q

Structural Relation View – Example

Suppose in the population, the true model is
yi = β0 + β1x1 + β2x2 + ϵi.

However, suppose we fit a univariate regression model.

A

§ omitting x2
§ fitting a misspecified model

yi = β0′ + β1′x1 + ϵ′

where, implicitly, the effect of x2 on Y is absorbed by the error:

ϵ′ = β2x2 + ϵ

If x1 and x2 are correlated, then there is a correlation between x1 and ϵ′, cor(x1, ϵ′) ≠ 0, at the population level.

However, if you fit the one-predictor model with the least squares method at the sample level, the correlation between the predictor and the residual will be forced to be 0.

(Note: the slide is a bit more complex than this.)
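The omitted-variable bias described above can be simulated (Python/numpy sketch; the coefficients 2.0 and 3.0 and the 0.8 correlation structure are made-up illustration values):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)                   # x1 and x2 are correlated
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)   # true model

# Misspecified model omitting x2: the x1 slope absorbs part of x2's effect.
X = np.column_stack([np.ones(n), x1])
b_mis = np.linalg.lstsq(X, y, rcond=None)[0]
print(b_mis[1])  # close to 2 + 3*0.8 = 4.4, not the true beta1 = 2.0
```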

31
Q

In conclusion, if we hold the structural relation view, then we can talk about …

A

In conclusion, if we hold the structural relation view, then we can talk about the bias produced by omitting a predictor that is a cause of Y (i.e., fitting a misspecified model) and is correlated with another predictor in the model.

In other words, in the structural relation view, we need to include all relevant causes of Y as predictors to obtain consistent estimation.
§ possible for experimental studies
§ not possible for observational studies, but we try our best.
§ “All models are wrong but some are useful.”

32
Q

Why do we create dummy variables?

A

To incorporate categorical variables into the regression model, we need to create dummy variables.

33
Q

What are dummy variables?

A

Dummy (code) variables are numeric variables that use 0 or 1 to represent categorical variables.

34
Q

How many dummy variables are needed?

A

For a categorical variable that has g groups, you need g - 1 dummy variables to code this categorical variable
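A minimal sketch of g − 1 dummy coding for a three-group variable (Python/numpy; the group labels are made up):

```python
import numpy as np

group = np.array(["a", "b", "c", "a", "b", "c"])  # g = 3 groups
levels = ["a", "b", "c"]

# g - 1 = 2 dummy variables; "a" (all zeros on both) is the reference group.
D = np.column_stack([(group == lev).astype(float) for lev in levels[1:]])
print(D)
# row for "a": [0, 0]; for "b": [1, 0]; for "c": [0, 1]
```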

35
Q

If a categorical variable has 2 groups (e.g., male vs female), then how many dummy variables?

A

you need 1 dummy variable to code it.

36
Q

Reference group

A

The group that has 0 on all dummy variables is called the reference group.

37
Q

What is the interpretation of the intercept βˆ0 = 85.6?

yˆ = 85.6 + 4.2D

A

The mean grade of the male group (the reference group).

Recall βˆ0 is the value of the criterion variable when the predictor variable is 0.

Therefore, βˆ0 is the value of the criterion variable for the reference group.

38
Q

What is the interpretation of the slope βˆ1 = 4.2?

A

The difference between the mean grades of the female and male groups.

Recall: βˆ1 is the amount of change in the criterion variable for 1 unit change in the predictor variable.

§ Now, 1 unit change in the dummy variable is the change from the male group to the female group.

§ Therefore, βˆ1 is the amount of change in the criterion when you change from the reference group to the non-reference group.

39
Q

What does lm do in respect to dummy variables?

A

The lm function will automatically coerce character vectors to factors and do the dummy coding in the background.

§ It will assign the reference group based on alphabetical order.

§ e.g., “female” is before “male” alphabetically so “female” will be automatically assigned as the reference group.

40
Q

A simple regression with a dummy variable is equivalent to

A

an independent-sample t-test, which is equivalent to running ANOVA with two groups
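This equivalence can be verified numerically: regressing on a single dummy reproduces the two group means (Python/numpy sketch; the grades are made up, with male as the reference group as in the earlier cards):

```python
import numpy as np

male = np.array([82.0, 88.0, 85.0, 87.0])
female = np.array([90.0, 89.0, 91.0, 88.0])
y = np.concatenate([male, female])
D = np.concatenate([np.zeros(len(male)), np.ones(len(female))])  # dummy: female = 1

X = np.column_stack([np.ones(len(y)), D])
b0, b1 = np.linalg.lstsq(X, y, rcond=None)[0]

print(np.isclose(b0, male.mean()))                   # True: intercept = reference-group mean
print(np.isclose(b1, female.mean() - male.mean()))   # True: slope = difference in group means
```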

41
Q

F-test: what are H0 and H1?

A

H0 : µ1 = µ2 = µ3, equivalently H0 : β1 = β2 = 0.

H1 : at least one pair of means is not equal.
§ Same as the F-test in ANOVA.