Week 5 Flashcards
What is a model that is even simpler than the univariate regression model?
It is called the intercept-only model, which is a regression model without any predictors:
yi = β0 + ϵi
Can you guess what β0 is equal to? What is the best prediction for yi
if you don't know anything else?
Answer: β0 = µy
For the intercept-only model, we can only test what hypothesis?
H0 : β0 = 0
H1 : β0 ≠ 0
What is this equivalent to in a one-sample t-test?
H0 : µy = 0
H1 : µy ≠ 0
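A minimal R sketch of this equivalence, using simulated data (all values are made up):
set.seed(1)
y <- rnorm(30, mean = 2, sd = 1)
m0 <- lm(y ~ 1)     # intercept-only model: the estimated β0 is the sample mean
summary(m0)         # t-test on the intercept
t.test(y, mu = 0)   # identical t statistic, df, and p-value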
Univariate Model:
Equation?
What does β1 mean?
What tests?
yi = β0 + β1x1 + ϵi
§ β1: for a one-unit increase in x1, there is a β1-unit increase in Y.
§ The t-test for the regression coefficient, the test of the correlation coefficient, and the F-test for the overall model fit (or R-squared) are all equivalent.
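A minimal R sketch (simulated data) of the equivalent tests:
set.seed(2)
x <- rnorm(50)
y <- 0.5 * x + rnorm(50)
m1 <- lm(y ~ x)
summary(m1)      # t-test for the slope and overall F-test; note F = t^2
cor.test(x, y)   # same t statistic and p-value as the slope test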
Bivariate Model:
Equation?
What does β1 mean?
What tests?
yi = β0 + β1x1 + β2x2 + ϵi
β1: holding x2 constant, for a one-unit increase in x1, there is a β1-unit increase in Y.
The t-test for a partial regression coefficient is different from the F-test for the overall model fit (or R-squared).
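A minimal R sketch (simulated data) showing that each partial t-test is a different test from the overall F-test:
set.seed(3)
x1 <- rnorm(100)
x2 <- 0.6 * x1 + rnorm(100)            # correlated predictors
y  <- 0.4 * x1 + 0.3 * x2 + rnorm(100)
summary(lm(y ~ x1 + x2))               # two partial t-tests plus one overall F-test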
The F-test for the univariate and bivariate regression tests whether…
the variance in the criterion variable can be significantly accounted for by all the predictors together.
H0 : ρ²yŷ = 0
H1 : ρ²yŷ > 0
ρ²yŷ equals what at the population level?
SSregression/SStotal
Another way of looking at the F-test is that it is a ratio comparing the current model and the intercept-only model.
r²yŷ = SSregression(current model) / SSresidual(intercept-only model), since the residual sum of squares of the intercept-only model is exactly SStotal.
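A minimal R sketch (simulated data): comparing the current model against the intercept-only model with anova() reproduces the overall F-test:
set.seed(4)
x1 <- rnorm(80); x2 <- rnorm(80)
y  <- 0.4 * x1 + 0.3 * x2 + rnorm(80)
m0 <- lm(y ~ 1)          # intercept-only model
m2 <- lm(y ~ x1 + x2)    # current model
anova(m0, m2)            # same F and p-value as summary(m2) reports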
In the intercept-only model, can you do an F test?
No. The intercept-only model has no predictors, so there is no explained variance to test; it is the baseline model that the F-test compares other models against.
Why is unadjusted R^2 not good?
Because the sample R-squared r²yŷ is a biased estimator of the population R-squared ρ²yŷ.
§ Over repeated studies, the sample R-squared r²yŷ tends to be higher than ρ²yŷ.
§ The sample R-squared r²yŷ tends to increase as the number of predictors (denoted by p) increases.
§ As p increases, the model tends to overfit.
§ An overfitted model is very unstable; the estimates vary widely across repeated samples (the fitted line follows the sample points too closely).
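A minimal R sketch (simulated data): unadjusted R-squared keeps climbing as pure-noise predictors are added, even though the population R-squared is 0:
set.seed(5)
n <- 40
y <- rnorm(n)                        # criterion unrelated to any predictor
X <- matrix(rnorm(n * 20), n, 20)    # 20 pure-noise predictors
r2 <- sapply(1:20, function(p) summary(lm(y ~ X[, 1:p, drop = FALSE]))$r.squared)
round(r2, 2)                         # nondecreasing in p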
Bias-Variance Trade Off
Define bias and variance
Explain how they influence prediction error.
For any statistical modelling, there is a bias-variance tradeoff.
Bias: how well the model fits the current data.
§ Less bias means smaller residuals.
§ Observed and predicted values are similar.
§ More variance in the criterion variable can be explained by the predictors.
§ In regression, usually, as you add more predictors, you get less bias.
Variance: how variable your estimates are across repeated samples.
§ Large variance implies large standard errors and more prediction error.
§ In regression, usually, as you add more predictors, you get larger standard errors and more prediction error.
§ Recall multicollinearity.
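A minimal R sketch (simulated data): adding a nearly redundant predictor barely changes the fit but inflates the standard error of the estimate for x1:
set.seed(6)
n  <- 50
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)        # near-duplicate of x1 (multicollinearity)
y  <- 0.5 * x1 + rnorm(n)
coef(summary(lm(y ~ x1)))["x1", "Std. Error"]        # small standard error
coef(summary(lm(y ~ x1 + x2)))["x1", "Std. Error"]   # much larger standard error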
Underfitted Model - bias and variance?
High bias; low variance
Overfitted Model: - bias and variance?
Low bias; high variance.
Both underfitted and overfitted models have _________ prediction error
large
The unadjusted R2 tends to favor ______________ models even
though they are not good.
overfitted
The goal of the adjusted R2 is…
to provide a more balanced evaluation of the fit relative to the number of predictors.
Unadjusted R^2 formula
r²yŷ = SSregression/SStotal = 1 - SSresidual/SStotal
adjusted R^2 formula
adjusted r²yŷ = 1 - (SSresidual/dfresidual)/(SStotal/dftotal)
As the number of predictors, relative to sample size, increases, the R-squared is adjusted how?
downward more.
In short, adjusted R squared adjusts the unadjusted R-squared downward to provide a better evaluation of fit.
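A minimal R sketch (simulated data) computing both formulas from the sums of squares and checking them against summary(lm):
set.seed(7)
x <- rnorm(30); y <- 0.5 * x + rnorm(30)
m <- lm(y ~ x)
ss_res <- sum(resid(m)^2)
ss_tot <- sum((y - mean(y))^2)
n <- length(y); p <- 1                           # p = number of predictors
1 - ss_res / ss_tot                              # unadjusted R-squared
1 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))  # adjusted R-squared
summary(m)$r.squared; summary(m)$adj.r.squared   # same values from lm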
We know that in the population model, the error term is…
What is the notation?
a random variable with a normal distribution
ϵi ~ N(0, σ²)
deterministic view
The deterministic view assumes that the variability of the criterion variable can be fully accounted for by a list of predictors at the population level; therefore, there is no error term in the population.
stochastic view
The stochastic view assumes that the variability of the criterion variable CANNOT be fully accounted for by a list of predictors at the population level; therefore, there should be an error term in the population.
Modern statistics takes which view?
stochastic view.
There are two fundamentally different interpretations of the
regression coefficients.
- Descriptively as an empirical association
- Causally as a structural relation
There are two fundamentally different interpretations of the
regression coefficients.
- Descriptively as an empirical association.
§ Treat predictors as correlates with the criterion variable.
§ Population regression coefficients simply describe the relationship between predictors and the criterion variable at the population level.
§ Error term represents all the correlates that we didn’t measure but correlate with the criterion variable.
§ Relevant when you only focus on prediction.
There are two fundamentally different interpretations of the
regression coefficients.
- Causally as a structural relation.
§ Treat predictors as causes of the criterion variable.
§ Population regression coefficients try to model the underlying causal relationship between the predictors and
the criterion variable at the population level.
§ Error term represents non-systematic causes of the criterion variable.
§ Relevant when you want to study the underlying causal relationship.
In the empirical association view, omitting a relevant predictor for the criterion variable _________________the inference.
does not affect
But the prediction may not be good if you omit an important predictor.
What happens when you omit a predictor in the structural relation view?
If we hold the structural relation view, then we can talk about a possible bias produced by omitting a predictor in the population.
Structural Relation View – Example
Suppose in the population, the true model is
yi = β0 + β1x1 + β2x2 + ϵi
Under the structural relation view, the true model means
x1 and x2 are the only causes of Y. Usually (but not always), cor(x1, x2) ≠ 0.
§ ϵ is the variance in Y that can't be accounted for by any other variables.
§ Thus, cor(x1, ϵ) = 0 and cor(x2, ϵ) = 0 at the population level.
Structural Relation View – Example
Suppose in the population, the true model is
yi = β0 + β1x1 + β2x2 + ϵi
However, suppose we fit a univariate regression model.
§ omitting x2
§ fitting a misspecified model
yi = β0′ + β1′x1 + ϵ′
where, implicitly, the effect of x2 on Y is absorbed by the error:
ϵ′ = β2x2 + ϵ.
If x1 and x2 are correlated, then there is a correlation between
x1 and ϵ′: cor(x1, ϵ′) ≠ 0 at the population level.
However, if you fit the one predictor model with the least squares method at the sample level, the correlation between the predictor and the residual will be forced to be 0.
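A minimal R simulation of this omitted-predictor bias (the coefficient values are made up; the true β1 is 0.5):
set.seed(8)
b1 <- replicate(1000, {
  x1 <- rnorm(200)
  x2 <- 0.6 * x1 + rnorm(200)            # x2 is a cause of y and correlates with x1
  y  <- 0.5 * x1 + 0.7 * x2 + rnorm(200)
  coef(lm(y ~ x1))["x1"]                 # misspecified model omitting x2
})
mean(b1)   # about 0.5 + 0.7 * 0.6 = 0.92, not 0.5: the estimate is biased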
In conclusion, if we hold the structural relation view, then we can talk about …
In conclusion, if we hold the structural relation view, then we can talk about bias produced by omitting a predictor that is
a cause of Y (i.e., fitting a misspecified model) and is correlated with another predictor in the model.
In other words, in the structural relation view, we need to include all relevant causes of Y as predictors to obtain
consistent estimation.
§ possible for experimental studies
§ not possible for observational studies but we try our best.
§ “All models are wrong but some are useful.”
Why do we create dummy variables?
To incorporate categorical variables into the regression model, we need to create dummy variables.
What are dummy variables?
Dummy (code) variables are numeric variables that use 0 or 1 to represent categorical variables.
How many dummy variables are needed?
For a categorical variable that has g groups, you need g - 1 dummy variables to code this categorical variable.
If a categorical variable has 2 groups (e.g., male vs female), then how many dummy variables?
you need 1 dummy variable to code it.
Reference group
The group that has 0 on all dummy variables is called the reference group.
What is the interpretation of the intercept βˆ0 = 85.6?
yˆ = 85.6 + 4.2D
the mean grade of the male group (the reference group)
Recall βˆ0 is the value of the criterion variable when the predictor variable is 0.
Therefore, βˆ0 is the value of the criterion variable for the reference group.
What is the interpretation of the slope βˆ1 = 4.2?
the difference between the mean grades of the female and male groups.
Recall: βˆ1 is the amount of change in the criterion variable for 1 unit change in the predictor variable.
§ Now, 1 unit change in the dummy variable is the change from the male group to the female group.
§ Therefore, βˆ1 is the amount of change in the criterion when you change from the reference group to the non-reference
group.
What does lm do in respect to dummy variables?
The lm function will automatically coerce character vectors to factors and do the dummy coding in the background.
§ It will assign the reference group based on alphabetical order.
§ e.g., “female” is before “male” alphabetically so “female” will be automatically assigned as the reference group.
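A minimal R sketch (hypothetical grades, values made up) of the automatic dummy coding and how to change the reference group:
grade <- c(82, 90, 78, 95, 88, 85)
sex   <- c("male", "female", "male", "female", "female", "male")
coef(lm(grade ~ sex))                       # reference group = "female" (alphabetical)
sex2 <- relevel(factor(sex), ref = "male")  # make "male" the reference group
coef(lm(grade ~ sex2))                      # intercept = mean grade of the male group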
A simple regression with a dummy variable is equivalent to
an independent-samples t-test, which is equivalent to running an ANOVA with two groups.
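A minimal R sketch (simulated data) of the three equivalent two-group analyses:
set.seed(10)
y <- c(rnorm(20, mean = 85), rnorm(20, mean = 89))
g <- factor(rep(c("male", "female"), each = 20))
summary(lm(y ~ g))               # t-test on the dummy coefficient
t.test(y ~ g, var.equal = TRUE)  # same t (up to sign) and p-value
anova(lm(y ~ g))                 # F = t^2, same p-value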
F-test H0 (for a categorical variable with three groups, i.e., two dummy variables)
H0 : µ1 = µ2 = µ3
equivalently, H0 : β1 = β2 = 0.
H1 : at least one pair of means is not equal.
§ Same as the F-test in ANOVA.