Week 5 Flashcards
What is a model that is even simpler than the univariate regression model?
It is called the intercept-only model, which is a regression model without any predictors:
yi = β0 + ϵi
Can you guess what β0 is equal to? What is the best prediction for yi
if you don't know anything else?
Answer: β0 = µy
For the intercept-only model, we can only test what hypothesis?
H0 : β0 = 0
H1 : β0 ≠ 0
What is this equivalent to in a one-sample t-test?
H0 : µy = 0
H1 : µy ≠ 0
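A minimal R sketch of this equivalence, using simulated data (all values are made up):
set.seed(1)
y <- rnorm(30, mean = 2, sd = 1)
m0 <- lm(y ~ 1)     # intercept-only model: the estimated β0 is the sample mean
summary(m0)         # t-test on the intercept
t.test(y, mu = 0)   # identical t statistic, df, and p-value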
Univariate Model:
Equation?
What does β1 mean?
What tests?
yi = β0 + β1x1 + ϵi
§ β1: for a one-unit increase in x1, there is a β1-unit increase in Y.
§ The t-test for the regression coefficient, the test of the correlation coefficient, and the F-test for the overall model fit (or R-squared) are all equivalent.
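A minimal R sketch (simulated data) of the equivalent tests:
set.seed(2)
x <- rnorm(50)
y <- 0.5 * x + rnorm(50)
m1 <- lm(y ~ x)
summary(m1)      # t-test for the slope and overall F-test; note F = t^2
cor.test(x, y)   # same t statistic and p-value as the slope test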
Bivariate Model:
Equation?
What does β1 mean?
What tests?
yi = β0 + β1x1 + β2x2 + ϵi
β1: holding x2 constant, for a one-unit increase in x1, there is a β1-unit increase in Y.
The t-test for a partial regression coefficient is different from the F-test for the overall model fit (or R-squared).
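A minimal R sketch (simulated data) showing that each partial t-test is a different test from the overall F-test:
set.seed(3)
x1 <- rnorm(100)
x2 <- 0.6 * x1 + rnorm(100)            # correlated predictors
y  <- 0.4 * x1 + 0.3 * x2 + rnorm(100)
summary(lm(y ~ x1 + x2))               # two partial t-tests plus one overall F-test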
The F-test for the univariate and bivariate regression tests whether…
the variance in the criterion variable can be significantly accounted for by all the predictors together.
H0 : ρ²yŷ = 0
H1 : ρ²yŷ > 0
ρ²yŷ equals what at the population level?
SSregression/SStotal
Another way of looking at the F-test is that it is a ratio comparing the current model and the intercept-only model.
r²yŷ = SSregression(current model) / SSresidual(intercept-only model), since the residual sum of squares of the intercept-only model is exactly SStotal.
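A minimal R sketch (simulated data): comparing the current model against the intercept-only model with anova() reproduces the overall F-test:
set.seed(4)
x1 <- rnorm(80); x2 <- rnorm(80)
y  <- 0.4 * x1 + 0.3 * x2 + rnorm(80)
m0 <- lm(y ~ 1)          # intercept-only model
m2 <- lm(y ~ x1 + x2)    # current model
anova(m0, m2)            # same F and p-value as summary(m2) reports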
In the intercept-only model, can you do an F test?
No. The intercept-only model has no predictors, so there is no explained variance to test; it is the baseline model that the F-test compares other models against.
Why is unadjusted R^2 not good?
Because the sample R-squared r²yŷ is a biased estimator of the population R-squared ρ²yŷ.
§ Over repeated studies, the sample R-squared r²yŷ tends to be higher than ρ²yŷ.
§ The sample R-squared r²yŷ tends to increase as the number of predictors (denoted by p) increases.
§ As p increases, the model tends to overfit.
§ An overfitted model is very unstable; the estimates vary widely across repeated samples (the fitted line follows the sample points too closely).
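A minimal R sketch (simulated data): unadjusted R-squared keeps climbing as pure-noise predictors are added, even though the population R-squared is 0:
set.seed(5)
n <- 40
y <- rnorm(n)                        # criterion unrelated to any predictor
X <- matrix(rnorm(n * 20), n, 20)    # 20 pure-noise predictors
r2 <- sapply(1:20, function(p) summary(lm(y ~ X[, 1:p, drop = FALSE]))$r.squared)
round(r2, 2)                         # nondecreasing in p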
Bias-Variance Trade Off
Define bias and variance
Explain how they influence prediction error.
For any statistical modelling, there is a bias-variance tradeoff.
Bias: how well the model fits the current data.
§ Less bias means smaller residuals.
§ Observed and predicted values are similar.
§ More variance in the criterion variable can be explained by the predictors.
§ In regression, usually, as you add more predictors, you get less bias.
Variance: how variable your estimates are across repeated samples.
§ Large variance implies large standard errors and more prediction error.
§ In regression, usually, as you add more predictors, you get larger standard errors and more prediction error.
§ Recall multicollinearity.
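A minimal R sketch (simulated data): adding a nearly redundant predictor barely changes the fit but inflates the standard error of the estimate for x1:
set.seed(6)
n  <- 50
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)        # near-duplicate of x1 (multicollinearity)
y  <- 0.5 * x1 + rnorm(n)
coef(summary(lm(y ~ x1)))["x1", "Std. Error"]        # small standard error
coef(summary(lm(y ~ x1 + x2)))["x1", "Std. Error"]   # much larger standard error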
Underfitted Model - bias and variance?
High bias; low variance
Overfitted Model: - bias and variance?
Low bias; high variance.
Both underfitted and overfitted models have _________ prediction error
large
The unadjusted R2 tends to favor ______________ models even
though they are not good.
overfitted
The goal of the adjusted R2 is…
to provide a more balanced evaluation of the fit relative to the number of predictors.
Unadjusted R^2 formula
r²yŷ = SSregression/SStotal = 1 - SSresidual/SStotal
adjusted R^2 formula
adjusted r²yŷ = 1 - (SSresidual/dfresidual)/(SStotal/dftotal)
As the number of predictors, relative to sample size, increases, the R-squared is adjusted how?
downward more.
In short, adjusted R squared adjusts the unadjusted R-squared downward to provide a better evaluation of fit.
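A minimal R sketch (simulated data) computing both formulas from the sums of squares and checking them against summary(lm):
set.seed(7)
x <- rnorm(30); y <- 0.5 * x + rnorm(30)
m <- lm(y ~ x)
ss_res <- sum(resid(m)^2)
ss_tot <- sum((y - mean(y))^2)
n <- length(y); p <- 1                           # p = number of predictors
1 - ss_res / ss_tot                              # unadjusted R-squared
1 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))  # adjusted R-squared
summary(m)$r.squared; summary(m)$adj.r.squared   # same values from lm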
We know that in the population model, the error term is…
What is the notation?
a random variable with a normal distribution
ϵi ~ N(0, σ²)
deterministic view
The deterministic view assumes that the variability of the criterion variable can be fully accounted for by a list of predictors at the population level; therefore, there is no error term in the population.
stochastic view
The stochastic view assumes that the variability of the criterion variable CANNOT be fully accounted for by a list of predictors at the population level; therefore, there should be an error term in the population.
Modern statistics takes which view?
stochastic view.
There are two fundamentally different interpretations of the
regression coefficients.
- Descriptively as an empirical association
- Causally as a structural relation
There are two fundamentally different interpretations of the
regression coefficients.
- Descriptively as an empirical association.
§ Treat predictors as correlates with the criterion variable.
§ Population regression coefficients simply describe the relationship between predictors and the criterion variable at the population level.
§ Error term represents all the correlates that we didn’t measure but correlate with the criterion variable.
§ Relevant when you only focus on prediction.
There are two fundamentally different interpretations of the
regression coefficients.
- Causally as a structural relation.
§ Treat predictors as causes of the criterion variable.
§ Population regression coefficients try to model the underlying causal relationship between the predictors and
the criterion variable at the population level.
§ Error term represents non-systematic causes of the criterion variable.
§ Relevant when you want to study the underlying causal relationship.
In the empirical association view, omitting a relevant predictor for the criterion variable _________________the inference.
does not affect
But the prediction may not be good if you omit an important predictor.
What happens when you omit a predictor in the structural relation view?
If we hold the structural relation view, then we can talk about a possible bias produced by omitting a predictor in the population.
Structural Relation View – Example
Suppose in the population, the true model is
yi = β0 + β1x1 + β2x2 + ϵi
Under the structural relation view, the true model means
x1 and x2 are the only causes of Y. Usually (but not always), cor(x1, x2) ≠ 0.
§ ϵ is the variance in Y that can't be accounted for by any other variables.
§ Thus, cor(x1, ϵ) = 0 and cor(x2, ϵ) = 0 at the population level.
Structural Relation View – Example
Suppose in the population, the true model is
yi = β0 + β1x1 + β2x2 + ϵi
However, suppose we fit a univariate regression model.
§ omitting x2
§ fitting a misspecified model
yi = β0′ + β1′x1 + ϵ′
where, implicitly, the effect of x2 on Y is absorbed by the error:
ϵ′ = β2x2 + ϵ.
If x1 and x2 are correlated, then there is a correlation between
x1 and ϵ′: cor(x1, ϵ′) ≠ 0 at the population level.
However, if you fit the one predictor model with the least squares method at the sample level, the correlation between the predictor and the residual will be forced to be 0.
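A minimal R simulation of this omitted-predictor bias (the coefficient values are made up; the true β1 is 0.5):
set.seed(8)
b1 <- replicate(1000, {
  x1 <- rnorm(200)
  x2 <- 0.6 * x1 + rnorm(200)            # x2 is a cause of y and correlates with x1
  y  <- 0.5 * x1 + 0.7 * x2 + rnorm(200)
  coef(lm(y ~ x1))["x1"]                 # misspecified model omitting x2
})
mean(b1)   # about 0.5 + 0.7 * 0.6 = 0.92, not 0.5: the estimate is biased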
In conclusion, if we hold the structural relation view, then we can talk about …
In conclusion, if we hold the structural relation view, then we can talk about bias produced by omitting a predictor that is
a cause of Y (i.e., fitting a misspecified model) and is correlated with another predictor in the model.
In other words, in the structural relation view, we need to include all relevant causes of Y as predictors to obtain
consistent estimation.
§ possible for experimental studies
§ not possible for observational studies but we try our best.
§ “All models are wrong but some are useful.”
Why do we create dummy variables?
To incorporate categorical variables into the regression model, we need to create dummy variables.
What are dummy variables?
Dummy (code) variables are numeric variables that use 0 or 1 to represent categorical variables.
How many dummy variables are needed?
For a categorical variable that has g groups, you need g - 1 dummy variables to code this categorical variable.
If a categorical variable has 2 groups (e.g., male vs female), then how many dummy variables?
you need 1 dummy variable to code it.
Reference group
The group that has 0 on all dummy variables is called the reference group.
What is the interpretation of the intercept βˆ0 = 85.6?
yˆ = 85.6 + 4.2D
the mean grade of the male group (the reference group)
Recall βˆ0 is the value of the criterion variable when the predictor variable is 0.
Therefore, βˆ0 is the value of the criterion variable for the reference group.
What is the interpretation of the slope βˆ1 = 4.2?
the difference between the mean grades of the female and male groups.
Recall: βˆ1 is the amount of change in the criterion variable for 1 unit change in the predictor variable.
§ Now, 1 unit change in the dummy variable is the change from the male group to the female group.
§ Therefore, βˆ1 is the amount of change in the criterion when you change from the reference group to the non-reference
group.
What does lm do in respect to dummy variables?
The lm function will automatically coerce character vectors to factors and do the dummy coding in the background.
§ It will assign the reference group based on alphabetical order.
§ e.g., “female” is before “male” alphabetically so “female” will be automatically assigned as the reference group.
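A minimal R sketch (hypothetical grades, values made up) of the automatic dummy coding and how to change the reference group:
grade <- c(82, 90, 78, 95, 88, 85)
sex   <- c("male", "female", "male", "female", "female", "male")
coef(lm(grade ~ sex))                       # reference group = "female" (alphabetical)
sex2 <- relevel(factor(sex), ref = "male")  # make "male" the reference group
coef(lm(grade ~ sex2))                      # intercept = mean grade of the male group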
A simple regression with a dummy variable is equivalent to
an independent-samples t-test, which is equivalent to running an ANOVA with two groups.
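A minimal R sketch (simulated data) of the three equivalent two-group analyses:
set.seed(10)
y <- c(rnorm(20, mean = 85), rnorm(20, mean = 89))
g <- factor(rep(c("male", "female"), each = 20))
summary(lm(y ~ g))               # t-test on the dummy coefficient
t.test(y ~ g, var.equal = TRUE)  # same t (up to sign) and p-value
anova(lm(y ~ g))                 # F = t^2, same p-value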
F-test H0 (for a categorical variable with three groups, i.e., two dummy variables)
H0 : µ1 = µ2 = µ3
equivalently, H0 : β1 = β2 = 0.
H1 : at least one pair of means is not equal.
§ Same as the F-test in ANOVA.