module 10 Flashcards

Question 1

Q

Association between 2 numerical variables, controlling for a categorical variable

Answer

A

Use a scatterplot of the two numerical variables, color-coding the points by the levels of the categorical variable

Question 2

Q

Association between 2 numerical variables - controlling for a numerical variable

Answer

A

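Use a scatterplot of the two numerical variables, mapping the third numerical variable to point color (a gradient) or point size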

Question 3

Q

Association between a numerical and categorical variable - controlling for another categorical variable

Answer

A

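Plot side-by-side boxplots or violin plots of the numerical variable for each level of the first categorical variable, grouped (e.g. color-coded) by the levels of the second categorical variable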

Question 4

Q

Simple OLS regression model:

Answer

A

yhat (predicted response variable) = intercept + slope * explanatory variable
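
As a rough illustration of the formula above, the sketch below fits the intercept and slope with the usual closed-form least-squares expressions. The function name `fit_simple_ols` and the data are made up for this example; the card itself does not prescribe any implementation.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line for paired lists x, y."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through (x_mean, y_mean)
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Made-up data lying exactly on y = 1 + 2x
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
intercept, slope = fit_simple_ols(x, y)

# yhat = intercept + slope * explanatory variable
y_hat = [intercept + slope * xi for xi in x]
```

Because the made-up points fall exactly on a line, the fitted values in `y_hat` reproduce `y`; with real data they would differ, and those differences are the residuals.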

Question 5

Q

Reference/baseline level

Answer

A

The level of a categorical explanatory variable that is not assigned an indicator variable (an indicator variable takes the value 0 or 1); the slopes of the other levels are interpreted relative to it
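
A minimal sketch of how the reference level shows up when a categorical variable is turned into indicator variables. The helper `make_indicators`, the variable `room_type`, and its levels are hypothetical and chosen only for illustration.

```python
def make_indicators(values, reference):
    """Build one 0/1 indicator column per level, skipping the reference level."""
    levels = sorted(set(values))
    non_reference = [lvl for lvl in levels if lvl != reference]
    return {lvl: [1 if v == lvl else 0 for v in values] for lvl in non_reference}


# Hypothetical categorical explanatory variable with three levels
room_type = ["private", "shared", "entire", "private"]

# "entire" is chosen as the reference level, so it gets no indicator column
indicators = make_indicators(room_type, reference="entire")
# indicators == {"private": [1, 0, 0, 1], "shared": [0, 1, 0, 0]}
```

An observation at the reference level is the one whose indicator values are all 0 (here, the third row).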

Question 6

Q

interpreting a numerical variable slope in a multiple linear regression model

Answer

A

“All else held equal, if we increase the given explanatory variable by 1, we expect the predicted response variable to increase/decrease by NUMBER, on average.”

Question 7

Q

Formal definition of the indicator variable slope

Answer

A

“All else held equal, we expect the predicted response variable value that corresponds to the given indicator variable level to be NUMBER higher/lower than that of the reference level, on average.”

Question 8

Q

Formal definition of the intercept

Answer

A

“We expect the predicted response variable value that corresponds to the observation in which all numerical explanatory variable and indicator variable values are 0 to be our intercept value, on average.”
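
The intercept and slope interpretations can be checked numerically. The sketch below uses a hypothetical fitted model whose coefficients and variable names (`bedrooms`, `is_downtown`) are invented for illustration, not taken from any dataset.

```python
# Hypothetical fitted model (all coefficients are invented):
# predicted price = 50 + 20 * bedrooms + 15 * is_downtown
INTERCEPT = 50.0
SLOPE_BEDROOMS = 20.0   # slope of a numerical explanatory variable
SLOPE_DOWNTOWN = 15.0   # slope of an indicator (1 = downtown, 0 = reference level)


def predict_price(bedrooms, is_downtown):
    return INTERCEPT + SLOPE_BEDROOMS * bedrooms + SLOPE_DOWNTOWN * is_downtown


# Numerical slope: raise bedrooms by 1 while holding the indicator fixed
numeric_effect = predict_price(3, 1) - predict_price(2, 1)

# Indicator slope: compare the indicator level to the reference level,
# holding bedrooms fixed
indicator_effect = predict_price(2, 1) - predict_price(2, 0)

# Intercept: prediction when every explanatory and indicator value is 0
baseline = predict_price(0, 0)
```

Here `numeric_effect` equals the bedrooms slope, `indicator_effect` equals the indicator slope, and `baseline` equals the intercept, which mirrors the three interpretation sentences.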

Question 9

Q

When to use interaction terms

Answer

A

If you observe different slopes between a given numerical explanatory variable and the response variable for different levels of a categorical explanatory variable
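
A small sketch of why an interaction term allows different slopes per level. The model form (one numerical variable `x`, one indicator `d`, and their product) and the coefficients are assumptions made up for this example.

```python
# Hypothetical model with an interaction term (coefficients are invented):
# yhat = B0 + B1 * x + B2 * d + B3 * (x * d), where d is a 0/1 indicator
B0, B1, B2, B3 = 10.0, 2.0, 5.0, 3.0


def predict_with_interaction(x, d):
    return B0 + B1 * x + B2 * d + B3 * x * d


def slope_for_level(d):
    """Change in the prediction when x rises by 1, within one level of d."""
    return predict_with_interaction(1, d) - predict_with_interaction(0, d)


slope_reference = slope_for_level(0)  # B1
slope_other = slope_for_level(1)      # B1 + B3
```

Without the `B3` term, both levels would share the same slope `B1` and differ only by a vertical shift of `B2`.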

Question 10

Q

residual

Answer

A

actual response value - predicted response value (y - yhat)
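
For concreteness, a tiny sketch with made-up actual and predicted values:

```python
# Made-up actual and predicted response values for three observations
actual = [10.0, 12.0, 15.0]
predicted = [11.0, 12.0, 13.0]

# residual = actual - predicted, computed observation by observation
residuals = [y - y_hat for y, y_hat in zip(actual, predicted)]
# residuals == [-1.0, 0.0, 2.0]
```

A negative residual means the model overpredicted that observation; a positive residual means it underpredicted.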

Question 11

Q

training dataset

Answer

A

The portion of the data used to train (fit) the machine learning model; usually about 80% of the observations

Question 12

Q

Test dataset

Answer

A

The remaining portion of the data, used to test the machine learning model that has been fit with the training dataset; for example, we may calculate the RMSE of the model's predictions on the test dataset
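
One way to sketch the train/test idea with only the standard library. The helper name `split_train_test`, the 80% default, and the fixed seed are assumptions for illustration; libraries such as scikit-learn provide their own split utilities.

```python
import random


def split_train_test(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of rows, then cut it into (train, test) lists."""
    shuffled = rows[:]                      # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)   # seeded so the split is repeatable
    cutoff = int(round(len(shuffled) * train_fraction))
    return shuffled[:cutoff], shuffled[cutoff:]


# Ten stand-in observations (here just the integers 0..9)
rows = list(range(10))
train, test = split_train_test(rows)
# len(train) == 8 and len(test) == 2; every row lands in exactly one set
```

The model would be fit on `train` only, and then evaluated on `test`, which it has not seen.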

Question 13

Q

RMSE

Answer

A

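Root mean square error: roughly the average size of the model's prediction error across the observations in the dataset (the square root of the mean squared residual). If RMSE = 0, we would have no model error for any of our observations; the closer the RMSE is to 0, the better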
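
A minimal RMSE sketch with made-up numbers. It divides by the number of observations n; some courses or libraries use a different denominator, so treat that choice as an assumption.

```python
import math


def rmse(actual, predicted):
    """Square root of the mean squared residual (denominator n)."""
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Perfect predictions: no model error for any observation
perfect = rmse([3.0, 5.0, 7.0], [3.0, 5.0, 7.0])

# Every prediction is off by exactly 1 (in either direction)
off_by_one = rmse([1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0])
```

Here `perfect` is 0 and `off_by_one` is 1, matching the idea that RMSE reflects the typical size of the error regardless of its sign.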

Question 14

Q

SSE

Answer

A

Sum of squared errors (SSE): the sum of the squared residuals, which is the quantity the OLS fit minimizes; the amount of response variable variability in the dataset that is not explained by the model

Question 15

Q

SST

Answer

A

Sum of squares total (SST): the total amount of response variable variability in the dataset (sum of squared deviations of the response values from their mean)

Question 16

Q

SSR

Answer

A

Sum of squares regression (SSR) = SST - SSE; the amount of response variable variability in the dataset that is explained by the model

Question 17

Q

R^2

Answer

A

The percent of response variable variability that is explained by the model; R^2 = SSR/SST = 1 - SSE/SST; ideally we would like 100%
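
The SSE, SST, SSR, and R^2 definitions fit together as in the sketch below. The data are made up; the predictions are the group means of the first two and last two observations, which is what an OLS model with a single indicator variable would produce.

```python
# Made-up responses and fitted values (group means 3 and 7)
actual = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 3.0, 7.0, 7.0]

y_mean = sum(actual) / len(actual)

# SSE: variability NOT explained by the model (sum of squared residuals)
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))

# SST: total variability of the response around its mean
sst = sum((y - y_mean) ** 2 for y in actual)

# SSR: variability explained by the model
ssr = sst - sse

# R^2: share of the total variability that the model explains
r_squared = ssr / sst
```

With these numbers SSE = 4, SST = 20, SSR = 16, and R^2 = 0.8, so the model explains 80% of the response variability.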

Question 18

Q

Linear regression assumptions

Answer

A

LINE + no multicollinearity:
required:
Linearity: relationship between the Xs and the Y variable should be linear in form
response variable is quantitative

interpretable results:
Multicollinearity: no strong multicollinearity between the X variables

best model assumptions:
Independence: true errors are independent
Normality: true errors are normally distributed
Equal variance: variance of Y at each combination of X is equal
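
As one possible first check for multicollinearity, the sketch below computes the Pearson correlation between pairs of explanatory variables. The helper name `pearson_r` and the data are made up, and a pairwise correlation is only a rough screen; how strong is "too strong" is a judgment call not settled by this card.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    a_mean = sum(a) / n
    b_mean = sum(b) / n
    cross = sum((ai - a_mean) * (bi - b_mean) for ai, bi in zip(a, b))
    a_ss = sum((ai - a_mean) ** 2 for ai in a)
    b_ss = sum((bi - b_mean) ** 2 for bi in b)
    return cross / math.sqrt(a_ss * b_ss)


# Made-up explanatory variables
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 4.0, 6.0, 8.0]     # exactly 2 * x1: perfectly collinear with x1
x3 = [1.0, -1.0, -1.0, 1.0]   # no linear relationship with x1

r_collinear = pearson_r(x1, x2)
r_unrelated = pearson_r(x1, x3)
```

A correlation near 1 or -1 between two X variables, as with `x1` and `x2`, signals that their slopes will be hard to interpret separately.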

(18 cards)