Chapter 6 Flashcards
Multiple Linear Regression
What is Multiple Linear Regression?
A regression model with more than one input variable
Used to fit a linear relationship between a quantitative dependent variable Y and a set of predictors X
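A minimal sketch of fitting such a model, assuming scikit-learn and a hypothetical DataFrame with two inputs and one quantitative target (all names made up):

    import pandas as pd
    from sklearn.linear_model import LinearRegression

    # Hypothetical data: two numeric inputs, one quantitative target
    df = pd.DataFrame({"x1": [1, 2, 3, 4, 5],
                       "x2": [2, 1, 4, 3, 5],
                       "y":  [3, 4, 8, 9, 13]})

    # More than one input column -> multiple linear regression
    model = LinearRegression().fit(df[["x1", "x2"]], df["y"])
    print(model.intercept_, model.coef_)  # b0 and (b1, b2)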
Linear Regression
The most commonly used predictive modeling technique; fits a “best fit” line through the data
Input, Target variables
Must be numeric; categorical predictors must first be converted into dummy variables
Best fit
Minimizes the sum of squared vertical distances from the data points to the line.
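A sketch of the criterion itself, assuming NumPy; np.linalg.lstsq finds the coefficients that minimize exactly this sum of squared vertical distances:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([2.1, 3.9, 6.2, 7.8])

    # Design matrix with an intercept column; lstsq minimizes
    # sum((y - X @ b) ** 2), the squared vertical distances
    X = np.column_stack([np.ones_like(x), x])
    b, resid_ss, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(b)         # [intercept, slope] of the best-fit line
    print(resid_ss)  # the minimized sum of squares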
Goodness of Fit
Assessed using the differences between predicted and actual values, called residuals
R squared
A value from 0 to 1 that measures the proportion of variance in the target explained by the inputs.
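Computed by hand it is 1 minus (residual sum of squares / total sum of squares); a sketch assuming NumPy arrays of actual and predicted values:

    import numpy as np

    def r_squared(y, y_pred):
        sse = np.sum((y - y_pred) ** 2)    # unexplained variation (residuals)
        sst = np.sum((y - y.mean()) ** 2)  # total variation around the mean
        return 1 - sse / sst               # proportion of variance explained

    y      = np.array([3.0, 4.0, 8.0, 9.0, 13.0])
    y_pred = np.array([3.2, 4.5, 7.6, 9.4, 12.3])
    print(r_squared(y, y_pred))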
Y
Dependent Variable, aka outcome or response variable
X
Predictors, aka independent or input variables, regressors, covariates
B
Coefficients
E
The noise or unexplained part
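Putting the four notation cards together, the model has the standard textbook form (not written out in the cards above):

    Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon

where the betas are the coefficients B and epsilon is the noise term E.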
The data are used to estimate
the coefficients and the variability of the noise
Objectives of fitting a model related to a quantitative outcome
Understanding the relationship between factors (focus of classical stats)
Predicting the outcome of new cases (focus of Data Mining)
Explanatory vs Predictive Modeling
The choice of model is closely tied to which of these is the goal
Both use a dataset to fit a model (i.e. estimate coefficients)
However, there are several differences between the two:
Explanatory fits data closely - Predictive predicts new cases accurately
Explanatory uses entire data set - Predictive splits into partitions
Performance measures:
Explanatory: How well the model fits the data
Predictive: Predictive accuracy
Explanatory Modeling
Goal: Explain relationship between predictors (explanatory variables) and target
Familiar use of regression in data analysis
Model Goal: Fit the data well and understand the contribution of explanatory variables to the model
“goodness-of-fit”: R2, residual analysis, p-values
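A sketch of the explanatory workflow, assuming statsmodels and made-up column names; the model is fit to the entire data set, and the summary reports R2 and per-coefficient p-values, with the residuals available for residual analysis:

    import pandas as pd
    import statsmodels.api as sm

    # Hypothetical data; explanatory modeling uses all of it
    df = pd.DataFrame({"x1": [1, 2, 3, 4, 5, 6],
                       "x2": [2, 1, 4, 3, 6, 5],
                       "y":  [3, 4, 8, 9, 13, 14]})

    X = sm.add_constant(df[["x1", "x2"]])  # add the intercept term
    fit = sm.OLS(df["y"], X).fit()
    print(fit.summary())                   # R-squared, coefficient p-values
    residuals = fit.resid                  # inputs to residual analysis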
Predictive Modeling
Goal: predict target values in other data where we have predictor values, but not target values
Classic data mining context
Model Goal: Optimize predictive accuracy
Train model on training data
Assess performance on validation (hold-out) data
Explaining role of predictors is not primary purpose (but useful)
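A sketch of the predictive workflow under the same kind of assumptions (scikit-learn, synthetic data): fit on the training partition, judge only by accuracy on the hold-out partition:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split

    # Synthetic data standing in for a real data set
    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 3))
    y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=100)

    # Partition, train on one part, assess on the hold-out part
    X_train, X_valid, y_train, y_valid = train_test_split(
        X, y, test_size=0.4, random_state=1)
    model = LinearRegression().fit(X_train, y_train)
    rmse = mean_squared_error(y_valid, model.predict(X_valid)) ** 0.5
    print(rmse)  # predictive accuracy on cases the model never saw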
*You cannot include all the binary dummies; in regression, including all m dummies for an m-category variable causes perfect multicollinearity (a multicollinearity error), so include only m − 1 of them.
Other data mining methods can use all the dummies.
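With pandas, drop_first=True implements the m − 1 rule (hypothetical "region" column):

    import pandas as pd

    df = pd.DataFrame({"region": ["east", "west", "north", "east"]})

    # 3 categories -> 2 dummies; dropping one avoids perfect
    # multicollinearity with the intercept (the "dummy variable trap")
    dummies = pd.get_dummies(df["region"], prefix="region", drop_first=True)
    print(dummies)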
Selecting Subsets of Predictors
Goal: Find parsimonious model (the simplest model that performs sufficiently well)
More robust
Higher predictive accuracy
We will assess predictive accuracy on validation data.
Partial Search Algorithms
Forward
Backward
Stepwise
Exhaustive Search = Best Subset
All possible subsets of predictors assessed (single, pairs, triplets, etc.)
Computationally intensive, not feasible for big data
Judge by “adjusted R2”
Adjusted R2 for the models with 1 predictor, 2 predictors, 3 predictors, etc. (exhaustive search method)
Adjusted R2 rises until about 7-8 predictors, then stabilizes, so choose the 7-predictor model according to the adjusted-R2 criterion
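The penalty behind this criterion is visible in the standard formula, where n is the number of observations and p the number of predictors; a small sketch (all numbers made up):

    def adjusted_r2(r2, n, p):
        # Shrinks R-squared for model size, so adding a near-useless
        # predictor can lower the score instead of raising it
        return 1 - (1 - r2) * (n - 1) / (n - p - 1)

    print(adjusted_r2(0.750, n=200, p=7))   # ~0.741
    print(adjusted_r2(0.751, n=200, p=12))  # ~0.735: barely-better fit loses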
Forward Selection
Start with no predictors
Add them one by one (add the one with largest contribution)
Stop when the addition is not statistically significant
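One way to run it, assuming scikit-learn's SequentialFeatureSelector (which adds predictors by cross-validated score rather than by the significance test in the stopping rule above):

    import numpy as np
    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.linear_model import LinearRegression

    # Synthetic data: 6 candidate predictors, only the first two matter
    rng = np.random.default_rng(0)
    X = rng.normal(size=(120, 6))
    y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=120)

    # direction="forward": start with none, add the best one each step
    sfs = SequentialFeatureSelector(LinearRegression(), n_features_to_select=2,
                                    direction="forward").fit(X, y)
    print(sfs.get_support())  # boolean mask of the selected predictors

The same class with direction="backward" gives backward elimination.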
Backward Elimination
Start with all predictors
Successively eliminate least useful predictors one by one
Stop when all remaining predictors have statistically significant contribution
p value
The probability of seeing a result at least as extreme by chance alone if the predictor truly had no effect; if it is less than 0.05, the predictor is conventionally considered statistically significant
Stepwise
Like Forward Selection
Except at each step, also consider dropping non-significant predictors
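A hand-rolled sketch of the stepwise loop using statsmodels p-values (function name, thresholds, and stopping rule are illustrative choices, with the drop threshold set above the entry threshold to avoid cycling):

    import pandas as pd
    import statsmodels.api as sm

    def stepwise(df, target, candidates, enter=0.05, drop=0.10):
        selected = []
        while True:
            changed = False
            # Forward step: add the most significant remaining candidate
            best_p, best_c = 1.0, None
            for c in (c for c in candidates if c not in selected):
                X = sm.add_constant(df[selected + [c]])
                p = sm.OLS(df[target], X).fit().pvalues[c]
                if p < best_p:
                    best_p, best_c = p, c
            if best_c is not None and best_p < enter:
                selected.append(best_c)
                changed = True
            # Drop step: remove a predictor that has become non-significant
            if selected:
                X = sm.add_constant(df[selected])
                pvals = sm.OLS(df[target], X).fit().pvalues.drop("const")
                worst = pvals.idxmax()
                if pvals[worst] > drop:
                    selected.remove(worst)
                    changed = True
            if not changed:
                return selected

    # e.g. stepwise(df, "y", ["x1", "x2", "x3", "x4"])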