Chapter 6 Flashcards
Multiple Linear Regression
What is Multiple Linear Regression?
When a regression model has more than one input
Used to fit a linear relationship between a quantitative dependent variable Y and a set of predictors X1, X2, ..., Xp
Linear Regression
Most commonly used predictive modelling technique; finds the “best fit line” through the data
Input, Target variables
Must be numeric
Best fit
Minimizes the sum of the squared vertical distances from the data points to the line.
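The "best fit" card can be illustrated with a minimal sketch. It assumes NumPy is available, and the small data set is hypothetical, made up only for illustration. `np.linalg.lstsq` returns the coefficients that minimize the sum of squared residuals.

```python
import numpy as np

# Hypothetical toy data: two numeric predictors and a numeric target.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = np.array([6.1, 5.9, 12.2, 11.8, 16.0])

# Add a column of ones so the model includes an intercept.
A = np.column_stack([np.ones(len(X)), X])

# Least squares: pick the coefficients that minimize the sum of squared residuals.
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Residuals are the vertical distances between actual and fitted values.
residuals = y - A @ coef
sse = float(np.sum(residuals ** 2))

print("coefficients (intercept first):", coef)
print("sum of squared residuals:", sse)
```

Any other choice of coefficients gives a larger sum of squared residuals on the same data, which is what "best fit" means here.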
Goodness of Fit
Assessed from the differences between the predicted and actual values, called residuals
R squared
Value from 0 to 1 that measures the proportion of variance in the target that is explained by the inputs.
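As a rough sketch of how R squared can be computed from actual and predicted values: the function name `r_squared` is hypothetical, and the formula used is the usual 1 - SSE/SST. The 0 to 1 range applies to a least-squares fit with an intercept evaluated on the data it was fitted to; on other data the same formula can go below 0.

```python
import numpy as np

def r_squared(y_actual, y_predicted):
    """Proportion of variance in the target explained by the predictions."""
    y_actual = np.asarray(y_actual, dtype=float)
    y_predicted = np.asarray(y_predicted, dtype=float)
    # Unexplained variation: sum of squared residuals.
    ss_res = np.sum((y_actual - y_predicted) ** 2)
    # Total variation of the target around its mean.
    ss_tot = np.sum((y_actual - y_actual.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical values, for illustration only.
print(r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.9]))
```

Predictions that match the target exactly give 1, and always predicting the target mean gives 0.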
Y
Dependent Variable, aka outcome or response variable
X
Predictors, aka independent or input variables, regressors, covariates
B (β)
Coefficients
E (ε)
The noise or unexplained part
What the data are used to estimate
The coefficients and the variability of the noise
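The Y, X, B and E cards fit together in one equation. The sketch below takes the B and E cards to denote β and ε; the symbols n (number of cases) and p (number of predictors) are assumptions introduced here, and the variance estimate shown is the usual unbiased one for least squares, not necessarily the form used in the chapter.

```latex
% Multiple linear regression model
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon

% Fitted values, using coefficients estimated from the data
\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \cdots + \hat{\beta}_p x_{ip}

% Estimated variability of the noise, from the residuals
\hat{\sigma}^2 = \frac{1}{n - p - 1} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```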
Objectives of fitting a model related to a quantitative outcome
Understanding the relationship between factors (focus of classical stats)
Predicting the outcome of new cases (focus of Data Mining)
Explanatory vs Predictive Modeling
The choice of model is closely tied to which of the two is the goal
Both use a dataset to fit a model (i.e. estimate coefficients)
However, there are several differences between the two:
Explanatory fits data closely - Predictive predicts new cases accurately
Explanatory uses entire data set - Predictive splits into partitions
Performance measures:
Explanatory: How well the model fits the data
Predictive: Predictive accuracy
Explanatory Modeling
Goal: Explain relationship between predictors (explanatory variables) and target
Familiar use of regression in data analysis
Model Goal: Fit the data well and understand the contribution of explanatory variables to the model
“goodness-of-fit”: R², residual analysis, p-values
Predictive Modeling
Goal: predict target values in other data where we have predictor values, but not target values
Classic data mining context
Model Goal: Optimize predictive accuracy
Train model on training data
Assess performance on validation (hold-out) data
Explaining role of predictors is not primary purpose (but useful)
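The predictive workflow described above (partition the data, train on one part, assess on the hold-out part) can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic, the 60/40 split and the use of RMSE as the accuracy measure are choices made here, and `add_intercept` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: 100 cases, 2 predictors, known coefficients plus noise.
n = 100
X = rng.normal(size=(n, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Partition the cases: 60% training, 40% validation (the ratio is an assumption).
idx = rng.permutation(n)
n_train = int(0.6 * n)
train_idx, valid_idx = idx[:n_train], idx[n_train:]

def add_intercept(M):
    """Prepend a column of ones so the model has an intercept term."""
    return np.column_stack([np.ones(len(M)), M])

# Train: estimate the coefficients from the training partition only.
coef, *_ = np.linalg.lstsq(add_intercept(X[train_idx]), y[train_idx], rcond=None)

# Assess: measure predictive accuracy on the validation (hold-out) partition.
valid_pred = add_intercept(X[valid_idx]) @ coef
rmse_valid = float(np.sqrt(np.mean((y[valid_idx] - valid_pred) ** 2)))

print("estimated coefficients (intercept first):", coef)
print("validation RMSE:", rmse_valid)
```

The validation cases play no part in estimating the coefficients, so the validation RMSE indicates how the model might perform on new cases rather than how closely it fits the training data.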