Regression Flashcards
What is Regression Modeling?
The construction and evaluation of models that predict numeric values of a target (dependent) variable, sometimes also called the response. It is the complement of classification, which predicts categorical values.
What are predictors?
Predictors (also called independent variables, features, input variables, or attributes) are the variables used to predict the target. They may be either numeric or categorical.
Common Methods of Regression
Multiple Linear Regression (MLR, lm); Regression Tree (rpart); Model Tree (M5P); Support Vector Machine (SVM, ksvm); K-Nearest Neighbors (KNN)
Explanatory vs Predictive Power
Explanatory: explaining which predictors affect the dependent variable most significantly, and how.
Important questions
Is there a relationship between bmi and insurance expenses? Is it linear? How strong is it?
Is it stronger than the relationship between smoker and expenses? Is there a synergy (interaction) between the effects of smoker and bmi on expenses?
Would the discovered relationships generalize across different data sets?
Model elements and metrics to be used: correlation coefficients, beta coefficients, p-values, R-squared, F-statistics
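The synergy question above is commonly examined by adding an interaction term (bmi x smoker) to a linear model. The sketch below only illustrates how such a term changes the effect of bmi for smokers. The function name, the dictionary keys, and every coefficient value are hypothetical and are not estimated from any data set.

```python
def predicted_expenses(bmi, smoker, coef):
    """Linear model with an interaction term:
    intercept + b_bmi*bmi + b_smoker*smoker + b_inter*bmi*smoker
    """
    s = 1 if smoker else 0  # encode the categorical predictor as 0/1
    return (coef["intercept"]
            + coef["bmi"] * bmi
            + coef["smoker"] * s
            + coef["bmi_x_smoker"] * bmi * s)

# Hypothetical coefficients, chosen only for illustration.
coef = {"intercept": 1000.0, "bmi": 50.0, "smoker": 2000.0, "bmi_x_smoker": 300.0}

# Effect of one extra unit of bmi, separately for non-smokers and smokers.
effect_nonsmoker = predicted_expenses(31, False, coef) - predicted_expenses(30, False, coef)
effect_smoker = predicted_expenses(31, True, coef) - predicted_expenses(30, True, coef)
```

With a non-zero interaction coefficient, the bmi effect for smokers equals the bmi coefficient plus the interaction coefficient, so the two effects differ. In a fitted model, the p-value of the interaction coefficient would indicate whether that difference is statistically significant.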
Predictive Power
Predictive: evaluating prediction accuracy and generality.
Important questions
How accurately can a model predict?
How can we improve the model's accuracy?
Which predictors contribute to model accuracy?
Which method is more accurate?
Would the prediction performance generalize across different data sets?
Does the model generate more large prediction errors than other models?
Which error measures should be adopted to evaluate and compare prediction performance?
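Performance metrics to be used: MAE (mean absolute error), MSE (mean squared error), RMSE (root mean squared error), MAPE (mean absolute percentage error), RMSPE (root mean squared percentage error), RAE (relative absolute error), and RRSE (root relative squared error)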
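The error measures listed above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation. The function name is hypothetical, the percentage measures are returned as fractions, and RAE and RRSE are assumed to use the mean of the actual values as the baseline prediction (some tools use the training-set mean instead).

```python
import math

def regression_metrics(actual, predicted):
    """Return common regression error measures as a dict."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n

    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)

    # Percentage errors divide by the actual value, so actual values must be non-zero.
    mape = sum(abs(e / a) for e, a in zip(errors, actual)) / n
    rmspe = math.sqrt(sum((e / a) ** 2 for e, a in zip(errors, actual)) / n)

    # Relative errors compare the model with a baseline that always predicts the mean
    # (assumption: the mean of the actual values is used as that baseline).
    rae = sum(abs(e) for e in errors) / sum(abs(a - mean_actual) for a in actual)
    rrse = math.sqrt(sum(e * e for e in errors)
                     / sum((a - mean_actual) ** 2 for a in actual))

    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape,
            "RMSPE": rmspe, "RAE": rae, "RRSE": rrse}

# Made-up values, used only to exercise the function.
metrics = regression_metrics([100, 200, 300, 400], [110, 190, 330, 380])
```

Because MSE, RMSE, RMSPE, and RRSE square the errors, they penalize large errors more heavily than MAE, MAPE, and RAE. That makes them the more sensitive choice when the question is whether one model produces more large prediction errors than another.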
What is regression?
Regression is concerned with specifying the relationship between a single
numeric dependent variable (the value to be predicted) and one or more numeric
independent variables (the predictors). As the name implies, the dependent variable
depends upon the value of the independent variable or variables. The simplest forms
of regression assume that the relationship between the independent and dependent
variables follows a straight line.
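As a sketch of the straight-line case, the function below fits y = intercept + slope * x to a single predictor. It assumes ordinary least squares as the fitting criterion, which is a common choice but is not stated in the card above. The function name is hypothetical.

```python
def fit_line(x, y):
    """Fit y = intercept + slope * x by ordinary least squares (one predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariation of x and y divided by the variation of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up points that lie exactly on y = 1 + 2x.
intercept, slope = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```

For these points the fit recovers an intercept of 1 and a slope of 2. With real data the points scatter around the line, and the slope estimates the average change in the dependent variable per unit change in the predictor.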
Correlation
The correlation between two variables is a number that indicates how closely their
relationship follows a straight line.
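A minimal sketch of one such number, assuming Pearson's correlation coefficient is the measure meant here (the card does not name it). It ranges from -1 to +1, where values near either extreme indicate a nearly perfect straight-line relationship and values near 0 indicate little or no linear relationship. The function name is hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    # Undefined (division by zero) if either variable is constant.
    return sxy / math.sqrt(sxx * syy)

# Made-up values, used only to exercise the function.
r_exact_line = pearson_r([1, 2, 3, 4], [3, 5, 7, 9])   # points exactly on a rising line
r_noisy = pearson_r([1, 2, 3, 4], [1, 3, 2, 4])        # rising, but not exactly linear
```

The first pair lies exactly on a rising line, so its coefficient is 1 (up to floating-point rounding). The second pair rises overall but deviates from a line, so its coefficient is smaller (0.8).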