Chapter 5 Flashcards
Backward Elimination
Backward elimination is a quantitative approach to identifying the independent variables to include in a model. It starts with all the independent variables in the model, then deletes the least significant variable one at a time. The process stops when all variables remaining in the model are significant.
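A minimal sketch of backward elimination using p-values from statsmodels; the DataFrame X, the target y, and the 0.05 significance threshold are assumptions for illustration, not part of the card:

```python
import statsmodels.api as sm

def backward_elimination(X, y, threshold=0.05):
    """Repeatedly drop the least significant predictor until all are significant."""
    features = list(X.columns)
    while features:
        model = sm.OLS(y, sm.add_constant(X[features])).fit()
        pvalues = model.pvalues.drop("const")  # ignore the intercept's p-value
        worst = pvalues.idxmax()               # least significant remaining variable
        if pvalues[worst] < threshold:
            break                              # every remaining variable is significant
        features.remove(worst)                 # delete it and refit
    return features
```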
Dependent Variable
The variable being predicted is referred to as the dependent (target) variable (Y).
Dummy Coding
Dummy coding involves creating dichotomous (0/1) indicator variables from a categorical variable.
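For instance, a categorical column can be dummy coded with pandas; a minimal sketch (the "region" column and its values are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"region": ["north", "south", "south", "north"]})
# drop_first=True keeps k-1 dummies for k categories, avoiding the dummy-variable trap
dummies = pd.get_dummies(df["region"], prefix="region", drop_first=True)
print(dummies)  # a single 0/1 column: region_south
```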
Feature Selection
Feature selection refers to identifying the optimal subset of features (independent variables) to explain a target variable. Feature selection can be done quantitatively or qualitatively.
Forward Selection
Forward selection is a quantitative approach to identifying the independent variables to include in a model. It starts with no predictors: a separate regression model is created for each candidate variable, the most significant one is added, and the process repeats, adding variables one at a time as long as they improve the model's prediction.
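A minimal sketch mirroring the backward-elimination example above, again assuming a pandas DataFrame X, a target y, and a 0.05 threshold:

```python
import statsmodels.api as sm

def forward_selection(X, y, threshold=0.05):
    """Add the most significant remaining predictor until none qualifies."""
    selected, remaining = [], list(X.columns)
    while remaining:
        # Fit one candidate model per remaining predictor
        pvals = {col: sm.OLS(y, sm.add_constant(X[selected + [col]])).fit().pvalues[col]
                 for col in remaining}
        best = min(pvals, key=pvals.get)
        if pvals[best] >= threshold:
            break                  # no remaining variable is significant
        selected.append(best)
        remaining.remove(best)
    return selected
```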
Independent Variable
The variables used to make the prediction are called independent variables (X) (also referred to as predictors or features).
Linear Regression
Linear regression is a type of modeling that represents the relationship between the independent and dependent variables as a straight line that best fits the data.
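A minimal sketch of fitting a best-fit line with scikit-learn; the toy data points are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: y is roughly 2x + 1 with a little noise
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([3.1, 4.9, 7.2, 9.0, 11.1])

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)  # best-fit line: intercept and slope
print(model.predict([[6.0]]))         # predict y for a new x
```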
Mean Absolute Error
Mean Absolute Error (MAE) measures the average absolute difference between the model's predicted values and the actual values.
Mean Absolute Percentage Error
Mean Absolute Percentage Error (MAPE) measures how far the predictions are, on average, from the actual target values, expressed as a percentage of the actual values.
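Both error metrics are easy to compute directly; a small sketch with invented actual and predicted values:

```python
import numpy as np

actual = np.array([100.0, 150.0, 200.0])
predicted = np.array([110.0, 140.0, 190.0])

mae = np.mean(np.abs(actual - predicted))                    # MAE = mean(|y - y_hat|)
mape = np.mean(np.abs((actual - predicted) / actual)) * 100  # MAPE, in percent
print(mae)   # 10.0
print(mape)  # ~7.22
```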
Multicollinearity
Multicollinearity is a situation where the predictor variables are highly correlated with each other.
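A common diagnostic is the variance inflation factor (VIF); a minimal sketch using statsmodels, where the DataFrame X is hypothetical and the rule of thumb that a VIF above about 10 signals trouble is an assumption, not part of the card:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vif_table(X):
    """One VIF per predictor; values above ~10 suggest multicollinearity."""
    X_const = sm.add_constant(X)  # include an intercept in the design matrix
    return pd.Series(
        [variance_inflation_factor(X_const.values, i)
         for i in range(1, X_const.shape[1])],  # skip the constant at index 0
        index=X.columns,
    )
```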
Multiple Regression
Multiple regression is used to determine whether two or more independent variables are good predictors of a single dependent variable.
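A minimal sketch with two predictors, using statsmodels on invented data:

```python
import numpy as np
import statsmodels.api as sm

# Toy data: y depends on two predictors plus noise (coefficients are invented)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=50)

model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.summary())  # coefficients, p-values, and R^2 for both predictors
```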
Numerical Variable
A numerical variable is a variable whose values are quantities measured on a numeric scale (e.g., age or income); it may be continuous or discrete.
Ordinary Least Squares
The ordinary least squares (OLS) regression method, commonly referred to as linear regression, minimizes the sum of squared errors.
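A minimal sketch of the OLS solution, i.e., the coefficients that minimize the sum of squared errors; the toy data are invented, and numpy's least-squares solver stands in for solving the normal equations by hand:

```python
import numpy as np

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # first column = intercept term
y = np.array([2.0, 4.1, 5.9])

# beta minimizes the sum of squared errors ||y - X @ beta||^2
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # [intercept, slope]
```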
Overfitting
Overfitting occurs when an overly complex model fits the data being used so closely that its results are not generalizable: future relationships cannot be inferred from it, and its results will be inconsistent when applied to other data.
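A small illustration of the idea, assuming an invented linear true relationship with noise: a degree-9 polynomial fits the 10 training points almost exactly but predicts fresh data worse than a straight line:

```python
import numpy as np

rng = np.random.default_rng(1)
x_train = np.linspace(0, 1, 10)
y_train = 2 * x_train + rng.normal(scale=0.2, size=10)
x_test = np.linspace(0, 1, 100)
y_test = 2 * x_test + rng.normal(scale=0.2, size=100)

for degree in (1, 9):
    coefs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    # degree 9: near-zero train error but a larger test error than the line
    print(degree, train_err, test_err)
```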
R^2
R^2 measures the proportion of the variance in the dependent variable that is explained by the independent variable(s). The R^2 value ranges between 0 and 1, and the closer the value is to 1, the better the prediction by the regression model; when the value is near 0, the regression model is not a good predictor of the dependent variable.
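R^2 can be computed directly from its definition, R^2 = 1 - SS_res / SS_tot; a small sketch with invented values:

```python
import numpy as np

actual = np.array([3.0, 5.0, 7.0, 9.0])
predicted = np.array([2.8, 5.1, 7.3, 8.9])

ss_res = np.sum((actual - predicted) ** 2)      # residual sum of squares
ss_tot = np.sum((actual - actual.mean()) ** 2)  # total sum of squares
r2 = 1 - ss_res / ss_tot                        # R^2 = 1 - SS_res / SS_tot
print(r2)  # ~0.99, close to 1 -> a good fit
```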