Chapter 5 Flashcards

1
Q

Backward Elimination

A

Backward elimination is a quantitative approach to identifying the independent variables to include in a model. It starts with all the independent variables in the model and removes the least significant variable one at a time. The process stops when all variables remaining in the model are significant.

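For illustration, a minimal sketch of p-value-based backward elimination, assuming statsmodels is available and that X (a pandas DataFrame of predictors) and y (the target) are already defined; the 0.05 threshold is just one common choice.

import statsmodels.api as sm

def backward_elimination(X, y, threshold=0.05):
    # Start with every predictor, then drop the least significant one at a time.
    features = list(X.columns)
    while features:
        model = sm.OLS(y, sm.add_constant(X[features])).fit()
        pvalues = model.pvalues.drop("const")   # one p-value per predictor
        worst = pvalues.idxmax()                # least significant predictor
        if pvalues[worst] <= threshold:         # stop when all are significant
            break
        features.remove(worst)
    return features
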
2
Q

Dependent Variable

A

The variable being predicted is referred to as the dependent (target) variable (Y).

3
Q

Dummy Coding

A

Dummy coding involves creating dichotomous (0/1) indicator variables from a categorical variable.

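A small pandas illustration, using a made-up "region" column; drop_first=True leaves one level out as the reference category.

import pandas as pd

df = pd.DataFrame({"region": ["North", "South", "West", "South"]})
# Each remaining category level becomes its own 0/1 indicator column.
dummies = pd.get_dummies(df["region"], prefix="region", drop_first=True)
print(dummies)
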
4
Q

Feature Selection

A

Feature selection refers to identifying the optimal subset of features (independent variables) to explain a target variable. Feature selection can be done quantitatively or qualitatively.

5
Q

Forward Selection

A

Forward selection is a quantitative approach to identifying the independent variables to include in a model. Starting from an empty model, a separate regression model is created for each candidate predictor, and variables are added one at a time, keeping those that improve the model's prediction.

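A rough sketch of forward selection, assuming scikit-learn is installed and that X (a pandas DataFrame of predictors) and y (the target) are defined; scoring on training R^2 with a fixed improvement cutoff is a simplification of what real implementations do.

from sklearn.linear_model import LinearRegression

def forward_selection(X, y, min_improvement=0.01):
    selected, best_r2 = [], 0.0
    remaining = list(X.columns)
    while remaining:
        # Fit a separate model for each candidate added to the current set.
        scores = {}
        for f in remaining:
            cols = selected + [f]
            scores[f] = LinearRegression().fit(X[cols], y).score(X[cols], y)
        best = max(scores, key=scores.get)
        if scores[best] - best_r2 < min_improvement:   # no meaningful improvement
            break
        selected.append(best)
        remaining.remove(best)
        best_r2 = scores[best]
    return selected
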
6
Q

Independent Variable

A

The variables used to make the prediction are called independent variables (X) (also referred to as predictors or features).

7
Q

Linear Regression

A

Linear regression is a type of modeling that captures the relationship between the independent and dependent variables. It is represented by the straight line that best fits the data.

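A tiny scikit-learn example with made-up numbers, fitting the best-fit straight line for a single predictor.

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])    # single independent variable
y = np.array([2.1, 4.3, 6.2, 8.0, 9.9])    # dependent variable

line = LinearRegression().fit(X, y)
print(line.intercept_, line.coef_[0])      # intercept and slope of the fitted line
print(line.predict([[6]]))                 # prediction for a new X value
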
8
Q

Mean Absolute Error

A

Mean Absolute Error (MAE) measures the average absolute difference between the predicted and actual values of the model.

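In code, with made-up actual and predicted values:

import numpy as np

y_true = np.array([10.0, 12.0, 15.0])
y_pred = np.array([11.0, 11.5, 13.0])

mae = np.mean(np.abs(y_true - y_pred))   # average absolute error
print(mae)                               # about 1.17
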
9
Q

Mean Absolute Percentage Error

A

Mean Absolute Percentage Error (MAPE) is the average absolute difference between the prediction and the actual target, expressed as a percentage of the actual value.

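The same made-up values as in the MAE example, expressed as a percentage of the actual target (this assumes no actual value is zero):

import numpy as np

y_true = np.array([10.0, 12.0, 15.0])
y_pred = np.array([11.0, 11.5, 13.0])

mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100
print(mape)                              # about 9.2 (percent)
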
10
Q

Multicollinearity

A

Multicollinearity is a situation where the predictor variables are highly correlated with each other.

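One common way to check for it is the variance inflation factor (VIF); a sketch using statsmodels, assuming X is a pandas DataFrame of predictors (VIF values above roughly 5-10 are often treated as a warning sign, though the cutoff is a judgment call).

import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vif_table(X):
    # A higher VIF means a predictor is more strongly explained by the other predictors.
    Xc = sm.add_constant(X)
    return pd.Series(
        [variance_inflation_factor(Xc.values, i) for i in range(1, Xc.shape[1])],
        index=X.columns,
    )
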
11
Q

Multiple Regression

A

Multiple regression is used to determine whether two or more independent variables are good predictors of a single dependent variable.

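The fitting call is the same as for simple regression, just with more than one predictor column; the column names and numbers below are made up.

import pandas as pd
from sklearn.linear_model import LinearRegression

X = pd.DataFrame({"sqft": [1100, 1500, 2000, 2400],
                  "bedrooms": [2, 3, 3, 4]})
y = [200000, 260000, 330000, 390000]

model = LinearRegression().fit(X, y)
print(model.coef_)       # one coefficient per independent variable
print(model.intercept_)
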
12
Q

Numerical Variable

A

A numerical variable is a variable measured on a quantitative (numeric) scale, in contrast to a categorical variable, which takes its values from a set of categories.

13
Q

Ordinary Least Squares

A

The ordinary least squares (OLS) regression method, commonly referred to as linear regression, minimizes the sum of squared errors.

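The "least squares" part is literal: the fitted slope and intercept are the ones that make the sum of squared errors as small as possible. A quick check with made-up numbers:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 6.2, 7.9])

# polyfit with degree 1 finds the slope and intercept minimizing sum((y - (b0 + b1*x))**2)
slope, intercept = np.polyfit(x, y, deg=1)
sse = np.sum((y - (intercept + slope * x)) ** 2)   # the minimized sum of squared errors
print(slope, intercept, sse)
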
14
Q

Overfitting

A

Overfitting occurs when an overly complex model fits the data used to build it so closely that the results are not generalizable: future relationships cannot be inferred, and results will be inconsistent when the model is applied to other data.

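One way to see it is to compare performance on training data versus held-out data; a sketch with a deliberately over-complex polynomial model fit to made-up data whose true relationship is linear.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(40, 1))
y = 2 * X.ravel() + rng.normal(scale=3, size=40)   # underlying relationship is linear

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A degree-15 polynomial is far more complex than the data warrants.
model = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
model.fit(X_train, y_train)
print(model.score(X_train, y_train))   # high R^2 on the training data
print(model.score(X_test, y_test))     # much lower R^2 on new data
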
15
Q

R^2

A

R^2 measures the amount of variance in the dependent variable that is predicted by the independent variable(s). The R^2 value ranges between 0 and 1, and the closer the value is to 1, the better the prediction by the regression model. When the value is near 0, the regression model is not a good predictor of the dependent variable.

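As a formula, R^2 = 1 - SSE/SST (unexplained variance over total variance); a small check with made-up numbers:

import numpy as np

y_true = np.array([10.0, 12.0, 15.0, 18.0])
y_pred = np.array([11.0, 11.5, 14.0, 18.5])

sse = np.sum((y_true - y_pred) ** 2)           # variance left unexplained
sst = np.sum((y_true - y_true.mean()) ** 2)    # total variance in the target
r2 = 1 - sse / sst
print(r2)                                      # about 0.93
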
16
Q

Regression Modeling

A

Regression modeling captures the strength of the relationship between a single numerical dependent (target) variable and one or more (numerical or categorical) predictor variables.

17
Q

Residuals

A

Residuals represent the difference between the observed and predicted value of the dependent variable.

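In code, a residual is simply observed minus predicted, one per observation:

import numpy as np

y_observed = np.array([10.0, 12.0, 15.0])
y_predicted = np.array([11.0, 11.5, 13.0])

residuals = y_observed - y_predicted
print(residuals)    # [-1.   0.5  2. ]
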
18
Q

Root Mean Squared Error

A

Root Mean Squared Error (RMSE) is the square root of the average squared residual; it indicates how far the residuals typically are from zero, in the units of the target variable.

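With the same made-up values as in the earlier error-metric examples:

import numpy as np

y_true = np.array([10.0, 12.0, 15.0])
y_pred = np.array([11.0, 11.5, 13.0])

rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))   # square root of the mean squared residual
print(rmse)                                       # about 1.32
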
19
Q

Simple Linear Regression

A

Simple linear regression is used when the focus is limited to a single, numeric dependent variable and a single independent variable.

20
Q

Stepwise Selection

A

Stepwise selection is a quantitative approach to identifying the independent variables to include in a model. Like forward selection, it adds a variable at each step, but it also removes variables that no longer meet the threshold to stay in the model. Stepwise selection stops when all remaining predictors satisfy the threshold to remain in the model.

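A compact sketch combining the forward and backward steps, assuming statsmodels and a pandas DataFrame X of predictors with target y; entry and removal thresholds of 0.05 are one conventional choice.

import statsmodels.api as sm

def stepwise_selection(X, y, enter=0.05, remove=0.05):
    selected = []
    while True:
        # Forward step: add the most significant remaining predictor, if one qualifies.
        entry_p = {}
        for c in X.columns:
            if c in selected:
                continue
            model = sm.OLS(y, sm.add_constant(X[selected + [c]])).fit()
            entry_p[c] = model.pvalues[c]
        if not entry_p or min(entry_p.values()) >= enter:
            return selected
        selected.append(min(entry_p, key=entry_p.get))
        # Backward step: drop any predictor that no longer meets the removal threshold.
        model = sm.OLS(y, sm.add_constant(X[selected])).fit()
        pvalues = model.pvalues.drop("const")
        selected = [c for c in selected if pvalues[c] <= remove]
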
21
Q

Test Data

A

A testing dataset is used to evaluate the final, selected algorithm on data distinct from the training and validation datasets.

22
Q

Training Data

A

The training dataset is the data used to build the algorithm and “learn” the relationship between the predictors and the target variable.

23
Q

Validation Data

A

The validation data is used to assess how well the regression model estimates the target variable when compared to the actual values.
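
One common way to create all three datasets is two successive splits with scikit-learn; the 60/20/20 proportions and the stand-in data below are just for illustration.

import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(-1, 1)   # stand-in predictor data
y = np.arange(100)                  # stand-in target

# First carve off the test set, then split the remainder into training and validation.
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.25, random_state=42)
# 0.25 of the remaining 80% equals 20% of the original data, giving a 60/20/20 split.
print(len(X_train), len(X_val), len(X_test))   # 60 20 20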