CH 3 Predictive Modeling Flashcards

Question 1

Q

Which one of the following statements is correct?

Answer

A

An algorithm is a set of steps used to solve a problem or complete a process.

Question 2

Q

When examining a model’s results, insurance and risk management professionals should defer to

Answer

A

Their professional experience.

Question 3

Q

A predictive model is applied to a clothing manufacturer’s data on 1,000 employees, 50 of whom had workplace injuries in the past year. The table below shows how often the model correctly and incorrectly predicts, for each employee, “yes, will have an accident” or “no, will not have an accident.”

             Predicted No   Predicted Yes   Total (1,000 employees)
Actual No    945            5               950
Actual Yes   10             40              50

Based on the preceding numbers, the following statements can be made:

There are 40 true positives (TP) for which the model correctly predicted yes.
There are 945 true negatives (TN) for which the model correctly predicted no.
There are 5 false positives (FP) for which the model incorrectly predicted yes (and the actual answer is no).
There are 10 false negatives (FN) for which the model incorrectly predicted no (and the actual answer is yes).
What is the accuracy of the workplace injury predictive model?

Answer

A

.985
Accuracy = (TP + TN) ÷ (TP + TN + FP + FN) = (40 + 945) ÷ (40 + 945 + 5 + 10) = 985 ÷ 1,000 = .985
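The accuracy calculation can be sketched in Python as a minimal helper. The function name `accuracy` is illustrative only; the counts are the ones from the workplace injury table.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all predictions that were correct: (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)


# Counts from the workplace injury confusion matrix
print(accuracy(tp=40, tn=945, fp=5, fn=10))  # 0.985
```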

Question 4

Q

During the process of training a predictive model, overfitting occurs when

Answer

A

A model is overly tailored to the training data.

Question 5

Q

If a predictive model makes positive predictions 60 percent of the time in a situation in which, without the model, positive predictions would occur only 40 percent of the time by chance, which one of the following is the model’s leverage?

Answer

A

0.20

Leverage is the difference between the two rates: 0.60 − 0.40 = 0.20.
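A minimal sketch of the leverage arithmetic, assuming (as the answer implies) that leverage here is the model’s positive rate minus the rate expected by chance. The function name `leverage` is hypothetical.

```python
def leverage(rate_with_model: float, rate_by_chance: float) -> float:
    """Leverage as a difference: how far the model raises the positive rate above chance."""
    return rate_with_model - rate_by_chance


# Round because 0.60 - 0.40 is not exactly 0.2 in binary floating point
print(round(leverage(0.60, 0.40), 2))  # 0.2
```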

Question 6

Q

A predictive model is applied to a clothing manufacturer’s data on 1,000 employees, 50 of whom had workplace injuries in the past year. The table below shows how often the model correctly and incorrectly predicts, for each employee, “yes, will have an accident” or “no, will not have an accident.”

             Predicted No   Predicted Yes   Total (1,000 employees)
Actual No    945            5               950
Actual Yes   10             40              50

Based on the preceding numbers, the following statements can be made:

There are 40 true positives (TP) for which the model correctly predicted yes.
There are 945 true negatives (TN) for which the model correctly predicted no.
There are 5 false positives (FP) for which the model incorrectly predicted yes (and the actual answer is no).
There are 10 false negatives (FN) for which the model incorrectly predicted no (and the actual answer is yes).
The formula 2 × [(Precision × Recall) ÷ (Precision + Recall)], which in this case gives 2 × [(.889 × .80) ÷ (.889 + .80)] = .842, measures the workplace injury predictive model’s

Answer

A

F-score.
The F-score formula is 2 × [(Precision × Recall) ÷ (Precision + Recall)]. In this case, 2 × [(.889 × .80) ÷ (.889 + .80)] = .842 is the workplace injury predictive model’s F-score.
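The F-score is the harmonic mean of precision and recall. A minimal Python sketch using the rounded precision and recall from this card; `f_score` is a hypothetical helper name, and returning 0.0 when both inputs are zero is an assumed convention to avoid division by zero.

```python
def f_score(precision: float, recall: float) -> float:
    """F-score: 2 x (precision x recall) / (precision + recall)."""
    if precision + recall == 0:
        return 0.0  # assumed convention when the model finds no true positives
    return 2 * (precision * recall) / (precision + recall)


print(round(f_score(0.889, 0.80), 3))  # 0.842
```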

Question 7

Q

Conducting unsupervised learning before supervised learning may

Answer

A

Provide the information needed to define an appropriate target for supervised learning.

Question 8

Q

In a data mining context, similarity is usually measured as

Answer

A

The distance between two instances’ data points.
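The card does not say which distance measure is used; Euclidean distance is one common choice for numeric attributes. A minimal sketch, assuming each instance is a list of numeric attribute values (the attributes and values below are hypothetical, and `euclidean_distance` is an illustrative name). A smaller distance means the two instances are more similar.

```python
import math


def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Straight-line distance between two instances' numeric attribute vectors."""
    if len(a) != len(b):
        raise ValueError("instances must have the same number of attributes")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical instances described by (age, years insured)
print(euclidean_distance([30, 2], [34, 5]))  # 5.0
```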

Question 9

Q

In the context of a predictive model, a true positive results when the model

Answer

A

Correctly predicts a positive.

Question 10

Q

If an attribute has high information gain, it

Answer

A

Decreases entropy (splitting the data on that attribute produces a large reduction in entropy).
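A minimal sketch of how information gain relates to entropy: the gain of a split is the parent segment’s entropy minus the size-weighted entropy of the child segments it produces. The labels and splits below are hypothetical, entropy is measured in bits (log base 2), and the function names are illustrative.

```python
import math
from collections import Counter


def entropy(labels: list[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(parent: list[str], children: list[list[str]]) -> float:
    """Parent entropy minus the size-weighted entropy of the child segments."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["yes", "yes", "no", "no"]
# Hypothetical attribute that separates the classes perfectly
print(information_gain(parent, [["yes", "yes"], ["no", "no"]]))  # 1.0
# Hypothetical attribute that leaves each child as mixed as the parent
print(information_gain(parent, [["yes", "no"], ["yes", "no"]]))  # 0.0
```

The first split drops entropy from 1.0 to 0.0, so its gain is high; the second leaves entropy unchanged, so its gain is zero.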

Question 11

Q

A predictive model was developed for Shelton Manufacturing to determine the likelihood of current and future employees suffering from hearing loss. The predictive model was applied to Shelton Manufacturing’s data on 200 employees, 10 of whom developed hearing loss in the past year. Based on the numbers shown in the table below, what is the accuracy of the hearing loss predictive model?

             Predicted No   Predicted Yes   Total (200 employees)
Actual No    178            12              190
Actual Yes   2              8               10

Answer

A

0.93
The accuracy of the hearing loss predictive model is 0.93, using the formula (TP + TN) ÷ (TP + TN + FP + FN): (8 + 178) ÷ (8 + 178 + 12 + 2) = 186 ÷ 200 = 0.93.

Question 12

Q

Stevens Insurance developed a predictive model that predicts the likelihood that personal automobile policyholders will not renew their policies. The model is based on data on 500 policyholders. The data includes the policyholder name, age, number of vehicles insured, length of time insured with Stevens, and whether the policy renewed or not. Which one of the following would be considered the target variable in the model?

Answer

A

Whether the policy renewed or not
The target variable would be whether the policy renewed or not. That is the attribute whose value is being predicted by the model.

Question 13

Q

A predictive model is applied to a clothing manufacturer’s data on 1,000 employees, 50 of whom had workplace injuries in the past year. The table below shows how often the model correctly and incorrectly predicts, for each employee, “yes, will have an accident” or “no, will not have an accident.”

             Predicted No   Predicted Yes   Total (1,000 employees)
Actual No    945            5               950
Actual Yes   10             40              50

Based on the preceding numbers, the following statements can be made:

There are 40 true positives (TP) for which the model correctly predicted yes.
There are 945 true negatives (TN) for which the model correctly predicted no.
There are 5 false positives (FP) for which the model incorrectly predicted yes (and the actual answer is no).
There are 10 false negatives (FN) for which the model incorrectly predicted no (and the actual answer is yes).
What is the F-score of the workplace injury predictive model?

Answer

A

.842
Recall = TP ÷ (TP + FN) = 40 ÷ (40 + 10) = .80.
Precision = TP ÷ (TP + FP) = 40 ÷ (40 + 5) = .889.
F-score = 2 × [(Precision × Recall) ÷ (Precision + Recall)] = 2 × [(.889 × .80) ÷ (.889 + .80)] = .842.
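Recomputing from the raw counts with exact fractions avoids rounding precision to .889 before the final step. A minimal sketch; `precision_recall_f` is a hypothetical helper name.

```python
from fractions import Fraction


def precision_recall_f(tp: int, fp: int, fn: int) -> tuple[Fraction, Fraction, Fraction]:
    """Exact precision, recall, and F-score from confusion-matrix counts."""
    precision = Fraction(tp, tp + fp)  # share of predicted "yes" that were actually "yes"
    recall = Fraction(tp, tp + fn)     # share of actual "yes" the model caught
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f


p, r, f = precision_recall_f(tp=40, fp=5, fn=10)
print(round(float(p), 3), round(float(r), 3), round(float(f), 3))  # 0.889 0.8 0.842
```

The exact F-score is 16/19, or about .842, which also equals 2TP ÷ (2TP + FP + FN) = 80 ÷ 95.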

Question 14

Q

Which one of the following is the term for the most similar instances in a data model?

Answer

A

Nearest neighbors

Question 15

Q

In the algorithm k nearest neighbor (k-NN), the “k” refers to

Answer

A

The number of neighbors used.
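A minimal k-NN classifier sketch showing where k enters: the k closest training instances (by Euclidean distance, an assumed choice) each cast one vote. The training points and labels are hypothetical, and `knn_predict` is an illustrative name.

```python
import math
from collections import Counter


def knn_predict(train: list[tuple[list[float], str]], query: list[float], k: int) -> str:
    """Predict a label for `query` by majority vote among its k nearest training instances."""
    # Sort training instances by distance to the query and keep the k closest
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


train = [
    ([1.0, 1.0], "no"),
    ([1.2, 0.8], "no"),
    ([4.0, 4.2], "yes"),
    ([4.1, 3.9], "yes"),
    ([3.8, 4.0], "yes"),
]
print(knn_predict(train, [3.9, 4.1], k=3))  # yes
```

Choosing an odd k for a two-class problem avoids tied votes; if a tie does occur, `Counter.most_common` returns the label that was counted first.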

Question 16

Q

Which one of the following best describes why a weighted average gives a more accurate estimate than a simple majority combining function when predicting the value of a target variable by its nearest neighbors?

Answer

A

A majority combining function gives equal weight to all of the nearest neighbors, while a weighted average weights the nearest neighbors’ contributions by their distance, so that closer (more similar) neighbors contribute more to the estimate.
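A sketch contrasting the two combining functions on hypothetical neighbors. Inverse squared distance is assumed as the weighting scheme; any weighting that decreases with distance behaves similarly. The function names are illustrative.

```python
from collections import Counter, defaultdict


def majority_vote(neighbors: list[tuple[float, str]]) -> str:
    """Every neighbor counts equally, regardless of distance."""
    return Counter(label for _, label in neighbors).most_common(1)[0][0]


def weighted_vote(neighbors: list[tuple[float, str]]) -> str:
    """Each neighbor's vote is weighted by 1 / distance**2, so closer neighbors count more."""
    scores: dict[str, float] = defaultdict(float)
    for distance, label in neighbors:
        scores[label] += 1.0 / (distance ** 2 + 1e-9)  # tiny constant avoids dividing by zero
    return max(scores, key=scores.get)


# Hypothetical (distance, label) pairs for the three nearest neighbors of a new instance
neighbors = [(0.5, "yes"), (3.0, "no"), (3.2, "no")]
print(majority_vote(neighbors))  # no
print(weighted_vote(neighbors))  # yes
```

Two distant "no" neighbors outvote one very close "yes" neighbor under the majority rule, but the weighted average lets the closest, most similar neighbor dominate.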

Question 17

Q

In link prediction, a model attempts to predict

Answer

A

A connection (link) between a pair of instances.
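One simple way to score candidate links is to count the neighbors that two unconnected instances share (a common-neighbors score). This is an illustrative technique, not necessarily the one the course material uses; the network, names, and `common_neighbor_scores` helper are hypothetical.

```python
from itertools import combinations


def common_neighbor_scores(edges: list[tuple[str, str]]) -> dict[tuple[str, str], int]:
    """Score each unconnected pair by how many neighbors the two instances share."""
    neighbors: dict[str, set[str]] = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    scores = {}
    for a, b in combinations(sorted(neighbors), 2):
        if b not in neighbors[a]:  # only score pairs that are not already linked
            scores[(a, b)] = len(neighbors[a] & neighbors[b])
    return scores


edges = [("Ann", "Bo"), ("Ann", "Cy"), ("Bo", "Dee"), ("Cy", "Dee"), ("Cy", "Eve")]
print(common_neighbor_scores(edges))
```

Here Ann and Dee share two neighbors (Bo and Cy), as do Bo and Cy (Ann and Dee), so those pairs rank as the most likely new links under this scoring.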

Question 18

Q

When training a predictive model, which one of the following is a reason for cross-validation to be used?

Answer

A

A very limited amount of training data is available, and the model’s developers consider it unwise to set aside some of that data as holdout data instead of using it for training.
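A minimal sketch of k-fold cross-validation index splitting, in which every instance is used for training in some folds and as holdout data in exactly one fold. `k_fold_indices` is a hypothetical helper; real workflows usually shuffle the data before splitting, which this sketch omits for clarity.

```python
def k_fold_indices(n: int, k: int) -> list[tuple[list[int], list[int]]]:
    """Split indices 0..n-1 into k folds; each fold serves once as holdout, the rest as training."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    # Spread any remainder over the first folds so sizes differ by at most one
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        holdout = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        splits.append((train, holdout))
        start += size
    return splits


for train, holdout in k_fold_indices(n=10, k=5):
    print("train:", train, "holdout:", holdout)
```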

Question 19

Q

In a social networking scenario, which one of the following counts how many people are connected to a person?

Answer

A

Degree.
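In social network analysis, the number of people directly connected to a person is that person’s degree. A minimal sketch that counts it from a hypothetical list of undirected connections; `degree` is an illustrative helper name.

```python
def degree(connections: list[tuple[str, str]], person: str) -> int:
    """Number of distinct people directly connected to `person` (undirected links)."""
    friends = set()
    for a, b in connections:
        if a == person:
            friends.add(b)
        elif b == person:
            friends.add(a)
    return len(friends)


connections = [("Ann", "Bo"), ("Ann", "Cy"), ("Cy", "Dee"), ("Ann", "Dee")]
print(degree(connections, "Ann"))  # 3
```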

Question 20

Q

In predictive modeling terminology, a target variable is

Answer

A

The predefined attribute whose value is being predicted in a predictive model.