CH 3 Predictive Modeling Flashcards

1
Q

Which one of the following statements is correct?

A

An algorithm is a set of steps used to solve a problem or complete a process.

2
Q

When examining a model’s results, insurance and risk management professionals should defer to

A

Their professional experience.

3
Q

A predictive model is applied to a clothing manufacturer’s data on 1,000 employees, 50 of whom had workplace injuries in the past year. The table below shows how often the model correctly and incorrectly predicts, for each employee, “yes, will have an accident” or “no, will not have an accident.”

             Predicted No   Predicted Yes   Total (1,000 employees)
Actual No    945            5               950
Actual Yes   10             40              50

Based on the preceding numbers, these statements can be made:

There are 40 true positives (TP) for which the model correctly predicted yes.
There are 945 true negatives (TN) for which the model correctly predicted no.
There are 5 false positives (FP) for which the model incorrectly predicted yes (and the actual answer is no).
There are 10 false negatives (FN) for which the model incorrectly predicted no (and the actual answer is yes).
What is the accuracy of the workplace injury predictive model?

A

0.985
Accuracy = (TP + TN) ÷ (TP + TN + FP + FN)
= (40 + 945) ÷ (40 + 945 + 5 + 10) = 985 ÷ 1,000 = 0.985
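
The accuracy calculation above can be checked with a few lines of Python. This is a minimal sketch; the function name `accuracy` and the variable names are illustrative and not part of the course material.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all predictions (positive and negative) that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Counts taken from the workplace injury table in this card.
workplace_accuracy = accuracy(tp=40, tn=945, fp=5, fn=10)
print(workplace_accuracy)  # 0.985
```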

4
Q

During the process of training a predictive model, overfitting occurs when

A

A model is overly tailored to the training data.

5
Q

If a predictive model makes 60 percent positive predictions in a situation in which, without the model, only 40 percent positive predictions would be made by chance, which one of the following is the model’s leverage?

A

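0.20
Leverage is the difference between the percentage of positive predictions made by the model and the percentage that would be made by chance without it: 0.60 − 0.40 = 0.20.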
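
The subtraction behind this answer can be sketched in Python. The function name `leverage` and its parameter names are hypothetical, chosen only for illustration; the definition used is the one this card relies on (model rate minus chance rate).

```python
def leverage(model_rate: float, chance_rate: float) -> float:
    """Difference between the model's positive-prediction rate and the chance rate."""
    return model_rate - chance_rate


model_leverage = leverage(model_rate=0.60, chance_rate=0.40)
# Rounded because binary floating point cannot represent 0.60 and 0.40 exactly.
print(round(model_leverage, 2))  # 0.2
```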

6
Q

A predictive model is applied to a clothing manufacturer’s data on 1,000 employees, 50 of whom had workplace injuries in the past year. The table below shows how often the model correctly and incorrectly predicts, for each employee, “yes, will have an accident” or “no, will not have an accident.”

             Predicted No   Predicted Yes   Total (1,000 employees)
Actual No    945            5               950
Actual Yes   10             40              50

Based on the preceding numbers, these statements can be made:

There are 40 true positives (TP) for which the model correctly predicted yes.
There are 945 true negatives (TN) for which the model correctly predicted no.
There are 5 false positives (FP) for which the model incorrectly predicted yes (and the actual answer is no).
There are 10 false negatives (FN) for which the model incorrectly predicted no (and the actual answer is yes).
The formula 2 × [(Precision × Recall) ÷ (Precision + Recall)], which in this case is 2 × [(0.889 × 0.80) ÷ (0.889 + 0.80)] = 0.842, measures the workplace injury predictive model’s

A

F-score.
The formula 2 × [(Precision × Recall) ÷ (Precision + Recall)], which in this case is 2 × [(0.889 × 0.80) ÷ (0.889 + 0.80)] = 0.842, measures the workplace injury predictive model’s F-score.

7
Q

Conducting unsupervised learning before supervised learning may

A

Provide the information needed to define an appropriate target for supervised learning.

8
Q

In a data mining context, similarity is usually measured as

A

The distance between two instances’ data points.
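
As a rough illustration of distance as a similarity measure, the sketch below compares two instances with Euclidean distance. Euclidean distance is only one common choice and the card does not name a specific metric; the attribute values (age, years insured) are made up for the example.

```python
import math

# Two hypothetical instances described by (age, years insured).
instance_a = (30, 5)
instance_b = (33, 9)

# Smaller distance means the instances are more similar.
distance = math.dist(instance_a, instance_b)
print(distance)  # 5.0
```

In practice attributes are often rescaled first so that one with a large numeric range does not dominate the distance.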

9
Q

In the context of a predictive model, a true positive results when the model

A

Correctly predicts a positive.
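
To make the four outcome types concrete, the following sketch tallies them from paired actual and predicted values. The function name `confusion_counts` and the five sample instances are invented for illustration.

```python
def confusion_counts(actual, predicted):
    """Tally true/false positives and negatives from paired boolean outcomes."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for truth, guess in zip(actual, predicted):
        if guess and truth:
            counts["TP"] += 1      # predicted yes, actually yes
        elif guess and not truth:
            counts["FP"] += 1      # predicted yes, actually no
        elif not guess and truth:
            counts["FN"] += 1      # predicted no, actually yes
        else:
            counts["TN"] += 1      # predicted no, actually no
    return counts


actual = [True, True, False, False, False]
predicted = [True, False, True, False, False]
counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 1, 'TN': 2, 'FP': 1, 'FN': 1}
```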

10
Q

If an attribute has high information gain, it

A

Decreases entropy.
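
The link between information gain and entropy can be sketched as follows, using the standard definition of Shannon entropy in bits. The helper names and the ten-instance data set are hypothetical; the split shown is a deliberately perfect one, so the gain equals the parent's full entropy.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child segments."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Ten instances, evenly mixed, split by an attribute that separates them perfectly.
parent = ["yes"] * 5 + ["no"] * 5
children = [["yes"] * 5, ["no"] * 5]
gain = information_gain(parent, children)
print(gain)  # 1.0
```

An attribute that left both segments as mixed as the parent would have a gain of zero.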

11
Q

A predictive model was developed for Shelton Manufacturing to determine the likelihood of current and future employees suffering from hearing loss. The predictive model was applied to Shelton Manufacturing data of 200 employees, 10 of whom developed hearing loss in the past year. Based on the numbers shown in the performance metric below, what is the accuracy of the hearing loss predictive model?

             Predicted No   Predicted Yes   Total (200 employees)
Actual No    178            12              190
Actual Yes   2              8               10

A

0.93
The accuracy of the hearing loss predictive model is 0.93. Accuracy is measured as (TP + TN) ÷ (TP + TN + FP + FN):
(8 + 178) ÷ (8 + 178 + 12 + 2) = 186 ÷ 200 = 0.93

12
Q

Stevens Insurance developed a predictive model that predicts the likelihood that personal automobile policyholders will not renew their policies. The model is based on data on 500 policyholders. The data includes the policyholder name, age, number of vehicles insured, length of time insured with Stevens, and whether the policy renewed or not. Which one of the following would be considered the target variable in the model?

A

Whether the policy renewed or not
The target variable would be whether the policy renewed or not. That is the attribute whose value is being predicted by the model.

13
Q

A predictive model is applied to a clothing manufacturer’s data on 1,000 employees, 50 of whom had workplace injuries in the past year. The table below shows how often the model correctly and incorrectly predicts, for each employee, “yes, will have an accident” or “no, will not have an accident.”

             Predicted No   Predicted Yes   Total (1,000 employees)
Actual No    945            5               950
Actual Yes   10             40              50

Based on the preceding numbers, these statements can be made:

There are 40 true positives (TP) for which the model correctly predicted yes.
There are 945 true negatives (TN) for which the model correctly predicted no.
There are 5 false positives (FP) for which the model incorrectly predicted yes (and the actual answer is no).
There are 10 false negatives (FN) for which the model incorrectly predicted no (and the actual answer is yes).
What is the F-score of the workplace injury predictive model?

A

0.842
Recall = TP ÷ (TP + FN) = 40 ÷ (40 + 10) = 0.80
Precision = TP ÷ (TP + FP) = 40 ÷ (40 + 5) = 0.889
F-score = 2 × [(Precision × Recall) ÷ (Precision + Recall)] = 2 × [(0.889 × 0.80) ÷ (0.889 + 0.80)] = 0.842
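
The three steps in this answer (recall, precision, then F-score) can be reproduced in Python. This is a sketch only; the function names are illustrative, and the counts come from the table in this card.

```python
def precision(tp: int, fp: int) -> float:
    """Share of positive predictions that were correct."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Share of actual positives that the model caught."""
    return tp / (tp + fn)


def f_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * (p * r) / (p + r)


p = precision(tp=40, fp=5)
r = recall(tp=40, fn=10)
f = f_score(p, r)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.889 0.8 0.842
```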

14
Q

Which one of the following is the term for the most similar instances in a data model?

A

Nearest neighbors

15
Q

In the k-nearest neighbor (k-NN) algorithm, the “k” refers to

A

The number of neighbors used.

16
Q

Which one of the following best describes why a weighted average gives a more accurate estimate than a simple majority combining function when predicting the value of a target variable by its nearest neighbors?

A

A majority combining function gives equal weight to all of the nearest neighbors, while a weighted average weights the nearest neighbors’ contributions by their distance.
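
The difference between the two combining functions can be seen in a small sketch with k = 3. It uses a distance-weighted vote with inverse-distance weights (1 ÷ distance), which is an assumption: it is one common weighting scheme, not necessarily the one the course material has in mind. The neighbor data and function names are made up, and distances are assumed to be greater than zero.

```python
from collections import Counter, defaultdict


def majority_vote(neighbors):
    """Every neighbor counts equally, regardless of distance."""
    return Counter(label for _, label in neighbors).most_common(1)[0][0]


def weighted_vote(neighbors):
    """Each neighbor's vote is scaled by 1 / distance, so closer neighbors count more."""
    scores = defaultdict(float)
    for distance, label in neighbors:
        scores[label] += 1.0 / distance
    return max(scores, key=scores.get)


# (distance, target value) for the three nearest neighbors of a new instance.
neighbors = [(1.0, "yes"), (4.0, "no"), (5.0, "no")]
print(majority_vote(neighbors))  # no
print(weighted_vote(neighbors))  # yes
```

Here the single very close neighbor outweighs the two distant ones, which the simple majority cannot reflect.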

17
Q

In link prediction, a model attempts to predict

A

A connection between a pair of instances.

18
Q

When training a predictive model, which one of the following is a reason for cross-validation to be used?

A

Only a very limited amount of training data is available, and the model’s developers think it unwise to withhold some of it from training for use as holdout data.
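
One way to picture cross-validation is as a rotation of the holdout set, so that every instance is used for both training and testing. The sketch below builds k-fold index splits in plain Python; the function name `k_fold_splits` and the sizes (10 instances, 5 folds) are illustrative assumptions.

```python
def k_fold_splits(n_instances: int, k: int):
    """Return k (training_indices, holdout_indices) pairs covering every instance once."""
    indices = list(range(n_instances))
    fold_size, remainder = divmod(n_instances, k)
    splits = []
    start = 0
    for fold in range(k):
        # The first `remainder` folds take one extra instance.
        stop = start + fold_size + (1 if fold < remainder else 0)
        holdout = indices[start:stop]
        training = indices[:start] + indices[stop:]
        splits.append((training, holdout))
        start = stop
    return splits


splits = k_fold_splits(n_instances=10, k=5)
for training, holdout in splits:
    print(len(training), "training /", len(holdout), "holdout:", holdout)
```

A model would be trained and evaluated once per split, and the results averaged. Real data is usually shuffled before splitting.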

19
Q

In a social networking scenario, which one of the following counts how many people are connected to a person?

A

Degree
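
Degree is simple to compute once the network is stored as a mapping from each person to the set of people they are connected to. The names and connections below are invented for illustration.

```python
# Hypothetical social network: each person maps to their direct connections.
network = {
    "Ana": {"Ben", "Cara", "Dev"},
    "Ben": {"Ana"},
    "Cara": {"Ana", "Dev"},
    "Dev": {"Ana", "Cara"},
}


def degree(network, person):
    """Number of people directly connected to `person`."""
    return len(network[person])


print(degree(network, "Ana"))  # 3
```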

20
Q

In predictive modeling terminology, a target variable is

A

The predefined attribute whose value is being predicted in a predictive model.