Session 3 Flashcards

Question 1

Q

We have a particular life insurance product we would like to sell. We have an attractive offer, but targeting each customer incurs a cost. How should we proceed?

Answer

A

Define target
Collect data
Build a model
Predict outcomes

Question 2

Q

How do we choose, at each step, which attribute to use to segment the population?

Answer

A

General rule: the resulting groups should be as pure as possible, i.e., homogeneous with respect to the target variable.

Question 3

Q

The concept of information provides a way to…

Answer

A

… quantify the amount of surprise in an event, measured in bits.

Question 4

Q

Intuition behind information

Answer

A

Events that are rare (low probability) are more surprising, and therefore carry more information, than events that are common (high probability).
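A minimal sketch of this intuition, assuming the standard definition of information as the negative base-2 logarithm of an event's probability. The function name `self_information` is illustrative and not part of the original cards.

```python
import math


def self_information(p: float) -> float:
    """Surprise of an event with probability p, in bits: -log2(p)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)


# A common event carries little information; a rare one carries more.
print(self_information(0.5))    # 1.0 bit (e.g., a fair coin flip)
print(self_information(0.125))  # 3.0 bits (a 1-in-8 event)
```

Halving the probability of an event adds exactly one bit of surprise.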

Question 5

Q

Entropy

Answer

A

Entropy measures disorder, i.e., how mixed (impure) a segment is
Entropy is zero at minimum disorder (all members belong to the same class)
Entropy is at its maximum at maximal disorder (members equally distributed among classes); with two classes and entropy measured in bits, that maximum is one
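The behaviour at both extremes can be checked with a short sketch. It assumes Shannon entropy in bits, computed from the class proportions in a segment. The function name `entropy` and the toy labels are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of the class labels in one segment."""
    total = len(labels)
    proportions = [count / total for count in Counter(labels).values()]
    # p * log2(1/p) is the information of each class weighted by its share.
    return sum(p * math.log2(1 / p) for p in proportions)


print(entropy(["yes", "yes", "yes", "yes"]))  # 0.0 (pure segment)
print(entropy(["yes", "yes", "no", "no"]))    # 1.0 (two classes, evenly mixed)
print(entropy(["a", "b", "c", "d"]))          # 2.0 (four classes, evenly mixed)
```

The last line shows why "entropy is one" only describes the two-class case: with k equally likely classes the maximum is log2(k) bits.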

Question 6

Q

Information Gain

Answer

A

Information gain (IG) measures the change in entropy caused by adding new information
For a split, it measures how much an attribute decreases entropy over the whole segmentation it creates
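As a sketch of the calculation, assuming the usual formulation: information gain is the entropy of the parent segment minus the size-weighted average entropy of the child segments. The function names and toy labels are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of the class labels in one segment."""
    total = len(labels)
    return sum((c / total) * math.log2(total / c)
               for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["yes"] * 4 + ["no"] * 4

# A split that separates the classes perfectly removes all disorder.
print(information_gain(parent, [["yes"] * 4, ["no"] * 4]))  # 1.0

# A split whose children are as mixed as the parent gains nothing.
print(information_gain(parent, [["yes", "yes", "no", "no"],
                                ["yes", "yes", "no", "no"]]))  # 0.0
```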

Question 7

Q

How do we choose, at each step, which attribute to use to segment the population?

Answer

A

Rule: choose the variable that provides the most information gain with respect to the target variable
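A minimal sketch of this rule for categorical attributes. The records, the attribute names (`owns_home`, `region`), and the target name (`bought`) are made up for illustration and do not come from the course material.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy, in bits, of the class labels in one segment."""
    total = len(labels)
    return sum((c / total) * math.log2(total / c)
               for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy reduction from segmenting rows by one attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - weighted


def best_attribute(rows, attributes, target):
    """Pick the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical toy data: owns_home separates buyers perfectly, region does not.
rows = [
    {"owns_home": "yes", "region": "north", "bought": "yes"},
    {"owns_home": "yes", "region": "south", "bought": "yes"},
    {"owns_home": "no", "region": "north", "bought": "no"},
    {"owns_home": "no", "region": "south", "bought": "no"},
]
print(best_attribute(rows, ["owns_home", "region"], "bought"))  # owns_home
```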

Question 8

Q

Do decision trees evaluate the information gain of all the variables at each split?

Question 9

Q

Can we use the same variable to split the data more than once?

Question 10

Q

How is the split done for continuous variables (e.g., income)?

Answer

A

Different thresholds are tested; the threshold with the highest IG is used
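The threshold search can be sketched as follows. One assumption here is not stated in the card: candidate thresholds are taken as the midpoints between consecutive distinct values, which is a common choice but not the only one. The income figures and labels are made up.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of the class labels in one segment."""
    total = len(labels)
    return sum((c / total) * math.log2(total / c)
               for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, information gain) for the best binary split."""
    distinct = sorted(set(values))
    # Assumption: test the midpoint between each pair of neighbouring values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    n = len(labels)
    parent_entropy = entropy(labels)
    best = (None, -1.0)
    for threshold in candidates:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        gain = parent_entropy - weighted
        if gain > best[1]:
            best = (threshold, gain)
    return best


# Hypothetical incomes (in thousands) and purchase outcomes.
incomes = [20, 30, 40, 60, 80, 100]
bought = ["no", "no", "no", "yes", "yes", "yes"]
print(best_threshold(incomes, bought))  # (50.0, 1.0)
```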

Question 11

Q

The confusion matrix

Answer

A

The confusion matrix allows visualization of the performance of a model

Question 12

Q

True Positives (TP)

Answer

A

actual positives correctly predicted as positive

Question 13

Q

True Negatives (TN)

Answer

A

actual negatives correctly predicted as negative

Question 14

Q

False Positives (FP)

Answer

A

actual negatives incorrectly predicted as positive

Question 15

Q

False Negatives (FN)

Answer

A

actual positives incorrectly predicted as negative
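The four cells of the confusion matrix (TP, TN, FP, FN) can be counted with a short sketch. The function name `confusion_counts`, the `positive` parameter, and the example labels are illustrative assumptions.

```python
def confusion_counts(actual, predicted, positive="yes"):
    """Count the four confusion-matrix cells for a binary classifier."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            counts["TP"] += 1  # actual positive, predicted positive
        elif a != positive and p != positive:
            counts["TN"] += 1  # actual negative, predicted negative
        elif a != positive and p == positive:
            counts["FP"] += 1  # actual negative, predicted positive
        else:
            counts["FN"] += 1  # actual positive, predicted negative
    return counts


# Hypothetical outcomes for eight customers.
actual    = ["yes", "yes", "yes", "no", "no", "no", "no", "no"]
predicted = ["yes", "yes", "no", "yes", "no", "no", "no", "no"]
print(confusion_counts(actual, predicted))
# {'TP': 2, 'TN': 4, 'FP': 1, 'FN': 1}
```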

Question 16

Q

RapidMiner Studio

Answer

A

is a commercial software package that provides an integrated environment for machine learning and business analytics

It is a good tool for teaching the basic data science concepts
It is free to use for small data sets (up to 10,000 observations)
