Chapter 3 Flashcards

1
Q

What is the definition of information, with regard to informative attributes?

A

Information is a quantity that reduces uncertainty about something.
The better the information, the more uncertainty is reduced.
Selecting informative attributes provides an intelligent method for selecting an informative subset of the data.

2
Q

What is the tree induction method?

A

Tree induction builds a tree-structured model, which incorporates the idea of supervised segmentation in an elegant manner by repeatedly selecting informative attributes.
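The card describes tree induction only in words. Below is a minimal ID3-style sketch, assuming categorical attributes, entropy-based information gain as the splitting criterion, and leaves that store the majority class. The function names (`entropy`, `split`, `information_gain`, `induce_tree`) and the toy loan data are hypothetical, chosen for illustration only.

```python
import math
from collections import Counter

def entropy(labels):
    """-sum(p_i * log2(p_i)) over the classes present in labels."""
    h = 0.0
    for count in Counter(labels).values():
        p = count / len(labels)
        h -= p * math.log2(p)
    return h

def split(rows, labels, attr):
    """Partition the instances by the value of one categorical attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        g_rows, g_labels = groups.setdefault(row[attr], ([], []))
        g_rows.append(row)
        g_labels.append(label)
    return groups

def information_gain(labels, groups):
    """Parent entropy minus the size-weighted entropy of the child segments."""
    n = len(labels)
    return entropy(labels) - sum(
        len(g_labels) / n * entropy(g_labels) for _, g_labels in groups.values()
    )

def induce_tree(rows, labels, attrs):
    """Recursively split on the most informative attribute.

    Stops when a segment is pure or no attributes remain; a leaf stores the
    majority class of its segment.
    """
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(labels, split(rows, labels, a)))
    remaining = [a for a in attrs if a != best]
    return {
        best: {
            value: induce_tree(g_rows, g_labels, remaining)
            for value, (g_rows, g_labels) in split(rows, labels, best).items()
        }
    }

# Hypothetical toy data: predict loan default from two categorical attributes.
rows = [
    {"employed": "yes", "balance": "high"},
    {"employed": "yes", "balance": "low"},
    {"employed": "yes", "balance": "low"},
    {"employed": "no", "balance": "high"},
    {"employed": "no", "balance": "low"},
    {"employed": "no", "balance": "low"},
]
labels = ["no_default", "no_default", "no_default", "no_default", "default", "default"]
tree = induce_tree(rows, labels, ["balance", "employed"])
print(tree)
# {'employed': {'yes': 'no_default', 'no': {'balance': {'high': 'no_default', 'low': 'default'}}}}
```

Here "employed" wins at the root because it yields the higher information gain; the recursion then splits only the impure "no" segment further.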

3
Q

What is a model?

A

A model is a simplified representation of reality created to serve a purpose.
In data science, a predictive model is a formula for estimating the unknown value of interest: the target.
In data science, prediction means estimating an unknown value.

4
Q

What is the difference between predictive and descriptive modeling?

A

The main purpose of predictive modeling is to estimate a value (e.g., which customers are likely to churn?).
The main purpose of descriptive modeling is to gain insight into the underlying phenomenon or process (e.g., what do customers who churn typically look like, and why do they churn?).

5
Q

What is a supervised learning model?

A

Supervised learning is model creation where the model describes a relationship between a set of selected attributes and a predefined variable called the target variable.
The model estimates the value of the target variable as a function (possibly a probabilistic one) of the selected attributes.

6
Q

What is model induction?

A

It is the creation of a model from data. Induction means generalizing from specific cases to general rules.
The input data for the induction algorithm is the training data.

7
Q

What is a purity measure (entropy)?

A

A purity measure is a formula that evaluates how well each attribute splits a set of examples into segments with respect to the target variable. Entropy is a measure of disorder that can be applied to a set.
The most common splitting criterion is called information gain.

8
Q

What is the entropy equation?

A

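entropy = -p1 x log2(p1) - p2 x log2(p2) - …
where pi is the proportion of instances in the set that belong to class i (logarithms are base 2).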
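A minimal sketch of the entropy formula, assuming the set is given as a Python list of class labels; the function name `entropy` and the "+"/"-" labels are hypothetical. Classes with zero count never appear in the `Counter`, which matches the usual convention that 0 x log(0) = 0.

```python
import math
from collections import Counter

def entropy(labels):
    """-sum(p_i * log2(p_i)) over the classes present in labels."""
    h = 0.0
    for count in Counter(labels).values():
        p = count / len(labels)   # proportion of class i in the set
        h -= p * math.log2(p)
    return h

print(entropy(["+"] * 7 + ["-"] * 3))  # about 0.881
print(entropy(["+"] * 5 + ["-"] * 5))  # 1.0, a maximally impure two-class set
print(entropy(["+"] * 4))              # 0.0, a pure set
```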

9
Q

What is information gain?

A

Information gain (IG) measures how informative an attribute is with respect to the target variable: how much information the attribute gives us about the value of the target variable.
Information gain measures how much an attribute improves (decreases) entropy over the whole segmentation it creates.

10
Q

What is the information gain equation?

A

IG(parent, children) = entropy(parent) - [p(c1) x entropy(c1) + p(c2) x entropy(c2) + …]
where p(ci) is the proportion of the parent's instances that fall into child ci.
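A minimal sketch of the information gain equation, assuming the parent and each child are Python lists of class labels and that the children partition the parent. The names `entropy` and `information_gain` and the example splits are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """-sum(p_i * log2(p_i)) over the classes present in labels."""
    h = 0.0
    for count in Counter(labels).values():
        p = count / len(labels)
        h -= p * math.log2(p)
    return h

def information_gain(parent, children):
    """IG(parent, children) = entropy(parent) - sum_i p(c_i) * entropy(c_i).

    Assumes the children partition the parent, so p(c_i) = len(c_i) / len(parent).
    """
    n = len(parent)
    return entropy(parent) - sum(len(c) / n * entropy(c) for c in children)

parent = ["+"] * 5 + ["-"] * 5
mixed_split = [["+"] * 4 + ["-"], ["+"] + ["-"] * 4]
perfect_split = [["+"] * 5, ["-"] * 5]
print(round(information_gain(parent, mixed_split), 3))    # 0.278
print(round(information_gain(parent, perfect_split), 3))  # 1.0
```

A split that produces pure children recovers all of the parent's entropy, while a split that leaves the children mixed gains only part of it.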

11
Q

How do you measure purity for numeric target values (regression)?

A

The natural measure of impurity for numeric values is variance.
If all instances in a set have the same target value, the set is pure and its variance is zero.
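By analogy with information gain, a split on a numeric target can be scored by how much it reduces variance. The sketch below assumes population variance (`statistics.pvariance`), children weighted by their share of the parent's instances, and non-empty children; the name `variance_reduction` and the numbers are hypothetical.

```python
from statistics import pvariance

def variance_reduction(parent, children):
    """Parent variance minus the size-weighted variance of the children.

    Assumes the children are non-empty and partition the parent.
    """
    n = len(parent)
    weighted = sum(len(child) / n * pvariance(child) for child in children)
    return pvariance(parent) - weighted

parent = [10, 12, 11, 30, 32, 31]
children = [[10, 12, 11], [30, 32, 31]]
print(round(variance_reduction(parent, children), 2))  # 100.0
```

Here the parent's variance is about 100.67 and each child's is about 0.67, so the split removes almost all of the impurity.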

12
Q

What is a classification (decision) tree?

A

A classification (decision) tree is made up of a root at the top and interior nodes, where each node contains a test of an attribute.
Each path terminates at a leaf; each leaf corresponds to a segment, and the attributes along the path give the characteristics of that segment.
Each leaf contains a value for the target variable.
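A minimal sketch of this structure, assuming categorical attribute values: interior nodes test one attribute and branch on its value, and leaves hold a target value. The classes `Node` and `Leaf`, the function `classify`, and the churn attributes are hypothetical, for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, Union

@dataclass
class Leaf:
    value: str  # value of the target variable for this segment

@dataclass
class Node:
    attribute: str  # attribute tested at this node
    branches: Dict[str, "Tree"] = field(default_factory=dict)  # attribute value -> subtree

Tree = Union[Node, Leaf]

def classify(tree: Tree, instance: dict) -> str:
    """Follow the path of attribute tests from the root down to a leaf.

    Raises KeyError if the instance has an attribute value with no branch.
    """
    while isinstance(tree, Node):
        tree = tree.branches[instance[tree.attribute]]
    return tree.value

# Hypothetical tree: each root-to-leaf path describes one customer segment.
tree = Node("contract", {
    "monthly": Node("support_calls", {"many": Leaf("churn"), "few": Leaf("stay")}),
    "yearly": Leaf("stay"),
})
print(classify(tree, {"contract": "monthly", "support_calls": "many"}))  # churn
```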
13
Q

What is the goal of the decision tree?

A

To provide a supervised segmentation: to partition the instances, based on their attributes, into subgroups that have similar values for their target variable.

14
Q

What is the frequency-based estimate of class membership probability?

A

The frequency-based estimate of the probability that a new instance reaching a leaf is positive is n/(n+m),
where n is the number of positive instances and m the number of negative instances in the leaf.
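A minimal sketch of the frequency-based estimate; the function name `frequency_estimate` is hypothetical. It shows the weakness the Laplace correction addresses: a leaf with 2 positives and a leaf with 20 positives (and no negatives) get the same estimate.

```python
def frequency_estimate(n, m):
    """Estimated probability of the positive class in a leaf with
    n positive and m negative training instances: n / (n + m)."""
    return n / (n + m)

print(frequency_estimate(2, 0))   # 1.0
print(frequency_estimate(20, 0))  # 1.0
print(frequency_estimate(3, 1))   # 0.75
```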

15
Q

What is the Laplace correction?

A

If a leaf has only a few instances (little evidence), e.g., two positive and no negative instances, the frequency-based estimate n/(n+m) says the probability that members of the segment belong to the positive class is 100%.
The Laplace correction reduces this to 75% to reflect the uncertainty.
To avoid overfitting, instead of using the raw frequency we can use a smoothed version of the frequency-based estimate, the purpose of which is to moderate the influence of leaves with only a few instances.

Laplace: (n+1) / (n+m+2)

The effect of the Laplace correction decreases as the number of instances increases.
E.g., with 20 positive instances: (20+1) / (20+2) ≈ 0.95,
whereas with 2 positive instances: (2+1) / (2+2) = 0.75
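A minimal sketch of the Laplace-corrected estimate for a two-class leaf; the function name `laplace_estimate` is hypothetical. Printing it for leaves with 1, 2 and 20 positive instances (and no negatives) shows the correction shrinking as evidence grows.

```python
def laplace_estimate(n, m):
    """Laplace-corrected probability of the positive class in a leaf with
    n positive and m negative instances: (n + 1) / (n + m + 2)."""
    return (n + 1) / (n + m + 2)

for n in (1, 2, 20):
    print(n, round(laplace_estimate(n, 0), 3))
# 1 0.667
# 2 0.75
# 20 0.955
```

With no data at all (n = m = 0) the estimate is 0.5, and it approaches the plain frequency n/(n+m) as the leaf accumulates instances.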
