Chapter 4: Information-based Learning Flashcards

1
Q

How do you build predictive machine learning models?

A

Use the most informative features

2
Q

In this context what is an informative feature?

A

A descriptive feature whose values split the instances in the dataset into the most homogeneous sets with respect to the target feature value

3
Q

How do you calculate the average number of questions you have to ask per game?

A

Add up the lengths of the paths (the number of questions) needed to reach each person and divide by the number of people
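For example (made-up numbers), if the question paths needed to identify four different people are 1, 2, 3, and 3 questions long, the average is (1 + 2 + 3 + 3) / 4 = 2.25 questions per game.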

4
Q

What do we consider the effects of different answers in terms of?

A
  • How the domain is split up after the answer is received
  • The likelihood of each of the answers
5
Q

What does a decision tree consist of?

A
  • Root node (starting node)
  • Interior nodes
  • Leaf nodes (terminating nodes)
6
Q

What is some important information about non-leaf nodes and leaf nodes?

A
  • Each non-leaf node specifies a test to be carried out on one of the query’s descriptive features
  • Each leaf node contains a class label; it specifies the predicted classification for the query
7
Q

What is the process of using a decision tree to make a prediction for a query instance?

A
  • Start by testing the value of the descriptive feature at the root node of the tree
  • The result of the test determines which of the root node’s children the process should descend to
  • The two steps of testing the descriptive feature and descending a level are repeated until the process comes to a leaf node at which a prediction can be made
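A minimal sketch of this traversal in Python, assuming a hypothetical dict-based tree where each interior node maps a feature name to its branches and each leaf is a plain class label:

    def predict(tree, query):
        # A leaf node is just a class label; return it as the prediction
        if not isinstance(tree, dict):
            return tree
        # An interior node has the form {feature_name: {feature_value: subtree, ...}}
        feature, branches = next(iter(tree.items()))
        # Descend along the branch that matches the query's value for this feature
        return predict(branches[query[feature]], query)

    # Hypothetical tree and query instance
    tree = {"outlook": {"sunny": {"humidity": {"high": "no", "normal": "yes"}},
                        "rain": "yes"}}
    print(predict(tree, {"outlook": "sunny", "humidity": "normal"}))  # -> yes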
8
Q

What is the preference for decision trees?

A

Shallower trees

9
Q

How do we make shallow trees?

A
  • Test the most informative features early on in the tree
  • We identify those features with ENTROPY, which is a computational measure of the impurity of a set
10
Q

What is Shannon’s entropy model?

A
  • It defines a computational measure of the impurity of the elements of a set
11
Q

How is entropy related to the probability of an outcome?

A

High probability -> Low entropy
Low probability -> High entropy

12
Q

How do we map probability to entropy value?

A

Take the log of the probability and multiply it by -1
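For example, an outcome with probability 0.5 maps to -log2(0.5) = 1 bit, while a rarer outcome with probability 0.125 maps to -log2(0.125) = 3 bits.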

13
Q

What is Shannon’s entropy model?

A
  • A weighted sum of the logs of the probabilities of each of the possible outcomes when we make a random selection from a set
  • It is the cornerstone of modern information theory and is an excellent measure of the impurity (heterogeneity) of a set
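A minimal sketch of the calculation in Python (the function name and the example sets are illustrative, not from the text):

    from collections import Counter
    from math import log2

    def entropy(labels):
        """Shannon entropy (in bits) of a list of target feature values."""
        total = len(labels)
        counts = Counter(labels)
        # Weighted sum: each outcome's probability times the log of that probability, negated
        return -sum((c / total) * log2(c / total) for c in counts.values())

    print(entropy(["yes", "yes", "no", "no"]))   # 1.0 bit: maximally impure for two classes
    print(entropy(["yes", "yes", "yes", "no"]))  # ~0.81 bits: less impure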
14
Q

What are the weights used in the sum?

A

The weights used in the sum are the probabilities of the outcomes themselves, so that outcomes with high probabilities contribute more to the overall entropy of a set than outcomes with low probabilities

15
Q

Why is there a minus sign at the beginning of the equation?

A

It is added to convert negative numbers returned by the log function to positive ones

16
Q

What is the base of our calculation?

A

We always use base 2 so that entropy is calculated in bits

17
Q

What is the relationship between a measure of heterogeneity of a set and predictive analytics?

A

If we can construct a sequence of tests that splits the training data into sets that are pure with respect to the target feature values, then we can label a query by applying the same sequence of tests to it and labeling it with the target feature value of the instances in the set it ends up in

18
Q

What is our intuition for information gain?

A

Our intuition is that the ideal discriminatory feature will partition the data into pure subsets where all the instances in each subset have the same classification

19
Q

What is the information gain of a descriptive feature?

A

It is a measure of the reduction in the overall entropy of a prediction task achieved by testing on that feature

20
Q

What is the first step of the three step process to computing information gain?

A
  • Compute the entropy of the original dataset with respect to the target feature.
  • This gives a measure of how much information is required to organize the data into pure sets
21
Q

What is the second step of the three step process to computing information gain?

A
  • For each descriptive feature, create the sets that result from partitioning the instances in the dataset using their feature values, then sum the entropy scores of each set, weighting each score by the fraction of instances in that set
  • This gives a measure of the information that is still required to organize the instances into pure sets after we have split them using the descriptive feature
22
Q

What is the third step of the three step process to computing information gain?

A

Subtract the remaining entropy value (step 2) from the original entropy value (step 1) to give the information gain
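A minimal sketch of the three steps in Python, using a made-up toy dataset of feature values and target values (names and data are illustrative):

    from collections import Counter, defaultdict
    from math import log2

    def entropy(labels):
        total = len(labels)
        return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

    def information_gain(feature_values, target_values):
        # Step 1: entropy of the full dataset with respect to the target feature
        original = entropy(target_values)
        # Step 2: partition by feature value, then sum the entropy of each partition,
        # weighted by the fraction of instances it contains
        partitions = defaultdict(list)
        for fv, tv in zip(feature_values, target_values):
            partitions[fv].append(tv)
        remaining = sum(len(part) / len(target_values) * entropy(part)
                        for part in partitions.values())
        # Step 3: information gain = original entropy minus remaining entropy
        return original - remaining

    # Toy example: a feature that splits the targets perfectly gains a full bit
    print(information_gain(["sunny", "sunny", "rain", "rain"],
                           ["no", "no", "yes", "yes"]))  # 1.0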

23
Q

What does ID3 stand for?

A

Iterative Dichotomizer 3

24
Q

What does ID3 do?

A

It attempts to create the shallowest tree that is consistent with the data that is given

25
Q

How does ID3 work?

A

It builds a tree in a recursive depth-first manner, beginning at the root node and working down to the leaf nodes.

26
Q

What is step 1 of the ID3 algorithm?

A

Start by choosing the best descriptive feature to test using information gain (basically, the best question to ask first)

27
Q

What is step 2 of the ID3 algorithm?

A

Add the root node to the tree and label it with the selected test feature

28
Q

What is step 3 of the ID3 algorithm?

A
  • Partition the training dataset using the test (the chosen attribute value)
  • One partition is created for each possible test result which contains the training instances that returned that result
29
Q

What is step 4 of the ID3 algorithm?

A

A branch is grown from the node for each partition

30
Q

What is step 5 of the ID3 algorithm?

A

Repeat the process for each branch using the relevant partition of the training set in place of the full training set and with the selected test feature excluded from further testing

31
Q

What is step 6 of the ID3 algorithm?

A

This process is repeated until all the instances in a partition have the same target level, at which point a leaf node is created and labeled with that level.
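A compact sketch of the whole procedure in Python, assuming the dataset is a list of dicts and the target feature is named explicitly (an illustrative implementation under those assumptions, not the book's code; the empty-partition case is omitted because branches are only grown here for non-empty partitions):

    from collections import Counter, defaultdict
    from math import log2

    def entropy(labels):
        total = len(labels)
        return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

    def information_gain(dataset, feature, target):
        labels = [row[target] for row in dataset]
        partitions = defaultdict(list)
        for row in dataset:
            partitions[row[feature]].append(row[target])
        remaining = sum(len(p) / len(labels) * entropy(p) for p in partitions.values())
        return entropy(labels) - remaining

    def majority(dataset, target):
        return Counter(row[target] for row in dataset).most_common(1)[0][0]

    def id3(dataset, features, target):
        labels = {row[target] for row in dataset}
        # Step 6 / stopping condition: all instances share the same target level -> leaf
        if len(labels) == 1:
            return labels.pop()
        # Stopping condition: no features left to test -> leaf with the majority class
        if not features:
            return majority(dataset, target)
        # Steps 1-2: choose the feature with the highest information gain as the node
        best = max(features, key=lambda f: information_gain(dataset, f, target))
        # Step 3: partition the dataset on the chosen feature's values
        partitions = defaultdict(list)
        for row in dataset:
            partitions[row[best]].append(row)
        # Steps 4-5: grow a branch per value and recurse with the chosen feature excluded
        node = {best: {}}
        remaining_features = [f for f in features if f != best]
        for value, subset in partitions.items():
            node[best][value] = id3(subset, remaining_features, target)
        return node

    # Hypothetical toy dataset
    data = [{"outlook": "sunny", "windy": "false", "play": "no"},
            {"outlook": "sunny", "windy": "true",  "play": "no"},
            {"outlook": "rain",  "windy": "false", "play": "yes"},
            {"outlook": "rain",  "windy": "true",  "play": "no"}]
    print(id3(data, ["outlook", "windy"], "play"))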

32
Q

What assumption is ID3 algorithm based on?

A

A correct decision tree for a domain will classify instances from that domain in the same proportion as the target level occurs in the domain

33
Q

What is the first thing to remember when designing base cases?

A
  • First, the dataset of training instances considered at each of the interior nodes in the tree is not the complete dataset; rather, it is the subset of instances considered at its parent node that had the relevant feature value for the branch from the parent to the current node.
34
Q

What is the second thing to remember when designing base cases?

A

Once a feature has been tested, it is not considered for selection again along that path in the tree. A feature will only be tested once on any path in the tree, but it may occur several times in the tree on different paths.

35
Q

What is the first ID3 stopping condition?

A
  • All instances in the dataset have the same classification (target value)
  • Return a leaf node with that classification as its label
36
Q

What is the second ID3 stopping condition?

A
  • The set of features left to test is empty
  • Return a leaf node tree with the majority class of the dataset as its classification/label
37
Q

What is the third ID3 stopping condition?

A
  • The dataset is empty
  • Return a leaf node tree with the majority class of the dataset at the parent node that made the recursive call
38
Q

What is the first step to recursively create interior nodes?

A
  • Decide which descriptive feature should be tested at this node (use information gain)
  • This is based on purity and homogeneity of the resulting partition
39
Q

What is the second step to recursively create interior nodes?

A

  • After choosing the most informative feature, the algorithm adds a new node to the tree
  • It then splits the dataset considered at this node according to the levels the new node can take

40
Q

What is the third step to recursively create interior nodes?

A

Remove the feature of the new node from the set of features to be considered for testing later on

41
Q

What is the fourth step to recursively create interior nodes?

A

The algorithm grows a branch in the tree for each of the values in the domain of the new node

42
Q

How do you address the issue of information gain preferring features with many levels?

A
  • Information gain ratio
  • Computed by dividing the information gain of a feature by the amount of information used to determine the value of the feature
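A minimal sketch, reusing the entropy and information_gain helpers from the ID3 sketch above and assuming the dataset is a list of dicts (illustrative only):

    def gain_ratio(dataset, feature, target):
        # "Information used to determine the value of the feature" = entropy of the
        # feature's own value distribution; features with many levels get a larger denominator
        split_info = entropy([row[feature] for row in dataset])
        return information_gain(dataset, feature, target) / split_info

Note that the ratio is undefined when the feature takes only a single value in the dataset (the denominator is zero).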
43
Q

What is another commonly used measure of impurity?

A

Gini Index

44
Q

What is the Gini index?

A

It calculates how often you would misclassify an instance in the dataset if you classified it based on the distribution of classifications in the dataset
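A minimal sketch of the calculation in Python (illustrative): the Gini index is 1 minus the sum of the squared class probabilities, i.e. the expected misclassification rate when labels are assigned at random according to the class distribution.

    from collections import Counter

    def gini(labels):
        total = len(labels)
        # 1 minus the sum of squared class probabilities
        return 1 - sum((c / total) ** 2 for c in Counter(labels).values())

    print(gini(["yes", "yes", "no", "no"]))    # 0.5: maximally impure for two classes
    print(gini(["yes", "yes", "yes", "yes"]))  # 0.0: pure set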

45
Q

How do you calculate information gain using the Gini index?

A

Replace the entropy measure with the Gini index

46
Q

How do you determine which impurity metric to use, the Gini index or entropy?

A

Try out different impurity metrics and compare the results to see which suits a dataset best

47
Q

What is CART?

A
  • Classification And Regression Tree
  • Another version of the ID3 algorithm that uses the Gini index as a replacement for information gain
48
Q

What is the easiest way to handle continuous valued descriptive features?

A

Turn them into Boolean features by defining a threshold and using it to partition the instances based on their value of the continuous descriptive feature

49
Q

How do we set the threshold?

A
  • The instances in the dataset are sorted according to the continuous feature values
  • The adjacent instances in the ordering that have different classifications are selected as possible threshold points
  • The optimal threshold is found by computing the information gain for each classification boundary and selecting the boundary with the highest information gain as the threshold
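A minimal sketch of this threshold search in Python (names and data are illustrative; here the candidate threshold is taken as the midpoint between the two adjacent values, which is one common convention):

    from collections import Counter
    from math import log2

    def entropy(labels):
        total = len(labels)
        return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

    def best_threshold(values, labels):
        """Choose the split point on a continuous feature with the highest information gain."""
        pairs = sorted(zip(values, labels))
        original = entropy(labels)
        best = None
        for (v1, l1), (v2, l2) in zip(pairs, pairs[1:]):
            if l1 == l2:
                continue  # only boundaries between differently classified instances are candidates
            threshold = (v1 + v2) / 2
            left = [l for v, l in pairs if v < threshold]
            right = [l for v, l in pairs if v >= threshold]
            remaining = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
            gain = original - remaining
            if best is None or gain > best[1]:
                best = (threshold, gain)
        return best  # (threshold, information gain), or None if every instance has the same label

    # Hypothetical temperatures and whether the match was played
    print(best_threshold([64, 65, 68, 69, 70, 72], ["yes", "no", "no", "yes", "yes", "yes"]))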
50
Q

What happens once the threshold has been set?

A
  • The dynamically created new Boolean feature can compete with the other categorical features for selection as the splitting feature at that node
  • The process is repeated at each node as the tree grows