Lecture 4: Decision Trees and k-means clustering Flashcards
What is A ⇒ B?
A implies B
What is Deduction?
The conclusion follows necessarily from the premises.
From A ⇒ B and A, we conclude B.
Example:
“All men are mortal.”
“Socrates is a man.”
“Socrates is mortal.”
Abduction
Conclusion is one hypothetical (most probable) explanation for the premises
From A ⇒ B and B, we conclude A
Ex:
Drunk people do not walk straight.
John does not walk straight.
John is drunk.
Not sound… but it may be the most likely explanation for B
Induction
Conclusion about all members of a class from the examination of only a few members of the class.
From A ∧ C ⇒ B and A ∧ D ⇒ B, we conclude A⇒B
We construct a general explanation based on a specific case.
Ex:
All CS students in COMP 472 are smart.
All CS students on vacation are smart.
All CS students are smart.
Not sound
But it can be seen as hypothesis construction or generalisation
What is Inductive Learning?
= learning from examples
Most work in ML is inductive learning.
Examples (positive and/or negative) are given to train a system on a classification (or regression) task.
Given a new instance X you have never seen, you must produce an estimate of f(X), where f is the desired output function.
What is the framework for inductive learning?
Input data are represented by a vector of features (attributes), X
Each vector X is a list of (attribute, value) pairs.
Ex: X = [nose:big, teeth:big, eyes:big, moustache:no]
The number of attributes is fixed (positive, finite)
Each attribute has a fixed, finite number of possible values
Each example can be interpreted as a point in an n-dimensional feature space, where n is the number of attributes
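A minimal sketch of this representation in Python; the attribute names and values follow the example above, while the dict encoding and the alternative values in DOMAIN are invented for illustration:

```python
# One example = a fixed vector of (attribute, value) pairs.
X = {"nose": "big", "teeth": "big", "eyes": "big", "moustache": "no"}

# The attribute set is fixed and each attribute has finitely many values,
# so every example is a point in an n-dimensional feature space (n = 4 here).
DOMAIN = {
    "nose": {"big", "small"},
    "teeth": {"big", "small"},
    "eyes": {"big", "small"},
    "moustache": {"yes", "no"},
}
assert all(X[a] in values for a, values in DOMAIN.items())
```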
What are 3 common techniques in Machine Learning?
Probabilistic Methods
ex: Naïve Bayes Classifier
Decision Trees
Use only discriminating features as questions in a big if-then-else tree
Neural networks
Also called parallel distributed processing or connectionist systems
Intelligence arises from having a large number of simple computational units
How does a decision tree work?
Look for features that are very good indicators of the result; place these features (as questions) in the nodes of the tree
Split the examples so that those with different values for the chosen feature end up in different sets
Repeat the same process with another feature
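A compact sketch of this procedure in Python (an ID3-style recursion; the dict-based tree representation and helper names are my own, not from the lecture):

```python
import math
from collections import Counter

def entropy(labels):
    # H(S) = -sum_i p_i * log2(p_i) over the class proportions in labels.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attr):
    # H(S) minus the weighted entropy of the subsets produced by splitting on attr.
    n = len(labels)
    remainder = 0.0
    for v in {ex[attr] for ex in examples}:
        subset = [lab for ex, lab in zip(examples, labels) if ex[attr] == v]
        remainder += (len(subset) / n) * entropy(subset)
    return entropy(labels) - remainder

def build_tree(examples, labels, attributes):
    # Leaf: all examples agree, or there are no attributes left to ask about.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Pick the most discriminating attribute, split, and recurse on each subset.
    best = max(attributes, key=lambda a: information_gain(examples, labels, a))
    tree = {}
    for v in {ex[best] for ex in examples}:
        keep = [i for i, ex in enumerate(examples) if ex[best] == v]
        tree[(best, v)] = build_tree(
            [examples[i] for i in keep],
            [labels[i] for i in keep],
            [a for a in attributes if a != best],
        )
    return tree
```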
How to select an attribute in a decision tree?
Search the space of all decision trees: always pick the next attribute to split the data on based on its “discriminating power” (information gain)
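This selection is the one greedy line in the build_tree sketch above; a tiny usage example, reusing its information_gain helper (the weather data is invented):

```python
examples = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["yes", "yes", "no", "no"]

# "outlook" separates the classes perfectly (gain 1.0), "windy" not at all
# (gain 0.0), so the greedy search asks about "outlook" first.
best = max(["outlook", "windy"],
           key=lambda a: information_gain(examples, labels, a))
assert best == "outlook"
```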
What are the 4 different factors that quantify the size of a tree?
Number of leaves
Height of the tree
External Path Length
Weighted External Path Length
What is the height of a tree?
The length of the longest path in the tree from the root to a leaf
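A small sketch, assuming the nested-dict tree representation from the build_tree sketch above (anything that is not a dict counts as a leaf):

```python
def height(tree):
    # A leaf has height 0; an internal node adds one edge to its tallest subtree.
    if not isinstance(tree, dict):
        return 0
    return 1 + max(height(child) for child in tree.values())

# Root with one leaf child and one deeper subtree: longest root-to-leaf path = 2 edges.
t = {"left": "leafA", "right": {"down": "leafB"}}
assert height(t) == 2
```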
What is External Path Length?
Start at a leaf, go up to the root, and count the number of edges
Do this for every leaf and add up the numbers
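The same procedure as a sketch (same nested-dict representation as above; each leaf contributes its depth, i.e. the number of edges back up to the root):

```python
def external_path_length(tree, depth=0):
    # A leaf contributes its depth; an internal node sums over its children.
    if not isinstance(tree, dict):
        return depth
    return sum(external_path_length(child, depth + 1) for child in tree.values())

# leafA is 1 edge from the root, leafB is 2 edges away: 1 + 2 = 3.
t = {"left": "leafA", "right": {"down": "leafB"}}
assert external_path_length(t) == 3
```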
What is weighted external path length?
Idea: not all paths are equally important/likely
Use the training data to compute a weighted sum
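A sketch of the weighted version; here a leaf is assumed to be a (label, count) pair, where count is the number of training examples that reach that leaf:

```python
def weighted_external_path_length(tree, depth=0):
    # Each leaf contributes depth * count, so frequent paths weigh more.
    if not isinstance(tree, dict):
        label, count = tree
        return depth * count
    return sum(weighted_external_path_length(child, depth + 1)
               for child in tree.values())

# 90 training examples reach the shallow leaf, only 10 reach the deep one.
t = {"left": ("yes", 90), "right": {"down": ("no", 10)}}
assert weighted_external_path_length(t) == 90 * 1 + 10 * 2
```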
What is the equation for entropy?
H(X) = -∑_{i=1}^{n} p(x_i) log₂ p(x_i)
where n is the number of possible outcomes
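A quick numeric check of the formula (a sketch; the probability lists are invented):

```python
import math

def H(probs):
    # H(X) = -sum_i p(x_i) * log2 p(x_i); terms with p = 0 contribute 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)

assert H([0.5, 0.5]) == 1.0   # fair coin: maximally uncertain, 1 bit
assert H([1.0]) == 0.0        # certain outcome: 0 bits
print(H([0.9, 0.1]))          # skewed coin: ~0.47 bits
```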
Formula to choose the best feature
gain(S, A) = H(S) - H(S|A)
           = H(S) - ∑_{v ∈ Values(A)} (|S_v| / |S|) · H(S_v)
where S_v is the subset of S for which attribute A has value v
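A self-contained numeric check of this formula (toy data invented for illustration):

```python
import math
from collections import Counter

def H(labels):
    # Entropy of a multiset of class labels.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(pairs):
    # pairs: one (value of attribute A, class label) per example in S.
    labels = [lab for _, lab in pairs]
    n = len(pairs)
    remainder = 0.0
    for v in {val for val, _ in pairs}:
        S_v = [lab for val, lab in pairs if val == v]  # subset with A = v
        remainder += (len(S_v) / n) * H(S_v)           # (|S_v| / |S|) * H(S_v)
    return H(labels) - remainder                       # H(S) - H(S|A)

# A perfectly discriminating attribute removes all uncertainty: gain = H(S) = 1 bit.
S = [("sunny", "yes"), ("sunny", "yes"), ("rainy", "no"), ("rainy", "no")]
assert gain(S) == 1.0
```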