12 Learning From Examples Flashcards
General model of learning agents
Performance element:
Carries out the task of the agent, i.e. processes percepts and decides on actions
Learning element:
Proposes improvements of the performance element, based on previous knowledge and feedback
Critic:
Evaluates performance element by comparing results of its actions with imposed performance standards
Problem generator:
Proposes exploratory actions to increase knowledge
Aspects of the learning element
• Which components of the performance element are to be improved? Which parts of the agent's knowledge base are targeted?
• What feedback is available? Supervised, unsupervised and reinforcement learning differ in the type of feedback the agent receives.
• What representation is used for the components? E.g. logic sentences, belief networks, utility functions, etc.
• What prior information (knowledge) is available?
Performance element components
Possible components that can be improved:
• Direct mapping from states to actions
• Means to infer world properties from percept sequences
• Information about how the world evolves
• Information about the results of possible actions
• Utility information about the desirability of world states
• Desirability of specific actions in specific states
• Goals describing states that maximize utility
In each case, learning can be seen as learning an unknown function y = f(x)
Hypothesis space H
H: the set of hypothesis functions h considered when searching for f(x).
Consistent hypothesis: fits all the data.
If several consistent hypotheses - choose simplest one! (Occam’s razor)
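A minimal sketch of this selection rule: filter the hypothesis space down to the consistent hypotheses, then pick the simplest one. The hypothesis names and complexity scores below are hypothetical, purely for illustration.

```python
# Examples (x, f(x)); the hidden "true" function is "x is odd"
examples = [(0, 0), (1, 1), (2, 0), (3, 1)]

# Hypothesis space H: (name, assumed complexity score, function)
H = [
    ("always_zero", 1, lambda x: 0),
    ("is_odd",      2, lambda x: x % 2),
    ("lookup",      4, lambda x: {0: 0, 1: 1, 2: 0, 3: 1}.get(x, 0)),
]

def consistent(h, data):
    """A hypothesis is consistent if it fits every example."""
    return all(h(x) == y for x, y in data)

# Keep consistent hypotheses, then apply Occam's razor: lowest complexity wins
candidates = [(name, c) for name, c, h in H if consistent(h, examples)]
best = min(candidates, key=lambda t: t[1])
print(best[0])  # "is_odd": simpler than the equally consistent "lookup"
```

Note that this brute-force filtering is only feasible because H is tiny here; it also shows why an overly large H is inefficient to search.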
Realizability of learning problem:
• Realizable if H contains the “true” function
• Unrealizable if not
• We do not normally know what the true function is
Why not choose H as large as possible? May be very inefficient in learning and in applying.
Types of learning
Knowledge
Inductive learning:
• Given a collection of examples (x, f (x))
• Return a function h that approximates f
• Does not rely on prior knowledge (“just data”)
Deductive (or analytical) learning:
• Going from a known general f to a new f′ that is logically entailed
• Based on prior knowledge (“data + knowledge”)
• Resembles human learning more closely
Feedback
Unsupervised learning:
Agent learns patterns in data even though no feedback is given, e.g. via clustering
Reinforcement learning:
Agent gets reward or punishment at the end, but is not told which particular action led to the result
Supervised learning:
Agent receives learning examples and is explicitly told what the correct answer is for each case
Mixed modes, e.g. semi-supervised learning: Correct answers for some but not all examples
Learning decision trees
A decision situation can be described by:
• A number of attributes, each with a set of possible values
• A decision which may be Boolean (yes/no) or multivalued
A decision tree is a tree structure where:
- Each internal node represents a test of the value of an attribute, with one branch for each possible attribute value
- Each leaf node represents the value of the decision if that node is reached
Decision tree learning is one of the simplest and most successful forms of machine learning. It is an example of inductive, supervised learning.
Expressiveness of decision trees
The tree is equivalent to a conjunction of implications
∀r Patrons(r, Full) ∧ WaitEstimate(r, 10–30) ∧ Hungry(r, No) ⇒ WillWait(r)
• Restricted to testing attributes of a single object; cannot represent tests on two or more objects
• Fully expressive as a propositional language, i.e. any Boolean function can be written as a decision tree
• For some functions, exponentially large decision trees are required, i.e. decision trees are good for some functions and bad for others
Inducing decision trees from examples
Terminology:
• Example - Specific values for all attributes, plus goal predicate
• Classification - Value of goal predicate of the example
• Positive/negative example - Goal predicate is true/false
• Training set - Complete set of examples
The task of inducing a decision tree from a training set is to find the simplest tree that agrees with the examples. The resulting tree should be more compact and general than the training set itself.
Decision trees: Induction algorithm
General idea
Test the most important attribute first, i.e. the one that makes the most difference to the classification.
Example: Patrons? is a good choice for the first attribute, because it allows early decisions. Apply same principle recursively.
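One common way to quantify "most important" (an assumption here; the notes do not name a measure) is information gain: the reduction in entropy of the classification after splitting on the attribute. The toy data below is modelled loosely on the restaurant example.

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Entropy of a list of Boolean classifications."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(examples, attr):
    """examples: list of (attribute-dict, classification) pairs."""
    labels = [y for _, y in examples]
    remainder = 0.0
    for value in {x[attr] for x, _ in examples}:
        subset = [y for x, y in examples if x[attr] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(labels) - remainder

# Toy data: Patrons separates the classes perfectly, Hungry does not
data = [({"Patrons": "None", "Hungry": "Yes"}, False),
        ({"Patrons": "Full", "Hungry": "Yes"}, False),
        ({"Patrons": "Some", "Hungry": "Yes"}, True),
        ({"Patrons": "Some", "Hungry": "No"},  True)]
print(information_gain(data, "Patrons") > information_gain(data, "Hungry"))  # True
```

Here Patrons yields gain 1.0 (each branch becomes pure), so it is tested first, mirroring the early decisions mentioned above.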
Recursive step of induction algorithm
The attribute test splits the tree into smaller decision trees, with fewer examples and one attribute less. Four cases to consider for the smaller trees:
• If some positive and some negative examples, choose best attribute to split them
• If examples are all positive (negative), answer Yes (No)
• If no examples left, return a default value (no example observed for this case)
• If no attributes left, but both positive and negative examples: Problem! (same description, different classifications - noise)
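The four cases above can be sketched as a recursive function. To keep the sketch short, choose_attribute is simplified to taking the first remaining attribute; a real learner would pick the most important one. All names below are illustrative, not from the source.

```python
def plurality(examples):
    """Majority classification, used as the default and for the noise case."""
    labels = [y for _, y in examples]
    return labels.count(True) >= labels.count(False)

def learn_tree(examples, attributes, default):
    if not examples:                      # case: no examples left
        return default
    labels = [y for _, y in examples]
    if all(labels) or not any(labels):    # case: all positive or all negative
        return labels[0]
    if not attributes:                    # case: noise (mixed labels, no attributes)
        return plurality(examples)
    # case: mixed examples, attributes remain -> split (here: first attribute)
    attr, rest = attributes[0], attributes[1:]
    branches = {}
    for value in {x[attr] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[attr] == value]
        branches[value] = learn_tree(subset, rest, plurality(examples))
    return (attr, branches)              # internal node: (attribute, {value: subtree})

def classify(tree, x):
    while isinstance(tree, tuple):       # descend until a leaf (a Boolean) is reached
        attr, branches = tree
        tree = branches[x[attr]]
    return tree

data = [({"Patrons": "Some", "Hungry": "Yes"}, True),
        ({"Patrons": "None", "Hungry": "Yes"}, False),
        ({"Patrons": "Full", "Hungry": "Yes"}, False),
        ({"Patrons": "Full", "Hungry": "No"},  False)]
tree = learn_tree(data, ["Patrons", "Hungry"], True)
print(classify(tree, {"Patrons": "Some", "Hungry": "No"}))  # True
```

Each recursive call works on fewer examples and one fewer attribute, so the recursion terminates in one of the four cases.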
[Image: The induced tree is simpler than the original “manual” tree. It captures some regularities that the original creator was unaware of.]
Broaden applicability of decision trees
Missing data: How to handle training samples with partially missing attribute values
Multi/many-valued attributes: How to treat attributes with many possible values
Continuous or integer-valued input attributes: How to branch the decision tree when attribute has a continuous value range
Continuous-valued output attributes: Requires regression tree rather than a decision tree, i.e. output value is a linear function of input variables rather than a point value
Assessing learning performance
1. Collect a large set of examples.
2. Divide into two disjoint sets: a training set and a test set.
3. Apply the learning algorithm to the training set to generate a hypothesis h.
4. Measure the percentage of examples in the test set that are correctly classified by h.
5. Repeat the steps above for differently sized training sets.
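The steps above can be sketched as follows. The learner here is a stand-in majority-class predictor (purely illustrative, so the split-and-score methodology is the focus, not the hypothesis itself).

```python
import random

def learn(training_set):
    """Stand-in learner: h predicts the majority class of the training set."""
    labels = [y for _, y in training_set]
    majority = labels.count(True) >= labels.count(False)
    return lambda x: majority

def accuracy(h, test_set):
    """Fraction of test examples correctly classified by h."""
    return sum(h(x) == y for x, y in test_set) / len(test_set)

random.seed(0)                                  # reproducible shuffle
examples = [({"a": i}, i % 3 == 0) for i in range(100)]
random.shuffle(examples)

for m in (10, 30, 60):                          # differently sized training sets
    train, test = examples[:m], examples[m:]    # disjoint split
    h = learn(train)
    print(m, round(accuracy(h, test), 2))
```

Plotting accuracy against training-set size gives a learning curve; for a real learner it should rise as m grows.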