12 Learning From Examples Flashcards
General model of learning agents
Performance element:
Carries out the task of the agent, i.e. processes percepts and decides on actions
Learning element:
Proposes improvements of the performance element, based on previous knowledge and feedback
Critic:
Evaluates performance element by comparing results of its actions with imposed performance standards
Problem generator:
Proposes exploratory actions to increase knowledge
Aspects of the learning element
• Which components of the performance element are to be improved? Which parts of the agent's knowledge base are targeted?
• What feedback is available? Supervised, unsupervised and reinforcement learning differ in the type of feedback the agent receives.
• What representation is used for the components? E.g. logic sentences, belief networks, utility functions, etc.
• What prior information (knowledge) is available?
Performance element components
Possible components that can be improved:
• Direct mapping from states to actions
• Means to infer world properties from percept sequences
• Information about how the world evolves
• Information about the results of possible actions
• Utility information about the desirability of world states
• Desirability of specific actions in specific states
• Goals describing states that maximize utility
In each case, learning can be seen as learning an unknown function y = f(x)
Hypothesis space H
H: the set of hypothesis functions h considered when searching for f(x).
Consistent hypothesis: fits all the data.
If several consistent hypotheses - choose simplest one! (Occam’s razor)
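A minimal sketch of this selection rule: filter the hypothesis space down to the consistent hypotheses, then pick the simplest one. The hypothesis names and complexity scores below are hypothetical, purely for illustration.

```python
# Examples (x, f(x)); the hidden "true" function is "x is odd"
examples = [(0, 0), (1, 1), (2, 0), (3, 1)]

# Hypothesis space H: (name, assumed complexity score, function)
H = [
    ("always_zero", 1, lambda x: 0),
    ("is_odd",      2, lambda x: x % 2),
    ("lookup",      4, lambda x: {0: 0, 1: 1, 2: 0, 3: 1}.get(x, 0)),
]

def consistent(h, data):
    """A hypothesis is consistent if it fits every example."""
    return all(h(x) == y for x, y in data)

# Keep consistent hypotheses, then apply Occam's razor: lowest complexity wins
candidates = [(name, c) for name, c, h in H if consistent(h, examples)]
best = min(candidates, key=lambda t: t[1])
print(best[0])  # "is_odd": simpler than the equally consistent "lookup"
```

Note that this brute-force filtering is only feasible because H is tiny here; it also shows why an overly large H is inefficient to search.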
Realizability of learning problem:
• Realizable if H contains the “true” function
• Unrealizable if not
• We do not normally know what the true function is
Why not choose H as large as possible? May be very inefficient in learning and in applying.
Types of learning
Knowledge
Inductive learning:
• Given a collection of examples (x, f (x))
• Return a function h that approximates f
• Does not rely on prior knowledge (“just data”)
Deductive (or analytical) learning:
• Going from a known general f to a new f′ that is logically entailed
• Based on prior knowledge (“data + knowledge”)
• Resembles human learning more closely
Feedback
Unsupervised learning:
Agent learns patterns in data even though no feedback is given, e.g. via clustering
Reinforcement learning:
Agent gets reward or punishment at the end, but is not told which particular action led to the result
Supervised learning:
Agent receives learning examples and is explicitly told what the correct answer is for each case
Mixed modes, e.g. semi-supervised learning: Correct answers for some but not all examples
Learning decision trees
A decision situation can be described by:
• A number of attributes, each with a set of possible values
• A decision which may be Boolean (yes/no) or multivalued
A decision tree is a tree structure where:
- Each internal node represents a test of the value of an attribute, with one branch for each possible attribute value
- Each leaf node represents the value of the decision if that node is reached
Decision tree learning is one of the simplest and most successful forms of machine learning. It is an example of inductive, supervised learning.
Expressiveness of decision trees
The tree is equivalent to a conjunction of implications
∀r Patrons(r, Full) ∧ WaitEstimate(r, 10–30) ∧ Hungry(r, No) ⇒ WillWait(r)
• Restricted to testing attributes of a single object; cannot represent tests on two or more objects
• Fully expressive as a propositional language, i.e. any Boolean function can be written as a decision tree
• For some functions, exponentially large decision trees are required, i.e. decision trees are good for some functions and bad for others
Inducing decision trees from examples
Terminology:
• Example - Specific values for all attributes, plus goal predicate
• Classification - Value of goal predicate of the example
• Positive/negative example - Goal predicate is true/false
• Training set - Complete set of examples
The task of inducing a decision tree from a training set is to find the simplest tree that agrees with the examples. The resulting tree should be more compact and general than the training set itself.
Decision trees: Induction algorithm
General idea
Test the most important attribute first, i.e. the one that makes the most difference to the classification.
Example: Patrons? is a good choice for the first attribute, because it allows early decisions. Apply same principle recursively.
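One common way to quantify "most important" (an assumption here; the notes do not name a measure) is information gain: the reduction in entropy of the classification after splitting on the attribute. The toy data below is modelled loosely on the restaurant example.

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Entropy of a list of Boolean classifications."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(examples, attr):
    """examples: list of (attribute-dict, classification) pairs."""
    labels = [y for _, y in examples]
    remainder = 0.0
    for value in {x[attr] for x, _ in examples}:
        subset = [y for x, y in examples if x[attr] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(labels) - remainder

# Toy data: Patrons separates the classes perfectly, Hungry does not
data = [({"Patrons": "None", "Hungry": "Yes"}, False),
        ({"Patrons": "Full", "Hungry": "Yes"}, False),
        ({"Patrons": "Some", "Hungry": "Yes"}, True),
        ({"Patrons": "Some", "Hungry": "No"},  True)]
print(information_gain(data, "Patrons") > information_gain(data, "Hungry"))  # True
```

Here Patrons yields gain 1.0 (each branch becomes pure), so it is tested first, mirroring the early decisions mentioned above.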
Recursive step of induction algorithm
The attribute test splits the tree into smaller decision trees, with fewer examples and one attribute less. Four cases to consider for the smaller trees:
• If some positive and some negative examples, choose best attribute to split them
• If examples are all positive (negative), answer Yes (No)
• If no examples left, return a default value (no example observed for this case)
• If no attributes left, but both positive and negative examples: Problem! (same description, different classifications - noise)
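The four cases above can be sketched as a recursive function. To keep the sketch short, choose_attribute is simplified to taking the first remaining attribute; a real learner would pick the most important one. All names below are illustrative, not from the source.

```python
def plurality(examples):
    """Majority classification, used as the default and for the noise case."""
    labels = [y for _, y in examples]
    return labels.count(True) >= labels.count(False)

def learn_tree(examples, attributes, default):
    if not examples:                      # case: no examples left
        return default
    labels = [y for _, y in examples]
    if all(labels) or not any(labels):    # case: all positive or all negative
        return labels[0]
    if not attributes:                    # case: noise (mixed labels, no attributes)
        return plurality(examples)
    # case: mixed examples, attributes remain -> split (here: first attribute)
    attr, rest = attributes[0], attributes[1:]
    branches = {}
    for value in {x[attr] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[attr] == value]
        branches[value] = learn_tree(subset, rest, plurality(examples))
    return (attr, branches)              # internal node: (attribute, {value: subtree})

def classify(tree, x):
    while isinstance(tree, tuple):       # descend until a leaf (a Boolean) is reached
        attr, branches = tree
        tree = branches[x[attr]]
    return tree

data = [({"Patrons": "Some", "Hungry": "Yes"}, True),
        ({"Patrons": "None", "Hungry": "Yes"}, False),
        ({"Patrons": "Full", "Hungry": "Yes"}, False),
        ({"Patrons": "Full", "Hungry": "No"},  False)]
tree = learn_tree(data, ["Patrons", "Hungry"], True)
print(classify(tree, {"Patrons": "Some", "Hungry": "No"}))  # True
```

Each recursive call works on fewer examples and one fewer attribute, so the recursion terminates in one of the four cases.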
[Image: The induced tree is simpler than the original “manual” tree. It captures some regularities that the original creator was unaware of.]
Broaden applicability of decision trees
Missing data: How to handle training samples with partially missing attribute values
Multi/many-valued attributes: How to treat attributes with many possible values
Continuous or integer-valued input attributes: How to branch the decision tree when attribute has a continuous value range
Continuous-valued output attributes: Requires regression tree rather than a decision tree, i.e. output value is a linear function of input variables rather than a point value
Assessing learning performance
1. Collect a large set of examples.
2. Divide into two disjoint sets: a training set and a test set.
3. Apply the learning algorithm to the training set to generate a hypothesis h.
4. Measure the percentage of examples in the test set that are correctly classified by h.
5. Repeat the steps above for differently sized training sets.
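The steps above can be sketched as follows. The learner here is a stand-in majority-class predictor (purely illustrative, so the split-and-score methodology is the focus, not the hypothesis itself).

```python
import random

def learn(training_set):
    """Stand-in learner: h predicts the majority class of the training set."""
    labels = [y for _, y in training_set]
    majority = labels.count(True) >= labels.count(False)
    return lambda x: majority

def accuracy(h, test_set):
    """Fraction of test examples correctly classified by h."""
    return sum(h(x) == y for x, y in test_set) / len(test_set)

random.seed(0)                                  # reproducible shuffle
examples = [({"a": i}, i % 3 == 0) for i in range(100)]
random.shuffle(examples)

for m in (10, 30, 60):                          # differently sized training sets
    train, test = examples[:m], examples[m:]    # disjoint split
    h = learn(train)
    print(m, round(accuracy(h, test), 2))
```

Plotting accuracy against training-set size gives a learning curve; for a real learner it should rise as m grows.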