W05 Supervised Learning Flashcards
Classification:
data basis
several independent attributes
one dependent attribute, the class
Classification:
condition
a priori knowledge of classification for some instances (supervised learning!)
Classification:
model building
generate rules from classified instances
first: generate best fit
then: prune based on validation set
Classification:
generalization
apply rules to new instances
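A minimal sketch of this fit-then-generalize workflow, assuming scikit-learn; the dataset and split are illustrative:

```python
# Fit a model on labelled instances, then apply it to unseen instances.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)            # independent attributes X, class y
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)                  # model building from classified instances
print(model.score(X_test, y_test))           # generalization to new instances
```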
Classification:
methods
logistic regression, naive Bayes classifier, support vector machines, decision trees, random forest, neural networks, nearest neighbour
Decision Tree Terminology:
Binary Tree
each node splits the data into at most 2 subsets
Decision Tree Terminology:
Classification Tree
split can lead to >2 branches
Decision Tree Terminology:
Decision Tree
Nominal (categorical) Classes
Decision Tree Terminology:
Regression Tree
Cardinal (numeric) classes
Decision Tree Terminology:
Input
Instance pool
Decision Tree Terminology:
Output
Full Tree
Decision Tree Terminology:
Objective
Formulate rules of type:
IF condition 1 AND ... AND condition n, THEN class c
Decision Tree Terminology:
Rule
Path from root to leaf
Generating a decision tree algorithm
1. all objects in a single node
2. search for the best classification criterion
3. classify all objects accordingly
4. recursively apply steps 2+3 until a STOP criterion holds
5. prune the tree
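A toy sketch of steps 1-4 (no pruning), assuming categorical attributes; the helper names and data encoding are illustrative:

```python
from collections import Counter
import math

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_attribute(rows, labels, attributes):
    # step 2: pick the attribute whose split minimizes the weighted entropy
    def weighted_entropy(a):
        groups = {}
        for row, lab in zip(rows, labels):
            groups.setdefault(row[a], []).append(lab)
        return sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return min(attributes, key=weighted_entropy)

def build_tree(rows, labels, attributes):
    # STOP: node is pure or no attributes left -> return the majority class
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    a = best_attribute(rows, labels, attributes)
    node = {}
    for value in set(row[a] for row in rows):    # step 3: classify objects accordingly
        pairs = [(r, l) for r, l in zip(rows, labels) if r[a] == value]
        sub_rows, sub_labels = zip(*pairs)
        rest = [b for b in attributes if b != a]
        node[(a, value)] = build_tree(list(sub_rows), list(sub_labels), rest)  # step 4
    return node

rows = [{"outlook": "sunny"}, {"outlook": "rain"}, {"outlook": "rain"}]
print(build_tree(rows, ["no", "yes", "yes"], ["outlook"]))
```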
Classification algorithm variety
1. stop criteria
2. pruning strategy
3. choice of attributes as classification criterion
4. number of splits per node
5. scale of measurement
(CH)AID
chi-squared automatic interaction detection
-> select attributes that split the data into significantly different subsets
CART
classification and regression trees
-> select attributes that maximize the information content I
ID3
iterative dichotomizer 3
-> split on the attribute that minimizes the entropy of the resulting subsets
Entropy Formula
H(S) = - sum_i [ p_i * log2(p_i) ]
Information formula (expected entropy after splitting on attribute a)
I(a) = sum_i [ q_i * H(S_i) ], where q_i = |S_i| / |S|
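A quick numeric check of both formulas; the class counts and the split below are made up:

```python
import math

def H(labels):
    # H(S) = - sum_i p_i * log2(p_i)
    n = len(labels)
    return -sum(labels.count(c) / n * math.log2(labels.count(c) / n)
                for c in set(labels))

def I(subsets):
    # I(a) = sum_i q_i * H(S_i), with q_i = |S_i| / |S|
    total = sum(len(s) for s in subsets)
    return sum(len(s) / total * H(s) for s in subsets)

S = ["yes"] * 9 + ["no"] * 5
print(H(S))                                                     # ~0.940 bits
print(I([["yes"] * 6 + ["no"] * 2, ["yes"] * 3 + ["no"] * 3]))  # ~0.892 bits
```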
Decision Tree Pruning
simplify complicated decision trees to increase efficiency and avoid over-fitting
top-down pruning -> stopping criteria while building the tree
bottom-up pruning
-> ex post:
prune splits that do not increase subset homogeneity sufficiently
prune to undo over-fitting based on a validation set: remove tree parts that do not increase the success quota
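One concrete way to do ex-post (bottom-up) pruning is cost-complexity pruning in scikit-learn, choosing the pruning strength on a validation set; the dataset is illustrative:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# candidate pruning strengths for the fully grown tree
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

# keep the pruned tree with the best success quota on the validation set
best = max(
    (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_train, y_train)
     for a in path.ccp_alphas),
    key=lambda t: t.score(X_val, y_val),
)
print(best.get_n_leaves(), best.score(X_val, y_val))
```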
Decision Tree Properties:
number of generated rules
number of leaves
Decision Tree Properties
maximum rule length
depth of tree
Decision Tree Properties
sum of all path lengths from root to leaf
external path length;
determines memory requirements
Decision Tree Properties
sum of path lengths from root to leaf, each multiplied by the number of instances the leaf represents
weighted external path length;
measures classification costs
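A sketch of all four properties on a toy binary tree; the tuple encoding ("node", left, right) / ("leaf", n_instances) is hypothetical:

```python
def leaves(t):                 # number of generated rules
    return 1 if t[0] == "leaf" else leaves(t[1]) + leaves(t[2])

def depth(t):                  # maximum rule length
    return 0 if t[0] == "leaf" else 1 + max(depth(t[1]), depth(t[2]))

def external_path_length(t, d=0):    # memory requirements
    return d if t[0] == "leaf" else sum(external_path_length(c, d + 1) for c in t[1:])

def weighted_epl(t, d=0):            # classification costs
    return d * t[1] if t[0] == "leaf" else sum(weighted_epl(c, d + 1) for c in t[1:])

tree = ("node", ("leaf", 70), ("node", ("leaf", 20), ("leaf", 10)))
print(leaves(tree), depth(tree), external_path_length(tree), weighted_epl(tree))
# -> 3 2 5 130
```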
Decision Trees:
understandability?
relationships?
too complex rules?
- high understandability and interpretability
- can capture non-linear relationships
- rules can become too complex -> pruning is important
Random Forest
- several randomized instances of the model
- use aggregated results for classification
1. generate k trees by drawing k bootstrap samples with replacement
2. generalize by classifying with all k trees and choosing the most frequently predicted class
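A hand-rolled bagging sketch matching the two steps, assuming scikit-learn and numpy; k and the dataset are illustrative:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
k = 25

# step 1: k trees, each trained on a bootstrap sample (drawn with replacement)
trees = []
for _ in range(k):
    idx = rng.integers(0, len(X), size=len(X))
    trees.append(DecisionTreeClassifier(max_features="sqrt").fit(X[idx], y[idx]))

# step 2: classify with all k trees, take the most frequent class (majority vote)
votes = np.stack([t.predict(X) for t in trees])
majority = np.array([np.bincount(col).argmax() for col in votes.T])
print((majority == y).mean())
```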
Gradient Boosted Trees
1. initialize a prediction model with a constant value
2. compute pseudo-residuals
3. extend the model by fitting a regression tree to the pseudo-residuals
4. apply and repeat steps 2+3 for M iterations
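A hand-rolled sketch of the four steps for squared-error loss, where the pseudo-residuals reduce to y - F(x); the data, tree depth, and step size are illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

F = np.full_like(y, y.mean())          # step 1: constant initial model
learning_rate, M = 0.1, 100
for _ in range(M):                     # step 4: repeat for M iterations
    residuals = y - F                  # step 2: pseudo-residuals (squared loss)
    tree = DecisionTreeRegressor(max_depth=3).fit(X, residuals)   # step 3
    F += learning_rate * tree.predict(X)
print(np.mean((y - F) ** 2))           # training error shrinks as M grows
```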
Support Vector Machines
build a linear discriminant function that separates the two classes as widely as possible
critical boundary instances are termed support vectors
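A minimal linear SVM sketch, assuming scikit-learn; the blob data is illustrative:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)
clf = SVC(kernel="linear", C=1.0).fit(X, y)
print(clf.support_vectors_)    # the critical boundary instances
print(clf.predict(X[:5]))
```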
Neural Networks
imitate concepts of the brain
connect several simple models in a hierarchical structure
the simple models are perceptrons: massively interconnected, decomposing problems and forwarding partial results
backpropagation -> modify weights based on their contribution to an accurate solution
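A small feed-forward network trained with backpropagation, assuming scikit-learn's MLPClassifier; the layer sizes are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# two hidden layers of interconnected perceptron-like units;
# fit() adjusts the weights by backpropagating the prediction error
net = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0)
net.fit(X_train, y_train)
print(net.score(X_test, y_test))
```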