5 - Supervised Learning, Classification Flashcards by Jennifer Schuster

Classification: Assign instances to predefined classes

Data basis:

several (independent) attributes
one (dependent) attribute

Condition:
- a priori knowledge of classification for some instances (supervised learning)

Model building:

generate rules from classified instances
> first: generate the best fit
> then: prune based on validation set

Generalization:
- apply rules to new instances

How well did you know this?

Not at all

Perfectly

Classification: Assign instances tp predefined classes

Exemplary methods

logistic regression
support vector machines
decision trees, regression trees
random forest
neural networks, nearest neighbor

How well did you know this?

Not at all

Perfectly

Classification Examples

Can you think of binary vs. nominal classes for these examples?

credit scoring
marketing responses
geo-temporal events

Credit scoring:
- nominal

Marketing responses:
- binary: response vs. no response

Geo-Temporal events:
- can be both

How well did you know this?

Not at all

Perfectly

Decision Tree Terminology

Which types of trees are there?

Binary tree:
- each node splits data at most in 2 sets

Classification tree:
- splits can lead to more than 2 branches

Decision tree:
- classes are nominal (categorical) or ordinal

Regression tree:
- classes are cardinal (continuous values)

How well did you know this?

Not at all

Perfectly

Decision Tree Terminology

Input

instance pool ((x1, …, xn), c)

with
x = set of independent attributes
c = class attribute

How well did you know this?

Not at all

Perfectly

Decision Tree Terminology

Output

Full tree

How well did you know this?

Not at all

Perfectly

Decision Tree Terminology

Objective

Formulate rules of the type:

If (condition 1) AND … AND (condition n) then c

How well did you know this?

Not at all

Perfectly

Decision Tree Terminology

Rule

Path from root to leaf

How well did you know this?

Not at all

Perfectly

Generating a decision tree

Algorithm steps

Start: all objects are in a single node
search for the best classification criterion
classify all objects according to this criterion
Recursively apply steps 2 and 3 until stop
Go back and prune the tree

How well did you know this?

Not at all

Perfectly

Generating a decision tree

Algorithm design varies in …

stop criteria: e.g. number of instances per class, tree depths, homogeneity measurements (“Gini Index”)
pruning strategy
choice of attributes as classification criterion (split quality)
number of splits per node
scales of measurement

How well did you know this?

Not at all

Perfectly

Which decision tree algorithms are there?

(CH)AID
- (chi-squared) automatic interaction detection

CART
- classification and regression trees

ID3
- iterative dichotomizer

How well did you know this?

Not at all

Perfectly

Which decision tree algorithms are there?

(CH)AID

(chi-squared) automatic interaction detection

objective: find significantly different subsets of data
select attributes that generate significantly different subsets

How well did you know this?

Not at all

Perfectly

Which decision tree algorithms are there?

CART

Classification and regression trees

objective: maximize the information content I
select attributes that split the data with the best success quota
only binary trees

How well did you know this?

Not at all

Perfectly

Which decision tree algorithms are there?

ID3

Interactive dichotomizer 3

objective: minimize entropy
split on attribute that produces subsets with minimal entropy

How well did you know this?

Not at all

Perfectly

ID3: classification by entropy

Compute the entropy (i.e. the homogeneity) of a given set of instances according to the target attribute c
entropy measures the homogeneity according to c

How well did you know this?

Not at all

Perfectly

Evaluating splits by information entropy

Study These Flashcards

compute the information entropy of a split according to the weighted sum over the entropies of the new subsets
choose the split that results in the lowest entropy (-> this provides the highest information gain)

Decision tree pruning

What is pruning?

Study These Flashcards

Simplify complicated decision trees to increase efficiency and avoid over-fitting oder under-fitting

Decision tree pruning

What kinds of pruning are there?

Study These Flashcards

Top-down-pruning:
-> stopping criteria when building trees

Bottom-up-pruning:

> ex post pruning
pruning of splits that do not increase subset homogeneity sufficiently
pruning to undo over-fitting based on validation set: prune tree parts that do not significantly increase the success quota

Decision tree quality

Which indicators do you use to evaluate decision tree quality?

Study These Flashcards

number of leaves
- number of generated rules (can be too many or too few)

depths of tree
- maximum rule length

external path length
- sum of all path lengths from root to leaf, determines memory requirements

weighted external path length
- sum of path lengths from root to leaf multiplied by the number of instances represented, measures classification costs

Decision trees: conclusion

Study These Flashcards

rules are easily understood and interpreted, relatively easy to determine
can model non-linear relationships
rule set can become very complex, therefore careful pruning is important

Random forests

What are random forests?

Study These Flashcards

Random forests are an example for approaches that generate several randomized instances of a model and classify data based on the aggregated results from this model set

Random forests

How do random forests work?

Study These Flashcards

Generation: Generate k trees by

drawing a number of instances (with replacement) to generate a specific training set
generate the model based on the specific training set

Generalisation: For each instance in the test set

classify the instance according to each of the k trees
assign the most frequently determined class (or assign probabilities)

Gradient boosted trees

Steps

Study These Flashcards

Initialise a prediction regression model with a constant value
compute so-called pseudo-residuals
extend the model by creating a regression tree to predict the pseudo residuals
- for any region defined by a rule, the tree applies a constant set of coefficient to predict the outcome
- classical boosting: find a multiplication to adjust the coefficients in order to minimize the loss function
- TreeBoost algorithm: one multiplicator per region
apply the model and repeat from step 2 for M iterations

Support Vector Machines

Study These Flashcards

build a linear discriminant function to separate two classes as widely as possible
critical boundary instances are termed support vectors

Neural Networks What are neural networks?

- neural networks are a tool of "artificial intelligence", as they imitate some concepts of the brain Neural networks connect several simple models in a hierarchical structure - the simple models are termed units, perceptrons, or artificial neurons - neurons are massively interconnected, decomposing problems and forwarding and altering signals

Neural Networks Structure

The structure of the network and the weights can be learned via backpropagation - modify weights based on the unit's contribution to accurate solutions We differentiate neural networks also by their number of layers involved

5 - Supervised Learning, Classification Flashcards

(26 cards)