W05 Supervised Learning Flashcards

1
Q

Classification:

data basis

A

several independent attributes

one dependent attribute, the class

2
Q

Classification:

condition

A

a priori knowledge of classification for some instances (supervised learning!)

3
Q

Classification:

model building

A

generate rules from classified instances

first: generate best fit
then: prune based on validation set

4
Q

Classification:

generalization

A

apply rules to new instances

5
Q

Classification:

methods

A
logistic regression
naive bayes classifier
support vector machines
decision trees
random forest
neural networks
nearest neighbour
6
Q

Decision Tree Terminology:

Binary Tree

A

each node splits the data into at most 2 sets

7
Q

Decision Tree Terminology:

Classification Tree

A

split can lead to >2 branches

8
Q

Decision Tree Terminology:

Decision Tree

A

Nominal (categorical) Classes

9
Q

Decision Tree Terminology:

Regression Tree

A

Cardinal Classes

10
Q

Decision Tree Terminology:

Input

A

Instance pool

11
Q

Decision Tree Terminology:

Output

A

Full Tree

12
Q

Decision Tree Terminology:

Objective

A

Formulate rules of the form:

IF condition 1 AND … AND condition n, THEN class

13
Q

Decision Tree Terminology:

Rule

A

Path from root to leaf

14
Q

Generating a decision tree algorithm

A
1 all objects in single node
2 search for best classification criterion
3 classify all objects accordingly
4 recursively apply 2+3 until STOP
5 prune tree
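The five steps can be sketched in Python (a minimal ID3-style sketch: the function names and toy attributes are illustrative, entropy stands in for the "best classification criterion", and pruning — step 5 — is omitted):

```python
import math
from collections import Counter

def entropy(labels):
    # H(S) = - sum p_i * log2(p_i) over the class frequencies
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def build_tree(rows, attributes):
    # rows: list of (feature_dict, class_label) pairs
    labels = [label for _, label in rows]
    # STOP: node is pure, or no attributes left -> leaf with majority class
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # step 2: best criterion = attribute minimising weighted subset entropy
    def info(attr):
        subsets = {}
        for features, label in rows:
            subsets.setdefault(features[attr], []).append(label)
        return sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    best = min(attributes, key=info)
    # steps 3+4: classify all objects by the best attribute, recurse per subset
    children = {}
    for features, label in rows:
        children.setdefault(features[best], []).append((features, label))
    return (best, {v: build_tree(sub, [a for a in attributes if a != best])
                   for v, sub in children.items()})
```

A leaf is a class label; an inner node is a pair of (split attribute, subtree per attribute value), so each root-to-leaf path reads off one rule.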
15
Q

Classification algorithms: variety

A
1 stop criteria
2 pruning strategy
3 choice of attributes as classification criterion
4 number of splits per node
5 scale of measurement
16
Q

(CH)AID

A

chi squared automatic interaction detection

-> find significantly different subsets of the data, so select attributes that generate those subsets

17
Q

CART

A

classification and regression trees

-> maximize information content I, so select attributes accordingly

18
Q

ID3

A

iterative dichotomizer 3

-> minimize entropy, so split on the attribute that produces the lowest entropy

19
Q

Entropy Formula

A

H(S) = - sum_i [ p_i * log2(p_i) ]

20
Q

Information Entropy formula

A

I(a) = sum_i [ q_i * H(S_i) ]

(q_i = share of instances in subset S_i produced by splitting on attribute a)
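Both formulas can be checked numerically (a minimal sketch: `H` and `I` mirror the cards' symbols, and the example labels are made up):

```python
import math
from collections import Counter

def H(labels):
    # entropy: H(S) = - sum_i p_i * log2(p_i)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def I(subsets):
    # information of an attribute: I(a) = sum_i q_i * H(S_i),
    # where q_i is the fraction of instances landing in subset S_i
    n = sum(len(s) for s in subsets)
    return sum(len(s) / n * H(s) for s in subsets)

# a perfectly mixed two-class set has maximal entropy (1 bit):
print(H(["yes", "no", "yes", "no"]))
# a split into two pure subsets leaves zero remaining entropy:
print(I([["yes", "yes"], ["no", "no"]]))
```

Splitting on the attribute with the smallest I(a) is exactly ID3's "minimize entropy" criterion.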

21
Q

Decision Tree Pruning

A

simplify complicated decision trees to increase efficiency and avoid over-fitting

top-down pruning -> stopping criteria while building the tree

bottom-up pruning -> ex post:
prune splits that do not increase subset homogeneity sufficiently
prune to undo over-fitting based on a validation set: prune tree parts that do not increase the success quota

22
Q

Decision Tree Properties:

number of generated rules

A

number of leaves

23
Q

Decision Tree Properties

maximum rule length

A

depth of tree

24
Q

Decision Tree Properties

sum of all path lengths from root to leaf

A

external path length;

determines memory requirements

25
Q

Decision Tree Properties

sum of path lengths from root to leaf multiplied by number of represented instances

A

weighted external length;

measures classification costs

26
Q

Decision Trees:
understandability?
relationships?
too complex rules?

A
  • high understandability and interpretability
  • non-linear relationships
  • pruning important
27
Q

Random Forest

A

- several randomised instances of the model; use aggregated results for classification

1 generate k trees, each by drawing samples with replacement (bootstrap)
2 generalize by classifying with all k trees and choosing the most frequently determined class
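A minimal sketch of the two steps (a toy stump on a single numeric feature stands in for a full decision tree; all names and the data are illustrative):

```python
import random
from collections import Counter

def train_stump(rows):
    # toy base learner standing in for a full decision tree: split the
    # single numeric feature at the sample mean, predict the majority
    # class on each side
    t = sum(x for x, _ in rows) / len(rows)
    left = Counter(y for x, y in rows if x <= t)
    right = Counter(y for x, y in rows if x > t)
    lc = left.most_common(1)[0][0]
    rc = right.most_common(1)[0][0] if right else lc
    return lambda x: lc if x <= t else rc

def random_forest(rows, k, seed=0):
    rng = random.Random(seed)
    # step 1: k trees, each trained on a bootstrap sample
    # (drawing len(rows) instances with replacement)
    trees = [train_stump([rng.choice(rows) for _ in rows]) for _ in range(k)]
    # step 2: classify with all k trees, return the most frequent class
    def predict(x):
        votes = Counter(tree(x) for tree in trees)
        return votes.most_common(1)[0][0]
    return predict
```

The randomisation here comes from the bootstrap samples alone; full random forests additionally randomise the attributes considered at each split.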

28
Q

Gradient Boosted Trees

A

1 initialize a prediction model with a constant value
2 compute pseudo-residuals
3 extend the model by creating a regression tree to predict the pseudo-residuals
4 apply and repeat from 2 for M iterations
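The four steps can be sketched for squared-loss regression (a regression stump stands in for the regression tree of step 3; all names, the learning rate, and the toy data are illustrative):

```python
def fit_stump(xs, residuals):
    # regression stump standing in for a full regression tree: split at
    # the mean of x, predict the mean residual on each side
    t = sum(xs) / len(xs)
    left = [r for x, r in zip(xs, residuals) if x <= t]
    right = [r for x, r in zip(xs, residuals) if x > t]
    lv = sum(left) / len(left)
    rv = sum(right) / len(right) if right else lv
    return lambda x: lv if x <= t else rv

def gradient_boost(xs, ys, M=50, lr=0.5):
    # step 1: initialize the prediction model with a constant value
    f0 = sum(ys) / len(ys)
    preds = [f0] * len(xs)
    stumps = []
    for _ in range(M):
        # step 2: pseudo-residuals (for squared loss: target minus prediction)
        residuals = [y - p for y, p in zip(ys, preds)]
        # step 3: extend the model with a stump fitted to the residuals
        s = fit_stump(xs, residuals)
        stumps.append(s)
        # step 4: apply the extended model and repeat
        preds = [p + lr * s(x) for p, x in zip(preds, xs)]
    return lambda x: f0 + sum(lr * s(x) for s in stumps)
```

Each stump corrects part of the remaining error, so the residuals shrink geometrically over the M iterations.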

29
Q

Support Vector Machines

A

build a linear discriminant function that separates two classes as widely as possible

critical boundary instances are termed support vectors

30
Q

Neural Networks

A

imitate concepts of the brain

connect several simple models in a hierarchical structure

the simple models are perceptrons, massively interconnected, decomposing problems and forwarding them

backpropagation -> modify weights based on their contribution to the accurate solution
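A single perceptron with its classic learning rule is the smallest runnable illustration (this is the one-unit special case, not full multi-layer backpropagation; the names and the AND-gate data are illustrative):

```python
def train_perceptron(samples, epochs=20, lr=0.1):
    # single perceptron: weighted sum of inputs + step activation;
    # each weight is nudged in proportion to its input's contribution
    # to the error (perceptron learning rule)
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            out = 1 if w1 * x1 + w2 * x2 + b > 0 else 0
            err = target - out
            w1 += lr * err * x1
            w2 += lr * err * x2
            b += lr * err
    return lambda x1, x2: 1 if w1 * x1 + w2 * x2 + b > 0 else 0
```

A single perceptron can only learn linearly separable functions (like AND); the hierarchical, massively interconnected structure on this card exists precisely to go beyond that.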