Module 2 Flashcards

1
Q

Nearest neighbour classifier

A
  • assign an instance the class label of its nearest training instance
  • non-parametric model
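
A minimal Python sketch, assuming numeric feature vectors and Euclidean distance (the names below are illustrative):

    import math

    def euclidean(a, b):
        # straight-line distance between two numeric feature vectors
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def nearest_neighbour_classify(x, training_set):
        # training_set: list of (feature_vector, class_label) pairs
        # pick the single closest training instance and return its label
        _, label = min(training_set, key=lambda pair: euclidean(x, pair[0]))
        return label

    train = [([1.0, 1.0], "A"), ([5.0, 5.0], "B")]
    print(nearest_neighbour_classify([1.5, 0.8], train))  # -> A
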
2
Q

One nearest neighbour cons

A
  • sensitive to noise
  • overfits the training data

3
Q

Increasing k will make the classifier

A
  • have a smoother decision boundary (higher bias)
  • be less sensitive to the training data (lower variance)

4
Q

Weighted k-NN

A
  • assign a weight to each neighbour (based on how close they are)
  • sum the weights per class in neighbourhood (assign to class with largest sum)
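
A sketch of the voting step, assuming inverse-distance weights (one common choice, not necessarily the one used in the module):

    import math
    from collections import defaultdict

    def weighted_knn_classify(x, training_set, k=3, eps=1e-9):
        # training_set: list of (feature_vector, class_label) pairs
        dist = lambda a, b: math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
        # the k closest training instances form the neighbourhood
        neighbours = sorted(training_set, key=lambda pair: dist(x, pair[0]))[:k]
        # each neighbour votes with a weight based on how close it is
        votes = defaultdict(float)
        for features, label in neighbours:
            votes[label] += 1.0 / (dist(x, features) + eps)
        # assign to the class with the largest sum of weights
        return max(votes, key=votes.get)
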
5
Q

k-NN pros

A
  • robust to noisy data (for k > 1, the neighbours' votes average out noise)
6
Q

k-NN cons

A
  • slow for large datasets (prediction requires computing the distance to every training instance)
7
Q

k-NN regression

A

Compute the mean target value across the k nearest neighbours
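
A sketch, again assuming numeric features and Euclidean distance:

    def knn_regress(x, training_set, k=3):
        # training_set: list of (feature_vector, target_value) pairs
        dist = lambda a, b: sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5
        neighbours = sorted(training_set, key=lambda pair: dist(x, pair[0]))[:k]
        # predict the mean target value of the k nearest neighbours
        return sum(target for _, target in neighbours) / len(neighbours)
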

8
Q

Locally weighted regression

A
  • distance-weighted k-NN for regression
  • compute the weighted mean value across the k nearest neighbours
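
A sketch of the weighted variant, assuming the same inverse-distance weights as in card 4:

    def locally_weighted_regress(x, training_set, k=3, eps=1e-9):
        # training_set: list of (feature_vector, target_value) pairs
        dist = lambda a, b: sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5
        neighbours = sorted(training_set, key=lambda pair: dist(x, pair[0]))[:k]
        weights = [1.0 / (dist(x, f) + eps) for f, _ in neighbours]
        # weighted mean: closer neighbours pull the prediction harder
        total = sum(w * target for w, (_, target) in zip(weights, neighbours))
        return total / sum(weights)
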

9
Q

Decision Tree learning

A
  • search for an “optimal” splitting rule
  • split the dataset according to that rule
  • repeat steps 1 & 2 on each subset produced by the split
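
A sketch of the recursive structure; best_split is a hypothetical helper (e.g. picking the rule with the highest information gain, cards 10-11), and stopping conditions other than a pure subset are omitted:

    def build_tree(examples, best_split):
        # examples: list of (feature_vector, class_label) pairs
        labels = {label for _, label in examples}
        if len(labels) == 1:                          # pure subset: make a leaf
            return ("leaf", labels.pop())
        rule = best_split(examples)                   # 1. search for an "optimal" splitting rule
        left = [e for e in examples if rule(e[0])]    # 2. split the dataset
        right = [e for e in examples if not rule(e[0])]
        return ("node", rule,                         # 3. repeat 1 & 2 on each subset
                build_tree(left, best_split),
                build_tree(right, best_split))
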
10
Q

Entropy

A

A measure of the uncertainty of a random variable
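
For class proportions p_i, entropy is -Σ p_i log2(p_i); a direct Python translation:

    import math
    from collections import Counter

    def entropy(labels):
        # H = -sum(p * log2(p)) over the class proportions p
        counts = Counter(labels)
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in counts.values())

    print(entropy(["yes", "yes", "no", "no"]))    # 1.0 bit: maximum uncertainty
    print(entropy(["yes", "yes", "yes", "yes"]))  # 0.0: no uncertainty
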

11
Q

Information Gain

A

Difference between the initial entropy and the (weighted) average entropy of the produced subsets
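
Reusing the entropy sketch from the previous card:

    def information_gain(parent_labels, subsets):
        # subsets: the label lists of the subsets produced by the split
        n = len(parent_labels)
        weighted_child_entropy = sum(len(s) / n * entropy(s) for s in subsets)
        return entropy(parent_labels) - weighted_child_entropy

    # splitting into two pure halves recovers the full 1 bit of entropy
    print(information_gain(["yes", "yes", "no", "no"],
                           [["yes", "yes"], ["no", "no"]]))  # 1.0
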

12
Q

Ordered values

A
  • for each feature, sort its values
  • consider only split points that lie between two examples with different class labels
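
A sketch for one numeric feature; taking the midpoint between the two examples is an assumption here:

    def candidate_split_points(values, labels):
        # sort the feature's values, carrying the class labels along
        pairs = sorted(zip(values, labels))
        points = []
        for (v1, l1), (v2, l2) in zip(pairs, pairs[1:]):
            # only consider a threshold between two examples with different labels
            if l1 != l2 and v1 != v2:
                points.append((v1 + v2) / 2)
        return points

    print(candidate_split_points([1.0, 2.0, 3.0, 4.0],
                                 ["no", "no", "yes", "yes"]))  # [2.5]
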

13
Q

Categorical/Symbolic values

A
  • find the most informative feature
  • create as many branches as there are distinct values of that feature
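
Choosing the "most informative feature" would use information gain (card 11); the sketch below only shows the multi-way branching step, one branch per observed value:

    from collections import defaultdict

    def split_on_feature(examples, feature_index):
        # examples: list of (feature_vector, class_label) pairs
        # one branch (subset) per distinct value of the chosen feature
        branches = defaultdict(list)
        for features, label in examples:
            branches[features[feature_index]].append((features, label))
        return dict(branches)

    data = [(["sunny", "hot"], "no"), (["rain", "mild"], "yes"), (["sunny", "mild"], "yes")]
    print(split_on_feature(data, 0).keys())  # dict_keys(['sunny', 'rain'])
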

14
Q

Pruning

A
  • target all nodes that are connected only to leaf nodes
  • turn each into a leaf node
  • repeat until all such nodes have been tested
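
A sketch of the bottom-up pass over the tuple trees built in card 9; make_leaf and keeps_accuracy are hypothetical stand-ins for the module's pruning criterion (e.g. majority class and validation-set accuracy):

    def prune(tree, make_leaf, keeps_accuracy):
        # tree: ("leaf", label) or ("node", rule, left, right), as built above
        if tree[0] == "leaf":
            return tree
        _, rule, left, right = tree
        left = prune(left, make_leaf, keeps_accuracy)
        right = prune(right, make_leaf, keeps_accuracy)
        node = ("node", rule, left, right)
        # target nodes that are connected only to leaf nodes...
        if left[0] == "leaf" and right[0] == "leaf":
            candidate = make_leaf(node)             # hypothetical: e.g. majority class under this node
            # ...and turn each into a leaf if the test allows it
            if keeps_accuracy(candidate, node):     # hypothetical: e.g. compare validation error
                return candidate
        return node
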
15
Q

Random forests

A
  • use many decision trees
  • each tree is built from a random sample of the training set and a random subset of the features
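
A sketch of how each tree gets its randomness, reading the card as one feature subset per tree; train_tree is a hypothetical tree builder:

    import random

    def random_forest(examples, n_features, n_trees=100):
        subset_size = max(1, int(n_features ** 0.5))  # a common default for the feature-subset size
        forest = []
        for _ in range(n_trees):
            # random sample of the training set (bootstrap: sampled with replacement)
            sample = [random.choice(examples) for _ in range(len(examples))]
            # random subset of the features available to this tree
            feature_subset = random.sample(range(n_features), subset_size)
            forest.append(train_tree(sample, feature_subset))  # hypothetical tree builder
        return forest
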

16
Q

Regression trees

A
  • remove the class label
  • predict a real-valued number
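
The only change to the tree sketch from card 9 is what a leaf stores, e.g.:

    def regression_leaf(examples):
        # examples: list of (feature_vector, target_value) pairs reaching this leaf
        targets = [target for _, target in examples]
        # instead of a class label, the leaf predicts the mean target value
        return ("leaf", sum(targets) / len(targets))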