Classification Flashcards

1
Q

What is a feature space

A

a coordinate space used to represent the input examples for a given problem, with one coordinate for each descriptive feature

2
Q

Eager Learning Classification Strategy

A
  • Classifier builds a full model during an initial training phase, to be used later when new query examples arrive
  • More offline setup work, less work at run-time
  • Generalises before seeing the query example
3
Q

Lazy Learning Classification Strategy

A
  • Classifier keeps all the training examples for later use.
  • Little work is done offline; the classifier waits for new query examples.
  • Focuses on the local space around the query example.
4
Q

What learning strategy does KNN Classifier use and how?

A

Lazy. k-NN uses a distance function to identify the k most similar labelled examples in the training set, and predicts the query's class from their labels (e.g. by majority vote)
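A minimal NumPy sketch of the idea; the toy data, function name and value of k are illustrative, not from the source:

```python
import numpy as np

def knn_predict(X_train, y_train, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest
    training examples, using Euclidean distance."""
    distances = np.sqrt(((X_train - query) ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]            # indices of the k closest examples
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]               # majority class among the neighbours

# Toy usage: two features, two classes
X_train = np.array([[1.0, 1.0], [1.2, 0.9], [8.0, 8.0], [7.5, 8.2]])
y_train = np.array(["A", "A", "B", "B"])
print(knn_predict(X_train, y_train, np.array([7.9, 7.8]), k=3))  # -> "B"
```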

5
Q

How does Weighted kNN differ from the regular model

A

Weighted voting: closer neighbours get higher votes.
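A minimal sketch of distance-weighted voting, assuming the same NumPy set-up as in the previous card; inverse-distance weights are just one common weighting choice:

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, query, k=3, eps=1e-9):
    """Each of the k nearest neighbours votes with weight 1 / distance,
    so closer neighbours have more influence on the predicted class."""
    distances = np.sqrt(((X_train - query) ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]
    votes = {}
    for i in nearest:
        w = 1.0 / (distances[i] + eps)             # eps avoids division by zero
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + w
    return max(votes, key=votes.get)               # class with the largest total weight
```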

6
Q

Is there a “best” distance measure

A

No, the choice of distance measure is highly problem-dependent

7
Q

What is the difference between a local distance function (LDF) and a global distance function (GDF)

A

LDFs measure the distance between two examples based on a single feature, whereas GDFs combine the local distances across all features

8
Q

Define the overlap function (measuring distance)

A

Returns 0 if the two values for a feature are equal and 1 otherwise

9
Q

Define Hamming Distance (measuring distance)

A

GDF which is the sum of the overlap differences across all features
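A minimal sketch of the overlap LDF and the Hamming GDF built on top of it, assuming examples are plain lists of categorical values:

```python
def overlap(a, b):
    """Local distance for one categorical feature: 0 if equal, 1 otherwise."""
    return 0 if a == b else 1

def hamming(p, q):
    """Global distance: sum of the overlap distances across all features."""
    return sum(overlap(a, b) for a, b in zip(p, q))

# e.g. two examples described by three categorical features
print(hamming(["red", "small", "round"], ["red", "large", "round"]))  # -> 1
```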

10
Q

Define Absolute Difference (measuring distance)

A

Absolute value of the difference between the two examples' values for a numeric feature (local distances of this kind can also be summed across several features)

11
Q

Define Absolute Difference for ordinal features (measuring distance)

A

calculate the absolute value of the difference between the two positions in the ordered list of possible values

12
Q

Define Euclidean Distance, and give the formula

A
  • “Straight line” distance between two points in a feature space
  • Calculated as the square root of the sum of the squared differences between the two examples p and q for each feature f

ED(p, q) = sqrt( SUM_f (q_f - p_f)^2 )
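A minimal NumPy sketch of the formula:

```python
import numpy as np

def euclidean(p, q):
    """Square root of the sum of squared per-feature differences."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sqrt(((p - q) ** 2).sum())

print(euclidean([0, 0], [3, 4]))  # -> 5.0
```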

13
Q

What are Heterogeneous (Diverse) Distance Functions

A

GDF created from different local distance functions, using an appropriate function for each feature
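A minimal sketch of a heterogeneous GDF for examples with one categorical, one numeric and one ordinal feature; the feature layout and the ordinal scale are illustrative assumptions:

```python
# Assumed feature layout: (colour: categorical, weight: numeric, size: ordinal)
SIZE_ORDER = ["small", "medium", "large"]         # ordered list of possible values

def overlap(a, b):                                # LDF for categorical features
    return 0 if a == b else 1

def abs_diff(a, b):                               # LDF for numeric features
    return abs(a - b)

def ordinal_diff(a, b, order=SIZE_ORDER):         # LDF for ordinal features:
    return abs(order.index(a) - order.index(b))   # difference of positions in the order

def heterogeneous_distance(p, q):
    """GDF: combine an appropriate local distance for each feature."""
    return (overlap(p[0], q[0])
            + abs_diff(p[1], q[1])
            + ordinal_diff(p[2], q[2]))

print(heterogeneous_distance(("red", 2.0, "small"),
                             ("blue", 3.5, "large")))  # -> 1 + 1.5 + 2 = 4.5
```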

14
Q

Min-max normalization formula

A

z_i = (x_i - min(x)) / (max(x) - min(x))
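A minimal NumPy sketch (assumes max(x) > min(x)):

```python
import numpy as np

x = np.array([2.0, 5.0, 9.0, 14.0])
z = (x - x.min()) / (x.max() - x.min())   # rescales every value into [0, 1]
print(z)                                  # -> [0.  0.25  0.583...  1.]
```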

15
Q

Standard Normalisation Formula

A

z_i = (x_i - μ) / σ
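A minimal NumPy sketch, where μ is the feature mean and σ its standard deviation:

```python
import numpy as np

x = np.array([2.0, 5.0, 9.0, 14.0])
z = (x - x.mean()) / x.std()     # standardised values: mean 0, standard deviation 1
print(z.round(2))                # -> [-1.22 -0.56  0.33  1.44]
```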

16
Q

Advantages of kNN

A
  • little training time
  • interpretability and explainability
  • transparency
17
Q

Disadvantages of kNN

A
  • need to carefully customise the distance function
  • query time can be high, since it depends on the size of the training set and the cost of the distance function
18
Q

Decision Tree Algorithm

A
  1. All training examples start in the root node
  2. Examples are split into child nodes based on the values of one feature
  3. The process repeats for each child node
  4. Continue until each leaf node contains examples of a single class (see the sketch below)
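A minimal ID3-style sketch for categorical features, splitting on the feature with the highest information gain until a node is pure; the dataset layout and helper names are illustrative assumptions, not from the source:

```python
import math
from collections import Counter

def entropy(labels):
    counts = Counter(labels).values()
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts)

def best_feature(rows, labels, features):
    """Pick the feature whose split gives the largest information gain."""
    def remainder(f):
        rem = 0.0
        for v in set(r[f] for r in rows):
            subset = [l for r, l in zip(rows, labels) if r[f] == v]
            rem += len(subset) / len(labels) * entropy(subset)
        return rem
    return min(features, key=remainder)            # smallest remainder = largest IG

def build_tree(rows, labels, features):
    if len(set(labels)) == 1:                      # leaf: all examples share one class
        return labels[0]
    if not features:                               # no features left: majority class
        return Counter(labels).most_common(1)[0][0]
    f = best_feature(rows, labels, features)
    tree = {f: {}}
    for v in set(r[f] for r in rows):              # one child node per feature value
        idx = [i for i, r in enumerate(rows) if r[f] == v]
        tree[f][v] = build_tree([rows[i] for i in idx],
                                [labels[i] for i in idx],
                                [g for g in features if g != f])
    return tree

# Toy usage: rows are dicts of categorical features
rows = [{"outlook": "sunny", "windy": "no"}, {"outlook": "sunny", "windy": "yes"},
        {"outlook": "rain",  "windy": "no"}, {"outlook": "rain",  "windy": "yes"}]
labels = ["play", "play", "play", "stay"]
print(build_tree(rows, labels, ["outlook", "windy"]))
```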
19
Q

Entropy formula

A

H(X) = - SUM_x p(x) log p(x)
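A minimal sketch computing the entropy of a set of class labels, using base-2 logs (bits), which is the usual choice in decision tree learning:

```python
import math
from collections import Counter

def entropy(labels):
    """H(X) = -sum_x p(x) * log2 p(x), where p(x) is the class proportion."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

print(entropy(["yes", "yes", "no", "no"]))      # -> 1.0 (maximally impure for 2 classes)
print(entropy(["yes", "yes", "yes", "yes"]))    # -> -0.0 (a pure set has zero entropy)
```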

20
Q

What does Information Gain measure and give the formula

A

measures the reduction in entropy when a feature is used to split a set into subsets

IG for a feature A that splits a set of examples S into subsets {S_1, …, S_m}:
IG(S, A) = (original entropy) - (entropy after split)
IG(S, A) = H(S) - SUM_{i=1..m} (|S_i| / |S|) * H(S_i)
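A minimal sketch: information gain is the parent entropy minus the size-weighted entropy of the subsets (the entropy helper is repeated so the snippet is self-contained):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent_labels, subsets):
    """IG = H(S) - sum_i |S_i|/|S| * H(S_i), where `subsets` are the label lists S_1..S_m."""
    n = len(parent_labels)
    remainder = sum(len(s) / n * entropy(s) for s in subsets)
    return entropy(parent_labels) - remainder

# Splitting ["yes","yes","no","no"] into two pure subsets removes all the entropy
print(information_gain(["yes", "yes", "no", "no"],
                       [["yes", "yes"], ["no", "no"]]))  # -> 1.0
```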

21
Q

Steps of computing IG for each feature in a dataset

A
  1. Calculate the overall entropy of the dataset
  2. Calculate the weighted entropy of the subsets produced by splitting on each feature
  3. Calculate the IG for each feature as the difference between the two (worked example below)
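A minimal worked example of the three steps on a toy dataset; the feature names and values are illustrative assumptions:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Toy dataset: two categorical features and a class label
examples = [
    {"outlook": "sunny", "windy": "no",  "class": "play"},
    {"outlook": "sunny", "windy": "yes", "class": "play"},
    {"outlook": "rain",  "windy": "no",  "class": "stay"},
    {"outlook": "rain",  "windy": "yes", "class": "stay"},
]
labels = [e["class"] for e in examples]
h_dataset = entropy(labels)                                  # step 1: overall entropy

for feature in ["outlook", "windy"]:
    rem = 0.0
    for value in set(e[feature] for e in examples):          # step 2: weighted entropy of the split
        subset = [e["class"] for e in examples if e[feature] == value]
        rem += len(subset) / len(examples) * entropy(subset)
    print(feature, "IG =", round(h_dataset - rem, 3))        # step 3: IG for each feature
# -> outlook IG = 1.0 (perfectly separates the classes), windy IG = 0.0 (uninformative)
```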
22
Q

Why do ensembles work?

A

When the average probability of an individual member being correct is > 50%, the chance of the ensemble as a whole reaching the correct decision increases as more members are added.
This holds only if the diversity of the ensemble continues to grow as well.
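A small numeric illustration, assuming independent members that are each correct with probability p = 0.6; the probability that a majority of the ensemble is correct grows as members are added:

```python
from math import comb

def majority_correct(p, n):
    """P(more than half of n independent members are correct), for odd n."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range((n // 2) + 1, n + 1))

for n in (1, 5, 11, 21):
    print(n, round(majority_correct(0.6, n), 3))
# -> majority accuracy rises with ensemble size: roughly 0.60, 0.68, 0.75, 0.83
```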

23
Q

What is the key idea of Bagging

A

Train classifiers on different subsets of the training data

24
Q

What is bootstrap aggregation, and which classifiers does it work better for?

A

A bagging technique that builds each classifier's training set by randomly sampling the original training data with replacement
Works better for “unstable” classifiers, e.g. DT, NN
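A minimal sketch of bootstrap aggregation with scikit-learn decision trees as the unstable base classifiers; X and y are assumed to be NumPy arrays with integer class labels (needed for the majority vote), and the ensemble size is arbitrary:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_classifiers=10, seed=0):
    """Train each tree on a bootstrap sample: n examples drawn with replacement."""
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(n_classifiers):
        idx = rng.choice(n, size=n, replace=True)    # bootstrap sample indices
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Aggregate by majority vote across the ensemble."""
    votes = np.array([m.predict(X) for m in models])           # shape: (n_models, n_queries)
    return np.array([np.bincount(col).argmax() for col in votes.T])
```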

25
Q

What is the key idea of Random Subspacing, and what does it encourage?

A

Train n base classifiers, each on a different random subset of the features.
Encourages diversity in the ensemble
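A minimal sketch where each base classifier sees only a random subset of the feature columns; scikit-learn trees, integer class labels and the subset size are assumptions:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def random_subspace_fit(X, y, n_classifiers=10, n_features=2, seed=0):
    """Train each classifier on a different random subset of the features."""
    rng = np.random.default_rng(seed)
    ensemble = []
    for _ in range(n_classifiers):
        cols = rng.choice(X.shape[1], size=n_features, replace=False)  # feature subset
        ensemble.append((cols, DecisionTreeClassifier().fit(X[:, cols], y)))
    return ensemble

def random_subspace_predict(ensemble, X):
    votes = np.array([model.predict(X[:, cols]) for cols, model in ensemble])
    return np.array([np.bincount(col).argmax() for col in votes.T])    # majority vote
```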

26
Q

When would you choose weighted voting for your ensembles

A

When the individual classifiers do not give equal performance, we should give more influence to the better classifiers
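A minimal sketch of weighted voting, assuming each classifier's weight comes from, e.g., its accuracy on a held-out validation set; integer class labels are assumed:

```python
import numpy as np

def weighted_vote(predictions, weights, n_classes):
    """predictions: (n_models, n_queries) integer class labels;
    weights: one weight per model, e.g. its validation accuracy."""
    n_queries = predictions.shape[1]
    scores = np.zeros((n_queries, n_classes))
    for preds, w in zip(predictions, weights):
        for q, c in enumerate(preds):
            scores[q, c] += w                      # better classifiers add larger votes
    return scores.argmax(axis=1)

# Three models, two queries: the two high-weight models outvote the low-weight one
preds = np.array([[0, 1], [0, 1], [1, 0]])
print(weighted_vote(preds, weights=[0.9, 0.8, 0.4], n_classes=2))  # -> [0 1]
```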

27
Q

Discuss the Accuracy vs Diversity Trade-off in ensemble classification

A

An ideal ensemble consists of highly accurate members that nevertheless disagree with each other; since increasing diversity tends to reduce individual accuracy, we face a trade-off between diversity and accuracy when constructing an ensemble of classifiers

28
Q

What is the key idea in Boosting

A

Train classifiers sequentially, so that later classifiers focus on the examples that earlier classifiers predicted poorly

29
Q

Give the basic approach to Boosting

A
  1. Assign an equal weight to all training examples
  2. Get a random sample from the training examples based on the weights.
  3. Train a classifier on the sample
  4. Increase the weights of misclassified examples, decrease the weights of correctly classified examples
  5. Repeat steps 2–4, then output a final model based on all the classifiers (e.g. by majority voting); see the sketch below
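A minimal sketch of the weight-update loop above (a simplification, not full AdaBoost: the update factor and the plain majority vote are assumptions), using shallow scikit-learn trees as weak learners and assuming NumPy arrays with integer class labels:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def simple_boost(X, y, n_rounds=10, factor=2.0, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    weights = np.full(n, 1.0 / n)                  # 1. equal weight for every example
    models = []
    for _ in range(n_rounds):
        idx = rng.choice(n, size=n, replace=True, p=weights)               # 2. sample by weight
        model = DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx])    # 3. train on the sample
        wrong = model.predict(X) != y
        weights[wrong] *= factor                   # 4. boost misclassified examples...
        weights[~wrong] /= factor                  #    ...and shrink correctly classified ones
        weights /= weights.sum()                   # renormalise to a probability distribution
        models.append(model)
    return models

def boost_predict(models, X):                      # 5. combine all classifiers by majority vote
    votes = np.array([m.predict(X) for m in models])
    return np.array([np.bincount(col).argmax() for col in votes.T])
```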
30
Q

Explain the bias-variance trade off

A

Bias is the error from how far the classifier's predictions are, on average, from the correct values, and variance is the error from sensitivity to small changes in the training set. There is often a trade-off between minimising the two

31
Q

Discuss how ensemble generation methods affect the bias-variance tradeoff

A
  • Bagging can often reduce the variance part of error
  • Boosting can often reduce variance and bias, because it focuses on misclassified examples
  • Boosting can sometimes increase error, as it is susceptible to noise, which can lead to overfitting
32
Q

Which classifiers generally suffer from overfitting and which classifiers generally suffer underfitting?

A
  • low bias but high variance classifiers tend to overfit
  • high bias but low variance classifiers tend to underfit