Classification Flashcards
What is a feature space
a coordinate space used to represent the input examples for a given problem, with one coordinate for each descriptive feature
Eager Learning Classification Strategy
- Classifier builds a full model during an initial training phase, to use later when new query examples arrive
- more offline setup work, less work at run-time
- generalise before seeing the query example
Lazy Learning Classification Strategy
- Classifier keeps all the training examples for later use.
- Little work is done offline, wait for new query examples.
- Focuses on the local space around each query example
What learning strategy does KNN Classifier use and how?
Lazy. k-NN uses a distance function to identify the k training examples most similar to the query (their labels are already known) and predicts the query's class from those labels
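A minimal sketch of the idea in Python (not any particular library's API), assuming numeric features, Euclidean distance and an unweighted majority vote; the data and k value are made up for illustration:

```python
import math
from collections import Counter

def euclidean(p, q):
    """Straight-line distance between two numeric feature vectors."""
    return math.sqrt(sum((qf - pf) ** 2 for pf, qf in zip(p, q)))

def knn_predict(train, query, k=3):
    """train: list of (features, label) pairs.
    Returns the majority label among the k training examples closest to the query."""
    neighbours = sorted(train, key=lambda ex: euclidean(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Toy, made-up data: two clusters labelled "A" and "B"
train = [((1.0, 1.0), "A"), ((1.2, 0.9), "A"), ((5.0, 5.1), "B"), ((4.8, 5.3), "B")]
print(knn_predict(train, (1.1, 1.0)))  # "A" - most of its 3 nearest neighbours are "A"
```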
How does Weighted kNN differ from the regular model
Weighted voting, closer neighbours get higher votes.
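One common weighting scheme (an assumption here, not the only option) is inverse-distance voting, which only changes the voting step of the sketch above:

```python
import math

def weighted_knn_predict(train, query, k=3, eps=1e-9):
    """Each of the k nearest neighbours votes with weight 1 / distance,
    so closer neighbours have more influence than distant ones."""
    dist = lambda p, q: math.sqrt(sum((qf - pf) ** 2 for pf, qf in zip(p, q)))
    neighbours = sorted(train, key=lambda ex: dist(ex[0], query))[:k]
    votes = {}
    for features, label in neighbours:
        votes[label] = votes.get(label, 0.0) + 1.0 / (dist(features, query) + eps)
    return max(votes, key=votes.get)
```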
Is there a “best” distance measure
No, the choice of distance measure is highly problem-dependent
What is the difference between a local distance function (LDF) and a global distance function (GDF)
LDFs measure the distance between two examples based on a single feature, whereas GDFs are based on the combination of the local distances across all features
Define the overlap function (measuring distance)
Returns 0 if the two values for a feature are equal and 1 otherwise
Define Hamming Distance (measuring distance)
GDF which is the sum of the overlap distances across all features
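A small sketch of both functions over categorical feature vectors (the toy values are made up):

```python
def overlap(a, b):
    """Local distance for a single categorical feature: 0 if equal, 1 otherwise."""
    return 0 if a == b else 1

def hamming(p, q):
    """Global distance: sum of the overlap distances across all features."""
    return sum(overlap(a, b) for a, b in zip(p, q))

print(hamming(("red", "small", "round"), ("red", "large", "round")))  # 1: they differ on one feature
```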
Define Absolute Difference (measuring distance)
Absolute value of the difference between values for a feature or several features
Define Absolute Difference for ordinal features (measuring distance)
Calculate the absolute value of the difference between the two positions in the ordered list of possible values
Define Euclidean Distance, and give the formula
- “Straight line” distance between two points in a feature space
- calculated as the square root of the sum, over every feature f, of the squared difference between the two examples p and q
ED(p, q) = SQRT( SUM_f (q_f - p_f)^2 )
What are Heterogeneous (Diverse) Distance Functions
GDF created from different local distance functions, using an appropriate function for each feature
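A minimal sketch of a heterogeneous GDF, which also illustrates the absolute-difference cards above; the feature layout (one categorical, one ordinal, one numeric) and the ordinal scale are assumptions for illustration:

```python
SIZE_ORDER = ["small", "medium", "large"]  # assumed ordinal scale

def overlap(a, b):
    """Categorical feature: 0 if equal, 1 otherwise."""
    return 0 if a == b else 1

def ordinal_abs_diff(a, b, order=SIZE_ORDER):
    """Ordinal feature: absolute difference of positions in the ordered list."""
    return abs(order.index(a) - order.index(b))

def abs_diff(a, b):
    """Numeric feature: absolute value of the difference."""
    return abs(a - b)

def heterogeneous_distance(p, q):
    """Global distance built by applying an appropriate local function per feature."""
    local_fns = [overlap, ordinal_abs_diff, abs_diff]
    return sum(fn(a, b) for fn, a, b in zip(local_fns, p, q))

print(heterogeneous_distance(("red", "small", 2.0), ("blue", "large", 3.5)))  # 1 + 2 + 1.5 = 4.5
```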
Min-max normalization formula
z_i = (x_i - min(x)) / (max(x) - min(x))
Standard Normalisation Formula
z_i = (x_i - μ) / σ
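A small sketch of both normalisation formulas over a plain list of toy values:

```python
import statistics

def min_max_normalise(xs):
    """Rescale values to [0, 1]: z_i = (x_i - min(x)) / (max(x) - min(x))."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def standard_normalise(xs):
    """Z-score normalisation: z_i = (x_i - mean) / standard deviation."""
    mu = statistics.mean(xs)
    sigma = statistics.pstdev(xs)  # population standard deviation
    return [(x - mu) / sigma for x in xs]

values = [2.0, 4.0, 6.0, 8.0]
print(min_max_normalise(values))   # [0.0, 0.333..., 0.666..., 1.0]
print(standard_normalise(values))  # symmetric values with zero mean and unit variance
```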
Advantages of kNN
- little training time
- interpretability and explainability
- transparency
Disadvantages of kNN
- need to carefully customise the distance function
- query time can be high: it grows with the size of the training set and the complexity of the distance function
Decision Tree Algorithm
- All training examples in root node
- Examples are split by one feature into child nodes
- The process repeats for each child node
- Continue until each leaf node contains examples of a single class
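A minimal recursive sketch of that procedure for categorical features; the split-selection step here is a naive stand-in (take the next available feature) rather than the usual information-gain choice covered by the cards below, and the data are made up:

```python
from collections import Counter

def build_tree(examples, features):
    """examples: list of (feature_dict, label); features: list of feature names.
    Returns a nested dict tree, or a class label at a leaf."""
    labels = [label for _, label in examples]
    if len(set(labels)) == 1:                 # all examples share one class: leaf
        return labels[0]
    if not features:                          # no features left: majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    split = features[0]                       # naive stand-in for split selection
    tree = {split: {}}
    for value in {ex[split] for ex, _ in examples}:
        subset = [(ex, lab) for ex, lab in examples if ex[split] == value]
        remaining = [f for f in features if f != split]
        tree[split][value] = build_tree(subset, remaining)
    return tree

data = [({"outlook": "sunny", "windy": "no"}, "play"),
        ({"outlook": "sunny", "windy": "yes"}, "stay"),
        ({"outlook": "rain", "windy": "no"}, "stay")]
# Nested dict: splits on 'outlook' first, then on 'windy' for the sunny branch
print(build_tree(data, ["outlook", "windy"]))
```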
Entropy formula
H(X) = - SUM_x p(x) log p(x)
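A quick numeric check of the formula, assuming log base 2 (a common but not mandatory choice):

```python
import math
from collections import Counter

def entropy(labels):
    """H(X) = -SUM_x p(x) log2 p(x), computed from a list of class labels."""
    counts, total = Counter(labels), len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

print(entropy(["yes", "yes", "no", "no"]))    # 1.0: maximum entropy for two classes
print(entropy(["yes", "yes", "yes", "yes"]))  # 0.0 for a pure set (Python may display -0.0)
```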
What does Information Gain measure and give the formula
measures the reduction in entropy when a feature is used to split a set into subsets
IG for feature A that splits a set of examples S into {S1,…Sm}:
IG(S, A) = (original entropy) - (entropy after split)
IG(S, A) = H(S) - SUM_{i=1..m} (|S_i| / |S|) H(S_i)
Steps of computing IG for each feature in a dataset
- calculate overall dataset entropy
- calculate the weighted entropy of the subsets produced by splitting on each feature
- calculate IG for each feature
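A sketch of those three steps on a made-up toy dataset; the entropy() helper repeats the formula above so the block stands alone:

```python
import math
from collections import Counter

def entropy(labels):
    counts, total = Counter(labels), len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples, feature):
    """IG(S, A) = H(S) - SUM_i (|S_i| / |S|) H(S_i), where the S_i are the
    subsets produced by splitting the examples on the given feature."""
    labels = [label for _, label in examples]
    overall = entropy(labels)                          # step 1: overall dataset entropy
    remainder = 0.0
    for value in {ex[feature] for ex, _ in examples}:  # step 2: weighted entropy after the split
        subset = [lab for ex, lab in examples if ex[feature] == value]
        remainder += (len(subset) / len(examples)) * entropy(subset)
    return overall - remainder                         # step 3: the reduction in entropy

data = [({"outlook": "sunny"}, "stay"), ({"outlook": "sunny"}, "stay"),
        ({"outlook": "rain"}, "play"), ({"outlook": "rain"}, "play")]
print(information_gain(data, "outlook"))  # 1.0: this split separates the classes perfectly
```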
Why do ensembles work?
When the average probability of an individual being correct is > 50%, the chance of the ensemble of them reaching the correct decision increases as more members are added.
This holds only if the diversity of the ensemble continues to grow as well.
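A small check of that claim under the usual simplifying assumption that the members vote independently, each correct with probability p (the values below are made up):

```python
from math import comb

def majority_correct(p, n):
    """Probability that a majority of n independent members (each correct with
    probability p) votes for the correct class; n is assumed odd to avoid ties."""
    k_needed = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_needed, n + 1))

for n in (1, 5, 11, 51):
    print(n, round(majority_correct(0.6, n), 3))  # the probability rises towards 1 as n grows
```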
What is the key idea of Bagging
Train classifiers on different subsets of the training data
What is Bootstrap aggregation, what classifiers does it work better for
Bagging technique in which each classifier's training subset is drawn by randomly sampling the training data with replacement
Works better for "unstable" classifiers, e.g. decision trees and neural networks
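A sketch of bagging with bootstrap samples; the base learner (a 1-NN on a single numeric feature) and the data are illustrative stand-ins:

```python
import random
from collections import Counter

def bootstrap_sample(examples, rng):
    """Sample len(examples) items uniformly with replacement."""
    return [rng.choice(examples) for _ in range(len(examples))]

def bagging_predict(examples, train_fn, query, n_models=5, seed=0):
    """Train n_models classifiers on different bootstrap samples and
    combine their predictions by majority vote."""
    rng = random.Random(seed)
    models = [train_fn(bootstrap_sample(examples, rng)) for _ in range(n_models)]
    votes = Counter(model(query) for model in models)
    return votes.most_common(1)[0][0]

# Toy base learner: 1-NN on a single numeric feature
def train_1nn(sample):
    return lambda q: min(sample, key=lambda ex: abs(ex[0] - q))[1]

data = [(1.0, "A"), (1.5, "A"), (5.0, "B"), (5.5, "B")]
print(bagging_predict(data, train_1nn, 1.2))  # most likely "A"
```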
What is the key idea of Random Subspacing, and what does it encourage?
Train n base classifiers, each on a different subset of features.
Encourages diversity in the ensemble
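A sketch of drawing random feature subspaces; the feature names, subset size and number of base classifiers are made up:

```python
import random

def random_subspace(feature_names, n_keep, rng):
    """Pick a random subset of the features for one base classifier."""
    return rng.sample(feature_names, n_keep)

def project(example, kept):
    """Keep only the selected features of an example."""
    return {f: example[f] for f in kept}

rng = random.Random(0)
features = ["outlook", "humidity", "windy", "temp"]
subspaces = [random_subspace(features, 2, rng) for _ in range(3)]
print(subspaces)  # three randomly chosen 2-feature subspaces
# Each base classifier is trained on project(example, subspace) versions of the
# training data, which encourages diversity across the ensemble.
```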
When would you choose weighted voting for your ensembles
When the individual classifiers do not give equal performance, we should give more influence to the better classifiers
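A sketch of weighted voting in which each classifier's vote is scaled by a performance estimate such as validation accuracy (the weights below are made up):

```python
def weighted_vote(predictions_with_weights):
    """predictions_with_weights: list of (predicted_label, weight) pairs.
    Returns the label with the largest total weight."""
    totals = {}
    for label, weight in predictions_with_weights:
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)

# Two weaker classifiers vote "A", one stronger classifier votes "B"
print(weighted_vote([("A", 0.55), ("A", 0.60), ("B", 0.90)]))  # "A": 1.15 total vs 0.90
```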
Discuss the Accuracy vs Diversity Trade-off in ensemble classification
An ideal ensemble consists of highly accurate members that nevertheless disagree with one another; we therefore face a trade-off between diversity and accuracy when constructing an ensemble of classifiers
What is the key idea in Boosting
Train classifiers sequentially, so that later classifiers are trained to better predict the examples that earlier ones performed poorly on
Give the basic approach to Boosting
- Assign an equal weight to all training examples
- Get a random sample from the training examples based on the weights.
- Train a classifier on the sample
- Increase the weights of misclassified examples, decrease the weights of correctly classified examples
- Output final model based on all classifiers (e.g. majority voting model)
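A minimal sketch of that loop; the multiplicative reweighting rule is a simple stand-in rather than a specific algorithm such as AdaBoost, and the base learner and data are toy choices:

```python
import random
from collections import Counter

def boost(examples, train_fn, n_rounds=3, seed=0):
    """Sequentially train classifiers on weighted samples, increasing the
    weights of misclassified examples after each round."""
    rng = random.Random(seed)
    weights = [1.0] * len(examples)               # start with equal weights
    models = []
    for _ in range(n_rounds):
        sample = rng.choices(examples, weights=weights, k=len(examples))
        model = train_fn(sample)
        models.append(model)
        for i, (features, label) in enumerate(examples):
            if model(features) != label:
                weights[i] *= 2.0                 # focus later rounds on this example
            else:
                weights[i] *= 0.5
    return lambda q: Counter(m(q) for m in models).most_common(1)[0][0]  # majority vote

# Toy base learner: always predict the majority class of its (weighted) sample
def train_majority(sample):
    majority = Counter(label for _, label in sample).most_common(1)[0][0]
    return lambda features: majority

data = [((1.0,), "A"), ((1.2,), "A"), ((5.0,), "B")]
ensemble = boost(data, train_majority)
print(ensemble((1.1,)))  # ensemble prediction for a new query
```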
Explain the bias-variance trade off
Bias is the error from how far the classifier's predictions are from the correct values, and variance is the error from sensitivity to small changes in the training set. There is often a trade-off between minimising the two
Discuss how ensemble generation methods affect the bias-variance tradeoff
- Bagging can often reduce the variance part of error
- Boosting can often reduce variance and bias, because it focuses on misclassified examples
- Boosting can sometimes increase error, as it is susceptible to noise, which can lead to overfitting
Which classifiers generally suffer from overfitting and which classifiers generally suffer underfitting?
- low bias but high variance classifiers tend to overfit
- high bias but low variance classifiers tend to underfit