Lecture 3 - Multi-class classification and regression Flashcards

1
Q

Give an example of multi-class classification

A
  1. Disease type diagnosis
  2. Topic classification
2
Q

What are the two approaches for turning a binary classifier into a multi-class classifier?

A
  1. One versus rest
  2. One versus one
3
Q

What is one-versus-rest?

A

Each classifier distinguishes between one specific class and all other classes combined. The class with the highest confidence score from its respective classifier is chosen as the final prediction.

4
Q

What is the process of learning and inference in one-versus-rest?

A

Learning: train k or k-1 separate classifiers, where k is the number of classes.
Inference: use all the classifiers and form a code word from their outputs. Next, compare the code word against all the rows of the code matrix and find the nearest row.
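A minimal Python sketch of one-versus-rest inference, assuming each trained binary classifier exposes a real-valued confidence score for "its" class; the toy scoring functions and class centres below are illustrative, not from the slides:

```python
# One-versus-rest inference sketch: each binary classifier scores
# its own class against the rest; the highest-scoring class wins.

def ovr_predict(x, classifiers):
    # classifiers: dict mapping class label -> scoring function
    scores = {label: clf(x) for label, clf in classifiers.items()}
    return max(scores, key=scores.get)

# Toy 1-D example: three classes centred at 0, 5, and 10.
classifiers = {
    "A": lambda x: -abs(x - 0),   # higher score = closer to the class centre
    "B": lambda x: -abs(x - 5),
    "C": lambda x: -abs(x - 10),
}

print(ovr_predict(4.2, classifiers))  # nearest centre is 5 -> "B"
```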

5
Q

What is one-versus-one?

A

A strategy for multi-class classification in which a separate binary classifier is trained for every possible pair of classes.

6
Q

For n classes, this results in n(n−1)/2 classifiers in the symmetric case and n(n−1) in the asymmetric case.

To which approach does this apply?

A

one versus one
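The two counts can be checked with a small sketch (the function name is illustrative):

```python
# Number of pairwise classifiers in one-versus-one for n classes:
# n(n-1)/2 when each pair shares one symmetric classifier,
# n(n-1) when each ordered pair gets its own (asymmetric) classifier.

def ovo_counts(n):
    return n * (n - 1) // 2, n * (n - 1)

print(ovo_counts(4))   # (6, 12)
print(ovo_counts(10))  # (45, 90)
```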

7
Q

What is the process of training and inference in one-versus-one?

A

Training: train a separate classifier for each pair of classes.
Inference: use all the classifiers and form a code word from their outputs. Next, compare the code word against all the rows of the code matrix and find the nearest row. Use a voting scheme when the distances are not unique.
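A minimal Python sketch of one-versus-one inference with a majority vote over the pairwise classifiers; the toy pairwise classifiers and class centres are illustrative, not from the slides:

```python
from itertools import combinations
from collections import Counter

# One-versus-one inference sketch: one binary classifier per pair of
# classes; each casts a vote and the majority class wins.

def ovo_predict(x, classes, pair_clfs):
    # pair_clfs: dict mapping (class_i, class_j) -> function returning the winner
    votes = Counter(pair_clfs[pair](x) for pair in combinations(classes, 2))
    return votes.most_common(1)[0][0]

# Toy 1-D example: each pairwise "classifier" picks the nearer class centre.
centres = {"A": 0, "B": 5, "C": 10}
pair_clfs = {
    (i, j): (lambda x, i=i, j=j:
             i if abs(x - centres[i]) < abs(x - centres[j]) else j)
    for i, j in combinations(centres, 2)
}

print(ovo_predict(8.0, list(centres), pair_clfs))  # "C" wins 2 of 3 votes
```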

8
Q

How do you get the accuracy from a three-class confusion matrix?

A

Add up all the true positives (the diagonal) and divide by the total number of instances.
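A short sketch on a made-up 3x3 confusion matrix (rows = actual class, columns = predicted class is an assumed convention):

```python
# Accuracy from a 3-class confusion matrix: sum of the diagonal
# (the true positives of each class) divided by the total count.

cm = [
    [50,  3,  2],   # actual class 0
    [ 4, 40,  6],   # actual class 1
    [ 1,  5, 39],   # actual class 2
]

total = sum(sum(row) for row in cm)
correct = sum(cm[i][i] for i in range(len(cm)))
accuracy = correct / total
print(accuracy)  # (50 + 40 + 39) / 150 = 0.86
```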

9
Q

How do you get the (weighted-average) precision from a three-class confusion matrix?

A

Compute the precision for each class, weight it by that class's share of the total instances, then add the three weighted precisions together.
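A sketch of the weighted average on a made-up 3x3 confusion matrix (rows = actual class, columns = predicted class is an assumed convention):

```python
# Weighted-average precision from a 3-class confusion matrix.
# Per-class precision = TP_k / column sum; each precision is weighted by
# the class's share of all instances (its row sum / total).

cm = [
    [50,  3,  2],
    [ 4, 40,  6],
    [ 1,  5, 39],
]

n = len(cm)
total = sum(sum(row) for row in cm)
weighted_precision = 0.0
for k in range(n):
    col_sum = sum(cm[i][k] for i in range(n))   # everything predicted as class k
    precision_k = cm[k][k] / col_sum            # TP_k / (TP_k + FP_k)
    weight_k = sum(cm[k]) / total               # class k's share of the data
    weighted_precision += weight_k * precision_k

print(round(weighted_precision, 3))
```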

10
Q

How can we summarise how good a multi-class classifier is?

A
  1. macro-average
  2. micro-average
11
Q

What is macro-averaging?

A

Macro-averaging computes the metric independently for each class and then takes the average.

12
Q

What is micro-averaging?

A

Micro-averaging aggregates the contributions of all classes to compute the average metric.
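The difference between the two averages shows up when classes are imbalanced. A sketch on hypothetical per-class true-positive/false-positive counts (the counts are invented for illustration):

```python
# Macro- vs micro-averaged precision on the same per-class counts.
# Macro: average the per-class precisions (every class counts equally).
# Micro: pool all TP and FP first (every instance counts equally).

# Hypothetical per-class (TP, FP) counts; class "c" is rare and noisy.
counts = {"a": (90, 10), "b": (80, 20), "c": (1, 9)}

macro = sum(tp / (tp + fp) for tp, fp in counts.values()) / len(counts)

tp_total = sum(tp for tp, _ in counts.values())
fp_total = sum(fp for _, fp in counts.values())
micro = tp_total / (tp_total + fp_total)

print(round(macro, 3))  # the rare class drags the macro average down
print(round(micro, 3))  # pooling makes the rare class nearly invisible
```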

13
Q

How should AUC be used for multi-class classifiers?

A

Average the AUC over the binary classification tasks, in either a one-versus-rest or a one-versus-one setting.

14
Q

what does ROC stand for

A

Receiver operating characteristic

15
Q

What is regression loss

A

Regression models are evaluated by applying a loss function to the residuals, f(x) − f̂(x).
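A sketch of two common loss functions applied to the residuals; the data points are invented for illustration:

```python
# Common regression losses applied to residuals r_i = f(x_i) - fhat(x_i).

y_true = [3.0, -0.5, 2.0, 7.0]   # f(x)
y_pred = [2.5,  0.0, 2.0, 8.0]   # fhat(x)

residuals = [t - p for t, p in zip(y_true, y_pred)]
mse = sum(r * r for r in residuals) / len(residuals)   # squared loss
mae = sum(abs(r) for r in residuals) / len(residuals)  # absolute loss

print(mse)  # 0.375
print(mae)  # 0.5
```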

16
Q

How many parameters does an n-degree polynomial have?

A

n+1 parameters (one coefficient for each power of x, from x^0 up to x^n)

17
Q

How can overfitting be avoided in regression?

A

To avoid overfitting, the number of parameters estimated from the data must be considerably less than the number of data points.

18
Q

What is the bias-variance dilemma?

A

A low-complexity model suffers less from variability due to random variations in the training data, but may introduce a systematic bias that even large amounts of training data can't resolve; on the other hand, a high-complexity model eliminates such bias but can suffer non-systematic errors due to variance.

19
Q

In ____ learning the task is to come up with a description of the data

A

descriptive

20
Q

What is distance-based clustering?

A

Most distance-based clustering methods depend on the possibility of defining a 'centre of mass' or exemplar of an arbitrary set of instances, such that the exemplar minimises some distance-related quantity over all instances in the set, called its scatter. A good clustering is then one where the within-cluster scatter, summed over all clusters, is much smaller than the scatter of the entire data set.
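A minimal 1-D sketch of this criterion (the data and the squared-distance choice of scatter are illustrative assumptions):

```python
# Within-cluster scatter sketch: scatter of a set = sum of squared
# distances of its points to their mean (the "centre of mass").

def scatter(points):
    mean = sum(points) / len(points)
    return sum((p - mean) ** 2 for p in points)

# Two tight 1-D clusters vs. the whole data set.
c1, c2 = [1.0, 1.5, 2.0], [8.0, 8.5, 9.0]
within = scatter(c1) + scatter(c2)
total = scatter(c1 + c2)
print(within, total)  # a good clustering: within-cluster scatter << total scatter
```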

21
Q

What is purity

A

purity = (1/N) Σ_k max_j |ω_k ∩ c_j|

where N is the number of instances, ω_k is the set of instances in cluster k, and c_j is the set of instances with class label j.
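A small sketch of the purity computation; the clusters and labels are invented for illustration:

```python
from collections import Counter

# Purity sketch: for each cluster, count its most common true class,
# sum those counts over all clusters, and divide by the number of
# instances N.

def purity(clusters):
    # clusters: list of lists of true class labels, one list per cluster
    n = sum(len(c) for c in clusters)
    return sum(Counter(c).most_common(1)[0][1] for c in clusters) / n

# Toy example: 3 clusters over classes x, o, d (17 instances in total).
clusters = [
    ["x"] * 5 + ["o"],              # majority class x (5)
    ["o"] * 4 + ["x"],              # majority class o (4)
    ["d"] * 3 + ["x"] * 2 + ["o"],  # majority class d (3)
]
print(round(purity(clusters), 3))  # (5 + 4 + 3) / 17 ≈ 0.706
```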

22
Q

What are the three ways to evaluate clustering performance without ground truth?

A
  1. Calinski-Harabasz Index
  2. Davies-Bouldin Index
  3. Silhouette Coefficient
23
Q

What is the silhouette coefficient formula?

A

s = (b − a) / max(a, b)

a: the mean distance between an instance and all other points in the same cluster
b: the mean distance between an instance and all points in the nearest neighbouring cluster.
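A minimal sketch of the formula for a single instance in 1-D (the clusters and the absolute-distance choice are illustrative):

```python
# Silhouette coefficient for one instance: s = (b - a) / max(a, b),
# where a is the mean distance to the other points in its own cluster and
# b is the mean distance to the points of the nearest other cluster.

def mean_dist(p, points):
    return sum(abs(p - q) for q in points) / len(points)

def silhouette(p, own_cluster, other_clusters):
    a = mean_dist(p, [q for q in own_cluster if q != p])
    b = min(mean_dist(p, c) for c in other_clusters)
    return (b - a) / max(a, b)

# Toy 1-D example: p sits tightly in its own cluster, far from the other.
own = [1.0, 1.2, 1.4]
other = [[9.0, 9.5, 10.0]]
print(round(silhouette(1.2, own, other), 3))  # close to +1
```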

24
Q

What is the range of silhouette coefficient values?

A

-1 to +1: +1 means the instance is far from neighbouring clusters, 0 means it lies on the decision boundary between clusters, and -1 means it might have been assigned to the wrong cluster.

25
Q

Give 2 examples of subgroup discovery

A
  1. Detection of risk groups with coronary heart disease or cancer
  2. Finding patterns in traffic accidents
26
Q

How can we assess the performance of subgroup discovery?

A
  1. Chi-squared test
  2. Check whether the class distribution within the subgroup differs from the class distribution in the row marginals (the whole data set)