Lecture 3 - Multi-class classification and regression Flashcards
Give an example of multi-class classification
- Disease type diagnosis
- Topic classification
What are the two approaches for turning a binary classifier into a multi-class classifier?
- One versus rest
- One versus one
What is one versus rest
Each classifier distinguishes between one specific class and all other classes combined. The class with the highest confidence score from its respective classifier is chosen as the final prediction.
What is the process of learning and inference in one-versus-rest?
Learning: train k or k-1 separate classifiers, where k is the number of classes.
Inference: run all the classifiers and form a code word from their outputs. Next, compare the code word against all the rows of the code matrix and find the nearest row.
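The decoding step above can be sketched roughly as follows. This is a minimal illustration, assuming k = 3 classes, a code matrix with one row per class and one column per classifier (+1 where the class is that classifier's positive class, -1 otherwise), and Hamming distance as the notion of "nearest". The classifier outputs are made-up values, and `nearest_row` is a hypothetical helper name.

```python
# One-versus-rest code matrix for 3 classes: row i is the code word of class i,
# column j is binary classifier j (+1 = "class j", -1 = "rest").
CODE_MATRIX = [
    [+1, -1, -1],
    [-1, +1, -1],
    [-1, -1, +1],
]

def hamming_distance(word, row):
    """Count the positions where the two code words disagree."""
    return sum(1 for w, r in zip(word, row) if w != r)

def nearest_row(word, code_matrix=CODE_MATRIX):
    """Return the index of the code-matrix row closest to the predicted word.
    Ties go to the first such row (an assumption of this sketch)."""
    distances = [hamming_distance(word, row) for row in code_matrix]
    return distances.index(min(distances))

# Hypothetical outputs of the three binary classifiers for one instance.
predicted_word = [-1, +1, -1]
predicted_class = nearest_row(predicted_word)
```

Here the predicted word matches the second row exactly, so the instance is assigned to the second class (index 1).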
what is one-versus-one
A strategy for multi-class classification in which a separate binary classifier is trained for every possible pair of classes.
For n classes, this results in n(n−1)/2 classifiers for the symmetric scheme and n(n−1) for the asymmetric scheme.
Which scheme needs n(n−1)/2 (symmetric) or n(n−1) (asymmetric) classifiers?
one versus one
What is the process of training and inference in one-versus-one?
Training: train a separate classifier for each pair of classes.
Inference: run all the classifiers and form a code word from their outputs. Next, compare the code word against all rows of the code matrix and find the nearest row. Use a voting scheme when the nearest row is not unique.
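A minimal sketch of the symmetric one-versus-one scheme, assuming four classes and a plain majority vote over the pairwise winners (a simplification of the code-matrix decoding described above). The winners list is made up, and `ovo_pairs` and `ovo_vote` are hypothetical helper names.

```python
from itertools import combinations

def ovo_pairs(classes):
    """All unordered class pairs: n(n-1)/2 binary classifiers (symmetric scheme)."""
    return list(combinations(classes, 2))

def ovo_vote(pair_winners):
    """Majority vote over the winners reported by the pairwise classifiers.
    Ties go to the class that received its first vote earliest
    (an assumption of this sketch)."""
    votes = {}
    for winner in pair_winners:
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get)

classes = ["A", "B", "C", "D"]
pairs = ovo_pairs(classes)  # (A,B), (A,C), (A,D), (B,C), (B,D), (C,D)

# Hypothetical winner of each pairwise classifier, in the order of `pairs`.
winners = ["A", "C", "A", "C", "B", "C"]
prediction = ovo_vote(winners)
```

With four classes there are 4 * 3 / 2 = 6 pairwise classifiers, and class C collects the most votes in this made-up example.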
How do you get the accuracy from a three-class confusion matrix?
Add up the true positives of all classes (the diagonal entries) and divide by the total number of instances.
How do you get the precision from a three-class confusion matrix?
Compute the precision for each class, multiply each by that class's share of the total number of instances, then add the three weighted precisions together.
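The two calculations above can be sketched as follows. The counts are made up for illustration, and the layout (rows are actual classes, columns are predicted classes) is an assumption of this sketch, as is weighting each precision by the share of instances that actually belong to the class.

```python
# Rows are actual classes, columns are predicted classes (made-up counts).
confusion = [
    [15, 2, 3],
    [7, 15, 8],
    [2, 3, 45],
]
k = len(confusion)
total = sum(sum(row) for row in confusion)

# Accuracy: the true positives sit on the diagonal.
accuracy = sum(confusion[i][i] for i in range(k)) / total

# Per-class precision: diagonal entry divided by its column sum.
column_sums = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
precisions = [confusion[j][j] / column_sums[j] for j in range(k)]

# Weight each precision by the share of instances that belong to the class.
class_shares = [sum(row) / total for row in confusion]
weighted_precision = sum(p * w for p, w in zip(precisions, class_shares))
```

For this matrix the diagonal sums to 75 out of 100 instances, so the accuracy is 0.75.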
how can we know how good a classifier can be?
- macro-average
- micro-average
what is macro-average
Macro-average computes the metric independently for each class and then takes the unweighted average over the classes.
what is micro-average
micro-average will aggregate the contributions of all classes to compute the average metric.
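To make the difference concrete, here is a small sketch that computes both averages for precision. The confusion matrix is made up, with rows as actual classes and columns as predicted classes (an assumption of this sketch).

```python
# Rows are actual classes, columns are predicted classes (made-up counts).
confusion = [
    [15, 2, 3],
    [7, 15, 8],
    [2, 3, 45],
]
k = len(confusion)

true_pos = [confusion[j][j] for j in range(k)]
predicted_pos = [sum(confusion[i][j] for i in range(k)) for j in range(k)]

# Macro-average: compute precision per class, then take the unweighted mean.
macro_precision = sum(tp / pp for tp, pp in zip(true_pos, predicted_pos)) / k

# Micro-average: pool the counts of all classes, then compute one precision.
micro_precision = sum(true_pos) / sum(predicted_pos)
```

Macro-averaging treats every class equally regardless of size, while micro-averaging lets large classes dominate because their counts make up most of the pooled totals.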
how should AUC curves be used for multi-class classifiers
Take the average AUC over the binary classification tasks, either in a one-versus-rest or a one-versus-one scheme.
what does ROC stand for
Receiver operating characteristic
What is regression loss
Regression models are evaluated by applying a loss function to the residuals $f(x) - \hat{f}(x)$, the differences between the true values and the model's predictions.
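As an illustration, the sketch below applies one common choice of loss, the mean squared loss, to the residuals. The card does not name a specific loss function, so squared loss is an assumption here, and the data values are made up.

```python
def residuals(y_true, y_pred):
    """Residual for each instance: true value minus predicted value."""
    return [t - p for t, p in zip(y_true, y_pred)]

def mean_squared_loss(y_true, y_pred):
    """Average of the squared residuals."""
    r = residuals(y_true, y_pred)
    return sum(e * e for e in r) / len(r)

y_true = [1.0, 2.0, 3.0]   # made-up target values f(x)
y_pred = [1.5, 2.0, 2.0]   # made-up predictions of the model
loss = mean_squared_loss(y_true, y_pred)
```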
How many parameters does a degree-n polynomial have?
n+1 parameters
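A quick way to see the n+1 count: a degree-n polynomial has one coefficient for each power from 0 to n. The sketch below uses a made-up cubic, and `poly_eval` is a hypothetical helper that evaluates it with Horner's rule.

```python
def poly_eval(coeffs, x):
    """Evaluate a0 + a1*x + ... + an*x^n given coeffs = [a0, a1, ..., an]."""
    result = 0.0
    for a in reversed(coeffs):
        result = result * x + a
    return result

degree = 3
# 1 + 0*x - 2*x^2 + 0.5*x^3: four coefficients for a degree-3 polynomial.
coeffs = [1.0, 0.0, -2.0, 0.5]
value = poly_eval(coeffs, 2.0)
```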
how to avoid overfitting in regression
To avoid overfitting, the number of parameters estimated from the data must be considerably less than the number of data points.
what is the Bias-variance dilemma
A low-complexity model suffers less from variability due to random variations in the training data, but may introduce a systematic bias that even large amounts of training data can’t resolve; on the other hand, a high-complexity model eliminates such bias but can suffer non-systematic errors due to variance.
In ____ learning the task is to come up with a description of the data
descriptive
what is distance based clustering
Most distance-based clustering methods depend on the possibility of defining a ‘centre of mass’ or exemplar of an arbitrary set of instances, such that the exemplar minimises some distance-related quantity over all instances in the set, called its scatter. A good clustering is then one where the scatter summed over each cluster (the within-cluster scatter) is much smaller than the scatter of the entire data set.
What is purity
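Assign each cluster to its most frequent class, count the instances that match, and divide by the total number of instances N:
$\text{purity} = \frac{1}{N} \sum_k \max_j |\omega_k \cap c_j|$, where $\omega_k$ is the set of instances in cluster k and $c_j$ is the set of instances in class j.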
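Purity can be sketched in a few lines, assuming the usual definition: each cluster is credited with the size of its majority class, and the sum is divided by the number of instances. The labels below are made up, and `purity` is a hypothetical helper name.

```python
from collections import Counter

def purity(cluster_labels, class_labels):
    """(1/N) * sum over clusters of the size of the cluster's majority class."""
    n = len(class_labels)
    by_cluster = {}
    for cluster, cls in zip(cluster_labels, class_labels):
        by_cluster.setdefault(cluster, []).append(cls)
    majority_total = sum(
        Counter(members).most_common(1)[0][1] for members in by_cluster.values()
    )
    return majority_total / n

# Made-up cluster assignments and ground-truth classes for 8 instances.
clusters = [0, 0, 0, 1, 1, 1, 2, 2]
classes = ["x", "x", "o", "o", "o", "x", "d", "d"]
score = purity(clusters, classes)
```

Each of the three clusters has a majority class of size 2, so the purity is 6 / 8 = 0.75.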
What are the three ways to evaluate clustering performance without ground truth?
- Calinski-Harabasz Index
- Davies-Bouldin Index
- Silhouette Coefficient
What is the silhouette coefficient formula?
s=(b-a)/max(a,b)
a: the mean distance between an instance and all other points in the same cluster
b: the mean distance between an instance and all points in the nearest neighbouring cluster.
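What is the range of silhouette coefficient values?
-1 to +1. A value near +1 means the instance is far from the neighbouring clusters, 0 means it lies on the decision boundary between two clusters, and a value near -1 means it might be assigned to the wrong cluster.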
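The silhouette of a single instance can be sketched as below, assuming Euclidean distance and taking b as the smallest mean distance to any other cluster. The points are made up, and `silhouette` is a hypothetical helper name.

```python
import math

def mean_distance(point, others):
    """Mean Euclidean distance from `point` to every point in `others`."""
    return sum(math.dist(point, o) for o in others) / len(others)

def silhouette(point, own_cluster, other_clusters):
    """Silhouette of one point: s = (b - a) / max(a, b).
    `own_cluster` must not contain the point itself."""
    a = mean_distance(point, own_cluster)
    b = min(mean_distance(point, c) for c in other_clusters)
    return (b - a) / max(a, b)

point = (0.0, 0.0)
own = [(0.0, 1.0), (1.0, 0.0)]            # mean distance a = 1.0
others = [
    [(4.0, 0.0), (0.0, 4.0)],             # mean distance 4.0 (nearest cluster)
    [(10.0, 0.0)],                        # mean distance 10.0
]
s = silhouette(point, own, others)
```

With a = 1.0 and b = 4.0 the score is (4 - 1) / 4 = 0.75, which suggests the point sits comfortably inside its own cluster.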
Give 2 examples of subgroup discovery
- Detection of risk groups with coronary heart disease or cancer
- Finding patterns in traffic accidents
How can we assess the performance of subgroup discovery?
- Chi-squared test
- Checking whether the class distribution in the left column of the contingency table differs from the class distribution in the row marginals
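Both checks can be illustrated with Pearson's chi-squared statistic on a contingency table. This is a sketch under assumptions: the counts are made up, rows are taken to be the classes, and columns are taken to be "inside the subgroup" and "outside the subgroup". Turning the statistic into a p-value is left out.

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if class and subgroup membership were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Rows: positive / negative class. Columns: inside / outside the subgroup.
table = [
    [30, 20],
    [10, 40],
]
stat = chi_squared(table)

# Same class distribution inside the subgroup as in the row marginals.
uninformative = chi_squared([[10, 40], [10, 40]])
```

When the class distribution inside the subgroup equals the row marginals, every observed count matches its expected count and the statistic is 0. The larger the statistic, the more the subgroup's class distribution deviates from the overall one.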