Lecture 2 - Binary Classification Flashcards

Binary Classification and Related tasks

1
Q

Tasks, Instances, Instance Space, Label Space, Output space, model

What is the difference between instances and the instance space?

A

Instances: the objects of interest in machine learning, described by their features
Instance Space: the space of all possible instances

2
Q

Tasks, Instances, Instance Space, Label Space, Output space, model

What is the difference between Label Space and Output Space?

A

Label Space: the set of labels attached to example instances
Output Space: the space of possible outputs (or targets) of the task; it need not coincide with the label space (e.g. a scoring classifier outputs real-valued scores rather than labels)

3
Q

Tasks, Instances, Instance Space, Label Space, Output space, model

What is a Model?

A

A mapping from the instance (input) space to the output space, potentially learned using labelled examples

4
Q

What is a binary classifier?

A

A binary classifier is a classifier whose label space contains only two class labels (e.g. true or false, positive or negative)

5
Q

What is meant by the term “learning a classifier”?

A

Learning a classifier involves constructing a function c^ such that it matches the true labelling function c as closely as possible

6
Q

What are the two ways of assessing performance of a binary classifier?

A
  • Contingency (confusion) table
  • Coverage plot / ROC curve
7
Q

What is a contingency table?

A

A matrix that displays the (multivariate) frequency distribution of the variables c(x) (the actual class) and c^(x) (the predicted class)

8
Q

What does the contingency table look like?

A

           predicted +       predicted -
actual +   True Positive     False Negative
actual -   False Positive    True Negative

9
Q

Contingency table

Accuracy?

A

acc = (TP + TN) / total, where total = TP + TN + FP + FN

10
Q

Contingency table

Error rate?

A

err = (FP + FN) / total = 1 - acc
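
A minimal sketch of these two metrics in Python (the counts are hypothetical, chosen only for illustration):

    # Hypothetical counts read off a contingency table
    TP, FN, FP, TN = 60, 10, 5, 25

    total = TP + FN + FP + TN
    acc = (TP + TN) / total   # accuracy
    err = (FP + FN) / total   # error rate, equal to 1 - acc
    print(acc, err)           # 0.85 0.15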

11
Q

Contingency table

What is an unbalanced dataset?

A

A dataset in which one or more of the classes have a much greater number of examples than the others

12
Q

Contingency table

What do we use with an unbalanced dataset?

name 5 >:)

A
  • True Positive rate, sensitivity, recall: TP / (TP + FN)
  • True Negative rate, specificity: TN / (TN + FP)
  • False Positive rate, false alarm rate: FP / (FP + TN)
  • False Negative rate: FN / (FN + TP)
  • Precision, confidence: TP / (TP + FP)
(a small worked sketch of these rates follows this list)
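
A sketch of these rates in Python, reusing the hypothetical counts from the accuracy card:

    TP, FN, FP, TN = 60, 10, 5, 25

    tpr = TP / (TP + FN)    # true positive rate, sensitivity, recall
    tnr = TN / (TN + FP)    # true negative rate, specificity
    fpr = FP / (FP + TN)    # false positive rate, false alarm rate (= 1 - tnr)
    fnr = FN / (FN + TP)    # false negative rate (= 1 - tpr)
    prec = TP / (TP + FP)   # precision, confidence
    print(tpr, tnr, fpr, fnr, prec)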
13
Q

What is F-measure or F1 score?

A

The F-measure is a performance metric that is not affected by the number of true negatives; it is the (weighted) harmonic mean of precision and recall

14
Q

What is the harmonic mean used for?

A

It helps capture multiplicative or divisor relationships between fractions without worrying about common denominators. Harmonic means are often used for averaging rates (e.g. the average travel speed over several trips of the same distance).
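
For example, a small sketch in Python with hypothetical numbers, two equal-distance trips driven at 60 km/h and 30 km/h:

    speeds = [60, 30]                                      # km/h over two equal-distance trips
    harmonic = len(speeds) / sum(1 / s for s in speeds)    # 40.0 km/h, the true average speed
    arithmetic = sum(speeds) / len(speeds)                 # 45.0 km/h, misleading here
    print(harmonic, arithmetic)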

15
Q

What is the F-measure formula?

A

F1 = 2(prec x rec)/(prec + rec)
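
A one-line sketch of this formula in Python (the precision and recall values are hypothetical):

    prec, rec = 0.923, 0.857              # e.g. precision and recall from the earlier counts
    f1 = 2 * (prec * rec) / (prec + rec)  # harmonic mean of precision and recall
    print(round(f1, 3))                   # ~0.889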

16
Q

What are train/test splits?

A

The data is split into a training set, used to fit the model, and a test set, used during the inference phase to make predictions on data not seen during the training phase

17
Q

What are folds?

A

A process in which all data is randomly split into k folds; each fold in turn is held out as the test set while the remaining folds are used for training, e.g. for k = 5 (see the sketch below):
train, train, train, train, test
train, train, train, test, train
train, train, test, train, train
...
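
A minimal sketch of k-fold splitting in plain Python; the fit and score functions are hypothetical placeholders standing in for whatever model and metric are used:

    import random

    def k_fold_scores(data, k, fit, score):
        """Shuffle the data, split it into k folds, and return one score per fold."""
        data = data[:]                       # copy so the caller's list is untouched
        random.shuffle(data)
        folds = [data[i::k] for i in range(k)]
        scores = []
        for i in range(k):
            test = folds[i]                  # this fold is held out as the test set
            train = [x for j, fold in enumerate(folds) if j != i for x in fold]
            scores.append(score(fit(train), test))
        return scores

The per-fold scores can then be aggregated as in the next card.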

18
Q

How can we aggregate the folds?

A

By taking the mean of the per-fold scores: 1/k * sum(scores)

19
Q

What is a validation set for?

A

To tune hyperparameters.
To decrease the bias that comes from reusing the same train/test split, e.g. (see the sketch below):
train, train, train, valid, test
Cross-validation can also be used for this.
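
A sketch of a simple three-way split in plain Python (the 60/20/20 proportions are hypothetical defaults, not from the lecture):

    def train_valid_test_split(data, train_frac=0.6, valid_frac=0.2):
        """Split data into train/validation/test subsets by position."""
        n = len(data)
        n_train = int(n * train_frac)
        n_valid = int(n * valid_frac)
        train = data[:n_train]                    # fit the model here
        valid = data[n_train:n_train + n_valid]   # tune hyperparameters here
        test = data[n_train + n_valid:]           # final, untouched evaluation
        return train, valid, test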

20
Q

Define train set error and test set error.

A
  • Train set error is measured on the training set
  • Test set error is measured on the test set

Comparing these metrics helps to tell whether a model is over-fitting on the training set

21
Q

What are overfitting and underfitting?

A
  1. Over-fitting: the model performs well on training data but poorly on test data
  2. Under-fitting: the model performs poorly on both
22
Q

What is generalisability in terms of overfitting and underfitting?

A

The presence of either over-fitting or under-fitting indicates low generalisability

23
Q

What are the ROC curve and the coverage plot for?

A

A coverage plot and a Receiver Operating Characteristic (ROC) curve help summarise all the different confusion matrices obtained as the decision threshold varies.

24
Q

What are the axes of the coverage plot and the ROC curve?

A

Coverage plot: positives vs. negatives (absolute counts of true positives and false positives)
ROC: true positive rate vs. false positive rate,
or equivalently sensitivity vs. (1 - specificity)

25
Q

What is AUC (Area Under the Curve)?

A

The area under the ROC curve, used as a single-number summary of the model's skill
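
A sketch of AUC computed directly from its ranking interpretation: the fraction of (positive, negative) pairs that the scores rank correctly, with ties counted as half. The labels and scores below are hypothetical:

    def auc(labels, scores):
        """AUC = fraction of (positive, negative) pairs ranked correctly by the scores."""
        pos = [s for y, s in zip(labels, scores) if y == 1]
        neg = [s for y, s in zip(labels, scores) if y == 0]
        wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
        return wins / (len(pos) * len(neg))

    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2]))  # 0.75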

26
Q

What is a scoring classifier?

A

A scoring classifier outputs a classification score for each instance: any score or metric the algorithm uses (or the user has set) that indicates how strongly the instance is judged to belong to a class. These scores are then used to compute the performance of the classification, i.e. how well it works and its predictive power.

27
Q

What is the line separating correct and wrong classifications called?

A

the margin

28
Q

What is the name of the function that maps an example's margin to an associated loss? It rewards large positive margins and penalizes large negative ones.

A

Loss function
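
A sketch of a few common margin-based losses in Python; which specific losses the lecture covers is an assumption here:

    import math

    def zero_one_loss(margin):
        return 1.0 if margin <= 0 else 0.0           # step loss: any non-positive margin costs 1

    def hinge_loss(margin):
        return max(0.0, 1.0 - margin)                # linear penalty for margins below 1

    def logistic_loss(margin):
        return math.log2(1.0 + math.exp(-margin))    # smooth loss that rewards large positive margins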

29
Q

Error rate (ERR) ?

A

Error rate (ERR) is calculated as the number of incorrect predictions divided by the total number of examples in the dataset.

30
Q

What is a probability estimator?

A

A class probability estimator is a scoring classifier that outputs probability vectors over classes.
The output of the classifier shows how likely it is that the instance belongs to a specific class, rather than just which class it will belong to.

31
Q

Squared Error (SE)?

A

The squared difference between the predicted class probability vector p^(x) and the true (one-hot encoded) class of the instance: SE(x) = sum over classes i of (p^_i(x) - I[c(x) = C_i])^2

32
Q

MSE?

A

The Mean Squared Error: the squared error averaged over all instances in the test set Te, MSE = (1 / |Te|) * sum of SE(x) over x in Te
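
A sketch of these two quantities for class probability estimates in plain Python (the probability vectors and one-hot labels below are hypothetical):

    def squared_error(p_hat, one_hot):
        """SE for one instance: squared distance between predicted and true probability vectors."""
        return sum((p - t) ** 2 for p, t in zip(p_hat, one_hot))

    def mean_squared_error(p_hats, one_hots):
        """MSE: the squared error averaged over all test instances."""
        return sum(squared_error(p, t) for p, t in zip(p_hats, one_hots)) / len(p_hats)

    # Two hypothetical instances of a binary problem (class order: [positive, negative])
    print(mean_squared_error([[0.8, 0.2], [0.3, 0.7]], [[1, 0], [1, 0]]))  # 0.53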