Lecture 2 - Binary Classification Flashcards by Alexander Bazba

Tasks, Instances, Instance Space, Label Space, Output space, model

What is the difference between instances and instance space?

Instances: the objects of interest in machine learning, described by features
Instance space: the set of all possible instances

What is the difference between label space and output space?

Label space: the set of labels attached to the example instances
Output space: the set of possible outcomes (or targets) of the task

What is a Model?

A mapping from the input (instance) space to the output space, possibly learned from labelled examples

What is a binary classifier?

A binary classifier assigns one of only two class labels (for example, True or False)

What does “learning a classifier” mean?

Learning a classifier means constructing a function c^ that matches the true labelling function c as closely as possible

What are the two ways of assessing performance of a binary classifier?

Contingency table
I'll get back to this

What is a contingency table?

A matrix that displays the joint frequency distribution of two variables: here the actual class c(x) and the predicted class c^(x)

What does the contingency table look like?

            predicted +       predicted -
actual +    True positive     False negative
actual -    False positive    True negative
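
The four cells can be counted directly from paired actual and predicted labels. The sketch below assumes labels are given as two equal-length lists of booleans; the function name and the example data are hypothetical.

```python
def contingency_counts(actual, predicted):
    """Return (TP, FN, FP, TN) for two equal-length lists of booleans."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a and p:
            tp += 1      # actual +, predicted +
        elif a and not p:
            fn += 1      # actual +, predicted -
        elif not a and p:
            fp += 1      # actual -, predicted +
        else:
            tn += 1      # actual -, predicted -
    return tp, fn, fp, tn


# Hypothetical labels, for illustration only.
actual    = [True, True, True, False, False, False, False, True]
predicted = [True, False, True, False, True, False, False, True]
tp, fn, fp, tn = contingency_counts(actual, predicted)
```

The returned order follows the table row by row: first the actual + row, then the actual - row.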

Contingency table

Accuracy?

acc = (TP + TN)/total

Error rate?

err = (FP + FN)/total
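
A small worked example of accuracy and error rate, assuming the four contingency-table counts are already known. The counts are made up, and total is taken to be TP + FN + FP + TN.

```python
def accuracy(tp, fn, fp, tn):
    # Correct predictions divided by all predictions.
    return (tp + tn) / (tp + fn + fp + tn)


def error_rate(tp, fn, fp, tn):
    # Incorrect predictions divided by all predictions.
    return (fp + fn) / (tp + fn + fp + tn)


# Hypothetical counts: TP=60, FN=15, FP=10, TN=15 (total 100).
counts = (60, 15, 10, 15)
acc = accuracy(*counts)
err = error_rate(*counts)
```

Since every prediction is either correct or incorrect, acc + err always equals 1.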

What is an unbalanced dataset?

A dataset in which one or more classes have far more examples than the others

What metrics do we use with an unbalanced dataset?

Name 5 >:)

True Positive rate, sensitivity, recall
True Negative rate, specificity
False Positive rate, false alarm rate
False Negative rate
Precision, confidence
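
The five measures above can be sketched from the contingency-table counts using their standard definitions. The function name and the unbalanced example counts are hypothetical, and the sketch assumes every denominator is non-zero.

```python
def class_rates(tp, fn, fp, tn):
    """Per-class rates from contingency counts (denominators assumed non-zero)."""
    pos = tp + fn  # all actual positives
    neg = fp + tn  # all actual negatives
    return {
        "tpr": tp / pos,              # true positive rate, sensitivity, recall
        "tnr": tn / neg,              # true negative rate, specificity
        "fpr": fp / neg,              # false positive rate, false alarm rate
        "fnr": fn / pos,              # false negative rate
        "precision": tp / (tp + fp),  # precision, confidence
    }


# Hypothetical unbalanced data: 50 positives against 950 negatives.
rates = class_rates(tp=40, fn=10, fp=5, tn=945)
```

With these counts accuracy is 98.5%, yet one in five positives is missed, which the true positive rate of 0.8 makes visible.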

What is F-measure or F1 score?

The F-measure is a weighted harmonic mean of precision and recall (F1 weights the two equally). Because neither precision nor recall counts true negatives, it is not affected by the number of true negatives

What is the harmonic mean used for?

It helps to find multiplicative or divisor relationships between fractions without worrying about common denominators. Harmonic means are often used for averaging rates (e.g. the average travel speed over several trips of equal distance).
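
A short illustration of why the harmonic mean suits rates, assuming a trip with two legs of equal distance driven at 40 and 60 km/h. The speeds and function names are made up for the example.

```python
def harmonic_mean(values):
    # Number of values divided by the sum of their reciprocals.
    return len(values) / sum(1.0 / v for v in values)


def arithmetic_mean(values):
    return sum(values) / len(values)


# Hypothetical speeds (km/h) for two legs of equal distance.
speeds = [40.0, 60.0]
hm = harmonic_mean(speeds)    # about 48 km/h, the true average speed
am = arithmetic_mean(speeds)  # 50 km/h, which overstates it
```

The slower leg takes longer, so it deserves more weight; the harmonic mean accounts for that and is never larger than the arithmetic mean.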

What is the F-measure formula?

F1 = 2(prec x rec)/(prec + rec)
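
The formula can be checked on a few counts. This is a minimal sketch with hypothetical counts and helper names, assuming precision = TP/(TP + FP) and recall = TP/(TP + FN).

```python
def precision(tp, fp):
    return tp / (tp + fp)


def recall(tp, fn):
    return tp / (tp + fn)


def f1_score(prec, rec):
    # Harmonic mean of precision and recall.
    return 2 * (prec * rec) / (prec + rec)


# Hypothetical counts; note that true negatives are not needed at all.
tp, fp, fn = 40, 5, 10
prec = precision(tp, fp)
rec = recall(tp, fn)
f1 = f1_score(prec, rec)
```

Substituting the definitions gives the equivalent form 2TP/(2TP + FP + FN), which here is 80/95.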

What are train/test splits?

The data is split into a training set, used to fit the model, and a test set, used to make predictions during the inference phase on data not used during the training phase

What are folds?

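The data is randomly split into k folds; each fold serves as the test set once while the remaining folds are used for training. For example, the first rounds with k = 5:
train, train, train, train, test
train, train, train, test, train
train, train, test, train, train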
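
The rotation pattern above can be sketched in plain Python. The function name, the fixed seed, and the choice to deal shuffled indices round-robin into folds are all assumptions made for illustration.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)        # random assignment to folds
    folds = [indices[i::k] for i in range(k)]   # k roughly equal folds
    for i in range(k):
        test = folds[i]                         # fold i is held out
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical dataset of 10 instances split into 5 folds.
splits = list(k_fold_indices(n=10, k=5))
```

Every instance lands in exactly one test fold, so each one is used for testing once and for training k - 1 times.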

How can we aggregate the folds?

By averaging: mean score = (1/N) * sum(scores), where N is the number of folds
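
As a quick sketch of that average, assuming one score (for example an accuracy) has been recorded per fold. The scores below are made up.

```python
def aggregate_fold_scores(scores):
    # Mean of the per-fold scores: (1/N) * sum(scores).
    return sum(scores) / len(scores)


# Hypothetical accuracies from 5 folds.
fold_scores = [0.80, 0.90, 0.70, 0.80, 0.80]
mean_score = aggregate_fold_scores(fold_scores)
```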

What is the validation set for?

To tune hyperparameters.
To decrease the bias introduced by tuning on a plain train/test split.
train, train, train, valid, test
Cross-validation
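
A three-way split like the one pictured above might be sketched as follows. The 60/20/20 proportions, the function name, and the fixed seed are assumptions, not something the lecture prescribes.

```python
import random


def train_valid_test_split(items, valid_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle items and cut them into train, validation and test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_frac)
    n_valid = int(len(items) * valid_frac)
    test = items[:n_test]
    valid = items[n_test:n_test + n_valid]
    train = items[n_test + n_valid:]   # everything that is left
    return train, valid, test


# Hypothetical dataset of 10 instances, identified by index.
train, valid, test = train_valid_test_split(range(10))
```

Hyperparameters are chosen using the validation part, and the test part is touched only once for the final estimate.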

Define train set error and test set error

Train set error is measured on the training set
Test set error is measured on the test set

Comparing these two errors helps to tell whether a model is over-fitting the training set

What are overfitting and underfitting?

Over-fitting: where a model performs well on training data but poorly on test data
Under-fitting: where a model performs poorly on both

What is generalisability in terms of overfitting and underfitting?

The presence of either over-fitting or under-fitting indicates low generalisability

What are ROC and coverage plots for?

A coverage plot and a Receiver Operating Characteristic (ROC) curve summarize the different confusion matrices obtained at all decision thresholds.

What are the axes of the coverage plot and the ROC curve?

Coverage plot: positives (y-axis) against negatives (x-axis), as absolute counts
ROC: true positive rate (y-axis) against false positive rate (x-axis),
or sensitivity against (1 - specificity)
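
The ROC axes can be made concrete by sweeping a threshold over classifier scores. The sketch below assumes that a higher score means "more positive", that an instance is predicted positive when its score is at or above the threshold, and that both classes are present; the function name and data are hypothetical.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    pos = sum(labels)              # number of actual positives
    neg = len(labels) - pos        # number of actual negatives
    points = [(0.0, 0.0)]          # threshold above every score: nothing predicted +
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and not y)
        points.append((fp / neg, tp / pos))
    return points


# Hypothetical labels and scores, for illustration only.
labels = [True, True, False, True, False, False]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
```

Using the raw counts fp and tp instead of the two rates would give the corresponding points of the coverage plot.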

What is AUC (area under the curve)?

The area under the ROC curve, used as a single-number summary of the model's skill
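
One way to compute the area under the ROC curve without drawing it uses the fact that it equals the fraction of (positive, negative) pairs in which the positive receives the higher score, with ties counted as half. The sketch below takes that pairwise route; the function name and data are hypothetical.

```python
def auc_by_ranking(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos_scores = [s for y, s in zip(labels, scores) if y]
    neg_scores = [s for y, s in zip(labels, scores) if not y]
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0    # pair ranked correctly
            elif p == n:
                wins += 0.5    # tie counts as half
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical labels and scores, for illustration only.
labels = [True, True, False, True, False, False]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = auc_by_ranking(labels, scores)
```

Here 8 of the 9 pairs are ordered correctly. A value of 1.0 would mean a perfect ranking, and 0.5 is what random scores achieve on average.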

What is a scoring classifier?

A scoring classifier outputs a real-valued score for each class rather than only a class label. The score indicates how strongly the classifier believes the instance belongs to that class, and a class prediction is obtained by thresholding or comparing the scores.

What is the quantity separating correct and wrong classifications called?

The margin: it is positive for a correct classification and negative for a wrong one

What is the name of the function that maps an example's margin to an associated loss? It rewards large positive margins and penalizes large negative margins

Loss function
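
A few common loss functions written as functions of the margin. This sketch assumes the margin is the true class (+1 or -1) multiplied by the classifier's score, and the particular forms shown (0-1, hinge, and logistic with a base-2 logarithm) are standard choices rather than ones the card itself specifies.

```python
import math


def margin(true_class, score):
    # true_class is +1 or -1; positive margin means a correct prediction.
    return true_class * score


def zero_one_loss(z):
    # 1 for a wrong (or undecided) prediction, 0 for a correct one.
    return 1.0 if z <= 0 else 0.0


def hinge_loss(z):
    # No loss once the margin reaches 1; grows linearly below that.
    return max(0.0, 1.0 - z)


def logistic_loss(z):
    # Smooth loss that keeps shrinking as the margin grows.
    return math.log2(1.0 + math.exp(-z))


# Hypothetical score of 2.5 for a positive and for a negative instance.
z_correct = margin(+1, 2.5)
z_wrong = margin(-1, 2.5)
```

All three assign little or no loss to large positive margins and a larger loss to negative ones; only the 0-1 loss ignores how far the margin is from zero.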

Error rate (ERR) ?

Error rate (ERR) is calculated as the number of all incorrect predictions divided by the total number of instances in the dataset.

What is a probability estimator?

A class probability estimator is a scoring classifier that outputs probability vectors over classes. The output of the classifier shows how likely it is that the instance belongs to a specific class rather than which class it will belong to.

Squared Error (SE)?

look it up

MSE?

look it up

Binary Classification and Related tasks (32 cards)