Week 3 - Classification Flashcards
what is the difference between a parameter and hyperparameter?
parameter = learned by the model during training, e.g. regression coefficients
hyperparameter = set by the user or by grid search, e.g. the regularisation strength lambda
what is the function for a logistic regression called?
The sigmoid (logistic) function
Implicit in this function is a threshold (typically 0.5), which is used to make classifications
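The sigmoid and its implicit threshold can be sketched in a few lines of NumPy (the 0.5 cut-off is the usual default, but any threshold works):

```python
import numpy as np

def sigmoid(z):
    # maps any real-valued score to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

scores = np.array([-2.0, 0.0, 3.0])   # raw model outputs (log-odds)
probs = sigmoid(scores)
labels = (probs >= 0.5).astype(int)   # apply the implicit threshold
```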
what is the cost function for logistic regression?
You want to minimise the log loss, also called the cross-entropy loss
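A minimal NumPy sketch of the log loss (the `eps` clipping is a standard numerical guard, not part of the definition):

```python
import numpy as np

def log_loss(y_true, p):
    # average cross-entropy between true labels (0/1) and predicted probabilities
    eps = 1e-12
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y = np.array([1, 0, 1])
p = np.array([0.9, 0.1, 0.8])
loss = log_loss(y, p)  # confident, correct predictions -> small loss
```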
what is the equation for accuracy and what does it mean?
meaning: overall proportion of correct classifications
accuracy = correct predictions/total predictions
what is sensitivity?
The proportion of actual positives that are correctly identified / the true positive rate
TP/(TP+FN) = TP/P
Sensitivity is also known as recall
AKA how good is it at identifying positives
What is specificity?
The proportion of actual negatives that are correctly identified
TN/(TN+FP) = TN/N
This is equivalent to 1 - false positive rate
Also known as true negative rate
AKA how good is it at identifying negatives
What is precision?
TP/(TP+FP) = TP/PP (predicted positives)
The proportion of positive results that were correctly classified
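All four metrics above fall out of the same confusion-matrix counts; a quick check with illustrative numbers (TP/FN/FP/TN chosen arbitrarily):

```python
# hypothetical confusion-matrix counts
TP, FN, FP, TN = 40, 10, 5, 45

sensitivity = TP / (TP + FN)                  # recall / true positive rate
specificity = TN / (TN + FP)                  # true negative rate
precision   = TP / (TP + FP)                  # of predicted positives, how many were right
accuracy    = (TP + TN) / (TP + FN + FP + TN) # overall proportion correct
```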
What do the rows and columns in the confusion matrix correspond to?
Rows = predicted class
Columns = actual class (note: conventions vary; e.g. scikit-learn puts the actual class in the rows)
How can we change the sensitivity and specificity of a logistic regression classifier?
We can adjust the threshold
A lower threshold will increase sensitivity
A higher threshold will increase specificity
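This trade-off is easy to see numerically; a small sketch with made-up labels and probabilities, comparing a low and a high threshold:

```python
import numpy as np

def rates(y_true, probs, threshold):
    # sensitivity and specificity at a given probability cut-off
    pred = (probs >= threshold).astype(int)
    tp = np.sum((pred == 1) & (y_true == 1))
    tn = np.sum((pred == 0) & (y_true == 0))
    return tp / np.sum(y_true == 1), tn / np.sum(y_true == 0)

y = np.array([0, 0, 1, 1, 1, 0])
p = np.array([0.2, 0.4, 0.6, 0.8, 0.3, 0.7])
low = rates(y, p, 0.25)    # low threshold: catches more positives
high = rates(y, p, 0.75)   # high threshold: rejects more negatives
```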
What does the ROC curve show?
The trade-off between sensitivity and specificity.
The X axis shows FPR (1 - specificity)
The Y axis shows TPR (Sensitivity)
The ROC curve shows how sensitivity and specificity vary across different classification thresholds. E.g. for a threshold that classifies ALL the true positives correctly, what would the FPR be? For a threshold that captures 0.7 of the true positives, what would the FPR be? And so on.
The point at (0, 0) represents a threshold that doesn't classify anything as positive
ROC curves make it easy to identify the best thresholds for making a decision
What does the AUC show?
The AUC is the area under the ROC curve
A bigger AUC indicates better discrimination overall: a higher true positive rate for any given false positive rate
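A from-scratch sketch of the ROC/AUC computation (sweeping every score as a threshold, then integrating TPR over FPR with the trapezoid rule; the toy data is illustrative):

```python
import numpy as np

def roc_auc(y_true, scores):
    # one ROC point per distinct threshold, highest first
    thresholds = np.sort(np.unique(scores))[::-1]
    P = np.sum(y_true == 1)
    N = np.sum(y_true == 0)
    tpr, fpr = [0.0], [0.0]
    for t in thresholds:
        pred = scores >= t
        tpr.append(np.sum(pred & (y_true == 1)) / P)
        fpr.append(np.sum(pred & (y_true == 0)) / N)
    tpr, fpr = np.array(tpr), np.array(fpr)
    # area under the curve via the trapezoid rule
    return np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)

y = np.array([0, 0, 1, 1])
s = np.array([0.1, 0.4, 0.35, 0.8])
auc = roc_auc(y, s)
```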
When might precision-recall curve be more useful than a ROC curve?
If there’s a highly imbalanced sample. This is because precision and recall are both computed relative to the positive class, rather than involving the large pool of true negatives, so the curve remains sensitive to predictions on the minority class
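A worked example of why this matters, with hypothetical counts for an imbalanced problem (10 positives, 990 negatives): the ROC metrics look excellent while precision exposes the problem.

```python
# hypothetical imbalanced result: classifier finds all 10 positives
# but also flags 90 of the 990 negatives
TP, FN = 10, 0
FP, TN = 90, 900

tpr = TP / (TP + FN)        # 1.00 -> ROC looks excellent
fpr = FP / (FP + TN)        # ~0.09 -> ROC still looks excellent
precision = TP / (TP + FP)  # 0.10 -> PR curve reveals most flags are wrong
```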
what is the margin in support vector machines?
The distance between the decision boundary (threshold) and the closest observations on either side
how does allowing misclassifications impact the bias variance trade off?
Allowing misclassifications reduces the variance (at the cost of some bias). Otherwise the boundary may fit itself to outliers
SVM then often uses a soft margin, which allows for some misclassifications
how does a kernel support vector machine work?
It aims to find a hyperplane that classifies classes with the Maximum margin
SVMs optimise the hinge loss, which measures how well the classifier’s decision boundary separates the classes. Points that are classified correctly and lie beyond the margin incur no loss, no matter how far they are from the boundary; points inside the margin or on the wrong side are penalised in proportion to how far they fall short of the margin
Non-linear kernels can separate data linearly by adding extra dimensions
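A minimal sketch of the hinge loss, using labels in {-1, +1} and signed distances f(x) as is conventional for SVMs (the example scores are made up):

```python
import numpy as np

def hinge_loss(y, scores):
    # zero loss beyond the margin (y * f(x) >= 1);
    # linear penalty inside the margin or on the wrong side
    return np.mean(np.maximum(0.0, 1.0 - y * scores))

y = np.array([1, 1, -1, -1])
f = np.array([2.5, 0.5, -3.0, 0.4])
# 2.5: correct, beyond margin -> 0 loss
# 0.5: correct but inside margin -> small loss
# -3.0: correct, beyond margin -> 0 loss
# 0.4: wrong side of the boundary -> largest loss
loss = hinge_loss(y, f)
```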
What is the kernel trick
By adding extra dimensions, the support vector machine can find new ways to linearly separate the data
SVMs use kernel functions to systematically find support vector classifiers in higher dimensions, without explicitly computing the coordinates in those dimensions
Polynomial kernel with d = 1 uses the original dimension. With d = 2 it adds a second dimension of x^2
With d = 3 it adds a third dimension of x^3
We can find a good value for d using cross-validation
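A toy demonstration of the d = 2 idea: 1-D points whose positives sit between the negatives cannot be split by a single threshold on x, but lifting to (x, x^2) makes them linearly separable (the cut-off x^2 < 2 is chosen by eye for this made-up data):

```python
import numpy as np

# 1-D data: positives in the middle, negatives on both sides
x = np.array([-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0])
y = np.array([-1, -1, 1, 1, 1, -1, -1])

# lift to a second dimension with the polynomial map x -> (x, x^2);
# in the lifted space a horizontal line separates the classes
x2 = x ** 2
pred = np.where(x2 < 2.0, 1, -1)
```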
how does k-nearest neighbours work?
For the data point to classify, it measures the distance to the surrounding data points.
It then assigns the dominant class amongst the data point’s k nearest neighbours
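The two steps above fit in a short from-scratch sketch (Euclidean distance and simple majority vote; not a production implementation):

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    # distance from the new point to every training point
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # indices of the k closest training points
    nearest = np.argsort(dists)[:k]
    # majority vote amongst their labels
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y = np.array([0, 0, 0, 1, 1, 1])
```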
what is the difference between KNN and other classification algorithms?
The model doesn’t fit itself to the training dataset; the model essentially IS the training data.
However it’s still useful to use cross-validation and split the data, because we still have hyperparameters such as the optimal k or the optimal distance metric
how does one vs. rest work for multi-class classification?
Classification is split into as many binary problems as there are classes. E.g. AD, MCI and CN would give AD vs. rest, MCI vs. rest and CN vs. rest
what is one vs. one?
n classes = n * (n-1)/2 classifiers
out of all the binary classifiers, the prediction = the class with the most votes
If the classes receive an equal number of votes, you can consider how far the data point is from each decision boundary
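The one-vs-one vote count for the AD/MCI/CN example can be sketched directly (the individual classifier outcomes are hypothetical):

```python
import numpy as np

# 3 classes -> 3 * (3 - 1) / 2 = 3 pairwise classifiers;
# hypothetical winner of each pairwise contest
votes = {"AD vs MCI": "AD", "AD vs CN": "AD", "MCI vs CN": "CN"}

# prediction = class with the most pairwise wins
winners, counts = np.unique(list(votes.values()), return_counts=True)
prediction = winners[np.argmax(counts)]
```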
What are strengths and weaknesses of SVM, logistic regression and KNN?
LOGISTIC REGRESSION:
- probabilistic interpretation
- can be regularised to avoid overfitting
- However tends to underperform for non-linear boundaries
SVM:
- Can model non-linear boundaries
- However it’s tricky to tune, as you need to select the right kernel, and it’s also computationally intensive
KNN:
- simple, with no training time
- However you need to select k, and it’s very computationally intensive for large datasets