Week 3 - Classification Flashcards

1
Q

what is the difference between a parameter and hyperparameter?

A

parameter = learned by the model during training, e.g. regression coefficients

hyperparameter = set by the user or chosen via grid search, e.g. the regularisation strength lambda

2
Q

what is the function for a logistic regression called?

A

The sigmoid (logistic) function: sigma(z) = 1 / (1 + e^(-z))
Implicit in this function is a threshold, which is used to make classifications
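A minimal sketch of the sigmoid and its implicit threshold (function names here are illustrative, not from the cards):

```python
import math

def sigmoid(z):
    """Map a real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(z, threshold=0.5):
    """Apply the implicit threshold to turn the probability into a class label."""
    return 1 if sigmoid(z) >= threshold else 0
```

E.g. `sigmoid(0)` is exactly 0.5, so a score of 0 sits right on the default decision boundary.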

3
Q

what is the cost function for logistic regression?

A

You want to minimise the log loss, also called the cross-entropy loss: -[y log(p) + (1 - y) log(1 - p)], averaged over the training examples

4
Q

what is the equation for accuracy and what does it mean?

A

meaning: the overall proportion of correct classifications

accuracy = correct predictions / total predictions = (TP + TN) / (TP + TN + FP + FN)

5
Q

what is sensitivity?

A

The proportion of true positives that are correctly identified, i.e. the true positive rate

TP / (TP + FN) = TP / P

Sensitivity is also known as recall

AKA how good is it at identifying positives

6
Q

What is specificity?

A

The proportion of true negatives that are correctly identified

TN / (TN + FP) = TN / N

This is equivalent to 1 - false positive rate
Also known as true negative rate

AKA how good is it at identifying negatives

7
Q

What is precision?

A

TP / PP = TP / (TP + FP)
The proportion of predicted positives that are actually positive
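The formulas from the accuracy, sensitivity, specificity and precision cards in one hedged sketch (the counts are made up for illustration):

```python
def confusion_metrics(tp, fp, tn, fn):
    """Compute the four metrics from raw confusion-matrix counts."""
    return {
        "accuracy":    (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # recall / true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "precision":   tp / (tp + fp),  # of predicted positives, how many are real
    }

# made-up counts for a rare-positive problem
m = confusion_metrics(tp=8, fp=2, tn=85, fn=5)
```

Note how accuracy (93%) can look good even while sensitivity (8/13) is mediocre, which is why the separate metrics matter.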

8
Q

What do the rows and columns in the confusion matrix correspond to?

A

Rows = predicted class
Columns = actual class
(Conventions vary: some sources, e.g. scikit-learn, put the actual class on the rows)

9
Q

How can we change the specificity and sensitivity of a logistic regression classifier?

A

We can adjust the threshold
A lower threshold will increase sensitivity (more cases get classified as positive)
A higher threshold will increase specificity (fewer negatives get misclassified as positive)
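A toy demonstration of that trade-off (the probabilities and labels below are invented for illustration):

```python
def metrics_at_threshold(probs_labels, threshold):
    """Sensitivity and specificity after thresholding predicted probabilities."""
    tp = fn = tn = fp = 0
    for p, y in probs_labels:
        pred = 1 if p >= threshold else 0
        if y == 1:
            tp += pred; fn += 1 - pred
        else:
            tn += 1 - pred; fp += pred
    return tp / (tp + fn), tn / (tn + fp)

# made-up (predicted probability, true label) pairs
data = [(0.9, 1), (0.7, 1), (0.4, 1), (0.6, 0), (0.3, 0), (0.1, 0)]
sens_low, spec_low = metrics_at_threshold(data, 0.2)    # low threshold
sens_high, spec_high = metrics_at_threshold(data, 0.8)  # high threshold
```

With the low threshold every positive is caught (sensitivity 1.0) at the cost of specificity; the high threshold reverses the trade.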

10
Q

What does the ROC curve show?

A

The trade-off between sensitivity and specificity.

The X axis shows FPR (1 - specificity)
The Y axis shows TPR (sensitivity)

The ROC curve shows how sensitivity and specificity vary with different logistic regression thresholds. E.g. if you had a threshold that classified ALL the true positives correctly, what would the FPR be? Or if you had a threshold that correctly classified 70% of the true positives, what would the FPR be? And so on.

The point at (0, 0) represents a threshold that doesn't classify anything as positive

ROC curves make it easy to identify the best thresholds for making a decision
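The threshold sweep above can be sketched directly (toy scores, illustrative names):

```python
def roc_points(scores_labels):
    """(FPR, TPR) pairs obtained by sweeping the threshold over each score.
    scores_labels is a list of (predicted_score, true_label) pairs."""
    pos = sum(1 for _, y in scores_labels if y == 1)
    neg = len(scores_labels) - pos
    points = []
    # an infinite threshold classifies nothing as positive: the (0, 0) point
    for t in [float("inf")] + sorted({s for s, _ in scores_labels}, reverse=True):
        tp = sum(1 for s, y in scores_labels if s >= t and y == 1)
        fp = sum(1 for s, y in scores_labels if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

# made-up scores from a classifier that separates the classes perfectly
pts = roc_points([(0.9, 1), (0.8, 1), (0.4, 0), (0.2, 0)])
```

For perfectly separated scores the curve passes through (0, 1): all true positives found with zero false positives.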

11
Q

What does the AUC show?

A

The AUC is the area under the ROC curve
A larger AUC means a better classifier: it achieves a higher true positive rate for any given false positive rate. It equals the probability that a randomly chosen positive example is ranked above a randomly chosen negative one
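Given a list of (FPR, TPR) points, the area can be approximated with the trapezoidal rule (a sketch; the points are assumed sorted by increasing FPR):

```python
def auc_trapezoid(roc_points):
    """Area under an ROC curve given (FPR, TPR) points, via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(roc_points, roc_points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

A perfect classifier traces (0,0) -> (0,1) -> (1,1) and scores 1.0; the chance diagonal (0,0) -> (1,1) scores 0.5.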

12
Q

When might a precision-recall curve be more useful than a ROC curve?

A

When the sample is highly imbalanced. Precision and recall are both computed relative to the positive class (precision over predicted positives, recall over actual positives) rather than across both classes, so the curve stays sensitive to performance on the minority class; a ROC curve's FPR can look misleadingly good when negatives dominate

13
Q

what is the margin in support vector machines?

A

The distance between the decision boundary (threshold) and the closest data points. SVMs choose the boundary that maximises this margin

14
Q

how does allowing misclassifications impact the bias variance trade off?

A

Allowing misclassifications reduces the variance (at the cost of a little bias); otherwise the boundary may fit itself to outliers

SVMs therefore often use a soft margin, which allows some misclassifications

15
Q

how does a kernel support vector machine work?

A

It aims to find a hyperplane that separates the classes with the maximum margin

SVMs optimise the hinge loss, which measures how well the decision boundary separates the classes. Points that are correctly classified and outside the margin incur no loss, no matter how far from the boundary they are; points inside the margin or on the wrong side are penalised in proportion to their distance from the margin

Non-linear kernels can make the data linearly separable by (implicitly) adding extra dimensions
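The hinge loss described above, as a minimal sketch (labels are assumed to be -1/+1 and `score` the decision-function value):

```python
def hinge_loss(y, score):
    """Hinge loss max(0, 1 - y * f(x)): zero for points correctly classified
    beyond the margin, growing linearly past it."""
    return max(0.0, 1.0 - y * score)
```

E.g. a correct prediction well outside the margin (`y = 1`, `score = 2`) costs nothing, while a misclassification is penalised more the further it sits on the wrong side.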

16
Q

What is the kernel trick?

A

By adding extra dimensions, the support vector machine can find new ways to linearly separate the data

SVMs use kernel functions to systematically find the support vector classifiers in higher dimensions; only the pairwise inner products are needed, so the higher-dimensional coordinates never have to be computed explicitly

A polynomial kernel with d = 1 uses the original dimension. With d = 2 it adds a second dimension of x^2
With d = 3 it adds a third dimension of x^3

We can find a good value for d using cross-validation
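A toy illustration of the idea with an explicit d = 2 mapping (note the real kernel trick computes the inner products without ever building these coordinates):

```python
# 1-D points that are NOT linearly separable: class 1 lies between class 0 points
xs     = [-3.0, -1.0, 0.0, 1.0, 3.0]
labels = [0, 1, 1, 1, 0]

# add a second dimension x^2, as a d = 2 polynomial mapping would
mapped = [(x, x * x) for x in xs]

# in (x, x^2) space the horizontal line x^2 = 2 now separates the two classes
separable = all((x2 < 2) == (y == 1) for (_, x2), y in zip(mapped, labels))
```

No single cut point on the original axis separates the labels, but one straight line does in the lifted space.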

17
Q

how does k-nearest neighbours work?

A

for the data point to classify, it measures the distance to the neighbouring data points.

It then assigns the class that is dominant among the data point's k nearest neighbours
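The two steps above as a minimal sketch (2-D points, Euclidean distance assumed):

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.
    `train` is a list of ((x, y), label) pairs."""
    def dist2(p, q):  # squared Euclidean distance (ordering is the same)
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    nearest = sorted(train, key=lambda item: dist2(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# made-up training data: two well-separated clusters
train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
```

A query near a cluster simply inherits that cluster's label by majority vote.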

18
Q

what is the difference between KNN and other classification algorithms?

A

The model doesn't fit itself to the training dataset, as the model essentially IS the training data.

However, it's still useful to use cross-validation and split the data, because we still have hyperparameters such as the optimal k or the optimal distance metric

19
Q

how does one vs. rest work for multi-class classification?

A

The classification is split into as many binary problems as there are classes. E.g. with AD, MCI and CN the classifiers would be AD vs. rest, MCI vs. rest and CN vs. rest

20
Q

what is one vs. one?

A

n classes = n * (n - 1) / 2 classifiers

Out of all the binary classifiers, the prediction = the class with the most votes

If each class gets an equal number of votes, you can consider how far the data point is from each decision boundary
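The classifier count above as a quick sketch, reusing the AD/MCI/CN example from the one-vs-rest card:

```python
from itertools import combinations

def ovo_classifiers(classes):
    """All pairwise classifiers needed for one-vs-one: n * (n - 1) / 2 of them."""
    return list(combinations(classes, 2))

# 3 classes -> 3 * 2 / 2 = 3 pairwise classifiers
pairs = ovo_classifiers(["AD", "MCI", "CN"])
```

So 3 classes need 3 classifiers, 4 classes need 6, and the count grows quadratically with the number of classes.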

21
Q

What are strengths and weaknesses of SVM, logistic regression and KNN?

A

LOGISTIC REGRESSION:
- probabilistic interpretation
- can be regularised to avoid overfitting
- however, it tends to underperform when the decision boundary is non-linear

SVM:
- can model non-linear boundaries
- however, it is tricky to tune (you need to select the right kernel) and it is computationally intensive

KNN:
- simple, with no training time
- however, you need to select k, and it is very computationally intensive for large datasets
