week 3 - classification Flashcards
what is the difference between a parameter and hyperparameter?
parameter = learned by the model during training, e.g. regression coefficients
hyperparameter = set by the user or by grid search, e.g. the regularisation strength lambda
what is the function for a logistic regression called?
The sigmoid (logistic) function
Implicit in this function is a threshold (commonly 0.5), which is used to make classifications
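As a minimal sketch of the idea above (the scores and the 0.5 threshold are just illustrative choices):

```python
import math

def sigmoid(z):
    # Squashes any real-valued score into the interval (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

# Apply the conventional 0.5 threshold to turn probabilities into class labels
scores = [-2.0, 0.0, 3.0]
probs = [sigmoid(z) for z in scores]
labels = [1 if p >= 0.5 else 0 for p in probs]
```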
what is the cost function for logistic regression?
You want to minimise the log loss, also called the cross-entropy loss
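A plain-Python sketch of the log loss (the clipping value eps is an implementation detail to avoid log(0), not part of the definition):

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    # Cross-entropy: -(1/n) * sum of [ y*log(p) + (1-y)*log(1-p) ]
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)
```

Confident correct predictions give a loss near 0; confident wrong predictions are penalised heavily.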
what is the equation for accuracy and what does it mean?
meaning: overall proportion of correct classifications
accuracy = correct predictions/total predictions
what is sensitivity?
The proportion of actual positives that are correctly identified, i.e. the true positive rate
TP/(TP + FN) = TP/P
Sensitivity is also known as recall
AKA how good is it at identifying positives
What is specificity?
The proportion of true negatives that are correctly identified
TN/(TN + FP) = TN/N
This is equivalent to 1 - false positive rate
Also known as true negative rate
AKA how good is it at identifying negatives
What is precision?
TP/PP = TP/(TP + FP), where PP = predicted positives
The proportion of positive results that were correctly classified
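The four metrics above all fall out of the confusion-matrix counts; a small sketch (the counts are made up for illustration):

```python
def classification_metrics(tp, fp, tn, fn):
    # Derive the headline metrics directly from confusion-matrix counts
    return {
        "accuracy":    (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # recall / true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "precision":   tp / (tp + fp),  # proportion of predicted positives that are right
    }

m = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```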
What do the rows and columns in the confusion matrix correspond to?
Rows = predicted class
Columns = actual class
(Note: conventions vary; e.g. scikit-learn's confusion_matrix puts the actual class in rows and predictions in columns)
How can we change the sensitivity and specificity of a logistic regression classifier?
We can adjust the threshold
A lower threshold will increase sensitivity
A higher threshold will increase specificity
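A quick demonstration of that trade-off on hypothetical predicted probabilities (both the probabilities and the two thresholds are made up for illustration):

```python
probs  = [0.1, 0.3, 0.45, 0.55, 0.7, 0.9]  # hypothetical predicted probabilities
y_true = [0,   0,   1,    0,    1,   1]

def sens_spec(threshold):
    # Compute sensitivity and specificity for a given classification threshold
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for p, y in zip(preds, y_true) if p == 1 and y == 1)
    fn = sum(1 for p, y in zip(preds, y_true) if p == 0 and y == 1)
    tn = sum(1 for p, y in zip(preds, y_true) if p == 0 and y == 0)
    fp = sum(1 for p, y in zip(preds, y_true) if p == 1 and y == 0)
    return tp / (tp + fn), tn / (tn + fp)

low_sens,  low_spec  = sens_spec(0.3)  # permissive threshold: high sensitivity
high_sens, high_spec = sens_spec(0.6)  # strict threshold: high specificity
```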
What does the ROC curve show?
The trade-off between sensitivity and specificity.
The X axis shows FPR (1 - specificity)
The Y axis shows TPR (Sensitivity)
The ROC curve shows how sensitivity and specificity vary as the classification threshold changes. E.g. if you had a threshold that classified all the true positives correctly, what would the FPR be? Or if a threshold captured 0.7 of the true positives, what would the FPR be? And so on.
The point at 0,0 represents a threshold that doesn’t classify anything as positive
ROC curves make it easy to identify the best thresholds for making a decision
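Each threshold gives one (FPR, TPR) point; sweeping the threshold traces the curve. A sketch on made-up scores:

```python
probs  = [0.2, 0.4, 0.6, 0.8]  # hypothetical classifier scores
y_true = [0,   1,   0,   1]

def roc_point(threshold):
    # One (FPR, TPR) point on the ROC curve for a given threshold
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for p, y in zip(preds, y_true) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, y_true) if p == 1 and y == 0)
    tpr = tp / sum(y_true)
    fpr = fp / (len(y_true) - sum(y_true))
    return fpr, tpr

# Sweep from a threshold that accepts everything to one that accepts nothing
curve = [roc_point(t) for t in (0.0, 0.3, 0.5, 0.7, 1.1)]
```

The first point (everything classified positive) is (1, 1) and the last (nothing classified positive) is (0, 0), matching the corners of the ROC plot.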
What does the AUC show?
The AUC is the area under the ROC curve
A bigger AUC indicates better overall separation of positives from negatives; the AUC equals the probability that a randomly chosen positive is ranked above a randomly chosen negative
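That ranking interpretation of the AUC can be computed directly, without building the curve (a sketch; ties count as half a win):

```python
def auc_rank(y_true, scores):
    # AUC = probability a random positive is scored above a random negative
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Perfect separation gives 1.0, random scores give about 0.5, and perfectly inverted scores give 0.0.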
When might precision-recall curve be more useful than a ROC curve?
If the sample is highly imbalanced. Precision and recall are both computed relative to the positive class (recall uses actual positives, precision uses predicted positives), so the curve stays sensitive to performance on the minority class, whereas the ROC curve's FPR can look good simply because the negative class is large.
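A small numerical illustration of why (the counts are made up): with 10 positives and 990 negatives, a classifier that finds every positive but also flags 90 negatives looks excellent on the ROC axes and poor on the PR axes.

```python
# Imbalanced toy counts (assumed for illustration): 10 positives, 990 negatives
tp, fn = 10, 0
fp, tn = 90, 900

recall    = tp / (tp + fn)  # 1.0   - the ROC's y-axis looks perfect
fpr       = fp / (fp + tn)  # ~0.09 - the ROC's x-axis barely moves
precision = tp / (tp + fp)  # 0.10  - the PR curve exposes the weak minority-class performance
```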
what is the margin in support vector machines?
The distance between the decision boundary and the closest observations (the support vectors)
how does allowing misclassifications impact the bias variance trade off?
Allowing misclassifications reduces the variance (at the cost of a little bias); otherwise the model may fit to outliers
SVM then often uses a soft margin, which allows for some misclassifications
how does a kernel support vector machine work?
It (implicitly) maps the data into a higher-dimensional space using a kernel function, then finds the hyperplane that separates the classes with the maximum margin.
SVMs optimise the hinge loss, which measures how well the decision boundary separates the classes. Points that are correctly classified and lie beyond the margin incur no penalty, no matter how far they are from the boundary; points inside the margin or on the wrong side are penalised in proportion to their distance from the margin boundary.
Non-linear kernels can make data linearly separable by adding extra dimensions
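Two sketches of the ideas above: the hinge loss formula, and the kernel idea shown with an explicit feature map (the points, labels, and the x^2 map are assumptions chosen purely for illustration; a real kernel SVM never computes the map explicitly).

```python
def hinge_loss(y, scores):
    # y in {-1, +1}; zero loss for points correctly classified beyond the margin,
    # linearly growing loss for points inside the margin or misclassified
    losses = [max(0.0, 1.0 - yi * si) for yi, si in zip(y, scores)]
    return sum(losses) / len(losses)

# Points on a line whose class depends on x^2 are not linearly separable in 1-D...
x = [-2.0, -1.0, 1.0, 2.0]
y = [1, 0, 0, 1]  # outer points are one class, inner points the other

# ...but adding an x^2 dimension lets a single cut at x^2 = 2.5 separate them
phi = [(xi, xi ** 2) for xi in x]
separable = all((p[1] > 2.5) == (yi == 1) for p, yi in zip(phi, y))
```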