Classification Flashcards
classification
Unlike regression, which predicts a continuous variable, classification predicts a categorical variable.
Logistic Regression
K-NN
Tree-based (decision trees, bagging (random forests), and boosting (XGBoost))
confusion matrix
To evaluate a classification method: a table where each column gives the number of samples in each predicted class and each row gives the number of samples in each actual class; from it we can compute TP & FP.
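A minimal scikit-learn sketch, assuming made-up y_true / y_pred label lists (not from the card):

    from sklearn.metrics import confusion_matrix

    y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # actual labels (made-up example)
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # predicted labels (made-up example)

    cm = confusion_matrix(y_true, y_pred)   # cm[i, j] = actual class i predicted as class j
    tn, fp, fn, tp = cm.ravel()             # binary case: unpack the four cells
    print(cm, tp, fp)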
Logistic regression
Y can be 0 or 1
β0 shifts the curve right or left and β1 controls how steep the S-shaped curve is.
β1 is the change in the log odds for a one-unit increase in X.
It is a linear classifier; it performs best when the decision boundary between classes is a line. If the boundary is not linear, consider e.g. k-NN instead.
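A minimal numpy sketch of the S-shaped curve; the x grid and the β0 / β1 values are just assumptions to show the shift and the steepness:

    import numpy as np

    def sigmoid(x, b0, b1):
        # P(Y = 1 | x) = 1 / (1 + exp(-(b0 + b1 * x)))
        return 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))

    x = np.linspace(-5, 5, 11)
    print(sigmoid(x, b0=0.0, b1=1.0))   # baseline curve
    print(sigmoid(x, b0=2.0, b1=1.0))   # larger b0: curve shifted along x
    print(sigmoid(x, b0=0.0, b1=3.0))   # larger b1: steeper curve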
KNN
It is a non-linear classifier (unlike logistic regression).
In k-NN classification, the output is a class membership. An object is classified by a plurality vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor.
small k: low bias but high variance (more flexible)
large k: higher bias but lower variance (smoother)
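A minimal scikit-learn sketch with an assumed toy 2-D dataset and k = 3:

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])  # toy 2-D points
    y = np.array([0, 0, 0, 1, 1, 1])                                # their classes

    knn = KNeighborsClassifier(n_neighbors=3)   # small k -> more flexible, higher variance
    knn.fit(X, y)
    print(knn.predict([[1, 1], [5, 4]]))        # plurality vote among the 3 nearest neighbors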
ROC CURVE
Best performance is in the upper-left corner (high true positive rate, low false positive rate); each point on the curve is one threshold we try. If the curve is just the straight diagonal line, the classifier is no better than random guessing (the worst case).
Shows how performance changes as the threshold varies.
(With logistic regression we applied a threshold to the predicted probability to choose one class or the other.)
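A minimal scikit-learn sketch with assumed toy labels and predicted scores:

    from sklearn.metrics import roc_curve, roc_auc_score

    y_true  = [0, 0, 1, 1, 0, 1]                  # actual labels (made-up)
    y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]     # predicted probabilities of class 1 (made-up)

    fpr, tpr, thresholds = roc_curve(y_true, y_score)   # one (fpr, tpr) point per threshold
    print(list(zip(thresholds, fpr, tpr)))
    print(roc_auc_score(y_true, y_score))   # area under the curve; 0.5 = diagonal (random), 1.0 = best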
Decision trees
A decision tree is a non-linear classifier; it partitions the feature space into regions.
Find the predictor and threshold that reduce the error the most (the error can be measured with entropy; information gain tells us how useful splitting on a feature is). Repeat on the reduced subset of data in each branch.
It can overfit, but we can limit the depth of the tree and the minimum number of samples a node must have before it is split (so that the subgroups the tree creates aren't too specific).
Advantages: very interpretable
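A minimal scikit-learn sketch; the iris dataset and the hyperparameter values are assumptions, not from the card:

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    tree = DecisionTreeClassifier(
        criterion="entropy",     # score splits by entropy / information gain
        max_depth=3,             # limit depth to reduce overfitting
        min_samples_split=10,    # a node needs at least 10 samples to be split
    )
    tree.fit(X, y)
    print(tree.score(X, y))      # training accuracy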
Bagging tree
Train multiple trees, each on a different subset (bootstrap sample) of the data.
Use all trees to make a prediction: output the label predicted by the majority of the trees.
Trees can be grown deep (low bias) because averaging across trees reduces the variance. If there is one very strong predictor, though, this might not be sufficient, since all the trees will split on that same predictor and stay highly correlated.
Disadvantage: because of the averaging, the model is no longer interpretable the way a single tree is.
Feature importance: see how much a predictor reduces the RSS in each tree and average over the trees.
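A minimal scikit-learn sketch of bagged trees; the hyperparameter values are assumptions (BaggingClassifier uses a decision tree as its default base learner):

    from sklearn.datasets import load_iris
    from sklearn.ensemble import BaggingClassifier

    X, y = load_iris(return_X_y=True)

    bag = BaggingClassifier(
        n_estimators=100,   # number of trees, each fit on a bootstrap sample
        bootstrap=True,     # sample the training data with replacement
    )
    bag.fit(X, y)
    print(bag.predict(X[:5]))   # majority vote over the 100 trees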
Random Forest
Modified bagging tree
More randomness, and thus it reduces variance more efficiently.
Train n trees, and at each split only a random subset of the predictors is considered, so the trees can't all keep choosing the same strong predictor.
Feature importance: see how much a predictor reduces the RSS in each tree and average over the trees (same as for bagging).
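A minimal scikit-learn sketch; the hyperparameter values are assumptions:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)

    rf = RandomForestClassifier(
        n_estimators=200,      # number of trees
        max_features="sqrt",   # each split only considers a random subset of predictors
    )
    rf.fit(X, y)
    print(rf.feature_importances_)   # impurity-based importance, averaged over the trees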
Boosted tree
Fit a simple tree (e.g. using only one predictor), see what it doesn't predict well, fit another simple tree to predict that, and loop until satisfied; then combine all the trees together to make the overall prediction.
Usually works better than a random forest.
Shouldn't use too many trees, otherwise it can overfit.
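The card mentions XGBoost; as a hedged sketch, here is scikit-learn's GradientBoostingClassifier instead, with assumed hyperparameters:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import GradientBoostingClassifier

    X, y = load_iris(return_X_y=True)

    boost = GradientBoostingClassifier(
        n_estimators=100,    # too many trees can overfit
        max_depth=1,         # very simple trees (stumps), each correcting the previous ones
        learning_rate=0.1,   # how much each new tree contributes
    )
    boost.fit(X, y)
    print(boost.score(X, y))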
Entropy
Measures uncertainty (if there is a 50/50 chance of an outcome, then the situation is as uncertain as it can be). Used to score the error in classification.
Entropy goes from 0 to 1 (for two classes, using log base 2).
1 is the most uncertain (thus we would like the value to be as low as possible)
Information gain tells us how useful splitting on a feature is; the higher, the better.
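A minimal numpy sketch of entropy and information gain; the toy parent/child split is an assumption:

    import numpy as np

    def entropy(labels):
        # H = -sum(p * log2(p)); 0 when pure, 1 for a 50/50 binary split
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # 50/50 -> entropy 1
    left   = np.array([0, 0, 0, 1])               # one child after a candidate split
    right  = np.array([0, 1, 1, 1])               # the other child

    # information gain = parent entropy - weighted average of child entropies
    gain = (entropy(parent)
            - len(left) / len(parent) * entropy(left)
            - len(right) / len(parent) * entropy(right))
    print(entropy(parent), gain)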