Classification Flashcards
impute/imputation
In statistics, imputation is the process of replacing missing data with substituted values.
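A minimal sketch of one simple strategy, mean imputation, in plain Python. It assumes missing values are stored as None; the function name impute_mean and the sample data are hypothetical.

```python
from statistics import mean

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)  # the substituted value
    return [fill if v is None else v for v in values]

ages = [20, None, 40, 30, None]   # hypothetical column with two missing values
imputed = impute_mean(ages)       # missing entries become the mean of 20, 40, 30
```

The mean is only one possible substitute; the median or the most frequent value are common alternatives.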
confusion matrix
- A cross-tabulation of our model’s predictions against actual values
- A matrix (table) used to measure the performance of a machine learning algorithm
- Rows: actual classes (Ci)
- Columns: predicted classes (Cj)
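The cross-tabulation described above can be sketched in plain Python. The function name confusion_matrix and the spam/ham labels are hypothetical; rows are actual classes and columns are predicted classes, matching the card.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Return a table where row i is the actual class and column j the predicted class."""
    counts = Counter(zip(actual, predicted))  # count each (actual, predicted) pair
    return [[counts[(a, p)] for p in labels] for a in labels]

actual    = ["spam", "spam", "ham", "ham", "ham", "spam"]
predicted = ["spam", "ham", "ham", "ham", "spam", "spam"]
matrix = confusion_matrix(actual, predicted, labels=["spam", "ham"])
# matrix[0] holds the actual "spam" cases, matrix[1] the actual "ham" cases
```

Correct predictions sit on the diagonal of the table; everything off the diagonal is an error.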
What are the 4 possible outcomes of a classification task?
True Positive
False Positive
False Negative
True Negative
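As a rough illustration, the four outcomes can be counted for a binary problem like this. The function name count_outcomes and the 0/1 labels are assumptions made for the example.

```python
def count_outcomes(actual, predicted, positive=1):
    """Count the four outcomes for a binary classifier."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1   # predicted positive, actually positive
        elif p == positive:
            fp += 1   # predicted positive, actually negative
        elif a == positive:
            fn += 1   # predicted negative, actually positive
        else:
            tn += 1   # predicted negative, actually negative
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}

actual    = [1, 1, 0, 0, 1, 0, 0, 1]   # hypothetical labels
predicted = [1, 0, 0, 1, 1, 0, 0, 0]
outcomes = count_outcomes(actual, predicted)
```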
What is the common choice for the baseline model for a classification problem?
a model that simply predicts the most common class every single time
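A minimal sketch of such a baseline in plain Python. The function name baseline_predictions and the yes/no labels are hypothetical.

```python
from collections import Counter

def baseline_predictions(train_labels, n):
    """Predict the most common training class for every one of n observations."""
    most_common_class, _ = Counter(train_labels).most_common(1)[0]
    return [most_common_class] * n

train_labels = ["no", "no", "yes", "no", "yes"]   # "no" is the most common class
test_actual  = ["no", "yes", "no", "no"]
preds = baseline_predictions(train_labels, len(test_actual))

# Share of baseline predictions that match the actual labels
baseline_accuracy = sum(a == p for a, p in zip(test_actual, preds)) / len(test_actual)
```

A trained model is usually only worth keeping if it beats this baseline score.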
What are the common evaluation metrics for a classification problem/model?
- Accuracy
- Precision
- Recall
- Specificity
- F1 score
- ROC curve
What is accuracy?
the number of times we predicted correctly divided by the total number of observations: (TP + TN) / (TP + TN + FP + FN)
What is precision / positive predictive value?
the percentage of our positive predictions that are correct: TP / (TP + FP)
What is recall / true positive rate / sensitivity?
the percentage of actual positive cases that we predicted correctly: TP / (TP + FN)
What is specificity / true negative rate?
the percentage of actual negative cases that we predicted correctly: TN / (TN + FP)
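The metrics on these cards can be computed from the four outcome counts. This is a sketch with hypothetical counts; the F1 line uses the standard definition (the harmonic mean of precision and recall), which these cards do not spell out.

```python
# Hypothetical outcome counts for a binary classifier
tp, fp, fn, tn = 2, 1, 2, 3

accuracy    = (tp + tn) / (tp + fp + fn + tn)  # correct predictions / all observations
precision   = tp / (tp + fp)                   # correct positives / predicted positives
recall      = tp / (tp + fn)                   # correct positives / actual positives
specificity = tn / (tn + fp)                   # correct negatives / actual negatives
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
```

Each ratio is undefined when its denominator is zero, for example precision when the model never predicts the positive class.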
logistic regression
- A classification algorithm, despite its name
- To find the values of the coefficients that weight each input variable
- To assign observations to a discrete set of classes
- To predict discrete outcomes
- Comes in binomial (two classes) and multinomial (more than two classes) forms
- The output is a value between 0 and 1 that represents the probability of one class over the other.
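A minimal sketch of how a binomial logistic regression turns weighted inputs into a probability and then a class. The weights and bias here are hand-picked, hypothetical values rather than fitted coefficients, and the 0.5 threshold is an assumption.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Probability of the positive class: sigmoid of the weighted sum of inputs."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

def predict_class(features, weights, bias, threshold=0.5):
    """Assign the observation to class 1 or 0 by thresholding the probability."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0

weights, bias = [0.8, -0.4], -0.2      # hypothetical coefficients, not fitted
p = predict_proba([2.0, 1.0], weights, bias)
label = predict_class([2.0, 1.0], weights, bias)
```

In practice the coefficients come from training on labeled data; only the prediction step is shown here.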
regularized least squares
- A way of solving least squares regression problems
- An extra constraint on the solution, which is called regularization
- It adds a penalty term to the error.
- An argument in LogisticRegression (the penalty and C parameters in scikit-learn)
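To illustrate the penalty term, here is a sketch of L2-regularized (ridge) least squares for a single feature with no intercept. Minimizing sum((y - w*x)^2) + lam * w^2 gives w = sum(x*y) / (sum(x*x) + lam). The function name ridge_slope and the data are hypothetical.

```python
def ridge_slope(xs, ys, lam):
    """Slope that minimizes squared error plus the penalty lam * w**2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)   # lam = 0 gives ordinary least squares

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]                      # exactly y = 2x
w_plain = ridge_slope(xs, ys, lam=0.0)    # no penalty
w_ridge = ridge_slope(xs, ys, lam=14.0)   # penalty shrinks the slope toward 0
```

In scikit-learn's LogisticRegression, C is the inverse of the regularization strength, so a smaller C means a stronger penalty.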
What are the components of a decision tree?
- root
- condition/internal node
- branches/edges
- decision/leaf
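These components can be sketched with nested dictionaries. The feature names, thresholds, and labels are hypothetical, and the tree is written by hand rather than learned from data.

```python
# Root and internal nodes hold a condition; "left"/"right" are the branches;
# a leaf holds the final decision.
tree = {
    "feature": "income", "threshold": 50000,       # root condition
    "left": {
        "feature": "age", "threshold": 30,         # internal node condition
        "left": {"leaf": "deny"},                  # leaf
        "right": {"leaf": "approve"},              # leaf
    },
    "right": {"leaf": "approve"},                  # leaf
}

def predict(node, sample):
    """Follow branches from the root until a leaf is reached."""
    while "leaf" not in node:
        go_left = sample[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["leaf"]

decision = predict(tree, {"income": 40000, "age": 25})
```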
What is a classification tree?
a decision tree that classifies a categorical outcome variable
What is a regression tree?
a decision tree that predicts continuous values, like the price of a house
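One way to see the difference is in what a leaf returns. A common choice, assumed here, is the majority class for a classification tree and the mean target for a regression tree. The function names and data are hypothetical.

```python
from collections import Counter
from statistics import mean

def classification_leaf(labels):
    """Leaf prediction for a classification tree: the majority class."""
    return Counter(labels).most_common(1)[0][0]

def regression_leaf(targets):
    """Leaf prediction for a regression tree: the mean of the target values."""
    return mean(targets)

# Training observations that ended up in one leaf (hypothetical)
leaf_class = classification_leaf(["spam", "ham", "spam"])
leaf_price = regression_leaf([200000, 250000, 300000])
```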
CART = ?
classification and regression trees