General ML Flashcards
Bias vs Variance Tradeoffs
Bias is error introduced by approximating a real-world problem, which may be complex, by a simplified model
- Leads to underfitting.
Variance is error introduced by the model’s sensitivity to small fluctuations in the training dataset, causing it to model noise rather than intended outputs
- Leads to overfitting
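A minimal sketch (toy data, not from the flashcards) of the tradeoff: a degree-1 polynomial underfits (high bias), while a degree-15 polynomial chases the noise (high variance).

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)  # noisy samples of a smooth signal

x_test = np.linspace(0, 1, 100)
y_true = np.sin(2 * np.pi * x_test)                              # noise-free signal for evaluation

for degree in (1, 3, 15):
    coeffs = np.polyfit(x, y, degree)        # fit a polynomial of the given degree
    y_pred = np.polyval(coeffs, x_test)      # predict on a held-out grid
    mse = np.mean((y_pred - y_true) ** 2)    # error vs the true signal
    print(f"degree={degree:2d}  test MSE={mse:.3f}")  # degree 1 underfits, degree 15 overfits
```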
Supervised vs Unsupervised
Supervised learning uses labeled data, while unsupervised learning uses algorithms to find patterns in data with no explicit right answer.
Likelihood vs Probability
- Probability is the chance of an event happening under certain conditions, while likelihood is a measure of how well data support a model or particular parameter values. Probability is the more general notion, while likelihood is used in statistical modeling and inference.
- Probability refers to the possibility of an outcome given fixed parameters; likelihood refers to the process of determining the distribution (or parameter values) that best explains the observed data. When calculating the probability of a given outcome, you assume the model's parameters are reliable.
KNN vs K-means
K-means -> unsupervised clustering algorithm
- Takes a set of unlabeled points and a chosen number of clusters k, then iteratively groups the points around cluster means, learning the clustering from the data itself.
KNN -> supervised classification algorithm
- Takes labeled data and classifies an unlabeled point based on the labels of its nearest neighbors.
- Example: deciding whether a post should or shouldn't be monetized based on other factors of the post.
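A minimal sketch with made-up toy points contrasting the two: KMeans invents cluster labels for unlabeled points, while KNeighborsClassifier uses supplied labels to classify a new point.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1, 1], [1.5, 2], [1, 2], [8, 8], [8, 9], [9, 8]], dtype=float)

# K-means: unsupervised; we only choose k, the algorithm invents the cluster labels.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("cluster assignments:", kmeans.labels_)

# KNN: supervised; we supply labels and classify a new, unlabeled point by its neighbors.
y = np.array([0, 0, 0, 1, 1, 1])  # known labels (e.g. "don't monetize" / "monetize")
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print("predicted label for [2, 2]:", knn.predict([[2.0, 2.0]])[0])
```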
Type I error vs Type II error
Type I error is a false positive —> claiming something happened when it didn’t —> telling a man he is pregnant
Type II error is a false negative —> claiming something didn’t happen when it did —> telling a pregnant woman she isn’t carrying a baby.
Sensitivity vs Specificity
Sensitivity focuses on identifying the positive instances correctly, so it is critical when you want to minimize false negatives -> medical tests. Sensitivity = TP / (TP + FN)
Specificity focuses on identifying the negative instances correctly, so it is critical when you want to minimize false positives -> spam filters. Specificity = TN / (TN + FP)
- Sensitivity: “Sensitive to catching Positives.”
- Specificity: “Specific to excluding Negatives.”
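A minimal sketch, with made-up labels, computing both metrics straight from a confusion matrix.

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)   # how many real positives we caught
specificity = tn / (tn + fp)   # how many real negatives we excluded
print(f"sensitivity={sensitivity:.2f}  specificity={specificity:.2f}")
```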
Accuracy
General metric used to measure how well a classification model performs across all cases (both positive and negative). Proportion of correct predictions (both true positives and true negatives) out of the total number of predictions.
Accuracy = (True Positives + True Negatives) / Total Predictions
Useful when the class distribution is balanced (positives and negatives roughly equal), when false positives and false negatives have similar consequences, and when you care about the overall performance of the model.
When it is misleading:
- Imbalanced classes (e.g. 95% negative, 5% positive): the model can always predict the majority class and still get high accuracy, but it won’t be useful for the minority class (e.g. disease detection where only a small % of cases are positive).
- Different costs for FP and FN
Use cases: image detection, sentiment analysis for customer reviews.
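A minimal sketch, with synthetic labels, of the imbalanced-class pitfall: always predicting the majority class looks accurate but catches zero positives.

```python
from sklearn.metrics import accuracy_score, recall_score

y_true = [1] * 5 + [0] * 95   # 5% positive, 95% negative
y_pred = [0] * 100            # "model" that always predicts the majority (negative) class

print("accuracy:", accuracy_score(y_true, y_pred))  # 0.95, looks great
print("recall:  ", recall_score(y_true, y_pred))    # 0.0, useless for the minority class
```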
Precision
Metric used to evaluate the performance of a classification model, especially in the context of binary classification. Measures how many of the positive predictions made by the model are actually correct.
Precision = True Positives / (True Positives + False Positives)
- Note: the denominator includes actual positives AND negatives that were classified as positive, so precision only evaluates the model’s ability to classify the positive class, without regard to its ability on the negative class.
Particularly useful when the cost of false positives is high, meaning it’s important to minimize the number of incorrect positive predictions. For example, if a model predicts whether emails are spam and it identifies 10 emails as spam, but only 7 of them are actually spam, then precision is 7 / (7 + 3) = 70%.
Use Cases: Spam detection, product recommendation, Ad-click prediction (showing irrelevant ads waste ad spend).
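A minimal sketch reproducing the spam example above (10 flagged, 7 truly spam), so precision = 7 / (7 + 3) = 0.7.

```python
from sklearn.metrics import precision_score

y_true = [1] * 7 + [0] * 3   # of the 10 flagged emails, 7 are truly spam
y_pred = [1] * 10            # the model flagged all 10 as spam

print("precision:", precision_score(y_true, y_pred))  # 0.7
```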
Recall - TPR - Sensitivity
Proportion of actual positives that are correctly identified by the classifier.
TPR = True Positives / (True Positives + False Negatives)
Note: every value in the equation relates only to the actual positives (caught or missed), so it is a good metric to use when we don’t really care about false positives.
Say you identified 10 apples and 5 oranges in a case containing only 10 apples.
- Recall = 100%: there were 10 apples and you identified all 10.
- Precision = 10 / (10 + 5) = 66.7%: out of the 15 items you predicted, only 10 are correct.
Use case: Cancer predictions, Fraud detection, Search and rescue, Customer Churn.
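A minimal sketch of one binary reading of the apples example above, treating “apple” as the positive class and the 5 extra calls as false positives (an assumption about how to encode the example).

```python
from sklearn.metrics import precision_score, recall_score

y_true = [1] * 10 + [0] * 5   # the case: 10 real apples, 5 non-apples
y_pred = [1] * 15             # the model calls all 15 items apples

print("recall:   ", recall_score(y_true, y_pred))     # 1.0  (found all 10 apples)
print("precision:", precision_score(y_true, y_pred))  # ~0.667 (10 of 15 calls correct)
```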
FPR
FPR = False Positives / (False Positives + True Negatives)
Note: every value in the equation relates only to the actual negatives (correctly excluded or falsely flagged as positive).
ROC
Measuring Sensitivity vs Fallout - Used for Binary Classification
- Graphical representation of the contrast between TPRs and FPRs at various thresholds.
- Proxy for the trade-off between the sensitivity of the model (true positives) vs the fall-out or the probability it will trigger a false alarm (false positives).
- It helps visualize how well the model discriminates between the positive and negative classes, and lets the engineer choose an appropriate classification threshold for a model such as logistic regression.
F1 Score
- Harmonic mean of the precision and recall of a model, with values closer to 1 being the best. You can use it for classification problems where true negatives don’t matter much.
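A minimal sketch, with made-up values, showing that the harmonic mean punishes a precision/recall imbalance more than a plain average would.

```python
precision, recall = 0.9, 0.3                      # made-up values
f1 = 2 * precision * recall / (precision + recall)

print(f"arithmetic mean: {(precision + recall) / 2:.2f}")  # 0.60
print(f"F1 (harmonic):   {f1:.2f}")                        # 0.45
```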
AUC
The AUC is the area under the ROC curve, and provides a single value to summarize the overall performance of the classifier ranging from 0 - 1
- AUC = 1 -> perfect model
- AUC = 0.5 -> model performs no better than random guessing
- AUC < 0.5 -> model that is worse than random guessing.
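A minimal sketch, on synthetic scores, of sweeping thresholds with roc_curve and summarizing the curve with roc_auc_score.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.1, 0.3, 0.35, 0.8, 0.4, 0.6, 0.7, 0.9])  # model's predicted probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # one (FPR, TPR) point per threshold
print("thresholds:", thresholds)
print("FPR:", fpr)
print("TPR:", tpr)
print("AUC:", roc_auc_score(y_true, y_score))       # single-number summary; 0.5 = random guessing
```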
Bayes’ Theorem and why the classifier is called Naive Bayes
P(A|B) = P(B|A) * P(A) / P(B)
- P(bought|click) = P(click|bought) * P(bought) / P(click)
- Allows us to find the probability of a cause given its effect.
- Why is it called Naive Bayes?
- It assumes complete independence of features, an assumption that is probably never met in real life (and isn’t in this scenario).
- Independence: B happening has no effect on A happening.
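A minimal sketch plugging made-up numbers (the probabilities below are assumptions, not from the flashcards) into the click/bought example.

```python
# Bayes' theorem: P(bought | click) = P(click | bought) * P(bought) / P(click)
p_click_given_bought = 0.40   # assumed: of users who bought, 40% had clicked
p_bought = 0.05               # assumed: 5% of users buy
p_click = 0.10                # assumed: 10% of users click

p_bought_given_click = p_click_given_bought * p_bought / p_click
print(f"P(bought | click) = {p_bought_given_click:.2f}")  # 0.20
```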
L1 vs L2 Regularization + Elastic Net
- L1 (Lasso)
- Encourages sparsity in the model by adding the absolute values of the coefficients to the loss function
- Effect: Drives some coefficients to exactly zero, effectively performing feature selection. This can be particularly useful when dealing with a large number of features, as it helps in identifying the most important predictors
- L2 (Ridge)
- Penalizes large coefficients by adding the squared values of the coefficients to the loss function
- Effect: Shrinks coefficients towards zero but does not set them to exactly zero. This helps in handling multicollinearity and improving the stability and generalization of the model.
Elastic Net does both, adding both the absolute values and the squared values of the coefficients to the loss function, so it inherits benefits from both.
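A minimal sketch, on synthetic data where only two features matter, comparing the coefficients learned under the three penalties: Lasso zeroes out the irrelevant ones, Ridge only shrinks them.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge, ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=200)  # only features 0 and 1 matter

for name, model in [("Lasso", Lasso(alpha=0.1)),
                    ("Ridge", Ridge(alpha=1.0)),
                    ("ElasticNet", ElasticNet(alpha=0.1, l1_ratio=0.5))]:
    model.fit(X, y)
    coefs = np.round(model.coef_, 2)
    print(f"{name:10s} zero coefficients={np.sum(coefs == 0)}  coefs={coefs}")
```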
Multicollinearity
Refers to the situation in which two or more independent variables in a regression model are highly correlated, meaning they have a strong linear relationship, which makes it hard to distinguish the individual effects of these variables on the dependent variable.
You can detect this using a correlation matrix.
Example:
- Feature 1: Adding credit card info
- Feature 2: Buying an item
These can have a direct correlation.
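A minimal sketch, with synthetic data, of detecting this with a correlation matrix: the two correlated features show a high off-diagonal value.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
added_card = rng.integers(0, 2, size=500)
bought_item = np.where(rng.random(500) < 0.9, added_card, 1 - added_card)  # ~90% agreement with added_card
ad_clicks = rng.integers(0, 2, size=500)                                   # unrelated feature

df = pd.DataFrame({"added_card": added_card,
                   "bought_item": bought_item,
                   "ad_clicks": ad_clicks})
print(df.corr())  # a high off-diagonal value flags the correlated pair
```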