Chapter 4. Anomaly Detection Flashcards
Why are unsupervised fraud detection systems in vogue? P 152
Fraud patterns change over time, so supervised systems trained on historical fraud labels cannot adapt to newly emerging patterns. Unsupervised fraud detection systems, which do not rely on labels, are therefore in vogue.
Why do the algorithms have the largest reconstruction error on anomalies? P 154
Because anomalies are the hardest entries to model.
We need to apply min-max scaling to the sum of the squared differences between the original feature matrix and the matrix reconstructed by the dimensionality reduction algorithm. True/False? Why? P 154
True. It ensures all the reconstruction errors fall within a zero-to-one range, so they can be compared directly as anomaly scores.
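A minimal sketch of this scaling step, assuming PCA as the dimensionality reduction method and synthetic data in place of the book's transaction matrix (all names here are illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))  # stand-in for the original feature matrix

# Reduce and reconstruct with PCA.
pca = PCA(n_components=3, random_state=42)
X_reconstructed = pca.inverse_transform(pca.fit_transform(X))

# Per-row sum of squared differences between original and reconstruction.
sse = np.sum((X - X_reconstructed) ** 2, axis=1)

# Min-max scale so every reconstruction error falls in [0, 1].
anomaly_scores = MinMaxScaler().fit_transform(sse.reshape(-1, 1)).ravel()
```

The rows with scores near 1 are the ones the model reconstructs worst, i.e., the candidates for review.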
Why are unsupervised methods harder to evaluate than the supervised ones? P 157
Unsupervised systems are usually judged by their ability to catch known patterns of fraud, but evaluating them on labeled data is an incomplete assessment: some frauds may have gone undetected by the company and were therefore mislabeled. A better evaluation would assess their ability to identify unknown patterns of fraud, both past and future. Since we cannot go back to the company and have it evaluate any unknown fraud patterns we identify, we have to evaluate these unsupervised systems solely on how well they detect the known patterns of fraud.
What are the steps to finding anomalies in a data set using a dimensionality reduction method? P 158
- Use the method to learn the underlying structure of the dataset.
- Use the learned model to reconstruct the dataset.
- Calculate how different the reconstructed transactions are from the original transactions (e.g., the per-transaction sum of squared errors).
- The entries the method reconstructs most poorly are the most anomalous (and the most likely to be fraudulent).
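The steps above can be sketched with PCA on synthetic data, where a few planted outliers stand in for fraudulent transactions (names and data are illustrative, not from the book):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Inliers lie on a 2-dimensional subspace of a 10-dimensional space.
inliers = rng.normal(size=(300, 2)) @ rng.normal(size=(2, 10))
# A handful of planted anomalies that do not follow that structure.
outliers = rng.normal(size=(5, 10)) * 5.0
X = np.vstack([inliers, outliers])

# Steps 1-2: learn the underlying structure, then reconstruct the dataset.
pca = PCA(n_components=2, random_state=0)
X_reconstructed = pca.inverse_transform(pca.fit_transform(X))

# Step 3: sum of squared errors per row.
sse = np.sum((X - X_reconstructed) ** 2, axis=1)

# Step 4: the worst-reconstructed rows are the most anomalous.
most_anomalous = np.argsort(sse)[::-1][:5]
```

Because the inliers sit almost exactly on the learned 2D subspace, their reconstruction error is tiny, while the planted outliers reconstruct poorly and rise to the top of the ranking.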
It’s unusual to perform PCA for anomaly detection on an already dimensionality-reduced dataset. True/False? P 158
False
What does anomaly detection using dimensionality reduction methods rely on? P 158
Anomaly detection relies on reconstruction error.
What does reconstruction error depend largely on, when using PCA? P 159
For PCA, the reconstruction error depends largely on the number of principal components we keep and use to reconstruct the original transactions.
What happens if we keep too many or too few principal components? P 159
If we keep too many principal components, PCA may too easily reconstruct the original transactions, so much so that the reconstruction error will be minimal for all of them. If we keep too few, PCA may not be able to reconstruct any of the original transactions well enough, not even the normal, non-fraudulent ones.
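This trade-off can be seen by sweeping the number of components on synthetic data: the mean reconstruction error shrinks monotonically as components are added, eventually approaching zero for every row (illustrative sketch, not the book's code):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))

mean_sse = {}
for k in (1, 5, 10, 19):
    pca = PCA(n_components=k, random_state=1)
    X_hat = pca.inverse_transform(pca.fit_transform(X))
    mean_sse[k] = np.mean(np.sum((X - X_hat) ** 2, axis=1))

# With too many components every row is reconstructed almost perfectly,
# so the errors no longer separate anomalies from normal points; with
# too few, everything (including normal points) reconstructs poorly.
```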
Average precision is one way of calculating the area under the PR curve. True/False? External
True
How can AUC-ROC be interpreted? External
AUC-ROC can be interpreted as the probability that the scores given by a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one.
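This interpretation can be checked directly on a toy example: ROC AUC equals the fraction of (positive, negative) pairs that the scores rank correctly, counting ties as one half (illustrative data):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y = np.array([0, 0, 1, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])

auc = roc_auc_score(y, scores)

# Brute-force pairwise ranking probability.
pos = scores[y == 1]
neg = scores[y == 0]
ranked = np.mean(
    [1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg]
)
```

Here one positive (0.35) is outranked by one negative (0.4), so 8 of the 9 pairs are ordered correctly and both quantities come out to 8/9.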
When is ROC-AUC not reliable? What is the alternative? External
For imbalanced classification with a severe skew and few examples of the minority class, ROC AUC can be misleading: because the ROC curve summarizes the performance of a binary classifier on the positive class, just a few correct or incorrect predictions on that class can cause a large change in the ROC curve or the ROC AUC score.
A common alternative is the precision-recall curve and the area under it. Both precision and recall focus on the positive class (the minority class) and are unconcerned with the true negatives (the majority class), which makes the PR curve an effective diagnostic for imbalanced binary classification models.
So precision-recall (PR) curves are recommended for highly skewed domains, where ROC curves may give an excessively optimistic view of performance.
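A hedged sketch of this effect on synthetic, severely skewed data (1% positives, a deliberately imperfect scorer; the exact numbers depend on the random data):

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(7)

# Severely skewed labels: 10 positives out of 1000.
y = np.zeros(1000, dtype=int)
y[:10] = 1

# An imperfect scorer: positives score higher on average, with noise.
scores = rng.normal(size=1000)
scores[:10] += 1.5

auc = roc_auc_score(y, scores)
ap = average_precision_score(y, scores)
# On skewed data ROC AUC tends to look flattering, while average
# precision stays much closer to the (low) positive rate.
```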
How do a perfect skill and a no skill model look on the PR curve? External
A model with perfect skill is depicted as a point at coordinate (1,1), and a skillful model is represented by a curve that bows toward (1,1). A no-skill classifier is a horizontal line with precision equal to the fraction of positive examples in the dataset; for a balanced dataset this is 0.5.
What is a dummy classifier in sklearn? External
A DummyClassifier makes predictions that ignore the input features; it is the no-skill model. It serves as a simple baseline to compare against other, more complex classifiers.
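A quick sketch of this baseline, assuming scikit-learn: a DummyClassifier with strategy="prior" predicts the majority class for every input, and its constant scores recover the no-skill average precision discussed above, namely the positive rate (data and names are illustrative):

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 4))
y = np.array([1] * 10 + [0] * 90)  # 10% positives

dummy = DummyClassifier(strategy="prior").fit(X, y)

# Predictions ignore the features entirely: any input yields the
# majority class (here, 0).
preds = dummy.predict(rng.normal(size=(20, 4)))

# Constant scores give the no-skill average precision: the positive rate.
ap = average_precision_score(y, dummy.predict_proba(X)[:, 1])
```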
ROC AUC is a popular diagnostic tool for classifiers on balanced and imbalanced binary prediction problems alike because it is not biased toward the majority or minority class. True/False? External
True
ROC analysis does not have any bias toward models that perform well on the minority class at the expense of the majority class—a property that is quite attractive when dealing with imbalanced data.