graphs Flashcards
receiver operating characteristic (ROC) graph
usually for ranking classifiers (usually binary); for each cutoff n (accepting the n most likely instances as "positive"), over a test set of size N, create the confusion matrix; plot the false positive rate on the x axis (false positives among the top n, divided by the total number of negatives in the test set) and the true positive rate on the y axis (true positives among the top n, divided by the total number of positives in the test set); plot a point for every cutoff n (a computation sketch follows the ROC space details below)
features:
- ROC graphs remove class priors (eg class proportion imbalances), isolating the model's predictive power; note the caveat that priors still matter in practice ("if there are many negative examples, even a moderate false alarm rate can be unmanageable")
- do not factor in costs/benefits
- for ranking classifiers, the area under the ROC curve (AUC) is a useful summary statistic; it is equivalent to the Mann-Whitney-Wilcoxon statistic (the probability that a randomly chosen positive is ranked above a randomly chosen negative), and it relates to the Gini coefficient by a "minor algebraic transformation": Gini = 2 * AUC - 1, i.e. twice the area between the curve and the main diagonal
ROC space details:
- a classifier near the lower-left corner (near the x-axis on the left side, above the main diagonal) is interpreted as "conservative": it makes positive predictions only with strong evidence, so it makes few false positive errors (but sacrifices true positives in the process)
- a classifier near the upper-right corner (above the main diagonal, on the right-hand side, with y close to 1) is interpreted as "permissive": it makes positive classifications on "weak evidence," so it catches most positives but also commits many false positives
- the diagonal line from (0,0) to (1,1) represents the policy of randomly guessing a class (in a Bernoulli sense); a classifier that guesses positive half the time (coin-flip-wise) converges to (0.5,0.5); one that guesses positive 90% of the time converges to (0.9,0.9)
- any performance in the triangular half below and to the right of the (0,0) to (1,1) diagonal would be "worse than random guessing"
- a ranking model (usually) starts with everything classified as negative (ie we accept the top "zero" entries of the test set in ranking order), so it starts at the lower-left corner of ROC space, (0,0): nothing is ranked as positive, so both the true and false positive rates are 0 (highly conservative)
- at the other extreme (n = N), the ranking model classifies everything as positive, putting the point at the upper-right corner of ROC space, (1,1) (highly permissive)
- for a good ranking classifier, we expect the curve to pass close to the ideal point, the upper-left corner of ROC space (0,1), where every positive in the test set is ranked above every negative: a 100% true positive rate with no false positives
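A minimal sketch of the computation described above, assuming a list of classifier scores and true 0/1 labels (the function name and toy data are illustrative):

```python
import numpy as np

def roc_points(scores, labels):
    """Compute (FPR, TPR) points by sweeping the cutoff n over a ranking."""
    order = np.argsort(-np.asarray(scores, dtype=float))   # most likely positive first
    ranked = np.asarray(labels, dtype=int)[order]
    total_pos = ranked.sum()
    total_neg = len(ranked) - total_pos
    points = [(0.0, 0.0)]      # n = 0: nothing accepted as positive (lower-left corner)
    tp = fp = 0
    for y in ranked:           # accept the top n instances as positive, n = 1..N
        if y == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / total_neg, tp / total_pos))
    return points              # the last point is (1, 1): everything accepted as positive

print(roc_points(scores=[0.9, 0.8, 0.7, 0.4, 0.2], labels=[1, 1, 0, 1, 0]))
```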
profit curve
with a ranking classifier, create the confusion matrix for accepting the n most likely instances as positive, at each cutoff n; compute the profit/loss from the confusion matrix using a cost/benefit matrix; plot profit/loss as a function of n (or of n/N)
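A minimal sketch assuming a simple cost/benefit structure; the benefit and cost values below are made up for illustration:

```python
import numpy as np

BENEFIT_TP = 99.0   # assumed profit from targeting a true responder
COST_FP = -1.0      # assumed cost of targeting a non-responder
# true negatives and false negatives are assumed to contribute nothing here

def profit_curve(scores, labels):
    """Cumulative profit as a function of n, the number of top-ranked instances targeted."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    ranked = np.asarray(labels, dtype=int)[order]
    per_instance = np.where(ranked == 1, BENEFIT_TP, COST_FP)
    return np.concatenate([[0.0], np.cumsum(per_instance)])   # profit for n = 0..N

print(profit_curve(scores=[0.9, 0.8, 0.7, 0.4, 0.2], labels=[1, 0, 1, 0, 0]))
# plot these values against n and pick the cutoff that maximizes profit
```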
2x2 classification table
frequency matrix for binary classification problems
usually,
predictions are on rows: (1) positive, (2) negative
true classes are on columns: (1) positive, (2) negative
rates are column-based:
- sensitivity aka recall, true positive rate, proportion of positive outcomes predicted positive
- specificity aka true negative rate, proportion of negative outcomes predicted negative (not the same as precision, which is row-based: the proportion of predicted positives that are actually positive)
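A tiny worked example of the column-based rates, using made-up counts, with row-based precision included for contrast:

```python
# predictions on rows, true classes on columns (counts are illustrative)
tp, fn = 40, 10   # the 50 actual positives, split into predicted positive / predicted negative
fp, tn = 5, 45    # the 50 actual negatives, split into predicted positive / predicted negative

sensitivity = tp / (tp + fn)   # recall / true positive rate (actual-positive column)
specificity = tn / (tn + fp)   # true negative rate (actual-negative column)
precision   = tp / (tp + fp)   # row-based: predicted positives that are actually positive

print(sensitivity, specificity, precision)   # 0.8  0.9  ~0.89
```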
confusion matrix
a frequency matrix for classification problems; each row is a model (class) prediction and each column the actual class; the more the counts concentrate on the diagonal, the better the model; especially useful for imbalanced classes, where it gives more information than a single accuracy number
learning curve
for a given model and a fixed holdout set, plot the model's accuracy on the holdout set as a function of training set size; typically plateaus as the marginal gain from more data goes to 0
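A minimal sketch with scikit-learn, assuming a classifier and a fixed holdout split (the dataset and model choice are illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_hold, y_train, y_hold = train_test_split(X, y, test_size=0.3, random_state=0)

for frac in np.linspace(0.1, 1.0, 10):            # increasing training set sizes
    n = int(frac * len(X_train))
    model = LogisticRegression(max_iter=1000).fit(X_train[:n], y_train[:n])
    acc = accuracy_score(y_hold, model.predict(X_hold))   # accuracy on the fixed holdout set
    print(f"{n:5d} training instances -> holdout accuracy {acc:.3f}")
```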
gini coefficient
a general measure of dispersion: twice the area between the Lorenz curve and the diagonal line of equality; eg plot the cumulative share of wealth held by the population, with the population ordered from poorest to richest; if everyone had the same wealth, the Gini coefficient = 0
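A minimal sketch of computing the Gini coefficient from a vector of holdings via the Lorenz curve (the data are illustrative):

```python
import numpy as np

def gini(values):
    """Gini coefficient: twice the area between the Lorenz curve and the line of equality."""
    v = np.sort(np.asarray(values, dtype=float))              # order the population by increasing wealth
    lorenz = np.concatenate([[0.0], np.cumsum(v)]) / v.sum()  # cumulative share of wealth (y)
    pop = np.linspace(0.0, 1.0, len(v) + 1)                   # cumulative share of population (x)
    area_under_lorenz = np.sum((lorenz[1:] + lorenz[:-1]) / 2 * np.diff(pop))  # trapezoid rule
    return 2 * (0.5 - area_under_lorenz)

print(gini([1, 1, 1, 1]))     # 0.0: everyone holds the same wealth
print(gini([0, 0, 0, 100]))   # 0.75 here; approaches 1 for a large population with one person holding all
```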
fitting graph
typically the x axis is "model complexity" and the y axis is model accuracy on (a) the training data and (b) a holdout set; the "sweet spot" is where the two plots are about to diverge from each other: past it, training accuracy keeps increasing (overfitting) while holdout accuracy starts to fall
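A minimal sketch using decision-tree depth as an illustrative "model complexity" axis (dataset and model are assumptions, not from the source):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_informative=5, random_state=0)
X_train, X_hold, y_train, y_hold = train_test_split(X, y, random_state=0)

for depth in range(1, 15):   # tree depth plays the role of "model complexity"
    model = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    train_acc = model.score(X_train, y_train)   # keeps rising with complexity (overfitting)
    hold_acc = model.score(X_hold, y_hold)      # peaks near the sweet spot, then declines
    print(f"depth={depth:2d}  train={train_acc:.3f}  holdout={hold_acc:.3f}")
```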
cumulative response curve and lift curve
for a ranked classifier at cutoff n with a test set of size N, plots the true positive rate on the y-axis (true positives among the top n, divided by the total number of positives in the test set) against the proportion of the population targeted as the class of relevance on the x-axis (i.e. n/N); a computation sketch follows the features below
features:
- similar to the ROC curve, the greater the "lift" (rise above the main diagonal), the better the performance
- a true lift curve plots, at each x value, the ratio of the cumulative response curve's value to the diagonal (i.e. the improvement over random targeting)
- cumulative response curves are not entirely independent of class priors: the class priors determine the potential rate of increase of the curve (unlike with ROC curves)
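A minimal sketch of the cumulative response and lift computations, assuming classifier scores and 0/1 labels (names and toy data are illustrative):

```python
import numpy as np

def cumulative_response_and_lift(scores, labels):
    """For each cutoff n, return (fraction of population targeted, true positive rate, lift)."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    ranked = np.asarray(labels, dtype=int)[order]
    total_pos, N = ranked.sum(), len(ranked)
    rows = []
    for n in range(1, N + 1):
        targeted = n / N                        # x-axis: proportion of the population targeted
        tpr = ranked[:n].sum() / total_pos      # y-axis of the cumulative response curve
        rows.append((targeted, tpr, tpr / targeted))   # lift: ratio of the curve to the diagonal
    return rows

for x, tpr, lift in cumulative_response_and_lift([0.9, 0.8, 0.6, 0.4, 0.1], [1, 1, 0, 0, 1]):
    print(f"targeted {x:.0%}: cumulative response {tpr:.2f}, lift {lift:.2f}")
```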
dendrogram
a 2-D visualization of hierarchical (progressive) clustering; instances are on the x-axis, and the distance at which clusters merge (low to high) is on the y-axis; the instances are ordered so that members of the initial clusters are immediate neighbors, recursing on this ordering scheme as clustering proceeds (i.e. at a given height / level on the dendrogram, the ordering scheme applies to subgroups of instances)
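A minimal sketch using SciPy's hierarchical clustering utilities (the two-blob data set is illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (10, 2)), rng.normal(5, 1, (10, 2))])   # two loose groups

Z = linkage(X, method="ward")   # merge history: each row records one merge and its distance
dendrogram(Z)                   # instances on the x-axis, merge distance on the y-axis
plt.show()
```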
entropy graph
re segmentation and information gain: a visualization of the weighted sum of entropies resulting from a given segmentation scheme; each segment's width on the x axis is the proportion (0 to 1) of instances it contains, and its height is the entropy of the class labels within that segment (so a kind of bar plot); low height means low entropy (a "pure," well-classified segment), and the total shaded area is the weighted average entropy after the split
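A minimal sketch of the weighted-sum-of-entropies computation (and the resulting information gain) for a made-up parent set and segmentation:

```python
import numpy as np
from collections import Counter

def entropy(labels):
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def weighted_entropy(segments):
    """Weighted sum of per-segment entropies; the weights are the segment proportions (bar widths)."""
    total = sum(len(s) for s in segments)
    return sum(len(s) / total * entropy(s) for s in segments)

parent = ["y"] * 6 + ["n"] * 6
segments = [["y"] * 5 + ["n"], ["y"] + ["n"] * 5]        # a candidate split of the parent
print(entropy(parent))                                    # 1.0
print(weighted_entropy(segments))                         # ~0.65 (the shaded area of the graph)
print(entropy(parent) - weighted_entropy(segments))       # information gain ~0.35
```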
scree plot
used (at least) in the context of PCA; shows the variance explained by each principal component in decreasing order (or, cumulatively, the percent of total variance retained as a function of the number of leading components kept); it helps decide how many PCA components to retain for modeling purposes, e.g. by looking for an "elbow"
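A minimal sketch with scikit-learn's PCA, plotting both per-component and cumulative variance (the synthetic correlated data are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 10))   # correlated features

var = PCA().fit(X).explained_variance_ratio_   # per-component share of total variance, decreasing

components = range(1, len(var) + 1)
plt.plot(components, var, marker="o", label="per component")
plt.plot(components, np.cumsum(var), marker="s", label="cumulative")
plt.xlabel("number of leading components retained")
plt.ylabel("share of total variance")
plt.legend()
plt.show()   # look for an elbow, or a cumulative threshold (e.g. 90%), to pick how many to keep
```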
calibration plot / reliability diagram
for checking performance of probabilistic classification models
for k classes, pick the class of interest, C (one plot per class)
define a bin as a probability range [p-low,p-high]
group all instances in the test set with class C predicted probability in [p-low,p-high] into set S
count the number, n, of instances in S that are actually of class C
n / |S| should be approximately within [p-low,p-high]; plot n / |S| against the bin midpoint for each bin; a well-calibrated model's points lie close to the diagonal
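A minimal sketch of the binning procedure above for a single class of interest, assuming a vector of predicted probabilities and 0/1 indicators of class C (the bin count and toy data are assumptions):

```python
import numpy as np

def reliability_points(pred_prob, is_class_c, n_bins=10):
    """For each probability bin [lo, hi), return (bin midpoint, observed fraction of class C, |S|)."""
    pred_prob = np.asarray(pred_prob, dtype=float)
    is_class_c = np.asarray(is_class_c, dtype=int)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    points = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        S = (pred_prob >= lo) & ((pred_prob < hi) | (hi == 1.0))   # last bin is closed at 1.0
        if S.sum() == 0:
            continue                      # skip empty bins
        observed = is_class_c[S].mean()   # n / |S|: should fall roughly inside [lo, hi]
        points.append(((lo + hi) / 2, observed, int(S.sum())))
    return points

# toy example: outcomes drawn with exactly the predicted probabilities, so calibration is perfect
rng = np.random.default_rng(0)
p = rng.uniform(size=1000)                     # predicted probability of class C
y = (rng.uniform(size=1000) < p).astype(int)   # actual class C membership
for mid, obs, count in reliability_points(p, y):
    print(f"bin center {mid:.2f}: observed {obs:.2f} over {count} instances")
```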
calibration histograms / heat maps
for checking performance of probabilistic classification models
for 2 classes
* group test set into true positive and true negative outcomes
* for each group, plot a histogram of the predicted probability of (say) the negative class
* the true positive histogram should be skewed toward 0 (little predicted probability of a negative outcome), and the true negative histogram should be skewed toward 1 (see the sketch after this card)
for > 2 classes
* construct a per-instance heat map, usually with eg rows grouped by true class
* for each instance, each of k categories gets a color/intensity, reflecting probability
* within each true-class group, the predicted probability mass should be concentrated on that group's class
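A minimal sketch of the two-class histogram version, using made-up predicted probabilities for the negative class (the beta-distributed samples are purely illustrative):

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# illustrative predicted probabilities of the NEGATIVE class, grouped by actual outcome
p_neg_for_true_pos = rng.beta(2, 8, size=500)   # true positives: should pile up near 0
p_neg_for_true_neg = rng.beta(8, 2, size=500)   # true negatives: should pile up near 1

fig, axes = plt.subplots(1, 2, sharex=True)
axes[0].hist(p_neg_for_true_pos, bins=20)
axes[0].set_title("true positives")
axes[1].hist(p_neg_for_true_neg, bins=20)
axes[1].set_title("true negatives")
for ax in axes:
    ax.set_xlabel("predicted probability of the negative class")
plt.show()
```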
scatterplot matrix
a grid of pairwise scatterplots between all numeric predictors, used to assess pairwise relationships and correlations; (note a feature plot may include scatterplots, but is more general)
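A minimal sketch with pandas' scatter_matrix (the data frame is illustrative):

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.normal(size=200)})
df["b"] = df["a"] * 0.8 + rng.normal(scale=0.5, size=200)   # correlated with a
df["c"] = rng.normal(size=200)                               # independent noise

scatter_matrix(df, diagonal="hist")   # one scatterplot per pair of numeric predictors
plt.show()
```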
predictor plot
plots each predictor against the target variable (the plot type varies depending on whether the predictor and target are categorical or numeric)