Credibility: Robust Evaluation of Models (Flashcards)
Credibility
Metrics for Performance Evaluation: How to evaluate performance?
Methods for Performance Evaluation: How to obtain reliable estimates?
Methods for Comparison: How to compare models?
Model Selection: Which model to choose?
Metrics for Performance Evaluation: How to evaluate performance?
Residual Sum of Squares (RSS): RSS = sum_i (yi - sum_j wj hj(xi))^2
R2: TSS = sum_i (yi - ybar)^2, where ybar is the mean of y. R2 = 1 - RSS/TSS
MSE = (1/N) sum_i (yi* - yi)^2, where yi* is the predicted value
RMSE = sqrt(MSE)
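A minimal Python/NumPy sketch of the regression metrics above; the values in y_true and y_pred are made-up examples, not from the source.

```python
# Regression metrics: RSS, R2, MSE, RMSE (example data only).
import numpy as np

y_true = np.array([3.0, 2.5, 4.0, 5.5])   # observed yi
y_pred = np.array([2.8, 2.7, 4.3, 5.0])   # model predictions sum_j wj hj(xi)

rss = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
tss = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2 = 1 - rss / tss                            # coefficient of determination
mse = np.mean((y_pred - y_true) ** 2)         # mean squared error
rmse = np.sqrt(mse)                           # root mean squared error
print(rss, r2, mse, rmse)
```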
Metrics for Performance Evaluation: Classification
Confusion matrix (rows: actual class, columns: predicted class):
             Predicted +   Predicted -
Actual +     TP            FN
Actual -     FP            TN
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Cost(X): a cost function can be used to evaluate a model, to guide the search, or to build the model itself (e.g., decision trees).
Precision: fraction of predicted positives that are actually positive. Precision = TP/(TP+FP)
Recall: fraction of actual positives (e.g., relevant documents) that are correctly classified. Recall = TP/(TP+FN)
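A minimal sketch computing accuracy, precision, and recall from confusion-matrix counts; the counts are made-up example values.

```python
# Classification metrics from confusion-matrix counts (example data only).
TP, FN, FP, TN = 40, 10, 5, 45

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)   # predicted positives that are truly positive
recall = TP / (TP + FN)      # actual positives that were found
print(accuracy, precision, recall)
```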
Methods for Performance Evaluation: How to obtain reliable estimates?
Training, Validation and Testing
Holdout: 2/3 of the data for training, 1/3 for testing.
Random Subsampling: repeated holdout.
Cross-Validation: split the data into k disjoint subsets (k-fold); Leave-one-out uses k = n.
Stratified sampling: each class is represented with approximately equal proportions in every subset.
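A minimal sketch of splitting n samples into k disjoint folds; the function and variable names are illustrative (scikit-learn's KFold provides the same behavior).

```python
# k-fold splitting: k disjoint test subsets, rest used for training.
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    folds = np.array_split(indices, k)          # k disjoint subsets
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

for train_idx, test_idx in k_fold_indices(10, k=5):
    print(sorted(test_idx))                     # each index appears in exactly one test fold
```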
Bootstrap: sampling with replacement.
0.632 bootstrap: the probability that an instance is never sampled, and so ends up in the test data, is (1 - 1/n)^n ≈ e^-1 ≈ 0.368.
error = 0.632 * e_test + 0.368 * e_train
Repeat the sampling process several times and average the results.
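A minimal sketch of one 0.632-bootstrap round; the error rates e_train and e_test are hypothetical placeholders, and in practice the round is repeated and averaged.

```python
# One 0.632-bootstrap round: sample with replacement, test on the rest.
import numpy as np

n = 1000
rng = np.random.default_rng(0)
train_idx = rng.integers(0, n, size=n)            # sample n instances WITH replacement
test_idx = np.setdiff1d(np.arange(n), train_idx)  # instances never picked (~36.8%)

print(len(test_idx) / n)                          # close to (1 - 1/n)^n ≈ e^-1 ≈ 0.368

e_train, e_test = 0.05, 0.20                      # hypothetical error rates
error = 0.632 * e_test + 0.368 * e_train          # combined bootstrap estimate
print(error)
```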
Methods for Comparison: How to compare models?
Paired t-Test using k-fold Cross-Validation
Generate k folds and, for each fold, compute the performance of models A and B.
di = perf_A,i - perf_B,i
mean = (1/k) sum_i di
std = sqrt((1/k) sum_i (di - mean)^2)
Two hypotheses: H0: mean = 0 (no difference between the models), Ha: mean != 0.
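A minimal sketch of the paired t-test on per-fold performance differences; the per-fold scores are made-up, and SciPy's ttest_rel is used in place of the hand-computed statistic.

```python
# Paired t-test on per-fold scores of two models (example data only).
import numpy as np
from scipy import stats

perf_a = np.array([0.81, 0.79, 0.84, 0.80, 0.82])   # model A, k = 5 folds
perf_b = np.array([0.78, 0.77, 0.80, 0.79, 0.80])   # model B, same folds

d = perf_a - perf_b                                  # per-fold differences di
t_stat, p_value = stats.ttest_rel(perf_a, perf_b)    # tests H0: mean difference = 0
print(d.mean(), t_stat, p_value)
```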
Methods for Comparison: Multiple Testing
With a 0.05 significance threshold applied to 20 different tests:
probability of no mistake in any test = 0.95^20 ≈ 0.358
probability of at least one mistake = 1 - 0.358 = 0.642
Bonferroni Correction: assumes the tests are independent. Divide the significance threshold by the number of tests: 0.05/20 = 0.0025. The probability of at least one mistake becomes 1 - 0.9975^20 = 1 - 0.9512 = 0.0488.
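A minimal sketch of the family-wise error calculation above, with and without the Bonferroni correction (20 independent tests assumed).

```python
# Probability of at least one false positive across m independent tests.
m, alpha = 20, 0.05

p_no_mistake = (1 - alpha) ** m                    # 0.95^20 ≈ 0.358
print(1 - p_no_mistake)                            # ≈ 0.642 without correction

alpha_corrected = alpha / m                        # Bonferroni: 0.05 / 20 = 0.0025
p_no_mistake_corr = (1 - alpha_corrected) ** m     # 0.9975^20 ≈ 0.9512
print(1 - p_no_mistake_corr)                       # ≈ 0.0488 with correction
```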
Methods for Comparison: Probabilistic Classifiers
Logistic Regression returns a probability
Classify as positive when P(yi|xi) exceeds a threshold, typically 0.5.
A stricter threshold (e.g., 0.75) can be required for one label when higher confidence is needed.
Higher threshold: higher precision, lower recall.
Lower threshold: lower precision, higher recall.
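A minimal sketch of how the decision threshold trades precision against recall; the probabilities and labels are made-up example values.

```python
# Precision/recall at two different probability thresholds (example data only).
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
p_pos  = np.array([0.9, 0.6, 0.7, 0.8, 0.4, 0.3, 0.55, 0.2])  # P(y=1|x)

for threshold in (0.5, 0.75):
    y_pred = (p_pos >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn)
    print(threshold, precision, recall)   # higher threshold -> precision up, recall down
```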
Methods for Comparison: Precision-Recall Curves
Precision plotted as a function of recall, obtained by varying the threshold.
The best classifier would be the one with precision always equal to one.
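A minimal sketch of a precision-recall curve using scikit-learn's precision_recall_curve; the labels and scores are illustrative example values.

```python
# Precision-recall curve over all thresholds (example data only).
import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
scores = np.array([0.9, 0.6, 0.7, 0.8, 0.4, 0.3, 0.55, 0.2])

precision, recall, thresholds = precision_recall_curve(y_true, scores)
for p, r in zip(precision, recall):
    print(round(r, 2), round(p, 2))   # precision as a function of recall
```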
Methods for Comparison: Receiver Operating Characteristic (ROC)
TPR = TP/(TP+FN) FPR = FP/(FP+TN)
(0,0): everything classified as negative
(1,1): everything classified as positive
(0,1): ideal point (FPR = 0, TPR = 1)
In general no model consistently outperforms the others; ROC curves of different models may cross.
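A minimal sketch of ROC points and AUC using scikit-learn; the labels and scores are made-up example values.

```python
# ROC curve points (FPR, TPR) and area under the curve (example data only).
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
scores = np.array([0.9, 0.6, 0.7, 0.8, 0.4, 0.3, 0.55, 0.2])

fpr, tpr, thresholds = roc_curve(y_true, scores)
print(list(zip(fpr, tpr)))               # runs from (0,0) to (1,1)
print(roc_auc_score(y_true, scores))     # area under the ROC curve
```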
Methods for Comparison: Lift Charts
A measure of the effectiveness of a predictive model.
Computed as the ratio between the results obtained with and without the predictive model.
Cumulative Gains and Lift Charts are visual aids
The greater the area between the lift curve and the baseline, the better the model.
Example: with 100,000 customers and a response rate of 0.4%, that is 400 responses.
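A minimal sketch of a cumulative-gains / lift computation; the scores and responses are simulated, with only the 0.4% response rate taken from the example above.

```python
# Lift of targeting the top-scored 10% of customers vs. random contact.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
responded = rng.random(n) < 0.004            # ~400 responders overall (0.4%)
score = np.where(responded, rng.random(n) * 0.5 + 0.5, rng.random(n))  # model scores responders higher on average

order = np.argsort(-score)                   # contact best-scored customers first
top_10pct = order[: n // 10]
gain = responded[top_10pct].sum() / responded.sum()   # fraction of responders captured
lift = gain / 0.10                           # ratio vs. contacting 10% at random
print(gain, lift)
```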
Model Selection: Which model to choose?
Occam’s Razor: best theory is the smallest one that describes all the facts.
No Free Lunch theorem: averaged over all possible problems, no model is favored over the others.