Machine Learning - Model Evaluation Flashcards
Accuracy
Percent of all predictions that were correct.
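In terms of the confusion-matrix counts (TP, TN, FP, FN):
Accuracy = (TP + TN) / (TP + TN + FP + FN)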
Confusion Matrix
A matrix showing the predicted and actual classifications. A confusion matrix is of size LxL, where L is the number of different label values. Each row corresponds to an actual label value and each column to a predicted label value, so the cell at row i, column j counts the examples of actual class i that were predicted as class j.
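A minimal sketch of building such a matrix from two label lists, assuming integer labels 0..L-1 (the example data are made up for illustration):

import numpy as np

def confusion_matrix(actual, predicted, num_labels):
    # cell (i, j) counts examples whose actual label is i and predicted label is j
    cm = np.zeros((num_labels, num_labels), dtype=int)
    for a, p in zip(actual, predicted):
        cm[a, p] += 1
    return cm

# hypothetical three-class example
actual    = [0, 1, 2, 2, 1, 0]
predicted = [0, 2, 2, 2, 1, 1]
print(confusion_matrix(actual, predicted, 3))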
Cross-validation: Overview
A method for estimating the accuracy of an inducer by dividing the data into k mutually exclusive subsets, or folds, of approximately equal size. The inducer is trained and tested k times. Each time it is trained on the data set minus a fold and tested on that fold. The accuracy estimate is the average accuracy for the k folds.
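A minimal sketch of k-fold cross-validation using scikit-learn's cross_val_score; the iris data and logistic-regression model are placeholders for any dataset and inducer:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, cv=5)  # accuracy on each of the k=5 folds
print(scores.mean())                       # the cross-validated accuracy estimate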
Cross-validation: How
Leave-one-out cross validation
K-fold cross validation
Training and validation data sets have to be drawn from the same population
The step of choosing the kernel parameters of an SVM should be cross-validated as well, as sketched below
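A minimal sketch of cross-validating the choice of SVM kernel parameters with a grid search; the parameter grid and dataset are placeholders:

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)  # 5-fold CV per parameter setting
search.fit(X, y)
print(search.best_params_, search.best_score_)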
Model Comparison
…
Model Evaluation: Adjusted R^2 (R-Square)
The method preferred by statisticians for determining which variables to include in a model. It is a modified version of R^2 that penalizes each new variable on the basis of how many have already been admitted. By construction, R^2 always increases as you add new variables, which results in models that over-fit the data and have poor predictive ability. Adjusted R^2 results in more parsimonious models that admit new variables only if the improvement in fit is larger than the penalty, which serves the ultimate goal of out-of-sample prediction. (Submitted by Santiago Perez)
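The standard formula, with n the number of observations and p the number of predictors:
Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)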
Model Evaluation: Decision tables
The simplest way of expressing output from machine learning: each cell of the table holds the resulting decision, and the row and column the cell sits in represent the conditions.
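A tiny hypothetical example (the conditions and outcomes are made up purely for illustration):

                 credit = good    credit = bad
income = high    approve          approve
income = low     approve          deny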
Model Evaluation: Mis-classification error
Test error for classification, defined as the summed (or averaged) error over the test set, i.e. the count or fraction of false predictions.
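As a fraction over m test examples, with h the hypothesis and 1{...} the indicator function:
err = (1/m) * sum_{i=1}^{m} 1{ h(x_i) != y_i }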
Model Evaluation: Negative class
The class representing the absence of what we are looking for, conventionally labelled 0 (e.g. not having the symptoms).
Model Evaluation: Positive class
The class representing the presence of what we are looking for, conventionally labelled 1 (e.g. having the symptoms).
Model Evaluation: Precision
Of all predicted positives, how many are actually positive?
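In terms of confusion-matrix counts: Precision = TP / (TP + FP)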
Model Evaluation: Recall
Of all actual positives, how many were predicted as positive?
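In terms of confusion-matrix counts: Recall = TP / (TP + FN)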
Model Evaluation: True negative
The hypothesis correctly predicts a negative output.
Model Evaluation: True positive
The hypothesis correctly predicts a positive output.
Model Selection Algorithm
An algorithm that automatically selects a good model function for a given dataset.
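A minimal sketch of one such procedure, selecting a polynomial degree by validation error; the synthetic data, degree range, and 70/30 split are arbitrary choices for illustration:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# hypothetical noisy regression data
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=200)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

best_degree, best_err = None, float("inf")
for degree in range(1, 10):
    # candidate model: polynomial features of the given degree + linear regression
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    err = mean_squared_error(y_val, model.predict(X_val))
    if err < best_err:
        best_degree, best_err = degree, err

print(best_degree, best_err)  # the selected model and its validation error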