model evaluation Flashcards
What is the requirement for a goodness metric for backpropagation to work?
has to be differentiable
How do you define the goodness of your model?
Ability to generalize to unseen data
What can affect generalizability?
- algorithm
- hyperparameter values
- training data
- random initialization (can affect accuracy)
What are 4 methods of quantifying generalization goodness?
1) accuracy, precision, recall
2) mean absolute error
3) RMSE
4) area under ROC curve
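A minimal sketch of computing all four kinds of metrics with scikit-learn; the arrays here are toy, illustrative values:

import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             mean_absolute_error, mean_squared_error,
                             roc_auc_score)

# Toy classification labels and hard predictions (illustrative values)
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])
print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))

# Toy regression targets and predictions
r_true = np.array([3.2, 2.8, 3.9, 2.1])
r_pred = np.array([3.0, 3.1, 3.5, 2.4])
print("MAE :", mean_absolute_error(r_true, r_pred))
print("RMSE:", np.sqrt(mean_squared_error(r_true, r_pred)))

# AUC needs probabilistic scores, not hard labels
y_score = np.array([0.9, 0.2, 0.4, 0.8, 0.1, 0.7, 0.6, 0.3])
print("AUC :", roc_auc_score(y_true, y_score))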
Come up with a metric for predicting lung cancer from chest x-rays
Recall, precision, and the false negative rate (FN/P), since missing a true cancer case is the costliest error
Come up with a metric for predicting high school GPA
MAE
Come up with a metric for evaluating search engine results
recall
Come up with a metric for predicting the location of an object in 3D space
Euclidean or cosine distance
Come up with a metric for predicting if a Twitter user is liberal or conservative
AUC
What is sensitivity?
The true positive rate; also called recall
What is specificity?
True negative rate
What is precision?
A measure of exactness: the percentage of tuples labeled as positive that actually are positive
What is the F-measure?
Harmonic mean of precision and recall, giving equal weight to each
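A small sketch computing these four definitions directly from confusion-matrix counts; the TP/FP/TN/FN values are made up for illustration:

# Confusion-matrix counts (illustrative values)
TP, FP, TN, FN = 40, 10, 45, 5

sensitivity = TP / (TP + FN)   # true positive rate = recall
specificity = TN / (TN + FP)   # true negative rate
precision   = TP / (TP + FP)   # fraction of predicted positives that are actually positive
f_measure   = 2 * precision * sensitivity / (precision + sensitivity)  # harmonic mean
print(sensitivity, specificity, precision, f_measure)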
What metrics can you use for regression?
MAE, RMSE
Why do we take the square of the errors in a metric?
Squaring the errors produces steeper gradients during the search and gives larger penalties to large errors, so RMSE accentuates the impact of outliers. Without it, the search might take too long to converge.
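A quick worked comparison in Python (toy error values) showing how a single outlier moves RMSE far more than MAE:

import numpy as np

uniform = np.array([1.0, 1.0, 1.0, 1.0])  # all errors equal
outlier = np.array([1.0, 1.0, 1.0, 9.0])  # one large outlier
for e in (uniform, outlier):
    mae = np.mean(np.abs(e))
    rmse = np.sqrt(np.mean(e ** 2))
    print(f"MAE={mae:.2f}  RMSE={rmse:.2f}")
# MAE goes 1.00 -> 3.00, but RMSE jumps 1.00 -> 4.58: squaring accentuates the outlier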
What is a condition of the AUC?
It is a binary prediction metric: it only applies when the label is binary, and the predictions have to be probabilistic (scores rather than hard labels).
What is the AUC?
The probability that a randomly chosen positive example receives a higher predicted score than a randomly chosen negative example
What does an AUC of 0.5 mean?
AUC = 0.50 means no better than random chance
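A sketch of this probabilistic reading of AUC: compare every (positive, negative) pair of scores and count how often the positive outranks the negative (scores here are illustrative; ties count as half):

import numpy as np
from sklearn.metrics import roc_auc_score

y_true  = np.array([1, 1, 1, 0, 0, 0])
y_score = np.array([0.9, 0.7, 0.4, 0.6, 0.3, 0.2])  # predicted probabilities

pos = y_score[y_true == 1]
neg = y_score[y_true == 0]
# Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5
pairs = [(p > n) + 0.5 * (p == n) for p in pos for n in neg]
print("pairwise AUC:", np.mean(pairs))                   # 8/9, about 0.889
print("sklearn AUC :", roc_auc_score(y_true, y_score))   # same value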
What is K-fold CV?
The tuples are randomly partitioned into K mutually exclusive subsets (folds) of approximately equal size
What is a standard value for K?
5 or 10
What is the algorithm for K-fold CV?
1) Choose a K and randomly assign each row to a single fold (numbered 1 to K)
- This fold assignment indicates the fold in which that row will serve as part of the test set
2) Conduct K phases of training/testing: in phase i, fold i serves as the test set and the remaining folds serve as the training set
3) Calculate the error metric on the whole dataset (the concatenation of the held-out predictions from each fold)
Do you calculate an error metric for each fold in K-fold CV?
No, it is still only calculated once, over the concatenated held-out predictions from all K folds (see the sketch below)
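A minimal K-fold sketch following the steps above, using scikit-learn and a toy synthetic dataset: each fold's held-out predictions are collected, then the metric is computed once over the concatenation:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=200, random_state=0)  # toy data

kf = KFold(n_splits=5, shuffle=True, random_state=0)  # K = 5, random fold assignment
y_pred = np.empty_like(y)
for train_idx, test_idx in kf.split(X):
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])          # other folds = training set
    y_pred[test_idx] = model.predict(X[test_idx])  # this fold = test set

# One metric over the whole dataset (concatenated held-out predictions)
print("CV accuracy:", accuracy_score(y, y_pred))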
What is CV used for?
Model selection
When would CV NOT give you better results than handpicking your train and test data?
With temporal data, where randomly shuffled folds would let the model train on the future and be tested on the past
What is the ROC curve?
It shows the tradeoff between the true positive rate (TPR) and the false positive rate (FPR) as the classification threshold varies: it lets us visualize how often the model correctly identifies positive cases against how often it mistakenly identifies negative cases as positive.
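A sketch of drawing an ROC curve from probabilistic scores with scikit-learn and matplotlib; labels and scores here are illustrative:

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

y_true  = [1, 1, 1, 0, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.7, 0.5]  # predicted probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # TPR/FPR at each score threshold
plt.plot(fpr, tpr, marker="o", label="model")
plt.plot([0, 1], [0, 1], linestyle="--", label="chance (AUC = 0.5)")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate (sensitivity)")
plt.legend()
plt.show()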