Chapter 3: Machine Learning Experiments Flashcards
Describe the accuracy of a classification model
number of correctly classified samples / total number of samples
describe the error of a classification model
number of incorrectly classified samples / total number of samples (error = 1 - accuracy)
what is the problem with error and accuracy? what is a better alternative?
they are unreliable for imbalanced data: a classifier that always predicts the majority class can still score high accuracy
confusion matrix
how can we compute accuracy from a confusion matrix
sum of the diagonal (correct predictions) / total number of samples
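A minimal NumPy sketch of this computation, using a hypothetical 3-class confusion matrix:

```python
import numpy as np

# Hypothetical confusion matrix: rows = true class, columns = predicted class.
cm = np.array([[50,  2,  3],
               [ 4, 45,  1],
               [ 2,  0, 43]])

# Correct predictions sit on the diagonal; divide by the total sample count.
accuracy = np.trace(cm) / cm.sum()
print(accuracy)  # 0.92
```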
what does precision and recall measure about a classifier
a classifier's ability to classify positive samples: recall measures how many of the actual positives are found; precision measures how many of the predicted positives are correct
recall =
TP / (TP + FN)
Precision =
TP / (TP + FP)
what is the f_1 score
2 * (precision * recall) / (precision + recall), the harmonic mean of precision and recall
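A small sketch computing all three metrics from raw counts (the counts in the example are made up):

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 40 true positives, 10 false positives, 20 false negatives.
print(precision_recall_f1(40, 10, 20))  # (0.8, 0.666..., 0.727...)
```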
what is specificity
a measure of a classifier's ability to classify negative samples
specificity =
TN / (TN + FP)
1 - specificity is otherwise known as
false positive rate
what is ROC analysis
applies to binary classifiers: we plot sensitivity (true positive rate) against 1 - specificity (false positive rate) as the decision threshold varies.
we want the area under the curve (AUC) to be as close to 1 as possible
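A from-scratch sketch of ROC analysis, assuming `y_true` holds 0/1 labels and `scores` holds classifier outputs (both hypothetical):

```python
import numpy as np

def roc_points(y_true, scores):
    """Sweep decision thresholds and collect (FPR, TPR) points."""
    pos = (y_true == 1).sum()
    neg = (y_true == 0).sum()
    fpr, tpr = [0.0], [0.0]  # start the curve at the origin
    for t in np.sort(np.unique(scores))[::-1]:
        y_pred = (scores >= t).astype(int)
        tp = ((y_pred == 1) & (y_true == 1)).sum()
        fp = ((y_pred == 1) & (y_true == 0)).sum()
        tpr.append(tp / pos)  # sensitivity
        fpr.append(fp / neg)  # 1 - specificity
    return np.array(fpr), np.array(tpr)

# Hypothetical labels and scores.
y_true = np.array([0, 0, 1, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])
fpr, tpr = roc_points(y_true, scores)
auc = np.trapz(tpr, fpr)  # area under the curve; closer to 1 is better
```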
what is error in a regression model
difference between predicted and desired output
list types of error for a regression model
root mean square error
mean absolute error
mean absolute percentage error
sum of squares error
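A sketch of these four error measures with NumPy (assumes `y_true` contains no zeros, so the percentage error is defined):

```python
import numpy as np

def regression_errors(y_true, y_pred):
    diff = y_true - y_pred
    rmse = np.sqrt(np.mean(diff ** 2))           # root mean square error
    mae = np.mean(np.abs(diff))                  # mean absolute error
    mape = 100 * np.mean(np.abs(diff / y_true))  # mean absolute percentage error
    sse = np.sum(diff ** 2)                      # sum of squares error
    return rmse, mae, mape, sse
```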
what is the coefficient of determination
R^2 score in the single-output case.
R^2 = 1 - (sum of squares error) / (sum over i of (y_i - y_mean)^2), where y_mean = (1/n) sum over i of y_i
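The same formula in code:

```python
import numpy as np

def r2_score(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)           # sum of squares error
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)  # total sum of squares
    return 1 - ss_res / ss_tot
```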
what is sample error
the error computed using a performance metric from a set of samples
what is true error
the probability that a sample drawn at random from the underlying distribution is misclassified
how is true and sample error different in regression
in regression, the true error is the expectation of the error over the data distribution (rather than a misclassification probability)
how do we get bias and variance values
from the expected squared prediction error, which decomposes into bias^2 + variance + irreducible noise
what is bias error
(y - E[f])^2
repeat with different sets of training data and measure how far the average prediction is from the true value
what is variance error
E[(f - E[f])^2]
repeat with different sets of training data and measure how much prediction varies
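A sketch of this repeated-training procedure at a single test point; `draw_training_set` and `train_and_predict` are hypothetical helpers standing in for your data source and model:

```python
import numpy as np

def bias_variance_at(x, y_true, draw_training_set, train_and_predict, repeats=100):
    """Estimate bias^2 and variance of the prediction at one test point x."""
    # Retrain on a fresh training set each time and collect the prediction f(x).
    preds = np.array([train_and_predict(draw_training_set(), x)
                      for _ in range(repeats)])
    mean_pred = preds.mean()                      # E[f]
    bias_sq = (y_true - mean_pred) ** 2           # (y - E[f])^2
    variance = np.mean((preds - mean_pred) ** 2)  # E[(f - E[f])^2]
    return bias_sq, variance
```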
what is overfitting
model is overly complex. low bias, high variance
what is underfitting
model is too simple. high bias, low variance
what is a confidence interval
a range that indicates how good an estimate of the true error the sample error provides: with a given probability, the true error lies within the interval
what is a z test
a statistical test for deciding whether the difference in error between two classifiers is significant
what is the confidence interval of a classifier
a range [error - a, error + a]
a = z_p * sqrt(error * (1 - error) / n)
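In code, using z_p = 1.96 for a 95% interval:

```python
import math

def error_confidence_interval(error, n, z_p=1.96):
    a = z_p * math.sqrt(error * (1 - error) / n)
    return error - a, error + a

# Sample error 0.15 measured on 200 test samples.
print(error_confidence_interval(0.15, 200))  # roughly (0.10, 0.20)
```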
describe the steps of a z test
- calculate d = error_A - error_B
- calculate sigma = sqrt(error_A * (1 - error_A) / n_A + error_B * (1 - error_B) / n_B)
- calculate z = d / sigma
- get p for z from the standard normal table
- confidence that the two classifiers differ: c = 1 - 2 * (1 - p)
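A sketch of the whole test; `NormalDist().cdf` plays the role of the table lookup:

```python
import math
from statistics import NormalDist

def z_test(error_a, n_a, error_b, n_b):
    d = error_a - error_b
    sigma = math.sqrt(error_a * (1 - error_a) / n_a
                      + error_b * (1 - error_b) / n_b)
    z = d / sigma
    p = NormalDist().cdf(abs(z))  # standard normal table lookup
    confidence = 1 - 2 * (1 - p)  # two-tailed confidence in the difference
    return z, confidence
```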
what is a hypothesis
a model trained on a sample set
how do we evaluate a model
run multiple train-test trials and average their error rates
what methods for data splitting are there
holdout, random subsampling, k-fold cross validation, leave-one-out, bootstrap
what is holdout
the data set is split once into a single training set and test set
what is random subsampling
repeat k times: randomly choose a fixed number of samples as the test set and use the rest as training samples; average the results
what is k fold cross validation
we split the data into k partitions. every example is used for both training and testing, but appears in a test set only once.
low k = not enough trials; high k = small test sets -> high variance
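A minimal k-fold index generator (pure NumPy, no library splitter assumed):

```python
import numpy as np

def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index arrays; each sample appears in a test set once."""
    idx = np.random.default_rng(seed).permutation(n)
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)
        yield train, test

for train, test in k_fold_indices(10, 5):
    print(len(train), len(test))  # 8 2, five times
```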
what is Leave one out
like k fold cross validation but k = n, so only one sample in the test set each time
what is bootstrap
randomly draw m training samples with replacement; the samples never drawn form the test set
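A sketch of one bootstrap split, taking m = n as is common:

```python
import numpy as np

def bootstrap_split(n, seed=0):
    rng = np.random.default_rng(seed)
    train = rng.choice(n, size=n, replace=True)  # drawn with replacement
    test = np.setdiff1d(np.arange(n), train)     # samples never drawn (~36.8% on average)
    return train, test
```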
what is hyperparameter selection
train a model for each candidate hyperparameter value and choose the one with the lowest validation error
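A minimal selection loop; `train` and `validation_error` are hypothetical stand-ins for your model and metric:

```python
def select_hyperparameter(candidates, train, validation_error):
    best_value, best_error = None, float("inf")
    for value in candidates:
        model = train(value)           # fit a model with this hyperparameter value
        err = validation_error(model)  # evaluate on held-out validation data
        if err < best_error:
            best_value, best_error = value, err
    return best_value, best_error
```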