Chapter 3: Machine Learning Experiments Flashcards

1
Q

Describe the accuracy of a classification model

A

number of correctly classified samples / total number of samples

2
Q

describe the error of a classification model

A

number of incorrectly classified samples / total number of samples

3
Q

what is the problem with error and accuracy? what is a better alternative?

A

they are unreliable for imbalanced data: a classifier that always predicts the majority class can still score a high accuracy

a confusion matrix is a better alternative

4
Q

how can we compute accuracy from a confusion matrix

A

sum of the diagonal (correctly classified samples) / total number of samples

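The first four cards can be sketched in a few lines, assuming the common convention of rows = true class, columns = predicted class (the numbers here are illustrative):

```python
# Minimal sketch: accuracy and error from a confusion matrix.
# Assumed convention: rows = true class, columns = predicted class.
confusion = [
    [50,  5],   # true class 0: 50 correct, 5 misclassified
    [10, 35],   # true class 1: 10 misclassified, 35 correct
]

correct = sum(confusion[i][i] for i in range(len(confusion)))  # diagonal
total = sum(sum(row) for row in confusion)                     # all entries

accuracy = correct / total   # correct / total
error = 1 - accuracy         # incorrect / total
```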
5
Q

what does precision and recall measure about a classifier

A

a classifier's ability to classify positive samples: recall measures how many of the actual positives are found, precision measures how many of the predicted positives are correct

6
Q

recall =

A

TP / (TP + FN)

7
Q

Precision =

A

TP / (TP + FP)

8
Q

what is the f_1 score

A

the harmonic mean of precision and recall:

F1 = 2 * (precision * recall) / (precision + recall)

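The three formulas above fit into a short sketch (the counts are made up for illustration):

```python
# Sketch: precision, recall and F1 from raw counts (illustrative numbers).
TP, FP, FN = 40, 10, 20

precision = TP / (TP + FP)   # of predicted positives, how many are right
recall = TP / (TP + FN)      # of actual positives, how many are found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
```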
9
Q

what is specificity

A

a measure of a classifier's ability to correctly classify negative samples

10
Q

specificity =

A

TN / (TN + FP)

11
Q

1 - specificity is otherwise known as

A

false positive rate

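Specificity and its complement, the false positive rate, come straight from the negative-class counts (illustrative numbers):

```python
# Sketch: specificity and false positive rate (FPR) from negative-class counts.
TN, FP = 90, 10

specificity = TN / (TN + FP)   # fraction of negatives classified correctly
fpr = FP / (TN + FP)           # fraction of negatives misclassified

# FPR = 1 - specificity, the relation from the card above
assert abs(fpr - (1 - specificity)) < 1e-12
```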
12
Q

what is ROC analysis

A

applies to binary classifiers. we plot sensitivity (true positive rate) against 1 - specificity (false positive rate) as the decision threshold varies.

we want the area under the curve (AUC) to be as close to 1 as possible

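A hand-rolled sketch of ROC analysis, assuming made-up scores (higher = more positive) and 0/1 labels; each distinct score is tried as a threshold and the AUC is the trapezoidal area under the resulting (FPR, TPR) points:

```python
# Hedged sketch: ROC points and AUC for a binary classifier, by hand.
labels = [1, 1, 0, 1, 0, 0]            # assumed ground truth
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]  # assumed classifier scores

P = sum(labels)          # number of positives
N = len(labels) - P      # number of negatives

points = [(0.0, 0.0)]    # (FPR, TPR), starting above every score
for thr in sorted(set(scores), reverse=True):
    tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
    points.append((fp / N, tp / P))

# Trapezoidal area under the curve; we want this close to 1.
auc = sum((x2 - x1) * (y1 + y2) / 2
          for (x1, y1), (x2, y2) in zip(points, points[1:]))
```

In practice a library such as scikit-learn (`sklearn.metrics.roc_curve`, `sklearn.metrics.auc`) would do this; the manual version just makes the definition visible.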
13
Q

what is error in a regression model

A

difference between predicted and desired output

14
Q

list types of error for a regression model

A

root mean square error
mean absolute error
mean absolute percentage error
sum of squares error

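The four error types listed above can be sketched on a small, made-up set of predictions:

```python
import math

# Sketch of the listed regression error metrics (illustrative data).
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 3.0, 6.0]
n = len(y_true)

sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))    # sum of squares error
rmse = math.sqrt(sse / n)                                  # root mean square error
mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n  # mean absolute error
mape = 100 * sum(abs(t - p) / abs(t)
                 for t, p in zip(y_true, y_pred)) / n      # mean absolute % error
```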
15
Q

what is the coefficient of determination

A

R^2 score in the single-output case:

R^2 = 1 - SSE / sum_{i=1}^{n} (y_i - y_bar)^2, where y_bar = (1/n) sum_{i=1}^{n} y_i

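Sketch of the coefficient of determination on made-up data: SSE is compared against the total sum of squares around the mean, so 1 means a perfect fit and 0 means no better than always predicting the mean:

```python
# Sketch: R^2 = 1 - SSE / total sum of squares (single-output case).
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 3.0, 6.0]

y_bar = sum(y_true) / len(y_true)
sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
ss_tot = sum((t - y_bar) ** 2 for t in y_true)

r2 = 1 - sse / ss_tot
```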
16
Q

what is sample error

A

the error computed using a performance metric from a set of samples

17
Q

what is true error

A

the probability a random sample is misclassified

18
Q

how is true and sample error different in regression

A

true error is the expectation of the error over the whole input distribution, while sample error averages the error over a finite sample set

19
Q

how do we get bias and variance values

A

from the expected squared prediction error

20
Q

what is bias error

A

(y - E[f])^2

repeat training with different sets of training data and measure how far the average prediction is from the true value

21
Q

what is variance error

A

E[(f - E[f])^2]

repeat with different sets of training data and measure how much prediction varies
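The two cards above can be estimated empirically. A hedged sketch, using the sample mean of noisy observations as a deliberately simple predictor and repeating over many training sets (all numbers are assumptions for illustration):

```python
import random

# Hedged sketch: estimate bias^2 and variance of a simple estimator
# (the sample mean used as a constant predictor) over resampled training sets.
random.seed(1)
true_y = 5.0                 # assumed true value to predict
n_sets, n_samples = 2000, 10

preds = []
for _ in range(n_sets):
    # each "training set" is n_samples noisy observations of true_y
    data = [true_y + random.gauss(0, 1) for _ in range(n_samples)]
    preds.append(sum(data) / len(data))          # the model's prediction f

mean_pred = sum(preds) / len(preds)              # E[f]
bias_sq = (true_y - mean_pred) ** 2              # (y - E[f])^2
variance = sum((p - mean_pred) ** 2 for p in preds) / len(preds)  # E[(f - E[f])^2]
```

For this unbiased estimator bias^2 is near zero and the variance is near 1/n_samples, matching the decomposition the cards describe.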

22
Q

what is overfitting

A

model is overly complex. low bias, high variance

23
Q

what is underfitting

A

model is too simple. high bias, low variance

24
Q

what is a confidence interval

A

a range around the sample error indicating how good an estimate of the true error the sample error provides

25
Q

what is a z test

A

compares the error rates of two classifiers to decide whether their difference is statistically significant

26
Q

what is the confidence interval of a classifier

A

a range [error - a, error + a], where

a = z_p * sqrt(error * (1 - error) / n)
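
A minimal sketch of the interval, assuming an illustrative sample error of 0.15 on n = 200 test samples and z_p = 1.96 (the 95% confidence level):

```python
import math

# Sketch: 95% confidence interval for a classifier's true error.
error, n = 0.15, 200   # assumed sample error and test-set size
z_p = 1.96             # z value for 95% confidence

a = z_p * math.sqrt(error * (1 - error) / n)
interval = (error - a, error + a)
```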

27
Q

describe the steps of a z test

A
  1. calculate z = d / sigma
    where d = |error_A - error_B|
    sigma = sqrt(sigma_A^2 + sigma_B^2), with sigma_i the standard error of each classifier's error estimate
  2. get p for this z using a standard normal table
  3. confidence c = 1 - 2(1 - p) (two-sided)
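
A hedged sketch of these steps, assuming each classifier's error variance is error_i * (1 - error_i) / n_i (a common approximation) and using the normal CDF via `math.erf` in place of a lookup table; the error rates and sample sizes are made up:

```python
import math

# Hedged sketch of a z-test comparing two classifiers' error rates.
error_a, n_a = 0.15, 200   # assumed results for classifier A
error_b, n_b = 0.22, 200   # assumed results for classifier B

# Step 1: z = d / sigma
d = abs(error_a - error_b)
sigma = math.sqrt(error_a * (1 - error_a) / n_a
                  + error_b * (1 - error_b) / n_b)
z = d / sigma

# Step 2: p = Phi(z), the standard normal CDF (replaces the table lookup)
p = 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Step 3: two-sided confidence that the errors genuinely differ
c = 1 - 2 * (1 - p)
```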
28
Q

what is a hypothesis

A

a model trained on a sample set

29
Q

how do we evaluate a model

A

multiple train-test trials and average these error rates

30
Q

what methods for data splitting are there

A
holdout
random subsampling
k fold cross validation
leave one out
bootstrap
31
Q

what is holdout

A

the data set is split once into a single training set and test set

32
Q

what is random subsampling

A

make k splits, each choosing a number of test samples at random; the rest are training samples

33
Q

what is k fold cross validation

A

we split the data into k partitions. every example is used for both training and testing, but appears in a test set only once.

low k = not enough trials; high k = small test sets -> high variance
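
A minimal sketch of the index splits, assuming no shuffling and n divisible by k for simplicity:

```python
# Sketch: k-fold cross-validation index splits (no shuffling; n % k == 0).
n, k = 12, 4
indices = list(range(n))
fold_size = n // k

folds = []
for fold in range(k):
    test_idx = indices[fold * fold_size:(fold + 1) * fold_size]
    train_idx = [i for i in indices if i not in test_idx]
    folds.append((train_idx, test_idx))
# each sample appears in exactly one test set across the k folds
```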

34
Q

what is Leave one out

A

like k fold cross validation but k = n, so only one sample in the test set each time

35
Q

what is bootstrap

A

randomly select m samples with replacement and use these for training. use the samples that were never selected for testing
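
A minimal sketch, assuming m equals the data-set size (the usual choice) and illustrative data:

```python
import random

# Sketch: bootstrap split. Draw m samples *with replacement* for training;
# samples never drawn ("out-of-bag") form the test set.
random.seed(0)
data = list(range(10))   # illustrative data set
m = len(data)

train = [random.choice(data) for _ in range(m)]   # may contain duplicates
test = [x for x in data if x not in set(train)]   # out-of-bag samples
```

On average roughly 36.8% (about 1/e) of the samples are never drawn and so end up in the test set.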

36
Q

what is hyperparameter selection

A

train models with different hyperparameter values and choose the one with the least error.