Chapter 3: Machine Learning Experiments Flashcards
Describe the accuracy of a classification model
number of correctly classified samples / total number of samples
describe the error of a classification model
number of incorrectly classified samples / total number of samples (error = 1 - accuracy)
what is the problem with error and accuracy? what is a better alternative?
they are unreliable for imbalanced data: a classifier that always predicts the majority class can still score high accuracy
confusion matrix
how can we compute accuracy from a confusion matrix
sum of the diagonal (correct predictions) / total number of samples
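A minimal NumPy sketch of this computation, using a hypothetical 3-class confusion matrix:

```python
import numpy as np

# Hypothetical confusion matrix: rows = true class, columns = predicted class.
cm = np.array([[50,  2,  3],
               [ 4, 45,  1],
               [ 2,  0, 43]])

# Correct predictions sit on the diagonal; divide by the total sample count.
accuracy = np.trace(cm) / cm.sum()
print(accuracy)  # 0.92
```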
what does precision and recall measure about a classifier
a classifier's ability to classify positive samples: recall measures how many of the actual positives are found; precision measures how many of the predicted positives are correct
recall =
TP / (TP + FN)
Precision =
TP / (TP + FP)
what is the f_1 score
2 * (precision * recall) / (precision + recall), the harmonic mean of precision and recall
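A small sketch computing all three metrics from raw counts (the counts in the example are made up):

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 40 true positives, 10 false positives, 20 false negatives.
print(precision_recall_f1(40, 10, 20))  # (0.8, 0.666..., 0.727...)
```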
what is specificity
a measure of a classifier's ability to classify negative samples
specificity =
TN / (TN + FP)
1 - specificity is otherwise known as
false positive rate
what is ROC analysis
applies to binary classifiers: we plot sensitivity (true positive rate) against 1 - specificity (false positive rate) as the decision threshold varies.
we want the area under the curve (AUC) to be as close to 1 as possible
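A from-scratch sketch of ROC analysis, assuming `y_true` holds 0/1 labels and `scores` holds classifier outputs (both hypothetical):

```python
import numpy as np

def roc_points(y_true, scores):
    """Sweep decision thresholds and collect (FPR, TPR) points."""
    pos = (y_true == 1).sum()
    neg = (y_true == 0).sum()
    fpr, tpr = [0.0], [0.0]  # start the curve at the origin
    for t in np.sort(np.unique(scores))[::-1]:
        y_pred = (scores >= t).astype(int)
        tp = ((y_pred == 1) & (y_true == 1)).sum()
        fp = ((y_pred == 1) & (y_true == 0)).sum()
        tpr.append(tp / pos)  # sensitivity
        fpr.append(fp / neg)  # 1 - specificity
    return np.array(fpr), np.array(tpr)

# Hypothetical labels and scores.
y_true = np.array([0, 0, 1, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])
fpr, tpr = roc_points(y_true, scores)
auc = np.trapz(tpr, fpr)  # area under the curve; closer to 1 is better
```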
what is error in a regression model
difference between predicted and desired output
list types of error for a regression model
root mean square error
mean absolute error
mean absolute percentage error
sum of squares error
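A sketch of these four error measures with NumPy (assumes `y_true` contains no zeros, so the percentage error is defined):

```python
import numpy as np

def regression_errors(y_true, y_pred):
    diff = y_true - y_pred
    rmse = np.sqrt(np.mean(diff ** 2))           # root mean square error
    mae = np.mean(np.abs(diff))                  # mean absolute error
    mape = 100 * np.mean(np.abs(diff / y_true))  # mean absolute percentage error
    sse = np.sum(diff ** 2)                      # sum of squares error
    return rmse, mae, mape, sse
```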
what is the coefficient of determination
R^2 score in the single-output case.
R^2 = 1 - (sum of squares error) / (sum over i of (y_i - y_mean)^2), where y_mean = (1/n) sum over i of y_i
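The same formula in code:

```python
import numpy as np

def r2_score(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)           # sum of squares error
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)  # total sum of squares
    return 1 - ss_res / ss_tot
```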
what is sample error
the error computed using a performance metric from a set of samples
what is true error
the probability that a sample drawn at random from the underlying distribution is misclassified
how is true and sample error different in regression
in regression, the true error is the expectation of the error over the data distribution (rather than a misclassification probability)
how do we get bias and variance values
from the expected squared prediction error, which decomposes into bias^2 + variance + irreducible noise
what is bias error
(y - E[f])^2
repeat with different sets of training data and measure how far the average prediction is from the true value
what is variance error
E[(f - E[f])^2]
repeat with different sets of training data and measure how much prediction varies
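A sketch of this repeated-training procedure at a single test point; `draw_training_set` and `train_and_predict` are hypothetical helpers standing in for your data source and model:

```python
import numpy as np

def bias_variance_at(x, y_true, draw_training_set, train_and_predict, repeats=100):
    """Estimate bias^2 and variance of the prediction at one test point x."""
    # Retrain on a fresh training set each time and collect the prediction f(x).
    preds = np.array([train_and_predict(draw_training_set(), x)
                      for _ in range(repeats)])
    mean_pred = preds.mean()                      # E[f]
    bias_sq = (y_true - mean_pred) ** 2           # (y - E[f])^2
    variance = np.mean((preds - mean_pred) ** 2)  # E[(f - E[f])^2]
    return bias_sq, variance
```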
what is overfitting
model is overly complex. low bias, high variance
what is underfitting
model is too simple. high bias, low variance
what is a confidence interval
a range that indicates how good an estimate of the true error the sample error provides: with a given probability, the true error lies within the interval
what is a z test
a statistical test for deciding whether the difference in error between two classifiers is significant
what is the confidence interval of a classifier
a range [error - a, error + a]
a = z_p * sqrt(error * (1 - error) / n)
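In code, using z_p = 1.96 for a 95% interval:

```python
import math

def error_confidence_interval(error, n, z_p=1.96):
    a = z_p * math.sqrt(error * (1 - error) / n)
    return error - a, error + a

# Sample error 0.15 measured on 200 test samples.
print(error_confidence_interval(0.15, 200))  # roughly (0.10, 0.20)
```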
describe the steps of a z test
- calculate d = error_A - error_B
- calculate sigma = sqrt(error_A * (1 - error_A) / n_A + error_B * (1 - error_B) / n_B)
- calculate z = d / sigma
- get p for z from the standard normal table
- confidence that the two classifiers differ: c = 1 - 2 * (1 - p)
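A sketch of the whole test; `NormalDist().cdf` plays the role of the table lookup:

```python
import math
from statistics import NormalDist

def z_test(error_a, n_a, error_b, n_b):
    d = error_a - error_b
    sigma = math.sqrt(error_a * (1 - error_a) / n_a
                      + error_b * (1 - error_b) / n_b)
    z = d / sigma
    p = NormalDist().cdf(abs(z))  # standard normal table lookup
    confidence = 1 - 2 * (1 - p)  # two-tailed confidence in the difference
    return z, confidence
```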
what is a hypothesis
a model trained on a sample set
how do we evaluate a model
run multiple train-test trials and average their error rates
what methods for data splitting are there
holdout, random subsampling, k-fold cross validation, leave-one-out, bootstrap
what is holdout
the data set is split once into a single training set and test set
what is random subsampling
repeat k times: randomly choose a fixed number of samples as the test set and use the rest as training samples; average the results
what is k fold cross validation
we split the data into k partitions. every example is used for both training and testing, but appears in a test set only once.
low k = not enough trials; high k = small test sets -> high variance
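A minimal k-fold index generator (pure NumPy, no library splitter assumed):

```python
import numpy as np

def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index arrays; each sample appears in a test set once."""
    idx = np.random.default_rng(seed).permutation(n)
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)
        yield train, test

for train, test in k_fold_indices(10, 5):
    print(len(train), len(test))  # 8 2, five times
```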
what is Leave one out
like k fold cross validation but k = n, so only one sample in the test set each time
what is bootstrap
randomly draw m training samples with replacement; the samples never drawn form the test set
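A sketch of one bootstrap split, taking m = n as is common:

```python
import numpy as np

def bootstrap_split(n, seed=0):
    rng = np.random.default_rng(seed)
    train = rng.choice(n, size=n, replace=True)  # drawn with replacement
    test = np.setdiff1d(np.arange(n), train)     # samples never drawn (~36.8% on average)
    return train, test
```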
what is hyperparameter selection
train a model for each candidate hyperparameter value and choose the one with the lowest validation error
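A minimal selection loop; `train` and `validation_error` are hypothetical stand-ins for your model and metric:

```python
def select_hyperparameter(candidates, train, validation_error):
    best_value, best_error = None, float("inf")
    for value in candidates:
        model = train(value)           # fit a model with this hyperparameter value
        err = validation_error(model)  # evaluate on held-out validation data
        if err < best_error:
            best_value, best_error = value, err
    return best_value, best_error
```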