Classification and Machine Learning Evaluation Flashcards
What is classification?
This is when you find patterns in input data and use them to assign each item to one of a set of categories (classes)
What is regression?
Building a model that predicts a continuous numeric value (rather than a category) from the input data
Define clustering.
This is grouping data so that similar points fall into the same cluster (group). When graphed, data from different categories forms separate clusters
What is the machine learning process? (5 steps)
- Data collection
- Feature selection
- Algorithm Choice
- Training
- Evaluation
What is overfitting?
A model overfits when it describes the randomness associated with the data, rather than the underlying relationship between the data points
What is underfitting?
This is when the model is too simple to capture the complex problem (so it is not very accurate)
What is the rule with under/over fitting (Occam’s razor)?
We should use the simplest model unless we have to use a more complex one
What are the 3 types of data set and what are they each used for?
> Training data: Used to train the model
> Validation data: Used to evaluate the different models that we have created
> Test data: Used to benchmark the final model at the end
[Picture 1]

What is important about the data for the 3 types of data?
The training, validation and test data must not overlap (they must not contain any of the same data points).
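A minimal sketch of how three non-overlapping sets could be produced, assuming the samples fit in a list. The function name `split_dataset` and the 60/20/20 proportions are illustrative assumptions, not something the flashcards prescribe.

```python
import random

def split_dataset(samples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle the samples and cut them into three disjoint lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # whatever is left over
    return train, val, test

# Hypothetical data set: the integers 0..99 stand in for real samples.
train, val, test = split_dataset(range(100))
```

Because each sample ends up in exactly one slice of the shuffled list, no sample can appear in two sets.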
What can happen with regards to the training data if we make the model too complex?
It doesn't learn the underlying principle; it just starts to memorise the data set.
Is it possible to get the error during training to 0? Would we want to do this? Why?
Yes, it is possible. No, we do not want to do this, because the AI would just be learning the data set rather than the underlying relationship
When do we stop training a machine learning AI? Why?
We want to stop when the error on the validation data set starts to increase. The error increases because the model is starting to overfit
[Picture 2]
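The stopping rule can be sketched as follows. This is only an illustration: the function name `early_stopping_epoch`, the `patience` parameter and the error values are all made up for the example.

```python
def early_stopping_epoch(val_errors, patience=1):
    """Return the epoch with the lowest validation error seen before stopping.

    Training stops once the validation error has failed to improve
    for `patience` epochs in a row.
    """
    best_epoch = 0
    best_error = float("inf")
    epochs_without_improvement = 0
    for epoch, error in enumerate(val_errors):
        if error < best_error:
            best_epoch, best_error = epoch, error
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation error is rising: likely overfitting
    return best_epoch

# Made-up validation errors, one per training epoch.
val_errors = [0.50, 0.35, 0.27, 0.24, 0.26, 0.31]
stop_at = early_stopping_epoch(val_errors)
```

With these numbers the error falls until epoch 3 and rises afterwards, so the model from epoch 3 is the one to keep.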

What happens if you don't stop training a machine learning AI?
The model would become too complex and it would start to overfit
What is cross validation?
> When there is not enough data to create three sets large enough, cross validation is a common way to test the learned model on more data points.
> The data is split into several batches
> Each batch is tested in turn whilst others are used for training
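The batching described above can be sketched like this. The helper name `k_fold_splits` and the choice of 5 batches are assumptions for illustration only.

```python
def k_fold_splits(samples, k):
    """Yield (training, testing) pairs, using each batch as the test set once."""
    samples = list(samples)
    # Deal the samples into k batches, like dealing cards.
    folds = [samples[i::k] for i in range(k)]
    for i in range(k):
        test_fold = folds[i]
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, test_fold

# Hypothetical data set of 10 samples split into 5 batches.
splits = list(k_fold_splits(range(10), k=5))
```

Every sample is used for testing exactly once and for training in all the other rounds, which is why this makes better use of a small data set.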
What is a binary classifier?
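This is a classifier that assigns each input to one of two classes (positive or negative). To evaluate it, you count the true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN) and form the following diagram: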
[Picture 3]

How is accuracy calculated for a binary classifier? [Picture 3]

Accuracy % = 100 × (#TP + #TN) / (#TP + #TN + #FP + #FN)
How is left column sensitivity calculated? [Picture 3]

Sensitivity % = 100 × #TP / (#TP + #FN)
How is right column sensitivity calculated? [Picture 3]

Sensitivity % = 100 × #TN / (#TN + #FP) (this quantity is usually called specificity)
What is the equation for recall? [Picture 3]

Recall % = 100 × #TP / (#TP + #FN)
What is recall also known as?
Recall = left column sensitivity
How is top row precision calculated? [Picture 3]

Precision % = 100 × #TP / (#TP + #FP)
How is bottom row precision calculated? [Picture 3]

Precision % = 100 × #TN / (#TN + #FN)
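A small worked example of the accuracy, sensitivity and precision formulas. The function name `binary_metrics` and the four counts are invented for illustration. The row and column names assume the layout implied by the formulas (TP and FN in the left column, TP and FP in the top row). Accuracy counts the correct predictions, TP and TN, over everything.

```python
def binary_metrics(tp, tn, fp, fn):
    """Compute the percentage metrics from the four counts of a binary classifier."""
    total = tp + tn + fp + fn
    return {
        "accuracy": 100 * (tp + tn) / total,
        "sensitivity_left_column": 100 * tp / (tp + fn),   # also called recall
        "sensitivity_right_column": 100 * tn / (tn + fp),  # also called specificity
        "precision_top_row": 100 * tp / (tp + fp),
        "precision_bottom_row": 100 * tn / (tn + fn),
    }

# Made-up counts: 50 actual positives and 50 actual negatives.
metrics = binary_metrics(tp=40, tn=45, fp=5, fn=10)
```

With these counts, 85 of the 100 items are classified correctly, 40 of the 50 actual positives are found, and 40 of the 45 positive predictions are right.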
Which is the least informative method of evaluation?
Accuracy alone
What is the equation for F1?
F1 = 2 × (precision × recall) / (precision + recall)
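A short sketch of the F1 calculation from raw counts. The function name `f1_score` and the counts are made up, and precision and recall are kept as fractions between 0 and 1 rather than percentages. Returning 0 when a denominator is zero is a convention chosen for this sketch.

```python
def f1_score(tp, fp, fn):
    """F1 from counts, with precision and recall as fractions (0 to 1)."""
    if tp == 0:
        return 0.0  # sketch convention: avoids dividing by zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * (precision * recall) / (precision + recall)

# Made-up counts: precision = 40/45, recall = 40/50.
f1 = f1_score(tp=40, fp=5, fn=10)
```

F1 is the harmonic mean of precision and recall, so it is only high when both are high.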
What is MCC (not the equation)?
Matthews correlation coefficient
What is the equation for MCC?
MCC = (#TP × #TN - #FP × #FN) / √ ((#TP + #FP) (#TP + #FN) (#TN + #FP) (#TN + #FN))
What is the benefit of MCC?
It is a more reliable measurement than accuracy when the data set is unbalanced, because it takes all four counts (TP, TN, FP, FN) into account
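A sketch comparing accuracy and MCC on an unbalanced data set. The counts are invented: 5 actual positives and 95 actual negatives, with a classifier that almost always answers "negative". The function names are illustrative, and returning 0 when the denominator is zero is a common convention assumed here.

```python
import math

def accuracy_percent(tp, tn, fp, fn):
    """Percentage of all items that were classified correctly."""
    return 100 * (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, between -1 and +1."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention when a row or column is empty
    return (tp * tn - fp * fn) / denominator

# Made-up unbalanced result: only 1 of the 5 positives is found.
counts = dict(tp=1, tn=94, fp=1, fn=4)
acc = accuracy_percent(**counts)
score = mcc(**counts)
```

Accuracy looks excellent at 95% because the negatives dominate, while the MCC stays below 0.3 and exposes how poorly the positives are handled.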
What is the purpose of the confusion matrix?
It is a convenient way to represent the accuracy of a multi-class (non-binary) classifier. It is like a heat map
[Picture 4]

What are the axes of the confusion matrix?
One axis is the actual class and the other is the predicted class. Each entry at coordinate (x,y) in the matrix corresponds to the number of elements of class x classified as y
[Picture 4]

What is the ideal result of a confusion matrix (best classification)?
All the elements should be on the diagonal, because there every item of class x is classified as x [Picture 4]
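A minimal sketch of building a confusion matrix for three classes. The class names, labels and the function name `confusion_matrix` are invented for the example. Rows are the actual class and columns are the predicted class.

```python
def confusion_matrix(actual, predicted, classes):
    """matrix[x][y] = number of elements of class x that were classified as y."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

# Made-up labels for six items.
classes = ["cat", "dog", "bird"]
actual = ["cat", "cat", "dog", "dog", "bird", "bird"]
predicted = ["cat", "dog", "dog", "dog", "bird", "cat"]
matrix = confusion_matrix(actual, predicted, classes)

# Correct classifications sit on the diagonal.
correct = sum(matrix[i][i] for i in range(len(classes)))
```

Here one cat was mistaken for a dog and one bird for a cat, so two counts fall off the diagonal and four stay on it.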

What is a ROC curve?
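A receiver operating characteristic (ROC) curve plots the true positive rate against the false positive rate as the decision threshold changes. It is a convenient way to compare different models
What does a diagonal classifier of a ROC curve mean?
Along the diagonal line, the classifier is no better than random guessing (for example, a 50% chance of being correct and a 50% chance of being incorrect)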
What is the perfect classifier on an ROC curve?
The perfect classifier would be in the top left corner (0,100): 0% false positives and 100% true positives
On an ROC curve, which line is best?
The one with the largest area under it
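A sketch of how ROC points and the area under the curve could be computed. The scores and labels are made up, the function names are illustrative, and the sketch assumes all scores are distinct. Rates are given as fractions (0 to 1) rather than percentages.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points for the curve.

    labels are 1 (positive) or 0 (negative); a higher score means "more positive".
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Lower the threshold one item at a time, highest score first.
    for score, label in sorted(zip(scores, labels), reverse=True):
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    """Trapezoid rule over consecutive points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Made-up classifier outputs for six items.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = area_under_curve(roc_points(labels, scores))
```

An area of 1 corresponds to the perfect classifier and an area of 0.5 to the diagonal line, so the example's area of about 0.89 sits between the two.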
What is a parametric method?
This is when you decide what parameters and how many the AI is going to learn before training starts
What happens to the training data after the AI has been training with a parametric method?
After the rules have been learned by the AI the data can be discarded
What are some issues with the parametric method?
The fixed structure may be too simple for the data. For example, we cannot always separate everything with a single hyperplane
What is a non-parametric method?
There are no defined parameters at the start. The parameters will be learned as training happens. This method focusses on the data rather than a particular structure
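One way to picture the difference, as a sketch only: a straight-line fit is used here as the parametric example and a nearest-neighbour lookup as the non-parametric example. Both choices, the function names and the data are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Parametric: learn exactly two parameters (slope and intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

def nearest_neighbour_predict(xs, ys, query):
    """Non-parametric: answer with the output of the closest stored example."""
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - query))
    return ys[nearest]

# Made-up training data lying on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

slope, intercept = fit_line(xs, ys)
line_prediction = slope * 4.0 + intercept  # needs only the two learned numbers

nn_prediction = nearest_neighbour_predict(xs, ys, 2.2)  # needs the stored data
```

After fitting, the line needs only its two numbers, so the training data can be discarded. The nearest-neighbour lookup has to keep every training example.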