Classification and Machine Learning Evaluation Flashcards
What is classification?
This is when you find patterns in input data and use them to assign each item to one of a set of categories (classes)
What is regression?
Building a model that predicts a continuous numeric value (for example, a price or a temperature) from input data
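To make the idea of predicting a number concrete, here is a minimal sketch that fits a straight line by ordinary least squares. The function name `fit_line` and the data are made up for illustration; they are not part of the original cards.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

slope, intercept = fit_line(xs, ys)
prediction = slope * 4.0 + intercept  # numeric prediction for an unseen input
print(slope, intercept, prediction)
```

The output is a number on a continuous scale, not a category label.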
Define clustering.
This is where data points that belong to different categories form distinct clusters (groups) when graphed; clustering means finding those groups
What is the machine learning process? (5 steps)
- Data collection
- Feature selection
- Algorithm choice
- Training
- Evaluation
What is overfitting?
A model overfits when it describes the randomness associated with the data, rather than the underlying relationship between the data points
What is underfitting?
This is when the model is too simple to capture the complex problem (so it is not very accurate)
What is the rule for under/overfitting (Occam's razor)?
We should use the simplest model unless we have to use a more complex one
What are the 3 types of data set and what are they each used for?
> Training data: Used to train the model
> Validation data: Used to evaluate and compare the different models that we have created
> Test data: Used to benchmark the final model at the end
[Picture 1]
What is important about the data for the 3 types of data?
The training, validation and test data must not overlap (they must not contain any of the same data points).
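A minimal sketch of splitting one data set into three non-overlapping parts. The 60/20/20 proportions, the fixed seed and the function name `split_dataset` are assumptions chosen for illustration, not values from the cards.

```python
import random

def split_dataset(items, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle the items, then cut them into three non-overlapping lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # whatever is left over
    return train, val, test

train, val, test = split_dataset(range(100))

# Each item lands in exactly one of the three sets.
assert not set(train) & set(val)
assert not set(train) & set(test)
assert not set(val) & set(test)
```

Because the three lists are slices of one shuffled list, no item can appear in more than one of them.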
What can happen with regards to the training data if we make the model too complex?
It doesn't learn the underlying principle; it just starts to learn the data set itself.
Is it possible to get the error during training to 0? Would we want to do this? Why?
Yes, it is possible. No, we do not want to do this, because the AI would just be learning the data set rather than the underlying relationship
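An extreme illustration of "just learning the data set": a lookup table that stores every training example. This is a toy sketch, not a real learning algorithm; the even/odd task and all names are made up.

```python
def train_memorizer(examples):
    """'Train' by storing every (input, label) pair verbatim."""
    return dict(examples)

def predict(table, x, default=0):
    # Unseen inputs fall back to a fixed guess: nothing general was learned.
    return table.get(x, default)

def error_rate(table, examples):
    wrong = sum(1 for x, y in examples if predict(table, x) != y)
    return wrong / len(examples)

# Made-up task: the label is 1 when the number is even, else 0.
train_set = [(n, int(n % 2 == 0)) for n in range(0, 20)]
unseen_set = [(n, int(n % 2 == 0)) for n in range(20, 40)]

table = train_memorizer(train_set)
train_error = error_rate(table, train_set)    # every training point is recalled
unseen_error = error_rate(table, unseen_set)  # every answer is the default guess
print(train_error, unseen_error)
```

The training error is 0, yet on unseen numbers the table is right only when the default guess happens to match.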
When do we stop training a machine learning AI? Why?
We want to stop when the error on the validation data set starts to increase. The validation error increases because the model is starting to overfit
[Picture 2]
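The stopping rule can be sketched as a check on the validation error after each training pass. The error values below are made up, and the `patience` parameter (how many non-improving passes to tolerate) is an added assumption, not something stated on the card.

```python
def early_stopping_epoch(val_errors, patience=1):
    """Return the index of the best epoch, scanning until the validation
    error has failed to improve for `patience` epochs in a row."""
    best_epoch = 0
    best_error = val_errors[0]
    epochs_without_improvement = 0
    for epoch in range(1, len(val_errors)):
        if val_errors[epoch] < best_error:
            best_error = val_errors[epoch]
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation error has started to rise: stop here
    return best_epoch

# Made-up curves: training error keeps falling, validation error turns upward.
train_errors = [0.50, 0.35, 0.25, 0.18, 0.12, 0.08, 0.05]
val_errors = [0.52, 0.40, 0.31, 0.27, 0.29, 0.33, 0.38]

stop_at = early_stopping_epoch(val_errors)
print(stop_at, val_errors[stop_at])
```

Note that the training error alone would never signal when to stop, since it keeps decreasing.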
What happens if you don't stop training a machine learning AI?
The model would become too complex and it would start to overfit
What is cross validation?
> When there is not enough data to create three sets large enough, cross validation is a common way to test the learned model on more data points.
> The data is split into several batches
> Each batch is used for testing in turn whilst the others are used for training
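The batching described above can be sketched as follows. This is a minimal illustration that only produces the index splits; the function name `k_fold_splits` and the choice of 10 items in 5 batches are assumptions. In practice the data would usually be shuffled first.

```python
def k_fold_splits(n_items, k):
    """Yield (train_indices, test_indices) pairs, one per batch."""
    indices = list(range(n_items))
    # Spread any remainder over the first few batches so sizes differ by at most 1.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

splits = list(k_fold_splits(10, 5))
for train_idx, test_idx in splits:
    print("test on", test_idx, "train on", train_idx)
```

Every data point is used for testing exactly once, and a model would be trained afresh for each split, with the results averaged.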
What is a binary classifier?
This is a classifier that assigns each input to one of two classes (for example, positive or negative). To evaluate it, you form the following diagram:
[Picture 3]
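The picture is not reproduced here. Assuming it is the usual 2x2 table of actual versus predicted class (often called a confusion matrix), its four cells could be counted as in this sketch. The labels and predictions are made up, with 1 as the positive class and 0 as the negative class.

```python
def confusion_counts(actual, predicted):
    """Count the four outcomes of a binary classifier (1 = positive, 0 = negative)."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            tp += 1  # true positive: predicted positive, actually positive
        elif p == 1 and a == 0:
            fp += 1  # false positive: predicted positive, actually negative
        elif p == 0 and a == 1:
            fn += 1  # false negative: predicted negative, actually positive
        else:
            tn += 1  # true negative: predicted negative, actually negative
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}

# Made-up labels and predictions.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]

counts = confusion_counts(actual, predicted)
accuracy = (counts["TP"] + counts["TN"]) / len(actual)
print(counts, accuracy)
```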