Challenges of Machine Learning Flashcards
what is overfitting?
when the model performs well on the training data but doesn't generalise well to new data
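A minimal sketch of this gap, using made-up toy data and a deliberately over-flexible "memorizer" model (both are illustrative assumptions, not part of the original cards). The model reproduces every training label exactly, noise included, so its training error is zero while its error on new points is not.

```python
# Toy data (hypothetical): the true relationship is y = 2x, plus fixed noise.
train_x = [0, 1, 2, 3, 4, 5]
train_noise = [0.9, -1.1, 1.0, -0.8, 1.2, -1.0]
train_y = [2 * x + n for x, n in zip(train_x, train_noise)]

test_x = [0.5, 1.5, 2.5, 3.5, 4.5]
test_noise = [-0.7, 0.8, -0.9, 1.1, -0.6]
test_y = [2 * x + n for x, n in zip(test_x, test_noise)]


def memorizer(x):
    """Overly flexible model: return the label of the nearest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def mse(model, xs, ys):
    """Mean squared error of the model's predictions against the labels."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


train_error = mse(memorizer, train_x, train_y)  # zero: the noise is memorized
test_error = mse(memorizer, test_x, test_y)     # larger: the noise does not repeat
print(train_error, test_error)
```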
what are some solutions to overfitting?
choose a simpler model
gather more training data
reduce noise in the training data (fix data errors, remove outliers)
what is regularization?
constraining a model to make it simpler and reduce the risk of overfitting
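One common way to constrain a model is an L2 penalty on its weights. The sketch below assumes a one-parameter model y = w * x with no intercept; the helper name `fit_slope`, the penalty strength `alpha`, and the data are illustrative choices only.

```python
def fit_slope(xs, ys, alpha=0.0):
    """Least-squares slope for y = w * x with an L2 penalty alpha * w**2.

    Minimising sum((w*x - y)**2) + alpha * w**2 over w gives
    w = sum(x*y) / (sum(x*x) + alpha).
    """
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


xs = [1.0, 2.0, 3.0]
ys = [2.5, 3.5, 7.0]

w_free = fit_slope(xs, ys, alpha=0.0)   # unconstrained fit
w_reg = fit_slope(xs, ys, alpha=10.0)   # penalty shrinks the slope towards zero
print(w_free, w_reg)
```

A larger `alpha` pulls the slope closer to zero, giving a simpler (flatter) model.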
when does overfitting happen?
when the model is too complex relative to the amount and noisiness of the training data
what is a hyper-parameter?
a parameter of the learning algorithm rather than of the model; training does not change it, so it must be set before training and stays constant during training
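To make the distinction concrete, here is a small gradient-descent sketch on toy data (the data and the specific values are assumptions for illustration). `learning_rate` and `n_epochs` are hyperparameters fixed before the loop starts; `w` is a model parameter that training updates.

```python
xs = [1.0, 2.0, 3.0]
ys = [3.0, 6.0, 9.0]  # generated by y = 3x

# Hyperparameters: chosen by hand before training, never updated by it.
learning_rate = 0.05
n_epochs = 100

# Model parameter: learned from the data.
w = 0.0
for _ in range(n_epochs):
    # Gradient of the mean squared error with respect to w.
    grad = 2 * sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= learning_rate * grad

print(w)  # close to 3.0; learning_rate and n_epochs are unchanged
```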
what is underfitting?
when the model is too simple to learn the underlying structure of the data
what are some solutions to underfitting?
select a more complex model, with more parameters
feed better features to the learning algorithm (feature engineering)
reduce the constraints on the model (for example, reduce the regularization hyperparameter)
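A small sketch of the "better features" remedy, on assumed toy data where y = x**2. A one-parameter model fed the raw x cannot capture the curve and underfits; the same model fed the feature x**2 fits exactly. The helper `fit_and_score` is hypothetical.

```python
def fit_and_score(features, ys):
    """Fit y = w * feature by least squares; return (w, training MSE)."""
    w = sum(f * y for f, y in zip(features, ys)) / sum(f * f for f in features)
    mse = sum((w * f - y) ** 2 for f, y in zip(features, ys)) / len(ys)
    return w, mse


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x ** 2 for x in xs]

w_raw, mse_raw = fit_and_score(xs, ys)                   # feature: x (underfits)
w_sq, mse_sq = fit_and_score([x ** 2 for x in xs], ys)   # feature: x**2
print(mse_raw, mse_sq)
```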
what are some examples of bad data?
insufficient quantity of training data
nonrepresentative data
poor quality data
irrelevant features
what are two main challenges of machine learning?
bad data and bad algorithms
what is feature selection?
selecting the most useful features to train on from among the existing features
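One simple heuristic, shown here only as an illustration, is to rank features by the absolute value of their Pearson correlation with the target and keep the top few. The feature names and values below are invented, and real projects often use more robust selection methods.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


target = [1, 2, 3, 4, 5]
features = {
    "rooms": [1, 2, 3, 4, 5.5],    # strongly related to the target
    "age": [5, 4, 3, 2, 1.5],      # strongly (negatively) related
    "random_id": [3, 1, 4, 1, 5],  # weakly related
}

# Score each feature, then keep the two with the strongest relationship.
scores = {name: abs(pearson(col, target)) for name, col in features.items()}
selected = sorted(scores, key=scores.get, reverse=True)[:2]
print(selected)
```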
what is feature extraction?
combining existing features to produce a more powerful one
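As a rough sketch, two correlated features can be merged into one by standardizing each and averaging the results. The car data and the `wear` feature name are hypothetical, and dimensionality-reduction algorithms are the more usual tool for this in practice.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


age_years = [1, 3, 5, 8, 10]
mileage_km = [12000, 40000, 61000, 98000, 125000]

# One combined feature standing in for the two correlated originals.
wear = [
    (a + m) / 2
    for a, m in zip(standardize(age_years), standardize(mileage_km))
]
print(wear)
```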
what is poor quality data?
training data that is full of errors, outliers and noise; this makes it harder for the system to detect the underlying patterns, so the system is less likely to perform well
what is non-representative data?
training data that does not reflect the new cases you want to generalise to; to generalise well, the training data needs to be representative of those cases
what happens if a sample is too small?
you will have sampling noise: the sample can be nonrepresentative purely by chance
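A quick simulation sketch of sampling noise, assuming a population in which exactly half the answers are "yes". The sample sizes, trial count, and seed are arbitrary illustrative choices. Estimates from small samples scatter much more widely around the true 50% than estimates from large samples.

```python
import random
from statistics import pstdev

random.seed(0)  # fixed seed so the run is repeatable


def sample_proportion(n):
    """Share of 'yes' answers in n draws from a population that is 50% 'yes'."""
    return sum(random.random() < 0.5 for _ in range(n)) / n


small = [sample_proportion(10) for _ in range(200)]    # 200 small samples
large = [sample_proportion(1000) for _ in range(200)]  # 200 large samples

spread_small = pstdev(small)  # how much the small-sample estimates vary
spread_large = pstdev(large)  # how much the large-sample estimates vary
print(spread_small, spread_large)
```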
what happens if a sample is too large?
even a very large sample can be nonrepresentative if the sampling method is flawed (sampling bias)
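A sketch of this effect on an invented population of 1000 values. The "flawed method" here is an assumed survey that only reaches values of 300 and above. The biased sample is seven times larger than the random one, yet its estimate of the population mean is further off.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the run is repeatable

population = list(range(1000))
true_mean = mean(population)  # 499.5

# Flawed method: only values >= 300 can ever be sampled (700 values).
biased_sample = [v for v in population if v >= 300]

# Sound method: a much smaller sample drawn uniformly at random.
random_sample = random.sample(population, 100)

biased_error = abs(mean(biased_sample) - true_mean)
random_error = abs(mean(random_sample) - true_mean)
print(biased_error, random_error)
```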