LU3 Flashcards
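What is semi-supervised learning?
A learning setting in which only some of the training data is annotated (labeled); the model learns from both the labeled and the unlabeled data.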
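A minimal sketch of one semi-supervised technique, self-training (pseudo-labeling). The 1-D data points, the labels "A"/"B", and the nearest-neighbour labeling rule are illustrative assumptions. Only two points are annotated; in each round the model labels the unlabeled point it is most confident about and then learns from it as if it were annotated.

```python
def nearest_label(x, labeled):
    """Label of the labeled point closest to x (1-nearest-neighbour rule)."""
    return min(labeled, key=lambda pair: abs(pair[0] - x))[1]

def self_train(labeled, unlabeled):
    """Self-training: pseudo-label one unlabeled point per round and add it to the labeled set."""
    labeled = list(labeled)
    remaining = list(unlabeled)
    while remaining:
        # Most confident candidate = the unlabeled point closest to any labeled point.
        x = min(remaining, key=lambda u: min(abs(u - p) for p, _ in labeled))
        labeled.append((x, nearest_label(x, labeled)))
        remaining.remove(x)
    return labeled

labeled = [(1.0, "A"), (9.0, "B")]            # only two samples are annotated
unlabeled = [2.0, 3.0, 4.5, 6.0, 7.5, 8.0]    # the rest carry no label
result = dict(self_train(labeled, unlabeled))
print(result[2.0], result[3.0], result[7.5], result[8.0])  # A A B B
```

Other semi-supervised methods exist; this one was chosen only because it fits in a few lines.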
What is classification?
A model learns how to assign a class label to examples from the problem domain (e.g., dog / not dog).
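A minimal classification sketch, assuming a nearest-centroid classifier and two hypothetical features (weight in kg, ear length in cm) for the dog / not dog example; the data is made up for illustration. Whatever the input, the output is always one of the known class labels.

```python
import math

def fit_centroids(X, y):
    """Compute the mean feature vector (centroid) of each class label."""
    sums, counts = {}, {}
    for features, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Assign the class label whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))

X = [[30.0, 10.0], [25.0, 9.0], [4.0, 5.0], [5.0, 6.0]]   # hypothetical features
y = ["dog", "dog", "not dog", "not dog"]
centroids = fit_centroids(X, y)
print(predict(centroids, [28.0, 9.5]))  # dog
print(predict(centroids, [4.5, 5.5]))   # not dog
```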
What is a regression problem?
A model learns to predict a continuous variable (e.g., temperature).
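A minimal regression sketch using ordinary least squares on a single feature. The hours/temperature data is hypothetical and lies exactly on a line so the fit is easy to check by hand. Unlike classification, the prediction is a continuous number rather than a class label.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

hours = [0, 1, 2, 3]                  # hypothetical input feature
temps = [10.0, 12.0, 14.0, 16.0]      # hypothetical temperatures
slope, intercept = fit_line(hours, temps)
print(slope, intercept)               # 2.0 10.0
print(slope * 5 + intercept)          # 20.0
```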
What is a clustering problem?
Grouping data samples into a specified number of groups (e.g., grouping lemons according to size). Clustering is unsupervised: no labels are used.
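A minimal clustering sketch: plain k-means on one feature. The lemon diameters, the choice of three groups, and the initial centers are illustrative assumptions. The algorithm is never given labels, which is what makes it unsupervised.

```python
def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on 1-D data, starting from the given initial centers."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: put every value in the group of its nearest center.
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group (keep it if empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

lemon_diameters_cm = [4.8, 5.0, 5.1, 6.9, 7.0, 7.2, 8.8, 9.0]
centers, groups = kmeans_1d(lemon_diameters_cm, centers=[4.0, 7.0, 10.0])
print(groups)  # [[4.8, 5.0, 5.1], [6.9, 7.0, 7.2], [8.8, 9.0]]
```

The number of groups (here three, via the three initial centers) has to be chosen up front, matching the "specified number of groups" in the answer above.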
What is underfitting?
Underfitting is when a model does not capture the underlying trend of the data
What happens if we have an underfitted model?
Accuracy will be poor, on the training data as well as on new data.
What does it say about our model if it is underfitted?
Our algorithm does not fit the data well enough.
When does underfitting occur?
When we have too little data, or when we build a linear model for non-linear data.
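A minimal sketch of the linear-model-on-non-linear-data case, assuming made-up quadratic data (y = x²). A straight line cannot bend, so its error stays high even on the training data. Fitting the same linear model to an engineered feature z = x² (one way of adding features or complexity) removes that error.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mse(xs, ys, slope, intercept):
    """Mean squared error of the fitted line on (xs, ys)."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]                 # non-linear data: 4, 1, 0, 1, 4

# Linear model on the raw feature x: a straight line cannot follow the curve.
slope, intercept = fit_line(xs, ys)
underfit_error = mse(xs, ys, slope, intercept)

# Same linear model on the engineered feature z = x**2.
zs = [x ** 2 for x in xs]
slope_z, intercept_z = fit_line(zs, ys)
engineered_error = mse(zs, ys, slope_z, intercept_z)

print(underfit_error, engineered_error)   # 2.8 0.0
```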
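How can you avoid underfitting?
Use more data and more informative features (feature engineering). Reducing the number of features by feature selection is usually a remedy for overfitting rather than underfitting.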
What does underfitting mean for bias and variance?
High bias
Low variance
Name some techniques to reduce underfitting.
Increase model complexity
Increase the number of features (feature engineering)
Remove noise from the data
Increase the number of epochs, or train for longer
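A minimal sketch of the last technique, training for more epochs, assuming batch gradient descent on a linear model with a hypothetical learning rate of 0.05 and made-up data generated by y = 2x + 1. After too few epochs the model is still underfitted; the training error keeps falling as the number of epochs grows.

```python
def train(xs, ys, epochs, lr=0.05):
    """Batch gradient descent for y = w * x + b, minimizing mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def mse(xs, ys, w, b):
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]     # generated by y = 2x + 1
results = {epochs: mse(xs, ys, *train(xs, ys, epochs)) for epochs in (1, 10, 500)}
for epochs, error in results.items():
    print(epochs, round(error, 4))
```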
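When does overfitting occur?
When the model is trained too much (e.g., for too many epochs) or is too complex for the amount of training data, so it fits the training data too closely.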
What happens when overfitting occurs?
The model starts learning from the noise and fine details of the training data, so it does not categorize new data correctly.
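A minimal sketch of overfitting, assuming made-up data where the true relationship is y = x and the training labels carry alternating noise of ±0.5. A degree-5 polynomial through all six training points (Lagrange interpolation) memorizes the noise: its training error is zero, but its error on unseen points is larger than that of a simple least-squares line.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def lagrange(xs, ys, x):
    """Evaluate the polynomial that passes exactly through every (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
noise = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
train_y = [x + e for x, e in zip(train_x, noise)]
test_x = [0.5, 1.5, 2.5, 3.5, 4.5]
test_y = list(test_x)                     # unseen points, noise-free targets

slope, intercept = fit_line(train_x, train_y)

def line(x):
    return slope * x + intercept

def poly(x):
    return lagrange(train_x, train_y, x)  # degree 5: hits every training point

line_train, line_test = mse(line, train_x, train_y), mse(line, test_x, test_y)
poly_train, poly_test = mse(poly, train_x, train_y), mse(poly, test_x, test_y)
print(f"line: train={line_train:.3f} test={line_test:.3f}")
print(f"poly: train={poly_train:.3f} test={poly_test:.3f}")
```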
Why does overfitting occur?
The algorithm has too much freedom in building the model, which leads to unrealistic models.
How do you avoid overfitting?
Use a linear algorithm (for linear data), or use parameters that limit model complexity, such as maximal depth (for decision trees).
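A minimal sketch of limiting complexity with a maximum depth, assuming a small hand-written 1-D decision tree (Gini impurity, majority-vote leaves) rather than any particular library; the data is made up, with one noisy label at x = 3. scikit-learn's decision trees expose a comparable `max_depth` parameter, but this code is not their implementation.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(points, max_depth=None, depth=0):
    """Grow a 1-D tree from (x, label) pairs; a leaf is a label, a node is (threshold, left, right)."""
    labels = [y for _, y in points]
    majority = Counter(labels).most_common(1)[0][0]
    # Stop at a pure node or when the depth limit is reached.
    if gini(labels) == 0.0 or (max_depth is not None and depth >= max_depth):
        return majority
    xs = sorted({x for x, _ in points})
    best = None
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2
        left = [p for p in points if p[0] <= threshold]
        right = [p for p in points if p[0] > threshold]
        # Weighted Gini impurity of the two children; lower is better.
        score = (len(left) * gini([y for _, y in left])
                 + len(right) * gini([y for _, y in right])) / len(points)
        if best is None or score < best[0]:
            best = (score, threshold, left, right)
    if best is None:  # all x values identical: no split possible
        return majority
    _, threshold, left, right = best
    return (threshold,
            build_tree(left, max_depth, depth + 1),
            build_tree(right, max_depth, depth + 1))

def predict(tree, x):
    """Follow thresholds down to a leaf label."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree

def count_leaves(tree):
    if not isinstance(tree, tuple):
        return 1
    return count_leaves(tree[1]) + count_leaves(tree[2])

data = [(1, "A"), (2, "A"), (3, "B"), (4, "A"),
        (5, "B"), (6, "B"), (7, "B"), (8, "B")]  # x = 3 is a noisy label
deep = build_tree(data)                  # unlimited depth
shallow = build_tree(data, max_depth=1)  # a single split
print(count_leaves(deep), count_leaves(shallow))   # 4 2
print(predict(deep, 3), predict(shallow, 3))       # B A
```

The unlimited tree grows extra leaves just to memorize the noisy point at x = 3, while the depth-1 tree keeps the simple rule "x ≤ 4.5 means A".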