4404 Flashcards
What is machine learning
A machine learns from experience E with respect to a class of tasks T and a performance measure P if its performance at tasks in T, as measured by P, improves with experience E.
Machine learning branches
Supervised | Unsupervised
Regression, Classification | –
Regression is for continuous data and predictions; classification is for discrete data and predictions.
Vectorization
Using vectors and matrices to represent data and feed it into an algorithm, so computations run as array operations over whole datasets rather than in explicit loops; this is called vectorization.
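A minimal sketch of the idea in NumPy, assuming a hypothetical toy dataset and weight vector; the loop and the single matrix-vector product compute the same predictions.

import numpy as np

# Hypothetical toy data: 3 examples, 2 features each.
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
w = np.array([0.5, -1.0])
b = 0.1

# Non-vectorized: one prediction at a time in a Python loop.
preds_loop = [float(x @ w + b) for x in X]

# Vectorized: one matrix-vector product covers every example at once.
preds_vec = X @ w + b

print(preds_loop)  # same three values...
print(preds_vec)   # ...computed without an explicit loop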
Cause of and solutions for underfitting
Cause:
The model is too simple to learn the underlying structure of the data; this leads to high bias.
Solutions (see the sketch after this card):
Select a more complex model
Feed better features to the learning algorithm
Reduce regularization
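A minimal sketch of the first solution, assuming hypothetical quadratic data: adding a squared feature lets an otherwise too-simple linear model capture the curve.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical quadratic data that a straight line cannot capture.
rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 100).reshape(-1, 1)
y = X.ravel() ** 2 + rng.normal(scale=0.5, size=100)

# Too-simple model: straight line, high bias.
simple = LinearRegression().fit(X, y)

# More complex model: add a squared feature so the fit can bend.
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
better = LinearRegression().fit(X_poly, y)

print(simple.score(X, y))        # low R^2: underfitting
print(better.score(X_poly, y))   # much higher R^2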
Cause of and solutions for overfitting
Cause:
The model is too complex relative to the size and noisiness of the training data.
Solutions (a regularization sketch follows this card):
Use a model with fewer parameters (e.g., linear instead of a high-degree polynomial)
Constrain the model through regularization
Use more training data
Fix data errors and remove outliers
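A minimal sketch of the regularization solution, assuming a hypothetical small noisy dataset: the same high-degree polynomial features, constrained by an L2 penalty, typically generalize better than the unconstrained fit.

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Hypothetical small, noisy dataset.
rng = np.random.default_rng(1)
X = np.sort(rng.uniform(-3, 3, 30)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=30)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Degree-12 polynomial with no penalty: tends to chase the noise.
overfit = make_pipeline(
    PolynomialFeatures(12, include_bias=False), StandardScaler(), LinearRegression()
).fit(X_tr, y_tr)

# Same features, but coefficients constrained by an L2 penalty.
constrained = make_pipeline(
    PolynomialFeatures(12, include_bias=False), StandardScaler(), Ridge(alpha=1.0)
).fit(X_tr, y_tr)

# The validation score is usually noticeably better for the constrained model.
print(overfit.score(X_val, y_val), constrained.score(X_val, y_val))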
What the cost (y-axis) means for Jtrain and Jval
Lower cost means fewer errors in the model's predictions; higher cost means more errors.
The relationship between Jtrain and Jval shows how well a model is generalizing to the data (a sketch computing both follows this card).
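A minimal sketch, assuming hypothetical data and using mean squared error as the cost: Jtrain is the cost on the training split and Jval is the cost on a held-out validation split.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical data with a known linear relationship plus noise.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.5, size=200)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=0)
model = LinearRegression().fit(X_tr, y_tr)

J_train = mean_squared_error(y_tr, model.predict(X_tr))
J_val = mean_squared_error(y_val, model.predict(X_val))

# Small gap between the two: the model generalizes well.
# J_val much larger than J_train: overfitting. Both high: underfitting.
print(J_train, J_val)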
Generalization
The ability of a model to make accurate predictions on unseen data.
How to mitigate overfitting
Increase the regularization parameter
Collect more training data
Use fewer features
Stop training early / use fewer epochs (see the early-stopping sketch after this card)
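A minimal early-stopping sketch, assuming a hypothetical dataset and scikit-learn's SGDRegressor trained one epoch at a time; training halts once the validation error stops improving for a few epochs.

import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical data split into training and validation sets.
rng = np.random.default_rng(3)
X = rng.normal(size=(500, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.5, size=500)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

model = SGDRegressor(learning_rate="constant", eta0=0.01, random_state=0)

best_val, patience, bad_epochs = np.inf, 5, 0
for epoch in range(200):
    model.partial_fit(X_tr, y_tr)  # one pass over the training data
    val_err = mean_squared_error(y_val, model.predict(X_val))
    if val_err < best_val:
        best_val, bad_epochs = val_err, 0
    else:
        bad_epochs += 1
    if bad_epochs >= patience:  # validation error stopped improving: stop training
        print(f"stopped early at epoch {epoch}, best val MSE {best_val:.3f}")
        break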
How to mitigate underfitting
Decrease the regularization parameter
Use a more complex model / more features
Regularization
Prevents overfitting by adding a penalty for model complexity.
Discourages the model from fitting the training data too closely, which would include fitting its noise and outliers.
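One common concrete form (a linear-regression cost with an L2 penalty, assuming m training examples, n parameters θ_j, and regularization strength λ):

J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2

The first term rewards fitting the training data; the second term penalizes large parameter values.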
Regularization parameter
A term (often written λ) added to the cost function used to train the model; it controls the trade-off between fitting the training data and keeping the parameters small.
A larger value penalizes large coefficients more strongly; a smaller value lets the model fit the training data more closely.
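A minimal sketch, assuming hypothetical data and scikit-learn's Ridge, where alpha plays the role of the regularization parameter: larger values shrink the learned coefficients.

import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical data with known coefficients.
rng = np.random.default_rng(4)
X = rng.normal(size=(100, 5))
y = X @ np.array([4.0, -3.0, 2.0, 0.0, 1.0]) + rng.normal(scale=0.5, size=100)

# Sweep the regularization strength: larger alpha -> smaller coefficients.
for alpha in [0.01, 1.0, 100.0]:
    coefs = Ridge(alpha=alpha).fit(X, y).coef_
    print(alpha, np.round(coefs, 2))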
L1 Regularization (Lasso)
Adds a penalty proportional to the sum of the absolute values of the coefficients.
Encourages sparsity: some coefficients are driven exactly to zero, effectively selecting a simpler model with fewer features.
L2 Regularization (Ridge)
Adds a penalty proportional to the sum of the squared coefficients.
Encourages small coefficients, but does not necessarily drive any of them to zero.
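A minimal comparison sketch, assuming hypothetical data in which only two of ten features matter; the alpha values are illustrative.

import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Hypothetical data where only features 0 and 3 carry signal.
rng = np.random.default_rng(5)
X = rng.normal(size=(200, 10))
true_w = np.zeros(10)
true_w[[0, 3]] = [3.0, -2.0]
y = X @ true_w + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=0.1).fit(X, y)

print(np.round(lasso.coef_, 2))  # L1: most coefficients driven exactly to zero
print(np.round(ridge.coef_, 2))  # L2: coefficients small but generally nonzero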
Linear Classification
The perceptron is a supervised linear binary classification algorithm that can be applied when the classes are linearly separable.
Training data are linearly separable if they can be separated by a linear decision rule.
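A minimal perceptron sketch on hypothetical linearly separable toy data; the learning rate, epoch limit, and data are illustrative.

import numpy as np

def perceptron_train(X, y, epochs=50, lr=1.0):
    """Train a perceptron; y must hold labels -1 and +1, and training
    converges only if the classes are linearly separable."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x_i, y_i in zip(X, y):
            # Misclassified if the sign of the linear score disagrees with the label.
            if y_i * (x_i @ w + b) <= 0:
                w += lr * y_i * x_i  # nudge the decision boundary toward x_i
                b += lr * y_i
                errors += 1
        if errors == 0:  # every point classified correctly: converged
            break
    return w, b

# Hypothetical linearly separable toy data.
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = perceptron_train(X, y)
print(np.sign(X @ w + b))  # matches y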