Advantages / Disadvantages Flashcards
Linear Regression adv
Main advantages:
- very simple algorithm
- doesn’t take a lot of memory
- quite fast
- easy to explain
Linear Regression drawbacks
Main drawbacks:
- requires the relationship between the features and the target to be linear (see « Polynomial Regression » if you think you need a polynomial fit)
- is unstable when features are redundant, i.e. when there is multicollinearity (in that case you should have a look at « Elastic-Net » or « Ridge Regression »; see the sketch below).
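Below is a minimal sketch (not part of the original card; the synthetic data and the alpha value are purely illustrative) of how a Ridge penalty stabilizes the coefficients when two features are nearly collinear, where plain Linear Regression becomes unstable:

```python
# Illustrative only: Ridge vs. plain Linear Regression on near-collinear features.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
# Second column is an almost exact copy of the first -> multicollinearity.
X = np.hstack([x, x + rng.normal(scale=1e-3, size=(200, 1))])
y = 3.0 * x[:, 0] + rng.normal(scale=0.1, size=200)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty shrinks the unstable coefficients

print("OLS coefficients:  ", ols.coef_)    # typically large, opposite-signed values
print("Ridge coefficients:", ridge.coef_)  # roughly equal values summing to about 3
```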
Decision Tree
adv
Main advantages:
- quite simple
- easy to explain and communicate
- easy to maintain
- few parameters are required and they are quite intuitive
- prediction is quite fast
Decision Tree
disadv
Main drawbacks:
- can take a lot of memory (the more features you have, the deeper and larger your decision tree is likely to be)
- naturally tends to overfit (it generates high-variance models; pruning the branches mitigates this, though; see the sketch below)
- not capable of being incrementally improved
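A minimal sketch of how pruning tames a decision tree's overfitting, assuming scikit-learn's cost-complexity pruning (the dataset and the ccp_alpha value are arbitrary illustrations, not taken from the card):

```python
# Illustrative only: an unpruned vs. a cost-complexity-pruned decision tree.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
pruned = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0).fit(X_train, y_train)

# The pruned tree has fewer nodes to store and usually generalizes at least as well.
print(full.tree_.node_count, full.score(X_test, y_test))
print(pruned.tree_.node_count, pruned.score(X_test, y_test))
```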
Random Forest
adv
Main advantages:
- is robust to overfitting (thus solving one of the biggest disadvantages of decision trees)
- parameterization remains quite simple and intuitive
- performs very well when the number of features is large and with large amounts of training data
Random Forest
disadv
Main disadvantages:
- models generated with Random Forest may take a lot of memory
- learning may be slow (depending on the parameterization)
- not possible to iteratively improve the generated models.
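A minimal sketch illustrating both Random Forest cards: the parameterization stays simple and intuitive, while n_estimators directly drives the memory footprint and the training time (the values below are my own, not prescribed by the cards):

```python
# Illustrative only: a Random Forest with a handful of intuitive parameters.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(
    n_estimators=300,     # more trees: more stability, but more memory and slower training
    max_features="sqrt",  # number of features considered at each split
    n_jobs=-1,            # train the trees in parallel to offset the slow learning
    random_state=0,
).fit(X_train, y_train)

print(forest.score(X_test, y_test))
```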
Boosting
adv
Main advantages are:
- parameterization is quite simple: even a very simple weak predictor may be enough to train a strong model in the end (for instance, a decision stump as the weak predictor may lead to great performance; see the sketch after this list)
- is quite robust to overfitting (and since it is a sequential approach, the prediction step can be optimized, e.g. with a cascade)
- performs well for large amounts of data
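A minimal sketch of boosting decision stumps (depth-1 trees) into a strong classifier, using AdaBoost as one illustrative boosting method (the dataset and parameters are arbitrary, and the keyword name for the weak learner differs across scikit-learn versions, so it is passed positionally):

```python
# Illustrative only: boosting a decision stump with AdaBoost.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

stump = DecisionTreeClassifier(max_depth=1)  # the very simple weak predictor
boosted = AdaBoostClassifier(stump, n_estimators=200, random_state=0)
boosted.fit(X, y)

print(boosted.score(X, y))  # a single stump on its own would do far worse
```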
Boosting
disadv
Main drawbacks:
- training may be time-consuming (especially if, on top of it, we train an optimization of the prediction step, such as a Cascade or a Soft-Cascade approach)
- may take a lot of memory, depending on the weak-predictor
Support Vector Machine (SVM)
adv
Main advantages:
- is mathematically designed to reduce overfitting by maximizing the margin between the decision boundary and the data points
- prediction is fast
- can manage a lot of data and a lot of features (high dimensional problems)
- doesn’t take too much memory to store
Support Vector Machine (SVM)
disadv
Main drawbacks:
- can be time consuming to train
- parameterization can be tricky in some cases
- the model isn’t easy to communicate or explain
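A minimal sketch of why the parameterization can be tricky: C, the kernel and gamma interact, so they are usually cross-validated, which is also where the training time goes (the dataset and grid values below are purely illustrative):

```python
# Illustrative only: cross-validating C and gamma for an RBF-kernel SVM.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

pipe = make_pipeline(StandardScaler(), SVC())
grid = GridSearchCV(
    pipe,
    param_grid={"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01, 0.1]},
    cv=5,
)
grid.fit(X, y)  # training many candidate models is where the time is spent

print(grid.best_params_, grid.best_score_)
```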
Neural networks
adv
Main advantages:
- very complex models can be trained
- can be used as a kind of black box, without performing complex feature engineering before training the model
- numerous kinds of network structures can be used, giving access to very interesting properties (CNN, RNN, LSTM, etc.). Combined with the “deep” approach, even more complex models can be learned, unleashing new possibilities: object recognition has recently been greatly improved using Deep Neural Networks.
Neural networks
disadv
Main drawbacks:
- very hard to explain simply (people usually say that a Neural Network behaves and learns like a little human brain)
- parameterization is very complex (which network structure should you choose? which activation functions are best for your problem?)
- requires a lot more training data than usual
- the final model may take a lot of memory.
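A minimal sketch of the parameterization problem on a small feed-forward network: even this toy model already forces you to choose a structure and an activation function (the architecture and dataset below are arbitrary, not taken from the card):

```python
# Illustrative only: a tiny multi-layer perceptron and its structural choices.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

net = MLPClassifier(
    hidden_layer_sizes=(64, 32),  # which structure? how many layers and units?
    activation="relu",            # which activation function?
    max_iter=500,
    random_state=0,
).fit(X_train, y_train)

print(net.score(X_test, y_test))
```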
The K-Means algorithm
adv
Main advantages:
- parameterization is intuitive, and the algorithm works well with a lot of data.
Advantage or drawback:
- the K-Means algorithm is actually more a partitioning algorithm than a clustering algorithm. It means that, if there is noise in your unlabelled data, it will be incorporated into your final clusters. If you want to avoid modelling the noise, you might want to move to a more elaborate approach such as the HDBSCAN clustering algorithm or the OPTICS algorithm.
The K-Means algorithm
disadv
Main drawbacks:
- needs to know in advance how many clusters there will be in your data… this may require a lot of trials to “guess” the best number of clusters K (see the sketch below)
- the clustering may differ from one run to another due to the random initialization of the algorithm
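A minimal sketch of both drawbacks on toy data: K has to be supplied up front (here it is “guessed” by comparing inertia across several values), and n_init re-runs the random initialization to reduce run-to-run variability (all values below are illustrative):

```python
# Illustrative only: choosing K and controlling K-Means' random initialization.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=1000, centers=4, random_state=0)

# Trying several values of K and comparing inertia is one way to "guess" K.
for k in (2, 3, 4, 5, 6):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, km.inertia_)

# n_init repeats the random initialization and keeps the best run,
# which reduces (but does not remove) run-to-run variability.
```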