adv/disadv. Flashcards

1
Q

Linear Regression adv

A

Main advantages:

  • very simple algorithm
  • doesn’t take a lot of memory
  • quite fast
  • easy to explain
2
Q

Linear Regression drawbacks

A

Main drawbacks:

  • requires the relationship between the features and the target to be linear (see “Polynomial Regression” if you think you need a polynomial fit)
  • is unstable when features are redundant, i.e. when there is multicollinearity (in that case, have a look at “Elastic-Net” or “Ridge Regression”)
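As a quick illustration of the multicollinearity point, here is a minimal sketch (assuming scikit-learn and NumPy; the data is synthetic and `alpha=1.0` is an arbitrary choice) comparing plain Linear Regression with Ridge Regression on two nearly redundant features:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=1e-3, size=n)  # nearly redundant copy of x1 -> multicollinearity
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.1, size=n)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# OLS may split the true weight wildly between the two redundant columns,
# while Ridge shrinks the coefficients and balances them
print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
```

The sum of the OLS coefficients still estimates the true weight (3.0) well; it is the individual coefficients that become unstable, which is exactly what the regularized models fix.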
3
Q

Decision Tree

adv

A

Main advantages:

  • quite simple
  • easy to communicate about
  • easy to maintain
  • few parameters are required and they are quite intuitive
  • prediction is quite fast
4
Q

Decision Tree

disadv

A

Main drawbacks:

  • can take a lot of memory (the more features you have, the deeper and larger your decision tree is likely to be)
  • naturally overfits a lot (it generates high-variance models, it suffers less from that if the branches are pruned, though)
  • not capable of being incrementally improved
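To see the overfitting point in action, a minimal scikit-learn sketch (synthetic data; `max_depth=3` is an arbitrary pruning choice) comparing an unconstrained tree with a depth-limited one:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# unconstrained tree: fits the training set perfectly (high variance)
full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
# depth-limited ("pruned") tree: trades training accuracy for generalization
pruned = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

print("full  train/test:", full.score(X_tr, y_tr), full.score(X_te, y_te))
print("pruned train/test:", pruned.score(X_tr, y_tr), pruned.score(X_te, y_te))
```

The gap between the full tree's train and test scores is the variance the card refers to; limiting the depth (or cost-complexity pruning) narrows it.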
5
Q

Random Forest

adv

A

Main advantages:

  • is robust to overfitting (thus solving one of the biggest disadvantages of decision trees)
  • parameterization remains quite simple and intuitive
  • performs very well when the number of features is large and with large quantities of training data
6
Q

Random Forest

disadv

A

Main drawbacks:

  • models generated with Random Forest may take a lot of memory
  • learning may be slow (depending on the parameterization)
  • not possible to iteratively improve the generated models
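A minimal scikit-learn sketch of the memory and speed points (parameter values are illustrative): every tree in the ensemble is kept in memory, and `n_jobs=-1` parallelizes training to offset the slowness:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=50, random_state=0)

# n_estimators controls the memory/accuracy trade-off;
# n_jobs=-1 trains the independent trees on all available cores
forest = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0).fit(X, y)

# all 100 fitted trees are stored inside the model object
print("trees kept in memory:", len(forest.estimators_))
```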
7
Q

Boosting

adv

A

Main advantages:

  • parameterization is quite simple; even a very simple weak predictor can yield a strong model at the end (for instance, a decision stump as the weak predictor may already lead to great performance!)
  • is quite robust to overfitting (and since it is a sequential approach, prediction can be optimized)
  • performs well on large amounts of data
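A minimal sketch of the decision-stump point, assuming scikit-learn (whose AdaBoost uses a depth-1 tree, i.e. a stump, as its default weak predictor; the dataset is synthetic):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=500, random_state=0)

# the default weak predictor is DecisionTreeClassifier(max_depth=1):
# a decision stump, i.e. a single split per weak learner
boosted = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X, y)

print("training accuracy:", boosted.score(X, y))
```

Each stump alone is barely better than chance; the boosted combination of 100 of them is what produces the strong model.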
8
Q

Boosting

disadv

A

Main drawbacks:

  • training may be time consuming (especially if we train, on top of it, an optimization approach for the prediction, such as a Cascade or a Soft-Cascade approach)
  • may take a lot of memory, depending on the weak-predictor
9
Q

Support Vector Machine (SVM)

adv

A

Main advantages:

  • is mathematically designed to reduce overfitting by maximizing the margin between data points
  • prediction is fast
  • can manage a lot of data and a lot of features (high-dimensional problems)
  • doesn’t take too much memory to store
10
Q

Support Vector Machine (SVM)

disadv

A

Main drawbacks:

  • can be time consuming to train
  • parameterization can be tricky in some cases
  • the resulting model isn’t easy to explain or communicate about
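One common way to deal with the tricky parameterization is a grid search over `C` and `gamma`; a minimal scikit-learn sketch (synthetic data, illustrative grid):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)

# C (margin softness) and gamma (RBF kernel width) interact in
# non-obvious ways, hence the cross-validated search
grid = GridSearchCV(
    SVC(),
    {"C": [0.1, 1, 10], "gamma": ["scale", 0.01]},
    cv=3,
).fit(X, y)

print("best parameters:", grid.best_params_)
```

Note that the search multiplies the already slow training time by the size of the grid, which is the first drawback above.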
11
Q

Neural networks

adv

A

Main advantages:

  • very complex models can be trained
  • can be used as a kind of black box, without performing a complex feature engineering before training the model
  • numerous kinds of network structures can be used, giving you access to very interesting properties (CNN, RNN, LSTM, etc.). Combined with the “deep” approach, even more complex models can be learned, unleashing new possibilities: object recognition has recently been greatly improved using Deep Neural Networks.
12
Q

Neural networks

disadv

A

Main drawbacks:

  • very hard to explain simply (people usually say that a Neural Network behaves and learns like a little human brain)
  • parameterization is very complex (which network structure should you choose? which activation functions are best for your problem?)
  • requires a lot more learning data than usual
  • final model may take a lot of memory
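A minimal scikit-learn sketch of the parameterization problem (the layer sizes and activation below are arbitrary choices, exactly the kind of decision the card refers to):

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=400, random_state=0)

# hidden_layer_sizes and activation are only two of the many
# structural choices; there is no obvious "right" setting up front
mlp = MLPClassifier(
    hidden_layer_sizes=(32, 16),
    activation="relu",
    max_iter=500,
    random_state=0,
).fit(X, y)

print("training accuracy:", mlp.score(X, y))
```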
13
Q

The K-Means algorithm

adv

A

Main advantages:

  • parameterization is intuitive
  • works well with a lot of data

Advantage or drawback:

  • K-Means is actually more a partitioning algorithm than a clustering algorithm: if there is noise in your unlabelled data, it will be incorporated into your final clusters. If you want to avoid modeling the noise, you might want to move to a more elaborate approach such as the HDBSCAN or OPTICS clustering algorithms.

14
Q

The K-Means algorithm

disadv

A

Main drawbacks:

  • requires knowing in advance how many clusters there will be in your data … this may take a lot of trials to “guess” the best number K of clusters to define
  • clustering may differ from one run to another due to the random initialization of the algorithm
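Both drawbacks show up directly in the scikit-learn API: `n_clusters` must be supplied up front, and fixing `random_state` (plus several `n_init` restarts) is how the run-to-run variation is tamed. A minimal sketch with synthetic blobs:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# three well-separated blobs, but K = 3 must still be supplied by hand
X = np.concatenate(
    [rng.normal(loc=c, scale=0.3, size=(50, 2)) for c in (0.0, 5.0, 10.0)]
)

# n_init restarts the random initialization several times and keeps the
# best run; random_state makes the result reproducible across runs
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print("cluster centers:\n", km.cluster_centers_)
```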