Session 7.1 Flashcards
Deep Learning consists in…
training a neural network in which the inputs and outputs are the same
- Input and output are the same image
- Hidden layers have fewer perceptrons
- Each perceptron in a hidden layer has to represent a more elaborate concept
- Features are automatically created: no need to define them manually
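A minimal sketch of this idea as code, assuming Keras as the framework (the cards do not name one): a network trained to reproduce its own input through a narrower hidden layer, so the hidden units are forced to learn compressed features.

```python
# Sketch (illustrative; framework and layer sizes are assumptions, not from the cards).
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy data: 1000 "images" flattened to 64 features each.
x = np.random.rand(1000, 64).astype("float32")

autoencoder = keras.Sequential([
    keras.Input(shape=(64,)),
    layers.Dense(16, activation="relu"),     # hidden layer with fewer units: learned features
    layers.Dense(64, activation="sigmoid"),  # output layer reconstructs the input
])
autoencoder.compile(optimizer="adam", loss="mse")

# Input and target are the same: the network creates features automatically.
autoencoder.fit(x, x, epochs=5, batch_size=32, verbose=0)
```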
Deep learning uses relatively old methods:
Neural networks have been around for a long time (1950s).
Why now?
Big Data
- Large amounts of data are now available
- Possibility of using thousands of machines to solve a single problem
Underfitting
A model that is too simple does not fit the data well (high bias)
e.g., fitting a quadratic function with a linear model
Overfitting
A model that is too complex fits the data too well (high variance)
e.g., fitting a quadratic function with a 3rd-degree polynomial
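A small illustrative sketch of both cards, assuming Python/NumPy (not named in the cards): data is generated from a true quadratic and fitted with polynomials of increasing degree; a degree of 9 is used here only to make overfitting visible, whereas the card's example uses degree 3.

```python
# Sketch: under- and overfitting noisy quadratic data with polynomials.
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: 1.0 + 0.5 * x - 2.0 * x**2            # true quadratic function
x_train = np.sort(rng.uniform(-3, 3, 20))
x_test = np.sort(rng.uniform(-3, 3, 200))
y_train = f(x_train) + rng.normal(0, 2.0, x_train.size)
y_test = f(x_test) + rng.normal(0, 2.0, x_test.size)

for degree in (1, 2, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.2f}, test MSE {test_mse:.2f}")
# Degree 1 underfits (high error everywhere); degree 9 fits the training noise
# (low train error, high test error); degree 2 matches the true model.
```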
Bias
a model that underfits is consistently wrong (high bias) but is not highly affected by slightly different training data (low variance)
Variance
a model that overfits is right on average (low bias) but is highly sensitive to the specific training data (high variance)
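A hedged sketch of how these two quantities could be estimated empirically (degrees and noise level are illustrative assumptions): refit a too-simple and a too-complex polynomial on many independent training sets and look at the spread of their predictions at one test point.

```python
# Sketch: empirical bias and variance of a degree-1 vs degree-9 polynomial fit.
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: 1.0 + 0.5 * x - 2.0 * x**2   # true quadratic
x0, y0 = 1.5, f(1.5)                        # test point and its true value

for degree in (1, 9):
    preds = []
    for _ in range(500):                    # many independent training sets
        x = rng.uniform(-3, 3, 20)
        y = f(x) + rng.normal(0, 2.0, x.size)
        preds.append(np.polyval(np.polyfit(x, y, degree), x0))
    preds = np.array(preds)
    bias = preds.mean() - y0                # systematic error of the average prediction
    variance = preds.var()                  # sensitivity to the particular training set
    print(f"degree {degree}: bias {bias:+.2f}, variance {variance:.2f}")
# The underfitting model is consistently wrong (large bias, small variance);
# the overfitting model is right on average but swings widely (small bias, large variance).
```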
We can reduce variance by
putting many models together and aggregating their outcomes (without necessarily increasing bias)
-> this is the concept of ensemble methods
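A quick numerical illustration of why this works (the numbers below are made up for the demo): averaging many independent, unbiased but noisy predictions shrinks variance while leaving the average, and hence the bias, unchanged.

```python
# Sketch: averaging B independent noisy predictions reduces variance roughly by a factor of B.
import numpy as np

rng = np.random.default_rng(2)
true_value = 10.0
single = rng.normal(true_value, 3.0, size=(100_000,))                    # one noisy model
ensemble = rng.normal(true_value, 3.0, size=(100_000, 25)).mean(axis=1)  # 25 models averaged

print(f"single model:  mean {single.mean():.2f}, variance {single.var():.2f}")
print(f"25-model mean: mean {ensemble.mean():.2f}, variance {ensemble.var():.2f}")
# Both are centered on the true value; the aggregated estimate is far less variable.
```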
Ensemble methods use
multiple algorithms to obtain better predictive performance than could be obtained from any of the constituent algorithms alone
Using multiple algorithms usually increases model performance by:
reducing variance: models are less dependent on the specific training data
Bagging (or bootstrap aggregation)
Creates multiple data sets from the original training data by bootstrapping, i.e., re-sampling with replacement.
Runs several models and aggregates their outputs with a voting system
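A short sketch using scikit-learn (an assumption; the cards do not name a library): the same base model is trained on many bootstrap samples and the predictions are combined by majority vote.

```python
# Sketch: bagging decision trees vs a single tree on a synthetic classification task.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

single_tree = DecisionTreeClassifier(random_state=0)
bagged_trees = BaggingClassifier(n_estimators=50, random_state=0)  # bootstrap samples + voting

print("single tree :", cross_val_score(single_tree, X, y, cv=5).mean())
print("bagged trees:", cross_val_score(bagged_trees, X, y, cv=5).mean())
# The bagged ensemble is typically more accurate because averaging over bootstrap
# re-samples reduces the variance of the individual trees.
```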
Other ensemble methods
Random Forest
combines bagging with random selection of features (or predictors)
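A hedged scikit-learn sketch of the same idea (parameter values are illustrative): bagged decision trees where only a random subset of features is considered at each split.

```python
# Sketch: random forest = bagging + random feature selection at each split.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,      # number of bagged trees
    max_features="sqrt",   # random selection of features (predictors) at each split
    random_state=0,
)
print("random forest:", cross_val_score(forest, X, y, cv=5).mean())
```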
Other ensemble methods
Boosting
applies classifiers sequentially, assigning higher weights to observations that have been misclassified by the previous classifiers
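A sketch using AdaBoost, one boosting variant, via scikit-learn (library and settings are assumptions): weak classifiers are fitted one after another, with the weight of previously misclassified observations increased at each step.

```python
# Sketch: AdaBoost fits weak learners sequentially, reweighting misclassified observations.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

boosted = AdaBoostClassifier(n_estimators=100, random_state=0)  # default base: depth-1 trees
print("AdaBoost:", cross_val_score(boosted, X, y, cv=5).mean())
```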
We can use predictive models to predict outcomes based on individual attributes
However, models based only on observational data…
do not tell us how users would react to a specific intervention
In most cases the best way to find out the effect of a specific intervention is to run an experiment:
1. Randomly assign customers to different treatment groups
2. Compare differences in behavior among treatment groups
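A sketch of these two steps with simulated data (the outcome variable and the effect size below are made up for illustration): customers are assigned to groups at random, and the average behavior of the groups is compared.

```python
# Sketch: a simple randomized experiment (A/B test) on simulated customer data.
import numpy as np

rng = np.random.default_rng(3)
n_customers = 10_000

# 1. Randomly assign customers to treatment groups.
treated = rng.integers(0, 2, n_customers).astype(bool)

# Simulated outcome: the intervention raises spending by 2.0 on average (assumed).
spending = rng.normal(50, 10, n_customers) + 2.0 * treated

# 2. Compare differences in behavior among treatment groups.
effect = spending[treated].mean() - spending[~treated].mean()
print(f"estimated treatment effect: {effect:.2f}")
```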