Lecture 6: Deep Learning, CNNs Flashcards
What are the 3 major breakthroughs in deep learning?
1. Speech recognition & machine translation (2010+)
2. Image recognition & computer vision (2012+)
3. Natural language processing (2014+)
How to compute input to hidden?
1. Compute the net activation: net = x*W + b
x -> inputs
W -> weights
b -> bias weights
2. Apply the activation function: h = S(net)
How to compute hidden to output?
o=S(h*W + b)
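The two steps above (input to hidden, then hidden to output) can be sketched in plain Python. This is a minimal illustration, not the lecture's own code: it assumes S is the logistic sigmoid, and the network size and all weight values are made up for the example.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, assumed here as the activation function S."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """Compute S(inputs * W + b) for one fully connected layer.

    weights[i][j] connects input i to unit j; biases[j] is the bias of unit j.
    """
    outputs = []
    for j in range(len(biases)):
        # Net activation of unit j: weighted sum of the inputs plus the bias.
        net = sum(inputs[i] * weights[i][j] for i in range(len(inputs))) + biases[j]
        outputs.append(sigmoid(net))
    return outputs

# Hypothetical toy network: 2 inputs -> 2 hidden units -> 1 output.
x = [1.0, 0.0]
W_hidden = [[0.5, -0.5], [0.3, 0.8]]
b_hidden = [0.0, 0.0]
W_out = [[1.0], [-1.0]]
b_out = [0.0]

h = layer(x, W_hidden, b_hidden)  # input to hidden: h = S(x*W + b)
o = layer(h, W_out, b_out)        # hidden to output: o = S(h*W + b)
```

The same `layer` function serves both steps because only the inputs, weights, and biases differ between them.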
What are the three initial drawbacks?
1. Standard backpropagation with sigmoid activation does not scale well with multiple layers
2. Overfitting
3. Multilayered ANNs need lots of labeled data
What are the two types of problems when multiplying the gradients many times for each layer?
1. Vanishing gradient problem
2. Exploding gradient problem
What does the vanishing gradient problem consist of (4)?
- Gradients shrink exponentially with the number of layers -> weight updates get smaller -> weights of early layers change very slowly -> learning is very slow
What does the exploding gradient problem consist of (4)?
- Multiplying gradients makes them grow exponentially -> weight updates get larger and larger -> weights become so large that they overflow and result in NaN values
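Both problems come from repeated multiplication, which a few lines of Python can make concrete. This sketch is a simplification: it assumes every layer scales the gradient by the same factor. The value 0.25 is the maximum slope of the sigmoid; 1.5 is an arbitrary factor above 1 chosen for illustration.

```python
def gradient_scale(per_layer_factor, num_layers):
    """Scale of a gradient after being multiplied by the same factor once per layer."""
    scale = 1.0
    for _ in range(num_layers):
        scale *= per_layer_factor
    return scale

# Compare a shrinking factor (vanishing) with a growing factor (exploding).
for n in (5, 10, 20):
    print(n, gradient_scale(0.25, n), gradient_scale(1.5, n))
```

With a factor below 1 the scale falls toward zero as layers are added, and with a factor above 1 it grows without bound.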
What are the two solutions to initial drawback #1?
1. Use other activation functions
2. Apply gradient clipping: set bounds on the gradients
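Setting bounds on the gradients can be sketched as clamping each component to a fixed interval. This is one simple form of clipping (by value); the bound of 5.0 and the gradient values are hypothetical. Clipping by the norm of the whole gradient vector is a common alternative that is not shown here.

```python
def clip_gradients(gradients, bound):
    """Clamp every gradient component to the interval [-bound, bound]."""
    return [max(-bound, min(bound, g)) for g in gradients]

grads = [0.3, -12.0, 250.0]                 # made-up gradient values
clipped = clip_gradients(grads, bound=5.0)  # small values pass through unchanged
```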
What is overfitting (second initial drawback)?
- Large network -> lots of parameters -> increased capacity to memorize the training data ("learn by heart")
What are the 2 solutions to overfitting?
1. Regularization
2. Dropout
What does regularization consist of?
- Modify the error function being minimized so that it penalizes large weights
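One common way to penalize large weights is an L2 penalty, which adds the sum of squared weights to the error. The sketch below assumes that form; the cards do not say which penalty the lecture uses, and the coefficient `lam` and the example values are hypothetical.

```python
def regularized_error(error, weights, lam):
    """Return error + lam * sum(w^2), i.e. the error plus an L2 weight penalty."""
    penalty = sum(w * w for w in weights)
    return error + lam * penalty

# Same base error, different weight magnitudes: larger weights cost more.
small = regularized_error(1.0, [0.3, 0.4], lam=0.5)
large = regularized_error(1.0, [3.0, 4.0], lam=0.5)
```

Because the penalty grows with the square of each weight, minimizing the modified error pushes the network toward smaller weights.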
What does dropout consist of (2)?
- Keep a neuron active with some probability p, or set it to 0 otherwise
- Prevents the network from becoming too dependent on any one neuron
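The first point can be sketched as a random mask applied to a layer's activations during training. This is a minimal illustration with made-up activation values and a fixed seed. Many implementations also rescale the kept activations by 1/p ("inverted dropout"); that step is left out here because the cards do not mention it.

```python
import random

def dropout(activations, p, rng):
    """Keep each activation with probability p, otherwise set it to 0."""
    return [a if rng.random() < p else 0.0 for a in activations]

rng = random.Random(0)        # fixed seed so the sketch is repeatable
h = [0.2, 0.9, 0.5, 0.7]      # hypothetical hidden-layer activations
h_dropped = dropout(h, p=0.5, rng=rng)
```

Each call draws a fresh mask, so a different subset of neurons is silenced on every training step.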
What is the problem with the third initial drawback?
Most data is not labeled
What is the solution to the third initial drawback?
Pre-train the network with features found automatically from unlabeled data (unsupervised learning) -> automatic feature learning
What is Classic ML?
- Manual extraction of features