Lecture 6: Deep Learning, CNNs Flashcards
What are the 3 major breakthroughs in deep learning?
1. Speech recognition & machine translation (2010+)
2. Image recognition & computer vision (2012+)
3. Natural language processing (2014+)
How do you compute the input-to-hidden activation?
1. Compute the net activation: net_h = x*W + b
x → inputs
W → weights
b → bias weights
2. Apply the activation function: h = S(net_h)
How do you compute the hidden-to-output activation?
o = S(h*W + b), where W and b here are the hidden-to-output weights and bias
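A minimal NumPy sketch of the full forward pass from the last two cards; the layer sizes, random values, and the choice of sigmoid for S are illustrative assumptions, not from the lecture:

```python
import numpy as np

def S(z):
    # sigmoid activation: S(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))                           # inputs: one example, 4 features
W1 = rng.normal(size=(4, 3)); b1 = np.zeros((1, 3))   # input-to-hidden weights and bias
W2 = rng.normal(size=(3, 2)); b2 = np.zeros((1, 2))   # hidden-to-output weights and bias

net_h = x @ W1 + b1   # net activation: net_h = x*W + b
h = S(net_h)          # hidden activation: h = S(net_h)
o = S(h @ W2 + b2)    # output: o = S(h*W + b), with the output-layer W and b
print(h, o)
```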
What are the three initial drawbacks of training deep networks?
1. Standard backpropagation with sigmoid activations does not scale well with multiple layers
2. Overfitting
3. Multilayered ANNs need lots of labeled data
What are the two problems that arise when gradients are multiplied many times, once for each layer?
1. Vanishing gradient problem
2. Exploding gradient problem
What does the vanishing gradient problem consist of (4)?
- Gradients shrink exponentially with the number of layers → weight updates get smaller → weights of early layers change very slowly → learning is very slow
What does the exploding gradient problem consist of (4)?
- Multiplying gradients makes them grow exponentially → weight updates get larger and larger → weights become so large that they overflow and result in NaN values
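A quick numeric illustration of both problems; the per-layer gradient factors 0.5 and 1.5 are arbitrary assumptions:

```python
import numpy as np

layers = 50
small, large = 0.5, 1.5     # assumed per-layer gradient magnitudes

print(small ** layers)      # ~8.9e-16: vanishing -> early layers barely update
print(large ** layers)      # ~6.4e+08: exploding -> updates blow up

# pushed far enough, exploding values overflow the float type to inf
# (and then NaN once inf enters arithmetic such as inf - inf)
print(np.float32(10.0) ** np.float32(60))   # overflow -> inf
```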
What are the two solutions to initial drawback #1?
1. Use other activation functions
2. Do gradient clipping: set bounds on the gradients
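A minimal sketch of gradient clipping in NumPy; the bound and the gradient values are illustrative assumptions (frameworks also commonly clip by the gradient's global norm rather than element-wise):

```python
import numpy as np

def clip_gradients(grad, bound=5.0):
    # element-wise clipping: force every gradient into [-bound, bound]
    return np.clip(grad, -bound, bound)

g = np.array([0.3, -42.0, 7.5, -0.01])
print(clip_gradients(g))    # [ 0.3  -5.    5.   -0.01]
```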
What is overfitting (the second initial drawback)?
- Large network → lots of parameters → increased capacity to learn by heart
What are the 2 solutions to overfitting?
1. Regularization
2. Dropout
What does regularization consist of?
- Modify the error function that we minimize to penalize large weights
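L2 (weight-decay) regularization is one standard instance of this; a minimal sketch, assuming a squared-error loss and a penalty strength lam:

```python
import numpy as np

def regularized_error(y_true, y_pred, weights, lam=0.01):
    # the original error we minimize (squared error here, as an assumption)
    data_error = np.mean((y_true - y_pred) ** 2)
    # penalty term: large weights increase the total error, so minimizing
    # the sum pushes the network toward smaller weights
    penalty = lam * np.sum(weights ** 2)
    return data_error + penalty

w = np.array([0.1, -3.0, 0.5])
print(regularized_error(np.array([1.0]), np.array([0.8]), w))
```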
What does dropout consist of (2)?
- Keep a neuron active with some probability p, or set it to 0 otherwise
- Prevents the network from becoming too dependent on any one neuron
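A minimal sketch of (inverted) dropout at training time; the keep probability p and the layer values are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, p=0.8):
    # keep each neuron active with probability p, set it to 0 otherwise;
    # dividing by p ("inverted" dropout) keeps the expected activation unchanged
    mask = (rng.random(h.shape) < p).astype(h.dtype)
    return h * mask / p

h = np.ones((1, 10))
print(dropout(h))   # roughly 80% of units survive, scaled by 1/p
```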
What is the problem with the third initial drawback?
Most data is not labeled
What is the solution to the third initial drawback?
Pre-train the network with features found automatically from unlabeled data → automatic feature learning
What is Classic ML?
- Manual extraction of features
What does classic ML require?
- Labeled data and hand-crafted features
What are 3 cons of classic ML?
- Needs expert knowledge
- Time-consuming and expensive
- Does not generalize to other domains
What does automatic feature learning consist of?
Each layer learns more abstract features that are then combined/composed into higher-level features automatically
What are 3 pros of automatic feature learning?
- We feed the network the raw data
- The features are learned by the network
- Features learned can be re-used in similar tasks
What are 5 advantages of unsupervised feature learning?
- More unlabeled data is available than labeled data
- Humans learn first from unlabeled examples
- Less risk of overfitting
- No need for manual feature engineering
- Features are organized into multiple layers: each level creates new features from combinations of features in the level below, and these features are more abstract than the ones below (a hierarchy of features)
What are the 2 steps of the general architecture of a deep network?
1. Unsupervised pre-training of the neural network using unlabeled data, e.g., an autoencoder
2. Supervised training with labeled data, using the features learned above, with a standard classifier, e.g., an ANN
What are 2 ways to learn a representation of the data (1st step)?
- Deep Belief Networks (mid-2000s)
- Autoencoders (2006)
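To make the autoencoder idea concrete, here is a toy sketch in NumPy: it learns to reconstruct unlabeled inputs through a smaller hidden layer, and the hidden activations become the learned features used in the supervised step. The layer sizes, learning rate, and random data are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# unlabeled data: 200 examples with 8 features (random stand-in)
X = rng.random((200, 8))

# encoder (8 -> 4) and decoder (4 -> 8) parameters
W1 = rng.normal(scale=0.1, size=(8, 4)); b1 = np.zeros(4)
W2 = rng.normal(scale=0.1, size=(4, 8)); b2 = np.zeros(8)

lr = 0.5
for epoch in range(500):
    # forward: compress the input to a hidden code, then reconstruct it
    h = sigmoid(X @ W1 + b1)        # hidden code = learned features
    X_hat = sigmoid(h @ W2 + b2)    # reconstruction of the input

    # backpropagate the mean squared reconstruction error
    d_out = (X_hat - X) * X_hat * (1 - X_hat) / len(X)
    d_hid = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_hid;  b1 -= lr * d_hid.sum(axis=0)

print("reconstruction MSE:", np.mean((X - X_hat) ** 2))
# h (the hidden code) is the representation handed to step 2's classifier
```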
What is a CNN?
Convolutional Neural Network
What does the convolutional layer consist of?
- Uses a filter/kernel that convolves over the image
- The filter is a small weight matrix that is learned
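A minimal sketch of a 2D "valid" convolution in NumPy, to show what convolving a filter over an image computes (in CNN practice this is cross-correlation: the filter slides without being flipped; the image and kernel values are illustrative):

```python
import numpy as np

def conv2d(image, kernel):
    # slide the filter over the image; at each position, multiply the filter
    # element-wise with the patch underneath and sum the products
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(25.0).reshape(5, 5)           # toy 5x5 "image"
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])    # small weight matrix (learned in a CNN)
print(conv2d(image, kernel))                    # 4x4 feature map
```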