Lecture 6: Deep Learning, CNNs Flashcards
What are the 3 major breakthroughs in deep learning?
1.Speech Recognition & machine translation (2010+)
2.Image Recognition & computer vision(2012+)
3.Natural language processing (2014+)
How to compute input to hidden?
1.compute the net activation: net_h = x*W + b
x–>inputs
W–>weights
b–>bias weights
2.apply the activation function: h = S(net_h)
How to compute hidden to output?
o = S(h*W + b), using a separate weight matrix W and bias b for the hidden-to-output layer
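The two steps above can be sketched in numpy. This is a minimal illustration, assuming a sigmoid activation S and hypothetical layer sizes (3 inputs, 4 hidden units, 2 outputs); the weights are random stand-ins for learned values.

```python
import numpy as np

def S(z):                                # sigmoid activation function
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical sizes: 3 inputs, 4 hidden units, 2 outputs
rng = np.random.default_rng(0)
x  = rng.standard_normal(3)              # inputs
W1 = rng.standard_normal((3, 4))         # input-to-hidden weights
b1 = np.zeros(4)                         # hidden bias weights
W2 = rng.standard_normal((4, 2))         # hidden-to-output weights
b2 = np.zeros(2)

net_h = x @ W1 + b1                      # net activation of the hidden layer
h = S(net_h)                             # hidden activations
o = S(h @ W2 + b2)                       # output activations
```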
What are the three initial drawbacks?
1.Standard backpropagation with sigmoid activation does not scale well to multiple layers
2.Overfitting
3.Multilayered ANNs need lots of labeled data
What are the two types of problems when multiplying the gradients many times for each layer?
1.Vanishing gradient problem
2.Exploding gradient problem
What does the vanishing gradient problem consist of(4)?
-gradients shrink exponentially with nb of layers–>weight updates get smaller–>weights of early layers change very slowly –> learning very slow
What does the exploding gradient problem consist of(3)?
-multiplying gradients makes them grow exponentially–>weight updates get larger and larger–>weights become so large as to overflow and result in NaN values
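The shrinkage in the vanishing case can be seen numerically: the sigmoid derivative S'(z) = S(z)(1 - S(z)) is at most 0.25, and backpropagation multiplies one such factor per layer. A minimal sketch, assuming a 20-layer network evaluated at the point of maximum slope:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = 0.0                                  # point where the sigmoid is steepest
deriv = sigmoid(z) * (1 - sigmoid(z))    # 0.25, the maximum possible value

# Backprop multiplies one derivative factor per layer,
# so the gradient reaching early layers shrinks exponentially.
grad = 1.0
for layer in range(20):                  # a 20-layer network
    grad *= deriv
print(grad)                              # 0.25**20 ~ 9.1e-13: effectively zero
```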
What are the two solutions to initial drawback #1?
1.Use other activation functions
2.Do gradient clipping: set bounds on the gradients
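Gradient clipping can be sketched as follows; this version clips by L2 norm, one common way of bounding the gradients (clipping each component to a fixed range is another).

```python
import numpy as np

def clip_gradient(grad, max_norm=5.0):
    """Rescale grad so its L2 norm never exceeds max_norm."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)  # shrink, keeping the direction
    return grad

g = np.array([300.0, -400.0])            # an exploding gradient, norm = 500
print(np.linalg.norm(clip_gradient(g)))  # 5.0: bounded, direction preserved
```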
What is overfitting(second initial drawback)?
-Large network–> lots of parameters–>increased capacity to learn by heart
What are the 2 solutions to overfitting?
1.Regularization
2.Dropout
What does regularization consist of?
-modify the error function that we minimize to penalize large weights
What does dropout consist of(2)?
-keep a neuron active with some probability p, or set it to 0 otherwise
-prevents the network from becoming too dependent on any one neuron
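A sketch of dropout on a layer of activations. Assumption: this uses the "inverted dropout" variant, which rescales the survivors by 1/p so expected activations are unchanged at test time; the lecture's exact formulation may differ.

```python
import numpy as np

def dropout(h, p=0.5, rng=np.random.default_rng(0)):
    """Keep each activation with probability p, zero it otherwise.
    Scaling by 1/p (inverted dropout, an assumption here) keeps the
    expected value of each activation unchanged."""
    mask = rng.random(h.shape) < p       # True with probability p
    return h * mask / p

h = np.ones(10)
print(dropout(h))  # roughly half the entries zeroed, survivors scaled up
```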
What is the problem with the third initial drawback?
Most data is not labeled
What is the solution to the third initial drawback?
Pre-train the network with features found automatically via unsupervised learning on unlabeled data –> automatic feature learning
What is Classic ML?
-Manual extraction of features
What does classic ML require?
-Labeled data and hand-crafted features
What are 3 cons of classic ML?
-Needs expert knowledge
-Time-consuming and expensive
-Does not generalize to other domains
What does automatic feature learning consist of?
Each layer learns more abstract features that are then combined/composed into higher-level features automatically
What are 3 pros of automatic feature learning?
-We feed the network the raw data
-The features are learned by the network
-Features learned can be re-used in similar tasks
What are 5 advantages of unsupervised feature learning?
-more unlabeled data available than labeled data
-Humans learn first from unlabeled examples
-less risk of over-fitting
-no need for manual feature engineering
-features are organized into multiple layers : each level creates new features from combinations of features from level below + more abstract than the ones below (hierarchy of features)
What are the 2 steps of the general architectures of a deep network?
1.Unsupervised pre-training of the neural network using unlabeled data, e.g. an autoencoder
2.Supervised training with labeled data using the features learned above with a standard classifier, e.g. an ANN
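The two-step recipe above can be sketched with a toy linear autoencoder in numpy. Everything here is a hypothetical illustration (random data, a linear encoder/decoder, plain gradient descent); real pre-training would use nonlinear autoencoders or deep belief networks as the flashcards note.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 8))       # unlabeled data: 100 samples, 8 features

# --- Step 1: unsupervised pre-training (linear autoencoder, 8 -> 3 -> 8) ---
W_enc = rng.standard_normal((8, 3)) * 0.1   # encoder weights: learned features
W_dec = rng.standard_normal((3, 8)) * 0.1   # decoder weights: reconstruction
lr = 0.01
for _ in range(500):
    H = X @ W_enc                        # encode
    X_hat = H @ W_dec                    # decode: try to reconstruct the input
    err = X_hat - X                      # reconstruction error
    W_dec -= lr * H.T @ err / len(X)     # gradient step on the decoder
    W_enc -= lr * X.T @ (err @ W_dec.T) / len(X)  # gradient step on the encoder

# --- Step 2: reuse the learned encoder as a feature extractor ---
features = X @ W_enc                     # 3 learned features per sample
# ...feed `features` plus labels to a standard supervised classifier (e.g. an ANN)
```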
What are 2 ways to learn a representation of the data(1st step)?
-Deep Belief Networks (mid 2000s)
-Autoencoders(2006)
What is a CNN?
Convolutional Neural Network
What does the convolutional layer consist of?
-Uses a filter/kernel that convolves on the image
-The filter is a small weight matrix to learn
What is the objective of the convolutional layer?
The network learns the values of the filter(s) that activate when they see some visual feature that is useful to identify the object(final classification)
What are the 2 convolution hyper-parameters?
1.Stride
2.Padding
What is stride?
The number of pixels the filter moves at each step
What is padding?
Adding extra values (usually zeros) around the border of the image, so the filter can also be applied at the edges and the size of the output can be controlled
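Both hyper-parameters can be seen in a minimal numpy sketch of a 2D convolution. As in most deep learning libraries, this is technically a cross-correlation (the filter is not flipped); image size, filter, stride, and padding values are hypothetical.

```python
import numpy as np

def conv2d(image, kernel, stride=1, pad=0):
    """2D convolution with a stride and zero padding."""
    image = np.pad(image, pad)                   # zeros around the border
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1     # output height
    ow = (image.shape[1] - kw) // stride + 1     # output width
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)   # elementwise product, then sum
    return out

img = np.arange(16, dtype=float).reshape(4, 4)
k = np.ones((3, 3))
print(conv2d(img, k).shape)                   # (2, 2): no padding, stride 1
print(conv2d(img, k, stride=1, pad=1).shape)  # (4, 4): padding preserves size
```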
What are pooling layers used for(2)?
-to reduce the size of the activation maps
-so that we reduce the nb of parameters of the network and avoid overfitting
What is max pooling?
Similar to the convolution step, but instead of an elementwise multiply-and-sum with a filter, max pooling takes the maximum value within the window.
What is average pooling?
Taking the average value over an input window for each channel of the input.
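Both pooling variants can be sketched together. This assumes non-overlapping windows (stride equal to window size, a common default) on a single-channel input.

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    """Non-overlapping pooling: size x size windows, stride = size."""
    h, w = x.shape[0] // size, x.shape[1] // size
    windows = x[:h*size, :w*size].reshape(h, size, w, size)
    if mode == "max":
        return windows.max(axis=(1, 3))          # max pooling
    return windows.mean(axis=(1, 3))             # average pooling

a = np.array([[1., 2., 5., 6.],
              [3., 4., 7., 8.],
              [0., 0., 1., 1.],
              [0., 4., 1., 1.]])
print(pool2d(a, mode="max"))   # [[4. 8.] [4. 1.]]
print(pool2d(a, mode="avg"))   # [[2.5 6.5] [1.  1. ]]
```

Note how a 4x4 activation map shrinks to 2x2, which is exactly the parameter reduction the flashcards mention.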
What is the architecture of a CNN?
1.Stack:
-convolutional layers
-pooling layers
2.Finish off with a fully connected layer at the end for final classification
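The whole stack can be put together in a toy forward pass. Everything here is a hypothetical sketch (random image, one random filter, ReLU after the convolution, a single fully connected layer); a real CNN would learn the filter and FC weights by backpropagation.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def conv2d(image, kernel):                       # valid convolution, stride 1
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(x, s=2):                            # non-overlapping max pooling
    h, w = x.shape[0] // s, x.shape[1] // s
    return x[:h*s, :w*s].reshape(h, s, w, s).max(axis=(1, 3))

rng = np.random.default_rng(0)
img = rng.standard_normal((10, 10))              # toy grayscale image
kernel = rng.standard_normal((3, 3))             # a filter (random stand-in)
W_fc = rng.standard_normal((16, 2))              # fully connected: 16 -> 2 classes
b_fc = np.zeros(2)

# Stack: convolutional layer -> pooling layer -> fully connected classifier
fmap = relu(conv2d(img, kernel))                 # 10x10 -> 8x8 activation map
pooled = max_pool(fmap)                          # 8x8 -> 4x4
logits = pooled.reshape(-1) @ W_fc + b_fc        # flatten 16 values -> 2 scores
pred = np.argmax(logits)                         # final classification
```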
What are 2 examples of successful CNN networks?
LeNet:
-first successful applications of CNNs
-1990s
-used to read zip codes
AlexNet:
-first work that made CNNs popular for computer vision
History of AI…
Artificial Intelligence(1950s- 1980s)–> rules written by experts:(
Machine Learning(1980s-2010s)–>rules learned from the data BUT features identified by experts
Deep Learning(2010s-)–>rules AND features learned from the data