Lecture 6: Deep Learning, CNNs Flashcards

1
Q

What are the 3 major breakthroughs in deep learning?

A
1. Speech recognition & machine translation (2010+)

2. Image recognition & computer vision (2012+)

3. Natural language processing (2014+)

2
Q

How do you compute the input-to-hidden step?

A
1. Compute the net activation: net_h = x*W + b
   x → inputs
   W → weights
   b → bias weights
2. Apply the activation function: h = S(net_h)
3
Q

How do you compute the hidden-to-output step?

A

o = S(h*W + b)
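
Together, these two cards describe the full forward pass. A minimal NumPy sketch, assuming a sigmoid for the activation S; the layer sizes and random weights below are illustrative placeholders, not values from the lecture:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    x = rng.normal(size=(1, 4))                     # one input with 4 features
    W1, b1 = rng.normal(size=(4, 3)), np.zeros(3)   # input -> hidden
    W2, b2 = rng.normal(size=(3, 2)), np.zeros(2)   # hidden -> output

    net_h = x @ W1 + b1        # net activation of the hidden layer
    h = sigmoid(net_h)         # hidden activations: h = S(net_h)
    o = sigmoid(h @ W2 + b2)   # output: o = S(h*W + b)
    print(o)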

4
Q

What are the three initial drawbacks of deep multilayered networks?

A

1. Standard backpropagation with sigmoid activation does not scale well with multiple layers
2. Overfitting
3. Multilayered ANNs need lots of labeled data

5
Q

What are the two types of problems that arise when multiplying the gradients many times, once for each layer?

A
1. Vanishing gradient problem
2. Exploding gradient problem
6
Q

What does the vanishing gradient problem consist of (4)?

A

-Gradients shrink exponentially with the number of layers → weight updates get smaller → weights of early layers change very slowly → learning is very slow
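
A quick numeric illustration: the sigmoid derivative S'(z) = S(z)(1 - S(z)) never exceeds 0.25, so even in the best case the gradient shrinks by at least a factor of 4 per layer (the 20-layer depth below is an arbitrary choice):

    # Chaining the per-layer factor (at most 0.25 for sigmoid) across
    # 20 layers shrinks the gradient exponentially.
    grad = 1.0
    for layer in range(20):
        grad *= 0.25          # best-case sigmoid derivative per layer
    print(grad)               # ~9.1e-13: early layers barely update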

7
Q

What does the exploding gradient problem consist of (4)?

A

-Multiplying gradients makes them grow exponentially → weight updates get larger and larger → weights become so large that they overflow and result in NaN values
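
The mirror-image illustration; the per-layer factor of 3.0 is an arbitrary assumption, but any factor above 1 behaves this way:

    import numpy as np

    # Per-layer gradient factors above 1 make the product grow
    # exponentially until float32 overflows.
    grad = np.float32(1.0)
    for layer in range(200):
        grad *= np.float32(3.0)
    print(grad)               # inf: the float32 range has overflowed
    print(grad - grad)        # nan: arithmetic on inf produces NaN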

8
Q

What are the two solutions to initial drawback #1?

A

1. Use other activation functions
2. Do gradient clipping: set bounds on the gradients
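
A sketch of both fixes; the choice of ReLU and the clipping threshold of 5.0 are assumptions, since the card names neither:

    import numpy as np

    def relu(z):
        # Derivative is 1 for z > 0, so deep chains of layers do not
        # squash the gradient the way sigmoid does
        return np.maximum(0.0, z)

    def clip_gradient(grad, max_norm=5.0):
        # Gradient clipping: rescale the gradient if its norm exceeds
        # the bound, keeping its direction
        norm = np.linalg.norm(grad)
        if norm > max_norm:
            grad = grad * (max_norm / norm)
        return grad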

9
Q

What is overfitting (second initial drawback)?

A

-Large network → lots of parameters → increased capacity to learn the training data by heart (memorization instead of generalization)

10
Q

What are the 2 solutions to overfitting?

A

1. Regularization
2. Dropout

11
Q

What does regularization consist of?

A

-modify the error function that we minimize to penalize large weights
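
For example, L2 regularization (the specific variant is an assumption; the card only says to penalize large weights) adds the sum of squared weights, scaled by a small constant lam, to the error being minimized:

    import numpy as np

    def l2_regularized_error(base_error, weights, lam=1e-4):
        # Large weights inflate the penalty, so minimizing the total
        # error pushes the network toward smaller weights
        return base_error + lam * np.sum(weights ** 2)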

12
Q

What does dropout consist of (2)?

A

-Keep a neuron active with some probability p, or set it to 0 otherwise
-Prevents the network from becoming too dependent on any one neuron
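
A minimal sketch, assuming the common "inverted" dropout variant; the 1/p rescaling (which keeps activations at the same expected scale) is an implementation detail not stated on the card:

    import numpy as np

    def dropout(h, p=0.5, training=True):
        if not training:
            return h                      # use every neuron at test time
        mask = (np.random.rand(*h.shape) < p).astype(h.dtype)
        return h * mask / p               # keep with probability p, rescale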

13
Q

What is the problem with the third initial drawback?

A

Most data is not labeled

14
Q

What is the solution to the third initial drawback?

A

Pre-train the network with features found automatically from unlabeled data → automatic feature learning

15
Q

What is Classic ML?

A

-Manual extraction of features

16
Q

What does classic ML require?

A

-Labeled data and hand-crafted features

17
Q

What are 3 cons of classic ML?

A

-Needs expert knowledge
-Time-consuming and expensive
-Does not generalize to other domains

18
Q

What does automatic feature learning consist of?

A

Each layer learns more abstract features that are then combined/composed into higher-level features automatically

19
Q

What are 3 pros of automatic feature learning?

A

-We feed the network the raw data
-The features are learned by the network
-Features learned can be re-used in similar tasks

20
Q

What are 5 advantages of unsupervised feature learning?

A

-More unlabeled data is available than labeled data
-Humans learn first from unlabeled examples
-Less risk of overfitting
-No need for manual feature engineering
-Features are organized into multiple layers: each level creates new features by combining features from the level below, and these are more abstract than the ones below (a hierarchy of features)

21
Q

What are the 2 steps of the general architectures of a deep network?

A

1. Unsupervised pre-training of the neural network using unlabeled data (e.g. an autoencoder)
2. Supervised training with labeled data, using the features learned above, with a standard classifier (e.g. an ANN)
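
A minimal sketch of step 1, assuming a one-hidden-layer autoencoder; the sigmoid activations and squared-error objective are illustrative assumptions:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def autoencoder_forward(x, W_enc, b_enc, W_dec, b_dec):
        h = sigmoid(x @ W_enc + b_enc)       # encoder: learned features
        x_hat = sigmoid(h @ W_dec + b_dec)   # decoder: reconstruction
        return h, x_hat

    # Training minimizes reconstruction error, e.g. mean((x - x_hat)**2),
    # with no labels needed; afterwards h feeds the supervised classifier.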

22
Q

What are 2 ways to learn a representation of the data (1st step)?

A

-Deep Belief Networks (mid 2000s)
-Autoencoders (2006)

23
Q

What is a CNN?

A

Convolutional Neural Network

24
Q

What does the convolutional layer consist of?

A

-Uses a filter/kernel that convolves over the image
-The filter is a small matrix of weights to be learned
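
A naive sketch of the operation; as in most CNN libraries, this computes cross-correlation (the kernel is not flipped), and the stride parameter anticipates the next cards:

    import numpy as np

    def conv2d(image, kernel, stride=1):
        kh, kw = kernel.shape
        ih, iw = image.shape
        oh = (ih - kh) // stride + 1        # output height
        ow = (iw - kw) // stride + 1        # output width
        out = np.zeros((oh, ow))
        for i in range(oh):
            for j in range(ow):
                # Elementwise product of the filter with the patch, summed
                patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
                out[i, j] = np.sum(patch * kernel)
        return out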

25
Q

What is the objective of the convolutional layer?

A

The network learns the values of the filter(s) that activate when they see some visual feature that is useful to identify the object (final classification)

26
Q

What are the 2 convolution hyper-parameters?

A

1. Stride
2. Padding

27
Q

What is stride?

A

How many steps you move the filter each time it slides over the input

28
Q

What is padding?

A

Adding a border of values (usually zeros) around the input so the filter can also be applied at the edge pixels; this way the filter can still pick up high values surrounded by low values at the border
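
A small illustration; the pad width of 1 and the zeros are assumptions (zero padding is the usual default):

    import numpy as np

    x = np.arange(9).reshape(3, 3)
    # Surround the input with a one-pixel border of zeros so a 3x3
    # filter can also be centered on the edge pixels
    padded = np.pad(x, pad_width=1, mode="constant", constant_values=0)
    print(padded.shape)       # (5, 5)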

29
Q

What are pooling layers used for (2)?

A

-To reduce the size of the activation maps
-So that we reduce the number of parameters of the network and avoid overfitting

30
Q

What is max pooling?

A

Similar to the convolution step, but instead of performing a matrix multiplication, max pooling takes the maximum value within the window.

31
Q

What is average pooling?

A

Taking the average value over an input window for each channel of the input.
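
A sketch covering both pooling variants; the 2x2 window with stride 2 is an assumed (though typical) configuration:

    import numpy as np

    def pool2d(x, size=2, stride=2, mode="max"):
        h, w = x.shape
        oh = (h - size) // stride + 1
        ow = (w - size) // stride + 1
        out = np.zeros((oh, ow))
        for i in range(oh):
            for j in range(ow):
                window = x[i*stride:i*stride+size, j*stride:j*stride+size]
                # Reduce each window to a single value: its max or its mean
                out[i, j] = window.max() if mode == "max" else window.mean()
        return out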

32
Q

What is the architecture of a CNN?

A

1. A stack of:
-convolutional layers
-pooling layers
2. Finish off with a fully connected layer at the end for the final classification
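
A minimal sketch of that stack; the use of Keras, the layer counts, and the sizes are all assumptions for illustration (the lecture names no library):

    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        # Stack: convolutional layers interleaved with pooling layers
        layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        # Finish off with a fully connected layer for the final classification
        layers.Flatten(),
        layers.Dense(10, activation="softmax"),
    ])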

33
Q

What are 2 examples of successful CNN networks?

A

LeNet:
-first successful applications of CNNs
-1990s
-used to read zip codes

AlexNet:
-first work that made CNNs popular for computer vision

34
Q

History of AI...

A

Artificial Intelligence (1950s-1980s) → rules written by experts :(
Machine Learning (1980s-2010s) → rules learned from the data, BUT features identified by experts
Deep Learning (2010s-) → rules AND features learned from the data