Lecture 6: Deep Learning, CNNs Flashcards

1
Q

What are the 3 major breakthroughs in deep learning?

A
1. Speech Recognition & machine translation (2010+)
2. Image Recognition & computer vision (2012+)
3. Natural language processing (2014+)

2
Q

How do you compute the input-to-hidden activation?

A
1. Compute the net activation: net_h = x*W + b
   - x → inputs
   - W → weights
   - b → bias weights
2. Apply the activation function: h = S(net_h)
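
A minimal NumPy sketch of these two steps (the sizes, values, and the choice of sigmoid for S are illustrative assumptions, not the lecture's):

```python
import numpy as np

def sigmoid(z):
    # S: squashes the net activation into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])        # inputs (3 features)
W = np.random.randn(3, 4) * 0.1       # weights (3 inputs -> 4 hidden units)
b = np.zeros(4)                       # bias weights

net_h = x @ W + b                     # 1. net activation: net_h = x*W + b
h = sigmoid(net_h)                    # 2. activation function: h = S(net_h)
```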
3
Q

How do you compute the hidden-to-output activation?

A

o = S(h*W + b)
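
Continuing the sketch from the previous card: a hypothetical hidden-to-output weight matrix W_out and bias b_out (here 4 hidden units → 2 outputs) complete the forward pass:

```python
W_out = np.random.randn(4, 2) * 0.1   # hidden-to-output weights
b_out = np.zeros(2)                   # hidden-to-output bias weights

o = sigmoid(h @ W_out + b_out)        # o = S(h*W + b)
```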

4
Q

What are the three initial drawbacks of deep neural networks?

A

1. Standard backpropagation with sigmoid activations does not scale well to multiple layers
2. Overfitting
3. Multilayered ANNs need lots of labeled data

5
Q

What are the two types of problems when multiplying the gradients many times for each layer?

A
1. Vanishing gradient problem
2. Exploding gradient problem
6
Q

What does the vanishing gradient problem consist of (4)?

A

- Gradients shrink exponentially with the number of layers → weight updates get smaller → weights of early layers change very slowly → learning is very slow

7
Q

What does the exploding gradient problem consist of (4)?

A

- Multiplying the gradients makes them grow exponentially → weight updates get larger and larger → weights become so large that they overflow → resulting in NaN values
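
A tiny numeric illustration of both problems (this card and the previous one): repeatedly multiplying per-layer gradient factors that are below or above 1. The factors 0.25 and 4.0 are made-up examples (0.25 happens to be the maximum of the sigmoid derivative):

```python
layers = 20
print(0.25 ** layers)  # ~9.1e-13: vanishing -- early-layer updates become negligible
print(4.0 ** layers)   # ~1.1e+12: exploding -- updates blow up toward overflow/NaN
```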

8
Q

What are the two solutions to initial drawback #1?

A

1. Use other activation functions
2. Do gradient clipping: set bounds on the gradients (see the sketch below)
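
A minimal NumPy sketch of solution 2, gradient clipping; the bound of 5.0 is an arbitrary example value:

```python
import numpy as np

def clip_gradient(grad, bound=5.0):
    # Set bounds on the gradients, element-wise
    return np.clip(grad, -bound, bound)

g = np.array([-1e9, 0.3, 7.2])
print(clip_gradient(g))  # [-5.   0.3  5. ]
```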

9
Q

What is overfitting (the second initial drawback)?

A

- Large network → lots of parameters → increased capacity to learn the training data by heart

10
Q

What are the 2 solutions to overfitting?

A

1. Regularization
2. Dropout

11
Q

What does regularization consist of?

A

- Modify the error function that we minimize to penalize large weights
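
A sketch of one common form of this idea, L2 regularization, assuming a squared-error loss; the name `lam` (the penalty strength) is a placeholder:

```python
import numpy as np

def regularized_error(y_true, y_pred, W, lam=0.01):
    data_error = np.mean((y_true - y_pred) ** 2)  # the original error function
    penalty = lam * np.sum(W ** 2)                # penalizes large weights
    return data_error + penalty
```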

12
Q

What does dropout consist of (2)?

A

- Keep a neuron active with some probability p, or set it to 0 otherwise
- Prevents the network from becoming too dependent on any one neuron
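
A minimal NumPy sketch of (inverted) dropout; scaling the kept activations by 1/p so their expected value is unchanged is a common convention, not necessarily the lecture's exact formulation:

```python
import numpy as np

def dropout(h, p=0.8):
    # Keep each neuron active with probability p, set it to 0 otherwise
    mask = (np.random.rand(*h.shape) < p) / p
    return h * mask
```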

13
Q

What is the problem with the third initial drawback?

A

Most data is not labeled

14
Q

What is the solution to the third initial drawback?

A

Pre-train the network with features found automatically from unlabeled data (unsupervised learning) → automatic feature learning

15
Q

What is Classic ML?

A

-Manual extraction of features

16
Q

What does classic ML require?

A

-Labeled data and hand-crafted features

17
Q

What are 3 cons of classic ML?

A

- Needs expert knowledge
- Time-consuming and expensive
- Does not generalize to other domains

18
Q

What does automatic feature learning consist of?

A

Each layer learns more abstract features that are then combined/composed into higher-level features automatically

19
Q

What are 3 pros of automatic feature learning?

A

-We feed the network the raw data
-The features are learned by the network
-Features learned can be re-used in similar tasks

20
Q

What are 5 advantages of unsupervised feature learning?

A

- More unlabeled data is available than labeled data
- Humans learn first from unlabeled examples
- Less risk of over-fitting
- No need for manual feature engineering
- Features are organized into multiple layers: each level creates new features from combinations of the features in the level below, and these features are more abstract than the ones below (a hierarchy of features)

21
Q

What are the 2 steps of the general architectures of a deep network?

A

1. Unsupervised pre-training of the neural network using unlabeled data, e.g., an autoencoder (see the sketch below)
2. Supervised training with labeled data, using the features learned above, with a standard classifier, e.g., an ANN
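
A hedged tf.keras sketch of this two-step recipe; the layer sizes, data names (X_unlabeled, X_labeled, y), and training settings are placeholders, not the lecture's:

```python
import tensorflow as tf

# Step 1: unsupervised pre-training of an autoencoder on unlabeled data
encoder = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="sigmoid", input_shape=(784,)),
])
autoencoder = tf.keras.Sequential([
    encoder,
    tf.keras.layers.Dense(784, activation="sigmoid"),  # decoder reconstructs the input
])
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(X_unlabeled, X_unlabeled, epochs=10)

# Step 2: supervised training with labeled data, reusing the features learned above
classifier = tf.keras.Sequential([
    encoder,                                       # features from step 1
    tf.keras.layers.Dense(10, activation="softmax"),
])
classifier.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# classifier.fit(X_labeled, y, epochs=10)
```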

22
Q

What are 2 ways to learn a representation of the data (1st step)?

A

- Deep Belief Networks (mid 2000s)
- Autoencoders (2006)

23
Q

What is a CNN?

A

Convolutional Neural Network

24
Q

What does the convolutional layer consist of?

A

- Uses a filter/kernel that convolves over the image
- The filter is a small weight matrix to learn

25
Q

What is the objective of the convolutional layer?

A

The network learns the values of the filter(s) that activate when they see some visual feature that is useful to identify the object (final classification)

26
Q

What are the 2 convolution hyper-parameters?

A

1. Stride
2. Padding

27
Q

What is stride?

A

The number of steps (pixels) the filter moves each time it slides over the input

28
Q

What is padding?

A

Adding a border of low values (typically zeros) around the input so the filter can be applied at the edges; the filter should then pick up high values surrounded by low values
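
A minimal NumPy sketch tying the last few cards together: a filter convolving over an image, with stride and zero padding as hyper-parameters (all shapes and values are illustrative):

```python
import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    # Zero-pad the border so the filter can also be applied at the edges
    image = np.pad(image, padding)
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the window by the filter element-wise and sum
            window = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(window * kernel)
    return out

image = np.random.rand(5, 5)
kernel = np.random.randn(3, 3)  # the small weight matrix the network learns
print(conv2d(image, kernel, stride=1, padding=1).shape)  # (5, 5)
```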

29
Q

What are pooling layers used for (2)?

A

- To reduce the size of the activation maps
- So that we reduce the number of parameters of the network and avoid overfitting

30
Q

What is max pooling?

A

Similar to the convolution step, but instead of performing an element-wise multiplication and sum with a filter, max pooling takes the maximum value within the window.

31
Q

What is average pooling?

A

Taking the average value over an input window for each channel of the input.
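
A NumPy sketch of both pooling operations (this card and the previous one) over non-overlapping windows; the 2x2 window is a typical but assumed choice:

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    out_h, out_w = x.shape[0] // size, x.shape[1] // size
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = x[i*size:(i+1)*size, j*size:(j+1)*size]
            # Max pooling keeps the maximum in the window; average pooling takes the mean
            out[i, j] = window.max() if mode == "max" else window.mean()
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d(x, mode="max"))  # [[ 5.  7.] [13. 15.]]
print(pool2d(x, mode="avg"))  # [[ 2.5  4.5] [10.5 12.5]]
```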

32
Q

What is the architecture of a CNN?

A

1. Stack:
   - convolutional layers
   - pooling layers
2. Finish off with a fully connected layer at the end for the final classification (see the sketch below)
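
A sketch of this stack in tf.keras; the filter counts, kernel sizes, and input shape are illustrative assumptions:

```python
import tensorflow as tf

cnn = tf.keras.Sequential([
    # Stack of convolutional and pooling layers
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    # Fully connected layer at the end for the final classification
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
```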

33
Q

What are 2 examples of successful CNN networks?

A

LeNet:
- One of the first successful applications of CNNs
- 1990s
- Used to read zip codes

AlexNet:
- First work that made CNNs popular for computer vision (2012)

34
Q

History of AI…

A

Artificial Intelligence (1950s-1980s) → rules written by experts :(

Machine Learning (1980s-2010s) → rules learned from the data, BUT features identified by experts

Deep Learning (2010s-) → rules AND features learned from the data