DL CNNs & RNNs Flashcards

1
Q

How to calculate number of parameters (weights and biases) in a CNN?

A

((filter width * filter height * number of channels in the previous layer) + 1 for the bias) * number of new filters
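
A quick worked check of the formula, using hypothetical layer sizes (3x3 filters, 64 input channels, 128 new filters):

  # 3x3 filters over a 64-channel input, producing 128 new filters
  filter_w, filter_h = 3, 3
  in_channels = 64          # channels/filters in the previous layer
  num_filters = 128         # number of new filters
  params = (filter_w * filter_h * in_channels + 1) * num_filters  # +1 is the bias per filter
  print(params)  # (3*3*64 + 1) * 128 = 73856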

2
Q

What is dropout?

A

Randomly setting a fraction of input units to 0 at each update during training time - helps prevent overfitting

3
Q

How do we combat exploding and vanishing gradients?

A
  1. Normalization of inputs + careful initialization of weights
  2. Regularization
  3. Gradient clipping (see the sketch below)
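
A minimal sketch of gradient clipping by global norm (the max_norm value and helper name are arbitrary choices here, not from any particular library):

  import numpy as np

  def clip_by_global_norm(grads, max_norm=5.0):
      """Rescale a list of gradient arrays so their combined L2 norm is at most max_norm."""
      total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
      scale = min(1.0, max_norm / (total_norm + 1e-12))
      return [g * scale for g in grads]
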
4
Q

Tips for Choosing Initial weights

A
  1. Never set all weights to zero
  2. Try small random values, e.g. between -0.2 and 0.2, or scale by the layer's fan-in (see the sketch below)
  3. Biases are often initialized to 0.01 or a similar small value
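
A sketch of one common scheme that follows these tips (1/sqrt(fan_in) scaling; the function name and constants are illustrative assumptions):

  import numpy as np

  rng = np.random.default_rng(0)

  def init_layer(fan_in, fan_out):
      """Small random weights scaled by 1/sqrt(fan_in); biases set to a small constant."""
      W = rng.normal(0.0, 1.0 / np.sqrt(fan_in), size=(fan_in, fan_out))  # never all zeros
      b = np.full(fan_out, 0.01)
      return W, b
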
5
Q

What is regularization?

A

“Regularization is any modification we make to a
learning algorithm that is intended to reduce its
generalization error but not its training error.”
Goodfellow et al, 2016

6
Q

Regularization methods for regression

A
  1. Lasso - L1 penalty, encourages sparseness in weight matrices
  2. Ridge - L2 penalty (weight decay/parameter shrinkage)
  3. Elastic net - combines lasso and ridge (see the sketch below)
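
A sketch of how the three penalties relate (l1 and l2 are the regularization strengths; setting one of them to zero recovers lasso or ridge):

  import numpy as np

  def elastic_net_penalty(w, l1=0.0, l2=0.0):
      """L1 term encourages sparse weights; L2 term shrinks them (weight decay)."""
      return l1 * np.sum(np.abs(w)) + l2 * np.sum(w ** 2)
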
7
Q

Inverted dropout

A

During training, randomly drop out units according to a dropout probability at each update, then scale the surviving activations up by 1/(keep probability) so that no rescaling is needed at test time
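
A minimal sketch of inverted dropout for one layer's activations (keep_prob = 0.8 is an arbitrary example value):

  import numpy as np

  rng = np.random.default_rng(0)

  def inverted_dropout(a, keep_prob=0.8):
      """Zero out units with probability 1 - keep_prob, then scale the survivors
      by 1/keep_prob so the expected activation is unchanged at test time."""
      mask = rng.random(a.shape) < keep_prob
      return a * mask / keep_prob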

8
Q

How does dropout work?

A

It spreads out the weights - the network cannot rely on any one input too much

9
Q

Disadv. of dropout

A

Introduces another hyperparameter - dropout probability - often one for each layer

10
Q

What’s another type of regularization apart from inverted dropout?

A
  • Dataset augmentation
  • Synthesize new examples by flipping, rotating, cropping, and distorting existing ones (see the sketch below)
  • Makes the model more robust to these variations
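
A sketch of simple augmentation with flips and random crops (the 4-pixel crop margin is an arbitrary choice):

  import numpy as np

  rng = np.random.default_rng(0)

  def augment(img):
      """Return a randomly flipped and cropped copy of an HxWxC image array."""
      if rng.random() < 0.5:
          img = img[:, ::-1]                 # horizontal flip
      top, left = rng.integers(0, 5), rng.integers(0, 5)
      return img[top:top + img.shape[0] - 4, left:left + img.shape[1] - 4]
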
11
Q

What is early stopping?

A

Allowing a model to overfit and then rolling back to the point at which the error curves on the training and validation sets begin to diverge
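
A sketch of an early-stopping loop; train_step and evaluate are placeholder callbacks, and the patience value is an arbitrary choice:

  def train_with_early_stopping(train_step, evaluate, max_epochs=100, patience=5):
      """Stop once validation error has not improved for `patience` epochs;
      return the epoch whose saved weights should be restored (the rollback point)."""
      best_err, best_epoch = float("inf"), 0
      for epoch in range(max_epochs):
          train_step()
          err = evaluate()
          if err < best_err:
              best_err, best_epoch = err, epoch
          elif epoch - best_epoch >= patience:
              break
      return best_epoch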

12
Q

1:1 RNN

A

Vanilla network without RNN

Image classification

13
Q

1:M RNN

A

Image captioning

14
Q

M:1 RNN

A

Sentiment analysis

15
Q

M:M RNN

A

Machine translation

16
Q

M:M RNN

A

Video classification (synced sequence input and output)

17
Q

Why Convolutions?

A

The main advantages of using convolutions are parameter sharing and sparsity of connections. Parameter sharing is helpful because it reduces the number of weight parameters in a layer without losing accuracy. Additionally, the convolution operation gives sparse connections: each output value depends only on a small patch of the input, so there are far fewer weights to compute and adjust.
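
A rough back-of-the-envelope comparison, using hypothetical LeNet-style sizes (32x32x3 input, six 5x5 filters giving a 28x28x6 output):

  fully_connected = (32 * 32 * 3) * (28 * 28 * 6)   # ~14.5 million weights
  convolutional = (5 * 5 * 3 + 1) * 6               # 456 parameters (6 filters of 5x5x3, each with a bias)
  print(fully_connected, convolutional)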

18
Q

CNNs

A

Designed to process data that come in the form of multiple arrays, for example, colour images composed of 3 2-D arrays containing pixel intensities in the 3 colour channels

19
Q

Key features of CNNs

A
  • local connections
  • shared weights
  • pooling
  • use of many layers
  • roots in the neocognitron
20
Q

Role of convolutional layer

A
  • detect local conjunctions of features from the previous layer
21
Q

Role of pooling layer

A

To merge semantically similar features into one

22
Q

Success of CNNs

A

ImageNet 2012

  • halved error rates of competing approaches
  • efficient use of GPUs, ReLUs
  • new regularizations - dropout
  • techniques to generate more training examples by deforming existing ones

23
Q

RNNs are good for what tasks

A

Those that involve sequential input (speech and language)

  • process an input sequence one element at a time, maintaining in their hidden units a ‘state vector’ that implicitly contains information about the history of all the past elements of the sequence
24
Q

Useful things about CNNs

A
  1. Partial connectivity (i.e. sparse connections) - not all the units in layer i are connected to all the units in layer i + 1
  2. Weight sharing - different parts of the network are forced to use the same weights
25
Q

Four key ideas behind CNNs that take adv of the properties of natural signals

A
  1. local connections
  2. shared weights
  3. pooling
  4. use of many layers
26
Q

Convolutional layer

A
  • units in a conv layer are organized in feature maps, within which each unit is connected to local patches in the feature maps of the previous layer through a set of weights called a filter bank
  • the result of this local weighted sum is then passed through a non-linearity such as a ReLU
  • All units in a feature map share the same filter bank
  • Different feature maps in a layer use different filter banks

Why this architecture:

  1. Local groups of values are often highly correlated, forming distinctive motifs that are easily detected
  2. Local statistics of images are invariant to location (a motif can appear anywhere in the image) - hence the idea that units at different locations share the same weights

Mathematically, the filtering operation performed by a feature map is a discrete convolution (hence the name) - see the sketch below
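
A naive sketch of the operation for a single 2-D filter (most DL libraries actually implement cross-correlation, i.e. the filter is not flipped):

  import numpy as np

  def conv2d_single(x, w, b=0.0):
      """Valid 'convolution' of a 2-D input x with one 2-D filter w, then ReLU."""
      kh, kw = w.shape
      out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
      for i in range(out.shape[0]):
          for j in range(out.shape[1]):
              out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w) + b  # local weighted sum
      return np.maximum(out, 0.0)                                # non-linearity (ReLU)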

27
Q

Role of conv layer

A

To detect local conjunctions of features from the previous layer

28
Q

Role of pooling

A

To merge semantically similar features into one

Reduces dimensions of the representation and creates an invariance to small shifts and distortions
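
A sketch of non-overlapping max pooling on a single 2-D feature map (pool size 2 is a common choice, but an assumption here):

  import numpy as np

  def max_pool2d(x, size=2):
      """Downsample a 2-D feature map by taking the max over each size x size block."""
      H2, W2 = x.shape[0] // size, x.shape[1] // size
      x = x[:H2 * size, :W2 * size]                        # drop any ragged border
      return x.reshape(H2, size, W2, size).max(axis=(1, 3))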

29
Q

How is training in a CNN done?

A

Backpropagation works the same way as in an ordinary deep network

Allows all the filter banks to be trained

30
Q

Deep NNs exploit the property that many natural signals are compositional hierarchies in which higher-level features are obtained by composing lower-level ones.

A

Images -> local combos of edges form motifs -> parts -> objects

Same with speech

Pooling allows representations to vary very little when elements in the previous layer vary in position and appearance

31
Q

Issue of representation

A

debate between logic-inspired and NN-inspired paradigms

logic - an instance of a symbol is only identical or non-identical to other symbols, with no internal structure; reasoning is done by rule-based inference

NN - just uses big activity vectors, big weight matrices and scalar non-linearities to perform the type of fast 'intuitive' inference that underpins effortless commonsense reasoning

32
Q

LSTM

A

RNNs have difficulty holding onto information for very long

Memory cells account for explicit memory: special hidden units that act like accumulators or gated leaky neurons (see the sketch below)
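
A heavily simplified sketch of the gated-accumulator idea (in a real LSTM the gates are learned functions of the current input and previous hidden state):

  def memory_cell_step(c_prev, new_input, forget_gate, input_gate):
      """Keep a gated fraction of the old cell state and write a gated fraction
      of the new input; all gates are values in [0, 1]."""
      return forget_gate * c_prev + input_gate * new_input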

33
Q

RNNs unfolded

A

Very deep feed-forward networks in which all the layers share the same weights (see the sketch below)
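
A sketch of the unrolled view for a vanilla RNN; note that the same W_xh, W_hh and b are reused at every time step:

  import numpy as np

  def rnn_forward(xs, W_xh, W_hh, b, h0):
      """Unrolled vanilla RNN: each time step is like a layer, with shared weights."""
      h, states = h0, []
      for x in xs:
          h = np.tanh(W_xh @ x + W_hh @ h + b)  # same weights applied at every step
          states.append(h)
      return states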

34
Q

CNNs can make use of RL to focus on where to look

A

attention