DL Part 2 Flashcards

1
Q

Name the advantages of the LSTM cell/GRU compared to the Elman cell

A
  • ability to capture dependencies at different time scales
  • control of the information flow via gates
  • additive update of the state preserves the error signal during backpropagation → addresses the vanishing/exploding gradient problem
2
Q

Why are RNNs suitable for problems with time series?

A

They maintain hidden states that act as a short-term memory, carrying information from one time step to the next.

3
Q

What role does the hidden state play in RNNs?

A
  1. the memory of the network
  2. modeling temporal dynamics/dependencies
  3. capturing context in sequence data
4
Q

What are the pros and cons of a typical RNN architecture?

A

(+)
* can process inputs of any length
* model size does not grow with the length of the input
* computation takes historical information into account
* weights are shared across time
(-)
* computation is slow
* difficult to access information from a long time ago
* cannot consider any future input for the current state

5
Q

What are some RNN basic architectures? Name three applications where many-to-one and one-to-many RNNs would be beneficial.

A

1-to-1: classic feed-forward network, e.g. image classification
1-to-many: image captioning
many-to-1: sentiment analysis
many-to-many: machine translation, video classification

6
Q

Describe what an element of a batch would be for a recurrent network, e.g. by using an example.

A

An element of a batch represents one sequence of data points.
E.g., in a language-modeling task where the input is a sequence of words, an element of a batch would be a sentence or a paragraph (see the sketch below).
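
A minimal PyTorch sketch of what this looks like in practice (all sizes are illustrative assumptions):

import torch
import torch.nn as nn

# Hypothetical sizes: a batch of 4 sentences, each padded to 10 tokens,
# each token represented by a 32-dimensional embedding vector.
batch = torch.randn(4, 10, 32)   # (batch, seq_len, features)

rnn = nn.RNN(input_size=32, hidden_size=64, batch_first=True)
output, h_n = rnn(batch)
print(output.shape)   # torch.Size([4, 10, 64]) - one hidden state per time step
print(h_n.shape)      # torch.Size([1, 4, 64])  - final hidden state per sequence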

7
Q

Why does the required memory space increase with higher batch sizes during training?

A
  • more activation tensors must be kept for the backward pass
  • larger gradient tensors are stored until they are used to update the parameters
  • various intermediate calculations scale with the batch size
8
Q

What is the difference between BPTT and TBPTT?

A
  • BPTT: one update requires backpropagation through the complete sequence
  • TBPTT (truncated BPTT): the sequence is still processed as a whole, but backpropagation is truncated to manageable fixed-size segments, so gradients only flow through a limited number of time steps (see the sketch below)
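
A minimal training-loop sketch of the difference (PyTorch assumed; model, loss_fn, and optimizer are hypothetical placeholders, with model(chunk, hidden) returning (output, hidden) like nn.RNN):

import torch

def tbptt_train(model, loss_fn, optimizer, seq, targets, k=35):
    """Truncated BPTT: process the full sequence, but backpropagate
    only within fixed-size chunks of k time steps."""
    hidden = None
    for t in range(0, seq.size(1), k):            # seq: (batch, seq_len, features)
        output, hidden = model(seq[:, t:t+k], hidden)
        hidden = hidden.detach()                  # cut the graph at the chunk boundary
        loss = loss_fn(output, targets[:, t:t+k])
        optimizer.zero_grad()
        loss.backward()                           # plain BPTT would instead call this
        optimizer.step()                          # once, over the entire sequence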
9
Q

What are the main challenges of training RNNs?

A
  • maintaining long-term dependencies is difficult due to the vanishing gradient problem
  • long-term dependencies are hard to detect because the hidden state is overwritten at each time step
  • short-term dependencies work fine
10
Q

What is the problem with deep RNNs?

A
  • Gradients are prone to vanishing or exploding:
    + if inputs and weights have magnitude > 1 and N is large (deep network), repeated multiplication makes the gradients explode
    + activation functions that tend to produce small gradients, e.g. sigmoid or tanh: the output is bounded between -1 and 1, so gradients shrink each time they are backpropagated through tanh → vanishing gradients (see the numeric sketch below)
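
A tiny numeric illustration of the tanh effect (PyTorch assumed; the chain of 20 tanh applications stands in for 20 layers/time steps):

import torch

x = torch.tensor([2.0], requires_grad=True)
y = x
for _ in range(20):
    y = torch.tanh(y)      # each application contributes a factor tanh'(z) <= 1
y.backward()
print(x.grad)              # a very small number -> vanishing gradient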
11
Q

Give several applications where a recurrent neural network can be useful and explain why.

A

RNNs are useful thanks to their ability to process sequential data and capture temporal dependencies:
- NLP, e.g. language translation, sentiment analysis, text generation, speech recognition
- time-series analysis: analyzing historical data trends, forecasting future patterns of stock prices or market demand
- speech and audio processing
- image and video analysis

12
Q

What is the main idea behind LSTMs?

A

the introduction of gates that control writing to and reading from a “memory” held in an additional cell state

13
Q

What is the role of the LSTM cell state?

A
  • a memory unit that carries information across different time steps,
  • allowing the network to remember relevant information from the past and use it when needed for making predictions or processing sequential data
14
Q

How are the internal states updated in an LSTM unit?

A

1) Forget gate: Forgetting old information in the cell state
2) Input gate: Deciding on new input for the cell state
3) Computing the updated cell state
4) Computing the updated hidden state
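
These four steps as a minimal, didactic PyTorch sketch (the stacked parameter names W, U, b are assumptions, not from the card):

import torch

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (d_in, 4*d_h), U: (d_h, 4*d_h), b: (4*d_h,) hold the
    stacked parameters for the forget, input, candidate, and output parts."""
    z = x @ W + h_prev @ U + b               # all gate pre-activations at once
    f, i, g, o = z.chunk(4, dim=-1)
    f = torch.sigmoid(f)                     # 1) forget gate: drop old cell content
    i = torch.sigmoid(i)                     # 2) input gate: select new content
    g = torch.tanh(g)                        #    candidate values
    c = f * c_prev + i * g                   # 3) updated cell state (additive!)
    h = torch.sigmoid(o) * torch.tanh(c)     # 4) updated hidden state via output gate
    return h, c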

15
Q

What does the Forget Gate in LSTM do?

A

controls how much of the previous cell state is forgotten when computing the new cell state and hidden state

16
Q

What does the input gate in LSTM do?

A

decides which information from the current input (input x + previous hidden state) to store in the new cell state and hidden state

17
Q

What does the output gate in LSTM do?

A

decides which information from the updated cell state is exposed as the new hidden state, i.e. the output

18
Q

What is the main idea of GRU?

A

A simpler variant of the LSTM unit with fewer parameters.
No additional cell state!
→ memory operates only and directly via the hidden state

19
Q

How does GRU control the flow of information?

A
  • Reset gate: controls how much of the previous state to remember → captures short-term dependencies
  • Update gate: controls how much of the new state should replace the old one → facilitates long-term dependency learning (see the sketch below)
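
A minimal sketch of one GRU step (PyTorch assumed; the weight names in dict p are illustrative, biases omitted for brevity):

import torch

def gru_step(x, h_prev, p):
    """One GRU step: no separate cell state, memory lives in the hidden state."""
    r = torch.sigmoid(x @ p["W_r"] + h_prev @ p["U_r"])      # reset gate
    z = torch.sigmoid(x @ p["W_z"] + h_prev @ p["U_z"])      # update gate
    n = torch.tanh(x @ p["W_n"] + (r * h_prev) @ p["U_n"])   # candidate state
    return (1 - z) * n + z * h_prev                          # blend new and old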
20
Q

In which scenarios would LSTMs be beneficial compared to GRUs?

A
  • long sequences
  • more complex tasks
  • larger datasets
21
Q

Why are confounds a problem?

A
  • The network learns the correlated feature instead of the desired feature → it misses the true underlying relationship we are interested in!
  • Important: this is not a fault of the learning algorithm, but of the data!
  • Confounding variables can distort the relationship between the variables of interest, leading to biased results.
22
Q

What is a confound in the context of ML and statistical analysis?

A

A confound refers to an extraneous variable that correlates with both the input variables and the outcome variable being studied.

23
Q

Give an example of a confound problem and a non-confound problem

A

Confound problem:
- the network learns correlated features due to an imbalanced dataset, e.g. the task of identifying a tank in an image where all tank images were recorded on cloudy days and all non-tank images on sunny days
- noise, e.g. speech recordings with 2 microphones in an environment containing confounds such as sensor, lighting, age/sex of the participants, temperature, …
Non-confound problem:
recognizing handwritten digits from 0 to 9 (as in the MNIST dataset), where each digit is represented equally across various writing styles, orientations, and thicknesses.

24
Q

Why could adversarial examples pose a security problem?

A
  • adversarial examples are inputs that differ only by a specifically optimized, added “noise” (perturbation)
    → they deceive machine learning models into making incorrect predictions
    → they can be used to exploit vulnerabilities in ML systems, e.g. by attacking face-recognition algorithms in security systems (see the sketch below)
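
One standard way such perturbations are computed (not named on this card) is the Fast Gradient Sign Method; a minimal PyTorch sketch, where model is a hypothetical classifier, image has shape (1, C, H, W), and label has shape (1,):

import torch
import torch.nn.functional as F

def fgsm(model, image, label, epsilon=0.03):
    """FGSM: one gradient step on the input that increases the loss."""
    image = image.clone().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    adv = image + epsilon * image.grad.sign()   # +-epsilon per pixel
    return adv.clamp(0, 1).detach()             # keep pixel values valid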
25
Q

What is occlusion in the context of image processing?

A

Occlusion refers to the phenomenon where certain parts of an object or image are hidden or obscured by other objects or elements within the scene.

26
Q

How do we account for occlusions during training?

A

Train the model to handle dynamic occlusion, and analyze its effect:
- by systematically moving a mask over the input image, different regions are occluded to observe their impact on the model’s classification performance
- the model’s prediction probabilities are monitored, generating a heatmap that highlights the regions crucial for accurate classification (see the sketch below)
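A minimal sketch of such an occlusion sweep, assuming a PyTorch classifier model and an image of shape (1, C, H, W):

import torch

def occlusion_heatmap(model, image, target_class, patch=16, stride=8):
    """Slide a gray patch over the image and record how the predicted
    probability of the target class changes."""
    _, _, H, W = image.shape
    heatmap = torch.zeros((H - patch) // stride + 1, (W - patch) // stride + 1)
    for i, y in enumerate(range(0, H - patch + 1, stride)):
        for j, x in enumerate(range(0, W - patch + 1, stride)):
            occluded = image.clone()
            occluded[:, :, y:y+patch, x:x+patch] = 0.5   # mask this region
            with torch.no_grad():
                prob = model(occluded).softmax(dim=1)[0, target_class]
            heatmap[i, j] = prob   # a low probability -> the region was important
    return heatmap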

27
Q

What is guided backpropagation?

A

Positive gradients = features the neuron is interested in
Negative gradients = features the neuron is NOT interested in
→ set all negative gradients to 0 during the backpropagation pass (see the sketch below)
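
A minimal PyTorch sketch, assuming the model uses (non-inplace) nn.ReLU modules:

import torch
import torch.nn as nn

def apply_guided_backprop(model):
    """Clamp negative gradients to zero at every ReLU during the backward pass."""
    def hook(module, grad_in, grad_out):
        # grad_in[0] is the gradient w.r.t. the ReLU input; zero its negatives.
        return (torch.clamp(grad_in[0], min=0.0),)
    for m in model.modules():
        if isinstance(m, nn.ReLU):
            m.register_full_backward_hook(hook)

Afterwards, backpropagating a class score to the input yields the guided-backprop visualization in input.grad.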

28
Q

What is a deconvnet?

A
  • a CNN run in reverse: reconstructs the input image from the feature maps of higher layers
  • visualizes which features in the input image activate specific neurons in the network
29
Q

What is a saliency map?

A

A saliency map highlights which input pixels the model’s prediction is most sensitive to; it is typically computed as the absolute gradient of the class score with respect to the input image.
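
A minimal PyTorch sketch (model is a hypothetical classifier, image has shape (1, C, H, W)):

import torch

def saliency_map(model, image, target_class):
    """|d score / d pixel|, taking the maximum over the color channels."""
    image = image.clone().requires_grad_(True)
    score = model(image)[0, target_class]
    score.backward()
    return image.grad.abs().max(dim=1).values   # (1, H, W)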
30
Q

Name measures that can only evaluate segmentation tasks

A

Pixel Accuracy / Mean Pixel Accuracy / Mean Intersection over Union (mIoU) / Frequency Weighted Intersection over Union (FWIoU)
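
A minimal sketch computing all four measures from a confusion matrix (NumPy assumed; conf[i, j] counts pixels of true class i predicted as class j, and every class is assumed to appear at least once):

import numpy as np

def segmentation_metrics(conf):
    tp = np.diag(conf)                                     # correctly labeled pixels
    pixel_acc = tp.sum() / conf.sum()
    per_class_acc = tp / conf.sum(axis=1)                  # recall per class
    iou = tp / (conf.sum(axis=1) + conf.sum(axis=0) - tp)  # intersection / union
    freq = conf.sum(axis=1) / conf.sum()                   # class pixel frequencies
    return {
        "pixel_acc": pixel_acc,
        "mean_pixel_acc": per_class_acc.mean(),
        "mean_iou": iou.mean(),
        "fw_iou": (freq * iou).sum(),
    }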

31
Q

Name some ways of simple parameter visualization in deep learning

A
  1. direct visualization of learned kernels
  2. visualizing the activations generated by kernels (it is easier to visualize the effect of a conv layer than to interpret kernel weights)
  3. investigating features via occlusion
  4. maximally activating images
  5. t-SNE visualization of CNN codes (t-distributed stochastic neighbor embedding) – see the sketch below
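
A minimal sketch of item 5, using scikit-learn's TSNE on hypothetical CNN codes (here random placeholders for penultimate-layer features and class labels):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

codes = np.random.randn(500, 256).astype(np.float32)   # (N, D) CNN features
labels = np.random.randint(0, 10, size=500)            # (N,) class labels

emb = TSNE(n_components=2, perplexity=30).fit_transform(codes)  # (500, 2)
plt.scatter(emb[:, 0], emb[:, 1], c=labels, s=5, cmap="tab10")
plt.title("t-SNE of CNN codes")
plt.show()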