lecture 6 - predictive modeling with time series
stationarity
- time series require a form of stationarity
stationary if
1. trends and periodic variations are removed, i.e. it has no trends and the mean does not change over time
2. the variance of the remaining residuals is constant over time, i.e. fluctuations around the mean are uniform over time
3. the lagged autocorrelation remains constant over time (it depends only on the lag λ, not on t)
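as an illustrative sketch (not from the lecture), the augmented Dickey-Fuller test from statsmodels is one common way to check whether a series looks stationary; the toy series and the 0.05 threshold are my own assumptions
```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

# toy example: a random walk (non-stationary) vs. white noise (stationary)
rng = np.random.default_rng(0)
random_walk = np.cumsum(rng.normal(size=500))
white_noise = rng.normal(size=500)

for name, series in [("random walk", random_walk), ("white noise", white_noise)]:
    p_value = adfuller(series)[1]  # null hypothesis: the series has a unit root (non-stationary)
    print(f"{name}: p={p_value:.3f} -> {'stationary' if p_value < 0.05 else 'non-stationary'}")
```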
time series focus on:
- understanding periodicity and trends
- forecasting
- control
time series can be decomposed into these components
- periodic variations (daily, weekly, etc.)
- trend (how the mean evolves over time)
- irregular variations (left after we remove periodic variations and trend)
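a minimal sketch of such a decomposition using statsmodels' seasonal_decompose; the toy monthly data, the additive model, and period=12 are assumptions for illustration only
```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# toy monthly series: linear trend + yearly periodicity + noise
t = np.arange(120)
y = 0.05 * t + np.sin(2 * np.pi * t / 12) + np.random.normal(scale=0.2, size=120)
series = pd.Series(y, index=pd.date_range("2010-01", periods=120, freq="MS"))

parts = seasonal_decompose(series, model="additive", period=12)
trend, periodic, irregular = parts.trend, parts.seasonal, parts.resid
```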
why stationarity
simplifies the model building process
lagged autocorrelation
- additional criterion for time series
- measures how strongly a time series correlates with a version of itself shifted by λ time steps
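a minimal sketch of the sample lagged autocorrelation; the estimator below is one common convention, not a formula fixed by the lecture
```python
import numpy as np

def lagged_autocorrelation(x, lag):
    """Correlation between x_t and x_{t-lag}, estimated from a single series."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return np.dot(x[lag:], x[:-lag]) / np.dot(x, x)

x = np.sin(np.linspace(0, 20 * np.pi, 500))   # strongly periodic toy series
print(lagged_autocorrelation(x, lag=1))       # close to 1: neighbouring points move together
```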
ways to get rid of trends (to reach stationarity)
- apply a filter/smoothing to the data
- assume a time series of values x_t with a fixed step size Δt; the goal is to remove the trend
apply filter to data
- takes q points in the future and past into account, generating a new (smoothed) time series z_t = Σ_{r=-q}^{q} a_r x_{t+r}
- the weights a_r have to be chosen; different choices give different filters
filtering weight: triangular shape
weights points based on their distance from t: the weight a_r decreases linearly with |r|
- choose this if measurements closer to t are more important
- measurements closer to t get more weight
filtering weight: moving average
every point in the window gets the same weight: a_r = 1 / (2q + 1)
filtering weight: exponential smoothing
the weight decreases exponentially as you move further away from the current point t
- choose this when mostly past points are important
filtering weight: q parameter
determines how many points before and after the current point are considered in the smoothing process
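a sketch of the three weighting schemes above; the normalisation of the triangular weights, alpha=0.3, and the toy data are my own choices
```python
import numpy as np

def apply_filter(x, weights):
    """z_t = sum_{r=-q}^{q} a_r * x_{t+r}; only points with a full window are kept."""
    # np.convolve flips the kernel, so reversing the weights gives a plain weighted sum
    return np.convolve(x, weights[::-1], mode="valid")

q = 3
# moving average: every point in the window gets weight 1 / (2q + 1)
a_ma = np.full(2 * q + 1, 1.0 / (2 * q + 1))
# triangular: weight decreases linearly with the distance |r| from the current point t
a_tri = q + 1.0 - np.abs(np.arange(-q, q + 1))
a_tri /= a_tri.sum()

def exponential_smoothing(x, alpha=0.3):
    """Only past points matter: z_t = alpha * x_t + (1 - alpha) * z_{t-1}."""
    z = np.empty(len(x))
    z[0] = x[0]
    for t in range(1, len(x)):
        z[t] = alpha * x[t] + (1 - alpha) * z[t - 1]
    return z

x = np.cumsum(np.random.normal(size=200))   # toy series with a random-walk trend
z_ma, z_tri, z_exp = apply_filter(x, a_ma), apply_filter(x, a_tri), exponential_smoothing(x)
```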
removing a trend
- z_t = the difference between the current and the previous measurement: z_t = x_t - x_{t-1}
- by differencing x_t and x_{t-1}, we remove the linear trend component, making the series stationary
- the idea is that if there is a trend in the data, it will affect x_t and x_{t-1} similarly; by subtracting one from the other, the trend component is eliminated, leaving behind fluctuations that are more stationary
- we can apply this differencing operator d times for more complex trends
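a minimal differencing sketch with numpy; the quadratic toy trend is just for illustration
```python
import numpy as np

t = np.arange(200)
x = 0.01 * t**2 + np.random.normal(size=200)   # quadratic trend + noise

z1 = np.diff(x, n=1)   # removes a linear trend component
z2 = np.diff(x, n=2)   # applying the operator d=2 times also handles the quadratic trend
```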
removing a trend: if x_{t-1} does not give a good estimate of the trend
we can use an exponentially smoothed series z_t and take x_t - z_t.
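a sketch of this variant, using pandas' exponentially weighted mean as the smoothed series z_t; alpha=0.1 is an arbitrary choice
```python
import numpy as np
import pandas as pd

x = pd.Series(np.cumsum(np.random.normal(size=200)))  # toy series with a drifting trend
z = x.ewm(alpha=0.1).mean()                            # exponentially smoothed trend estimate
detrended = x - z                                      # fluctuations around the estimated trend
```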
learning algorithms with time
- ARIMA
- NNs with time
- RNN
- deep learning
- LSTM
- TCN
- echo state networks
ARIMA components
- probability distribution: assume that measurements are generated by a probability distribution P_t at each time point t
- expected mean μ(t) of the distribution at time t; this represents the central tendency of the time series at any point
- auto-covariance function γ(t1, t2): measures the covariance of the time series at two different time points
ARIMA goal
estimate P_t based on previous values for this distribution
- P_t = probability distribution of measurements at time point t
ARIMA: when is a series stationary
- when the mean is constant
- when the autocovariance only depends on the time difference λ = t2 - t1
ARIMA: W_t parameter
- represents the noise we encounter
- we can account for this noise by a moving average component (with q past values)
ARIMA: d parameter
differencing with order d
- ARIMA removes the trends with this parameter
ARIMA: p & q parameters
number of steps we are looking back
- p: number of past measurements used for the value at time t (autoregressive part)
- q: number of past noise terms used (moving-average part)
ARIMA: finding parameter values
- p: look at the correlation between x_t and x_{t-p}, i.e. the partial autocorrelation function
- q & d: grid search, then determine goodness of fit
- other parameters: use past data to optimize weights, autoregressive component, etc.
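a hedged sketch of such a grid search with statsmodels' ARIMA implementation; the parameter ranges and the use of AIC as the goodness-of-fit measure are assumptions, the lecture only says grid search plus goodness of fit
```python
import itertools
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=300))            # toy non-stationary series

best = None
for p, d, q in itertools.product(range(3), range(2), range(3)):
    try:
        fit = ARIMA(y, order=(p, d, q)).fit()
    except Exception:
        continue                               # some orders may fail to converge
    if best is None or fit.aic < best[1]:
        best = ((p, d, q), fit.aic)

print("best (p, d, q) by AIC:", best[0])
```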
simple recurrent neural network
- neural network with time (i.e., designed to handle sequential data)
- include cycles that allow information to persist (memory)
- forward path: input to output
- backward path: once a prediction is made, the error is propagated back into the network, which allows you to update the weights
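a minimal numpy sketch of the forward path of such a simple (Elman-style) recurrent network; the layer sizes, tanh activation, and random initialisation are my own assumptions
```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 3, 8, 1
W_xh = rng.normal(scale=0.1, size=(n_hidden, n_in))      # input  -> hidden
W_hh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # hidden -> hidden (the recurrent cycle)
W_hy = rng.normal(scale=0.1, size=(n_out, n_hidden))     # hidden -> output

def forward(xs):
    """Run the forward path over a sequence; h carries information between time steps."""
    h = np.zeros(n_hidden)
    outputs = []
    for x_t in xs:
        h = np.tanh(W_xh @ x_t + W_hh @ h)   # memory: h depends on all previous inputs
        outputs.append(W_hy @ h)
    return np.array(outputs)

predictions = forward(rng.normal(size=(10, n_in)))   # 10 time steps of toy input
```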
RNN: backpropagation through time
- cycles of the simple RNN make training complex as the output at a given time step depends on previous computations
- this is fixed by unfolding the network by creating an instance of the network for each previous time point and connecting these
- this combined, unfolded network has no cycles, so standard backpropagation can be applied
RNN: weight update calculation
- weight update = learning rate * error term at time t * predicted output of node j at time t
- the error term depends on whether node j is an output node or a hidden node
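a sketch of this update rule for a single weight; the toy values are placeholders, and in practice the error terms come out of backpropagation through time
```python
learning_rate = 0.01

def update_weight(w_jk, delta_k_t, y_j_t):
    """delta_w = learning rate * error term at time t * output of node j at time t."""
    return w_jk + learning_rate * delta_k_t * y_j_t

# toy values: for an output node, the error term is typically derived from (target - prediction);
# for a hidden node it is a weighted sum of the error terms of the nodes it feeds into
w_new = update_weight(w_jk=0.5, delta_k_t=0.1, y_j_t=0.8)
```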