lecture 6 - predictive modeling with time series Flashcards

1
Q

stationarity

A
  • time series require a form of stationarity

a time series is stationary if
1. trends and periodic variations are removed, i.e., it has no trends: the mean does not change over time
2. the variance of the remaining residuals is constant over time, i.e., fluctuations around the mean are uniform
3. the lagged autocorrelation remains constant over time

2
Q

time series analysis focuses on:

A
  1. understanding periodicity and trends
  2. forecasting
  3. control
3
Q

time series can be decomposed into these components

A
  1. periodic variations (daily, weekly, etc.)
  2. trend (how the mean evolves over time)
  3. irregular variations (left after we remove periodic variations and trend)
4
Q

why stationarity

A

simplifies the model building process

5
Q

lagged autocorrelation

A
  • additional criterion for time series
  • expresses to what extent a time series correlates with a shifted version of itself (shifted by λ time steps)
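a minimal sketch of the standard lag-λ autocorrelation estimator (the helper name and toy series are assumptions, not from the lecture):

```python
# Hypothetical helper: lag-lam autocorrelation, i.e. the correlation
# between x_t and x_{t-lam}, normalised by the full-series variance.
def lagged_autocorrelation(x, lam):
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t - lam] - mean) for t in range(lam, n))
    return cov / var

series = [1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0]
print(lagged_autocorrelation(series, 1))  # negative: neighbours alternate
print(lagged_autocorrelation(series, 2))  # positive: period-2 pattern repeats
```

for a stationary series, these lagged correlations look the same no matter which stretch of the series you compute them on.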
6
Q

ways to get rid of trends (to reach stationarity)

A
  1. apply filter/smoothing to data
    - assume time series of values x_t with a fixed step size Δt
  2. remove a trend
7
Q

apply filter to data

A
  1. take q points in the past and future into account; this generates a new, smoothed time series z_t
  2. choose the weights a (e.g., moving average, triangular, exponential)
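the two steps above can be sketched in Python (the function name and toy data are assumptions, not from the slides):

```python
# Sketch of the symmetric filter from the card: z_t is a weighted sum of
# the q points before and after t. Weights `a` must have length 2q + 1;
# we only smooth the interior points where the full window fits.
def apply_filter(x, a):
    q = (len(a) - 1) // 2
    return [sum(a[r + q] * x[t + r] for r in range(-q, q + 1))
            for t in range(q, len(x) - q)]

# moving average: each of the 2q + 1 points gets equal weight 1 / (2q + 1)
q = 1
a = [1 / (2 * q + 1)] * (2 * q + 1)
print(apply_filter([1.0, 4.0, 1.0, 4.0, 1.0], a))  # ~ [2.0, 3.0, 2.0]
```

with triangular or exponential weights only the `a` vector changes; the filtering loop stays the same.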
8
Q

filtering weight: triangular shape

A

these weights a score points based on their distance from t

  • choose this if measurements closer to t are more important
  • measurements closer to t get more weight
9
Q

filtering weight: moving average

A

each of the 2q + 1 points in the window gets equal weight: the inverse of (2q + 1)

10
Q

filtering weight: exponential smoothing

A

exponentially decreases the weight as you move away

  • choose this when mostly past points are important
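a minimal sketch of one-sided exponential smoothing, assuming the common recursive form z_t = α·x_t + (1 − α)·z_{t−1} (α and the toy data are assumptions):

```python
# One-sided exponential smoothing: only past points matter, and older
# points get exponentially smaller weight via the recursion
# z_t = alpha * x_t + (1 - alpha) * z_{t-1}, with alpha in (0, 1].
def exponential_smoothing(x, alpha):
    z = [x[0]]  # initialise with the first observation
    for value in x[1:]:
        z.append(alpha * value + (1 - alpha) * z[-1])
    return z

# the spike at t = 2 decays gradually instead of vanishing at once
print(exponential_smoothing([1.0, 1.0, 10.0, 1.0, 1.0], alpha=0.5))
```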
11
Q

filtering weight: q parameter

A

determines how many points before and after the current point are considered in the smoothing process

12
Q

removing a trend

A
  • z_t = the difference between the current and previous measurement: z_t = x_t - x_{t-1}
  • by differencing x_t and x_{t-1}, we remove the linear trend component, making the series stationary
  • the idea is that if there is a trend in the data, it affects x_t and x_{t-1} similarly; subtracting one from the other eliminates the trend component, leaving behind fluctuations that are more stationary
  • we can apply this differencing operator d times for more complex trends
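the differencing operator can be sketched as follows (toy data assumed):

```python
# First-order differencing from the card: z_t = x_t - x_{t-1}.
# Applied to a series with a linear trend, the trend disappears
# and a constant series remains.
def difference(x, d=1):
    for _ in range(d):  # apply the operator d times for higher-order trends
        x = [x[t] - x[t - 1] for t in range(1, len(x))]
    return x

trend = [2.0 * t + 3.0 for t in range(6)]           # linear trend
print(difference(trend))                            # constant: all 2.0
print(difference([t ** 2 for t in range(6)], d=2))  # quadratic trend needs d = 2
```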
13
Q

removing a trend: if x_{t-1} does not give a good estimation of the trend

A

we can use an exponential smoothing z_t and take x_t - z_t.

14
Q

learning algorithms with time

A
  1. ARIMA
  2. NNs with time
    • RNN
  3. deep learning
    • LSTM
    • TCN
  4. echo state networks
15
Q

ARIMA components

A
  1. probability distribution: assume that measurements are generated by a probability distribution P_t at each time point t
  2. expected mean mu(t) of the distribution at time t. this represents the central tendency of the time series at any point
  3. auto-covariance function (gamma(t1,t2)): measures the covariance of the time series at two different times.
16
Q

ARIMA goal

A

estimate P_t based on previous values for this distribution

  • P_t = probability distribution of measurements at time point t
17
Q

ARIMA: when is a series stationary

A
  1. when the mean is constant
  2. when the autocovariance only depends on the time difference λ = t2 - t1
18
Q

ARIMA: W_t parameter

A
  • represents the noise we encounter
  • we can account for this noise by a moving average component (with q past values)
19
Q

ARIMA: d parameter

A

differencing with order d

  • ARIMA removes the trends with this parameter
20
Q

ARIMA: p & q parameters

A

number of steps we’re looking back

  • p: for measurement at time t
  • q: for noise
21
Q

ARIMA: finding parameter values

A
  1. p: look at the correlation between x_t and x_{t-p}, i.e., the partial autocorrelation function
  2. q & d: grid search, then determine goodness of fit
  3. other parameters: use past data to optimize weights, autoregressive component, etc.
22
Q

simple recurrent neural network

A
  • neural network with time (i.e., designed to handle sequential data)
  • include cycles that allow information to persist (memory)
  1. forward path: input to output
  2. once prediction is made, the error is propagated back into the network. this allows you to update the weights
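a scalar sketch of the forward path with a recurrent connection (the weights are hand-picked assumptions; real RNNs use learned weight matrices):

```python
import math

# One forward step of a simple RNN: the hidden state mixes the current
# input with the previous hidden state through the recurrent connection.
def rnn_step(x_t, h_prev, w_in, w_rec, w_out):
    h_t = math.tanh(w_in * x_t + w_rec * h_prev)  # recurrent cycle = memory
    y_t = w_out * h_t                             # prediction at time t
    return h_t, y_t

h = 0.0
for x in [1.0, 0.0, 0.0]:  # the first input keeps echoing in h, fading each step
    h, y = rnn_step(x, h, w_in=1.0, w_rec=0.5, w_out=1.0)
    print(round(h, 3))
```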
23
Q

RNN: backpropagation through time

A
  • cycles of the simple RNN make training complex as the output at a given time step depends on previous computations
  • this is fixed by unfolding the network by creating an instance of the network for each previous time point and connecting these
  • this combined network is without cycles
24
Q

RNN: forward paths calculation

A
  • weight update rule = learning rate * error term at time t * predicted output of node j at time t
  • error term depends on if node j is an output node or hidden node
25
Q

RNN: recurrent connections calculation

A
  • the weight update rule is similar to forward connections, but includes a temporal component
  • weight update rule = learning rate * error term at time t * predicted output of node j at time t-1
  • the error term formula is similar to the hidden node case but considers the error propagation over time
26
Q

LSTM networks

A
  • have longer memory than RNNs
  • solve the vanishing gradient problem, which allows them to learn long-term dependencies more effectively
  • internal cell state C (vector of numbers) is updated based on gates (i.e., our memory)
27
Q

LSTM: internal state C & gates

A
  • C represents memory
  1. forget gate: decides which information to discard from C (i.e., for each element in the vector)
  2. addition (input) gate: controls which information to add to the cell state
    - candidate values C̃_t: determine what new information could be added to the cell state (i.e., how to update)
    - update magnitude i_t: determines how much of the new information to add
  3. updating the cell state: combines the previous cell state C_{t-1} (scaled by the forget gate) with the new candidate values (scaled by the input gate)
  4. output gate: new prediction based on the previous prediction, the cell state, and the current input
    - controls what information from the cell state to output as the hidden state
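the gate logic can be sketched with scalars (the gate activations are given directly as assumed inputs; a real LSTM computes them from learned weights and uses vectors):

```python
import math

# Scalar sketch of the cell-state update and output gate:
# c_t = f * c_{t-1} + i * c_candidate,  h_t = o * tanh(c_t)
def lstm_cell(c_prev, f_gate, i_gate, c_candidate, o_gate):
    c_t = f_gate * c_prev + i_gate * c_candidate  # forget + addition gates
    h_t = o_gate * math.tanh(c_t)                 # output gate filters the state
    return c_t, h_t

# forget everything (f = 0) and write the candidate at full strength (i = 1)
c, h = lstm_cell(c_prev=5.0, f_gate=0.0, i_gate=1.0, c_candidate=2.0, o_gate=1.0)
print(c)  # 2.0: old memory discarded, new information stored
```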
28
Q

temporal convolutional networks (TCN)

A
  • network based on convolutions
  • outperform LSTM networks and have better long-term memory
  • sequence to sequence
29
Q

TCN: causal connections

A

Only past information is used to predict future values, ensuring the model doesn’t look ahead in time.

30
Q

TCN: sequence to sequence

A
  1. Equal Lengths: Input and output sequences are of equal length, with zero padding used if necessary.
  2. Uniform Layer Size: All layers in the network have the same size.
31
Q

TCN: features - dilations and kernel size

A
  1. kernel size k expresses how many values we consider in a layer
  2. dilation factor d expresses the spacing between the timesteps used to gather the k values
    - Higher d: Larger gaps, capturing a broader range of history with fewer layers needed.
    - Lower d: Smaller gaps, capturing more detailed local information.
    - d exponentially increases over layers (d = O(2^i))
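a small sketch of how the receptive field grows with exponentially increasing dilations, assuming one dilated causal convolution per layer with d = 2^i (residual blocks with two convolutions per block would grow even faster):

```python
# Each layer i with kernel size k and dilation 2**i reaches
# (k - 1) * 2**i extra steps into the past, so the total history
# covered grows exponentially with depth.
def receptive_field(k, n_layers):
    return 1 + sum((k - 1) * 2 ** i for i in range(n_layers))

for layers in (1, 2, 4, 8):
    print(layers, receptive_field(k=3, n_layers=layers))  # 3, 7, 31, 511
```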
32
Q

TCN: exponential increase of d

A
  1. guarantees that each input is used
  2. captures a bigger and bigger history
33
Q

TCN: residual blocks

A
  • Instead of simple convolutional layers, TCNs use residual blocks, which are structures that help the network learn more effectively
  • The residual block typically includes elements like dropout (to prevent overfitting), activation functions (e.g., ReLU), and normalization steps (e.g., WeightNorm).
  • The key benefit of residual blocks is that they help address the problem of vanishing gradients
34
Q

echo state networks

A
  • ESNs simplify complexity of RNNs by having a reservoir of neurons with fixed, random connections. This reduces the number of trainable parameters.
  • reservoir can have cycles to allow for a memory component
35
Q

echo state network: weights

A
  • not all the weights are trained, but are set randomly
  1. input weights and weights in the reservoir are set randomly (and not changed during the process)
  2. weights from reservoir to output are the only ones that are being trained
36
Q

echo state networks: reservoir

A
  • if you have a big enough reservoir, you can create all kinds of variants of the signal over time as features
  • then you just need to pick up the right features in the output layer in order to decide how to predict for the next time step
37
Q

echo state networks: learning

A
  • learning a W_{out} that minimizes the difference between the actual and predicted y
  • since we only train W_{out}, it's not hard to train this model, given that the reservoir has enough richness
38
Q

echo state networks: output

A
  1. state of reservoir at i+1: r_{i+1} = f_res( W_in * x_{i+1} + W_res * r_i )
  2. predicted output at i+1: y_{i+1} = f_out( W_out * r_{i+1} )
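a tiny numeric sketch of these two update equations with a 2-neuron reservoir (the weights are hand-picked assumptions; in practice only the output weights are trained):

```python
import math

# One ESN step: update the reservoir state from the new input and the
# previous state, then read the prediction off the new state.
def esn_step(x, r, w_in, w_res, w_out):
    # r_{i+1} = tanh( W_in * x_{i+1} + W_res * r_i )
    r_new = [math.tanh(w_in[j] * x
                       + sum(w_res[j][k] * r[k] for k in range(len(r))))
             for j in range(len(r))]
    # y_{i+1} = W_out * r_{i+1}  (identity output activation assumed)
    y = sum(w_out[j] * r_new[j] for j in range(len(r)))
    return r_new, y

r = [0.0, 0.0]                                      # initial reservoir state
w_in, w_out = [0.5, -0.3], [1.0, 1.0]               # fixed random-style weights
w_res = [[0.2, 0.1], [-0.1, 0.3]]                   # recurrent reservoir weights
r, y = esn_step(1.0, r, w_in, w_res, w_out)
print(round(y, 3))
```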
39
Q

echo state networks: echo state property

A
  • tells us how to create a random reservoir that has useful signals
  • meaning that the effect of a previous state (r_i) and a previous input (x_i) on a future state (r_{i+k}) should vanish gradually as time passes (k → ∞), and not persist or get amplified
  • The echo state property ensures that the influence of a previous state or input on future states diminishes over time. This prevents the amplification of past inputs and maintains stability in the network.
40
Q

memory capacity, worst to best:

A
  1. RNN
  2. LSTM
  3. TCN