lecture 6 - predictive modeling with time series
stationarity
- time series require a form of stationarity
stationary if
1. trends and periodic variations are removed, i.e. it has no trends and the mean does not change over time
2. the variance of the remaining residuals is constant over time, i.e. fluctuations around the mean are uniform over time
3. the lagged autocorrelation remains constant over time (it depends only on the lag λ, not on t)
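as an illustrative sketch (not from the lecture), the augmented Dickey-Fuller test from statsmodels is one common way to check whether a series looks stationary; the toy series and the 0.05 threshold are my own assumptions
```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

# toy example: a random walk (non-stationary) vs. white noise (stationary)
rng = np.random.default_rng(0)
random_walk = np.cumsum(rng.normal(size=500))
white_noise = rng.normal(size=500)

for name, series in [("random walk", random_walk), ("white noise", white_noise)]:
    p_value = adfuller(series)[1]  # null hypothesis: the series has a unit root (non-stationary)
    print(f"{name}: p={p_value:.3f} -> {'stationary' if p_value < 0.05 else 'non-stationary'}")
```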
time series focus on:
- understanding periodicity and trends
- forecasting
- control
time series can be decomposed into these components
- periodic variations (daily, weekly, etc.)
- trend (how the mean evolves over time)
- irregular variations (left after we remove periodic variations and trend)
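a minimal sketch of such a decomposition using statsmodels' seasonal_decompose; the toy monthly data, the additive model, and period=12 are assumptions for illustration only
```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# toy monthly series: linear trend + yearly periodicity + noise
t = np.arange(120)
y = 0.05 * t + np.sin(2 * np.pi * t / 12) + np.random.normal(scale=0.2, size=120)
series = pd.Series(y, index=pd.date_range("2010-01", periods=120, freq="MS"))

parts = seasonal_decompose(series, model="additive", period=12)
trend, periodic, irregular = parts.trend, parts.seasonal, parts.resid
```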
why stationarity
simplifies the model building process
lagged autocorrelation
- additional criterion for time series
- measures how strongly a time series correlates with a version of itself shifted by λ time steps
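a minimal sketch of the sample lagged autocorrelation; the estimator below is one common convention, not a formula fixed by the lecture
```python
import numpy as np

def lagged_autocorrelation(x, lag):
    """Correlation between x_t and x_{t-lag}, estimated from a single series."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return np.dot(x[lag:], x[:-lag]) / np.dot(x, x)

x = np.sin(np.linspace(0, 20 * np.pi, 500))   # strongly periodic toy series
print(lagged_autocorrelation(x, lag=1))       # close to 1: neighbouring points move together
```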
ways to get rid of trends (to reach stationarity)
- apply a filter/smoothing to the data
- assume a time series of values x_t with a fixed step size Δt; the goal is to remove the trend
apply filter to data
- takes q points in the future and past into account, generating a new (smoothed) time series z_t = Σ_{r=-q}^{q} a_r x_{t+r}
- the weights a_r have to be chosen; different choices give different filters
filtering weight: triangular shape
weights points based on their distance from t: the weight a_r decreases linearly with |r|
- choose this if measurements closer to t are more important
- measurements closer to t get more weight
filtering weight: moving average
every point in the window gets the same weight: a_r = 1 / (2q + 1)
filtering weight: exponential smoothing
the weight decreases exponentially as you move further away from the current point t
- choose this when mostly past points are important
filtering weight: q parameter
determines how many points before and after the current point are considered in the smoothing process
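a sketch of the three weighting schemes above; the normalisation of the triangular weights, alpha=0.3, and the toy data are my own choices
```python
import numpy as np

def apply_filter(x, weights):
    """z_t = sum_{r=-q}^{q} a_r * x_{t+r}; only points with a full window are kept."""
    # np.convolve flips the kernel, so reversing the weights gives a plain weighted sum
    return np.convolve(x, weights[::-1], mode="valid")

q = 3
# moving average: every point in the window gets weight 1 / (2q + 1)
a_ma = np.full(2 * q + 1, 1.0 / (2 * q + 1))
# triangular: weight decreases linearly with the distance |r| from the current point t
a_tri = q + 1.0 - np.abs(np.arange(-q, q + 1))
a_tri /= a_tri.sum()

def exponential_smoothing(x, alpha=0.3):
    """Only past points matter: z_t = alpha * x_t + (1 - alpha) * z_{t-1}."""
    z = np.empty(len(x))
    z[0] = x[0]
    for t in range(1, len(x)):
        z[t] = alpha * x[t] + (1 - alpha) * z[t - 1]
    return z

x = np.cumsum(np.random.normal(size=200))   # toy series with a random-walk trend
z_ma, z_tri, z_exp = apply_filter(x, a_ma), apply_filter(x, a_tri), exponential_smoothing(x)
```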
removing a trend
- z_t = the difference between the current and the previous measurement: z_t = x_t - x_{t-1}
- by differencing x_t and x_{t-1}, we remove the linear trend component, making the series stationary
- the idea is that if there is a trend in the data, it will affect x_t and x_{t-1} similarly; by subtracting one from the other, the trend component is eliminated, leaving behind fluctuations that are more stationary
- we can apply this differencing operator d times for more complex trends
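a minimal differencing sketch with numpy; the quadratic toy trend is just for illustration
```python
import numpy as np

t = np.arange(200)
x = 0.01 * t**2 + np.random.normal(size=200)   # quadratic trend + noise

z1 = np.diff(x, n=1)   # removes a linear trend component
z2 = np.diff(x, n=2)   # applying the operator d=2 times also handles the quadratic trend
```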
removing a trend: if x_{t-1} does not give a good estimate of the trend
we can use an exponentially smoothed series z_t and take x_t - z_t.
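a sketch of this variant, using pandas' exponentially weighted mean as the smoothed series z_t; alpha=0.1 is an arbitrary choice
```python
import numpy as np
import pandas as pd

x = pd.Series(np.cumsum(np.random.normal(size=200)))  # toy series with a drifting trend
z = x.ewm(alpha=0.1).mean()                            # exponentially smoothed trend estimate
detrended = x - z                                      # fluctuations around the estimated trend
```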
learning algorithms with time
- ARIMA
- NNs with time
- RNN
- deep learning
- LSTM
- TCN
- echo state networks
ARIMA components
- probability distribution: assume that measurements are generated by a probability distribution P_t at each time point t
- expected mean μ(t) of the distribution at time t; this represents the central tendency of the time series at any point
- auto-covariance function γ(t1, t2): measures the covariance of the time series at two different time points
ARIMA goal
estimate P_t based on previous values for this distribution
- P_t = probability distribution of measurements at time point t
ARIMA: when is a series stationary
- when the mean is constant
- when the autocovariance only depends on the time difference λ = t2 - t1
ARIMA: W_t parameter
- represents the noise we encounter
- we can account for this noise by a moving average component (with q past values)
ARIMA: d parameter
differencing with order d
- ARIMA removes the trends with this parameter
ARIMA: p & q parameters
number of steps we are looking back
- p: number of past measurements used for the value at time t (autoregressive part)
- q: number of past noise terms used (moving-average part)
ARIMA: finding parameter values
- p: look at the correlation between x_t and x_{t-p}, i.e. the partial autocorrelation function
- q & d: grid search, then determine goodness of fit
- other parameters: use past data to optimize weights, autoregressive component, etc.
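a hedged sketch of such a grid search with statsmodels' ARIMA implementation; the parameter ranges and the use of AIC as the goodness-of-fit measure are assumptions, the lecture only says grid search plus goodness of fit
```python
import itertools
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=300))            # toy non-stationary series

best = None
for p, d, q in itertools.product(range(3), range(2), range(3)):
    try:
        fit = ARIMA(y, order=(p, d, q)).fit()
    except Exception:
        continue                               # some orders may fail to converge
    if best is None or fit.aic < best[1]:
        best = ((p, d, q), fit.aic)

print("best (p, d, q) by AIC:", best[0])
```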
simple recurrent neural network
- neural network with time (i.e., designed to handle sequential data)
- include cycles that allow information to persist (memory)
- forward path: input to output
- backward path: once a prediction is made, the error is propagated back into the network, which allows you to update the weights
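a minimal numpy sketch of the forward path of such a simple (Elman-style) recurrent network; the layer sizes, tanh activation, and random initialisation are my own assumptions
```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 3, 8, 1
W_xh = rng.normal(scale=0.1, size=(n_hidden, n_in))      # input  -> hidden
W_hh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # hidden -> hidden (the recurrent cycle)
W_hy = rng.normal(scale=0.1, size=(n_out, n_hidden))     # hidden -> output

def forward(xs):
    """Run the forward path over a sequence; h carries information between time steps."""
    h = np.zeros(n_hidden)
    outputs = []
    for x_t in xs:
        h = np.tanh(W_xh @ x_t + W_hh @ h)   # memory: h depends on all previous inputs
        outputs.append(W_hy @ h)
    return np.array(outputs)

predictions = forward(rng.normal(size=(10, n_in)))   # 10 time steps of toy input
```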
RNN: backpropagation through time
- cycles of the simple RNN make training complex as the output at a given time step depends on previous computations
- this is fixed by unfolding the network by creating an instance of the network for each previous time point and connecting these
- this combined, unfolded network has no cycles, so standard backpropagation can be applied
RNN: weight update calculation
- weight update = learning rate * error term at time t * predicted output of node j at time t
- the error term depends on whether node j is an output node or a hidden node
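a sketch of this update rule for a single weight; the toy values are placeholders, and in practice the error terms come out of backpropagation through time
```python
learning_rate = 0.01

def update_weight(w_jk, delta_k_t, y_j_t):
    """delta_w = learning rate * error term at time t * output of node j at time t."""
    return w_jk + learning_rate * delta_k_t * y_j_t

# toy values: for an output node, the error term is typically derived from (target - prediction);
# for a hidden node it is a weighted sum of the error terms of the nodes it feeds into
w_new = update_weight(w_jk=0.5, delta_k_t=0.1, y_j_t=0.8)
```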