Kursusgang 10 (Time series models) Flashcards

1
Q

Why use a time series model?

A

It can model dependencies between inputs, which is useful when the inputs are no longer i.i.d.
It is well suited to speech, handwriting and DNA sequences, where the dependencies are between words in a sequence, pen movements and base pairs, respectively.

2
Q

What are some prominent examples of time series models?

A

Autoregressive (AR)
Moving Average (MA)
ARMA
Recurrent Neural Networks
Transformers
Hidden Markov Models

3
Q

What is a Markov chain?

A

The defining characteristic of a Markov chain is the Markov property, which states that the future state depends only on the current state and not on the sequence of states that preceded it.
It is typically represented by a set of states, S = {s_1, s_2, …, s_N}, and a transition matrix that defines the probabilities of moving from one state to another, A = {a_{ij}}, a_{ij} = P(q_{t+1} = s_j | q_t = s_i), 1 ≤ i, j ≤ N.
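As a sketch (all numbers here are invented toy values, not from the course material), the Markov property means the probability of a whole state sequence factorises into the initial-state probability times one transition probability per step:

```python
# Hypothetical two-state Markov chain; pi and A are toy values for illustration.
pi = [0.6, 0.4]          # initial state distribution
A = [[0.7, 0.3],         # a_ij = P(q_{t+1} = s_j | q_t = s_i); each row sums to 1
     [0.4, 0.6]]

def sequence_probability(seq):
    """P(q_1, ..., q_T) = pi[q_1] * product over t of a_{q_t, q_{t+1}}."""
    p = pi[seq[0]]
    for i, j in zip(seq, seq[1:]):
        p *= A[i][j]
    return p

print(sequence_probability([0, 0, 1]))  # 0.6 * 0.7 * 0.3 = 0.126
```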

4
Q

What is a hidden Markov model?

A

In a hidden Markov model, the states of the Markov chain are hidden, meaning they cannot be observed directly.
Instead, each hidden state produces one of M observable symbols according to a probability distribution, known as the observation probability distribution, B = {b_j(k)}, where b_j(k) = P(O_t = v_k | q_t = s_j), 1 ≤ j ≤ N, 1 ≤ k ≤ M.

The initial state distribution is denoted by π = {π_i}, 1 ≤ i ≤ N.
For convenience, use the notation λ = (A, B, π).
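A minimal sketch of the notation λ = (A, B, π) with N = 2 hidden states and M = 3 symbols (all numbers are invented toy values): the joint probability of a state path and an observation sequence is the chain probability with one emission factor per step.

```python
# Hypothetical discrete HMM lambda = (A, B, pi); all values are toy numbers.
pi = [0.6, 0.4]            # initial state distribution
A  = [[0.7, 0.3],          # state transition matrix
      [0.4, 0.6]]
B  = [[0.5, 0.4, 0.1],     # b_j(k) = P(O_t = v_k | q_t = s_j)
      [0.1, 0.3, 0.6]]

def joint_probability(states, obs):
    """P(O, q | lambda): initial prob, then one transition and one emission per step."""
    p = pi[states[0]] * B[states[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= A[states[t - 1]][states[t]] * B[states[t]][obs[t]]
    return p

print(joint_probability([0, 1], [0, 2]))  # 0.6*0.5 * 0.3*0.6 = 0.054
```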

5
Q

What are the three basic problems of hidden Markov models?

A

Scoring:
* Given an observation sequence O = {o_1, o_2, …, o_T} and a model λ = (A, B, π), how does one compute P(O | λ), the probability of the observation sequence?
Solution: Forward-backward algorithm

Matching:
* Given an observation sequence O = {o_1, o_2, …, o_T} and a model λ = (A, B, π), how should a state sequence q = {q_1, q_2, …, q_T} be chosen to be an optimum in some sense?
Solution: Viterbi Algorithm

Training:
* Given an observation sequence O = {o_1, o_2, …, o_T} and an untrained model λ = (A, B, π), how should the model parameters be adjusted to maximize P(O | λ)?
Solution: Baum-Welch Re-estimation procedures.

6
Q

Explain the scoring problem of hidden Markov models.

A

Given an observation sequence O = {o_1, o_2, …, o_T} and a model λ = (A, B, π), how does one compute P(O | λ), the probability of the observation sequence?
A naive approach enumerates all N^T possible state sequences of length T. However, this requires on the order of 2T · N^T calculations, which quickly becomes intractable.
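The naive approach can be sketched directly (toy parameters, invented values; exponential in T, so only usable for tiny examples):

```python
import itertools

# Hypothetical toy HMM parameters (illustrative values only).
pi = [0.6, 0.4]
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]

def naive_score(obs):
    """P(O | lambda) by brute force: sum P(O, q | lambda) over all N**T paths."""
    N, T = len(pi), len(obs)
    total = 0.0
    for q in itertools.product(range(N), repeat=T):   # N**T state sequences
        p = pi[q[0]] * B[q[0]][obs[0]]
        for t in range(1, T):
            p *= A[q[t - 1]][q[t]] * B[q[t]][obs[t]]
        total += p
    return total

print(naive_score([0, 2]))  # 0.091 for these toy numbers
```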

7
Q

What is the forward-backward algorithm?

A

The forward-backward algorithm is a technique used in hidden Markov models to calculate the posterior probabilities of hidden states given an observed sequence.

8
Q

Mathematically define the forward algorithm in the forward-backward algorithm.

A

The forward variable α_t(i) = P(o_1o_2…o_t, q_t = i | λ) is the probability of the partial observation sequence until time t and for state at time t being i, given the model λ.
α_t(i) can be computed inductively:
* Initialisation, for 1 ≤ i ≤ N:
α_1(i) = π_i * b_i(o_1)
* Induction, for 1 ≤ t ≤ T-1, 1 ≤ i ≤ N:
α_{t+1}(i) = [ \sum_{j=1}^N α_t(j) * a_{ji} ] * b_i(o_{t+1})
* Termination:
P(O | λ) = \sum_{i=1}^N α_T(i)
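The three steps translate almost line for line into code (toy parameters, invented values; linear in T rather than exponential):

```python
# Hypothetical toy HMM parameters (illustrative values only).
pi = [0.6, 0.4]
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]

def forward(obs):
    """Returns P(O | lambda) via the forward variables alpha_t(i)."""
    N = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(N)]            # initialisation
    for t in range(1, len(obs)):                                # induction
        alpha = [sum(alpha[j] * A[j][i] for j in range(N)) * B[i][obs[t]]
                 for i in range(N)]
    return sum(alpha)                                           # termination

print(forward([0, 2]))  # 0.091 for these toy numbers
```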
9
Q

Mathematically define the backward algorithm in the forward-backward algorithm.

A

Used for the scoring problem in hidden Markov models.
The backward variable β_t(i) = P(o_{t+1}o_{t+2}…o_T | q_t = i, λ) is the probability of the partial observation sequence from time t+1 to the end, given state i at time t and the model λ. Like α_t(i), it can be computed inductively:
* Initialisation, for 1 ≤ i ≤ N:
β_T(i) = 1
* Induction, for t = T-1, T-2, …, 1 and 1 ≤ i ≤ N:
β_t(i) = \sum_{j=1}^N a_{ij} * b_j(o_{t+1}) * β_{t+1}(j)
* Termination:
P(O | λ) = \sum_{i=1}^N π_i * b_i(o_1) * β_1(i)
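A sketch with toy parameters (invented values); note that the termination step yields the same P(O | λ) as the forward algorithm:

```python
# Hypothetical toy HMM parameters (illustrative values only).
pi = [0.6, 0.4]
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]

def backward(obs):
    """Returns P(O | lambda) via the backward variables beta_t(i)."""
    N, T = len(pi), len(obs)
    beta = [1.0] * N                                   # initialisation: beta_T(i) = 1
    for t in range(T - 2, -1, -1):                     # induction, t = T-1, ..., 1
        beta = [sum(A[i][j] * B[j][obs[t + 1]] * beta[j] for j in range(N))
                for i in range(N)]
    return sum(pi[i] * B[i][obs[0]] * beta[i] for i in range(N))   # termination

print(backward([0, 2]))  # 0.091 for these toy numbers
```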
10
Q

Explain the matching problem of hidden Markov models

A

Given an observation sequence O = {o_1, o_2, …, o_T} and a model λ = (A, B, π), how should a state sequence q = {q_1, q_2, …, q_T} be chosen to be an optimum in some sense?
In applications like speech recognition or gene prediction, the hidden states represent the underlying structure we want to infer.
One optimality criterion is to choose the state q_t that is individually most likely at each time t,

q_t^* = argmax_{1 ≤ i ≤ N} [α_t(i) * β_t(i)] / (\sum_{j=1}^N α_t(j) * β_t(j)), 1 ≤ t ≤ T.

However, the resulting state sequence may not even be a valid path, e.g. if a_{ij} = 0 for some consecutive pair of chosen states.

Another optimality criterion is to find the single best state sequence (path), i.e., to maximize P(q , O | λ), which gives rise to the Viterbi algorithm.
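As a sketch (toy parameters, invented values), the "individually most likely" criterion combines the forward and backward variables at each t; the normalising denominator is the same for all i, so the argmax can ignore it:

```python
# Hypothetical toy HMM parameters (illustrative values only).
pi = [0.6, 0.4]
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]

def individually_most_likely(obs):
    """argmax_i gamma_t(i) at each t, where gamma_t(i) ~ alpha_t(i) * beta_t(i)."""
    N, T = len(pi), len(obs)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for t in range(1, T):
        alpha.append([sum(alpha[t - 1][j] * A[j][i] for j in range(N)) * B[i][obs[t]]
                      for i in range(N)])
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    # The denominator is common to all i at a given t, so skip it in the argmax.
    return [max(range(N), key=lambda i: alpha[t][i] * beta[t][i]) for t in range(T)]

print(individually_most_likely([0, 2]))  # [0, 1] for these toy numbers
```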

11
Q

What does the Viterbi algorithm do?

A

It is related to the matching problem in hidden Markov models. It finds the most likely sequence of hidden states for a given observation sequence by maximizing the joint probability of transitions and emissions.
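A minimal sketch with toy parameters (invented values): δ_t(i) tracks the probability of the best path ending in state i at time t, and back-pointers recover the path.

```python
# Hypothetical toy HMM parameters (illustrative values only).
pi = [0.6, 0.4]
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]

def viterbi(obs):
    """Most likely hidden state path: maximises P(q, O | lambda)."""
    N = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(N)]    # best prob ending in state i
    psi = []                                            # back-pointers per step
    for t in range(1, len(obs)):
        best = [max(range(N), key=lambda j: delta[j] * A[j][i]) for i in range(N)]
        delta = [delta[best[i]] * A[best[i]][i] * B[i][obs[t]] for i in range(N)]
        psi.append(best)
    path = [max(range(N), key=lambda i: delta[i])]      # best final state
    for back in reversed(psi):                          # trace back-pointers
        path.insert(0, back[path[0]])
    return path

print(viterbi([0, 2]))  # [0, 1] for these toy numbers
```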

12
Q

Explain the training problem of hidden Markov models

A

Given an observation sequence O = {o_1, o_2, …, o_T} and an untrained model λ = (A, B, π), how should the model parameters be adjusted to maximize the likelihood of a given observation sequence, P(O | λ)?
There is no efficient algorithm for global optimisation. This problem is crucial when the hidden Markov model's parameters are unknown and must be learned from data.
A solution is Baum-Welch re-estimation, which refines the parameters iteratively to locally maximize P(O | λ).

13
Q

Explain the Baum-Welch re-estimation algorithm

A

It is a special case of the expectation-maximization algorithm that computes probabilities using the current model, λ, and then refines λ to λ* such that P(O | λ) is locally maximized.
It uses α and β from the forward-backward algorithm.
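A sketch of a single re-estimation step for a discrete HMM on one observation sequence (toy parameters, invented values). The E-step computes γ_t(i) and ξ_t(i, j) from α and β; the M-step re-estimates λ from the resulting expected counts:

```python
# Hypothetical toy HMM parameters (illustrative values only).
pi = [0.6, 0.4]
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]

def baum_welch_step(obs, pi, A, B):
    """One EM step; repeating it locally maximises P(O | lambda)."""
    N, M, T = len(pi), len(B[0]), len(obs)
    # E-step: forward (alpha) and backward (beta) tables.
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for t in range(1, T):
        alpha.append([sum(alpha[t - 1][j] * A[j][i] for j in range(N)) * B[i][obs[t]]
                      for i in range(N)])
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    PO = sum(alpha[T - 1])
    # gamma_t(i): prob of state i at t; xi_t(i, j): prob of transition i -> j at t.
    gamma = [[alpha[t][i] * beta[t][i] / PO for i in range(N)] for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / PO
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    # M-step: re-estimate lambda = (A, B, pi) from expected counts.
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1)) for j in range(N)]
             for i in range(N)]
    new_B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) /
              sum(gamma[t][j] for t in range(T)) for k in range(M)]
             for j in range(N)]
    return new_pi, new_A, new_B

new_pi, new_A, new_B = baum_welch_step([0, 2, 1, 0], pi, A, B)
```

The re-estimated π, rows of A and rows of B remain valid probability distributions by construction.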

14
Q

Can a hidden Markov model have continuous densities?

A

Yes. Replace the discrete observation probabilities, b_j(k) by a continuous PDF, b_j(x). The PDF is often represented as a mixture of Gaussians.
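A sketch of such a continuous emission density as a Gaussian mixture (the weights, means and variances below are invented toy values):

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def b_j(x, weights, means, variances):
    """b_j(x) = sum_m c_jm * N(x; mu_jm, sigma_jm^2), with the c_jm summing to 1."""
    return sum(c * gaussian_pdf(x, m, v)
               for c, m, v in zip(weights, means, variances))

# Toy two-component mixture for one hidden state (hypothetical values).
print(b_j(0.5, weights=[0.7, 0.3], means=[0.0, 2.0], variances=[1.0, 0.5]))
```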
