MIDA 2 Flashcards

Question

[Ch1] How does noise affect the Hankel Matrix in the 4SID algorithm?

Answer 1

Noise adds distortions, leading to overestimated ranks and inaccurate matrix factorization. SVD helps filter out noise by retaining only dominant singular values.

Answer 2

A larger matrix provides better precision but increases computational complexity. The ideal size balances accuracy and feasibility.

Answer 3

By visually analyzing the block scheme: - Observability: Check whether every state contributes, directly or indirectly, to the output. - Controllability: Check whether the input can influence every state through some path.

Answer 4

- Highly sensitive to noise in the impulse response. - Inefficient use of data

Answer 5

It adapts to handle general input signals by modifying the data organization and ID Steps, but increases complexity.

Answer 6

It is used to decompose a noisy Hankel matrix into system-related singular values and noise-related singular values.

Answer 7

The number of non-negligible (system-related) singular values corresponds to the system order. In the ideal case there is a sharp drop (the "knee") after the system-related singular values, which makes the order evident.

Answer 8

H = USV^T where: - U: Contains the left singular vectors related to observability. - S: A diagonal matrix with singular values, indicating the system's dynamics and noise levels. - V: Contains the right singular vectors related to controllability.

Answer 9

1.- Build the Hankel matrix from the measured dataset (W(1), W(2), ..., W(n)).
2.- Perform the SVD and separate system-related from noise-related singular values.
3.- Determine the system order by looking for a "jump" or "knee" in the singular value curve.
4.- Reconstruct the clean Hankel matrix: H_system = U_n S_n V_n^T.
5.- Factorize H_system into the extended observability and controllability matrices.
6.- Estimate the state-space matrices.
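The six steps can be sketched in a few lines of NumPy. This is a minimal illustration, not the course's reference implementation: the function name `foursid`, the relative threshold used to pick the order, and the test system are all assumptions made for the example, and the data are noise-free impulse-response samples.

```python
import numpy as np

def foursid(w, n_rows, n_cols, order=None, rel_tol=1e-8):
    """Sketch of 4SID: w[0] is W(1), w[1] is W(2), and so on."""
    w = np.asarray(w, dtype=float)
    # Step 1: Hankel matrix whose (i, j) entry (0-based) is W(i + j + 1).
    hankel = np.array([[w[i + j] for j in range(n_cols)] for i in range(n_rows)])
    # Step 2: singular value decomposition.
    U, s, Vt = np.linalg.svd(hankel)
    # Step 3: order = number of singular values above a relative threshold
    # (with noisy data one would inspect the curve for a knee instead).
    if order is None:
        order = int(np.sum(s > rel_tol * s[0]))
    # Steps 4-5: keep the dominant part and split it as H_system = O @ R.
    root = np.sqrt(s[:order])
    O = U[:, :order] * root              # extended observability matrix
    R = root[:, None] * Vt[:order, :]    # extended controllability matrix
    # Step 6: H is the first row of O, G the first column of R,
    # and F solves O[:-1] @ F = O[1:] (shift invariance).
    H = O[:1, :]
    G = R[:, :1]
    F = np.linalg.pinv(O[:-1, :]) @ O[1:, :]
    return F, G, H, s

# Noise-free impulse response of a hypothetical second-order system
# with poles at 0.5 and -0.3.
t = np.arange(7)
w_true = 0.5 ** t + 2.0 * (-0.3) ** t
F_hat, G_hat, H_hat, sing_vals = foursid(w_true, n_rows=4, n_cols=4)
print(np.sort(np.linalg.eigvals(F_hat).real))
```

The estimated F is only defined up to a change of state coordinates, so the meaningful comparison is on its eigenvalues and on the reproduced impulse response, not on the matrix entries.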

Answer 10

- Larger q and d improve precision but make the computation harder. - A balanced choice is recommended: q > d/2.

Answer 11

A time series is a sequence of data points measured at successive equally spaced points in time. It represents data like pollutant concentrations, stock values, or sport statistics.

Answer 12

Because they focus on the output y(t) without measuring or modeling all the inputs (these can be too many, unmeasurable, or have minimal influence on the output).

Answer 13

Time series are modeled using a fictitious input e(t), which is considered white noise. This input is not physically available but serves as a mathematical construct.

Answer 14

Modeling as a stationary stochastic process integrates practical data into a theoretical framework, enabling predictions and analysis of y(t) based on consistent statistical properties.

Answer 15

- The mean value m_y is constant over time. - The covariance function gamma_y(tau) depends only on the time difference tau. - The spectrum S_y(omega) is the frequency-domain representation of the covariance function, obtained via a Fourier transform.

Answer 16

- Time: White noise is completely uncorrelated, making it unpredictable. - Frequency: It has a flat spectrum, meaning energy is evenly distributed across all frequencies.

Answer 17

It is an Auto Regressive Moving Average model, containing: - AR: a linear combination of past outputs. - MA: a linear combination of current and past inputs (white noise). y(t) = Sum_i=1^N a_i y(t-i) + Sum_j=0^M c_j e(t-j)

Answer 18

- ARMA (0, M): Moving Average model. - ARMA (N, 0): Auto regressive model

Answer 19

Stationarity is ensured if all poles of the transfer function are inside the unit circle.

Answer 20

1.- C(z) and A(z) have the same degree. 2.- Both are monic (highest degree term = 1) 3.- They are co-prime (no common factors). 4.- All roots of C(z) and A(z) lie strictly inside the unit circle. EX: y(t) = (z + 1/2) / (z - 1/3) e(t)
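As a rough illustration, the four conditions can be checked numerically. The sketch below assumes C(z) and A(z) are given as coefficient lists in descending powers of z with a non-zero leading coefficient; the function name `is_canonical` and the tolerance are illustrative choices.

```python
import numpy as np

def is_canonical(c, a, tol=1e-9):
    """Check the four canonical-representation conditions for C(z)/A(z)."""
    c = np.asarray(c, dtype=float)
    a = np.asarray(a, dtype=float)
    same_degree = len(c) == len(a)
    monic = abs(c[0] - 1.0) < tol and abs(a[0] - 1.0) < tol
    zeros, poles = np.roots(c), np.roots(a)
    # Co-prime: no zero coincides with a pole (up to the tolerance).
    coprime = all(abs(z - p) > tol for z in zeros for p in poles)
    # All roots strictly inside the unit circle.
    inside = all(abs(r) < 1.0 for r in np.concatenate([zeros, poles]))
    return bool(same_degree and monic and coprime and inside)

# The example from the card: (z + 1/2) / (z - 1/3).
print(is_canonical([1, 0.5], [1, -1 / 3]))
```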

Answer 21

It minimizes the model order, ensuring no redundancy while maintaining equivalence with the original representation.

Answer 22

To estimate future values y(N + k | N) using data up to the present time N, and k being the prediction horizon.

Answer 23

A prediction is optimal if the prediction error is uncorrelated with the predictor, meaning no additional information can improve the prediction.

Answer 24

- Predictor from noise: uses white noise e(t) but is impractical since e(t) is un-measurable. - Predictor from data: Uses past outputs of y(t) and is practical for real-world applications.

Answer 25

1. Rewrite y(t) in terms of white noise e(t) using canonical representation. 2. Perform Polynomial division of C(z) / A(z) to separate predictable and unpredictable components. 3. Use predictable past data for the prediction.
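The polynomial division in step 2 can be sketched as k steps of long division. The code assumes polynomials written in powers of z^-1 (coefficient lists starting from the z^0 term); the names `k_step_division`, `E`, and `R` are chosen for the example, with E(z) the quotient (unpredictable part) and z^-k R(z) the remainder (predictable part).

```python
import numpy as np

def k_step_division(c, a, k):
    """Return (E, R) such that C(z) = E(z) A(z) + z^-k R(z)."""
    c = np.asarray(c, dtype=float)
    a = np.asarray(a, dtype=float)
    n = max(len(c), len(a)) + k
    rem = np.zeros(n)
    rem[:len(c)] = c
    e = np.zeros(k)
    for i in range(k):
        # Next quotient coefficient, then subtract its contribution.
        e[i] = rem[i] / a[0]
        rem[i:i + len(a)] -= e[i] * a
    return e, rem[k:]

# AR(1) example: y(t) = 0.5 y(t-1) + e(t), so C(z) = 1 and A(z) = 1 - 0.5 z^-1.
E, R = k_step_division(c=[1.0], a=[1.0, -0.5], k=2)
print(E, R)
```

For this AR(1) example the quotient is E(z) = 1 + 0.5 z^-1 and the remainder coefficient is 0.25, so the 2-step predictor from data reduces to y^hat(t+2|t) = 0.25 y(t).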

Answer 26

- The prediction error variance increases with the horizon k, since predicting farther into the future introduces more uncertainty. - As k → ∞, predictions converge to the process mean.

Answer 27

An all-pass filter has the form: T(z) = 1/a (z+a)/(z+1/a) where |a| < 1. It preserves the spectral characteristics of the signal but introduces a phase shift
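A quick numerical check of the all-pass property: the magnitude of T(z) on the unit circle is 1 at every frequency, so only the phase changes. The value a = 0.5 and the frequency grid are arbitrary choices for the example.

```python
import numpy as np

def allpass(z, a):
    """All-pass filter T(z) = (1/a) (z + a) / (z + 1/a)."""
    return (1.0 / a) * (z + a) / (z + 1.0 / a)

omega = np.linspace(0.0, np.pi, 200)
response = allpass(np.exp(1j * omega), a=0.5)
gain = np.abs(response)        # flat, equal to 1 at every frequency
phase = np.angle(response)     # frequency-dependent phase shift
```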

Answer 28

They simplify representations by removing unstable zeros while maintaining spectral equivalence.

Answer 29

ESR is a normalized metric that compares the prediction error variance to the signal variance: ESR(k) = var[y(t+k) - y^hat(t+k|t)] / var[y(t)]. It evaluates the prediction quality: 0 = perfect, 1 = trivial (predicting the mean).
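As a worked example, consider the AR(1) process y(t) = a y(t-1) + e(t) with |a| < 1 (an assumption made for the illustration). Its k-step prediction error variance is lambda^2 (1 + a^2 + ... + a^(2(k-1))) and its variance is lambda^2 / (1 - a^2), so the ratio simplifies to 1 - a^(2k). The function name `esr_ar1` is hypothetical.

```python
def esr_ar1(a, k, lam2=1.0):
    """Ratio of k-step prediction error variance to signal variance for AR(1)."""
    error_var = lam2 * sum(a ** (2 * i) for i in range(k))
    signal_var = lam2 / (1.0 - a ** 2)
    return error_var / signal_var

for k in (1, 2, 5, 20):
    print(k, esr_ar1(0.5, k))
```

The ratio grows toward 1 as k increases, which matches the idea that long-horizon predictions collapse to the process mean.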

Answer 30

White noise is highly unpredictable due to its flat spectrum and lack of correlation, making optimal prediction inherently less accurate

Answer 31

In the absence of a theoretical model of the system, a given model is needed to make predictions, as it provides mathematical basis to approximate the system's behavior using the limited dataset

Answer 32

- AR process: Predictions depend only on past outputs y(t-1), y(t-2), ..., making computations easier. - MA process: Predictions depend on both past outputs and unmeasured past white noise e(t), requiring assumptions about initial conditions.

Answer 33

All poles and zeros must lie strictly inside the unit circle, ensuring stability and minimal representation.

Answer 34

It is impossible when the poles or zeros lie on or outside the unit circle, as reciprocal transformations fail to satisfy stability and minimality requirements.

Answer 35

ARIMA models include Integration to handle non-stationary processes, represented by poles at z=1

Answer 36

It models cumulative effects over time, with the time-domain behavior y(t) = y(t-1) + e(t), representing unpredictable, non-stationary patterns.

Answer 37

Using the Prediction Error Method: J_N(theta) = 1/N SUM_t=1^N [ y(t) - phi^T(t) theta ]^2 where: phi(t) is the vector of past outputs and theta contains the model parameters.

Answer 38

The predictor is linear with respect to parameters, making the performance index quadratic and solvable using explicit least-squares solutions.
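Because the predictor is linear in the parameters, the minimizer can be obtained with an ordinary least-squares solve. The sketch below simulates a hypothetical AR(2) process and recovers its coefficients; the helper name `fit_ar`, the true coefficients, and the sample size are assumptions for the example.

```python
import numpy as np

def fit_ar(y, n):
    """Least-squares estimate of theta in y(t) = phi(t)^T theta + e(t)."""
    y = np.asarray(y, dtype=float)
    # Column i holds y(t - i), so each row is phi(t) = [y(t-1), ..., y(t-n)].
    Phi = np.column_stack([y[n - i:len(y) - i] for i in range(1, n + 1)])
    theta, *_ = np.linalg.lstsq(Phi, y[n:], rcond=None)
    return theta

rng = np.random.default_rng(0)
a_true = np.array([0.5, -0.3])
e = rng.standard_normal(5000)
y = np.zeros(5000)
for t in range(2, 5000):
    y[t] = a_true[0] * y[t - 1] + a_true[1] * y[t - 2] + e[t]

theta_hat = fit_ar(y, n=2)
print(theta_hat)
```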

Answer 39

The presence of the moving average part introduces non-linearity, making the performance index non-quadratic and therefore requiring iterative methods to find optimal parameters

Answer 40

By testing multiple models of different orders and choosing the one with the minimum prediction error, while balancing simplicity and generalization through cross-validation.

Answer 41

ARMAX adds an exogenous input component (u(t)) to the ARMA framework enabling modeling of input-output systems, which is very useful for control applications.

Answer 42

- When measurable inputs significantly influence the output. - For control applications requiring input-output relationships

Answer 43

- Simplicity and fewer variables to measure. - Suitable when only past output values are available and no dominant input exists.

Answer 44

By splitting the dataset into training and validation sets, models are evaluated on unseen data to ensure generalization, avoiding overfitting or underfitting.

Answer 45

The model with the lowest prediction error variance on the validation set is considered optimal

Answer 46

To confirm that the residual error contains no predictable patterns, indicating the model has captured all significant dynamics of the system.

Answer 47

- The assumptions were incorrect (linearity) - Some important dynamics are missing (Nonlinearities or higher-order terms)

Answer 48

y(t) = B(z) / A(z) z^(-k) u(t) + C(z) / A(z) e(t)

Answer 49

1.- Collect and preprocess the dataset.
2.- Choose the ARMAX model structure (n, m, p, k) based on prior knowledge, or start with the simplest structure and increase complexity.
3.- Derive the predictor from available data: y^hat(t|t-k) = B(z)E(z)/C(z) u(t-k) + R(z)/C(z) y(t), where E(z) and R(z) are the quotient and the remainder of the k-step division C(z)/A(z) = E(z) + R(z)/A(z).
4.- Use PEM to define the performance index J_N.
5.- Minimize J_N to estimate the model parameters.
6.- Check the residuals and/or use cross-validation to validate the model.
7.- Perform the forecast.

Answer 50

Kalman Filter is an algorithm based on state-space representation, used for state estimation and software sensing in control and modeling applications. It is model-based and originates from classical modeling and control theory.

Answer 51

It assumes a state-space model of the system with linear, time-invariant dynamics and incorporates two types of noise, state noise v₁(t) and output noise v₂(t)

Answer 52

1. k-steps ahead prediction of Output y(t+k|t), predicting future outputs 2. k- steps ahead prediction of States x(t+k|t), predicting future state variables 3. Filtering of the states x(t|t), estimating current state variables based on present data

Answer 53

It enables state estimation when there are fewer sensors than states. Critical for: - Control design - Monitoring

Answer 54

State equation: x(t +1|t) = Fx(t|t-1) + K(t) e(t) Output Equation: y(t|t-1) = Hx(t|t-1)

Answer 55

State Block: FP(t)F^T + V₁ Output Block: HP(t)H^T + V₂ Mix Block: FP(t)H^T + V₁₂

Answer 56

K(t) = Mix_Block * (Output_Block)^-1 DRE: P(t+1) = State_Block - (Mix_Block) * (Output_Block)^-1 * (Mix_Block)^T
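The gain and DRE formulas translate directly into one update of the 1-step-ahead predictor. This is a minimal sketch in which the function name `kf_step` and the scalar test system are assumptions; the three blocks follow the State, Output, and Mix definitions, with V12 defaulting to zero.

```python
import numpy as np

def kf_step(x_pred, P, y, F, H, V1, V2, V12=None):
    """One predictor update: returns x(t+1|t), P(t+1), K(t)."""
    if V12 is None:
        V12 = np.zeros((F.shape[0], H.shape[0]))
    state_block = F @ P @ F.T + V1
    output_block = H @ P @ H.T + V2
    mix_block = F @ P @ H.T + V12
    K = mix_block @ np.linalg.inv(output_block)
    e = y - H @ x_pred                      # innovation e(t) = y(t) - y(t|t-1)
    x_next = F @ x_pred + K @ e
    P_next = state_block - K @ mix_block.T  # DRE
    return x_next, P_next, K

# Scalar example: F = 0.5, H = 1, V1 = V2 = 1, P(t) = 1, x(t|t-1) = 0, y(t) = 2.
F = np.array([[0.5]])
H = np.array([[1.0]])
V1 = np.array([[1.0]])
V2 = np.array([[1.0]])
x_next, P_next, K = kf_step(np.array([[0.0]]), np.array([[1.0]]),
                            np.array([[2.0]]), F, H, V1, V2)
print(K, x_next, P_next)
```

In this scalar case the blocks are 1.25, 2, and 0.5, which gives K = 0.25, x(t+1|t) = 0.5, and P(t+1) = 1.125.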

Answer 57

1. State noise (v₁(t)): - Mean: E[v₁(t)] = 0 - Covariance: E[v₁(t) v₁(t)^T] = V₁ (positive semi-definite). 2. The same is true for the output noise v₂(t), but its covariance V₂ is positive definite.

Answer 58

1.- Multi-step Predictor. 2.- Filter Form. 3.- Inclusion of Exogenous Inputs. 4.- Time-Varying Systems. 5.- Nonlinear Systems -> Extended Kalman Filter (EKF)

Answer 59

The ARE provides a steady-state solution for "infinite-horizon" problems, and is NOT time varying. (uses P_bar) DRE is a time-varying solution for finite-horizon problems. (Uses P(t))

Answer 60

Starting from the 1-step predictor, propagate future states by repeatedly multiplying by F: x(t+k|t) = F^(k-1) x(t+1|t) y(t+k | t)=H x(t+k |t)

Answer 61

The asymptotic KF uses a steady-state gain K, obtained when P(t) converges to a constant P_bar. It simplifies computations and ensures stability in LTI systems.
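One simple way to obtain the steady-state gain is to iterate the DRE until P(t) stops changing. The sketch below does this with V12 = 0; the function name `steady_state_gain`, the iteration count, and the scalar system are assumptions for the example, and convergence is only expected under the conditions of the asymptotic theorems.

```python
import numpy as np

def steady_state_gain(F, H, V1, V2, n_iter=500):
    """Iterate the DRE from P = I (V12 = 0) and return (K_bar, P_bar)."""
    P = np.eye(F.shape[0])
    for _ in range(n_iter):
        mix = F @ P @ H.T
        out = H @ P @ H.T + V2
        P = F @ P @ F.T + V1 - mix @ np.linalg.inv(out) @ mix.T
    K = (F @ P @ H.T) @ np.linalg.inv(H @ P @ H.T + V2)
    return K, P

F = np.array([[0.5]])
H = np.array([[1.0]])
V1 = np.array([[1.0]])
V2 = np.array([[1.0]])
K_bar, P_bar = steady_state_gain(F, H, V1, V2)
# The predictor dynamics F - K_bar H should have eigenvalues inside the unit circle.
closed_loop = np.linalg.eigvals(F - K_bar @ H)
print(K_bar, P_bar, closed_loop)
```

For this scalar system the ARE reduces to P^2 - 0.25 P - 1 = 0, whose positive root is about 1.1328.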

Answer 62

1: - V₁₂=0 - All Eig (F) inside Unit Circle 2: - V₁₂=0 - (F, H) is Fully Observable. - (F, Γ) is Fully Controllable Where Γ * Γ^T = V₁

Answer 63

1.- Linearizes the system around the current state at each time step using jacobians: F(t) = ∂f(x(t), u(t))/∂x , H(t) = ∂h(x(t))/∂x 2.- Applies KF equations with these locally linearized matrices. 3.- Recomputes F(t) and H(t) at each time step.

Answer 64

- Lack of guaranteed stability. - Computationally demanding due to repeated linearization. - Sensitive to model inaccuracies.

Answer 65

It estimates unmeasurable states, reduces costs by minimizing the need for physical sensors, and provides redundancy in safety-critical systems.

Answer 66

Estimating the vertical speed of a seat in an off-road vehicle (tractor). It uses a white-box model to describe the seat dynamics, incorporates accelerometer data, and applies the KF for software sensing to estimate the unmeasurable speed.

Answer 67

- Accurately modeling noise covariances V₁ and V₂ - Ensuring numerical stability of the Riccati Equations. - Dealing with computational demands, especially in high-dimensional spaces.

Answer 68

By embedding noise dynamics into the state-space model using state extension, transforming non-white noise into a compatible form.

Answer 69

- White-Box relies on a predefined model of the system and does not require a training set. - Black-Box uses system identification techniques to estimate models and requires a training dataset where states must be physically measured.

Answer 70

The training dataset is required to estimate the system states during the training phase. These measurements are replaced by the estimation algorithm during the production phase.

Answer 71

For LTI systems, software sensing involves estimating the transfer functions S_ux(z, θ) and S_yx(z, θ) using parametric system identification techniques. S_ux and S_yx are the transfer functions that map the input and the output measurements, respectively, into the state estimate.

Answer 72

1.- Data collection: Gather input, output and state measurements in the training phase. 2.- System Identification: Estimate the parametric T.F. S_ux and S_yx. 3.- Deployment: Use the identified model to estimate states in real-time, replacing physical sensors.

Answer 73

These would require more complex architectures such as: - Recurrent Neural Networks - FIR-Based Nonlinear Architectures. - Recursive Architectures.

Answer 74

- Recurrent Neural Networks: Handles nonlinear dynamics but has issues with stability and training complexity. - FIR-Based: Ensures stability by design but may become computationally intensive in high-dimensional systems. - Recursive (IIR): Reduces input dimensionality but risks instability during production.

Answer 75

It ensures stability due to its finite impulse response scheme and simplifies training by focusing only on the non-linear static part.

Answer 76

Advantages: - Does not require a predefined model of the system. - Can achieve higher accuracy when a high-quality dataset is available. Disadvantages: - Cannot estimate completely unmeasurable variables. - Requires a training dataset and supervised learning. - Lacks interpretability of results.

Answer 77

Gray-Box combines white and black-box modeling approaches. It uses physical equations (White-Box) with some unknown parameters estimated from data (Black-Box), making it suitable for scenarios with partial physical insights.

Answer 78

The K.F. estimates both system states and unknown parameters by treating the parameters as extended states in the system.

Answer 79

Unknown parameters are added as state variables with fictitious dynamics: θ(t+1)=θ(t)+v_θ(t) where: v_θ(t) is a fictitious noise, making the KF able to adjust parameters.

Answer 80

Fictitious noise ensures that the K.F. does not over-rely on initial conditions and adjusts parameters dynamically to fit the data.

Answer 81

1.- Start with physical model: mx_dd + cx_d +kx = F(t) 2.- Discretize using a method like Euler Forward. 3.- Extend the state to include c as a fictitious variable. 4.- Apply the K.F. to estimate both x(t) and c.
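Steps 1 to 3 can be sketched as follows: an Euler-forward step of the mass-spring-damper with c appended to the state, plus the Jacobian that an EKF would need, since the product c * velocity makes the extended system nonlinear. The state ordering [position, velocity, c], the function names, and the numerical values are assumptions for the example; the fictitious noise on c is omitted from the deterministic step.

```python
import numpy as np

def f_ext(s, force, m, k, dt):
    """Euler-forward step of the extended state s = [position, velocity, c]."""
    p, v, c = s
    return np.array([
        p + dt * v,
        v + dt * (force - c * v - k * p) / m,
        c,  # fictitious dynamics c(t+1) = c(t)
    ])

def jacobian_ext(s, m, k, dt):
    """Analytic Jacobian of f_ext with respect to [position, velocity, c]."""
    p, v, c = s
    return np.array([
        [1.0, dt, 0.0],
        [-dt * k / m, 1.0 - dt * c / m, -dt * v / m],
        [0.0, 0.0, 1.0],
    ])

s0 = np.array([0.1, -0.2, 0.7])
m, k, dt = 2.0, 3.0, 0.01
F_t = jacobian_ext(s0, m, k, dt)
print(F_t)
```

The entry -dt * v / m in the second row is what couples the parameter c to the measured dynamics and lets the filter correct it from data.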

Answer 82

Non-linearities arise from interactions between state variables and unknown parameters, requiring methods like the EKF for estimation.

Answer 83

SEM minimizes the simulation error between measured outputs and simulated outputs by optimizing parameters in a system model. It does not predict states but evaluates a performance index offline (Not like K.F)

Answer 84

SEM is preferred for systems with a small number of constant parameters, especially in offline identification scenarios.

Answer 85

- Small v_θ: Slow convergence but low variance in parameter estimates. - Big v_θ: Fast convergence but high variance in estimates. The choice depends on the application's precision and speed requirements.

Answer 86

The ratio of measured variables (Sensors) to estimated parameters must be high enough to ensure reliable estimation. Ex: 3 sensors, 2 parameters, and 5 states is a GOOD system. 3 sensors, 15 parameters and 10 states is a BAD system

Answer 87

If det(A) != 0, the matrix is full rank, so the rank equals the size of the matrix (a 3x3 matrix has rank 3). If det(A) = 0, the matrix is not full rank, so the rank is smaller than the size of the matrix.

Answer 88

Given matrix A = [a b; c d] the inverse is A^-1 = 1/det(A) * [d -b; -c a], with det(A) = ad - bc != 0.
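A small sketch of the 2x2 inverse, using the adjugate scaled by 1/det(A). The function name `inverse_2x2` is chosen for the example.

```python
def inverse_2x2(a, b, c, d):
    """Inverse of [[a, b], [c, d]] as the adjugate divided by the determinant."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular (det = 0), no inverse exists")
    return [[d / det, -b / det], [-c / det, a / det]]

print(inverse_2x2(1, 2, 3, 4))  # [[-2.0, 1.0], [1.5, -0.5]]
```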

Answer 89

PSD=GAMMA = |H(e^jw)|^2 * var(WN)

Answer 90

var[y(t)] = 1/(1-a^2) * lambda^2, where a is the coefficient in y(t) = a*y(t-1) + e(t) and lambda^2 is the variance of the input white noise e(t).

Answer 91

For y(t) = [C0 + C1 z^(-1) + C2 z^(-2) .... C_m z^(-m)] e(t) var[y(t)] = SUM_j=0^m C_j^2 * Lambda^2
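The MA variance formula is easy to check numerically. As an extra illustration (an assumption of this example, not part of the card), a stable AR(1) process can be written as an MA of infinite order with coefficients a^j, so truncating that expansion reproduces lambda^2 / (1 - a^2).

```python
import numpy as np

def ma_variance(c, lam2):
    """Variance of y(t) = sum_j c_j e(t-j) with e(t) ~ WN(0, lam2)."""
    return lam2 * float(np.sum(np.square(c)))

# MA(1) example: y(t) = e(t) + 0.5 e(t-1) with unit noise variance.
print(ma_variance([1.0, 0.5], lam2=1.0))  # 1.25

# AR(1) y(t) = a y(t-1) + e(t) as a truncated MA(inf) with c_j = a^j.
a, lam2 = 0.5, 2.0
c_ar1 = a ** np.arange(200)
print(ma_variance(c_ar1, lam2), lam2 / (1.0 - a ** 2))
```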

Answer 92

e^jw+e^-jw = 2 cos(w)

Answer 93

GAMMA = W(z=e^jw) * W(z=e^-jw) * Lambda_n, where Lambda_n is the variance of the white noise n at the input of the transfer function W.

Answer 94

J_N = 1/N SUM_t=0^N-1 E(t+1|t)^2 Where N is the number of samples and E is the Prediction Error Also: J = var[ E(t+1

Answer 95

1.- Given u(t) and y(t) compute an estimate of the system Frequency Response. 2.- Find the model parameters using fitting of W^hat(ω) min_θ SUM_ω |W^hat(ω) - W(ω, θ)|

Answer 96

In the first step: the non-parametric description of the Frequency Response needs to be estimated with different tools.

Answer 97

In general we need to use "longer" signals, to ensure that the transients due to unknown initial conditions have disappeared.
- Single-Tone Sinewave: u(t) = A cos(k w_o t), with w_o = 2pi/T being the frequency resolution and k = 1, ..., pi/w_o.
- Multi-Tone Sinewave: u(t) = SUM_k A cos(k w_o t + Φ_k), where Φ_k is a random phase between 0 and 2pi.
- Pseudo-Random Binary Signal (PRBS): Uses the XOR operator and signal delays. The change of state is random.
- Gaussian Noise: u(t) ~ WN(m, lambda^2). It is the only non-periodic one.

Answer 98

- It is much less time consuming (It allows testing many frequencies in a single input) - Within 1 period the signal is asymptotically Gaussian (Non-predictable), after the 1st period it can be predicted.

Answer 99

y(ω) = W(ω) U(ω)

Answer 100

Using the Discrete Fourier Transform (DFT) (NOT the Discrete Time Fourier Transform DTFT).

Answer 101

DTFT uses a summation from -infinity to +infinity and is a function of a continuous frequency variable, which is not useful for practical applications. DFT uses a sum over the available samples only.

Answer 102

V_D(ω_D) = SUM_t=0^N-1 v(t) e^(-j ω_D t) Where: ω_D = -(N-1)/2 * ω_o , ... , (N-1)/2 * ω_o and ω_o = 2pi/T

Answer 103

V_D(ω) = A/2 * N for ω = +- ω_o and V_D(ω) = 0 for ω != +- ω_o. When T = N, the DFT is very similar to the DTFT result.

Answer 104

It's a function representing a very short and intense input to a system as the mathematical idealization of a perfect impulse.

Answer 105

The result is not defined for ω = ω_o. There is a lot of leakage.

Answer 106

The spectral content of a frequency ω_o appears also on other frequencies. The information is distributed, so the result is not correct.
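Leakage can be seen by comparing the DFT of a cosine that completes an integer number of periods in the record with one that does not. The record length N = 64 and the cycle counts 4 and 4.5 are arbitrary choices for the illustration; `dft_magnitude` is a hypothetical helper built on NumPy's FFT.

```python
import numpy as np

N = 64
t = np.arange(N)

def dft_magnitude(cycles):
    """|DFT| of a unit-amplitude cosine with the given number of cycles in N samples."""
    return np.abs(np.fft.fft(np.cos(2 * np.pi * cycles * t / N)))

exact = dft_magnitude(4)     # integer number of periods: two clean peaks of height N/2
leaky = dft_magnitude(4.5)   # non-integer: energy spreads over many bins

print(np.sum(exact > 1e-9), np.sum(leaky > 1.0))
```

With 4 full cycles only the bins at +-ω_o are non-zero, each with magnitude A/2 * N = 32. With 4.5 cycles the same energy is smeared across neighbouring frequencies.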

Answer 107

W^hat (ω) = Y_D(ω) / U_D(ω) Also called Empirical TF Estimation where the sub D represents solutions using DFT.

Answer 108

Windowing is multiplying a signal by a smooth Window Function w. What this does is reduce the side lobes in the frequency domain, reducing leakage, and emphasizes the main lobe. It smoothens transitions at the edges reducing high-frequency components, it makes it so that frequencies close to each other are less likely to "spill" into one another, and maintains the main frequency components.

Answer 109

1.- Properly select the experiment duration (N = T). Not always possible. 2.- Properly selecting the Window shape. (Windowing)

Answer 110

It's called the Hanning Window, it has a large peak in the middle and it goes to zero on either side (looks like a gaussian bell). It's not perfect, but it helps a lot to reduce leakage.
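The effect of the window can be checked on a leaky cosine: far from the main peak, the windowed spectrum is much lower relative to its peak than the unwindowed one. The signal (4.5 cycles in 64 samples) and the choice of "far" bins are assumptions for the example; `np.hanning` is NumPy's Hanning window.

```python
import numpy as np

N = 64
t = np.arange(N)
x = np.cos(2 * np.pi * 4.5 * t / N)   # non-integer number of periods: leakage

rect = np.abs(np.fft.fft(x))                   # no window (rectangular)
hann = np.abs(np.fft.fft(x * np.hanning(N)))   # Hanning-windowed

far = slice(15, N // 2)                        # bins well away from the peak
rect_far = rect[far].max() / rect.max()
hann_far = hann[far].max() / hann.max()
print(rect_far, hann_far)
```

The price is a wider main lobe, so two very close tones become harder to separate even though distant leakage drops.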

Answer 111

Because the signal is sampled at frequencies that do not match its true frequency content. Since the true frequencies are not known in advance, it is essentially impossible to choose the correct value of N.

Answer 112

In general, we need to perform multiple experiments (or a single very long one) with the input being the same in each one, which allow us to average the results of Y and U, and therefore estimate the means. Y^bar = Expected[Y(ω)] U^bar = Expected[U(ω)] W^bar_hat = Y^bar / U^bar

Answer 113

Because the noise to signal ratio at high frequency is bigger (the response of the signal for high frequencies is smaller than for low frequencies)

Answer 114

Using the Power Spectral Density of the Input, and the Cross-PSD of the output. W^hat(ω) = Γ_yu (ω) / Γ_uu (ω) Γ_yu = Y(ω) U^*(ω) = W(ω) U(ω) U^*(ω) + E(ω) U^*(ω) = W(ω) Γ_uu(ω) + 0
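A minimal sketch of the spectral estimate, averaging the cross-spectrum and the input spectrum over several experiments so that the noise term E(ω) U^*(ω) averages toward zero. Everything about the setup is an assumption for the example: the FIR system g, the noise level, the number of experiments M, and the use of circular convolution to mimic periodic steady state without transients.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 256, 50
g = np.array([1.0, 0.5])           # hypothetical FIR impulse response
W_true = np.fft.fft(g, N)          # its frequency response on the DFT grid

S_yu = np.zeros(N, dtype=complex)  # accumulates Y(w) U*(w)
S_uu = np.zeros(N)                 # accumulates U(w) U*(w)
for _ in range(M):
    u = rng.standard_normal(N)
    U = np.fft.fft(u)
    # Circular convolution of u with g, plus white output noise.
    y = np.fft.ifft(W_true * U).real + 0.1 * rng.standard_normal(N)
    Y = np.fft.fft(y)
    S_yu += Y * np.conj(U)
    S_uu += np.abs(U) ** 2

W_hat = S_yu / S_uu
print(np.max(np.abs(W_hat - W_true)))
```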

Answer 115

x^hat(t+1|t) = (F - KH) x^hat(t|t-1) + K y(t) Where K is found using the usual Formula K = Mix_Block * Out_Block ^-1

Answer 116

1.- Start from y(t) = B(z)/A(z) u(t-k) + C(z)/A(z) e(t) with e(t) ~ WN(0,lambda^2)
2.- Compute the k-step division C(z)/A(z) = R(z)/A(z) + E(z) and substitute in #1: y(t) = B(z)/A(z) u(t-k) + R(z)/A(z) e(t) + E(z) e(t). The term E(z)e(t) is unpredictable at time t-k, so it is dropped from the predictor.
3.- From #1 solve for e(t) and substitute in #2 to get: y^hat(t|t-k)
4.- From the division in #2, use C(z) - R(z) = A(z)E(z).
5.- Simplify to get: y^hat(t|t-k) = B(z)E(z)/C(z) u(t-k) + R(z)/C(z) y(t)

Answer 117

H(z) = 1/a (z+a)/(z+1/a)

Answer 118

Prior: State that H = O . R 1.- From the Hankel Matrix, build the extended O (N+1 x N) and R (N x N+1) matrices. 2.- From the Extended Matrices build O1 and O2 (as well as R1 and R2), each of size (N x N) with a shift between them. 3.- Having O1 and O2, build F_hat = O1^-1 . O2