Statistical modelling in Space and Time Flashcards
What variables is a Gaussian process dependent on/defined by?
A mean function
A covariance function (as a function of the distance between two points at x1 and x2)
Do Gaussian processes model space or time?
Spatial fields
What is a strictly stationary process? (stochastic processes)
A strictly stationary process has the same statistical properties everywhere.
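Denote the process by Y. Then the joint distribution of (Y(x1), ..., Y(xn)) is the same as that of (Y(x1 + h), ..., Y(xn + h)) for any set of points and any shift h. In particular, the distribution of Y(x) is the same as for Y(x + h).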
Define a deterministic function. What is the difference between a deterministic and a stochastic process?
A function is considered deterministic if it always returns the same result set when it’s called with the same set of input values.
In deterministic models, the output of the model is fully determined by the parameter values and the initial conditions. Stochastic models possess some inherent randomness. The same set of parameter values and initial conditions will lead to an ensemble of different outputs.
Define a weakly stationary process
For a weakly stationary process the first (mean) and second order moments are the same everywhere, and the covariance simply depends on distance.
This means the process has the same mean at all points, and that the covariance between the values at any two points, x and x + h, depends only on h, the separation between the two points, and not on where the points lie in the region.
(w.l.o.g assume mean is 0)
( Cov(x1, x2) = Cov(x1 + h, x2 + h) )
What is another term used for Weak Stationarity?
second order stationarity
Does Strict stationarity imply weak stationarity?
Yes
In general the converse does not hold, but it does for Gaussian processes (and only for them), because they are the only stochastic processes defined solely by their first and second moments.
Define a Gaussian Process.
A Gaussian process is an infinite dimensional (continuous) stochastic function/process all of whose marginal, conditional and joint distributions are Gaussian.
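A minimal sketch of what this definition means in practice: the values of a GP at any finite set of inputs are jointly Gaussian, so one realisation can be drawn from a multivariate normal. The squared-exponential covariance, its length scale, and the jitter term are illustrative assumptions, not part of the card.

```python
import numpy as np

def sq_exp_cov(x1, x2, variance=1.0, length_scale=0.5):
    """Squared-exponential covariance (an assumed, illustrative choice)."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
mean = np.zeros_like(x)                        # mean function m(x) = 0
K = sq_exp_cov(x, x) + 1e-6 * np.eye(x.size)   # small jitter for numerical stability

# Any finite set of GP values is jointly Gaussian with this mean and covariance.
L = np.linalg.cholesky(K)
sample = mean + L @ rng.standard_normal(x.size)
```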
What are the first and second order moments of a (stochastic) process?
The first moment of xᵢ is the expected value/mean
E[xᵢ]
The second moment of xᵢ is the expected value of xᵢ²
E[xᵢ²]
What is intrinsic stationarity?
Assume a constant mean process. Then
E[(Y(x + h) − Y(x))²] = Var[Y(x + h) − Y(x)] = 2γ(h)
If this only depends on h then the process is said to have intrinsic stationarity
What property does the weakly stationary process’s covariance function have? Proof.
Covariance function has to be positive definite.
Weakly stationary, therefore the second moment of xᵢ is finite for all i;
i.e. ∀i, E[xᵢ²] < ∞
which also implies E[(xᵢ − 𝜇)²] = Var(xᵢ) < ∞ (i.e. the variance is finite for all i)
(CARD NOT FINISHED)
The Wiener-Khinchin (or Khinchine) theorem for time series is a special case of which theorem?
Bochner’s Theorem
Define an isotropic process
If the covariance depends only on distance (not direction) the process is isotropic.
For an isotropic process the covariance function is univariate (involving one variable quantity).
Separable process
In 2D, the correlation structure in the x direction does not change with y (and vice versa).
This holds for multivariate extensions …
Difference between a Gaussian distribution and Gaussian Process
Gaussian distribution is defined by its mean and variance, whereas a Gaussian process is defined by a mean function and a covariance function (positive definite).
Weierstrass Theorem
By increasing the order of a polynomial we can fit any smooth function to arbitrary precision
Gaussian process with Matérn 3/2 covariance possesses how many derivatives?
One
Via Bochner’s theorem the associated spectral density of a Matérn covariance function is the pdf of what distribution?
t-distribution
Define a nugget
an independent (iid normally distributed) error added to each data point
Three reasons why you would add a nugget?
1) Instrumental error (often small) - data isn’t entirely smooth because of instrumental error
2) Small scale variation that we don’t want to model by the Gaussian Process - concerned with the larger scale stuff
3) Sometimes we add nuggets for numerical reasons, to prevent the covariance matrix being ill-conditioned (nearly singular) - guarantees the matrix will be positive definite and numerically stable.
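A small sketch of the numerical reason: adding a nugget to the diagonal of a very smooth covariance matrix lowers its condition number. The squared-exponential covariance, the length scale, and the nugget size are assumed values for illustration.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 30)
d = x[:, None] - x[None, :]
K = np.exp(-0.5 * (d / 0.5) ** 2)        # squared-exponential covariance (assumed)

nugget = 1e-4                             # assumed nugget variance
K_nugget = K + nugget * np.eye(x.size)    # nugget only touches the diagonal

cond_without = np.linalg.cond(K)          # enormous: K is close to singular
cond_with = np.linalg.cond(K_nugget)      # bounded by roughly max eigenvalue / nugget
```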
Name the layers of the data when considering it in a hierarchical way (model).
Data layer
Process layer
Parameter layer
What layer is missing from the Empirical Hierarchical model and why?
Parameter layer
The parameters θ are fixed numbers
Bayesian Hierarchical model steps
Put a prior distribution on the θ (length scale)
And use conditional distributions to find the distribution of Z (the data).
Bayes' theorem then allows us to ‘reverse’ the hierarchy.
Another name for variogram
Semi-variogram
Structure function
Equation for the variogram
γ(h) = σ² − C(h)
Given a covariance function we can calculate the corresponding variogram
Assume the process is ergodic, what can consequently be concluded about the covariance function, and hence the variogram?
C(h) → 0 as h → ∞
lim h→∞ [γ(h)] = σ²
And we have
C(h) = σ² − γ(h) = lim k→∞ [γ(k)] − γ(h)
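The two relations above can be checked numerically. This sketch assumes an exponential covariance with made-up values for σ² and the length scale, and uses a very large lag as a stand-in for the limit.

```python
import numpy as np

sigma2 = 2.0          # process variance (assumed value)
length_scale = 1.5    # assumed length scale

def cov(h):
    """Exponential covariance C(h) = sigma^2 * exp(-|h| / length_scale)."""
    return sigma2 * np.exp(-np.abs(h) / length_scale)

def variogram(h):
    """gamma(h) = sigma^2 - C(h)."""
    return sigma2 - cov(h)

h = np.array([0.0, 0.5, 1.0, 5.0, 50.0])
gamma = variogram(h)
sill = variogram(1e6)            # numerical stand-in for lim gamma(k) as k -> infinity
cov_recovered = sill - gamma     # C(h) = lim gamma(k) - gamma(h)
```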
Define kriging
Kriging is using the variogram to interpolate between data
i.e. using a variogram instead of a covariance function
Define simple kriging
In simple kriging the mean is assumed to be constant and known (e.g. zero)
Define Ordinary Kriging
The mean is estimated as well as the parameters in the variogram
Define Universal Kriging
The mean is a function of some covariates, usually but not exclusively, spatial co-ordinates.
Different weights to use when fitting the theoretical variogram to the sample variogram by least squares?
- number of pairs in each bin - more data so trust more thus larger weighting than elsewhere
- the theoretical variogram - weight inversely by its squared value, so bins where the variogram is lower (short lags) are trusted more
- equal weights
Outline the steps in the Method of Moments to fit a variogram?
Calculate the sample variogram
Choose a shape for the variogram
Fit that variogram to the sample variogram by weighted least squares
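The three steps can be sketched as follows, using NumPy only. The synthetic data, the exponential variogram shape, the bin edges, and the grid search (standing in for a proper optimiser) are all assumptions made for illustration; the weights are the number of pairs in each bin.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic 1D data from an exponential covariance (assumed truth, for illustration).
x = np.sort(rng.uniform(0.0, 10.0, 200))
d = np.abs(x[:, None] - x[None, :])
K = np.exp(-d / 1.0) + 1e-8 * np.eye(x.size)
y = np.linalg.cholesky(K) @ rng.standard_normal(x.size)

# Step 1: sample variogram, averaging 0.5 * (y_i - y_j)^2 within distance bins.
iu = np.triu_indices(x.size, k=1)
lags = d[iu]
half_sq_diff = 0.5 * (y[:, None] - y[None, :])[iu] ** 2
edges = np.linspace(0.0, 4.0, 13)
n_bins = edges.size - 1
which = np.digitize(lags, edges) - 1          # pairs beyond the last edge are ignored
counts = np.array([(which == b).sum() for b in range(n_bins)])
centres = 0.5 * (edges[:-1] + edges[1:])
gamma_hat = np.array([half_sq_diff[which == b].mean() for b in range(n_bins)])

# Step 2: choose a shape (here an exponential variogram).
def model(h, sigma2, range_par):
    return sigma2 * (1.0 - np.exp(-h / range_par))

# Step 3: weighted least squares over a parameter grid, weights = pairs per bin.
sig_grid = np.linspace(0.2, 3.0, 57)
range_grid = np.linspace(0.2, 5.0, 97)
S, R = np.meshgrid(sig_grid, range_grid, indexing="ij")
resid = gamma_hat - model(centres, S[..., None], R[..., None])
wss = (counts * resid ** 2).sum(axis=-1)
i, j = np.unravel_index(np.argmin(wss), wss.shape)
sigma2_fit, range_fit = sig_grid[i], range_grid[j]
```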
What is the Hawkins and Cressie estimator used for?
The Hawkins and Cressie estimator is a robust alternative to the standard method-of-moments estimator of the sample variogram.
Name some loss functions used to get a point estimate from the posterior distribution.
squared loss = mean of the posterior;
absolute loss = median;
(0, 1) loss = mode (also known as maximum a posteriori (MAP) estimates)
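As an illustration, the three point estimates can be computed from posterior draws. The Gamma(3, 1) "posterior" is an assumed stand-in, chosen because it is skewed so the three estimates differ; the histogram-based mode is a crude approximation.

```python
import numpy as np

rng = np.random.default_rng(2)
# Stand-in posterior draws: Gamma(shape=3, scale=1), assumed for illustration.
draws = rng.gamma(shape=3.0, scale=1.0, size=100_000)

post_mean = draws.mean()           # minimises expected squared loss
post_median = np.median(draws)     # minimises expected absolute loss

# Crude mode (MAP) estimate: centre of the fullest histogram bin.
counts, edges = np.histogram(draws, bins=50)
k = np.argmax(counts)
post_mode = 0.5 * (edges[k] + edges[k + 1])
```

For a right-skewed posterior like this one the estimates are ordered mode < median < mean.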
Name different types of priors
Subjective Bayes
Objective Bayes
Conjugate Priors
Non-Informative Priors
Informative Priors (MCMC methods)
2 varieties of MCMC methods
Gibbs Sampler
Metropolis-Hastings
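A minimal random-walk Metropolis sketch (the simplest Metropolis-Hastings variant, with a symmetric proposal). The standard normal target, step size, and chain length are assumptions for illustration.

```python
import numpy as np

def log_target(theta):
    """Unnormalised log density of a standard normal (assumed toy target)."""
    return -0.5 * theta ** 2

rng = np.random.default_rng(3)
n_iter, step = 20_000, 1.0
chain = np.empty(n_iter)
theta = 0.0
accepted = 0
for i in range(n_iter):
    proposal = theta + step * rng.standard_normal()    # symmetric random-walk proposal
    log_alpha = log_target(proposal) - log_target(theta)
    if np.log(rng.uniform()) < log_alpha:              # accept with probability min(1, ratio)
        theta = proposal
        accepted += 1
    chain[i] = theta                                   # a rejected proposal repeats the old value
accept_rate = accepted / n_iter
```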
Describe a conjugate prior
A conjugate prior is one for which the posterior belongs to the same family of distributions as the prior
What type of prior is mostly used in Bayesian Inference?
Improper priors / Non-informative priors
Different methods of obtaining the posterior distribution (or a point estimate) of the length scale (δ) in Bayesian Inference.
Maximise the posterior (MAP)
MCMC
Approximate the posterior and sample from that
Discretise the prior on δ
Discretising the prior on δ uses what method?
Monte Carlo
Different validation methods for fitting a GP
Leave one out
Leave N out
If a completely independent data set is available, hold some back and use it to check (Individual Prediction Errors)
For Leave one out method, how many points would you expect to see outside +/- 2 standard deviations?
1 in 20
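A sketch of the leave-one-out check for a zero-mean GP with known covariance. It uses the standard closed-form leave-one-out identities based on the inverse covariance matrix rather than refitting n times; the exponential covariance, nugget, and synthetic data are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
x = np.sort(rng.uniform(0.0, 10.0, n))
d = np.abs(x[:, None] - x[None, :])
K = np.exp(-d / 1.0) + 0.1 * np.eye(n)     # exponential covariance plus nugget (assumed)
y = np.linalg.cholesky(K) @ rng.standard_normal(n)

K_inv = np.linalg.inv(K)
loo_var = 1.0 / np.diag(K_inv)              # leave-one-out predictive variance
loo_mean = y - (K_inv @ y) * loo_var        # leave-one-out predictive mean
z = (y - loo_mean) / np.sqrt(loo_var)       # standardised prediction errors
frac_outside = np.mean(np.abs(z) > 2.0)     # expect roughly 1 in 20
```

A fraction far above 0.05 suggests the predictive variance is too small; a fraction near zero suggests it is too large.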
Name some of the ‘inputs’ for a simulator/computer model?
Parameters
ICs
BCs
What are reasons for the uncertainty in simulators/computer models?
STRUCTURAL UNCERTAINTY;
- Uncertainty in the underlying science (Don’t know the world perfectly, hence equations not perfect)
- Uncertainty in the solution of the equations (the discretisation adds additional uncertainty etc)
UNCERTAINTY IN THE INPUTS
Define an emulator
A Gaussian process to model the simulator output as a function of its input
What is Sensitivity Analysis? (Emulators)
How sensitive is the simulator output to a change in an input (or combination of inputs)
What is Uncertainty Analysis? (Emulators)
If we are uncertain about the simulator inputs what does that say about our uncertainty on the outputs
Name two designs for a set of simulator runs to span input space
Optimised Latin Hypercubes
and
quasi-Monte Carlo sequences (Sobol)
Two ways of assessing how well spread a Latin Hypercube is?
Maximin - maximise the minimum distance between points
Orthogonal designs (e.g. good coverage on x1 + x2)
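A sketch of the maximin idea: generate many random Latin hypercubes and keep the one whose closest pair of points is furthest apart. The design size, the number of candidates, and the simple random search (rather than a dedicated optimiser) are assumptions for illustration.

```python
import numpy as np

def latin_hypercube(n, dim, rng):
    """Random Latin hypercube on [0, 1]^dim: one point per stratum in each dimension."""
    strata = np.array([rng.permutation(n) for _ in range(dim)]).T
    return (strata + rng.uniform(size=(n, dim))) / n

def min_distance(points):
    """Smallest pairwise Euclidean distance in the design."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return dist[np.triu_indices(len(points), k=1)].min()

rng = np.random.default_rng(5)
candidates = [latin_hypercube(20, 2, rng) for _ in range(200)]
scores = [min_distance(c) for c in candidates]
best = candidates[int(np.argmax(scores))]    # maximin: keep the best-spread candidate
```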
Idea behind discrepancy (theory) in space?
Distributing points in some space such that they are evenly distributed with respect to some (mostly geometrically defined) subsets. The discrepancy (irregularity) measures how far a given distribution deviates from an ideal one.
Steps in Building an Emulator
Specify the Gaussian process model (mean and covariance function)
Select the prior distributions for the GP hyperparameters
Choose a design for training and validation
Run the ensemble of model runs
Fit the emulator to the simulator runs
Validate and re-fit if needed
Why can’t time series data be modelled like spatial data?
Time has one direction
Time series is considered as discrete as it is often collected at regular intervals
(Trends more common in time series - need to consider more)
Time series are used to extrapolate, whereas spatial data is normally used for interpolation
2 ways of dealing with time series seasonality?
Seasonal anomalies
Seasonal differences
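Both approaches in a short sketch, applied to an assumed synthetic monthly series with a linear trend and a sinusoidal seasonal cycle.

```python
import numpy as np

period = 12
t = np.arange(10 * period)                    # ten years of monthly data (assumed)
rng = np.random.default_rng(6)
series = 0.05 * t + 3.0 * np.sin(2 * np.pi * t / period) + 0.3 * rng.standard_normal(t.size)

# Seasonal anomalies: subtract each calendar month's long-run average.
monthly_mean = series.reshape(-1, period).mean(axis=0)
anomalies = series - np.tile(monthly_mean, t.size // period)

# Seasonal differences: x_t - x_{t - period}; this also turns the linear trend into a constant.
seasonal_diff = series[period:] - series[:-period]
```

Note that the anomalies still contain the trend, while the seasonal differences lose the first `period` observations.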
How to describe data that has second order stationarity (weak) and Gaussian?
Strictly Stationary
Strict stationarity implies weak/second-order stationarity and the converse is true for Gaussian processes only.
Which type of stationarity is not (equivalently) defined in time series data?
Intrinsic stationarity
Can time series give negative correlations?
Yes
Assumption on the variance for Auto correlation functions?
Variance = 1
Name of the time series equivalent of the Bochner Theorem
The Wiener-Khintchine Theorem
The Wiener-Khintchine Theorem
The Fourier transform of a valid covariance (correlation) function is a density function
(and vice versa)
Allows you to talk about time series in a Fourier Space
Relationship of spectrum and ACF?
The Fourier transform of the ACF is called the Spectral Density Function (spectrum)
NB the (inverse) Fourier transform of the spectrum gives back the ACF
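A numerical sketch of the two directions. The ACF ρ(h) = α^|h| is an assumed example (it is the ACF of a stationary AR(1) process), and the truncation at 200 lags and the grid size are arbitrary choices.

```python
import numpy as np

alpha = 0.6                              # assumed example: rho(h) = alpha^|h|
lags = np.arange(-200, 201)
acf = alpha ** np.abs(lags)

omega = np.linspace(0.0, np.pi, 5)
# Spectrum: Fourier transform of the ACF, f(w) = sum_h rho(h) * exp(-i w h).
spectrum = (acf[None, :] * np.exp(-1j * omega[:, None] * lags[None, :])).sum(axis=1).real

# Closed form for this particular ACF, used as a check.
closed_form = (1 - alpha ** 2) / (1 - 2 * alpha * np.cos(omega) + alpha ** 2)

# Going back: rho(h) = (1 / 2 pi) * integral over (-pi, pi] of f(w) * exp(i w h) dw.
grid = np.linspace(-np.pi, np.pi, 4001)[:-1]
f_grid = (1 - alpha ** 2) / (1 - 2 * alpha * np.cos(grid) + alpha ** 2)
rho_1 = (f_grid * np.cos(grid * 1)).mean()   # grid average approximates (1 / 2 pi) * integral
```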
Name a condition for calculating the spectrum
mean = 0
How to calculate the Cross Spectrum?
Take the Fourier transform of the cross-covariance between 2 time series x_t and y_t
Describe the form of the Cross Spectrum
Complex number
S_xy = c_xy − i q_xy
c_xy is the co-spectrum
q_xy is the quad-spectrum
S_yx is the complex conjugate of S_xy
Define coherency between two time series x and y
Coherency = correlation between the x and y at a particular frequency
How to represent the Cross spectrum in terms of Amplitude and Phase
Possible because the cross spectrum is a complex number
Amplitude: sqrt[(c_xy)^2 + (q_xy)^2]
Phase: arctan(−q_xy/c_xy), for S_xy = c_xy − i q_xy
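A tiny sketch with assumed co- and quad-spectrum values at a single frequency. The amplitude and phase are the modulus and argument of the complex number S_xy = c_xy − i q_xy; sign conventions for the phase vary between texts.

```python
import numpy as np

c_xy, q_xy = 0.8, 0.6            # assumed co- and quad-spectrum values at one frequency
S_xy = c_xy - 1j * q_xy          # cross spectrum
S_yx = np.conj(S_xy)             # complex conjugate

amplitude = np.abs(S_xy)         # sqrt(c_xy^2 + q_xy^2)
phase = np.angle(S_xy)           # argument of S_xy, i.e. arctan2(-q_xy, c_xy)
```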
Is kriging model-free?
Yes
Is spectral analysis non-parametric?
Yes
It is model-free
What type of equations are the MA and AR processes?
Difference equations
What is ε_t representing in the MA and AR processes?
εt is white noise.
εt is i.i.d (independently and identically distributed) from a normal distribution with mean zero and with variance σ^2_w
How to find the auto-correlation function, ρ(h), of a MA or AR process?
ρ(h) = auto-covariance function at lag h / variance of the process (the auto-covariance at lag 0)
h is the lag
ρ(h=0) = 1
MA(1) process
x_t = ε_t + β1*ε_{t-1}
x_t = (1 + β1*B)ε_t
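A simulation sketch of the MA(1) process, checking the sample ACF against the known result ρ(1) = β1 / (1 + β1²) and ρ(h) = 0 for h ≥ 2. The value of β1, the unit noise variance, and the series length are assumptions.

```python
import numpy as np

beta1 = 0.5                               # assumed coefficient
rng = np.random.default_rng(7)
eps = rng.standard_normal(200_001)        # white noise with unit variance (assumed)
x = eps[1:] + beta1 * eps[:-1]            # x_t = eps_t + beta1 * eps_{t-1}

def sample_acf(series, max_lag):
    """Sample autocorrelation at lags 0..max_lag."""
    s = series - series.mean()
    var = np.mean(s * s)
    return np.array([1.0 if h == 0 else np.mean(s[h:] * s[:-h]) / var
                     for h in range(max_lag + 1)])

acf = sample_acf(x, 3)
rho1_theory = beta1 / (1 + beta1 ** 2)    # 0.4 for beta1 = 0.5
```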
AR(1) process
x_t = α*x_{t-1} + ε_t
Condition for stationary AR(1) process
|α| < 1 ~ stationary process
α = 1 ~ random walk
|α| > 1 ~ explosive process
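The three regimes can be seen from the variance recursion Var(x_t) = α² Var(x_{t-1}) + σ², which follows from the AR(1) equation because ε_t is independent of x_{t-1}. The starting value x_0 = 0 and the specific values of α are assumptions for illustration.

```python
import numpy as np

def variance_path(alpha, n_steps, sigma2=1.0):
    """Var(x_t) for x_t = alpha * x_{t-1} + eps_t, starting from x_0 = 0."""
    var = np.zeros(n_steps + 1)
    for t in range(1, n_steps + 1):
        var[t] = alpha ** 2 * var[t - 1] + sigma2
    return var

stationary = variance_path(0.5, 100)     # settles at sigma2 / (1 - alpha^2)
random_walk = variance_path(1.0, 100)    # grows linearly: Var(x_t) = t * sigma2
explosive = variance_path(1.1, 100)      # grows geometrically
```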
Relationship between AR and MA models
A stationary (causal) AR(p) model can be written as an MA model of infinite order.
An invertible MA(q) model can be written as an AR model of infinite order.
(Can be shown for p = 1 and extends to any p, i.e. all stationary AR processes)
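A sketch of the order-one case: for x_t = α x_{t-1} + ε_t with |α| < 1, repeated substitution gives x_t = Σ_j α^j ε_{t-j}. The code compares the AR(1) recursion with a truncated version of that infinite sum; the value of α, the truncation at 60 terms, and the burn-in are assumptions.

```python
import numpy as np

alpha = 0.7
rng = np.random.default_rng(8)
n, burn = 500, 200
eps = rng.standard_normal(n + burn)

# AR(1) recursion, started from x_0 = 0.
x_ar = np.zeros(n + burn)
for t in range(1, n + burn):
    x_ar[t] = alpha * x_ar[t - 1] + eps[t]

# MA(infinity) weights psi_j = alpha^j, truncated at 60 terms.
psi = alpha ** np.arange(60)
x_ma = np.convolve(eps, psi)[: n + burn]   # x_t ~ sum_j psi_j * eps_{t-j}

max_gap = np.max(np.abs(x_ar[burn:] - x_ma[burn:]))   # truncation error after burn-in
```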
Another term for white noise variance
Innovation variance
Contextually describe the PACF?
PACF: what correlation do I have in the next lag, that isn’t explained by all the previous lags
How to write an ARMA process in its causal and invertible form?
Causal form, define xt in terms of εt (put everything on the MA side of the ARMA model)
Invertible form, define εt in terms of xt (put everything on the AR side of the ARMA model)
When is an ARMA process causal?
Let Φ(z) and θ(z) be the AR and MA polynomials with B (the backward shift operator) replaced with a complex number z.
An ARMA process is causal iff
Φ(z) ≠ 0 for |z| ≤ 1
When is an ARMA process invertible?
Let Φ(z) and θ(z) be the AR and MA polynomials with B (the backward shift operator) replaced with a complex number z.
An ARMA process is invertible iff
θ(z) ≠ 0 for |z| ≤ 1
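Both conditions say that every root of the relevant polynomial lies strictly outside the unit circle, which can be checked numerically. The ARMA(1,1) coefficients and the helper name `roots_outside_unit_circle` are hypothetical, chosen for illustration, with the AR polynomial written as 1 − α z and the MA polynomial as 1 + β z.

```python
import numpy as np

def roots_outside_unit_circle(coeffs):
    """coeffs = [c0, c1, ..., ck] for c0 + c1*z + ... + ck*z^k; True if every root has |z| > 1."""
    roots = np.roots(coeffs[::-1])        # np.roots expects the highest power first
    return bool(np.all(np.abs(roots) > 1.0))

# Assumed example ARMA(1,1): x_t = 0.5 x_{t-1} + eps_t + 0.4 eps_{t-1}
ar_poly = [1.0, -0.5]    # AR polynomial 1 - 0.5 z, root at z = 2
ma_poly = [1.0, 0.4]     # MA polynomial 1 + 0.4 z, root at z = -2.5

is_causal = roots_outside_unit_circle(ar_poly)
is_invertible = roots_outside_unit_circle(ma_poly)
```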
Are MA processes unique?
No
Are AR(p) models linear or non-linear?
Linear
How to choose order of ARMA model?
Best AIC or BIC values
Akaike Information Criterion (AIC)
Bayesian Information Criterion (BIC)
What time series model can be used to capture seasonality?
ARIMA
DLM
Indication of a good fitting time series model? (AR, MA, ARMA, ARIMA)
Residuals should be normally distributed
The residuals should be uncorrelated (can tell by ACF, or spectrum)
How to forecast using time series models? (AR, MA, ARMA)
AR can be done differently because it is linear in the past observations: the one-step-ahead predictor is found by minimising the mean squared prediction error, which gives the best linear predictor.
ARMA and MA:
The Durbin-Levinson Algorithm
What part of the DLM is not observed?
The state
What we’re most interested in…
In which model can the error variance change with time?
DLM
Dynamic Linear Model
What are the names of the 2 equations in a DLM?
State equation
Observation equation
Notation for the state and observation regression matrices in their respective equations?
G_t (Φ_t) in state equation
F_t (A_t) in observation equation (F^T)
Name and describe the quadruple of matrices defining a DLM
REGRESSION MATRICES
F_t (A_t) ~ for the observation equation
G_t (Φ_t) ~ for the state equation
VARIANCE MATRICES OF THE NORMAL DISTRIBUTION OF THE ERROR TERM
V_t ~ for the observation equation
W_t ~ for the state equation
If the ‘regression’ matrices are constant in time the DLM is known as what? And if the variance matrices are also constant?
Time Series DLM (TSDLM)
Constant DLM
Describe the properties of a univariate DLM
A Univariate DLM has y_t (Y_t) and v_t univariate. Note x_t (θ_t) can still be a vector in a univariate DLM.
Difference between Forecasting, Filtering and Smoothing
We are trying to estimate the properties of the state (x_t) from the data (y_s)
If t > s this is forecasting (using only the past)
If t = s this is filtering (using the past and present)
If t < s this is smoothing (using past, present and future)
What ways can you fit a DLM to the data (i.e. estimate the parameters)?
Numerically
MLE
Bayes: apart from some special cases, use MCMC
Gibbs sampler (Usually use this)
What is the Kalman Filter?
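A recursive algorithm for filtering and forecasting in a DLM: at each time step it forecasts the state, then updates that forecast using the new observation.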
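A minimal scalar sketch of the Kalman filter recursions for a constant DLM with observation equation y_t = F x_t + v_t and state equation x_t = G x_{t-1} + w_t. The local-level example (F = G = 1), the variances, and the prior values m0 and C0 are assumptions for illustration.

```python
import numpy as np

def kalman_filter(y, F, G, V, W, m0, C0):
    """Scalar Kalman filter for y_t = F x_t + v_t, x_t = G x_{t-1} + w_t."""
    m, C = m0, C0
    means, variances = [], []
    for obs in y:
        a = G * m                  # forecast (prior) mean of the state
        R = G * C * G + W          # forecast (prior) variance of the state
        f = F * a                  # one-step-ahead forecast of the observation
        Q = F * R * F + V          # forecast variance of the observation
        gain = R * F / Q           # Kalman gain
        m = a + gain * (obs - f)   # filtered (posterior) mean
        C = R - gain * F * R       # filtered (posterior) variance
        means.append(m)
        variances.append(C)
    return np.array(means), np.array(variances)

rng = np.random.default_rng(9)
n = 200
state = np.cumsum(rng.normal(0.0, 0.1, n))     # random-walk state, W = 0.1^2
y = state + rng.normal(0.0, 1.0, n)            # noisy observations, V = 1
m, C = kalman_filter(y, F=1.0, G=1.0, V=1.0, W=0.01, m0=0.0, C0=10.0)
```

The filtered variance does not depend on the data and settles to a steady value, here well below the observation variance V.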
Define data assimilation and name 2 ways in which this can be done.
Combine data with output from a numerical model to make a better forecast
1) Kalman filter
2) Variational Methods
What is a particle filter?
Run an ensemble of models that are spread enough to allow us to calculate P_ft
Two approaches to dealing with non-stationary spatial fields (non-stationary covariance function).
- CHANGE SPACE: Warp space so that our conventional methods work
- CHANGE METHODS: Explicitly use a GP model that includes the non-stationarity
General form of GP
y(x) = µ(x) + σ(x) + ε(x)
• y(x) - our output
• µ(x) - deterministic mean function
• σ(x) - zero mean GP
• ε(x) - nugget to cope with measurement error and small scale variability
Give names to the ‘spaces’ when warping space to deal with non-stationary spatial fields.
- Geographic space is G-space
- Deformed space is D-space (D=f(G))
What is an identifiable model?
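A model in which all the parameters can be uniquely estimated from the data, i.e. different parameter values give different distributions for the data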
What are the three different approaches to modifying the methods/theory when accounting for non-stationarity in the covariance matrix for spatial fields?
- Use a non-stationary covariance function (scale/marginalise)
- Use the process convolution definition of GPs
- Reformulate the GP as the solution to a stochastic partial differential equation
The kernel function is equivalent to what?
The covariance function
What does SPDE stand for and what is it?
Stochastic Partial Differential Equation
(NB, not deterministic)
These are Partial Differential Equations driven by white noise.
Which SPDEs have solutions that are Gaussian Processes?
LINEAR SPDEs
The solutions of all linear SPDEs are Gaussian processes.
What is the precision matrix?
The inverse of the covariance matrix
What useful property does the precision matrix have?
It is sparse (for Markov models such as GMRFs)
So can use sparse-matrix methods
Which are fast calculations
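A small illustration of the sparsity, using the covariance matrix of a stationary AR(1) process as an assumed example of a Markov model: the covariance matrix is dense, but its inverse is tridiagonal.

```python
import numpy as np

alpha, n = 0.6, 8
idx = np.arange(n)
# Covariance of a stationary AR(1) process with unit innovation variance (assumed example).
cov = alpha ** np.abs(idx[:, None] - idx[None, :]) / (1 - alpha ** 2)
precision = np.linalg.inv(cov)

# Entries more than one step off the diagonal vanish (up to rounding).
off_band = np.abs(idx[:, None] - idx[None, :]) > 1
max_off_band = np.abs(precision[off_band]).max()
```

Only the diagonal and first off-diagonals are non-zero, so storage and solves scale with n rather than n².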
What does INLA stand for and what is it?
Integrated Nested Laplace Approximation
A procedure that uses the Laplace approximation to approximate the posterior for a class of latent Gaussian models (including GMRFs)
What is INLA an alternative to?
MCMC for Bayesian inference
In Spatio-temporal modelling, what do S_s and S_t represent?
S_s ~ spatial variance matrix (qxq)
S_t ~ temporal variance matrix (pxp)
Name a method, widely used in climate and weather forecasting, for spatio-temporal modelling.
Empirical Orthogonal Functions (EOFs)
PCA
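A sketch of computing EOFs as a PCA of the space-time data matrix via the singular value decomposition. The synthetic field (two spatial patterns with random amplitudes plus noise) and its dimensions are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(10)
n_time, n_space = 100, 30
s = np.linspace(0.0, 1.0, n_space)
# Synthetic field: rows are times, columns are locations (assumed data).
field = (np.outer(3.0 * rng.standard_normal(n_time), np.sin(np.pi * s))
         + np.outer(rng.standard_normal(n_time), np.sin(2 * np.pi * s))
         + 0.1 * rng.standard_normal((n_time, n_space)))

anomalies = field - field.mean(axis=0)         # remove the time mean at each location
U, sing, Vt = np.linalg.svd(anomalies, full_matrices=False)
eofs = Vt                                      # rows: spatial patterns (EOFs)
pcs = U * sing                                 # columns: principal component time series
explained = sing ** 2 / np.sum(sing ** 2)      # fraction of variance per EOF
```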
What does separability imply about spatio-temporal data?
Separability implies that
- the spatial correlation structure does not change with time
and that
- the time structure does not change with space.
The covariance function in space and time can be separated into a spatial part and a temporal part.
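A sketch of a separable space-time covariance built as a Kronecker product of a spatial and a temporal covariance matrix. The exponential covariances, the grid sizes, and the time-major ordering of the stacked vector are assumptions for illustration.

```python
import numpy as np

def exp_cov(coords, length_scale):
    """Exponential covariance matrix on 1D coordinates (assumed form)."""
    d = np.abs(coords[:, None] - coords[None, :])
    return np.exp(-d / length_scale)

space = np.linspace(0.0, 1.0, 4)     # q = 4 spatial locations (assumed)
time = np.arange(3.0)                # p = 3 time points (assumed)
S_s = exp_cov(space, 0.5)            # q x q spatial covariance
S_t = exp_cov(time, 2.0)             # p x p temporal covariance

# Separable: Cov[(s_i, t_k), (s_j, t_l)] = S_s[i, j] * S_t[k, l].
S = np.kron(S_t, S_s)                # (p*q) x (p*q), time-major ordering

# The inverse also factorises, so only the small matrices need inverting.
S_inv = np.kron(np.linalg.inv(S_t), np.linalg.inv(S_s))
```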
Name a model that can be used for spatio-temporal data that is non-separable.
Generalised DLM
Coregionalisation?
Which is faster? INLA or MCMC?
INLA