Statistical modelling in Space and Time Flashcards

1
Q

What variables is a Gaussian process dependent on/defined by?

A

A mean function

A covariance function k(x1, x2) (for a stationary process, a function only of the separation between the two points x1 and x2)

2
Q

Do Gaussian processes model space or time?

A

Spatial fields

3
Q

What is a strictly stationary process? (stochastic processes)

A

A strictly stationary process has the same statistical properties everywhere: all of its joint distributions are unchanged by translation.
Denote the process by Y. Then for any set of locations and any shift h, the joint distribution of (Y(x1), …, Y(xn)) is the same as that of (Y(x1 + h), …, Y(xn + h)).

4
Q

Define a deterministic function. What is the difference between a deterministic and a stochastic process?

A

A function is deterministic if it always returns the same output when called with the same set of input values.

In deterministic models, the output of the model is fully determined by the parameter values and the initial conditions. Stochastic models possess some inherent randomness. The same set of parameter values and initial conditions will lead to an ensemble of different outputs.

5
Q

Define a weakly stationary process

A

For a weakly stationary process the first-order (mean) and second-order moments are the same everywhere, and the covariance depends only on the separation between points.

This means the process has the same mean at all points, and that the covariance between the values at any two points, x and x + h, depends only on the separation h, and not on the location of the points in the region.

(w.l.o.g. assume the mean is 0)
Cov(Y(x1), Y(x2)) = Cov(Y(x1 + h), Y(x2 + h)) for all h

6
Q

What is another term used for Weak Stationarity?

A

second order stationarity

7
Q

Does Strict stationarity imply weak stationarity?

A

Yes (provided the second-order moments are finite).
In general the converse does not hold, but it does for Gaussian processes, because a Gaussian process is defined entirely by its first and second moments.

8
Q

Define a Gaussian Process.

A

A Gaussian process is an infinite dimensional (continuous) stochastic function/process all of whose marginal, conditional and joint distributions are Gaussian.
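For intuition, here is a minimal sketch (my own illustration, not from the course; it assumes a squared-exponential covariance and illustrative parameter values) of drawing finite-dimensional samples from a zero-mean GP:

```
# Minimal sketch: samples from a zero-mean GP with an assumed squared-exponential covariance.
import numpy as np

def sq_exp_cov(x1, x2, variance=1.0, length_scale=0.3):
    # k(x1, x2) = sigma^2 * exp(-(x1 - x2)^2 / (2 * l^2))
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

x = np.linspace(0, 1, 200)                     # locations at which the process is evaluated
K = sq_exp_cov(x, x) + 1e-8 * np.eye(len(x))   # small jitter for numerical stability
samples = np.random.default_rng(0).multivariate_normal(np.zeros(len(x)), K, size=3)
# Each row of `samples` is one realisation; any finite set of values is jointly Gaussian.
```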

9
Q

What are the first and second order moments of a (stochastic) process?

A

The first moment of xᵢ is the expected value/mean
E[xᵢ]

The second moment of xᵢ is the expected value of xᵢ²
E[xᵢ²]

10
Q

What is intrinsic stationarity?

A

Assume a constant-mean process. Then
E[(Y(x + h) − Y(x))²] = Var[Y(x + h) − Y(x)] = 2γ(h)
If this depends only on h then the process is said to have intrinsic stationarity.

11
Q

What property does the weakly stationary process’s covariance function have? Proof.

A

The covariance function has to be positive (semi-)definite.

Weak stationarity means the second moment is finite at every point, i.e. E[Y(x)²] < ∞ for all x, which implies Var(Y(x)) = E[(Y(x) − μ)²] < ∞.

Sketch of the argument: for any points x1, …, xn and any weights a1, …, an,
0 ≤ Var(Σᵢ aᵢ Y(xᵢ)) = Σᵢ Σⱼ aᵢ aⱼ C(xᵢ − xⱼ),
and a function C satisfying this for every choice of points and weights is, by definition, positive (semi-)definite.

12
Q

The Wiener-Khinchin (or Khinchine) theorem is a special case of which theorem for time series?

A

Bochner’s Theorem

13
Q

Define an isotropic process

A

If the covariance depends only on distance (not direction) the process is isotropic.
For an isotropic process the covariance function is univariate (a function of the scalar distance only).

14
Q

Separable process

A

In 2D, the correlation structure in the x direction does not change with y (and vice versa).
This holds for multivariate extensions …

15
Q

Difference between a Gaussian distribution and Gaussian Process

A

Gaussian distribution is defined by its mean and variance, whereas a Gaussian process is defined by a mean function and a covariance function (positive definite).

16
Q

Weierstrass Theorem

A

By increasing the order of a polynomial we can approximate any continuous function (on a closed interval) to arbitrary precision.

17
Q

Gaussian process with Matérn 3/2 covariance possesses how many derivatives?

A

One

18
Q

Via Bochner’s theorem the associated spectral density of a Matérn covariance function is the pdf of what distribution?

A

t-distribution

19
Q

Define a nugget

A

an independent (iid normally distributed) error added to each data point

20
Q

Three reasons why you would add a nugget?

A

1) Instrumental error (often small) - the data aren't entirely smooth because of instrumental error.
2) Small-scale variation that we don't want to model with the Gaussian process - we are concerned with the larger-scale structure.
3) Numerical reasons: a nugget prevents the covariance matrix becoming nearly singular when the covariance is very smooth, keeping the matrix positive definite and numerically stable (see the sketch below).
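A minimal numerical illustration of reason 3 (my own sketch; the nugget variance tau2 is an arbitrary illustrative value):

```
# Adding a nugget (jitter) to a GP covariance matrix keeps it positive definite.
import numpy as np

def covariance_with_nugget(x, variance=1.0, length_scale=0.3, tau2=1e-4):
    d = x[:, None] - x[None, :]
    K = variance * np.exp(-0.5 * (d / length_scale) ** 2)   # smooth GP part
    return K + tau2 * np.eye(len(x))                        # iid nugget on the diagonal

x = np.linspace(0, 1, 50)
K = covariance_with_nugget(x)
L = np.linalg.cholesky(K)   # succeeds because the nugget keeps K well-conditioned
```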

21
Q

Name the layers of the data when considering it in a hierarchical way (model).

A

Data layer
Process layer
Parameter layer

22
Q

What layer is missing from the Empirical Hierarchical model and why?

A

Parameter layer

The parameters θ are fixed numbers

23
Q

Bayesian Hierarchical model steps

A

Put a prior distribution on the parameters θ (e.g. the length scale).
Use conditional distributions to build up the distribution of Z (the data).
Bayes' theorem then allows us to 'reverse' the hierarchy.

24
Q

Another name for variogram

A

Semi-variogram

Structure function

25
Equation for the variogram
γ(h) = σ² − C(h)
Given a covariance function we can calculate the corresponding variogram.
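The relation follows from expanding the variance of an increment (standard derivation, assuming weak stationarity with variance σ² = C(0)):
γ(h) = ½ Var[Y(x + h) − Y(x)] = ½ [C(0) + C(0) − 2C(h)] = σ² − C(h)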
26
Assume the process is ergodic, what can consequently be concluded about the covariance function, and hence the variogram?
C(h) → 0 as h → ∞, so lim_{h→∞} γ(h) = σ².
Hence C(h) = σ² − γ(h) = lim_{k→∞} γ(k) − γ(h).
27
Define kriging
Kriging is using the variogram to interpolate between data points, i.e. using a variogram instead of a covariance function.
28
Define simple kriging
In simple kriging the mean is assumed to be constant and known (usually taken to be zero).
29
Define Ordinary Kriging
The mean is estimated as well as the parameters in the variogram
30
Define Universal Kriging
The mean is a function of some covariates, usually but not exclusively, spatial co-ordinates.
31
Different weights to use when fitting the theoretical variogram to the sample variogram by least squares?
1. The number of pairs in each bin - bins with more data are trusted more, so get a larger weight.
2. The theoretical variogram - where the variogram is larger the sample variogram is more variable, so those bins get a smaller weight.
3. Equal weights.
32
Outline the steps in the Method of Moments to fit a variogram?
1. Calculate the sample variogram (see the sketch below).
2. Choose a shape (model) for the variogram.
3. Fit that variogram to the sample variogram by weighted least squares.
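A minimal sketch of step 1 (my own illustration, not course code); the bin counts it returns are the "number of pairs in each bin" weights from card 31:

```
# Method-of-moments (sample) variogram: average half squared differences in distance bins.
import numpy as np

def sample_variogram(coords, y, bin_edges):
    # coords: (n, d) locations, y: (n,) observations, bin_edges: distance bin edges
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)  # pairwise distances
    sq = 0.5 * (y[:, None] - y[None, :]) ** 2                             # half squared differences
    iu = np.triu_indices(len(y), k=1)                                     # count each pair once
    d, sq = d[iu], sq[iu]
    gamma, counts = [], []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (d >= lo) & (d < hi)
        counts.append(in_bin.sum())
        gamma.append(sq[in_bin].mean() if in_bin.any() else np.nan)
    return np.array(gamma), np.array(counts)
```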
33
What is the Hawkins and Cressie estimator used for?
The Hawkins and Cressie estimator is an alternative, more robust, way of estimating the sample variogram.
34
Name some loss functions used to get a point estimate from the posterior distribution.
Squared loss → posterior mean
Absolute loss → posterior median
(0, 1) loss → posterior mode (also known as the maximum a posteriori (MAP) estimate)
35
Name different types of priors
Subjective Bayes
Objective Bayes
Conjugate priors
Non-informative priors
Informative priors (MCMC methods)
36
2 varieties of MCMC methods
Gibbs sampler
Metropolis-Hastings
37
Describe a conjugate prior
A conjugate prior is one for which the posterior distribution has the same functional form (belongs to the same family) as the prior.
38
What type of prior is mostly used in Bayesian Inference?
Improper priors / Non-informative priors
39
Different methods of dealing with the posterior distribution of the length scale (δ) in Bayesian inference.
Maximise the posterior (MAP)
MCMC
Approximate the posterior and sample from that
Discretise the prior on δ
40
Discretising the prior on δ uses what method?
Monte Carlo
41
Different validation methods for fitting a GP
Leave-one-out
Leave-N-out
If a completely independent data set is available, hold some data back and use it to check (individual prediction errors)
42
For Leave one out method, how many points would you expect to see outside +/- 2 standard deviations?
1 in 20
43
Name some of the 'inputs' for a simulator/computer model?
Parameters
Initial conditions (ICs)
Boundary conditions (BCs)
44
What are reasons for the uncertainty in simulators/computer models?
Structural uncertainty:
- uncertainty in the underlying science (we don't know the world perfectly, so the equations are not perfect)
- uncertainty in the solution of the equations (e.g. the discretisation adds additional uncertainty)
Uncertainty in the inputs
45
Define an emulator
A Gaussian process used to model the simulator output as a function of its inputs.
46
What is Sensitivity Analysis? (Emulators)
How sensitive is the simulator output to a change in an input (or combination of inputs)
47
What is Uncertainty Analysis? (Emulators)
If we are uncertain about the simulator inputs what does that say about our uncertainty on the outputs
48
Name two designs for a set of simulator runs to span input space
Optimised Latin hypercubes and quasi-Monte Carlo sequences (e.g. Sobol).
49
Two ways of assessing how well spread a Latin Hypercube is?
Maximin - maximise the minimum distance between points
Orthogonal designs (e.g. good coverage on x1 + x2)
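A minimal sketch (my own illustration) of a random Latin hypercube and the maximin criterion used to compare candidate designs:

```
# Random Latin hypercube in [0, 1]^d, scored by its minimum pairwise distance (maximin).
import numpy as np
from itertools import combinations

def latin_hypercube(n, d, rng):
    # Each column permutes the n equal slices of [0, 1] and jitters within the slice,
    # so every one-dimensional projection is evenly covered.
    cols = [(rng.permutation(n) + rng.random(n)) / n for _ in range(d)]
    return np.column_stack(cols)

def maximin(design):
    return min(np.linalg.norm(a - b) for a, b in combinations(design, 2))

rng = np.random.default_rng(1)
best = max((latin_hypercube(20, 2, rng) for _ in range(200)), key=maximin)
```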
50
Idea behind discrepancy (theory) in space?
Distributing points in some space such that they are evenly distributed with respect to some (mostly geometrically defined) subsets. The discrepancy (irregularity) measures how far a given distribution deviates from an ideal one.
51
Steps in Building an Emulator
1. Specify the Gaussian process model (mean and covariance function)
2. Select the prior distributions for the GP hyperparameters
3. Choose a design for training and validation
4. Run the ensemble of model runs
5. Fit the emulator to the simulator runs
6. Validate and re-fit if needed
52
Why can't time series data be modelled like spatial data?
Time has one direction.
Time series are considered discrete, as data are often collected at regular intervals.
(Trends are more common in time series, so need more consideration.)
Time series are used to extrapolate, whereas spatial data are normally used for interpolation.
53
2 ways of dealing with time series seasonality?
Seasonal anomalies
Seasonal differences
54
How to describe data that has second order stationarity (weak) and Gaussian?
Strictly stationary.
Strict stationarity implies weak/second-order stationarity, and the converse is true for Gaussian processes only.
55
Which type of stationarity is not (equivalently) defined in time series data?
Intrinsic stationarity
56
Can time series give negative correlations?
Yes
57
Assumption on the variance for Auto correlation functions?
Variance = 1
58
Name of the time series equivalent of the Bochner Theorem
The Wiener-Khintchine Theorem
59
The Wiener-Khintchine Theorem
The Fourier transform of a valid covariance (correlation) function is a density function (and vice versa).
This allows you to talk about time series in Fourier space.
60
Relationship of spectrum and ACF?
The Fourier transform of the ACF is called the spectral density function (spectrum).
NB the (inverse) Fourier transform of the spectrum gives back the ACF.
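In symbols (the standard statement; normalisation conventions differ between texts):
f(ω) = (1/2π) Σ_{h=−∞}^{∞} ρ(h) e^{−iωh}, and ρ(h) = ∫_{−π}^{π} f(ω) e^{iωh} dω.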
61
Name a condition for calculating the spectrum
mean = 0
62
How to calculate the Cross Spectrum?
Take the Fourier transform of the cross-covariance between two time series x_t and y_t.
63
Describe the form of the Cross Spectrum
A complex number:
S_xy = c_xy − i q_xy
c_xy is the co-spectrum
q_xy is the quad-spectrum
S_yx is the complex conjugate of S_xy
64
Define coherency between two time series x and y
Coherency = correlation between the x and y at a particular frequency
65
How to represent the Cross spectrum in terms of Amplitude and Phase
Possible because the cross spectrum is a complex number.
Amplitude: sqrt[(c_xy)² + (q_xy)²]
Phase: arctan(−q_xy / c_xy)
66
Is kriging model-free?
Yes
67
Is spectral analysis non-parametric?
Yes - it is model-free.
68
What type of equations are the MA and AR processes?
Difference equations
69
What is ε_t representing in the MA and AR processes?
ε_t is white noise: i.i.d. (independently and identically distributed) from a normal distribution with mean zero and variance σ²_w.
70
How to find the auto-correlation function, ρ(h), of a MA or AR process?
ρ(h) = the auto-covariance function divided by the variance of the process.
h is the lag; ρ(h = 0) = 1.
71
MA(1) process
x_t = ε_t + β_1 ε_{t-1}
x_t = (1 + β_1 B)ε_t
72
AR(1) process
x_t = α x_{t-1} + ε_t
73
Condition for stationary AR(1) process
|α| < 1 ~ stationary process
α = 1 ~ random walk
|α| > 1 ~ explosive process
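A minimal simulation sketch (my own illustration) of the three regimes:

```
# Simulating AR(1) paths x_t = alpha * x_{t-1} + eps_t for different alpha.
import numpy as np

def simulate_ar1(alpha, n=500, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, size=n)   # white-noise innovations
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = alpha * x[t - 1] + eps[t]
    return x

stationary  = simulate_ar1(0.8)    # |alpha| < 1: fluctuates around zero
random_walk = simulate_ar1(1.0)    # alpha = 1: variance grows with t
explosive   = simulate_ar1(1.05)   # |alpha| > 1: diverges
```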
74
Relationship between AR and MA models
A stationary AR(p) model can be written as an MA model of infinite order, and an invertible MA model as an AR model of infinite order.
(This can be shown for order 1 and extends to any order, i.e. all AR processes.)
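For example, repeated substitution in the AR(1) equation (standard manipulation, assuming |α| < 1) gives
x_t = α x_{t−1} + ε_t = ε_t + α ε_{t−1} + α² ε_{t−2} + … = Σ_{j=0}^{∞} α^j ε_{t−j},
i.e. an MA(∞) process; equivalently (1 − αB)⁻¹ = Σ_j α^j B^j.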
75
Another term for white noise variance
Innovation variance
76
Contextually describe the PACF?
PACF: what correlation do I have in the next lag, that isn't explained by all the previous lags
77
How to write an ARMA process in its causal and invertible form?
Causal form: define x_t in terms of ε_t (put everything on the MA side of the ARMA model).
Invertible form: define ε_t in terms of x_t (put everything on the AR side of the ARMA model).
78
When is an ARMA process causal?
Let Φ(z) and θ(z) be the AR and MA polynomials respectively, with B (the backward shift operator) replaced by a complex number z.
An ARMA process is causal iff Φ(z) ≠ 0 for all |z| ≤ 1.
79
When is an ARMA process invertible?
Let Φ(z) and θ(z) be the AR and MA polynomials respectively, with B (the backward shift operator) replaced by a complex number z.
An ARMA process is invertible iff θ(z) ≠ 0 for all |z| ≤ 1.
80
Are MA processes unique?
No
81
Are AR(p) models linear or non-linear?
Linear
82
How to choose order of ARMA model?
Choose the order with the best (lowest) AIC or BIC value.
AIC: Akaike Information Criterion
BIC: Bayesian Information Criterion
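A minimal sketch of the idea (my own illustration; it assumes the statsmodels package is available and uses a placeholder series):

```
# Choose ARMA(p, q) order by fitting a small grid of models and keeping the lowest AIC.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

y = np.random.default_rng(0).normal(size=300)   # placeholder series; use the real data here

best = None
for p in range(3):
    for q in range(3):
        fit = ARIMA(y, order=(p, 0, q)).fit()   # ARMA(p, q) fitted as ARIMA(p, 0, q)
        if best is None or fit.aic < best[0]:
            best = (fit.aic, p, q)
print("lowest AIC (aic, p, q):", best)
```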
83
What time series model can be used to capture seasonality?
ARIMA
DLM
84
Indication of a good fitting time series model? (AR, MA, ARMA, ARIMA)
The residuals should be normally distributed.
The residuals should be uncorrelated (can be checked via the ACF or the spectrum).
85
How to forecast using time series models? (AR, MA, ARMA)
AR can be handled differently because it is linear: one-step-ahead prediction, by minimising the mean squared prediction error and substituting in the best linear predictor.
MA and ARMA: the Durbin-Levinson algorithm.
86
What part of the DLM is not observed?
The state (state-space) process - this is what we are most interested in.
87
In which model can the error variance change with time?
DLM (Dynamic Linear Model)
88
What are the names of the 2 equations in a DLM?
State equation
Observation equation
89
Notation for the state and observation regression matrices in their respective equations?
G_t (Φ_t) in the state equation
F_t (A_t) in the observation equation (often written transposed, as F_t^T)
90
Name and describe the quadruple of matrices defining a DLM
Regression matrices:
F_t (A_t) ~ for the observation equation
G_t (Φ_t) ~ for the state equation
Variance matrices of the normal distributions of the error terms:
V_t ~ for the observation equation
W_t ~ for the state equation
91
If the ‘regression’ matrices are constant in time the DLM is known as what? And if the variance matrices are also constant?
Time Series DLM (TSDLM)
Constant DLM (if the variance matrices are also constant)
92
Describe the properties of a univariate DLM
A univariate DLM has y_t (Y_t) and v_t univariate.
Note that x_t (θ_t) can still be a vector in a univariate DLM.
93
Difference between Forecasting, Filtering and Smoothing
We are trying to estimate the properties of the state (x_t) from the data (y_s).
If t > s this is forecasting (using only the past).
If t = s this is filtering (using the past and present).
If t < s this is smoothing (using past, present and future).
94
What ways can you fit a DLM to the data (i.e. estimate the parameters)?
Numerically, by maximum likelihood (MLE).
Bayesian inference: apart from some special cases this requires MCMC, usually a Gibbs sampler.
95
What is the Kalman Filter?
A recursive algorithm for filtering and forecasting in a DLM: it updates the distribution of the state (and the one-step-ahead forecast) as each new observation arrives.
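A minimal sketch (my own illustration, using the DLM quadruple from card 90) of a single filtering/forecasting step:

```
# One Kalman-filter step for a constant DLM with y_t = F x_t + v_t, x_t = G x_{t-1} + w_t,
# where v_t ~ N(0, V) and w_t ~ N(0, W).
import numpy as np

def kalman_step(m, C, y, F, G, V, W):
    # m, C: mean and covariance of the state given data up to time t-1
    a = G @ m                        # predicted state mean
    R = G @ C @ G.T + W              # predicted state covariance
    f = F @ a                        # one-step-ahead forecast of the observation
    Q = F @ R @ F.T + V              # forecast covariance
    K = R @ F.T @ np.linalg.inv(Q)   # Kalman gain
    m_new = a + K @ (y - f)          # filtered state mean (past and present data)
    C_new = R - K @ Q @ K.T          # filtered state covariance
    return m_new, C_new, f, Q
```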
96
Define data assimilation and name 2 ways in which this can be done.
Data assimilation combines data with output from a numerical model to make a better forecast.
1) Kalman filter
2) Variational methods
97
What is a particle filter
Run an ensemble of models that are spread out enough to allow us to calculate P_ft (the forecast error covariance).
98
Two approaches to dealing with non-stationary spatial fields (non-stationary covariance function).
1. Change space: warp space so that our conventional methods work.
2. Change methods: explicitly use a GP model that includes the non-stationarity.
99
General form of GP
y(x) = µ(x) + σ(x) + ε(x)
• y(x) - our output
• µ(x) - deterministic mean function
• σ(x) - zero-mean GP
• ε(x) - nugget to cope with measurement error and small-scale variability
100
Give names to the 'spaces' when warping space to deal with non-stationary spatial fields.
Geographic space is G-space
Deformed space is D-space (D = f(G))
101
What is an identifiable model?
A model where you can estimate all the parameters
102
What are the three different approaches to modifying the methods/theory when accounting for non-stationarity in the covariance matrix for spatial fields?
1. Use a non-stationary covariance function (scale/marginalise)
2. Use the process convolution definition of GPs
3. Reformulate the GP as the solution to a stochastic partial differential equation (SPDE)
103
The kernel function is equivalent to what?
The covariance function
104
What does SPDE stand for and what is it?
Stochastic Partial Differential Equation (NB, not deterministic).
These are partial differential equations driven by white noise.
105
Which SPDEs have solutions that are Gaussian processes?
Linear SPDEs - the solutions of all linear SPDEs are Gaussian processes.
106
What is the precision matrix?
The inverse of the covariance matrix
107
What useful property does the precision matrix have?
For a Gaussian Markov random field (GMRF) it is sparse, so sparse-matrix methods can be used, which makes the calculations fast.
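A small numerical illustration (my own sketch): a stationary AR(1) process is a GMRF of order 1, so its precision matrix is tridiagonal even though its covariance matrix is dense:

```
# Dense AR(1) covariance, sparse (tridiagonal) precision.
import numpy as np

alpha, sigma2, n = 0.7, 1.0, 8
i = np.arange(n)
cov = (sigma2 / (1 - alpha**2)) * alpha ** np.abs(i[:, None] - i[None, :])  # dense matrix
precision = np.linalg.inv(cov)
print(np.round(precision, 3))   # non-zero only on the diagonal and first off-diagonals
```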
108
What does INLA stand for and what is it?
Integrated Nested Laplace Approximation.
A procedure that uses the Laplace approximation to approximate the posterior for latent Gaussian models (including GMRFs).
109
What is INLA an alternative to?
MCMC for Bayesian inference
110
In Spatio-temporal modelling, what do S_s and S_t represent?
S_s ~ spatial variance matrix (q x q)
S_t ~ temporal variance matrix (p x p)
111
Name a method, widely used in climate and weather forecasting, for spatio-temporal modelling.
Empirical Orthogonal Functions (EOFs), i.e. PCA
112
What does separability imply about spatio-temporal data?
Separability implies that:
- the spatial correlation structure does not change with time, and
- the time structure does not change with space.
The covariance function in space and time can be separated into a spatial part and a temporal part.
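In symbols (standard form): Cov[Y(s, t), Y(s', t')] = C_s(s, s') · C_t(t, t'), so the full space-time covariance matrix is the Kronecker product Σ = S_t ⊗ S_s of the temporal and spatial parts (with the spatial index varying fastest).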
113
Name a model to use to model spatio-temporal data that is non-seperable?
Generalised DLM
Coregionalisation(?)
114
Which is faster? INLA or MCMC?
INLA