Statistical modelling in Space and Time Flashcards

1
Q

What variables is a Gaussian process dependent on/defined by?

A

A mean function

A covariance function of two points x1 and x2 (for a stationary process, a function only of the separation between them)

2
Q

Do Gaussian processes model space or time?

A

Spatial fields

3
Q

What is a strictly stationary process? (stochastic processes)

A

A strictly stationary process has the same statistical properties everywhere.
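Denote the process by Y. Then the joint distribution of Y(x1), ..., Y(xn) is the same as that of Y(x1 + h), ..., Y(xn + h), for any points x1, ..., xn and any shift h. In particular, the distribution of Y(x) is the same as for Y(x + h).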

4
Q

Define a deterministic function. What is the difference between a deterministic and a stochastic process?

A

A function is considered deterministic if it always returns the same result set when it’s called with the same set of input values.

In deterministic models, the output of the model is fully determined by the parameter values and the initial conditions. Stochastic models possess some inherent randomness. The same set of parameter values and initial conditions will lead to an ensemble of different outputs.

5
Q

Define a weakly stationary process

A

For a weakly stationary process the first (mean) and second order moments are the same everywhere, and the covariance depends only on the separation between points.

This means the process has the same mean at all points, and that the covariance between the values at any two points, x and x+h, depends only on h, the separation between the two points, and not on the location of the points in the region.

(w.l.o.g. assume mean is 0)
( Cov(Y(x1), Y(x2)) = Cov(Y(x1 + h), Y(x2 + h)) )

6
Q

What is another term used for Weak Stationarity?

A

second order stationarity

7
Q

Does Strict stationarity imply weak stationarity?

A

Yes, provided the second moments are finite.
In general the converse does not hold, but it does for Gaussian processes, because a Gaussian process is defined solely by its first and second moments.

8
Q

Define a Gaussian Process.

A

A Gaussian process is an infinite dimensional (continuous) stochastic function/process all of whose marginal, conditional and joint distributions are Gaussian.

9
Q

What are the first and second order moments of a (stochastic) process?

A

The first moment of xᵢ is the expected value/mean
E[xᵢ]

The second moment of xᵢ is the expected value of xᵢ²
E[xᵢ²]

10
Q

What is intrinsic stationarity?

A

Assume a constant mean process. Then
E[(Y(x + h) − Y(x))²] = Var[Y(x + h) − Y(x)] = 2γ(h)
If this only depends on h then the process is said to have intrinsic stationarity
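
For a process that is also weakly stationary, the variance of the increment can be written out explicitly. A short derivation, assuming a constant mean, variance σ² = C(0) and covariance function C(h):

```latex
\begin{aligned}
2\gamma(h) &= \operatorname{Var}[Y(x+h) - Y(x)] \\
           &= \operatorname{Var}[Y(x+h)] + \operatorname{Var}[Y(x)]
              - 2\operatorname{Cov}[Y(x+h), Y(x)] \\
           &= 2\sigma^2 - 2C(h),
\end{aligned}
\qquad\text{so}\qquad \gamma(h) = \sigma^2 - C(h).
```

So weak stationarity implies intrinsic stationarity, since the right-hand side depends only on h.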

11
Q

What property does the weakly stationary process’s covariance function have? Proof.

A

Covariance function has to be positive definite.

Weakly stationary, therefore the second moment of xᵢ is finite for all i;
i.e. ∀i, E[xᵢ²] < ∞
Which also implies E[(xᵢ-𝜇)²] = Var(xᵢ) < ∞ (i.e. that the variance is finite for all i)

(CARD NOT FINISHED)
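
One standard way to complete the argument (not necessarily the one intended on this card) uses the fact that a variance can never be negative. Take any points x_1, ..., x_n and any real weights a_1, ..., a_n, and assume a weakly stationary process Y with covariance function C:

```latex
0 \;\le\; \operatorname{Var}\Big[\sum_{i=1}^{n} a_i\, Y(x_i)\Big]
  \;=\; \sum_{i=1}^{n}\sum_{j=1}^{n} a_i a_j \operatorname{Cov}[Y(x_i), Y(x_j)]
  \;=\; \sum_{i=1}^{n}\sum_{j=1}^{n} a_i a_j\, C(x_i - x_j).
```

The finite second moments are what guarantee that this variance exists. Strictly speaking the inequality shows that C is non-negative definite; it is positive definite when equality holds only for all weights equal to zero.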

12
Q

The Wiener-Khinchin (or Khinchine) theorem is a special case of which theorem for time series?

A

Bochner’s Theorem

13
Q

Define an isotropic process

A

If the covariance depends only on distance (not direction) the process is isotropic.
For an isotropic process the covariance function is univariate (involving one variable quantity).

14
Q

Separable process

A

In 2D, the correlation structure in the x direction does not change with y (and vice versa).
This holds for multivariate extensions …

15
Q

Difference between a Gaussian distribution and Gaussian Process

A

Gaussian distribution is defined by its mean and variance, whereas a Gaussian process is defined by a mean function and a covariance function (positive definite).

16
Q

Weierstrass Theorem

A

By increasing the order of a polynomial we can approximate any continuous function on a closed, bounded interval to arbitrary precision

17
Q

Gaussian process with Matérn 3/2 covariance possesses how many derivatives?

A

One

18
Q

Via Bochner’s theorem the associated spectral density of a Matérn covariance function is the pdf of what distribution?

A

t-distribution

19
Q

Define a nugget

A

an independent (iid normally distributed) error added to each data point

20
Q

Three reasons why you would add a nugget?

A

1) Instrumental error (often small) - data isn’t entirely smooth because of instrumental error
2) Small scale variation that we don’t want to model by the Gaussian Process - concerned with the larger scale stuff
3) Sometimes we add nuggets for numerical reasons: a very smooth covariance function can give a nearly singular covariance matrix, and the nugget keeps the matrix positive definite and numerically stable.

21
Q

Name the layers of the data when considering it in a hierarchical way (model).

A

Data layer
Process layer
Parameter layer

22
Q

What layer is missing from the Empirical Hierarchical model and why?

A

Parameter layer

The parameters θ are fixed numbers

23
Q

Bayesian Hierarchical model steps

A

Put a prior distribution on θ (e.g. the length scale)
And use conditional distributions to find the distribution of Z (the data).
Bayes' theorem then allows us to ‘reverse’ the hierarchy.

24
Q

Another name for variogram

A

Semi-variogram

Structure function

25
Q

Equation for the variogram

A

γ(h) = σ² − C(h)

Given a covariance function we can calculate the corresponding variogram
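
As a small illustration of going from a covariance function to a variogram, the sketch below uses an exponential covariance C(h) = σ² exp(−|h| / length_scale). The choice of covariance, the function names and the parameter values are assumptions made for the example only.

```python
import math

def exp_covariance(h, sigma2=1.0, length_scale=1.0):
    """Exponential covariance C(h) = sigma2 * exp(-|h| / length_scale)."""
    return sigma2 * math.exp(-abs(h) / length_scale)

def variogram_from_covariance(h, cov, sigma2):
    """gamma(h) = sigma^2 - C(h), where sigma^2 = C(0)."""
    return sigma2 - cov(h)

sigma2 = 2.0  # illustrative variance (the sill of the variogram)
cov = lambda h: exp_covariance(h, sigma2=sigma2, length_scale=1.5)

# The variogram starts at 0 and rises towards sigma^2 as h grows.
for h in [0.0, 0.5, 1.0, 5.0, 50.0]:
    print(h, round(variogram_from_covariance(h, cov, sigma2), 4))
```

At h = 0 the variogram is zero, and for large h it approaches σ² because C(h) decays to zero.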

26
Q

Assume the process is ergodic, what can consequently be concluded about the covariance function, and hence the variogram?

A

C(h) → 0 as h → ∞

lim h→∞ [γ(h)] = σ²

And we have
C(h) = σ² − γ(h) = lim k→∞ [γ(k)] − γ(h)

27
Q

Define kriging

A

Kriging is using the variogram to interpolate between data

i.e. using a variogram instead of a covariance function

28
Q

Define simple kriging

A

In simple kriging the mean is assumed to be constant and known (e.g. zero)

29
Q

Define Ordinary Kriging

A

The mean is assumed constant but unknown, and is estimated as well as the parameters in the variogram

30
Q

Define Universal Kriging

A

The mean is a function of some covariates, usually but not exclusively, spatial co-ordinates.

31
Q

Different weights to use when fitting the theoretical variogram to the sample variogram by least squares?

A
  1. number of pairs in each bin - more data so trust more thus larger weighting than elsewhere
  2. the theoretical variogram - e.g. weights proportional to N(h)/γ(h)² (Cressie), which trust a bin less where the variogram is higher
  3. equal weights
32
Q

Outline the steps in the Method of Moments to fit a variogram?

A

Calculate the sample variogram
Choose a shape for the variogram
Fit that variogram to the sample variogram by weighted least squares
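
The three steps above can be sketched in a few lines for 1D data. This is a minimal illustration under stated assumptions: the data are synthetic, the chosen shape is an exponential variogram, the weights are the number of pairs per bin, and the least-squares fit is a crude grid search. All function names are hypothetical.

```python
import math

def sample_variogram(xs, ys, bin_width, n_bins):
    """Step 1: average 0.5 * (y_i - y_j)^2 over the pairs in each distance bin."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            b = int(abs(xs[i] - xs[j]) / bin_width)
            if b < n_bins:
                sums[b] += 0.5 * (ys[i] - ys[j]) ** 2
                counts[b] += 1
    centres = [(b + 0.5) * bin_width for b in range(n_bins)]
    gammas = [s / c if c else float("nan") for s, c in zip(sums, counts)]
    return centres, gammas, counts

def exp_variogram(h, sill, length_scale):
    """Step 2: the chosen shape, gamma(h) = sill * (1 - exp(-h / length_scale))."""
    return sill * (1.0 - math.exp(-h / length_scale))

def fit_wls(centres, gammas, counts, sills, length_scales):
    """Step 3: grid search minimising sum_b N_b * (gamma_hat_b - gamma_model_b)^2."""
    best = None
    for sill in sills:
        for ls in length_scales:
            loss = sum(n * (g - exp_variogram(h, sill, ls)) ** 2
                       for h, g, n in zip(centres, gammas, counts) if n > 0)
            if best is None or loss < best[0]:
                best = (loss, sill, ls)
    return best[1], best[2]

# Synthetic 1D data, purely for illustration.
xs = [float(i) for i in range(30)]
ys = [math.sin(0.3 * x) + 0.1 * math.cos(2.1 * x) for x in xs]

centres, gammas, counts = sample_variogram(xs, ys, bin_width=2.0, n_bins=6)
sill, length_scale = fit_wls(centres, gammas, counts,
                             sills=[0.25, 0.5, 0.75, 1.0],
                             length_scales=[1.0, 2.0, 4.0, 8.0])
print(sill, length_scale)
```

In practice the grid search would be replaced by a proper optimiser, and the weights could be swapped for any of the schemes on the previous card.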

33
Q

What is Hawkins and Cressie used for?

A

Hawkins and Cressie is an alternative (robust) estimator of the sample variogram.

34
Q

Name some loss functions used to get a point estimate from the posterior distribution.

A

squared loss = mean of the posterior;
absolute loss = median;
(0, 1) loss = mode (also known as maximum a posteriori (MAP) estimates)

35
Q

Name different types of priors

A
Subjective Bayes
Objective Bayes
Conjugate Priors
Non-Informative Priors
Informative Priors (MCMC methods)
36
Q

2 varieties of MCMC methods

A

Gibbs Sampler

Metropolis-Hastings

37
Q

Describe a conjugate prior

A

A conjugate prior is one for which the posterior belongs to the same family of distributions as the prior
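
A familiar example (not taken from the deck) is the Beta prior for a binomial success probability: the posterior is again a Beta distribution, so updating only changes the two parameters. The function name and numbers below are illustrative assumptions.

```python
def beta_binomial_update(a, b, successes, failures):
    """Beta(a, b) prior plus binomial data gives a Beta(a + successes, b + failures) posterior."""
    return a + successes, b + failures

# Prior Beta(2, 2), then observe 7 successes and 3 failures.
a_post, b_post = beta_binomial_update(2.0, 2.0, successes=7, failures=3)
print(a_post, b_post)              # parameters of the Beta posterior
print(a_post / (a_post + b_post))  # posterior mean of the success probability
```

Because prior and posterior share the same form, no numerical integration or MCMC is needed in this case.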

38
Q

What type of prior is mostly used in Bayesian Inference?

A

Improper priors / Non-informative priors

39
Q

Different methods of obtaining the posterior distribution of the length scale (delta) in Bayesian Inference.

A

Maximise the posterior (MAP)
MCMC
Approximate the posterior and sample from that
Discretise the prior on δ

40
Q

Discretising the prior on δ uses what method?

A

Monte Carlo

41
Q

Different validation methods for fitting a GP

A

Leave one out
Leave N out
If there is a completely independent data set, hold some back and use it to check (Individual Prediction Errors)

42
Q

For Leave one out method, how many points would you expect to see outside +/- 2 standard deviations?

A

1 in 20
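
The "1 in 20" figure can be checked directly, assuming the standardised leave-one-out errors behave roughly like standard normal variables. The helper name is hypothetical.

```python
import math

def two_sided_tail(k):
    """P(|Z| > k) for a standard normal Z, via the complementary error function."""
    return math.erfc(k / math.sqrt(2.0))

p = two_sided_tail(2.0)
print(round(p, 4))        # probability of falling outside +/- 2 standard deviations
print(round(1.0 / p, 1))  # roughly one point in this many
```

The probability is a little under 5%, so about 1 point in 20 (more precisely about 1 in 22) is expected outside the band; many more than that suggests the emulator's uncertainty is too small.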

43
Q

Name some of the ‘inputs’ for a simulator/computer model?

A

Parameters
Initial conditions (ICs)
Boundary conditions (BCs)

44
Q

What are reasons for the uncertainty in simulators/computer models?

A

STRUCTURAL UNCERTAINTY;

  • Uncertainty in the underlying science (Don’t know the world perfectly, hence equations not perfect)
  • Uncertainty in the solution of the equations (the discretisation adds additional uncertainty etc)

UNCERTAINTY IN THE INPUTS

45
Q

Define an emulator

A

A Gaussian process to model the simulator output as a function of its input

46
Q

What is Sensitivity Analysis? (Emulators)

A

How sensitive is the simulator output to a change in an input (or combination of inputs)

47
Q

What is Uncertainty Analysis? (Emulators)

A

If we are uncertain about the simulator inputs what does that say about our uncertainty on the outputs

48
Q

Name two designs for a set of simulator runs to span input space

A

Optimised Latin Hypercubes
and
quasi-Monte Carlo sequences (Sobol)

49
Q

Two ways of assessing how well spread a Latin Hypercube is?

A
Maximin - maximise the minimum distance between points
Orthogonal designs (e.g. good coverage on x1 + x2 )
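
A rough sketch of the maximin idea: generate many random Latin hypercubes and keep the one whose closest pair of points is furthest apart. Picking the best of a set of random candidates is an assumption made for brevity (real optimised designs use smarter searches), and the function names are hypothetical.

```python
import math
import random

def latin_hypercube(n, d, rng):
    """n points in [0, 1)^d with exactly one point in each of the n strata per dimension."""
    columns = []
    for _ in range(d):
        strata = list(range(n))
        rng.shuffle(strata)
        columns.append([(s + rng.random()) / n for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n)]

def min_distance(points):
    """Smallest Euclidean distance between any two points of the design."""
    return min(math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:])

def maximin_lhs(n, d, n_candidates=200, seed=0):
    """Keep the candidate hypercube that maximises the minimum inter-point distance."""
    rng = random.Random(seed)
    candidates = (latin_hypercube(n, d, rng) for _ in range(n_candidates))
    return max(candidates, key=min_distance)

design = maximin_lhs(10, 2)
print(round(min_distance(design), 3))
```

Each one-dimensional projection of the design still covers all n strata, which is the defining property of a Latin hypercube.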
50
Q

Idea behind discrepancy (theory) in space?

A

Distributing points in some space such that they are evenly distributed with respect to some (mostly geometrically defined) subsets. The discrepancy (irregularity) measures how far a given distribution deviates from an ideal one.

51
Q

Steps in Building an Emulator

A

Specify the Gaussian process model (mean and covariance function)
Select the prior distributions for the GP hyperparameters
Choose a design for training and validation
Run the ensemble of model runs
Fit the emulator to the simulator runs
Validate and re-fit if needed

52
Q

Why can’t time series data be modelled like spatial data?

A

Time has one direction
Time series is considered as discrete as it is often collected at regular intervals
(Trends more common in time series - need to consider more)
Time series are used to extrapolate, whereas spatial data is normally used for interpolation

53
Q

2 ways of dealing with time series seasonality?

A

Seasonal anomalies

Seasonal differences

54
Q

How to describe data that has second order stationarity (weak) and Gaussian?

A

Strictly Stationary

Strict stationarity implies weak/second-order stationarity and the converse is true for Gaussian processes only.

55
Q

Which type of stationarity is not (equivalently) defined in time series data?

A

Intrinsic stationarity

56
Q

Can time series give negative correlations?

A

Yes

57
Q

Assumption on the variance for Auto correlation functions?

A

Variance = 1

58
Q

Name of the time series equivalent of the Bochner Theorem

A

The Wiener-Khintchine Theorem

59
Q

The Wiener-Khintchine Theorem

A

The Fourier transform of a valid covariance (correlation) function is a density function
(and vice versa)

Allows you to talk about time series in a Fourier Space

60
Q

Relationship of spectrum and ACF?

A

The Fourier transform of the ACF is called the Spectral Density Function (spectrum)

NB the Fourier transform of the spectrum is also the ACF

61
Q

Name a condition for calculating the spectrum

A

mean = 0

62
Q

How to calculate the Cross Spectrum?

A

Take the Fourier transform of the cross-covariance between 2 time series x_t and y_t

63
Q

Describe the form of the Cross Spectrum

A
Complex number
S_xy = c_xy − iq_xy
c_xy is the co-spectrum
q_xy is the quad-spectrum
S_yx is the complex conjugate of S_xy
64
Q

Define coherency between two time series x and y

A

Coherency = correlation between the x and y at a particular frequency

65
Q

How to represent the Cross spectrum in terms of Amplitude and Phase

A

Possible because the cross spectrum is a complex number
Amplitude: sqrt[(c_xy)^2 + (q_xy)^2]
Phase: arctan(−q_xy/c_xy)
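
Taking the convention S_xy = c_xy − i q_xy from the earlier card, the amplitude and phase are just the modulus and argument of that complex number. A minimal sketch with made-up values for the co-spectrum and quad-spectrum at one frequency:

```python
import cmath

def amplitude_phase(c_xy, q_xy):
    """Modulus and argument of the cross spectrum S_xy = c_xy - i*q_xy."""
    s_xy = complex(c_xy, -q_xy)
    return abs(s_xy), cmath.phase(s_xy)

# Illustrative values only.
amp, phase = amplitude_phase(1.0, 1.0)
print(amp, phase)
```

Using `cmath.phase` (an atan2-style argument) avoids the quadrant ambiguity of a plain arctan of the ratio.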

66
Q

Is kriging model-free?

A

Yes

67
Q

Is spectral analysis non-parametric?

A

Yes

It is model-free

68
Q

What type of equations are the MA and AR processes?

A

Difference equations

69
Q

What is ε_t representing in the MA and AR processes?

A

ε_t is white noise.
ε_t is i.i.d. (independent and identically distributed) from a normal distribution with mean zero and variance σ^2_w

70
Q

How to find the auto-correlation function, ρ(h), of a MA or AR process?

A

ρ(h) = auto-covariance function at lag h / variance of the process (the auto-covariance at lag 0)

h is the lag
ρ(h=0) = 1

71
Q

MA(1) process

A
x_t = ε_t + β1*ε_{t-1}
x_t = (1 + β1*B)ε_t
72
Q

AR(1) process

A

x_t = α*x_{t-1} + ε_t

73
Q

Condition for stationary AR(1) process

A

|α| < 1 ~ stationary process

α = 1 ~ random walk
|α| > 1 ~ explosive process
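
To see the stationary case in action, the sketch below simulates an AR(1) process and compares its sample variance with the stationary variance σ_w² / (1 − α²), a standard result that holds only when |α| < 1. The function names, seed and parameter values are assumptions for illustration.

```python
import random

def simulate_ar1(alpha, n, sigma_w=1.0, seed=0):
    """Simulate x_t = alpha * x_{t-1} + eps_t with Gaussian white noise, starting at 0."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(alpha * x[-1] + rng.gauss(0.0, sigma_w))
    return x

def sample_variance(x):
    """Plain sample variance (dividing by n)."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)

def stationary_variance(alpha, sigma_w=1.0):
    """sigma_w^2 / (1 - alpha^2); only defined for |alpha| < 1."""
    if abs(alpha) >= 1.0:
        raise ValueError("AR(1) is not stationary unless |alpha| < 1")
    return sigma_w ** 2 / (1.0 - alpha ** 2)

x = simulate_ar1(0.5, 20000)
print(round(sample_variance(x), 3), round(stationary_variance(0.5), 3))
```

For α = 1 or |α| > 1 no finite stationary variance exists, which is why the helper refuses those values.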

74
Q

Relationship between AR and MA models

A

A stationary (causal) AR(p) model can be written as an MA model of infinite order.

An invertible MA model can be written as an AR model of infinite order.

(Can be shown for p=1 and for any p, i.e. all stationary AR processes)
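
For order 1 the argument is a repeated substitution. A sketch using the AR(1) and MA(1) forms from the earlier cards, assuming |α| < 1 and |β_1| < 1 so that the infinite sums converge:

```latex
\begin{aligned}
x_t &= \alpha x_{t-1} + \varepsilon_t
     = \alpha(\alpha x_{t-2} + \varepsilon_{t-1}) + \varepsilon_t
     = \dots
     = \sum_{j=0}^{\infty} \alpha^{j}\, \varepsilon_{t-j}
     && \text{(AR(1) as MA($\infty$))} \\
\varepsilon_t &= (1 + \beta_1 B)^{-1} x_t
     = \sum_{j=0}^{\infty} (-\beta_1)^{j}\, x_{t-j}
     && \text{(MA(1) as AR($\infty$))}
\end{aligned}
```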

75
Q

Another term for white noise variance

A

Innovation variance

76
Q

Contextually describe the PACF?

A

PACF: what correlation do I have in the next lag, that isn’t explained by all the previous lags

77
Q

How to write an ARMA process in its causal and invertible form?

A

Causal form: define x_t in terms of ε_t (put everything on the MA side of the ARMA model)

Invertible form: define ε_t in terms of x_t (put everything on the AR side of the ARMA model)

78
Q

When is an ARMA process causal?

A

Let Φ(z) and θ(z) be the AR and MA polynomials with B (the backward shift operator) replaced with a complex number z.

An ARMA process is causal iff
Φ(z) ≠ 0 for |z| ≤ 1

79
Q

When is an ARMA process invertible?

A

Let Φ(z) and θ(z) be the AR and MA polynomials with B (the backward shift operator) replaced with a complex number z.

An ARMA process is invertible iff
θ(z) ≠ 0 for |z| ≤ 1
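
Both the causality and the invertibility conditions amount to checking that a polynomial has no roots inside or on the unit circle. A small sketch for polynomials up to order 2, written as p(z) = 1 + a1 z + a2 z²; the function names are hypothetical, and for an AR polynomial the coefficients enter with a minus sign (for x_t = α x_{t-1} + ε_t the AR polynomial is 1 − αz, so a1 = −α).

```python
import cmath

def roots_up_to_quadratic(a1, a2=0.0):
    """Roots of p(z) = 1 + a1*z + a2*z^2 (order 2 at most)."""
    if a2 != 0.0:
        disc = cmath.sqrt(a1 * a1 - 4.0 * a2)
        return [(-a1 + disc) / (2.0 * a2), (-a1 - disc) / (2.0 * a2)]
    if a1 != 0.0:
        return [complex(-1.0 / a1)]
    return []  # a constant polynomial has no roots

def no_roots_in_unit_disc(a1, a2=0.0):
    """True iff p(z) != 0 for all |z| <= 1, i.e. every root lies outside the unit circle."""
    return all(abs(z) > 1.0 for z in roots_up_to_quadratic(a1, a2))

print(no_roots_in_unit_disc(-0.5))        # AR(1) with alpha = 0.5
print(no_roots_in_unit_disc(-1.1))        # AR(1) with alpha = 1.1
print(no_roots_in_unit_disc(2.0))         # MA(1) with beta_1 = 2
print(no_roots_in_unit_disc(-0.5, -0.3))  # AR(2): x_t = 0.5 x_{t-1} + 0.3 x_{t-2} + eps_t
```

Applied to the AR polynomial the check tests causality; applied to the MA polynomial it tests invertibility.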

80
Q

Are MA processes unique?

A

No
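
One way to see the non-uniqueness: for an MA(1) process x_t = ε_t + β ε_{t-1}, the lag-1 autocorrelation is β / (1 + β²), which is unchanged when β is replaced by 1/β. The check below is an illustrative sketch with a hypothetical helper name.

```python
def ma1_acf_lag1(beta):
    """Lag-1 autocorrelation of x_t = eps_t + beta * eps_{t-1}: beta / (1 + beta^2)."""
    return beta / (1.0 + beta * beta)

# beta = 0.5 and beta = 2.0 (its reciprocal) give the same autocorrelation.
print(ma1_acf_lag1(0.5), ma1_acf_lag1(2.0))
```

Two different parameter values therefore produce the same ACF; requiring invertibility (|β| < 1) is the usual way to pick one of them.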

81
Q

Are AR(p) models linear or non-linear?

A

Linear

82
Q

How to choose order of ARMA model?

A

Best AIC or BIC values

Akaike Information Criterion (AIC)
Bayesian Information Criterion (BIC)

83
Q

What time series model can be used to capture seasonality?

A

ARIMA

DLM

84
Q

Indication of a good fitting time series model? (AR, MA, ARMA, ARIMA)

A

Residuals should be normally distributed

The residuals should be uncorrelated (can tell by ACF, or spectrum)

85
Q

How to forecast using time series models? (AR, MA, ARMA)

A

AR: because the model is linear in past values, the one-step-ahead forecast is the best linear predictor, found by minimising the mean squared prediction error.

ARMA and MA:
The Durbin-Levinson Algorithm

86
Q

What part of the DLM is not observed?

A

The state space

What we’re most interested in…

87
Q

In which model can the error variance change with time?

A

DLM

Dynamic Linear Model

88
Q

What are the names of the 2 equations in a DLM?

A

State equation

Observation equation

89
Q

Notation for the state and observation regression matrices in their respective equations?

A

G_t (Φ_t) in state equation

F_t (A_t) in observation equation (F^T)

90
Q

Name and describe the quadruple of matrices defining a DLM

A

REGRESSION MATRICES
F_t (A_t) ~ for the observation equation
G_t (Φ_t) ~ for the state equation

VARIANCE MATRICES OF THE NORMAL DISTRIBUTION OF THE ERROR TERM
V_t ~ for the observation equation
W_t ~ for the state equation

91
Q

If the ‘regression’ matrices are constant in time the DLM is known as what? And if the variance matrices are also constant?

A

Time Series DLM (TSDLM)

Constant DLM

92
Q

Describe the properties of a univariate DLM

A
A Univariate DLM has y_t (Y_t) and v_t univariate.
Note x_t (θ_t) can still be a vector in a univariate DLM.
93
Q

Difference between Forecasting, Filtering and Smoothing

A

We are trying to estimate the properties of the state (x_t) from the data (y_s)

If t > s this is forecasting (using only the past)
If t = s this is filtering (using the past and present)
If t < s this is smoothing (using past, present and future)

94
Q

What ways can you fit a DLM to the data (i.e. estimate the parameters)?

A

Numerically
MLE
Bayes: apart from some special cases this needs MCMC
Gibbs sampler (usually use this)

95
Q

What is the Kalman Filter?

A

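A recursive algorithm for a DLM: at each time step it forecasts the state one step ahead, then updates (filters) that forecast using the new observation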
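
A minimal sketch of the recursion for the simplest constant DLM, the local level model with F = G = 1: observation y_t = x_t + v_t with variance V, and state x_t = x_{t-1} + w_t with variance W. The function name, the vague prior (m0, C0) and the data are illustrative assumptions.

```python
def kalman_filter_local_level(ys, V, W, m0=0.0, C0=1e6):
    """Filtered means and variances of the state for the local level model."""
    m, C = m0, C0
    means, variances = [], []
    for y in ys:
        # Forecast step: predict the state one step ahead.
        a = m          # forecast mean (G = 1)
        R = C + W      # forecast variance
        # Update step: correct the forecast with the new observation.
        K = R / (R + V)          # Kalman gain
        m = a + K * (y - a)      # filtered mean
        C = (1.0 - K) * R        # filtered variance
        means.append(m)
        variances.append(C)
    return means, variances

# Made-up observations scattered around a level near 5.
ys = [4.8, 5.3, 5.1, 4.6, 5.2, 5.0, 4.9, 5.4]
means, variances = kalman_filter_local_level(ys, V=0.25, W=0.01)
print([round(m, 2) for m in means])
```

The gain K balances trust in the forecast against trust in the new observation: a large V pulls the filtered mean towards the forecast, a large W towards the data.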

96
Q

Define data assimilation and name 2 ways in which this can be done.

A

Combine data with output from a numerical model to make a better forecast

1) Kalman filter
2) Variational Methods

97
Q

What is a particle filter

A

Run an ensemble of models that is spread enough to allow us to calculate P_ft

98
Q

Two approaches to dealing with non-stationary spatial fields (non-stationary covariance function).

A
  1. CHANGE SPACE: Warp space so that our conventional methods work
  2. CHANGE METHODS: Explicitly use a GP model that includes the non-stationarity
99
Q

General form of GP

A

y(x) = µ(x) + σ(x) + ε(x)

• y(x) - our output
• µ(x) - deterministic mean function
• σ(x) - zero mean GP
• ε(x) - nugget to cope with measurement error and small scale variability
100
Q

Give names to the ‘spaces’ when warping space to deal with non-stationary spatial fields.

A
  • Geographic space is G-space
  • Deformed space is D-space (D=f(G))

101
Q

What is an identifiable model?

A

A model where you can estimate all the parameters

102
Q

What are the three different approaches to modifying the methods/theory when accounting for non-stationarity in the covariance matrix for spatial fields?

A
  1. Use a non-stationary covariance function (scale/marginalise)
  2. Use the process convolution definition of GPs
  3. Reformulate the GP as the solution to a stochastic partial differential equation
103
Q

The kernel function is equivalent to what?

A

The covariance function

104
Q

What does SPDE stand for and what is it?

A

Stochastic Partial Differential Equation
(NB, not deterministic)
These are Partial Differential Equations driven by white noise.

105
Q

Which SPDEs have solutions that are Gaussian Processes?

A

LINEAR SPDEs

The solutions of all linear SPDEs are Gaussian processes.

106
Q

What is the precision matrix?

A

The inverse of the covariance matrix

107
Q

What useful property does the precision matrix have?

A

It is sparse, so we can use sparse-matrix methods, which are fast.

108
Q

What does INLA stand for and what is it?

A

Integrated Nested Laplace Approximation
A procedure that uses the Laplace approximation to approximate the posterior for a set of Gaussian models (including GMRF)

109
Q

What is INLA an alternative to?

A

MCMC for Bayesian inference

110
Q

In Spatio-temporal modelling, what do S_s and S_t represent?

A

S_s ~ spatial variance matrix (q×q)

S_t ~ temporal variance matrix (p×p)

111
Q

Name a method, widely used in climate and weather forecasting, for spatio-temporal modelling.

A

Empirical Orthogonal Functions (EOFs)

PCA

112
Q

What does separability imply about spatio-temporal data?

A

Separability implies that
- the spatial correlation structure does not change with time
and that
- the time structure does not change with space.

The covariance function in space and time can be separated into a spatial part and a temporal part.
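
Written out, separability is usually taken to mean a product form. In the sketch below C_s and C_t are assumed names for the spatial and temporal covariance functions, S_s and S_t are the spatial (q×q) and temporal (p×p) variance matrices from the earlier card, and the Kronecker ordering depends on how the data vector is stacked:

```latex
C\big((s, t), (s', t')\big) = C_s(s, s')\, C_t(t, t'),
\qquad
\Sigma = S_t \otimes S_s \quad (pq \times pq).
```

The practical benefit is that only a q×q and a p×p matrix need to be handled, rather than the full pq×pq covariance matrix.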

113
Q

Name a model to use to model spatio-temporal data that is non-separable?

A

Generalised DLM

Coregionalisation?

114
Q

Which is faster? INLA or MCMC?

A

INLA