Statistical modelling in Space and Time Flashcards
What variables is a Gaussian process dependent on/defined by?
A mean function
A covariance function (a function of two points x1 and x2; for a stationary process, a function only of the separation between them)
Do Gaussian processes model space or time?
Spatial fields
What is a strictly stationary process? (stochastic processes)
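A strictly stationary process has the same statistical properties everywhere.
Denote the process by Y. Then the joint distribution of (Y(x1), …, Y(xn)) is the same as that of (Y(x1 + h), …, Y(xn + h)) for every shift h and every finite set of points. In particular, the distribution of Y(x) is the same as for Y(x + h).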
Define a deterministic function. What is the difference between a deterministic and a stochastic process?
A function is considered deterministic if it always returns the same result set when it’s called with the same set of input values.
In deterministic models, the output of the model is fully determined by the parameter values and the initial conditions. Stochastic models possess some inherent randomness. The same set of parameter values and initial conditions will lead to an ensemble of different outputs.
Define a weakly stationary process
For a weakly stationary process the first (mean) and second order moments are the same everywhere, and the covariance depends only on the separation between points.
This means the process has the same mean at all points, and the covariance between the values at any two points, x and x + h, depends only on h, the separation between the two points, and not on the location of the points in the region.
(w.l.o.g assume mean is 0)
( Cov(Y(x1), Y(x2)) = Cov(Y(x1 + h), Y(x2 + h)) )
What is another term used for Weak Stationarity?
second order stationarity
Does Strict stationarity imply weak stationarity?
Yes, provided the second moments are finite.
In general the converse does not hold, but it does for Gaussian processes, because a Gaussian process is defined solely by its first and second moments.
Define a Gaussian Process.
A Gaussian process is an infinite dimensional (continuous) stochastic function/process all of whose marginal, conditional and joint distributions are Gaussian.
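A minimal sketch of what the definition means in practice: restricted to any finite set of points, the process is just a multivariate Gaussian, so it can be sampled with a Cholesky factor. The squared-exponential covariance, the zero mean, the jitter value and the names `sq_exp_cov` and `sample_gp` are illustrative assumptions, not part of the cards.

```python
import numpy as np

def sq_exp_cov(x1, x2, variance=1.0, length_scale=0.5):
    """Squared-exponential covariance (an assumed choice for illustration)."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def sample_gp(x, mean_fn, cov_fn, n_samples=3, jitter=1e-6, seed=0):
    """Draw samples of the process at the finite set of points x."""
    rng = np.random.default_rng(seed)
    m = mean_fn(x)
    # A small jitter on the diagonal keeps the Cholesky factorisation stable.
    K = cov_fn(x, x) + jitter * np.eye(len(x))
    L = np.linalg.cholesky(K)
    z = rng.standard_normal((len(x), n_samples))
    return m[:, None] + L @ z  # shape (len(x), n_samples)

x = np.linspace(0.0, 1.0, 50)
samples = sample_gp(x, mean_fn=np.zeros_like, cov_fn=sq_exp_cov)
```

Each column of `samples` is one realisation of the process evaluated at the 50 points in `x`.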
What are the first and second order moments of a (stochastic) process?
The first moment of xᵢ is the expected value/mean
E[xᵢ]
The second moment of xᵢ is the expected value of xᵢ²
E[xᵢ²]
What is intrinsic stationarity?
Assume a constant mean process. Then
E[(Y(x + h) − Y(x))²] = Var[Y(x + h) − Y(x)] = 2γ(h)
If this only depends on h then the process is said to have intrinsic stationarity
What property does the weakly stationary process’s covariance function have? Proof.
Covariance function has to be positive definite.
Weakly stationary, therefore the second moment of xᵢ is finite for all i;
i.e. ∀i, E[xᵢ²] < ∞
which also implies E[(xᵢ − 𝜇)²] = Var(xᵢ) < ∞, i.e. the variance is finite for all i.
(CARD NOT FINISHED)
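One standard way the argument could be completed (a sketch, not necessarily the proof the card intended): the variance of any finite linear combination of process values is non-negative, and expanding it gives the required property. Here n, the points x₁, …, xₙ and the real weights a₁, …, aₙ are arbitrary, and C denotes the stationary covariance function.

```latex
0 \;\le\; \operatorname{Var}\!\Big[\sum_{i=1}^{n} a_i\, Y(x_i)\Big]
  \;=\; \sum_{i=1}^{n}\sum_{j=1}^{n} a_i a_j \operatorname{Cov}\big(Y(x_i), Y(x_j)\big)
  \;=\; \sum_{i=1}^{n}\sum_{j=1}^{n} a_i a_j\, C(x_i - x_j)
```

The finite second moments guarantee that this variance exists. Strictly speaking the inequality shows non-negative (semi-)definiteness; "positive definite" is often used loosely for this property.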
The Wiener-Khinchin (or Khinchine) theorem is a special case of which theorem for time series?
Bochner’s Theorem
Define an isotropic process
If the covariance depends only on distance (not direction) the process is isotropic.
For an isotropic process the covariance function is univariate (involving one variable quantity).
Separable process
In 2D, the correlation structure in the x direction does not change with y (and vice versa).
This holds for multivariate extensions …
Difference between a Gaussian distribution and Gaussian Process
Gaussian distribution is defined by its mean and variance, whereas a Gaussian process is defined by a mean function and a covariance function (positive definite).
Weierstrass Theorem
By increasing the order of a polynomial we can approximate any continuous function on a closed, bounded interval to arbitrary precision
Gaussian process with Matérn 3/2 covariance possesses how many derivatives?
One
Via Bochner’s theorem the associated spectral density of a Matérn covariance function is the pdf of what distribution?
t-distribution
Define a nugget
an independent (iid normally distributed) error added to each data point
Three reasons why you would add a nugget?
1) Instrumental error (often small) - data isn’t entirely smooth because of instrumental error
2) Small scale variation that we don’t want to model by the Gaussian Process - concerned with the larger scale stuff
3) Sometimes we add nuggets for numerical reasons: a very smooth covariance function can give a nearly singular covariance matrix, and a nugget on the diagonal guarantees the matrix will be positive definite and numerically stable.
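The numerical reason can be illustrated with a small sketch. The squared-exponential kernel, the length scale 0.3 and the nugget size 1e-4 are assumed values chosen only to make the effect visible.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 100)
d = x[:, None] - x[None, :]
# A very smooth kernel on closely spaced points gives a nearly singular matrix.
K = np.exp(-0.5 * (d / 0.3) ** 2)

nugget = 1e-4  # assumed nugget variance
K_nug = K + nugget * np.eye(len(x))

cond_without = np.linalg.cond(K)    # enormous: K is numerically singular
cond_with = np.linalg.cond(K_nug)   # bounded by roughly (largest eigenvalue) / nugget
L = np.linalg.cholesky(K_nug)       # succeeds once the nugget is added
```

Comparing `cond_without` with `cond_with` shows how much the diagonal term improves the conditioning.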
Name the layers of the data when considering it in a hierarchical way (model).
Data layer
Process layer
Parameter layer
What layer is missing from the Empirical Hierarchical model and why?
Parameter layer
The parameters θ are fixed numbers
Bayesian Hierarchical model steps
Put a prior distribution on the θ (length scale)
And use conditional distributions to find the distribution of Z (the data).
Bayes' theorem then allows us to 'reverse' the hierarchy.
Another name for variogram
Semi-variogram
Structure function
Equation for the variogram
γ(h) = σ² − C(h)
Given a covariance function we can calculate the corresponding variogram
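The relation follows in one line for a weakly stationary process with variance σ² = C(0), by expanding the variance of an increment:

```latex
2\gamma(h) = \operatorname{Var}\big[Y(x+h) - Y(x)\big]
           = \operatorname{Var}\big[Y(x+h)\big] + \operatorname{Var}\big[Y(x)\big]
             - 2\operatorname{Cov}\big(Y(x+h), Y(x)\big)
           = 2\sigma^2 - 2C(h)
\quad\Longrightarrow\quad
\gamma(h) = \sigma^2 - C(h)
```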
Assume the process is ergodic, what can consequently be concluded about the covariance function, and hence the variogram?
C(h) → 0 as h → ∞
lim h→∞ [γ(h)] = σ²
And we have
C(h) = σ² − γ(h) = lim k→∞ [γ(k)] − γ(h)
Define kriging
Kriging is using the variogram to interpolate between data
i.e. using a variogram instead of a covariance function
Define simple kriging
In simple kriging the mean is assumed to be constant and known (often taken to be zero)
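A minimal sketch of a simple kriging predictor with a known mean. It is written in terms of the covariance function, which is equivalent to using the variogram through C(h) = σ² − γ(h). The exponential covariance, its parameter values, the toy data and the names `exp_cov` and `simple_krige` are assumptions for illustration.

```python
import numpy as np

def exp_cov(a, b, sigma2=1.0, length_scale=0.3):
    """Exponential covariance (assumed form and parameter values)."""
    return sigma2 * np.exp(-np.abs(a[:, None] - b[None, :]) / length_scale)

def simple_krige(x_obs, y_obs, x_new, cov_fn, mean=0.0):
    """Simple kriging: the constant mean is treated as known."""
    K = cov_fn(x_obs, x_obs)
    k_star = cov_fn(x_obs, x_new)           # shape (n_obs, n_new)
    weights = np.linalg.solve(K, k_star)    # kriging weights, one column per new point
    pred = mean + weights.T @ (y_obs - mean)
    var = np.diag(cov_fn(x_new, x_new)) - np.sum(k_star * weights, axis=0)
    return pred, var

x_obs = np.array([0.1, 0.4, 0.7, 0.9])
y_obs = np.array([0.5, -0.2, 0.3, 1.0])
x_new = np.array([0.4, 0.55])               # the first point coincides with an observation
pred, var = simple_krige(x_obs, y_obs, x_new, exp_cov)
```

With no nugget the predictor interpolates: at an observed location it returns the observed value with zero kriging variance, while between observations the variance is positive.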
Define Ordinary Kriging
The mean is estimated as well as the parameters in the variogram
Define Universal Kriging
The mean is a function of some covariates, usually but not exclusively, spatial co-ordinates.
Different weights to use when fitting the theoretical variogram to the sample variogram by least squares?
- number of pairs in each bin - more data so trust more thus larger weighting than elsewhere
- the theoretical variogram - e.g. Cressie's weights N(h)/γ(h)², which down-weight bins where the fitted variogram is high, so short lags are trusted more
- equal weights
Outline the steps in the Method of Moments to fit a variogram?
Calculate the sample variogram
Choose a shape for the variogram
Fit that variogram to the sample variogram by weighted least squares
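The three steps can be sketched as follows. The synthetic 1D data, the exponential variogram shape, the bin edges, the pair-count weights and the coarse grid search (used here instead of a proper optimiser) are all assumptions made for illustration.

```python
import numpy as np

def sample_variogram(x, y, bin_edges):
    """Step 1: method-of-moments estimate, half the mean squared difference per lag bin."""
    i, j = np.triu_indices(len(x), k=1)
    lags = np.abs(x[i] - x[j])
    sq_diff = (y[i] - y[j]) ** 2
    which = np.digitize(lags, bin_edges) - 1
    centres, gamma_hat, counts = [], [], []
    for b in range(len(bin_edges) - 1):
        mask = which == b
        if mask.any():
            centres.append(lags[mask].mean())
            gamma_hat.append(0.5 * sq_diff[mask].mean())
            counts.append(mask.sum())
    return np.array(centres), np.array(gamma_hat), np.array(counts)

def exp_variogram(h, sill, length_scale):
    """Step 2: an assumed shape, the exponential variogram with no nugget."""
    return sill * (1.0 - np.exp(-h / length_scale))

def fit_wls(centres, gamma_hat, counts, sills, length_scales):
    """Step 3: weighted least squares over a grid, weights = number of pairs per bin."""
    best = (np.inf, None, None)
    for s in sills:
        for ell in length_scales:
            resid = gamma_hat - exp_variogram(centres, s, ell)
            loss = np.sum(counts * resid ** 2)
            if loss < best[0]:
                best = (loss, s, ell)
    return best[1], best[2]

# Synthetic data: one realisation of a process with exponential covariance.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
K = np.exp(-np.abs(x[:, None] - x[None, :]) / 1.0)
y = np.linalg.cholesky(K + 1e-10 * np.eye(len(x))) @ rng.standard_normal(len(x))

centres, gamma_hat, counts = sample_variogram(x, y, np.linspace(0.0, 5.0, 11))
sills = np.linspace(0.1, 3.0, 30)
length_scales = np.linspace(0.1, 3.0, 30)
sill_hat, length_hat = fit_wls(centres, gamma_hat, counts, sills, length_scales)
```

Swapping `counts` for another weight vector in `fit_wls` gives the other weighting schemes; with a single realisation the fitted sill and length scale can differ noticeably from the values used to simulate the data.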
What is the Hawkins and Cressie estimator used for?
Hawkins and Cressie is an alternative (robust) estimator of the sample variogram.
Name some loss functions used to get a point estimate from the posterior distribution.
squared loss = mean of the posterior;
absolute loss = median;
(0, 1) loss = mode (also known as maximum a posteriori (MAP) estimates)
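A small sketch of the three point estimates computed from posterior draws. The Gamma(2, 1) samples stand in for a posterior and are an assumption, as is the histogram-based approximation of the mode.

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in posterior draws (assumed): a right-skewed Gamma(shape=2, scale=1).
samples = rng.gamma(shape=2.0, scale=1.0, size=200000)

post_mean = samples.mean()           # minimises expected squared loss
post_median = np.median(samples)     # minimises expected absolute loss

# Crude mode (MAP) approximation: midpoint of the fullest histogram bin.
counts, edges = np.histogram(samples, bins=60)
k = np.argmax(counts)
post_mode = 0.5 * (edges[k] + edges[k + 1])
```

For a skewed posterior such as this one the three estimates differ, with the mode below the median and the median below the mean.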
Name different types of priors
Subjective Bayes
Objective Bayes
Conjugate Priors
Non-Informative Priors
Informative Priors (MCMC methods)
2 varieties of MCMC methods
Gibbs Sampler
Metropolis-Hastings
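A minimal sketch of a random-walk Metropolis-Hastings sampler. The standard normal target, the step size and the names `log_target` and `metropolis_hastings` are assumptions for illustration; because the proposal is symmetric, the Hastings correction term cancels and this is the plain Metropolis special case.

```python
import numpy as np

def log_target(theta):
    """Unnormalised log density of a standard normal (illustrative target)."""
    return -0.5 * theta ** 2

def metropolis_hastings(log_target, n_steps=20000, step=1.0, theta0=0.0, seed=0):
    rng = np.random.default_rng(seed)
    chain = np.empty(n_steps)
    theta, logp = theta0, log_target(theta0)
    for t in range(n_steps):
        # Symmetric random-walk proposal centred on the current state.
        prop = theta + step * rng.standard_normal()
        logp_prop = log_target(prop)
        # Accept with probability min(1, target(prop) / target(theta)).
        if np.log(rng.uniform()) < logp_prop - logp:
            theta, logp = prop, logp_prop
        chain[t] = theta  # a rejected proposal repeats the current state
    return chain

chain = metropolis_hastings(log_target)
```

A Gibbs sampler would instead update each parameter in turn by drawing from its full conditional distribution, so every draw is accepted.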
Describe a conjugate prior
A conjugate prior is one for which the posterior belongs to the same family of distributions as the prior
What type of prior is mostly used in Bayesian Inference?
Improper priors / Non-informative priors
Different methods of inferring the length scale (delta) in Bayesian Inference.
Maximise the posterior (MAP)
MCMC
Approximate the posterior and sample from that
Discretise the prior on δ
Discretising the prior on δ uses what method?
Monte Carlo
Different validation methods for fitting a GP
Leave one out
Leave N out
If there is no completely independent data set, hold some data back and use it to check (Individual Prediction Errors)
For Leave one out method, how many points would you expect to see outside +/- 2 standard deviations?
1 in 20
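A sketch of the leave-one-out check: each point is predicted from all the others and the error is standardised by the predictive standard deviation, so roughly 5% of the values should fall outside ±2 if the model fits. The exponential covariance, zero mean, nugget value, synthetic data and the name `loo_standardised_errors` are assumptions for illustration.

```python
import numpy as np

def exp_cov(a, b, sigma2=1.0, length_scale=0.3):
    """Exponential covariance (assumed form and parameter values)."""
    return sigma2 * np.exp(-np.abs(a[:, None] - b[None, :]) / length_scale)

def loo_standardised_errors(x, y, cov_fn, nugget=1e-6):
    """Predict each point from the rest (zero-mean GP) and standardise the error."""
    n = len(x)
    errs = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        K = cov_fn(x[keep], x[keep]) + nugget * np.eye(n - 1)
        k_star = cov_fn(x[keep], x[i:i + 1])[:, 0]
        w = np.linalg.solve(K, k_star)
        pred = w @ y[keep]
        var = cov_fn(x[i:i + 1], x[i:i + 1])[0, 0] + nugget - k_star @ w
        errs[i] = (y[i] - pred) / np.sqrt(var)
    return errs

# Synthetic data drawn from the same model that is used for prediction.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 60)
K = exp_cov(x, x) + 1e-6 * np.eye(len(x))
y = np.linalg.cholesky(K) @ rng.standard_normal(len(x))

errs = loo_standardised_errors(x, y, exp_cov)
frac_outside = np.mean(np.abs(errs) > 2.0)
```

A value of `frac_outside` far above 0.05 would suggest the predictive variances are too small, and a value near zero would suggest they are too large.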
Name some of the ‘inputs’ for a simulator/computer model?
Parameters
Initial conditions (ICs)
Boundary conditions (BCs)
What are reasons for the uncertainty in simulators/computer models?
STRUCTURAL UNCERTAINTY;
- Uncertainty in the underlying science (Don’t know the world perfectly, hence equations not perfect)
- Uncertainty in the solution of the equations (the discretisation adds additional uncertainty etc)
UNCERTAINTY IN THE INPUTS
Define an emulator
A Gaussian process to model the simulator output as a function of its input