Geostatistics and Kriging Flashcards
Geostatistics
- Applied branch of statistics that deals with spatial properties
- Ex. Treat problems that arise when conventional statistical theory is used in estimating changes in ore grade w/in a mine
- Deals w/ problems of spatial autocorrelation
Finish the sentence: data is positively correlated with a correlation that…
decreases as distance between data increases
Regionalized variables are?
- Continuous from location to location (unlike random variables), but changes are too complex to be described by deterministic function
- Spatially continuous and values are only known at samples, taken at specific locations
Regionalized Variable Def’n
A variable that has intermediate properties btwn a truly random variable and one that is completely deterministic
- i.e. natural phenomena w/ geographic distribution such as elevation, population density, rainfall, etc.
- Many earth science variables are regionalized
Geostats involves estimating…?
The form of a regionalized variable in 1, 2, or 3 dimensions
What is the basic statistical measure of Geostats?
Semivariance
Semivariance
- Measure of the degree of spatial dependence between samples separated by a specific distance
- Function of distance, h
- Difference btwn attribute values as a function of their spatial separation, h, or change of a regionalized variable
How is semivariance estimated if spacing btwn observations is constant (change in h)?
Semivariance(h) = sum over i of (z_i - z_(i+h))^2 / (2n)
- z_i = measurement of regionalized variable z taken at location i
- z_(i+h) = another measurement taken h intervals away
- n = number of pairs of points separated by h
Semivariance: terms inside expression
z_i are attribute values taken at intervals of size (distance) h
- if h = 1 every point is compared to its neighbour
- if h = 2 every point is compared to a point 2 spaces away etc. etc.
- Then plot on semivariogram
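The lag-by-lag comparison above can be sketched in a few lines. This is a minimal illustration, assuming evenly spaced samples along a single transect and NumPy; the function name `semivariance_regular` and the sample values are made up for the example.

```python
import numpy as np

def semivariance_regular(z, h):
    """Experimental semivariance at integer lag h for evenly spaced samples."""
    z = np.asarray(z, dtype=float)
    if h < 1 or h >= z.size:
        raise ValueError("h must be between 1 and len(z) - 1")
    diffs = z[:-h] - z[h:]   # pair every z_i with z_(i+h)
    n = diffs.size           # number of pairs available at this lag
    return np.sum(diffs ** 2) / (2 * n)

# Made-up transect values, for illustration only
z = [1.0, 2.0, 4.0, 7.0, 11.0]
gamma_1 = semivariance_regular(z, 1)  # each point vs. its neighbour
gamma_2 = semivariance_regular(z, 2)  # each point vs. the point 2 spaces away
print(gamma_1, gamma_2)
```

Plotting the returned values against h gives the experimental semivariogram. Note that the number of pairs n shrinks as h grows, so estimates at large lags rest on fewer comparisons.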
Semivariance is simply half the variance of what?
Half the variance of the differences btwn values of a spatial process separated by distance h
Semivariogram
- Plot of semivariance against separation distance h
Experimental Semivariogram
- Description of how data are related/correlated w/ distance
Theoretical (Model) Semivariogram
- Smooth function defined by a model that represents the experimental semivariogram
- Allows semivariance to be estimated at any h
When semivariogram = 0
- h = 0
- Same value, semivariance = 0
- Highly related
As change in h increases, relatedness does what
Relatedness decreases, semivariance increases
What happens when change in h ‘critical’ is reached
Relatedness = 0, semivariance approximates process variance
When change in h is ‘small’
x_i and x_(i+h) are ‘similar’, semivariance is small
Semivariogram vs. autocorrelation
- Increase semivariance (increase distance), autocorrelation/relatedness decrease
Range
- Distance at which the curve approaches the process variance (sill)
- W/in range, closer sites are similar
- Greater than range, point is not useful to interpolation (too far away)
Sill
- Flat region after the range
- At high distances the semivariance levels off
- No spatial dependence, constant variance
Nugget effect
- Ideally 0
- Variable erratic over short distance
- Variability btwn nearby points
- Random noise due to micro-scale processes plus measurement error
- High variance over small distance
Modelling a semivariogram
- Trial and error process
- Semivariance should be able to be calculated for any h
- Should fit data as closely as possible
- Ideally begin at origin, rise smoothly to some upper limit, continue at constant level after limit
Parabolic semivariogram
Excellent continuity
- Ideal form
Linear semivariogram
Moderate continuity
- No sill, never reaches critical value, may need to expand search of pairs or obtain more data out of study area to find sill, or sill may not exist
Horizontal semivariogram
No spatial autocorrelation
What are 3 important factors of semivariogram
- Range
- Sill
- Nugget
Spherical model
- Radius of curve increases as h increases
- Used to represent phenomena that exhibit a linear decrease in the rate of change of spatial autocorrelation as h increases
Exponential model
- Increases with distance
- Never quite reaches a flat sill
- Typically used for data that has long continuity distances
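The spherical and exponential models can be written out as functions of h. This is a sketch under stated assumptions: `sill` is taken as the total sill (nugget included), `rng` is the range, and the exponential model uses the common "practical range" convention (factor of 3, so the curve reaches about 95% of the sill at `rng`). Other texts and packages parameterize these differently, and the function names are hypothetical.

```python
import numpy as np

def spherical(h, nugget, sill, rng):
    """Spherical model: rises to the sill and stays flat beyond the range."""
    h = np.asarray(h, dtype=float)
    s = np.clip(h / rng, 0.0, 1.0)  # beyond the range the curve is held at the sill
    gamma = nugget + (sill - nugget) * (1.5 * s - 0.5 * s ** 3)
    return np.where(h == 0, 0.0, gamma)  # semivariance is 0 at h = 0

def exponential(h, nugget, sill, rng):
    """Exponential model: approaches the sill asymptotically, never quite flat."""
    h = np.asarray(h, dtype=float)
    gamma = nugget + (sill - nugget) * (1.0 - np.exp(-3.0 * h / rng))
    return np.where(h == 0, 0.0, gamma)

h = np.linspace(0.0, 30.0, 7)
print(spherical(h, nugget=0.5, sill=2.0, rng=10.0))
print(exponential(h, nugget=0.5, sill=2.0, rng=10.0))
```

With a non-zero nugget both curves jump from 0 at h = 0 to the nugget value at the smallest positive h, which is the discontinuity the nugget effect describes.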
Semivariance is = to?
Half the mean of the squared differences btwn points a distance h apart
Regionalized variables can be regarded as what 2 parts?
- Residuals
- Drift
- If drift <> 0, the semi-v will not flatten
- If drift exists, it should be removed by trend surface analysis (TSA)
Irregularly spaced data
- Irregular data leads to distance btwn points that is not constant
- Necessary to partition distances btwn points into classes (lags)
Lag distances
- Should be equal to or slightly less than the average nearest Neighbour Distance
- Number of Lags is number of intervals of lag distances that will be examined
- Number of lags x the lag distance should be less than 1/2 the largest distance in the dataset
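For irregularly spaced data, partitioning pair distances into lag classes might look like the sketch below. It assumes 2-D coordinates, isotropic (all-direction) pairing, and lag classes of the form [k*lag, (k+1)*lag); the function name `binned_semivariogram` is hypothetical. The toy data are chosen so the arithmetic is easy to check by hand and are far too small to respect the lag rules of thumb above.

```python
import numpy as np

def binned_semivariogram(coords, values, lag, n_lags):
    """Experimental semivariance per lag class for irregularly spaced points."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)        # every pair counted once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sqdiff = (values[i] - values[j]) ** 2
    bins = np.floor(dist / lag).astype(int)         # lag class index of each pair
    gamma = np.full(n_lags, np.nan)                 # NaN marks an empty class
    counts = np.zeros(n_lags, dtype=int)
    for k in range(n_lags):
        mask = bins == k
        counts[k] = mask.sum()
        if counts[k] > 0:
            gamma[k] = sqdiff[mask].sum() / (2 * counts[k])
    centers = (np.arange(n_lags) + 0.5) * lag       # midpoint of each class
    return centers, gamma, counts

# Toy data, for illustration only
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [1.0, 2.0, 4.0, 7.0]
centers, gamma, counts = binned_semivariogram(coords, values, lag=1.5, n_lags=2)
print(centers, gamma, counts)
```

Returning the pair counts alongside the semivariances makes it easy to spot lag classes that hold too few pairs to be trusted.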
Anisotropic Data
- Directional
- Data are stationary but not isotropic, so the semi-v will differ depending on the orientation of the analysis
- If semi-v changes w/ direction then data is anisotropic
- Calc semivariograms for multi directions to see if data is anisotropic and on which direction the primary axis lies
Kriging def’n
- Interpolation technique that uses regionalized variable theory to incorporate information about the stochastic aspects of spatial variation when estimating interpolation weights
Optimal properties of Kriging
- Exact estimator
- Predicts sample points w/ 0 error
- Provides a measure of uncertainty of the interpolated surface
Kriging
- Generalized Linear Regression (GLR) technique
- Requires knowledge of the spatial structure of the data (from semivariogram)
- Does not assume independence of observations
- Does not assume randomness of observations, assumes autocorrelation
BLUE
- Best b/c aims at minimizing variance of errors
- Linear b/c its estimates are weighted linear combinations of the available data
- Unbiased b/c it tries to have the mean residual or error = 0
- Estimator
Ordinary Kriging
- Simplest form
- Dimensionless points to estimate other dimensionless points (elevation, precip)
- Unknown value is calculated using weighted average of known values
- Estimator is unbiased
- Mean error = 0 for large samples b/c of weighting
What is the crucial variable in the Ordinary kriging eqn?
- Weight, lambda
- Weights change depending on location of unknown point
- Sum of lambda = 1 ensures the estimate is unbiased; w/in that constraint the weights are chosen to minimize error variance
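How the weights might be obtained for a single unknown point can be sketched as follows. This is a minimal illustration, not a production implementation: it assumes an exponential semivariogram with no nugget and made-up parameters (`sill=1.0`, `rng=10.0`), uses all samples rather than a search neighbourhood, and enforces the sum-to-one constraint with a Lagrange multiplier. All function names are hypothetical.

```python
import numpy as np

def exp_semivariogram(h, sill=1.0, rng=10.0):
    """Assumed semivariogram model (exponential, no nugget)."""
    return sill * (1.0 - np.exp(-3.0 * np.asarray(h, dtype=float) / rng))

def ordinary_kriging(coords, values, target, gamma=exp_semivariogram):
    """Estimate, kriging variance and weights at one target location."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(values)
    # Semivariances between every pair of known points
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    A = np.ones((n + 1, n + 1))   # last row/column carry the sum-to-one constraint
    A[:n, :n] = gamma(d)
    A[n, n] = 0.0
    # Semivariances between known points and the target
    b = np.ones(n + 1)
    b[:n] = gamma(np.linalg.norm(coords - target, axis=1))
    sol = np.linalg.solve(A, b)
    weights, mu = sol[:n], sol[n]  # mu is the Lagrange multiplier
    estimate = weights @ values    # weighted average of known values
    variance = weights @ b[:n] + mu
    return estimate, variance, weights

# Made-up samples on the corners of a unit square
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [1.0, 2.0, 3.0, 4.0]
est, var, w = ordinary_kriging(coords, values, (0.5, 0.5))
print(est, var, w, w.sum())
```

Because the target here is equidistant from all four samples, the weights come out equal. Moving the target changes the right-hand side `b`, and therefore the weights, while the left-hand matrix `A` depends only on the sample layout.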
Assumptions of Ordinary kriging
- Partial realization of random function
- Stationary regionalized variable (mean, spatial semivariance do not depend on location)
- Estimation of value at unknown location is based on values at known location (weighted average)
- No trend or directional influence
Remote Sensing and Ordinary Kriging
- Used to fill gaps in cover of pixel information (i.e. where clouds cover)
- Used to filter out noise
Stochastic Processes
- Drift/Trend is avg of a regionalized variable w/in a neighbourhood
- Relatively slow varying, non-stationary part of the surface
- Residuals are difference btwn actual measurements and the drift
- Subtract drift and regionalized variable becomes stationary
Universal Kriging
- Generalization of Kriging procedure
- Doesn’t require stationary variable assumption
- Non-stationary has 2 components, Drift/Trend and Residual
- Linear estimator not biased in presence of a trend
Universal Kriging performs in 1 step what would otherwise require 3
- Estimate and remove trend from non-stationary variables
- Use Ordinary Kriging on the stationary residuals to obtain estimated residuals at unsampled points
- Combine estimated residuals w/ trend to obtain estimate of actual surface
- Basically pull out trend, model residuals, put trend back in
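The three-step route (pull out trend, model residuals, put trend back in) can be sketched explicitly. Note this is the manual detrend-then-krige procedure, not universal kriging itself, which solves for trend and residuals together. Assumptions: a first-order (planar) trend fitted by least squares, an exponential semivariogram with made-up parameters for the residuals, and hypothetical function names throughout.

```python
import numpy as np

def gamma_exp(h, sill=1.0, rng=10.0):
    """Assumed residual semivariogram (exponential, no nugget)."""
    return sill * (1.0 - np.exp(-3.0 * h / rng))

def krige_residuals(coords, resid, target):
    """Ordinary kriging of residuals at one target location."""
    n = len(resid)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma_exp(d)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = gamma_exp(np.linalg.norm(coords - target, axis=1))
    weights = np.linalg.solve(A, b)[:n]
    return weights @ resid

def detrend_krige_retrend(coords, values, target):
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    target = np.asarray(target, dtype=float)
    # Step 1: estimate and remove a planar trend z = a + b*x + c*y
    X = np.column_stack([np.ones(len(values)), coords])
    coef, *_ = np.linalg.lstsq(X, values, rcond=None)
    resid = values - X @ coef
    # Step 2: ordinary kriging on the residuals
    resid_hat = krige_residuals(coords, resid, target)
    # Step 3: put the trend back in at the target location
    trend_at_target = np.concatenate([[1.0], target]) @ coef
    return trend_at_target + resid_hat

# Made-up samples: a tilted plane with one point pushed off it
coords = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0), (1.0, 1.0)]
values = [2.0, 8.0, 0.0, 6.0, 5.0]
print(detrend_krige_retrend(coords, values, (0.5, 0.5)))
```

The order of the trend (here first order) is a choice the analyst has to make beforehand; a poorly chosen trend leaves drift in the residuals and their semivariogram will not flatten.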
What does effectiveness of Kriging rely on?
- Correct specification of parameters that describe semivariogram
- Drift model
Where does kriging yield estimates of likely error?
Standard errors or error variances at every interpolation point
Since kriging is robust…
- Even with naive parameter selection the method will do no worse than conventional grid estimation
Kriging: Smoothing
- Smooths according to the proportion of total sample variance accounted for by random ‘noise’
- The noisier the data, the less the samples represent their immediate vicinity and the more they are smoothed
Kriging: De-clustering
- Weight assigned to a sample is lowered to the degree that its information is duplicated by nearby, highly correlated samples
- Helps mitigate the impact of over-sampled ‘hot-spots’
Kriging: Anisotropy
- When samples are more highly correlated in a particular direction, the weights will be greater for samples in that direction
Kriging: Precision
- Most precise estimates possible will be computed for available data given a representative semivariogram
What is price that must be paid for optimality in estimation and kriging
- Price is computational complexity
- Many simultaneous eqns must be solved for every interpolation point in kriging
- Computer run-times will be longer using kriging over conventional interpolations
What extensive prior study must be made for kriging?
- Process stationary or not
- Form of semivariogram
- Set neighbourhood size/orientation
- Select proper order of the trend if it exists
What is the goal for validation of kriging?
- Validation withholds a portion of data from semivariogram and kriging
- Goal is to get ratio btwn the RMS Predicted Error from cross-validation/validation process and the Estimation Error of the surface to = 1
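The ratio in the validation goal can be computed once predictions have been made at the withheld points. This sketch assumes "Estimation Error of the surface" means the root mean of the kriging error variances at those same points; that reading, the function name `validation_ratio`, and the numbers are assumptions for illustration.

```python
import numpy as np

def validation_ratio(observed, predicted, kriging_std_error):
    """RMS prediction error divided by the average kriging standard error."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    kriging_std_error = np.asarray(kriging_std_error, dtype=float)
    rms_prediction = np.sqrt(np.mean((predicted - observed) ** 2))
    avg_estimated = np.sqrt(np.mean(kriging_std_error ** 2))
    return rms_prediction / avg_estimated

# Made-up withheld observations, kriged predictions and kriging standard errors
observed = [10.0, 12.0, 9.0, 11.0]
predicted = [11.0, 11.0, 10.0, 10.0]
std_error = [1.0, 1.0, 1.0, 1.0]
ratio = validation_ratio(observed, predicted, std_error)
print(ratio)
```

A ratio well above 1 suggests the kriging standard errors understate the real prediction error; well below 1 suggests they overstate it.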