Interpolation Flashcards
Spatial Sampling
- Location of sample points can be critical for subsequent analysis
- For mapping, samples should ideally be located evenly over the area. Regular sampling can be biased, and completely random locations also have drawbacks
Examples:
- Regular sampling
- Random Sampling
- Stratified random sampling –> a good compromise between random and regular sampling; individual points are located randomly within regular blocks or strata
- Cluster sampling –> can be used to examine spatial variation at several different scales
- Transect sampling
- Contour sampling –> used for making DEMs
Definition: Spatial interpolation
*General definition: the procedure of predicting the value of attributes at unsampled sites from measurements made at point locations within the same area.
- Predicting the value of an attribute at sites outside the area covered by existing information is called extrapolation.
Point interpolation is used to convert data from point observations to continuous fields, so that the spatial patterns sampled by these measurements can be compared with the spatial patterns of other spatial entities.
Necessary when:
- the data do not cover the domain of interest completely.
- the discretized surface has a different level of resolution from that required.
- the data model is different from that required (e.g., point data where a continuous surface is needed).
Examples: elevation, thickness of soil, perimeters of trees, soil organic carbon content, depth to groundwater, precipitation, heavy metal levels in soil or plants.
Tasks to be fulfilled:
- capture the important features of the data
- estimate the average value over a large area
- estimate unknown values at unsampled locations
- estimate average values over small areas, and check the performance of the estimation methodology.
Definition
- Exact interpolation
- Support
Exact interpolation predicts a value of an attribute at a sample point that is identical to the measured value.
Support is the volume (or area, or length) of the physical sample on which a measurement is made (e.g., the difference between 1 g and 1 kg); there is less variation as the support volume increases. Important in mining.
Spatial Interpolation Methods
- Global Methods (3)
  - Classification using external information
  - Trend surfaces on geometric coordinates
  - Regression models on surrogate attributes
- Local Deterministic Methods (3)
  - Thiessen polygons
  - Inverse distance weighting
  - Splines
- Geostatistical Methods
  - Kriging (ordinary, universal, indicator)
  - Co-kriging
  - Conditional simulation
Global Method –> Classification
- Definition
- Classification method
- GLOBAL INTERPOLATORS:
- Use all available data to provide predictions for the whole area of interest.
- Classification methods use easily available information to divide the area into regions that can be characterized by the statistical moments (mean, variance) of attributes measured at locations within those regions.
a) Global prediction using global classification methods:
- In some cases it is convenient to assume that the observations are taken from a statistically stationary population, meaning that mean and variance are independent of both location and support.
- If we decide that the sampled observation points capture the spatial change, we can select a classificatory approach based on the spatial units and perform a standard analysis of variance (ANOVA). The simplest statistical model is the ANOVA model: $Z(x) = \mu + \alpha_k + \varepsilon$, where $\mu$ is the overall mean, $\alpha_k$ is the deviation of class $k$ from that mean, and $\varepsilon$ is random noise.
Assumptions:
- variations of Z within the map units are random and not spatially contiguous (sharp class boundaries are assumed, as in the flood-class example, which is unrealistic).
- all mapping units have the same within-class variance (noise, the same error around the means).
- all attributes are normally distributed.
- spatial changes take place at the class boundaries, and they are sharp, not gradual.
- the original data may be transformed (e.g., log) in order to achieve a normal distribution.
- The analysis of variance compares the between-class variance to the within-class (error) variance.
- The ratio of between-class to within-class variance is called the F-value ($F = MSB/MSW$).
- It is compared against a tabulated critical value that gives the maximum F-value that is purely random at the selected probability level. If $F > F_{crit}$, then the difference between classes is non-random (at least one class differs).
- To check which classes differ from each other, another test (t-test) must be applied. A small sketch of the F-test follows below.
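A minimal sketch of this F-test in Python, assuming three hypothetical map-unit classes with a few measurements each (all values are invented for illustration):

```python
import numpy as np
from scipy import stats

# Hypothetical attribute values Z sampled within three map units.
class_a = np.array([4.1, 3.8, 4.5, 4.0])
class_b = np.array([5.2, 5.6, 4.9, 5.4])
class_c = np.array([3.1, 2.8, 3.4, 3.0])

# f_oneway computes F = MSB / MSW and the corresponding p-value.
f_value, p_value = stats.f_oneway(class_a, class_b, class_c)

# If p < 0.05 (i.e., F > F_crit at the 5% level), at least one class mean
# differs non-randomly; a follow-up t-test then locates which one.
print(f"F = {f_value:.2f}, p = {p_value:.4f}")
```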
Global Method –> Trend surfaces on geometrical coordinates
Global Interpolation using trend surfaces:
- When variation of an attribute occurs continuously over a landscape, it may be possible to model it by a smooth mathematical surface.
- The simplest way to model it is by a multiple regression of attribute values versus geographical locations.
- The idea is to fit a polynomial line or surface that minimizes the sum of squares of $\hat{Z}(x_i) - Z(x_i)$. X and Y are independent, Z is normally distributed, and the regression errors are independent of location.
- $Z(X) = b_0 + b_1 X + \varepsilon$
- linear trend: $Z(X, Y) = \beta_0 + \beta_1 X + \beta_2 Y + \varepsilon$ (fluctuation over a plane)
- quadratic trend: $Z(X, Y) = \beta_0 + \beta_1 X + \beta_2 Y + \beta_3 X^2 + \beta_4 Y^2 + \beta_5 XY + \varepsilon$
- The significance of a trend surface can be tested by analysis of variance, partitioning the variance between the trend and the residuals from the trend. The goodness-of-fit ($R^2$) values show that even higher-order surface coefficients do not fully represent all the variation in the data. Even if significantly better fits can be obtained with higher-order polynomials, they are not sensible choices because they have no physical explanation. A fitting sketch follows below.
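A minimal trend-surface sketch with invented sample points: the design matrix [1, x, y] implements the linear trend $Z(X, Y) = \beta_0 + \beta_1 X + \beta_2 Y + \varepsilon$, and least squares minimizes the sum of squared residuals:

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.uniform(0, 100, 30), rng.uniform(0, 100, 30)
z = 2.0 + 0.05 * x - 0.03 * y + rng.normal(0, 0.5, 30)  # noisy plane

# Linear trend surface: columns 1, x, y; least squares fits b0, b1, b2.
A = np.column_stack([np.ones_like(x), x, y])
(b0, b1, b2), *_ = np.linalg.lstsq(A, z, rcond=None)

# Evaluate the fitted surface on a regular grid. A quadratic trend would
# simply add columns x**2, y**2 and x*y to the design matrix A.
gx, gy = np.meshgrid(np.linspace(0, 100, 50), np.linspace(0, 100, 50))
trend = b0 + b1 * gx + b2 * gy
```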
Global Method –> Regression models on surrogate attributes (cheap-to-measure attributes)
- attributes like "distance to river" or "elevation"
- The regression model has the form: $Z(X) = b_0 + b_1 P_1 + b_2 P_2 + \varepsilon$, where $b_0, b_1, \dots$ are regression coefficients and $P_1, P_2, \dots$ are independent properties (the most important properties that influence the interpolation).
- The most important point is that the regression model makes physical sense; note that regression is an inexact interpolator.
- The result is a continuous surface with a 95% confidence interval, as in the sketch below.
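A sketch of such a regression with hypothetical surrogate predictors (the names elevation and dist_river and all values are invented); statsmodels is used here because it reports 95% confidence intervals directly:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
elevation = rng.uniform(200, 800, 40)    # surrogate attribute P1
dist_river = rng.uniform(0, 5000, 40)    # surrogate attribute P2
z = 10 + 0.01 * elevation - 0.002 * dist_river + rng.normal(0, 1, 40)

# Ordinary least squares: Z = b0 + b1*P1 + b2*P2 + e
X = sm.add_constant(np.column_stack([elevation, dist_river]))
results = sm.OLS(z, X).fit()

print(results.params)          # b0, b1, b2
print(results.conf_int(0.05))  # 95% confidence interval per coefficient
```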
Local Deterministic Methods
- Definition
a) Nearest Neighbors: Triangulation and Tessellation
- Local methods of interpolation use the information from the nearest data points directly.
- For this approach, interpolation involves defining a search area or neighborhood around the point to be predicted, finding the data points in this neighborhood, choosing a mathematical function to represent the variation over this limited number of points, and evaluating it for a point on a regular grid.
- The procedure is repeated until all the points on the grid have been computed.
a) Nearest Neighbors: Triangulation and Tessellation
- When we talk about Voronoi polygons and spatial prediction, it basically means that we are trying to predict attributes for locations where we have no data. We do this by looking at the nearest available data points and using their attributes to make predictions for the missing locations. It is a rough approximation, but sometimes that is the best method we have.
- Thiessen polygons divide the region in a way that is totally determined by the configuration of the data points, with one observation per cell.
- Thiessen polygons are often used in GIS as a quick method for relating point data to space, e.g., for meteorological data at a given site.
+ Can easily be used with qualitative data like vegetation class or land use; all that is needed is a choropleth map.
+ They are exact predictors because all predictions equal the values at the data points.
- Cons: sharp borders (discontinuities that are undesirable and have little to do with reality).
- The lines joining the data points show the Delaunay triangulation, which has the same topology as a TIN.
- Triangulation overcomes this problem of the polygonal method, removing possible discontinuities between adjacent points by fitting a plane through the three samples that surround the point being estimated. A nearest-neighbor prediction sketch follows below.
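Thiessen-polygon prediction is equivalent to nearest-neighbor assignment: every grid cell takes the value of its closest data point. A minimal sketch with invented data, using scipy's k-d tree for the neighbor search:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(2)
pts = rng.uniform(0, 10, (15, 2))   # sample locations
vals = rng.uniform(0, 100, 15)      # measured attribute at each point

tree = cKDTree(pts)
gx, gy = np.meshgrid(np.linspace(0, 10, 100), np.linspace(0, 10, 100))
grid = np.column_stack([gx.ravel(), gy.ravel()])

# Each grid node gets the value of its nearest observation: an exact
# predictor, but with the sharp borders criticized above.
_, nearest = tree.query(grid)
surface = vals[nearest].reshape(gx.shape)
```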
- Pycnophylactic method: mass-preserving reallocation from primary data.
- Ensures that the volume of the attribute in a spatial entity remains the same, irrespective of whether the global variation of the attribute is represented by homogeneous, crisp polygons or by a continuous surface.
- The total volume of the attribute per polygon is invariant.
- The constraining surface is assumed to vary smoothly, so that neighboring locations have similar values; the data are converted to a density function.
- The resulting pattern is similar to that of smooth interpolators, but it is not an exact interpolator.
Local Deterministic Methods
b) Inverse Distance Interpolation - Linear Interpolation
- Combines the idea of proximity of Thiessen polygons with the gradual change of the trend surface.
- The assumption is that the value of an attribute Z at some point is a distance-weighted average of the data points occurring within a neighborhood or window.
- It is used to create a raster surface from point data.
- Exact interpolator.
- Formula: $\hat{Z}(x_0) = \sum_i Z(x_i) d_i^{-r} / \sum_i d_i^{-r}$, where $d_i$ is the distance from $x_0$ to data point $i$ (worked example below).
- The further away the point is, the lower its influence on the estimated point.
- The exponent r controls the weighting; r = 2 is commonly used.
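A minimal inverse-distance-weighting sketch with invented data points, implementing the formula above with r = 2:

```python
import numpy as np

def idw(pts, vals, targets, r=2.0):
    # Pairwise distances between target locations and data points.
    d = np.linalg.norm(targets[:, None, :] - pts[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)  # keep the interpolator exact at data points
    w = 1.0 / d**r            # farther points get less influence
    return (w * vals).sum(axis=1) / w.sum(axis=1)

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
vals = np.array([10.0, 20.0, 30.0])
# All three points are equidistant from (0.5, 0.5), so the estimate is
# their plain average, 20.
print(idw(pts, vals, np.array([[0.5, 0.5]])))
```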
Local Deterministic Methods
c) Spline Interpolation
- A form of interpolation where the interpolant is a special type of piecewise polynomial called a spline.
- The interpolation error can be made small even when using low-degree polynomials.
- Gives a smoother result (sketch below).
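A minimal 1-D spline sketch with invented transect data: a cubic spline is a piecewise cubic polynomial that passes exactly through the data points:

```python
import numpy as np
from scipy.interpolate import CubicSpline

x = np.array([0.0, 1.0, 2.5, 4.0, 5.0])   # distances along a transect
z = np.array([3.2, 4.1, 2.7, 5.6, 4.9])   # measured attribute values

spline = CubicSpline(x, z)
x_new = np.linspace(0, 5, 200)
z_new = spline(x_new)   # smooth curve, exact at the sample points
```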
Geostatistical Methods
- Definition
- It assumes that the data points' values represent a sample from some underlying true population. By analyzing this sample, it is often possible to derive a general model that describes how the sample values vary with distance (and direction).
- Core concepts in geostatistics:
Frequency tables and histograms:
- A frequency table records how often observed values fall within certain intervals or classes (graphically: a histogram).
- It is common to use a constant class width for the histogram, so that the height of each bar is proportional to the number of values within that class.
- Cumulative frequency tables can also be used, and histograms may be prepared after ranking the data in descending order.
Probability plots:
- A normal probability plot is a type of cumulative frequency plot that helps to see whether the distribution is close to a Gaussian distribution.
- On a normal probability plot, the Y-axis is scaled in such a way that the cumulative frequencies plot as a straight line if the distribution is Gaussian (sketch below).
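A quick sketch of such a plot with invented data via scipy; points falling on the straight reference line indicate an approximately Gaussian sample:

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

sample = np.random.default_rng(3).normal(50, 10, 200)  # invented data

# Ordered sample values plotted against theoretical normal quantiles.
stats.probplot(sample, dist="norm", plot=plt)
plt.show()
```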
Statistical Parameters from experimental data:
Mean: $m_X = \frac{1}{n} \sum_i x_i$
Variance: $S_X^2 = \frac{1}{n} \sum_i (x_i - m_X)^2$
Standard deviation: $S_X = \sqrt{S_X^2}$ (the square root of the variance)
Covariance: $C_{XY} = \frac{1}{n} \sum_i (x_i - m_X)(y_i - m_Y)$
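The same parameters computed with numpy over invented data; ddof=0 matches the 1/n definitions above:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 6.0, 8.0])

m_x = x.mean()                        # mean
var_x = x.var(ddof=0)                 # 1/n * sum((x_i - m_x)^2)
std_x = x.std(ddof=0)                 # square root of the variance
cov_xy = np.cov(x, y, ddof=0)[0, 1]   # covariance of X and Y
```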
Scatterplot:
- The most common display of bivariate data: an X-Y graph on which the X coordinate corresponds to the value of one variable and the Y coordinate to the value of the other variable.
- Even with some scatter in the cloud, larger values of variable V tend to be associated with larger values of variable U, and smaller values of V with smaller values of U.
- The shape of the cloud of points on an h-scatterplot (values at pairs of locations separated by a lag h) tells us how continuous the data values are over a certain distance in a particular direction. If the data values at locations separated by h are very similar, the pairs plot close to the line X = Y, a 45-degree line passing through the origin. As the data values become less similar, the cloud of points on the h-scatterplot becomes fatter and more diffuse.
Correlogram, covariance function and variogram:
- The relationship between the correlation coefficient of an h-scatterplot and h is called the correlation function or correlogram. The correlation coefficient depends on h, which has both a magnitude and a direction.
* Correlogram = the change of the correlation coefficient with distance.
a) Experimental Variogram
Explain Nugget, Range and Sill
Experimental Variogram
- The variogram is a function describing the degree of spatial dependence of a spatial random field or stochastic process.
- It is defined as the variance of the difference between field values at two locations across realizations of the field.
- The main goal of a variogram analysis is to construct a variogram that best estimates the autocorrelation structure of the underlying stochastic process.
- It provides useful information for interpolation, optimizing sampling, and determining spatial patterns.
- parameters (a model sketch follows below):
*Nugget (small-scale variation): micro-scale variation plus measurement error; it is estimated from the variogram as the intercept at h = 0.
*Sill: the variance of the random field; the level at which the variogram flattens and no longer changes.
*Range: the distance (if any) at which data are no longer autocorrelated. It describes how inter-site differences are spatially dependent: within the range, the closer together the sites are, the more similar they are likely to be. If the distance from a data point to an unvisited site exceeds the range, that data point makes no useful contribution to the interpolation.
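One common way to encode these three parameters is a spherical variogram model (a sketch; other models, such as the exponential, exist):

```python
import numpy as np

def spherical(h, nugget, sill, rng_):
    # Rises from the nugget near h = 0 and levels off at the sill once
    # the lag distance h reaches the range rng_.
    h = np.asarray(h, dtype=float)
    g = nugget + (sill - nugget) * (1.5 * h / rng_ - 0.5 * (h / rng_) ** 3)
    return np.where(h < rng_, g, sill)
```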
Computation steps (sketched in code below):
1. Form all possible data pairs.
2. Group the data pairs into distance classes.
3. Compute the difference between the two values of each data pair.
4. Square all the differences.
5. Compute the average value of the squared differences for each distance class.
6. Divide by two (by definition of the semivariance).
- Variograms can be nested.
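The six steps as a minimal sketch over invented 1-D data:

```python
import numpy as np

rng = np.random.default_rng(4)
coords = rng.uniform(0, 100, 50)            # sample locations (1-D)
values = np.sin(coords / 15) + rng.normal(0, 0.1, 50)

i, j = np.triu_indices(len(coords), k=1)    # step 1: all data pairs
h = np.abs(coords[i] - coords[j])           # pair separation distances
sq_diff = (values[i] - values[j]) ** 2      # steps 3-4

bins = np.arange(0, 50, 5)                  # step 2: distance classes
which = np.digitize(h, bins)
gamma = np.array([0.5 * sq_diff[which == k].mean()   # steps 5-6
                  for k in range(1, len(bins))])
lags = bins[1:] - 2.5                       # class midpoints
```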
Kriging interpolation
Def.: It is based on the variogram and yields the Best Linear Unbiased Estimator (BLUE). It produces not only estimates but also their error variances. It is linear because it estimates linear combinations of the data, unbiased because it attempts to have a mean residual error of zero, and best because it minimizes the error variance.
- types of kriging:
Ordinary kriging
Block
Stratified
Indicator
Co-kriging
Steps in kriging:
- Examine the data for normality and spatial trends and carry out an appropriate transformation. If using indicator kriging, transform the data to binary (0/1) values.
- Compute the experimental variogram and fit a suitable model to it. If the spatial variation is pure nugget, interpolation is not sensible. (*Pure nugget: no spatial structure; the variation is spatially uncorrelated. Example: soil pH.)
- Check the variogram model by cross-validation (jack-knifing).
- Use the variogram model to interpolate sites on a regular grid, where the sites are either equal in size to the original samples (point kriging) or larger blocks of land (block kriging).
- Display the results as grid-cell maps or by threading contours, singly or draped over other data layers.
- Input the results to the GIS and use them in conjunction with other data.
- A bare-bones point-kriging sketch follows below.
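A bare-bones ordinary point-kriging sketch using the spherical model from earlier (repeated here for self-containment); all data and variogram parameters are invented, and real work would use a tested library such as PyKrige or scikit-gstat:

```python
import numpy as np

def spherical(h, nugget, sill, rng_):
    h = np.asarray(h, dtype=float)
    g = nugget + (sill - nugget) * (1.5 * h / rng_ - 0.5 * (h / rng_) ** 3)
    return np.where(h < rng_, g, sill)

def ok_predict(pts, vals, x0, nugget, sill, rng_):
    n = len(pts)
    # Semivariance matrix between all data points, bordered by the
    # Lagrange row/column that forces the weights to sum to one.
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = spherical(d, nugget, sill, rng_)
    np.fill_diagonal(A[:n, :n], 0.0)   # gamma(0) = 0 by definition
    A[n, n] = 0.0
    # Semivariances between the data points and the prediction location.
    b = np.ones(n + 1)
    b[:n] = spherical(np.linalg.norm(pts - x0, axis=1), nugget, sill, rng_)
    w = np.linalg.solve(A, b)          # weights + Lagrange multiplier
    return w[:n] @ vals, w @ b         # estimate and kriging variance

pts = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0], [2.0, 2.0]])
vals = np.array([1.2, 2.1, 0.8, 1.9])
est, var = ok_predict(pts, vals, np.array([1.0, 1.0]),
                      nugget=0.0, sill=1.0, rng_=10.0)
```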
Kriging and conditional simulation:
- Kriging yields the best (linear, unbiased) estimates at unsampled locations, but the interpolated surface is smoother than the data.
- Conditional simulation creates a random field with the same variance-covariance structure as the data. The created surface passes through the data points. At unsampled locations, conditional simulation does not yield the best estimate.