Module 10 Flashcards
interpolation
interpolation is filling in data points between the data you already have
• eg, regression analysis and trendlines only apply to the data set (from xmin to xmax); temperature is measured only at weather stations, so can we estimate the temperature between the weather stations?
extrapolation
extrapolation is filling in data points beyond the data that you have
• eg, using regression analysis to predict values beyond the scale of the observations; estimating temperature beyond the network of weather stations
• extrapolation methods assume that the world outside the data behaves the same as, or similar to, the world inside the data
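A minimal numpy sketch contrasting the two ideas; the station positions and temperatures below are made-up values:

```python
import numpy as np

# Hypothetical 1-D example: temperatures measured at 4 station positions (km)
x_known = np.array([0.0, 10.0, 25.0, 40.0])
t_known = np.array([12.0, 14.5, 13.0, 11.0])

# interpolation: estimate a value between xmin and xmax
print(np.interp(17.0, x_known, t_known))

# extrapolation: a fitted trend line applied beyond xmax assumes the world
# outside the data behaves the same as the world inside the data
slope, intercept = np.polyfit(x_known, t_known, 1)
print(slope * 55.0 + intercept)
```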
IDW
• inverse distance weighting estimates the value at each location by taking the distance-weighted average of the values of known points in its neighbourhood
• the closer a known point is to the location being estimated, the more influence or weight it has in the averaging process (ie, each known point has a local influence that diminishes with distance) – a code sketch follows the power card below
Tobler’s First Law of Geography
The First Law of Geography, according to Waldo Tobler, is “everything is related to everything else, but near things are more related than distant things.”
IDW: importance of the Power
• the power parameter determines the significance of the known points on the interpolated value
• a higher power (eg, > 2) puts more emphasis on the nearby points and produces a more varying and less smooth surface
• a lower power (eg, < 2) gives more influence to the distant points, resulting in a smoother surface
• neighbourhood size can be defined by the radius of a circle, or by the number of known points – in general, the ______ the neighbourhood, the smoother the interpolated surface, since the averaging procedure incorporates more of the actual data
larger
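A minimal pure-numpy sketch of IDW with the power and neighbourhood size as parameters; the function name and sample values are illustrative, not taken from any GIS package:

```python
import numpy as np

def idw(known_xy, known_z, query_xy, power=2.0, k=6):
    """Inverse distance weighting: distance-weighted average of the k
    nearest known points; a higher power makes nearby points dominate."""
    d = np.linalg.norm(known_xy - query_xy, axis=1)
    if np.any(d == 0):                     # query sits on a known point,
        return known_z[d.argmin()]         # so the surface passes through it
    nearest = np.argsort(d)[:k]            # the neighbourhood
    w = 1.0 / d[nearest] ** power          # weight diminishes with distance
    return np.sum(w * known_z[nearest]) / np.sum(w)

# illustrative data: 5 known points and one location to estimate
pts = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 2]], dtype=float)
z = np.array([10.0, 12.0, 11.0, 13.0, 20.0])
print(idw(pts, z, np.array([0.5, 0.5]), power=2.0))
```

Because the result is a weighted average, the estimate always lies within the min/max of the known values, which is the next card's point.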
Primary Features of an IDW result
• the surface passes through the sample points
• the interpolated values are always within the range of the measured values of known points and will never be beyond the maximum and minimum values of the known points
Natural Neighbor
the natural neighbour interpolation method estimates the value of an unknown location by finding the closest subset of known points to the location being estimated, then applying weights to them based on proportionate areas
• each polygon contains 1 known point, and any unknown point within a given polygon is closer to that polygon's known point than to any other known point contained in other polygons
• this technique originated as a method to generate rainfall estimates, and has since spread throughout spatial science
• a new polygon is created around the given unknown point, which also adjusts the surrounding polygons but maintains the basic proximity rules
• only the known points belonging to polygons that have been adjusted will be included in the subset of points for interpolation, and the weight applied to each known point is proportional to the amount of overlap between the new polygon and the original polygons (see the sketch below)
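Exact natural neighbour interpolation builds the Voronoi polygons explicitly; the sketch below only approximates the Sibson "stolen area" weights by counting nearest-owner cells on a fine grid, which is enough to illustrate the proportionate-area idea (all names and values are illustrative):

```python
import numpy as np

def sibson_weights(known_xy, query_xy, extent, n=300):
    """Approximate natural-neighbour (Sibson) weights on an n-by-n grid.
    Each grid cell is 'owned' by its nearest known point (its Voronoi
    polygon); inserting the query point creates a new polygon, and each
    known point's weight is the share of the new polygon stolen from
    that point's original polygon."""
    xs = np.linspace(extent[0], extent[1], n)
    ys = np.linspace(extent[2], extent[3], n)
    gx, gy = np.meshgrid(xs, ys)
    cells = np.column_stack([gx.ravel(), gy.ravel()])

    d_known = np.linalg.norm(cells[:, None, :] - known_xy[None, :, :], axis=2)
    owner = d_known.argmin(axis=1)          # original Voronoi owner per cell
    stolen = np.linalg.norm(cells - query_xy, axis=1) < d_known.min(axis=1)

    counts = np.bincount(owner[stolen], minlength=len(known_xy))
    return counts / counts.sum()

pts = np.array([[0, 0], [4, 0], [0, 4], [4, 4], [2, 3]], dtype=float)
z = np.array([10.0, 12.0, 11.0, 13.0, 15.0])
w = sibson_weights(pts, np.array([2.0, 2.0]), extent=(0, 4, 0, 4))
print(w, np.sum(w * z))  # weights sum to 1; the estimate is their weighted average
```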
Trend Surface Interpolation: 3 types
- a trend surface interpolation fits a smooth surface defined by a polynomial function to a set of known points, then uses the polynomial function to estimate the values of unknown locations
- the trend surface is analogous to a least-squares regression equation – use a subset of points to define the relationship, then predict the z value of each point in the sample area
- like regression analysis, there is a prediction error (the residual) at each point – for the trend interpolation, the residual is the difference between the observed and the estimated z value at a known point
1st Order Polynomial: Planar Surface (flat)
2nd Order Polynomial: Quadratic (some degree of curve)
3rd Order Polynomial: Cubic Surface (very curvy)
trend surfaces are also an effective tool for smoothing the data – much like a filter, the trend surface removes high and low values and reveals the underlying spatial trend of the dataset
• orders 1 – 4 are most commonly used (ArcGIS allows up to 12th-order) – it is difficult to justify that some natural phenomenon behaves as an 8th-order polynomial, so it is best to avoid these cases
• trend surface interpolation is highly susceptible to extreme outliers (just like regression analysis), so examining the dataset beforehand and objectively removing the outliers is important
______ order polynomial equations need many data points to produce the surface, so a bigger dataset is needed for trend surface interpolation
higher
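A minimal sketch of a 1st-order (planar) trend surface fitted by least squares, following the regression analogy above; the sample values are illustrative:

```python
import numpy as np

# known points: coordinates x, y and measured value z (illustrative)
x = np.array([0.0, 1.0, 2.0, 0.5, 1.5, 2.5])
y = np.array([0.0, 0.5, 1.0, 2.0, 1.5, 0.5])
z = np.array([10.0, 11.0, 13.0, 12.0, 12.5, 13.5])

# 1st-order polynomial z = b0 + b1*x + b2*y, fitted by least squares;
# a 2nd-order surface would add columns for x**2, x*y, and y**2
A = np.column_stack([np.ones_like(x), x, y])
coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)

# the residual (prediction error) at each known point
print(z - A @ coeffs)

# use the fitted plane to estimate the value at an unknown location
print(coeffs @ np.array([1.0, 1.2, 0.8]))
```

Each added order adds more coefficients to solve for, which is why higher-order surfaces need many more known points.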
Spline
Estimates values at unknown locations using a mathematical function that minimizes overall surface curvature
- while there are several different types of spline functions, the most commonly used in GIS are thin-plate splines, which produce a surface that passes exactly through the known points while ensuring the surface is as smooth as possible
- both regularized splines and splines with tension create smooth, gradually changing surfaces with estimated values that may lie outside the range of the maximum and minimum values for the known points
- regularized splines run into significant problems by estimating steep gradients in data-poor regions – these are known as overshoots; in general, when t > 0.5 there are more overshoots
- splines with tension allow the user to control the tension applied at the edges of the surface as a method of reducing overshoots
while there are several different types of spline functions, the most commonly used in GIS are _____ splines, which produce a surface that passes exactly through the known points while ensuring the surface is as smooth as possible
thin-plate
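A minimal sketch using SciPy's RBFInterpolator with its thin-plate-spline kernel (available in SciPy 1.7+); with smoothing=0 the surface passes exactly through the known points, and the sample data are illustrative:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

# known points and values (illustrative)
xy = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5]], dtype=float)
z = np.array([10.0, 12.0, 11.0, 13.0, 15.0])

# thin-plate spline: exact at the known points when smoothing=0,
# minimal overall curvature elsewhere
tps = RBFInterpolator(xy, z, kernel="thin_plate_spline", smoothing=0.0)

# note: spline estimates may overshoot the min/max of the known values
print(tps(np.array([[0.25, 0.75], [0.9, 0.1]])))
```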
Kriging
• kriging is a geostatistical method for spatial interpolation that is similar to IDW in that it estimates the value of a variable at a location by computing a weighted average of the known z values in its neighbourhood; however, the weights in kriging are dependent on the spatial variability in the values of the known points
• kriging assumes that in most cases spatial variations observed in environmental phenomena (eg, variations in soil qualities, changes in the grade of ores) are random but spatially correlated, and the data values characterizing such phenomena conform to Tobler’s first law of geography – ie, spatial autocorrelation
• the exact nature of spatial autocorrelation varies from dataset to dataset, and each set of data has its own unique function of variability and distance between known points, which can ultimately be represented by the semivariogram
Semivariogram
a semivariogram is a graph of the semivariance on the y-axis and the distance between known points (the lag) on the x-axis
in order to estimate the semivariance at any given distance, the data points are fitted with a continuous curve called a semivariogram model; there are several different models, each designed to fit different types of phenomena and having different effects on the estimation of the unknown values, especially for nearby points
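A minimal numpy sketch of an empirical semivariogram: the semivariance at lag h is half the mean squared difference in value over all point pairs separated by roughly h (the binning choices and sample data are illustrative):

```python
import numpy as np

def empirical_semivariogram(xy, z, n_bins=10):
    """Return (lag, semivariance) per distance bin, where
    gamma(h) = 0.5 * mean[(z_i - z_j)**2] over pairs at distance ~h."""
    i, j = np.triu_indices(len(z), k=1)            # every unordered pair
    lags = np.linalg.norm(xy[i] - xy[j], axis=1)   # pair separation (the lag)
    sqdiff = 0.5 * (z[i] - z[j]) ** 2              # semivariance contribution
    edges = np.linspace(0, lags.max(), n_bins + 1)
    which = np.digitize(lags, edges[1:-1])         # bin index for each pair
    centres = 0.5 * (edges[:-1] + edges[1:])
    gamma = np.array([sqdiff[which == b].mean() if np.any(which == b)
                      else np.nan for b in range(n_bins)])
    return centres, gamma

rng = np.random.default_rng(0)
xy = rng.uniform(0, 10, size=(50, 2))              # illustrative sample points
z = np.sin(xy[:, 0]) + 0.1 * rng.standard_normal(50)
print(empirical_semivariogram(xy, z))
```

A semivariogram model (eg, spherical or exponential) is then fitted to these (lag, semivariance) points, and the range, sill, and nugget described in the next cards are read off that fitted curve.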
Kriging: range
the range represents the maximum distance between points where spatial autocorrelation occurs – small ranges indicate that data values change more rapidly over space
- the range is used in kriging for defining the size of the neighbourhood so that spatially correlated known points are selected for interpolation
Kriging: the sill
the sill represents the semivariance at the range value, and is typically the same as the variance of the whole dataset
- theoretically, at lag = 0, semivariance = 0, but most natural phenomena exhibit a nugget effect, where semivariance > 0 at lag = 0
- the nugget value represents a degree of randomness attributed to measurement error and/or spatial variations that occur at scales smaller than the sampling scale
2 main forms of kriging used by ArcGIS
• Ordinary Kriging (for random data): assumes that there is no trend in the data and that the mean of the dataset is unknown – the weights are derived by solving a system of linear equations which minimize the expected variance of the data values
• Universal Kriging (for trending data): assumes that there is an overriding trend in the data in addition to spatial autocorrelation among the known points, and this trend can be modeled by a polynomial function
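Outside ArcGIS, one way to run both forms is the third-party pykrige package; a minimal sketch under that assumption (the data, variogram model, and grid are illustrative):

```python
import numpy as np
from pykrige.ok import OrdinaryKriging
from pykrige.uk import UniversalKriging

rng = np.random.default_rng(1)
x, y = rng.uniform(0, 10, 30), rng.uniform(0, 10, 30)
z = np.sin(x) + 0.05 * rng.standard_normal(30)     # illustrative values
gridx = np.linspace(0, 10, 50)
gridy = np.linspace(0, 10, 50)

# ordinary kriging: no trend, unknown constant mean
ok = OrdinaryKriging(x, y, z, variogram_model="spherical")
z_ok, ss_ok = ok.execute("grid", gridx, gridy)

# universal kriging: also models an overriding (here, linear) trend
uk = UniversalKriging(x, y, z, variogram_model="spherical",
                      drift_terms=["regional_linear"])
z_uk, ss_uk = uk.execute("grid", gridx, gridy)

# ss is the kriging variance; np.sqrt(ss) gives the standard-error
# map discussed in the next card
```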
ordinary kriging is for _______ (trending/random) data
random
- in the use of kriging, more known points will produce a more accurate ________ model, and a more accurate interpolated surface
- kriging also produces, as an additional output, a map of _____ ______, which can be interpreted as showing where the interpolated surface is most, or least, accurate
semivariogram
standard errors
- anything white or light gray indicates a poor estimate
- dark gray/black indicates a good estimate
T or F
kriging produces a surface which passes through the known points and the interpolated values are bound by the maximum and minimum of the known data
F – the interpolated values are NOT bound by the max and min
every spatial interpolation method involves errors
- an interpolated surface is a mathematical approximation of a continuous surface
• the accuracy of an interpolated surface is often evaluated through cross-validation, which evaluates the performance of the surface in 2 steps:
- it removes each known point one at a time and estimates its value based on the remaining known points using the chosen interpolation method
- then it compares the observed and estimated values to calculate estimation errors (eg, standard error, or standardized RMSE)
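A minimal sketch of leave-one-out cross-validation wrapped around a simple IDW estimator like the one sketched earlier; all names and data are illustrative:

```python
import numpy as np

def idw_estimate(known_xy, known_z, query_xy, power=2.0):
    d = np.linalg.norm(known_xy - query_xy, axis=1)
    w = 1.0 / d ** power
    return np.sum(w * known_z) / np.sum(w)

def loo_rmse(xy, z, power=2.0):
    """Step 1: remove each known point in turn and estimate it from the
    rest; step 2: compare observed vs estimated values via RMSE."""
    errors = []
    for i in range(len(z)):
        mask = np.arange(len(z)) != i           # hold out point i
        est = idw_estimate(xy[mask], z[mask], xy[i], power)
        errors.append(z[i] - est)               # observed minus estimated
    return np.sqrt(np.mean(np.square(errors)))

rng = np.random.default_rng(2)
xy = rng.uniform(0, 10, size=(40, 2))           # illustrative known points
z = np.cos(xy[:, 1]) + 0.1 * rng.standard_normal(40)
print(loo_rmse(xy, z))
```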
in addition to errors inherent in the interpolation method, there are other common sources of error in spatial interpolation (2)
- data uncertainty in sample data mainly results from too few known points, limited or clustered distributions of known points, and uncertainty about locations and/or values of known points
• in general, more known points = more accurate interpolation, but clustered points yield less information than evenly spread-out points
- edge effects refer to distortions of the interpolated values near the boundary of the study area due to the lack of sample data outside the area
• in fact, near the edges the method is extrapolating, not interpolating
• edge effects can be minimized by collecting data from outside the study site, including them in the interpolation, then clipping them out afterwards
exploratory spatial data analysis
is the process of applying spatial statistical methods and tools to investigate spatial data in order to detect and quantify patterns in the data and to establish spatial associations between a given set of environmental events or phenomena
▪ in most circumstances, objectively random sampling methods are preferred because they are best for reducing bias and are most likely to lead to sample representativeness of the population