Practice Exam (OefenTT) Flashcards
Explain the basic principle of kriging, including the role of the semivariogram in this interpolation technique and the term ‘regionalized variable’.
Answer should include:
* The contribution (or ‘weight’) of observed values used to derive an estimate of the value at the unvisited location is based on the spatial correlation properties of the data set itself, instead of being presumed by some arbitrary function.
* A regionalized variable is a variable that exhibits spatial correlation over ‘small’ distances, but whose values become uncorrelated over ‘large’ distances. Kriging requires spatial correlation to be present > so kriging requires the variable of interest to be a regionalized variable.
* The semivariogram describes the spatial correlation structure of the regionalized variable > expressed as semivariance as a function of spatial lag, where semivariance is a measure of the average dissimilarity between data values at sample points that are a certain distance apart (the spatial lag).
* Use of the semivariogram in kriging: it is needed to obtain the interpolation weights. The distance between a data point and the unvisited grid point is used to extract the matching semivariance value from the semivariogram (e.g. use the sketch from the lecture on kriging to illustrate) for calculating the interpolation weight for that data point, as are the distances between data points, which reduce the interpolation weights for spatially clustered data points.
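As an illustration of how the semivariogram is built, a minimal sketch (assuming a hypothetical, regularly spaced 1-D transect of observations) of computing the empirical semivariance at a few spatial lags:

```python
import numpy as np

# Hypothetical 1-D transect of regularly spaced observations
z = np.array([3.1, 3.4, 3.0, 2.2, 1.8, 2.0, 2.6, 3.2, 3.5, 3.3])

def semivariance(z, lag):
    """Half the average squared difference between values `lag` steps apart."""
    d = z[lag:] - z[:-lag]
    return 0.5 * np.mean(d ** 2)

# Empirical semivariogram: semivariance as a function of spatial lag
gamma = [semivariance(z, h) for h in range(1, 5)]
```

Plotting `gamma` against the lags gives the empirical semivariogram to which a theoretical model is then fitted.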
Kriging and ‘inverse distance weighting’ are two techniques that can be used for interpolating between spatially distributed data. To estimate the value at an unvisited location these techniques provide weights to surrounding data points in different ways. Which are two essential differences between these approaches?
Name 2 of the 3 points below:
1. The way in which weights are assigned to observed values to obtain a value at an unvisited location: kriging uses interpolation weights that are based on the spatial correlation in the data set, as expressed in the semivariogram, while IDW uses inverse distance to determine the interpolation weights (1/D^p), with a user-picked value for the power p. The distance D refers to the distance between the location of an observed value and the unvisited location where a value has to be derived through interpolation.
2. Kriging automatically corrects for spatial clustering in observation points, by reducing the weights of observed values that are spatially clustered, whereas IDW has no such correction
3. In kriging an error map is created (kriging variance map), whereas in IDW only an error statistic for the interpolated map as a whole can be created.
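To make difference 1 concrete, a minimal sketch of the IDW side (hypothetical coordinates and values; the weights 1/D^p are normalised to sum to one):

```python
import numpy as np

# Hypothetical observation points, their values, and one unvisited location
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
vals = np.array([10.0, 20.0, 30.0])
target = np.array([0.2, 0.2])

def idw(pts, vals, target, p=2):
    """Inverse distance weighting: weight_i = 1 / D_i**p, normalised to sum to 1."""
    d = np.linalg.norm(pts - target, axis=1)
    w = 1.0 / d ** p
    w /= w.sum()
    return np.dot(w, vals)

estimate = idw(pts, vals, target, p=2)
```

Note that the weights depend only on distance and the user-picked p; unlike kriging, nothing in them reflects the spatial correlation structure of the data or corrects for clustering.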
See figure 2.1
What is, according to the two empirical semivariograms, a characteristic property of this data set? What terminology is generally used to refer to this property?
- Property: the maximum distance over which spatial correlation exists (the range) depends on direction.
- Terminology: referred to as geometric anisotropy.
When fitting the theoretical models a fundamental error has been made. Which one? Explain your answer.
Fundamental error: the nugget value is not the same for the two directional semivariograms.
* Explanation: this is an error because the nugget value has no directional component, as the nugget is the semivariance at spatial lag = 0. It is a property reflecting measurement error (such as instrument accuracy, or small-scale variability of the variable that cannot be captured by the measurement method) > so the nugget should have the same value for both directional semivariograms.
What can be learned from the ‘kriging variance’ map ?
- Kriging variance > a measure of the uncertainty in the estimate of an interpolated value
- Kriging variance map > the map shows the spatial variation in this interpolation uncertainty
Does the ‘kriging variance’ map also apply to the final copper concentration map or to the interpolated residuals only? Explain your answer
Answer should include:
* The kriging variance only applies to the interpolated residuals.
* The trend surface itself also has an uncertainty associated with it. An interpolation error map for the final copper concentration map (trend + residual) should include both uncertainties.
Explain the basic principle of spectral analysis. In your answer include the type of data set that can be analysed by this technique and the typical properties of the data set that are extracted by this technique
Answer should include :
* Type of data set: measurements on an interval or ratio scale; a time series (or spatial series) with evenly spaced measurements/a constant time interval between measurements; no trend over time (or space, in the case of a spatial series)
* Basic principle: (explained below for time series, but can also be spatial series)
o Transform a time series from the time domain to the frequency domain: translate the time series into a summation of multiple sinusoidal functions with different frequency, amplitude and phase > achieved by the Fourier transform of the time series
o Result presented in a power spectrum/spectral density spectrum: shows for each frequency the matching variance or energy density.
* Typical property: reveals the presence of periodicity in the time series, at which frequencies (or wave periods) this occurs, and how dominantly present these periodicities are (height of the spectral peak)
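The steps above can be sketched in Python (hypothetical time series; the raw periodogram is used as a simple spectral estimate):

```python
import numpy as np

# Hypothetical evenly spaced time series without a trend: two sinusoids
dt = 0.1                                  # constant sampling interval (s)
t = np.arange(0, 100, dt)
y = 2.0 * np.sin(2 * np.pi * 0.5 * t) + 0.5 * np.sin(2 * np.pi * 1.5 * t)

# Fourier transform: from the time domain to the frequency domain
freqs = np.fft.rfftfreq(len(t), d=dt)
power = np.abs(np.fft.rfft(y)) ** 2 / len(t)   # simple raw power spectrum

# The dominant periodicity shows up as the highest spectral peak
f_peak = freqs[np.argmax(power[1:]) + 1]       # skip the zero-frequency term
```

Here the highest peak lands at 0.5 Hz, the frequency of the larger-amplitude sinusoid.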
Explain the meaning of the following two terms :
· ‘Aliasing’ (clarify your explanation using an example sketch)
· ‘Spectral leakage’
‘Aliasing’ (clarify your explanation using an example sketch):
Answer should include:
- Aliasing is an artefact in the spectrum: a spectral peak shows up at a frequency where it is not present in reality
- in reality the peak is present at a frequency beyond the Nyquist frequency
- the artefact occurs due to a too coarse sampling interval compared to the highest frequency occurring in reality
- sketch: see lecture slides > an apparent long wave emerges from the measurements (longer than the true wave period) when sampling with a time step that is longer than the wave period of the oscillation
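A minimal numerical illustration of aliasing (hypothetical frequencies: a 0.9 Hz wave sampled at only 1 Hz, so the Nyquist frequency is 0.5 Hz):

```python
import numpy as np

# True oscillation at 0.9 Hz, sampled at 1 Hz (Nyquist frequency = 0.5 Hz)
f_true, f_s = 0.9, 1.0
t = np.arange(0, 100, 1.0 / f_s)      # sampling interval too coarse for 0.9 Hz
y = np.sin(2 * np.pi * f_true * t)

freqs = np.fft.rfftfreq(len(t), d=1.0 / f_s)
power = np.abs(np.fft.rfft(y)) ** 2
f_apparent = freqs[np.argmax(power[1:]) + 1]
# The spectral peak folds back to |f_s - f_true| = 0.1 Hz: an apparent long wave
```

The spectrum shows a peak at 0.1 Hz although no such oscillation exists in reality, exactly the artefact described above.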
* ‘Spectral leakage’:
Answer should include:
- Spectral leakage is an artefact resulting from the estimation procedure of the spectrum
- It results from Fourier transforming a finite-length time series
o The Fourier transform of a finite series also involves the Fourier transform of a rectangular window, resulting in a spectrum that is a combination of the true spectrum and the spectrum of the rectangular window (a convolution of the two spectra), such that energy appears at frequencies neighbouring the true frequency (so-called ‘side lobes’ > see lecture slides)
o The variance of the time series can only be mapped to frequencies that are multiples of 1/T, where T is the time series length > so variance present at a frequency that is not a multiple of 1/T will appear at neighbouring frequencies
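The 1/T point can be demonstrated numerically (hypothetical series of length T = 100 s, so the frequency grid consists of multiples of 0.01 Hz):

```python
import numpy as np

n, dt = 100, 1.0
t = np.arange(n) * dt
# 1/T = 0.01 Hz: a 0.030 Hz sine falls exactly on the frequency grid,
# a 0.035 Hz sine does not, so its variance leaks into neighbouring frequencies
on_grid = np.sin(2 * np.pi * 0.030 * t)
off_grid = np.sin(2 * np.pi * 0.035 * t)

def peak_fraction(y):
    """Fraction of the total spectral power contained in the single largest bin."""
    p = np.abs(np.fft.rfft(y)) ** 2
    p[0] = 0.0                      # ignore the mean (zero-frequency) term
    return p.max() / p.sum()

frac_on, frac_off = peak_fraction(on_grid), peak_fraction(off_grid)
```

For the on-grid sine essentially all power sits in one bin; for the off-grid sine a substantial part appears at neighbouring frequencies (the side lobes).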
When estimating a power spectrum from a time series one may apply ‘bin averaging’ as part of the estimation procedure.
· What is ‘bin averaging’ ?
· What is the reason for applying ‘bin averaging’?
· What is a disadvantage of applying ‘bin averaging’?
What is ‘bin averaging’ ?
Answer should include:
- bin averaging is part of the calculation procedure of a power spectrum: take the average of the values of the spectral estimates in a frequency bin containing several neighbouring frequencies from the raw spectrum (the bin size is chosen by the researcher)
- use this average value as the best estimate of the spectral value for that frequency bin
* What is the reason for applying ‘bin averaging’?
- Decrease the uncertainty in the value of the spectral estimates.
What is a disadvantage of applying ‘bin averaging’?
- loss of spectral resolution
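A minimal sketch of the procedure (hypothetical raw spectrum; the chi-squared draw mimics how noisy raw periodogram estimates are):

```python
import numpy as np

# Hypothetical raw spectrum: one noisy spectral estimate per frequency
rng = np.random.default_rng(0)
raw = rng.chisquare(2, size=120)

def bin_average(raw, bin_size):
    """Average groups of `bin_size` neighbouring raw estimates into one value."""
    n = len(raw) // bin_size * bin_size     # drop any leftover estimates
    return raw[:n].reshape(-1, bin_size).mean(axis=1)

smoothed = bin_average(raw, bin_size=8)
# 120 raw estimates become 15: lower uncertainty, but lower spectral resolution
```

The trade-off from the answers above is visible directly: the scatter of the smoothed estimates is much smaller, but only 15 frequency bins remain.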
Scientists have measured nearshore wave height along the west coast of France by measuring water level fluctuations (due to the passing waves) at a fixed location. Their measurement device measures the water level elevation at a frequency of 2 Hz (so two measurements per
second). They made their measurements during high tide over a total time span of 100 minutes. The energy density spectrum of this time series can be used to calculate the mean wave height during this time span, because the wave height is related to the amplitude of the
water level fluctuations relative to the mean water level.
What is the Nyquist frequency of the spectrum of the above described time series?
Explain how you derived your answer
f_nyq = 1 Hz (Note: use an appropriate unit!)
because
* f_nyq = 1/(2∆t) = 1/(2 × 0.5) = 1 s⁻¹ = 1 Hz (alternatively: 1/(2∆t) = ½ × 1/∆t = 0.5 × f_sampling = 0.5 × 2 = 1 Hz)
* sampling frequency f_sampling = 2 Hz = 2 measurements/s, so ∆t = 0.5 s
(Note on appropriate unit: ∆t = 0.5 s = (0.5/60) min = (0.5/3600) hr etc., then f_nyq has units of respectively s⁻¹ (= Hz), min⁻¹, hr⁻¹, etc.)
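The same derivation as a few lines of Python:

```python
# Nyquist frequency for the described wave measurements
f_sampling = 2.0            # Hz: two water level measurements per second
dt = 1.0 / f_sampling       # 0.5 s between measurements
f_nyq = 1.0 / (2.0 * dt)    # = 0.5 * f_sampling = 1.0 Hz
```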
What is the fundamental frequency of the spectrum of the above described time series?
Explain how you derived your answer.
f_fund = 1/100 = 0.01 min⁻¹ (use an appropriate unit!) (or, when T is in seconds: f_fund = 1/6000 ≈ 1.67 × 10⁻⁴ Hz)
* Because f_fund = 1/T, where T = total time series length = 100 minutes
(or T = 100/60 hours = 100 × 60 s = 6000 s)
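And the corresponding check for the fundamental frequency:

```python
# Fundamental frequency: f_fund = 1 / T, with T the total record length
T_minutes = 100.0
T_seconds = T_minutes * 60.0              # 6000 s
f_fund_per_min = 1.0 / T_minutes          # 0.01 min^-1
f_fund_hz = 1.0 / T_seconds               # ~1.67e-4 Hz
```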
On which date are the waves the highest at this location? Explain your answer
Answer should include:
* highest waves on May 28
because
* maximum value of energy density is highest on May 28
* value of energy density is measure for the amplitude of the water level fluctuations,
hence the wave height, at a given frequency
Which other conclusion regarding differences in wave conditions between these two dates can be derived from these two spectra?
Answer should include
* On May 26 there is a clearly double-peaked spectrum, while there is a single broad spectrum for May 28, which means a difference in the distribution of wave height over wave periods:
* On May 26 there are roughly two types of waves/wave fields, one type with wave periods around 10 seconds (roughly 8–11 s) and another type with wave periods of less than about 6 s; on May 28 there is one wave field with a broad range of wave periods of less than 10 seconds
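Reading wave periods off a spectrum just inverts the peak frequencies (T = 1/f); hypothetical peak frequencies for illustration:

```python
# Converting spectral peak frequencies (Hz) to wave periods (s): T = 1 / f
f_peaks = [0.10, 0.20]                   # hypothetical peak frequencies
periods = [1.0 / f for f in f_peaks]     # 10 s and 5 s wave periods
```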
Explain the basic principle of principal component analysis. Use a data set consisting of only two variables to clarify your explanation, including a sketch that shows how these variables relate to the principal components.
Answer should include:
* An analysis method for a multivariate data set, which is a data set in which N ‘objects’ are described/characterized by multiple variables (you may also give an example of a multivariate data set: e.g. 50 groundwater wells, where the water in each well is characterized by several variables such as its pH, temperature, phosphate concentration and groundwater depth)
* Map a set of measured variables to a new set of variables that are linear combinations of the original variables > the new variables are the principal components (PC’s)
* Mapping of variables to PC’s is based on the correlation (or covariance) between the original variables
* At least the first PC (but generally several PC’s) describes more variance than any of the original variables
* based on the strength of the contribution of the original variables to a PC (expressed as the PC loading), a PC is generally given a physical meaning (e.g. water pollution of urban origin)
* The projection of individual samples/objects on the PC’s (indicated as PC scores) characterizes the sample/object in terms of this physical meaning (e.g. how polluted the given water sample is in terms of urban pollution)
* make a sketch as used in the lecture slides on PCA: a scatterplot of the 2 original variables. The PC’s (‘new axes’) are plotted in the direction of maximum variance of this point cloud (PC1) and in the orthogonal direction of the remaining variance (PC2). Refer to this sketch in a meaningful way from the above points
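A minimal two-variable sketch of the procedure (synthetic data; PCA done here via eigendecomposition of the correlation matrix):

```python
import numpy as np

# Hypothetical two-variable data set: variable b is strongly correlated with a
rng = np.random.default_rng(1)
a = rng.normal(size=200)
b = 0.8 * a + 0.3 * rng.normal(size=200)
X = np.column_stack([a, b])

# Standardise, then diagonalise the correlation matrix of the two variables
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
corr = np.corrcoef(Xs, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)          # eigh returns ascending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()   # fraction of total variance per PC
scores = Xs @ eigvecs                 # projection of each sample on the PCs
loadings = eigvecs * np.sqrt(eigvals) # contribution of each variable to each PC
```

Because the two variables are correlated, PC1 (the direction of maximum variance of the point cloud) describes far more than half of the total variance; PC2 is orthogonal to it and picks up the remainder.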
Give two reasons for applying principal component analysis on a data set.
- Explore structure in a large data set
- Data reduction