Block 3 - Data input Flashcards

1
Q

What is the difference/relationship between automatic digitizing and scanning?

A

Both are methods for converting analogue data into digital form. However, scanning produces a scanned image, which is still raw data, whereas digitizing produces a thematic vector layer that is immediately ready for analysis.

2
Q

What is the preferred way to digitize a height contour map: scanning or manual/automatic line digitizing? Motivate your answer.

A

The height contour map is a line map because the physical feature ‘elevation’ is recorded as lines. The preferred method is therefore line digitizing, because an elevation value can be added to each entity (line element). A scanned line map is more like a picture: we can see the elevation lines, but the GIS has no knowledge of them and the information cannot be used in analysis.

3
Q

A researcher wants to develop a land cover map of a floodplain nature area. The researcher decides to base the map on a digital aerial photograph because this digital image can be easily captured in a GIS database. Is this a clever decision?

A
Yes, because the aerial image contains information (still in uninterpreted form) which can be digitized. The image can subsequently be subjected to a supervised or unsupervised classification routine, resulting in an (interpreted) land cover map.
The alternative (manual digitizing) is not impossible, but seems less efficient. The digitizer needs to be able to interpret the different types of land cover from a photo, which is very laborious, error-prone and costly.
4
Q

What is the advantage of using a data compaction technique over generalization?

A

Generalization can be defined as “simplification of detail”. It is discussed in Heywood, I. et al. (2011) on pages 42-43 and on pages 161-163. In general, the rule applies “the smaller the scale, the more generalization (because a map with a small scale can contain less detail)”.

Data compaction techniques are used to store data as efficiently as possible. They are discussed in Heywood, I. et al. (2011) on page 81 (last line) and in Box 3.2 (pages 82-83).

Both techniques save storage space, but the advantage of data compaction over generalisation is that the data itself is not altered. When generalisation is applied, the data loses detail. For example, the number of points in a digitized line can be reduced by weeding out points. The shape becomes more general and takes up less storage space. This can be useful when maps of different scales have to be combined. The larger scale map (having more detail) can be generalised to match the smaller scale map (Heywood, I. et al. (2011), page 162).
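
The contrast can be illustrated with a minimal Python sketch. It rests on assumptions that are not in the course text: run-length encoding stands in for a data compaction technique, keeping every n-th vertex stands in for point weeding, and all function names and sample data are hypothetical.

```python
from itertools import groupby

def rle_encode(row):
    """Run-length encode a raster row as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(row)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original row."""
    return [value for value, count in pairs for _ in range(count)]

def weed_points(line, keep_every=2):
    """Keep every n-th vertex plus the end point (a crude form of weeding)."""
    kept = line[::keep_every]
    if kept[-1] != line[-1]:
        kept.append(line[-1])
    return kept

# Compaction: fewer stored items, and decoding restores the row exactly.
row = [0, 0, 0, 1, 1, 0, 0, 0, 0, 2]
packed = rle_encode(row)
restored = rle_decode(packed)

# Generalisation: fewer vertices, but the dropped ones cannot be recovered.
line = [(0, 0), (1, 2), (2, 1), (3, 3), (4, 0), (5, 1)]
weeded = weed_points(line, keep_every=2)
```

Decoding the packed row gives back the original values, while the weeded line has permanently lost two of its six vertices.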

5
Q

Why can it be difficult to join satellite images taken at different times of the day?

A

During the day, the angle of the sun and the cloud cover change. Because of this, the recorded reflections differ between the images, which produces straight lines where the edges meet. Especially when the combined image is classified on the basis of pixel colour, the edges can produce artificial differences (Heywood, I. et al. (2011), page 164, end of first column).

6
Q

Give an example of rubber sheeting.

A

When a map is rubber sheeted, it is stretched like a rubber sheet so that the coordinates of elements on the map match reference measurements. The correct coordinates of map elements, such as crossroads or houses, can be measured using a GPS during a field campaign.

You practised rubber sheeting during Task 2.3 “Georeferencing” of Block 2 - Understanding GIS. You had to identify and link reference points on an aerial image and on a topographic map. Once sufficient points had been connected, the georeferencing button was activated and the aerial image was transformed to fit the topographic map. This transformation process is comparable to rubber sheeting. The aerial image is the rubber sheet that is stretched and the topographic map is the reference.
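
The idea of fitting an image to reference points can be sketched in Python. This is a simplified illustration under stated assumptions: it fits a single affine transformation through exactly three control point pairs, whereas real rubber sheeting typically uses more points and a piecewise or higher-order transformation. The function names and coordinates are hypothetical.

```python
def solve3(m, v):
    """Solve a 3x3 linear system m * x = v with Cramer's rule."""
    def det(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))
    d = det(m)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear")
    solution = []
    for col in range(3):
        # Replace one column of m by v and take the ratio of determinants.
        replaced = [row[:col] + [v[i]] + row[col + 1:] for i, row in enumerate(m)]
        solution.append(det(replaced) / d)
    return solution

def fit_affine(image_pts, map_pts):
    """Fit X = a*x + b*y + c and Y = d*x + e*y + f from three point pairs."""
    m = [[x, y, 1.0] for x, y in image_pts]
    coeff_x = solve3(m, [p[0] for p in map_pts])
    coeff_y = solve3(m, [p[1] for p in map_pts])
    return coeff_x, coeff_y

def apply_affine(coeffs, point):
    """Transform one image coordinate into map coordinates."""
    (a, b, c), (d, e, f) = coeffs
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)

# Hypothetical control points: image pixels and matching map coordinates.
image_pts = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)]
map_pts = [(5000.0, 2000.0), (5200.0, 2000.0), (5000.0, 1800.0)]

coeffs = fit_affine(image_pts, map_pts)
centre = apply_affine(coeffs, (50.0, 50.0))
```

With these control points, each pixel corresponds to 2 map units and the y-axis is flipped, so the image centre lands halfway between the reference points.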

7
Q

Explain briefly the meaning of data capture.

A

The input of raw data into a GIS.

8
Q

Explain briefly the meaning of editing/cleaning.

A

Improving the data by weeding out unnecessary elements, correcting errors in vector data (table 5.2; fig 5.3), radiometric correction of satellite or aerial images, filtering data (fig 5.4).

9
Q

Explain briefly the meaning of re-projection.

A

Changing the projection (or coordinate) system of a map, mostly to match other datasets or to comply with the standards used in a country.

10
Q

Explain briefly the meaning of generalization.

A

Simplifying the data set, for example by weeding out vertices of lines or polygons.

11
Q

Explain briefly the meaning of edge matching.

A

When edge matching is applied, ‘tiles’ of a map are ‘glued’ together. This means that roads, rivers or other elements that run from one tile to another should continue without disruption or displacement.

12
Q

Explain briefly the meaning of layering.

A

Combining maps as layers within a GIS. In practice, this happens automatically when different maps are opened one after another in a GIS program.

13
Q

A researcher wants to set up a GIS database to study the relationship between the presence of roads and the abundance of wildlife in the Province of Gelderland. The researcher has three information sources:

  1. a paper nature area map of Staatsbosbeheer covering the eastern part of the Netherlands;
  2. a digital road map of the province of Gelderland from Rijkswaterstaat;
  3. wildlife abundance data for the different nature areas in the Netherlands collected by non-governmental nature organisations.

Specify at least four activities that should be performed by the researcher to develop a sound GIS database.

A
  1. First the paper map has to be scanned and
  2. georeferenced to match the digital road map (data capture, re-projection).
  3. Then the nature areas can be digitized as polygons (data capture, editing).
  4. The wildlife abundance data can be added to the attributes of the nature area polygons (data capture).
14
Q

Define remote sensing in your own words.

A
  • To capture data for GIS of remote objects.
  • To sense objects without making contact.
  • To gather information about objects without making contact.
15
Q

Why are aerial images so useful in GIS?

A

They contain a lot of (uninterpreted) information. The scale of aerial images can be chosen to suit the need. They are cheap and can be made at any time the weather permits (Heywood, I. et al., 2011, page 59).

16
Q

Which two factors are important to consider when you want to use aerial photograph information in GIS? Does this also apply to satellite images?

A

There are multiple (more than two) factors to consider when using aerial photograph information:

  • The interpretation is influenced by the time of day and the time of year. As the time of day changes, shadows vary and may obscure elements that need to be recognised. The time of year is important when mapping vegetation, because its colour varies with the season. Sometimes, when mapping infrastructure, winter is chosen for capturing the earth surface, because roads are then not obscured by foliage. This also applies to satellite images.
  • The angle at which the image was taken (e.g. vertical versus oblique photographs). This also applies to satellite images, but to a lesser extent.
  • The resolution, which determines the level of detail in the photo. This also applies to satellite images.
  • The scale (which varies within the photo with the distance from the centre, at least for vertical images). This also applies to satellite images, but to a lesser extent.
17
Q

Explain why GPS is a useful tool in the data stream.

A

A GPS can provide the coordinates of ground elements that are recognisable on aerial images, so it can be used to georeference such an image. A GPS can also be useful in field studies, for example to pinpoint vegetation quadrats.

18
Q

What is the blocking effect?

A

Parts of the electromagnetic spectrum are filtered by the atmosphere, so radiation from the earth surface at those wavelengths cannot be captured by satellite sensors, i.e. the radiation is blocked.

19
Q

Which scanners are not susceptible to cloud cover?

A

An important property of the long wavelengths used in the microwave region is that they are not susceptible to atmospheric scattering. As a result, they can penetrate cloud cover, haze and all but the heaviest rainfall. Consequently, active scanners that emit and sense these long wavelengths, such as radar, are not susceptible to cloud cover.

20
Q

Imagine you have 2 photographic films. One is susceptible to radiation of 0.4 µm and the other is susceptible to radiation of 1.6 µm. Which one would you choose to discern water, sand and vegetation? Motivate your answer.

A

The reflectance of the 3 types is similar at 0.4 µm, so their brightness on the photographic film is similar and difficult to interpret.
At 1.6 µm the differences are much larger and therefore much easier to interpret: water absorbs the radiation instead of reflecting it and appears black on the image; sand has about 60% reflection and will appear very bright; vegetation will appear intermediate, i.e. grey in the case of a black and white film.

21
Q

Some sensors have a small band around (or just after) 0.8 µm, for example the CEASAR-CCD MSS. Why do you think this band is inserted? Hint: look at the spectral signatures in Figure B.7 of the syllabus (paragraph B2.5 Object - Radiation Interaction).

A

This band is inserted in the range of high reflection of vegetation and, in combination with other bands (like 1.6 µm), is very useful for sensing whether vegetation is present or not. Furthermore (not directly deducible from the figure), different vegetation types can be discerned because their reflection differs in this region, for example between pine trees (lower reflection) and deciduous trees (higher reflection).

22
Q

What is classification?

A

To categorise pixels of an image into land cover classes or themes (paragraph 7.7 in Lillesand, T.M. and Kiefer, R.W., 1994).

23
Q

In this course we focus on spectral pattern recognition. What is the difference between spatial pattern recognition and spectral pattern recognition?

A

In order to classify pixels into the correct classes, we use the spectral characteristics of objects (land cover classes like water). The characteristics are captured by a sensor in reflections of several bands (wavelengths). The pattern of the reflections in different bands is the spectral reflectance pattern (see the patterns of water, sand, forest, … in figure 7.37 of Lillesand, T.M. and Kiefer, R.W., 1994). Typical for spectral pattern recognition is that the reflection pattern of each individual pixel forms the basis of the classification. Relationships between pixels are not considered.

Spatial pattern recognition is the categorisation of pixels according to their location and their spatial relationship with neighbouring pixels. Feature size and shape are aspects a human can recognise easily, and they contribute to our interpretation of an object. For example, a long line of water is unlikely to be a lake, and a square of concrete with coloured dots is likely to be a parking area.

24
Q

Explain the difference between supervised and unsupervised classification in your own words. Use the term training areas in the explanation.

A

In supervised classification, the definition of the spectral patterns (= interpretation) precedes the classification of pixels. The spectral patterns of the classes are defined by the interpreter. Around areas of known land cover, so-called training areas are delineated. The spectral patterns of these training areas are then used to classify unknown pixels based on their spectral reflection pattern.

In an unsupervised classification, classification precedes the interpretation. First, an algorithm is applied which categorises the spectral patterns in classes. Subsequently, the interpreter has to assign a real world interpretation to the automatically assigned classes.
Page 533 (paragraph 7.7) in Lillesand, T.M. and Kiefer, R.W. (1994)
25
Q

In the training stage of supervised classification, the image analyst collects spectral response patterns of classes he or she wants to map, e.g. water, sand, forest, etc. Why are these spectral response patterns needed?

A

The spectral response patterns are used to compare the spectral pattern of an unclassified pixel with the typical spectral response patterns of the classes that are to be mapped (signatures). In this way, unknown pixels can be classified based on their resemblance to these signatures.
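
A minimal Python sketch of this comparison follows. It assumes a minimum-distance-to-means rule, which is only one of several possible classifiers and is not prescribed by the text above. The band values, class names and function names are hypothetical.

```python
import math

def signature(training_pixels):
    """Mean value per band over the pixels of one training area."""
    n = len(training_pixels)
    bands = len(training_pixels[0])
    return tuple(sum(p[b] for p in training_pixels) / n for b in range(bands))

def classify(pixel, signatures):
    """Return the class whose signature is nearest to the pixel in band space."""
    return min(signatures, key=lambda name: math.dist(pixel, signatures[name]))

# Hypothetical two-band values (band 1, band 2) sampled in training areas.
training_areas = {
    "water":  [(20, 5), (22, 7), (18, 6)],
    "sand":   [(90, 120), (95, 118), (85, 122)],
    "forest": [(30, 70), (34, 74), (32, 72)],
}

# Training stage: one signature per class.
signatures = {name: signature(pixels) for name, pixels in training_areas.items()}

# Classification stage: label unknown pixels by resemblance to the signatures.
unknown_pixels = [(21, 8), (88, 115), (35, 68)]
labels = [classify(p, signatures) for p in unknown_pixels]
```

Each unknown pixel receives the label of the signature it resembles most, without any reference to its neighbours.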

26
Q

What are the basic steps in supervised classification? Which step is different in unsupervised classification?

A

Stages of supervised classification:

  • Training stage
  • Classification stage
  • Output stage.

In unsupervised classification there is no training stage. It starts with the classification stage (i.e. choosing a classification algorithm and defining criteria for classification such as the number of classes or class thresholds). Signatures are generated automatically in unsupervised classification.

27
Q

What do variance and co-variance mean according to the text in paragraph 7.9 of Lillesand, T.M. and Kiefer, R.W., 1994?

A

Variance is the spread in the recorded (reflection) values for a certain band. In the graphs, the variance of pixel observations is visualized as the range over the band-axis over which the observations are distributed.

Co-variance is the relationship in variance between two parameters. For example, higher reflection values in one band may correlate with higher reflection values in another band (positive correlation). In the graphs, co-variance is visualized as a cloud which has a direction, i.e. is elongated as an ellipse. This means the DN values of the bands are correlated in some manner.
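
Both quantities can be computed directly, as in the following Python sketch. The sample formulas (dividing by n - 1) are a common convention and an assumption here, and the band values are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def variance(values):
    """Sample variance: the spread of the values in one band."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def covariance(xs, ys):
    """Sample covariance: how the values of two bands vary together."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

# Hypothetical DN values of the same five pixels in two bands.
band_a = [10, 12, 14, 16, 18]
band_b = [20, 25, 29, 36, 40]

var_a = variance(band_a)
cov_ab = covariance(band_a, band_b)

# Scaling the covariance by both spreads gives the correlation coefficient.
correlation = cov_ab / (variance(band_a) * variance(band_b)) ** 0.5
```

A positive covariance, as in this example, corresponds to a point cloud that is elongated from lower left to upper right.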

28
Q

Pixels outside the decision region in the parallelepiped classifier are classified as unknown. Why are there no unknown pixels when the Gaussian maximum likelihood classifier is applied?

A

In the Gaussian Maximum Likelihood classifier, the probability that the pixel belongs to a class is calculated for each class. Each pixel is subsequently assigned to the class with the highest probability, however small that probability may be. Unknown pixels can still exist, however, when the analyst decides to use a probability threshold, i.e. a threshold below which pixels are not classified due to a lack of confidence.
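
The effect of the threshold can be shown with a simplified Python sketch. It assumes a single band with a normal distribution per class, whereas the real classifier works on several bands at once using the co-variance between them. The class statistics and function names are hypothetical.

```python
import math

def gaussian_pdf(x, mean, std):
    """Probability density of x under a normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

def max_likelihood(x, class_stats, threshold=0.0):
    """Assign x to the most likely class, or 'unknown' below the threshold."""
    densities = {name: gaussian_pdf(x, m, s) for name, (m, s) in class_stats.items()}
    best = max(densities, key=densities.get)
    return best if densities[best] >= threshold else "unknown"

# Hypothetical (mean, standard deviation) per class for one band.
class_stats = {"water": (20.0, 3.0), "forest": (60.0, 5.0), "sand": (110.0, 8.0)}

# A pixel value that lies far from every class mean.
no_threshold = max_likelihood(85.0, class_stats)
with_threshold = max_likelihood(85.0, class_stats, threshold=1e-3)
```

Without a threshold the pixel is still assigned to the least unlikely class; with the threshold it is left unclassified.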

29
Q

What is a cluster?

A

A cluster is a naturally occurring grouping in the data, i.e. in a multidimensional attribute space. Within an unsupervised classification method, it is often determined statistically.

30
Q

As a first step in the iso-data clustering technique, the clusters are defined along a 45 degree line. The user only gives the number of classes he/she wants to end with. What is the next step which is performed?

A

For each sample point, the distance to each cluster mean (centre point) is calculated using the Pythagorean theorem. Each sample point is labelled with the closest cluster. The sample points having the same label form the new cluster.

31
Q

Explain when the iteration stops and the process of clustering is finished.

A

The iteration stops after a user-defined number of iterations or when fewer than 2 percent of the sample points shift from one cluster to another.
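
The loop and its two stopping rules can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the starting means are spread evenly along the diagonal of the feature space, and the splitting and merging of clusters that full iso-data implementations may perform is left out. All names and sample points are hypothetical.

```python
import math

def initial_means(k, low, high, bands=2):
    """Spread k starting means evenly along the diagonal of feature space."""
    step = (high - low) / (k + 1)
    return [tuple(low + step * (i + 1) for _ in range(bands)) for i in range(k)]

def nearest(point, means):
    """Index of the cluster mean closest to the point (Euclidean distance)."""
    return min(range(len(means)), key=lambda i: math.dist(point, means[i]))

def cluster(points, k, low=0.0, high=255.0, max_iter=20, tolerance=0.02):
    means = initial_means(k, low, high, bands=len(points[0]))
    labels = [None] * len(points)
    for iteration in range(1, max_iter + 1):
        new_labels = [nearest(p, means) for p in points]
        changed = sum(a != b for a, b in zip(labels, new_labels))
        labels = new_labels
        # Stop when fewer than 2 percent of the points switched cluster.
        if changed / len(points) < tolerance:
            break
        # Recompute each mean from its members; keep the old mean if empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                means[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, means, iteration

points = [(10, 12), (12, 9), (11, 14), (200, 210), (205, 198), (198, 202)]
labels, means, iterations = cluster(points, k=2)
```

In this small example the labels no longer change in the second pass, so the loop ends well before the maximum number of iterations.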

32
Q

What is a fuzzy boundary? Think of an example.

A

A fuzzy boundary is a boundary which is hard to define or recognise on aerial photos or in the field. For example, the border between sea and land shifts constantly due to high and low tides. In contrast, the edge of a corn field is a clear, hard boundary.

33
Q

What are sliver polygons?

A

Sliver polygons are digitising errors. They are small leftover polygons between two larger polygons whose shared boundaries do not match exactly; see figure 10.15 in Heywood, I. et al. (2011).
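
One way to find candidate slivers is to flag polygons that are both small and thin. The Python sketch below is an illustration only: the area and compactness thresholds are arbitrary assumptions that would need tuning per dataset, and the polygons and function names are hypothetical.

```python
import math

def area(polygon):
    """Polygon area via the shoelace formula (vertices listed in order)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def perimeter(polygon):
    """Total length of the polygon boundary."""
    return sum(math.dist(a, b) for a, b in zip(polygon, polygon[1:] + polygon[:1]))

def is_sliver(polygon, max_area=1.0, max_compactness=0.2):
    """Flag small, thin polygons; both thresholds are illustrative guesses."""
    a, p = area(polygon), perimeter(polygon)
    # Compactness is 1.0 for a circle and approaches 0 for a thin strip.
    compactness = 4 * math.pi * a / (p * p)
    return a <= max_area and compactness <= max_compactness

# A normal field and a 1 cm wide gap left along its eastern edge.
field = [(0, 0), (100, 0), (100, 80), (0, 80)]
gap = [(100, 0), (100.01, 0), (100.01, 80), (100, 80)]

flags = [is_sliver(field), is_sliver(gap)]
```

Only the narrow gap is flagged, because the field fails the area test even though both shapes are rectangles.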

34
Q

What is the difference between accuracy and precision? Give examples of both in a GIS context.

A

Accuracy is the extent to which an estimated data value approaches its true value. If a GIS database is accurate, it is a true representation of reality.

Precision is the recorded level of detail in the data. A co-ordinate in metres to the nearest 12 decimal places is more precise than one specified to the nearest 3 decimal places.

35
Q

What is applicability?

A

The dataset contains the right information to solve a particular problem, and the analysis tools can be combined usefully with the data. For example, the elevation values of an elevation map are stored in a numeric format, not in a text format.

36
Q

What is bias?

A

A systematic variation of data from reality, for example caused by a consistent truncation of decimal places from data values.
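
The truncation example can be checked with a short Python sketch. The coordinate values are hypothetical and the function names are illustrative.

```python
import math

def truncate(value, decimals):
    """Cut off the digits beyond the given number of decimal places."""
    factor = 10 ** decimals
    return math.trunc(value * factor) / factor

def mean_error(true_values, recorded_values):
    """Average signed difference between recorded and true values."""
    diffs = [r - t for t, r in zip(true_values, recorded_values)]
    return sum(diffs) / len(diffs)

# Hypothetical true coordinates in metres.
true_x = [100.2, 250.4, 310.6, 420.8]

truncated = [truncate(x, 0) for x in true_x]
rounded = [float(round(x)) for x in true_x]

# Truncation always errs in the same direction; rounding errs both ways.
bias_truncated = mean_error(true_x, truncated)
bias_rounded = mean_error(true_x, rounded)
```

The truncated values are on average about half a metre too low, whereas the rounding errors in this sample cancel out.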

37
Q

Define compatibility.

A

Compatible datasets can be used together. Combining them makes sense because they cover the same area, use the same coordinate system, are of similar scale and use compatible attributes. They are also stored in a compatible data format, so they can be loaded within the same GIS.

38
Q

Define completeness.

A

A complete dataset covers the whole area and the whole time period of interest. Every entity needed is included in the map.

39
Q

Define consistency.

A

If a map is produced consistently, it has the same characteristics across its whole extent: entities are named in the same way, the same legend is used, the production method ensures that mapping errors are of similar magnitude across the map, etc. Consistency between maps makes them more compatible.

40
Q

Conceptual error is related to how we perceive reality. How does conceptual error affect GIS?

A

Different analysts or scientific disciplines perceive reality in different ways. This affects the way a GIS dataset is set up. As a result, the abstracted form of reality will be slightly different each time; this is called conceptual error.

41
Q

What are the main sources of error in GIS data input, database creation and data processing?

A
  • source data
  • data encoding
  • data editing and conversion
  • data processing and analysis.
42
Q

The modifiable areal unit problem (MAUP) is commonly cited as a difficulty faced when attempting to overlay data layers in a GIS. What is the MAUP, and why does it cause problems for data integration?

A

The spatial units in which data are mapped can represent, for example, people living in the same zip-code region. The distribution of these people within the zip-code region is not known and might not be uniform. So, if this layer is combined with other data mapped in smaller units, the combination is only compatible if a uniform distribution of people is assumed, which is probably not true.
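
The effect of the chosen units can be illustrated with a small Python sketch. The data are hypothetical: households are placed along a single road, and the same households are aggregated into two different sets of zones.

```python
def zone_means(records, boundaries):
    """Average the values of the records falling in each [start, end) zone."""
    means = []
    for start, end in zip(boundaries, boundaries[1:]):
        values = [v for x, v in records if start <= x < end]
        means.append(sum(values) / len(values) if values else None)
    return means

# Hypothetical households: (position along a road in km, income index).
households = [(0.5, 10), (1.5, 10), (2.5, 50), (3.5, 50), (4.5, 10), (5.5, 10)]

zoning_a = zone_means(households, [0, 2, 4, 6])   # three zones of 2 km
zoning_b = zone_means(households, [0, 3, 6])      # two zones of 3 km
```

With three zones the high-income pocket in the middle is clearly visible; with two zones it disappears into two identical averages, although the underlying households are unchanged.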

43
Q

What measures can be taken to provide information about error in GIS?

A

The production process has to be written down: what base data was used and at which scale, which (hopefully standardised) method was used, what the classification error is, the year of production, etc. This kind of description is called metadata, and it has to be provided together with the map.

In Heywood, I. et al. (2011) a registration of the process is called data lineage. It contains a record of data history from source to present format.

44
Q

Discuss some of the ways in which errors in GIS data can be avoided.

A

Be consistent