Lecture 9 - Data Acquisition & Quality Flashcards

1
Q

Why is data important? What does it determine?

A

Data is the foundation of a GIS installation

Data determines:

  • what you can analyze (issues/problems)
  • where your analysis focuses (study area)
  • types of analysis (forestry vs. emergency services dispatch)
  • quality of your analysis (how much do you trust your data?)
2
Q

Why is data quality so important?

A

High-quality, georeferenced data can be difficult to find or access

  • historical data collection often has no locational dimension
  • problems with converting existing paper maps
  • digital data may exist but confidentiality, liability, or commercial interests may disallow broad use

In the past, data construction was often said to consume up to 3/4 of a GIS project’s budget and time

  • labour-intensive
  • error-prone
  • can become an end in itself
3
Q

What are the 2 broad data capture methods for GIS?

A
  • primary (direct measurement; from scratch)
  • secondary (indirect derivation; existing data)

4
Q

What are the primary sources for data collection for raster and vector?

A

Raster:

  • digital remote sensing images
  • digital aerial photos

Vector:

  • GPS measurements
  • survey measurements
5
Q

What are the secondary sources for data collection for raster and vector?

A

Raster:

  • scanned maps
  • DEMs from maps

Vector:

  • topographic surveys
  • toponymy data sets from atlases
6
Q

What are the stages in data collection projects?

A
  1. planning
  2. preparation
  3. digitizing/transfer
  4. editing/improvement
  5. evaluation
    repeat
7
Q

Where do you get data from?

A
Secondary:
- already published/available
- special tabulation/contract
Administrative records (data as a by-product):
- within organization
- other organizations
Primary data:
- developed in-house (DIY)
- contracted out
8
Q

Explain primary data capture

A
  • capture specifically for GIS use
  • raster: remote sensing (RS)
  • resolution is key consideration (spatial, spectral, temporal)
9
Q

Explain vector primary data capture

A

Surveying:
- location of objects determined by angle and distance measurements from known locations
- uses expensive field equipment and crews
- most accurate method for large scale, small areas
GPS:
- collection of satellites used to fix locations on Earth’s surface
- differential GPS used to improve accuracy
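
The surveying idea above (locating an object by an angle and a distance from a known location) can be sketched in a few lines. This is a minimal illustration on a flat grid, not a surveying-grade computation; the function name `traverse_point`, the azimuth convention (degrees clockwise from grid north), and the sample coordinates are all assumptions made for the example.

```python
import math


def traverse_point(x0, y0, azimuth_deg, distance):
    """Return the (x, y) reached from a known point (x0, y0).

    Assumes a flat grid, with the azimuth measured in degrees
    clockwise from grid north (hypothetical convention).
    """
    az = math.radians(azimuth_deg)
    x = x0 + distance * math.sin(az)  # easting change
    y = y0 + distance * math.cos(az)  # northing change
    return x, y


# Hypothetical known station at (1000, 2000); target is 50 units due east.
east_x, east_y = traverse_point(1000.0, 2000.0, 90.0, 50.0)
# Same station; target is 50 units due north.
north_x, north_y = traverse_point(1000.0, 2000.0, 0.0, 50.0)
```

Any error in the measured angle or distance carries straight through to the computed coordinates, which is one reason the method depends on careful field work.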

10
Q

Explain secondary geographic data capture

A
  • Data collected for other purposes can be converted for use in GIS
  • raster conversion: scanning of maps, photos, etc.; important scanning parameters are spatial and spectral (bit depth) resolution
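
To see why the spatial resolution and bit depth of a scan matter, a rough size estimate helps. The sketch below computes the uncompressed size of a scanned sheet; the helper name `scan_size_bytes` and the 10 x 10 inch sheet are hypothetical, and real file sizes will differ once compression and file headers are involved.

```python
def scan_size_bytes(width_in, height_in, dpi, bit_depth):
    """Uncompressed size in bytes of a scan (hypothetical helper).

    width_in, height_in: sheet size in inches
    dpi: spatial resolution of the scan, in dots per inch
    bit_depth: bits stored per pixel (e.g. 8 = greyscale, 24 = RGB)
    """
    cols = round(width_in * dpi)
    rows = round(height_in * dpi)
    total_bits = cols * rows * bit_depth
    return (total_bits + 7) // 8  # round up to whole bytes


grey_300 = scan_size_bytes(10, 10, 300, 8)     # 8-bit greyscale at 300 dpi
colour_300 = scan_size_bytes(10, 10, 300, 24)  # 24-bit colour at 300 dpi
grey_600 = scan_size_bytes(10, 10, 600, 8)     # 8-bit greyscale at 600 dpi
```

Doubling the dpi quadruples the pixel count, while tripling the bit depth triples the size.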
11
Q

Explain vector secondary data capture

A
  • collection of vector objects from maps, photos, plans, etc.
  • digitizing: manual (using a digitizer) or vectorization (on-screen digitizing by tracing vectors over a raster image)
  • photogrammetry: science and technology of making measurements from photographs
  • COGO: coordinate geometry
12
Q

What are the typical data entry issues?

A

Typology of human errors in digitizing

  • undershoots/overshoots
  • invalid polygons
  • sliver polygons

Error induced by data cleaning

13
Q

Why does it matter whether you choose vector or raster?

A

Representational models filter reality differently so choosing one over the other will impact analysis results

14
Q

How can we measure the accuracy of nominal data (ex. vegetation cover map)?

A

Confusion matrix: compares recorded classes (observations) with classes obtained by some more accurate process or from a more accurate source (reference)

15
Q

How do confusion matrices work?

A
  • rows of the table correspond to the land use class of each parcel as recorded in the database
  • columns correspond to the class as recorded in the field
  • numbers on the principal diagonal of the table (top left to bottom right) count correct classifications; off-diagonal numbers count misclassifications
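
As a small illustration of the layout described above, the sketch below builds a confusion matrix and computes the share of parcels on the principal diagonal. The class names and counts are made up for the example, and "overall accuracy" here simply means correct parcels divided by all parcels.

```python
# Rows: class recorded in the database. Columns: class recorded in the field.
# All class names and counts are hypothetical.
classes = ["forest", "farmland", "urban"]
matrix = [
    [80, 4, 0],   # database says forest
    [2, 17, 1],   # database says farmland
    [3, 0, 9],    # database says urban
]


def overall_accuracy(m):
    """Share of parcels on the principal diagonal (correctly classified)."""
    correct = sum(m[i][i] for i in range(len(m)))
    total = sum(sum(row) for row in m)
    return correct / total


def per_class_accuracy(m):
    """For each database class, the share of its parcels confirmed in the field."""
    return [m[i][i] / sum(m[i]) for i in range(len(m))]


accuracy = overall_accuracy(matrix)     # 106 correct out of 116 parcels
by_class = per_class_accuracy(matrix)   # one value per row of the table
```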
16
Q

How do we measure uncertainty in interval/ratio data?

A

Errors distort measurements by small amounts

  • accuracy: refers to amount of distortion from true value
  • precision: refers to variation among repeated measurements and also refers to amount of detail in reporting of a measurement
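
The difference between the two terms can be shown with repeated measurements of a quantity whose true value is known. In this sketch the readings are invented, the mean error stands in for accuracy (distortion from the true value), and the sample standard deviation stands in for precision (variation among repeats); both choices are assumptions made for illustration.

```python
import statistics

true_value = 100.0  # hypothetical true elevation, in metres
measurements = [103.1, 102.9, 103.0, 103.2, 102.8]  # hypothetical repeats

# Accuracy: how far the measurements sit from the true value on average.
bias = statistics.mean(measurements) - true_value

# Precision: how much the repeated measurements vary among themselves.
spread = statistics.stdev(measurements)
```

These readings are tightly clustered (high precision) but all roughly 3 m too high (low accuracy), so one does not imply the other.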
17
Q

How do we measure accuracy (equation)?

A

Root mean square error (RMSE) is the square root of the average of the squared errors

  • primary measure of accuracy in map accuracy standards and GIS databases
  • abundance of errors of different magnitudes often closely follows a normal distribution
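
A minimal sketch of the RMSE calculation, assuming each error is the difference between an observed value and a more accurate reference value. The function name `rmse` and the sample elevations are hypothetical.

```python
import math


def rmse(observed, reference):
    """Square root of the mean of the squared errors (observed - reference)."""
    if len(observed) != len(reference) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(o - r) ** 2 for o, r in zip(observed, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical check points: database elevations vs. surveyed elevations (m).
observed = [102.0, 98.0, 101.0, 99.0]
reference = [100.0, 100.0, 100.0, 100.0]

# Errors are 2, -2, 1, -1; squared errors are 4, 4, 1, 1; their mean is 2.5.
error = rmse(observed, reference)
```

Because the errors are squared before averaging, positive and negative errors do not cancel out and large errors weigh more heavily than small ones.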
18
Q

What is ecological fallacy uncertainty in analysis?

A
  • many possible grouping and regrouping strategies for a set of individually measured data
  • erroneous (incorrect) assumption that an overall characteristic of a zone is also a characteristic of any location or individual within the zone
  • ex. if you know the average was 77%, you can’t just assume everyone received 77%
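
The 77% example can be made concrete with a made-up set of individual scores that happens to average 77. The numbers are hypothetical and chosen only so that the group average matches the example.

```python
import statistics

# Hypothetical individual scores (%) for one class (the "zone").
scores = [95, 90, 85, 77, 69, 64, 59]

class_average = statistics.mean(scores)              # the zonal characteristic
exactly_average = sum(1 for s in scores if s == 77)  # individuals matching it
lowest, highest = min(scores), max(scores)
```

Only one of the seven individuals actually scored 77, and the rest range from 59 to 95, so the average on its own says little about any one person.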
19
Q

What is a modifiable areal unit problem?

A

It is endemic to all spatially aggregated data
Consists of 2 interrelated parts:
1. uncertainty about what constitutes objects of spatial study (identified as scale and aggregation problem)
2. implications for methods of analysis commonly applied to zonal data (choosing zonal units)

  • number, sizes, and shapes of zones affect the results of analysis
  • many ways to combine small zones into big ones
  • no objective criteria for choosing one over another
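
A toy illustration of the points above: the same six small units are aggregated under three different zonings and produce different zonal averages. The values, the zonings, and the helper name `zonal_means` are all hypothetical, and each unit is assumed to carry equal weight.

```python
# Hypothetical values (e.g. unemployment rate, %) for six small units in a row.
units = [2, 4, 6, 8, 10, 12]


def zonal_means(values, zones):
    """Average the unit values inside each zone (equal weights assumed)."""
    return [sum(values[i] for i in zone) / len(zone) for zone in zones]


# Three equal zones of two units each.
zoning_a = [(0, 1), (2, 3), (4, 5)]
# Scale change: two larger zones of three units each.
zoning_b = [(0, 1, 2), (3, 4, 5)]
# Boundary change: four zones of unequal size.
zoning_c = [(0,), (1, 2), (3, 4), (5,)]

means_a = zonal_means(units, zoning_a)  # [3.0, 7.0, 11.0]
means_b = zonal_means(units, zoning_b)  # [4.0, 10.0]
means_c = zonal_means(units, zoning_c)  # [2.0, 5.0, 9.0, 12.0]
```

The underlying data never change, yet the highest zonal value is 11, 10, or 12 depending on how the zones are drawn.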
20
Q

What are typical data issues?

A
  • access to geographic data
  • many sources of digital data (govt, businesses)
  • problems: users may be unaware of available data; difficult to determine if data are suitable for user’s applications; format (scale, projection conversion, translators between formats)
21
Q

What is the role of data standards?

A

Distributing GIS data relies on adoption of common standards

  • to allow various components to operate together, such standards have been developed by various national/intl bodies
  • ex. Dublin Core and the Federal Geographic Data Committee (FGDC) metadata standard
22
Q

What is metadata?

A

Data that describes the data

  • info on the procedures used to collect and compile data (ex. purpose, measurement scale, accuracy of instruments, etc.)
  • allows users to evaluate suitability of dataset for application
  • often not available b/c too costly to data providers, so often leads to misinterpretation and false expectation of accuracy
23
Q

What is a geolibrary?

A

A digital library containing georeferenced info

- its search mechanism uses geographic location as primary key

24
Q

What is a geoportal?

A

A digital library of geographic data and GIServices

- one stop shop for GIS info

25
Q

What are other potential data issues?

A
  • ownership and cost recovery
  • copyright issues
  • legal liability
  • freedom of info and privacy
  • geometric incompatibility
  • database updating and quality control
26
Q

What technology issues are often faced?

A
  1. integration of GIS into database management systems
    - represent vector and raster data
    - integrate spatial and tabular data
  2. competing claims from software vendors
  3. evaluation of technology
    - hardware and software
    - benchmarking
27
Q

What newer technologies/gadgets have enhanced the accessibility of GIS?

A
  • handheld/wearable devices make it increasingly possible to obtain GIS services
  • ex. cell phones can now be used to generate maps
28
Q

Why is GIS education important?

A

Diversity of GIS users

  • viewers: occasional browsers of spatial data (mainly need data access)
  • general users: wide range of disciplines (need understanding of capabilities and limitations of GIS)
  • GIS specialists: technical support staff who make GIS work (shortage of skilled specialists; requires GIS, database, and programming skills)