Lecture 9 - Data Acquisition & Quality Flashcards
Why is data important? What does it determine?
Data is the foundation of a GIS installation
Data determines:
- what you can analyze (issues/problems)
- where your analysis focuses (study area)
- types of analysis (forestry vs. emergency services dispatch)
- quality of your analysis (how much do you trust your data?)
Why is data quality so important?
High-quality, geo-referenced data is difficult to find
- historical data collection often has no locational dimension
- problems with converting existing paper maps
- digital data may exist but confidentiality, liability, or commercial interests may disallow broad use
In the past, data construction was often said to consume up to 3/4 of a GIS project’s budget and time
- labour-intensive
- error-prone
- can become an end in itself
What are the 2 broad data capture methods for GIS?
- primary (direct measurement; from scratch)
- secondary (indirect derivation; existing data)
What are the primary sources for data collection for raster and vector?
Raster:
- digital remote sensing images
- digital aerial photos
Vector:
- GPS measurements
- survey measurements
What are the secondary sources for data collection for raster and vector?
Raster:
- scanned maps
- DEMs from maps
Vector:
- topographic surveys
- toponymy data sets from atlases
What are the stages in data collection projects?
- planning
- preparation
- digitizing/transfer
- editing/improvement
- evaluation
- then repeat the cycle
Where do you get data from?
Secondary:
- already published/available
- special tabulation/contract
Administrative records (data as a by-product):
- within organization
- other organizations
Primary data:
- developed in-house (DIY)
- contracted out
Explain primary data capture
- capture specifically for GIS use
- raster: remote sensing (RS)
- resolution is key consideration (spatial, spectral, temporal)
Explain vector primary data capture
Surveying:
- location of objects determined by angle and distance measurements from known locations
- uses expensive field equipment and crews
- most accurate method for large scale, small areas
GPS:
- collection of satellites used to fix locations on Earth’s surface
- differential GPS used to improve accuracy
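The surveying idea above (fixing a new location from an angle and a distance measured at a known location) can be sketched in a few lines. This is a minimal illustration only, assuming planar (projected) coordinates and a bearing measured clockwise from grid north; the function name and the sample numbers are hypothetical, not taken from the lecture.

```python
import math


def point_from_bearing(x0, y0, bearing_deg, distance):
    """Return the (x, y) reached from a known point (x0, y0).

    Assumptions (illustrative): coordinates are planar, x is easting,
    y is northing, and the bearing is in degrees clockwise from north.
    """
    theta = math.radians(bearing_deg)
    dx = distance * math.sin(theta)  # eastward component
    dy = distance * math.cos(theta)  # northward component
    return x0 + dx, y0 + dy


# Hypothetical example: from control point (1000, 2000), sight due east
# (bearing 90 degrees) to an object 50 units away.
x, y = point_from_bearing(1000.0, 2000.0, 90.0, 50.0)
print(round(x, 3), round(y, 3))
```

Real survey reductions also handle instrument corrections and map projection effects, which this sketch leaves out.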
Explain secondary geographic data capture
- Data collected for other purposes can be converted for use in GIS
- raster conversion: scanning of maps, photos, etc.; important scanning parameters are spatial and spectral (bit depth) resolution
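To see why the scanning parameters matter, the sketch below relates scanner resolution (dots per inch) and map scale to the ground size of one pixel, and bit depth to the uncompressed image size. It is a rough illustration under stated assumptions: the function names and the sample map (a 1:50,000 sheet scanned at 300 dpi) are hypothetical, and file size ignores compression and headers.

```python
INCH_IN_METRES = 0.0254  # exact definition of the inch


def ground_pixel_size_m(dpi, scale_denominator):
    """Ground distance covered by one scanned pixel, in metres.

    One pixel covers (1 / dpi) inches on paper; multiplying by the
    scale denominator converts paper distance to ground distance.
    """
    return INCH_IN_METRES / dpi * scale_denominator


def uncompressed_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Approximate raw image size for a scan of the given paper size."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return (pixels * bits_per_pixel + 7) // 8  # round up to whole bytes


# Hypothetical example: a 1:50,000 map scanned at 300 dpi, 8-bit greyscale.
print(ground_pixel_size_m(300, 50_000))
print(uncompressed_size_bytes(10, 10, 300, 8))
```

Doubling the dpi halves the ground pixel size but quadruples the number of pixels, which is the usual trade-off when choosing scan settings.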
Explain vector secondary data capture
- collection of vector objects from maps, photos, plans, etc.
- digitizing: manual (using a digitizer) or vectorization (on-screen digitizing by tracing vectors over a raster)
- photogrammetry: science and technology of making measurements from photographs
- COGO: coordinate geometry
What are the typical data entry issues?
Typology of human errors in digitizing:
- undershoots/overshoots
- invalid polygons
- sliver polygons
Error induced by data cleaning
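Sliver polygons are long and thin, so one common heuristic for flagging them is a compactness ratio, 4πA / P², which is 1 for a circle and approaches 0 for very thin shapes. The sketch below is one possible check, not a method prescribed by the lecture; the function names and the 0.1 threshold are assumptions chosen for illustration, and polygons are given as simple lists of (x, y) vertices without holes.

```python
import math


def area_and_perimeter(ring):
    """Area (shoelace formula) and perimeter of a simple polygon.

    `ring` is a list of (x, y) vertices; the closing edge back to the
    first vertex is added automatically.
    """
    twice_area = 0.0
    perimeter = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2.0, perimeter


def is_probable_sliver(ring, min_compactness=0.1):
    """Flag polygons whose compactness falls below an assumed threshold."""
    area, perimeter = area_and_perimeter(ring)
    if perimeter == 0:
        return True  # degenerate polygon
    return 4 * math.pi * area / perimeter ** 2 < min_compactness


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
sliver = [(0, 0), (100, 0), (100, 0.1), (0, 0.1)]
print(is_probable_sliver(square), is_probable_sliver(sliver))
```

A flagged polygon still needs a human decision: merging it into a neighbour during data cleaning is itself a source of error.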
Why does it matter whether you choose vector or raster?
Representational models filter reality differently so choosing one over the other will impact analysis results
How can we measure the accuracy of nominal data (ex. vegetation cover map)?
Confusion matrix: compares recorded classes (observations) with classes obtained by some more accurate process or from a more accurate source (reference)
How do confusion matrices work?
- rows of table correspond to land use class of each parcel as recorded in database
- columns correspond to the class of each parcel as recorded in the field (reference)
- numbers appearing on principal diagonal of the table (top left to bottom right) reflect correct classification
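A small worked example may help with reading the table. The sketch below follows the layout described above (rows = class recorded in the database, columns = class recorded in the field) and computes the share of parcels on the principal diagonal. The class names and counts are made up purely for illustration, and the function names are hypothetical.

```python
# Invented counts for illustration only.
# Rows: class in the database; columns: class recorded in the field.
classes = ["forest", "grass", "urban"]
matrix = [
    [50, 5, 5],   # database says forest
    [3, 30, 7],   # database says grass
    [2, 5, 43],   # database says urban
]


def overall_accuracy(m):
    """Fraction of parcels on the principal diagonal (correctly classified)."""
    correct = sum(m[i][i] for i in range(len(m)))
    total = sum(sum(row) for row in m)
    return correct / total


def row_accuracy(m):
    """For each database class, the fraction confirmed in the field."""
    return [m[i][i] / sum(m[i]) for i in range(len(m))]


print(f"overall: {overall_accuracy(matrix):.2f}")
for name, acc in zip(classes, row_accuracy(matrix)):
    print(f"{name}: {acc:.2f}")
```

Off-diagonal cells show which classes get confused with each other; here, for instance, 7 parcels recorded as grass were found to be urban in the field.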