Lecture 9 - Data Acquisition & Quality Flashcards
Why is data important? What does it determine?
Data is the foundation of a GIS installation
Data determines:
- what you can analyze (issues/problems)
- where your analysis focuses (study area)
- types of analysis (forestry vs. emergency services dispatch)
- quality of your analysis (how much do you trust your data?)
Why is data quality so important?
High-quality, geo-referenced data is difficult to find
- historical data collection often has no locational dimension
- problems with converting existing paper maps
- digital data may exist but confidentiality, liability, or commercial interests may disallow broad use
In the past, data construction was often said to consume up to 3/4 of a GIS project’s budget and time
- labour-intensive
- error-prone
- can become an end in itself
What are the 2 broad data capture methods for GIS?
- primary (direct measurement; from scratch)
- secondary (indirect derivation; existing data)
What are the primary sources for data collection for raster and vector?
Raster:
- digital remote sensing images
- digital aerial photos
Vector:
- GPS measurements
- survey measurements
What are the secondary sources for data collection for raster and vector?
Raster:
- scanned maps
- DEMs from maps
Vector:
- topographic surveys
- toponymy data sets from atlases
What are the stages in data collection projects?
- planning
- preparation
- digitizing/transfer
- editing/improvement
- evaluation
repeat
Where do you get data from?
Secondary:
- already published/available
- special tabulation/contract
Administrative records (data as a by-product):
- within organization
- other organizations
Primary data:
- developed in-house (DIY)
- contracted out
Explain primary data capture
- capture specifically for GIS use
- raster (RS)
- resolution is key consideration (spatial, spectral, temporal)
Explain vector primary data capture
Surveying:
- location of objects determined by angle and distance measurements from known locations
- uses expensive field equipment and crews
- most accurate method for large scale, small areas
GPS:
- collection of satellites used to fix locations on Earth’s surface
- differential GPS used to improve accuracy
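The surveying idea above (a location fixed by an angle and a distance from a known point) can be sketched in a few lines. This is a minimal illustration, not a surveying workflow: the function name `locate_point`, the planar coordinates, and the convention that the bearing is measured clockwise from north are all assumptions made for the example.

```python
import math

def locate_point(known_x, known_y, bearing_deg, distance):
    """Return (x, y) of a new point from a known station.

    Assumes flat (planar) coordinates and a bearing measured
    clockwise from north, in degrees.
    """
    bearing = math.radians(bearing_deg)
    x = known_x + distance * math.sin(bearing)  # easting offset
    y = known_y + distance * math.cos(bearing)  # northing offset
    return x, y

# Hypothetical station at (1000, 2000); target is 50 units due east.
x, y = locate_point(1000.0, 2000.0, 90.0, 50.0)
print(round(x, 3), round(y, 3))
```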
Explain secondary geographic data capture
- Data collected for other purposes can be converted for use in GIS
- raster conversion: scanning of maps, photos, etc.; important scanning parameters are spatial and spectral (bit depth) resolution
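To see why spatial resolution and bit depth matter when scanning, a rough sketch of the uncompressed size of a scanned map is given here. The map dimensions, the 300 dpi setting, and the function name `scan_size_bytes` are hypothetical values chosen for illustration only.

```python
def scan_size_bytes(width_in, height_in, dpi, bit_depth):
    """Approximate uncompressed size of a scan, in bytes."""
    pixels_wide = width_in * dpi    # spatial resolution sets pixel count
    pixels_high = height_in * dpi
    total_bits = pixels_wide * pixels_high * bit_depth
    return total_bits // 8

# Hypothetical 20 x 30 inch map sheet, 300 dpi, 8-bit greyscale.
size = scan_size_bytes(20, 30, 300, 8)
print(size)
```

Doubling the dpi quadruples the pixel count, and moving from 8-bit to 24-bit colour triples the size again.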
Explain vector secondary data capture
- collection of vector objects from maps, photos, plans, etc.
- digitizing: manual (using digitizer) or vectorization (on-screen digitizing by tracing vectors over a raster image)
- photogrammetry: science and technology of making measurements from photographs
- COGO: coordinate geometry
What are the typical data entry issues?
Typology of human errors in digitizing
- undershoots/overshoots
- invalid polygons
- sliver polygons
Error induced by data cleaning
Why does it matter whether you choose vector or raster?
Representational models filter reality differently so choosing one over the other will impact analysis results
How can we measure the accuracy of nominal data (ex. vegetation cover map)?
Confusion matrix: compares recorded classes (observations) with classes obtained by some more accurate process or from a more accurate source (reference)
How do confusion matrices work?
- rows of table correspond to land use class of each parcel as recorded in database
- columns to class as recorded in field
- numbers appearing on principal diagonal of the table (top left to bottom right) reflect correct classification
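The matrix layout above can be sketched as follows. The counts are made up, and summarising the matrix as overall accuracy (diagonal total divided by grand total) is one common choice, not the only possible measure.

```python
def overall_accuracy(matrix):
    """Share of parcels on the principal diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

# Rows: class recorded in the database.
# Columns: class recorded in the field (reference).
# Hypothetical counts for three land use classes.
matrix = [
    [80, 4, 6],
    [5, 60, 5],
    [5, 6, 29],
]

acc = overall_accuracy(matrix)
print(acc)
```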
How do we measure uncertainty in interval/ratio data?
Errors distort measurements by small amounts
- accuracy: refers to amount of distortion from true value
- precision: refers to variation among repeated measurements and also refers to amount of detail in reporting of a measurement
How do we measure accuracy (equation)?
Root mean square error (RMSE) is the square root of the average of the squared errors
- primary measure of accuracy in map accuracy standards and GIS databases
- abundance of errors of different magnitudes often closely follows a normal distribution
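Written out, RMSE = sqrt((1/n) * sum of (measured_i - reference_i)^2). A minimal sketch of that calculation follows; the sample values and the function name `rmse` are illustrative assumptions.

```python
import math

def rmse(measured, reference):
    """Root mean square error between measured and reference values."""
    if len(measured) != len(reference) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(m - r) ** 2 for m, r in zip(measured, reference)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical elevations (m) compared with a more accurate source.
measured = [102.0, 98.0, 101.0, 99.0]
reference = [100.0, 100.0, 100.0, 100.0]

error = rmse(measured, reference)
print(round(error, 3))
```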
What is ecological fallacy uncertainty in analysis?
- many possible grouping and regrouping strategies for a set of individually measured data
- erroneous (incorrect) assumption that an overall characteristic of a zone is also a characteristic of any location or individual within the zone
- ex. if you know the average was 77%, you can’t just assume everyone received 77%
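The 77% example can be checked with a handful of made-up marks: the group average is 77, yet only one individual in the group actually scored 77. The scores are hypothetical.

```python
from statistics import mean

# Hypothetical individual marks within one zone (class).
scores = [55, 62, 77, 91, 100]

zone_average = mean(scores)

# Fraction of individuals whose own value equals the zone average.
share_at_average = sum(1 for s in scores if s == zone_average) / len(scores)

print(zone_average, share_at_average)
```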
What is a modifiable areal unit problem?
It is endemic to all spatially aggregated data
Consists of 2 interrelated parts:
1. uncertainty about what constitutes objects of spatial study (identified as scale and aggregation problem)
2. implications for methods of analysis commonly applied to zonal data (choosing zonal units)
- number, sizes, and shapes of zones affect the results of analysis
- many ways to combine small zones into big ones
- no objective criteria for choosing one over another
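A small sketch of how the choice of zones changes results: the same eight cell values are combined into two zones in two different ways. The grid values and both zoning schemes are invented for illustration.

```python
from statistics import mean

# Hypothetical 2 x 4 grid of small-zone values.
grid = [
    [2, 2, 8, 8],
    [2, 2, 8, 8],
]

def west_east_means(grid):
    """Aggregate into a west zone and an east zone."""
    half = len(grid[0]) // 2
    west = [v for row in grid for v in row[:half]]
    east = [v for row in grid for v in row[half:]]
    return mean(west), mean(east)

def north_south_means(grid):
    """Aggregate into a north zone and a south zone."""
    half = len(grid) // 2
    north = [v for row in grid[:half] for v in row]
    south = [v for row in grid[half:] for v in row]
    return mean(north), mean(south)

west_east = west_east_means(grid)
north_south = north_south_means(grid)
print(west_east, north_south)
```

With west/east zones the two areas look very different; with north/south zones they look identical, even though the underlying data never changed.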
What are typical data issues?
- access to geographic data
- many sources of digital data (govt, businesses)
- problems: users may be unaware of available data; difficult to determine if data are suitable for user’s applications; format (scale, projection conversion, translators between formats)
What is the role of data standards?
Distributing GIS data relies on adoption of common standards
- to allow various components to operate together, such standards have been developed by various national/intl bodies
- ex. Dublin Core and the Federal Geographic Data Committee (FGDC) metadata standard
What is metadata?
Data that describes the data
- info on the procedures used to collect and compile data (ex. purpose, measurement scale, accuracy of instruments, etc.)
- allows users to evaluate suitability of dataset for application
- often not available b/c too costly to data providers, so often leads to misinterpretation and false expectation of accuracy
What is a geolibrary?
A digital library containing georeferenced info
- its search mechanism uses geographic location as primary key
What is a geoportal?
A digital library of geographic data and GIServices
- one stop shop for GIS info
What are other potential data issues?
- ownership and cost recovery
- copyright issues
- legal liability
- freedom of info and privacy
- geometric incompatibility
- database updating and quality control
What are technology issues often faced?
- integration of GIS into database management systems
- represent vector and raster data
- integrate spatial and tabular data
- competing claims from software vendors
- evaluation of technology
- hardware and software
- benchmarking
What newer technology/gadgets have enhanced the accessibility of GIS?
- handheld/wearable devices make it increasingly possible to obtain GIS services
- ex. cell phones can now be used to generate maps
Why is GIS education important?
Diversity of GIS users
- viewers: occasional browsers of spatial data (mainly need data access)
- general users: wide range of disciplines (need understanding of capabilities and limitations of GIS)
- GIS specialists: technical support staff who make GIS work (shortage of skilled specialists; requires GIS, database, and programming skills)