Data Quality & Uncertainty Flashcards
Importance of Data Quality
- Automatic tendency to regard outputs as a form of truth
- How reliable are the results/output?
- Liability issues
What are the 4 components of data quality?
1) Accuracy
2) Precision
3) Error
4) Uncertainty
Liability
If data quality is not handled correctly, it can cause problems later
- Ex. a wrong datum caused arrests to be thrown out of court because the boundary people had crossed was not mapped in the correct spot
Data quality: Accuracy
- How closely the data match the true values or descriptions
- Applies to both spatial and attribute data
How do we account for data quality?
- Quality is usually best when the data are personally collected
- Consider scale, and who collected what, and why
- What was the data intended for, and can it be used for another purpose?
- Spatially, the data look as they should and are where they should be
- On target but maybe not clustered (accurate but not precise)
Data quality: Precision
- Scale
- How exactly the data are measured (map sheet vs. lat/long vs. UTM meters)
- Precision increases from map sheet to lat/long to UTM meters
- Worst case: precise data that are inaccurate
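
A minimal sketch of the accuracy vs. precision distinction, assuming repeated measurements of one location whose true position is known. The function name, the sample coordinates, and the choice of "bias" and "spread" as the two measures are illustrative assumptions, not part of the original notes.

```python
import math
import statistics

def accuracy_and_precision(points, truth):
    """Return (bias, spread) for repeated measurements of one known location.

    bias   = distance from the mean measured position to the true position
             (small bias -> accurate)
    spread = root-mean-square distance of the measurements from their own mean
             (small spread -> precise)
    """
    mean_x = statistics.fmean(p[0] for p in points)
    mean_y = statistics.fmean(p[1] for p in points)
    bias = math.hypot(mean_x - truth[0], mean_y - truth[1])
    spread = math.sqrt(statistics.fmean(
        (x - mean_x) ** 2 + (y - mean_y) ** 2 for x, y in points))
    return bias, spread

truth = (100.0, 200.0)  # hypothetical true position

# On target but not clustered: accurate, not precise.
scattered = [(90.0, 200.0), (110.0, 200.0), (100.0, 190.0), (100.0, 210.0)]

# Tightly clustered but off target: precise, not accurate (the worst case).
offset = [(129.0, 240.0), (131.0, 240.0), (130.0, 239.0), (130.0, 241.0)]

print(accuracy_and_precision(scattered, truth))
print(accuracy_and_precision(offset, truth))
```

The second set has the smaller spread, which is why it can look trustworthy even though every point is far from the true position.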
Data quality: Error
- How far the data are actually from their true values
- Always present to some extent but does not fatally undermine GIS use
- Use statistics to determine whether the data can be used, based on the type/size of error (size -> distance from the true value?)
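
One common statistic for the size of positional error is the root-mean-square error (RMSE) against check points of known position. The sketch below assumes planar coordinates in meters; the function name, the sample points, and the 10 m tolerance are hypothetical values chosen for illustration.

```python
import math

def positional_rmse(measured, reference):
    """Root-mean-square distance between measured points and reference points."""
    if len(measured) != len(reference) or not measured:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(mx - rx) ** 2 + (my - ry) ** 2
               for (mx, my), (rx, ry) in zip(measured, reference)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical check points: measured position vs. surveyed reference position.
measured = [(3.0, 4.0), (10.0, 10.0), (6.0, 8.0)]
reference = [(0.0, 0.0), (10.0, 10.0), (0.0, 0.0)]

rmse = positional_rmse(measured, reference)

TOLERANCE_M = 10.0  # assumed project tolerance, not from the notes
usable = rmse <= TOLERANCE_M
print(rmse, usable)
```

RMSE describes the size of the error only. It does not say whether the error is gross, systematic, or random.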
What are the 3 types of Error?
- Gross
- Systematic
- Random
Gross Error
Incredibly inaccurate
- Easy to identify
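
Gross errors are easy to identify partly because they often fall outside any possible value. A minimal sketch, assuming records are (latitude, longitude) pairs in decimal degrees; the function name and sample records are hypothetical.

```python
def flag_gross_errors(records):
    """Return the (lat, lon) records that cannot be a real location on Earth."""
    return [(lat, lon) for lat, lon in records
            if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0)]

# Hypothetical records: the second and third have a misplaced decimal point.
records = [(44.56, -123.26), (445.6, -123.26), (44.57, -1232.6)]

bad = flag_gross_errors(records)
print(bad)
```

A range check like this only catches impossible values. A point that is wrong but still plausible needs a different test.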
Systematic Error
The exact same error on every piece of data
- Ex. X, Y accidentally set as Y, X
- Can be fixed/accounted for
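
Because a systematic error is the same on every record, one correction fixes the whole dataset. The sketch below handles the swapped X, Y case, assuming UTM-style eastings and northings; the function names, the expected ranges, and the sample coordinates are hypothetical.

```python
def swap_xy(points):
    """Return a new list with each (a, b) pair reordered as (b, a)."""
    return [(b, a) for a, b in points]

def looks_swapped(points, x_range, y_range):
    """True if no point fits the expected ranges as entered but all fit once swapped."""
    def inside(p):
        return (x_range[0] <= p[0] <= x_range[1]
                and y_range[0] <= p[1] <= y_range[1])
    if not points:
        return False
    return (not any(inside(p) for p in points)
            and all(inside(p) for p in swap_xy(points)))

# Assumed study-area extent in meters (illustrative only).
EASTING_RANGE = (400_000.0, 500_000.0)
NORTHING_RANGE = (4_900_000.0, 5_000_000.0)

# Hypothetical points entered as (northing, easting) instead of (easting, northing).
entered = [(4_934_500.0, 476_200.0), (4_935_100.0, 477_050.0)]

if looks_swapped(entered, EASTING_RANGE, NORTHING_RANGE):
    fixed = swap_xy(entered)
else:
    fixed = entered
print(fixed)
```

The check requires every point to fail as entered and pass once swapped, so a single bad record does not trigger a dataset-wide correction.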
Random Error
Not easy to find
- Could be one data point or attribute incorrectly entered (10.21 entered as 102.1)
Data quality: Uncertainty
Doubt due to incomplete knowledge (e.g. someone else collected the data); this is why metadata is essential
- Many issues in GIS have uncertainty underpinning them
- Prevalent in processes/transformations
- Model behaviour (know how the model works, not just what it does; link to material describing the process and math involved, e.g. white papers)
Sources of Uncertainty
- Measurement
- GIS representation
- Reporting Numbers (lines for roads but what is the width)
Data Collection
Quality control starts at the first step
Data Input
Resolution when digitizing
- Boundaries (edge of forest by ownership vs. edge of trees, and how many trees make a forest?)