Data Quality & Uncertainty Flashcards
Importance of Data Quality
- Automatic tendency to regard outputs as a form of truth
- How reliable are the results/output
- Liability issues
What are the 4 components of data quality?
1) Accuracy
2) Precision
3) Error
4) Uncertainty
Liability
If not done correctly, could cause problems later
- Ex. a wrong datum caused arrests to be thrown out of court because the boundary people crossed was not mapped in the correct location
Data quality: Accuracy
- How closely the data match the true values or descriptions
- Applies to both spatial and attribute data
How do we account for data quality?
- usually best when personally collected
- scale, who, what, why
- What was the data intended for and can it be used for another purpose
- Spatially, looks like it should and is where it should be
- On target but maybe not clustered
Data quality: Precision
- Scale
- How exactly the data are measured (map sheet vs. lat/long vs. UTM meters)
- Precision increases from map sheet to lat/long to meters
- Worst case: Precise data that is inaccurate
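A minimal sketch of the accuracy vs. precision distinction, assuming "precision" is read as how tightly repeated readings cluster (the target analogy) and "accuracy" as how far their average sits from the true value. The function name and the sample readings are hypothetical, made up for illustration.

```python
import statistics

def accuracy_and_precision(measurements, true_value):
    """Return (bias, spread) for repeated measurements of one quantity.

    bias   -> accuracy: how far the mean sits from the true value
    spread -> precision: sample standard deviation of the readings
    """
    mean = statistics.fmean(measurements)
    bias = mean - true_value
    spread = statistics.stdev(measurements)
    return bias, spread

# Hypothetical easting readings (meters) for a point whose true easting is 500.0
precise_but_inaccurate = [510.1, 510.0, 509.9, 510.0]  # tight cluster, off target
accurate_but_imprecise = [495.0, 505.0, 498.0, 502.0]  # on target, scattered

bias_a, spread_a = accuracy_and_precision(precise_but_inaccurate, 500.0)
bias_b, spread_b = accuracy_and_precision(accurate_but_imprecise, 500.0)
```

The first set is the "worst case" from the card: a small spread makes it look trustworthy even though every reading is about 10 m off.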
Data quality: Error
- How far the data are actually from their true values
- Always present to some extent but does not fatally undermine GIS use
- Use statistics to determine whether data can or cannot be used, based on the type/size of error (size = distance from true value?)
What are the 3 types of Error?
- Gross
- Systematic
- Random
Gross Error
Incredibly inaccurate
- easy to identify
Systematic Error
Exact same on every piece of data
- X, Y accidentally set as Y, X
- Can be fixed/accounted for
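A rough sketch of how a swapped X, Y error might be detected and corrected in one pass, since the same mistake applies to every record. It assumes the records were meant to be (lon, lat) in decimal degrees; the helper names and sample coordinates are hypothetical.

```python
def latitude_out_of_range(points):
    """True if any (lon, lat) pair has a latitude outside [-90, 90]."""
    return any(abs(lat) > 90 for _, lat in points)

def swap_axes(points):
    """Return a new list with the two values of every pair flipped."""
    return [(b, a) for a, b in points]

# Hypothetical records meant to be (lon, lat) but keyed in as (lat, lon)
records = [(45.5, -122.6), (47.6, -122.3)]

# The error is systematic, so one correction fixes the whole dataset
fixed = swap_axes(records) if latitude_out_of_range(records) else records
```

The range check only catches swaps where the longitude exceeds 90 degrees in absolute value; elsewhere the swap would need to be spotted another way, such as plotting against a basemap.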
Random Error
Not easy to find
- Could be one data point or attribute incorrectly entered (10.21 entered as 102.1)
Data quality: Uncertainty
Doubt due to incomplete knowledge (e.g. someone else collected the data); this is why metadata is essential
- Many issues in GIS have uncertainty underpinning them
- Prevalent in processes/transformations
- Model behaviour (know how the model works, not just what it does, link to describe process and math involved i.e. white papers)
Sources of Uncertainty
- Measurement
- GIS representation
- Reporting Numbers (lines for roads but what is the width)
Data Collection
Quality control at the 1st step
Data Input
Resolution when digitizing
- Boundaries (edge of forest by ownership vs edge of trees and how many trees dictate a forest?)
Stages for Accounting for Data Quality
Real world - Inherent Uncertainty
Conception - Uncertainty in Conception
Measurement - Uncertainty in measurements
Analysis - Uncertainty in Analysis
- Acceptable values fall within/under a curve to deal with variability (ex. diameter at breast height of a tree measured by different foresters can be ok if it falls within acceptable values)
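One way to picture the "acceptable values" idea is a simple tolerance check on repeated measurements of the same tree. This is only a sketch: it swaps the curve for a fixed tolerance around the group median, and the function name, the 1.0 cm tolerance, and the readings are all assumptions for illustration.

```python
import statistics

def out_of_tolerance(values, tolerance):
    """Return the values that differ from the group median by more than tolerance."""
    center = statistics.median(values)
    return [v for v in values if abs(v - center) > tolerance]

# Hypothetical breast-height readings (cm) of one tree by four foresters
readings = [30.2, 30.5, 29.9, 34.0]

# Readings within 1.0 cm of the median are treated as acceptable variability
flagged = out_of_tolerance(readings, tolerance=1.0)
```

Here three readings agree closely and are accepted as normal variability, while the 34.0 cm reading is flagged for a second look.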
Data quality: Positional Accuracy RMSE
- Square root of the average of the squared discrepancies in position (d) of well-defined points (n) determined from the map and compared to higher accuracy surveyed location of each point
- Calculates the difference between image (map) positions and true positions; scale dependent
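The RMSE definition above can be sketched in a few lines. This assumes planar coordinates in the same units (e.g. UTM meters) and takes each discrepancy d as the straight-line distance between a map point and its surveyed counterpart; the function name and sample points are hypothetical.

```python
import math

def positional_rmse(map_points, survey_points):
    """RMSE = sqrt( sum(d_i^2) / n ), where d_i is the distance between
    the i-th map point and its higher-accuracy surveyed location."""
    if not map_points or len(map_points) != len(survey_points):
        raise ValueError("need two non-empty point lists of equal length")
    squared = [
        (mx - sx) ** 2 + (my - sy) ** 2
        for (mx, my), (sx, sy) in zip(map_points, survey_points)
    ]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical well-defined points: one is 5 units off, one matches exactly
map_pts = [(3.0, 4.0), (10.0, 10.0)]
survey_pts = [(0.0, 0.0), (10.0, 10.0)]

rmse = positional_rmse(map_pts, survey_pts)
```

Because the discrepancies are squared before averaging, a single badly placed point raises the RMSE more than several small offsets would.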
Fuzzy Sets
Defined by degree of membership
- Venn diagrams, set theory, and/or –> SQL
- probability (%) that something belongs to that category
- S-Curve (Venn diagram that acknowledges uncertainty)
- Can have partial membership in a set with yes, no, and maybe
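A minimal sketch of an S-curve membership function, assuming the common quadratic S-function form (0 below a lower bound, 1 above an upper bound, smooth in between). The "forest" example, the 10 and 60 percent canopy-cover bounds, and the min/max operators for and/or are assumptions for illustration, not something the cards specify.

```python
def s_curve_membership(x, a, b):
    """Degree of membership in [0, 1]: 0 at or below a, 1 at or above b."""
    if a >= b:
        raise ValueError("a must be less than b")
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    midpoint = (a + b) / 2
    if x <= midpoint:
        return 2 * ((x - a) / (b - a)) ** 2
    return 1 - 2 * ((x - b) / (b - a)) ** 2

def fuzzy_and(m1, m2):
    """Fuzzy intersection, taken here as the minimum of the two memberships."""
    return min(m1, m2)

def fuzzy_or(m1, m2):
    """Fuzzy union, taken here as the maximum of the two memberships."""
    return max(m1, m2)

# Hypothetical: membership in "forest" by percent canopy cover
no = s_curve_membership(5, 10, 60)      # clearly not forest
maybe = s_curve_membership(35, 10, 60)  # partial membership
yes = s_curve_membership(80, 10, 60)    # clearly forest
```

The three sample values correspond to the "no", "maybe", and "yes" cases, and shifting the bounds a and b is how membership could be adjusted as more information becomes available.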
Uncertainty
Degree to which the measured value is estimated to vary from the true value
- Arises from a variety of sources, including limits on the precision or accuracy of the measuring system
- Often used to describe degree of accuracy of measurement
Why would you choose a point, line, or polygon for the data?
Based on the purpose for the data
S-Curve
Venn diagram with the uncertainty acknowledged
Advantages of fuzzy sets
- Acknowledging uncertainty upfront
- Membership can be adjusted if more info becomes available
What is the drawback?
People! (numbers from feelings)
- Probability values can reflect the way individuals state how they feel
- i.e. one person can state 90% surety while another with the same confidence can state 99% surety
- But which is it?
G.C.S
Geographic Coordinate System
- ex. Lat/Long