Data Quality & Uncertainty Flashcards

1
Q

Importance of Data Quality

A
  • Automatic tendency to regard outputs as a form of truth
  • How reliable are the results/output
  • Liability issues
2
Q

What are the 4 components of data quality?

A

1) Accuracy
2) Precision
3) Error
4) Uncertainty

3
Q

Liability

A

If not done correctly, could cause problems later
- Ex. wrong datum caused arrests to be thrown out of court because the boundary people crossed was not placed in the correct spot

4
Q

Data quality: Accuracy

A
  • How closely the data match the true values or descriptions
  • True for both spatial and attribute
5
Q

How do we account for data quality?

A
  • usually best when personally collected
  • scale, who, what, why
  • What was the data intended for and can it be used for another purpose
  • Spatially, the data look like they should and are where they should be
  • On target but maybe not clustered (accurate but not necessarily precise)
6
Q

Data quality: Precision

A
  • Scale
  • How exactly the data are measured (map sheet vs. lat/long vs. UTM meters)
  • Higher level of precision from map sheet to meters
  • Worst case: Precise data that is inaccurate
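
A minimal sketch of the accuracy vs. precision distinction, assuming repeated readings of one point whose true value is known. The function name and the easting values are made up for illustration: bias (mean minus true value) stands in for accuracy, and standard deviation stands in for precision.

```python
import statistics

def accuracy_and_precision(measurements, true_value):
    """Return (bias, spread) for repeated measurements of one quantity.

    bias   = mean measurement minus true value (small |bias| -> accurate)
    spread = sample standard deviation (small spread -> precise)
    """
    bias = statistics.mean(measurements) - true_value
    spread = statistics.stdev(measurements)
    return bias, spread

# Hypothetical easting readings (meters); the true easting is 500.0
precise_but_inaccurate = [512.01, 512.02, 511.99, 512.00]  # tight cluster, wrong place
accurate_but_imprecise = [495.0, 504.0, 498.0, 503.0]      # on target, scattered

bias_a, spread_a = accuracy_and_precision(precise_but_inaccurate, 500.0)
bias_b, spread_b = accuracy_and_precision(accurate_but_imprecise, 500.0)
```

The first set is the "worst case" from the card: many decimal places and a tiny spread, but every reading is about 12 m off.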
7
Q

Data quality: Error

A
  • How far the data are actually from their true values
  • Always present to some extent but does not fatally undermine GIS use
  • Use statistics to determine if data can or cannot be used based on type/size of error (size–> distance from true?)
8
Q

What are the 3 types of Error?

A
  • Gross
  • Systematic
  • Random
9
Q

Gross Error

A

Incredibly inaccurate

- easy to identify

10
Q

Systematic Error

A

Exactly the same on every piece of data

  • X, Y accidentally set as Y, X
  • Can be fixed/accounted for
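
A rough sketch of how a swapped X, Y could be caught and fixed, assuming points are meant to be stored as (lat, lon) in decimal degrees. The function names and sample coordinates are hypothetical. Note that a range check like this only catches the swap when the longitude values fall outside the valid latitude range of -90 to 90.

```python
def swap_xy(points):
    """Return the points with each pair's two values exchanged."""
    return [(y, x) for (x, y) in points]

def any_invalid_latitude(points):
    """Points are expected as (lat, lon); latitude must lie in [-90, 90]."""
    return any(abs(lat) > 90 for lat, _ in points)

# Hypothetical records near Denver, entered as (lon, lat) instead of (lat, lon)
recorded = [(-104.99, 39.74), (-105.02, 39.70), (-104.95, 39.76)]

# Because the error is the same on every record, one correction fixes all of them
if any_invalid_latitude(recorded):
    corrected = swap_xy(recorded)
else:
    corrected = recorded
```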
11
Q

Random Error

A

Not easy to find

- Could be one data point or attribute incorrectly entered (10.21 entered as 102.1)
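
One possible screen for a slipped decimal like the example above is sketched below. It uses the modified z-score (median and median absolute deviation), which is a swapped-in technique, not something the card prescribes; the 3.5 cutoff is a common rule of thumb and the values are made up. It will only catch large slips, so small random errors stay hard to find.

```python
import statistics

def flag_suspect_values(values, threshold=3.5):
    """Flag values whose modified z-score exceeds the threshold.

    modified z = 0.6745 * (x - median) / MAD, where MAD is the
    median absolute deviation from the median.
    """
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to compare against
    return [v for v in values if abs(0.6745 * (v - med) / mad) > threshold]

# Hypothetical attribute column where 10.21 was keyed in as 102.1
readings = [10.21, 10.35, 9.98, 102.1, 10.10]
suspects = flag_suspect_values(readings)
```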

12
Q

Data quality: Uncertainty

A

Doubt due to incomplete knowledge (e.g. someone else collected the data); this is why metadata is essential

  • Many issues in GIS have uncertainty underpinning them
  • Prevalent in processes/transformations
  • Model behaviour (know how the model works, not just what it does; link to documents that describe the process and math involved, e.g. white papers)
13
Q

Sources of Uncertainty

A
  • Measurement
  • GIS representation
  • Reporting Numbers (lines for roads but what is the width)
14
Q

Data Collection

A

Quality control at the 1st step

15
Q

Data Input

A

Resolution when digitizing

- Boundaries (edge of forest by ownership vs edge of trees and how many trees dictate a forest?)

16
Q

Stages for Accounting for Data Quality

A

Real world - Inherent Uncertainty
Conception - Uncertainty in Conception
Measurement - Uncertainty in measurements
Analysis - Uncertainty in Analysis
- Acceptable values fall within/under a curve to deal with variability (ex. diameter at breast height of a tree measured by different foresters can be ok if it falls within acceptable values)

17
Q

Data quality: Positional Accuracy RMSE

A
  • Square root of the average of the squared discrepancies in position (d) of well-defined points (n) determined from the map and compared to higher accuracy surveyed location of each point
  • Measures the difference between mapped (image) positions and true positions; is scale dependent
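
A minimal sketch of the calculation described above, assuming planar coordinates in meters and taking d as the straight-line distance between each map point and its surveyed position, so RMSE = sqrt(sum(d^2) / n). The function name and coordinates are hypothetical.

```python
import math

def positional_rmse(map_points, survey_points):
    """Root mean square of the distances between map and surveyed positions."""
    if len(map_points) != len(survey_points) or not map_points:
        raise ValueError("need two equal-length, non-empty point lists")
    # Squared discrepancy d^2 for each well-defined point
    squared = [
        (mx - sx) ** 2 + (my - sy) ** 2
        for (mx, my), (sx, sy) in zip(map_points, survey_points)
    ]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical coordinates in meters: positions read from the map vs. surveyed
map_pts = [(100.0, 200.0), (150.0, 250.0), (300.0, 400.0)]
survey_pts = [(103.0, 204.0), (150.0, 250.0), (300.0, 405.0)]
rmse = positional_rmse(map_pts, survey_pts)
```

Here the three discrepancies are 5 m, 0 m, and 5 m, so the RMSE is sqrt(50 / 3), a little over 4 m.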
18
Q

Fuzzy Sets

A

Defined by degree of membership

  • Venn diagrams, set theory, and/or –> SQL
  • probability (%) that something belongs to that category
  • S-Curve (Venn diagram that acknowledges uncertainty)
  • Can have partial membership in a set with yes, no, and maybe
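
A rough sketch of an S-curve membership function, assuming a logistic curve and a made-up "forest" set defined by percent tree cover. The midpoint and steepness values are hypothetical. Using min for AND and max for OR is one common convention for fuzzy set operations, not something stated on the card.

```python
import math

def s_curve_membership(x, midpoint, steepness):
    """Degree of membership between 0 (no) and 1 (yes); values in between are 'maybe'."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - midpoint)))

def fuzzy_and(a, b):
    """Membership in both sets (assumed convention: the smaller degree)."""
    return min(a, b)

def fuzzy_or(a, b):
    """Membership in either set (assumed convention: the larger degree)."""
    return max(a, b)

# Hypothetical: how strongly does a cell belong to "forest" given its tree cover (%)?
sparse = s_curve_membership(10, midpoint=50, steepness=0.15)
mixed = s_curve_membership(50, midpoint=50, steepness=0.15)
dense = s_curve_membership(90, midpoint=50, steepness=0.15)
```

Changing midpoint or steepness is how membership can be adjusted when more information becomes available.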
19
Q

Uncertainty

A

Degree to which the measured value is estimated to vary from the true value

  • Arises from a variety of sources, including limits on the precision or accuracy of the measuring system
  • Often used to describe degree of accuracy of measurement
20
Q

Why would you choose a point, line, or polygon for the data?

A

Based on the purpose for the data

21
Q

S-Curve

A

Venn diagram with the uncertainty acknowledged

22
Q

Advantages of fuzzy sets

A
  • Acknowledging uncertainty upfront

- Membership can be adjusted if more info becomes available

23
Q

What is the drawback?

A

People! (numbers from feelings)

  • Probability values can reflect the way individuals state how they feel
  • i.e. one person can state 90% surety while another with the same confidence can state 99% surety
  • But which is it?
24
Q

G.C.S

A

Geographic Coordinate System

- ex. Lat/Long

25
Q

MAUP

A

Modifiable Areal Unit Problem

  • Classic source of error
  • Especially when aggregating spatial units
  • Data can be the same but aggregated differently
  • Choices of areal units tend to be dominated by what is available rather than what is best (Ex. Crime within an administrative boundary can be 5%, but in a concentrated area of the boundary it can be up to 20% of properties broken into)
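
A small sketch of the aggregation effect, using made-up counts chosen to match the 5% and 20% figures in the example. The neighborhood names and numbers are hypothetical; the point is only that the same records give different rates depending on the areal unit.

```python
def burglary_rate(properties, burgled):
    """Share of properties broken into within one areal unit."""
    return burgled / properties

# Hypothetical neighborhoods inside one administrative boundary:
# name -> (number of properties, number broken into)
neighborhoods = {
    "north": (150, 3),
    "east": (100, 4),
    "south": (100, 3),
    "hotspot": (50, 10),
}

# Aggregated to the whole administrative boundary
total_properties = sum(p for p, _ in neighborhoods.values())
total_burgled = sum(b for _, b in neighborhoods.values())
overall_rate = burglary_rate(total_properties, total_burgled)

# Same data, reported per neighborhood instead
rates = {name: burglary_rate(p, b) for name, (p, b) in neighborhoods.items()}
```

The boundary as a whole comes out at 5% (20 of 400), while the hotspot alone is 20% (10 of 50).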
26
Q

Choropleth & MAUP

A

Choropleth will show that an entire polygon is 18% but really only a portion of it holds the majority

27
Q

What is a possible fix to account for MAUP?

A

Metadata!

  • can’t fix but can account for issue
  • Embrace Uncertainty!
  • There will always be issues when you did not personally collect the data so justify with metadata, field checks, and GIS checks
28
Q

Some basic principles for dealing with uncertainty in GIS

A
  • Use many sources to prevent building of error from same datasets
  • Metadata
  • Checks (Field & GIS)
  • Appropriateness of reporting results (be careful how you phrase results that imply causes; use "may" or "could")
29
Q

More thoughts on dealing with uncertainty

A
  • Embrace uncertainty!
  • Try multiple data sources to prevent error building up
  • Be honest in communicating what you know about the accuracy
  • Report what you believe to be true, not what the GIS appears to be saying
  • GIS has been slow to deal with or treat errors