Thematic Mapping: Choropleths Flashcards
Chorochromatic map
- Nominal Data (Name)
- Simple presence/absence
- Map areas differentiated by colour/shading/pattern
- No meaning behind symbol choice
What is the breakdown of the word Chorochromatic?
Choros: Area
Chroma: Colour
What can we use thematic mapping of areas for?
- Compare and contrast areas
- Relationship between areas
- Quantitative and qualitative data
- Data that has a range where the class boundaries influence the map message
Why do we choose scientific and statistical class boundaries?
- So that they are defensible and not arbitrary (arbitrary boundaries can make the map tell lies!)
What happens when you change the classes in a range?
Changes the map message!
What are some colour guidelines for a chorochromatic map?
- Removing or changing the colour doesn’t really change the map message
- Don’t use shades of the same colour because that could imply degrees of value
What is the breakdown of the word Choropleth?
Choro: Area
Pleth: Value
What is used to map quantitative areal data?
Choropleth maps
What is used to map Qualitative/Nominal Data?
Chorochromatic maps
Choropleth Map
Data value is mapped and colour/symbolization has meaning
- Magnitude of data is proportional to a cartographic attribute (colour, shade, texture)
- Shades of one colour where dark=high and light=low values
- Colour gives message, Legend gives value
What are the 2 types of choropleth map?
- Classless
- Range-Graded
Classless choropleth map
- Spectrum of colour with no classes
- Each number is directly attached to its own unique colour on the spectrum
What are the problems with a Classless choropleth map?
- Not trusted
- Numbers on spectrum are invisible
- Can’t determine message with ease
- Numbers are infinitely available/divisible
- Can mitigate with boundaries behind the colour but that negates some of the purpose
What is the potential purpose of using a Classless choropleth map?
- Maybe useful to remove distracting boundaries when there are many polygons
- Removing boundaries helps reduce psychological effect of large polygons seeming to have more influence
- But without boundaries value is impossible to determine for audience
Range-Graded Choropleth
Data grouped into classes
Why does Dent argue that Classless Choropleths aren’t really a choropleth?
Because of the loss of polygon boundaries
What is the key to a successful range-graded choropleth?
Successful class boundaries
What are the 6 steps in Choropleth Mapping?
1) Judge data suitability
2) Order/Rank data
3) Classless or Range-graded
4) Determine # of classes
5) Determine class intervals
6) Determine Symbology (visual clues to differentiate between classes)
Choropleth mapping: Data suitability
- Must have data everywhere (continuous)
- Must be area
- Must be derived not absolute
- No missing or unreported data
Derived vs. Absolute data
Absolute: Theoretical situation of equal areas (data independent of area size)
Derived: Changed or calculated from absolute (per, %, avg, etc.)
Example of Derived vs. Absolute data
Absolute number of dog licenses is not related to number of people and can artificially show an area as having more dog ownership.
Take percentages of licenses per population of area to derive data and give appropriate message of percentage dog ownership
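A minimal sketch of turning an absolute count into a derived value, following the dog licence example. The function name and the two areas with their figures are hypothetical, made up only for illustration.

```python
def licences_per_100_people(licences, population):
    """Convert an absolute licence count into a derived rate (licences per 100 residents)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100.0 * licences / population


# Hypothetical areas: name -> (dog licences, population)
areas = {"A": (500, 10_000), "B": (200, 2_000)}

rates = {name: licences_per_100_people(lic, pop) for name, (lic, pop) in areas.items()}
print(rates)
```

Area A has more licences in absolute terms, but area B has the higher derived rate, which is the value a choropleth should map.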
Choropleth mapping: Ordering Data
- minimum to maximum
Choropleth mapping: Choosing Classless or Range-Graded
Range-Graded is the convention
- Classless is complicated and difficult to interpret
Choropleth mapping: When do you increase the number of Classes?
- Smarter audience
- Technology and colour availability (vs. pattern fill = fewer classes)
- More enumeration units
- Regular distribution (Irregular = fewer classes)
Choropleth mapping: Rules of thumb for number of classes
- 4-6 is good
- 3 is too simplistic (maybe)
- 7 becomes complicated (maybe)
- 2 is too few (bivariate)
- 12 is too many (for most audiences)
Choropleth mapping: When might it be possible to use 2 classes?
When doing an above/below map with one variable
Choropleth mapping: Fewer vs. more classes
Fewer: Louder message
More: Message may get confused
What are the 4 basic and most used class boundary types?
1) Natural Breaks (Jenks)
2) Quantile
3) Equal Interval
4) Standard Deviation
Class Boundaries: Natural Breaks (Jenks)
- Based on natural clusters inherent in data
- Group similar values to maximize differences between classes
Disadvantage: Based on how many classes are chosen and may end up breaking clusters
Class Boundaries: Quantile
- Classes contain equal number of features irrespective of distribution
- Designed to distribute data
- Designed for uniformly distributed data
- Used more often than they should be as data is often nonuniform
Class Boundaries: Quantile Disadvantages
- Features with wildly different values can be put in the same class even though they are more related to another class
- Map can be misleading graphically
Class Boundaries: Equal Interval
- Classes based on equal sized subranges
- You specify # of intervals and breaks
- Ex. 0-10, >10-20 etc.
Class Boundaries: Standard Deviation
- Classes based on how much the values differ from a mean using standard deviations from the mean
- Normally Distributed Data!
- Tough for most audiences, best for statisticians
- Used for above/below maps (about the mean)
- Based on mean not the actual attributes themselves
What does changing the class boundary type do to the map?
Changes the message, sometimes drastically, even when # of classes remains the same
Choropleth Mapping: Primary decision Criteria for class boundary choice
- Maximize variation between classes
- Minimize variance within classes
- Highlight critical values?
How is class boundary optimization measured?
With a Goodness of Variance Fit (GVF) test
SSDBG
Part of GVF
= Sum of Squared Deviation Between Groups (classes)
SSDTotal
Part of GVF
= Total Sum of Squared Deviation
SSDTotal = Sum of (each attribute - mean)^2
GVF equation
GVF = SSDBG/SSDTotal
GVF
- Helps determine if the chosen classification system matches the data
- Use for each classification type to find highest GVF
- Use for justification, it helps decision but doesn’t make the decision
Why do we look at data distribution when choosing a class boundary and interpreting GVF?
- Standard Deviation requires normal distribution
- If data is not normal and even if it has highest GVF, the standard deviation still cannot be chosen
How to derive SSDBG
SSDTotal = SSDBG + SSDWG
Then SSDBG = SSDTotal - SSDWG
SSDWG
Sum of Squared Deviation Within Groups
SSDWG = (for each class) the Sum of (each attribute in class - mean of class)^2
What is optimal GVF?
GVF = 1
Classification technique closest to 1 minimizes within group variation and maximizes between group variation
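The GVF cards above can be sketched in a few lines, assuming a classification is given as a list of classes that each hold their data values. The function names and the sample data are hypothetical.

```python
def ssd(values):
    """Sum of squared deviations of the values from their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def gvf(classes):
    """GVF = SSDBG / SSDTotal, where SSDBG = SSDTotal - SSDWG.

    `classes` is a list of non-empty lists of data values.
    """
    all_values = [v for cls in classes for v in cls]
    ssd_total = ssd(all_values)
    if ssd_total == 0:
        raise ValueError("all values are identical, GVF is undefined")
    ssd_wg = sum(ssd(cls) for cls in classes)  # within-group deviation
    ssd_bg = ssd_total - ssd_wg                # between-group deviation
    return ssd_bg / ssd_total


# Two hypothetical classifications of the same six values
tight = [[1, 2, 3], [10, 11, 12]]
loose = [[1, 2, 10], [3, 11, 12]]

print(round(gvf(tight), 3), round(gvf(loose), 3))
```

The classification that keeps similar values together scores closer to 1, which matches the idea of minimizing within-group variation.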
What are the steps and why do we organize data?
- Organize to visualize data to see distribution
Step 1) Rank Order
Step 2) Draw a histogram
Step 3) Draw graphic array (scatterplot)
What are the 4 constant step methods?
- Common difference
- Quantiles
- Normal Distribution
- Nested Means
What are the 2 Systematically Unequal step methods?
- Arithmetic Series
- Geometric Series
- Basically nature of how curve changes
What are the 2 Irregular step methods?
- Graphic Method (Natural Breaks)
- Iterative Method (Jenks Optimization)
- To see where changes occur
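As a rough illustration of what an optimization-based method aims for, the sketch below tries every way of cutting the ranked data into classes and keeps the one with the smallest within-class sum of squared deviations. This is a brute-force stand-in that is only practical for small data sets, not the iterative Jenks procedure itself. The function names and sample data are hypothetical.

```python
from itertools import combinations


def ssd(values):
    """Sum of squared deviations of the values from their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_partition(values, n_classes):
    """Exhaustively search for the contiguous partition with the lowest within-class SSD."""
    ranked = sorted(values)
    n = len(ranked)
    best_classes, best_score = None, float("inf")
    # Each combination is a set of cut positions between ranked values
    for cuts in combinations(range(1, n), n_classes - 1):
        bounds = (0, *cuts, n)
        classes = [ranked[a:b] for a, b in zip(bounds, bounds[1:])]
        score = sum(ssd(cls) for cls in classes)
        if score < best_score:
            best_classes, best_score = classes, score
    return best_classes


data = [1, 2, 3, 10, 11, 12, 30, 31]
print(best_partition(data, 3))
```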
How is the Common Difference (CD) Calculated?
- Calculate Range (Highest - Lowest)
- Calculate CD (Range / # of classes)
- Upper Class Limit = Lowest + 1 * CD, Lowest + 2*CD, etc until # reaches # of classes
- Use when data is evenly distributed
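A minimal sketch of the common difference steps, assuming the CD is the range divided by the number of classes (the usual equal interval definition). The function name and sample values are hypothetical.

```python
def common_difference_limits(values, n_classes):
    """Return the upper class limits for equal-interval (common difference) classes."""
    lowest, highest = min(values), max(values)
    cd = (highest - lowest) / n_classes  # common difference
    # Upper limit of class k is Lowest + k * CD
    return [lowest + k * cd for k in range(1, n_classes + 1)]


data = [0, 5, 12, 20, 33, 40]
print(common_difference_limits(data, 4))
```

The last upper limit always equals the highest data value, which is a quick check that the CD was computed correctly.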
How are Quantiles Calculated?
- Step 1) # data values/# classes
- Step 2) Make adjustments for odd number results
- Step 3) Separate classes based on number of observations in each class
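The three quantile steps can be sketched as follows. How leftover observations are assigned when the count does not divide evenly is a judgment call; here the lowest classes each take one extra, which is an assumption. The function name and data are hypothetical.

```python
def quantile_classes(values, n_classes):
    """Split ranked values into classes holding (nearly) equal numbers of observations."""
    ranked = sorted(values)
    # Step 1: number of data values / number of classes
    base, extra = divmod(len(ranked), n_classes)
    classes, start = [], 0
    for i in range(n_classes):
        # Step 2: adjust for uneven results (assumption: lower classes get the extras)
        size = base + (1 if i < extra else 0)
        # Step 3: separate classes by number of observations
        classes.append(ranked[start:start + size])
        start += size
    return classes


data = [7, 1, 9, 3, 12, 5, 20, 15, 2, 30]
print(quantile_classes(data, 3))
```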
What are advantages of Quantile Classification?
- No Empty classes
- Class Limits are computed manually
- Same Percentage of observations in each class
- Discussion of mapped data more simplistic
- Good for Ordinal data (by definition this data is ranked)
Disadvantages of Quantile Classification?
- Fails to consider data distribution, ranking can change distribution
- Values far apart can end up in same class
- Identical values on class breaks
- Odd numbers will need to adjust for which class will have more numbers
- Creates gaps (Highest in one class is not lowest in next class)
How does Normal Distribution Classification work?
- Calculate mean and standard deviations and use as class breaks
- Use when data are normally distributed
- Even # of classes
- No more than 6 classes (+/- 3 std. dev.)
- Class boundaries are arrayed around a central value instead of ascending from lowest value
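A minimal sketch of class breaks placed at the mean and at whole standard deviations either side of it. The function name, the choice of population standard deviation, and the sample data are assumptions.

```python
import statistics


def std_dev_breaks(values, n_classes=6):
    """Return interior class breaks at the mean and whole standard deviations around it."""
    if n_classes % 2 != 0 or not 2 <= n_classes <= 6:
        raise ValueError("n_classes must be 2, 4 or 6")
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # assumption: population standard deviation
    half = n_classes // 2
    # For 4 classes this gives mean - 1 sd, mean, mean + 1 sd
    return [mean + k * sd for k in range(-(half - 1), half)]


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(std_dev_breaks(data, 4))
```

The breaks are symmetric about the mean, so the classes are arrayed around a central value rather than ascending from the lowest value.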
What are some disadvantages of Normal Distribution classification?
- Data must be normal
- Even # of classes only
- Can confuse audience with what one standard deviation represents
- Class boundaries are arrayed around a central value instead of ascending from lowest value
How does Nested Means Classification work?
- Calculate mean and use to divide data into two classes
- Take mean of the two classes above and below mean and use to create new classes
- Normally distributed data
- Can only have even # of classes
- Number of classes must be a power of 2 (2, 4, 8, 16, etc.)
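Nested means can be sketched recursively: split at the mean, then split each half at its own mean. The function name and sample data are hypothetical, and placing a value that equals the mean in the upper class is an assumption.

```python
def nested_means_breaks(values, levels):
    """Return sorted class breaks giving up to 2**levels classes."""
    def split(group, depth):
        # Stop when no more levels are requested or the group cannot be divided
        if depth == 0 or len(group) < 2 or min(group) == max(group):
            return []
        mean = sum(group) / len(group)
        lower = [v for v in group if v < mean]
        upper = [v for v in group if v >= mean]  # assumption: ties go to the upper class
        return split(lower, depth - 1) + [mean] + split(upper, depth - 1)

    return split(sorted(values), levels)


data = [1, 2, 3, 4, 10, 20, 30, 50]
print(nested_means_breaks(data, 2))
```

With `levels=2` the three breaks define four classes; each extra level doubles the class count.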
What are some disadvantages to using Nested Means?
- Only for normally distributed data
- Number of classes must be a power of 2 (2, 4, 8, 16, etc.)
- Mean can end up as one of the values and cause uneven # of classes
How do Systematically Unequal Stepped class limits work?
- X = (Highest - Lowest)/Progression value sum
- Lowest + Progression value * X
- Can do Ascending or Descending
- Use where frequency of data is skewed in one direction
What does positive skewness relate to?
- Mean > Median
- Use ascending arithmetic (1, 2, 3, 4, etc) or geometric (1, 2, 4, 8, etc.)
What does negative skewness mean?
- Mean < Median
What is the progression for Geometric Series?
- 2^0, 2^1, 2^2, 2^3, etc
- L + 1x + 2x + 4x + 8x = H
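A minimal sketch of systematically unequal class limits, assuming each upper limit adds the next progression value times X to the previous limit so that the last limit lands on the highest value. The function name, parameter names, and sample ranges are hypothetical.

```python
from itertools import accumulate


def unequal_step_limits(lowest, highest, n_classes, series="arithmetic", descending=False):
    """Return upper class limits whose widths follow an arithmetic or geometric progression."""
    if series == "arithmetic":
        progression = list(range(1, n_classes + 1))      # 1, 2, 3, 4, ...
    elif series == "geometric":
        progression = [2 ** k for k in range(n_classes)]  # 2^0, 2^1, 2^2, ...
    else:
        raise ValueError("series must be 'arithmetic' or 'geometric'")
    if descending:
        progression.reverse()
    x = (highest - lowest) / sum(progression)
    # Running totals of the progression give each class's upper limit
    return [lowest + total * x for total in accumulate(progression)]


print(unequal_step_limits(0, 150, 4, series="geometric"))
print(unequal_step_limits(0, 100, 4, series="arithmetic"))
```

Ascending series give narrow classes at the low end, which suits data bunched near the lowest values; `descending=True` reverses the widths.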
How does the Irregular Step Method (natural breaks) work?
- Best for irregular data
- Finds major gaps or groupings, or sharp breaks in histogram slope
- Take difference of neighbouring data points and see where largest differences occur to find class breaks
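The neighbouring-difference idea can be sketched as below. Placing each break at the midpoint of a gap is an assumption, as are the function name and the sample data.

```python
def largest_gap_breaks(values, n_classes):
    """Place class breaks in the largest gaps between neighbouring ranked values."""
    ranked = sorted(values)
    # Difference between each value and its next neighbour, with its position
    gaps = [(ranked[i + 1] - ranked[i], i) for i in range(len(ranked) - 1)]
    # Keep the n_classes - 1 largest gaps
    chosen = sorted(gaps, reverse=True)[: n_classes - 1]
    positions = sorted(i for _, i in chosen)
    # Assumption: the break sits at the midpoint of each chosen gap
    return [(ranked[i] + ranked[i + 1]) / 2 for i in positions]


data = [1, 2, 3, 10, 11, 12, 30, 31]
print(largest_gap_breaks(data, 3))
```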
How are visualization symbols selected in choropleth mapping?
- Symbols have meaning related to magnitude
- Symbols must be selected to carefully communicate proper relationship and magnitude
- Usually a shading pattern or colour