Topic 8 Flashcards
what is a histogram
graph showing the frequency of measurements/observations plotted against the range of observations
an important data exploration and summary tool
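A minimal sketch of how histogram frequencies can be counted, assuming equal-width bins across the data range. The function name and the sample values are hypothetical, for illustration only.

```python
def histogram_counts(values, n_bins):
    """Count how many observations fall into each equal-width bin."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; no range to bin")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value would land one past the last bin, so clamp it.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


observations = [1, 2, 2, 3, 3, 3, 4, 4, 5]  # made-up sample
counts = histogram_counts(observations, 4)
print(counts)  # [1, 2, 3, 3]
```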
explain modality and symmetry
median - 50% higher and 50% lower
skew… what way is the tail facing
symmetrical distribution
mean, median, and mode are coincident
modality
when there is more than one value with a high frequency (more than one peak)
greatly impacts the use of median and mean measures
sample mean vs population mean
the sum of the values is divided by the sample size or by the size of the entire population
we use deviation from the mean or deviation from the median
median is more popular because it isn't so easily affected by outliers
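A small sketch of why the median resists outliers better than the mean. The numbers are made up; only Python's standard `statistics` module is used.

```python
import statistics

values = [2, 3, 3, 4, 5]          # made-up sample
with_outlier = values + [100]     # same sample plus one extreme value

mean_before = statistics.mean(values)
median_before = statistics.median(values)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

# The mean jumps from 3.4 to 19.5; the median only moves from 3 to 3.5.
print(mean_before, mean_after)
print(median_before, median_after)
```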
standard deviation is important with data transformations T/F?
true
what is data normalization
raw totals (Numerator) are standardized against a denominator
min-max scaling
comparing something to make it comprehensible
standardization (z-score normalization)
types of normalization
denominator is standard deviation
max is 1
min is 0
goes to 0
min-max scaling
important for rasters
range of data
min and max values
does not go to 0 only to the min value
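The two normalization types named in these cards can be sketched as follows. This assumes the common textbook definitions: min-max scaling maps the data range onto 0 to 1, and z-score standardization divides the deviation from the mean by the standard deviation. Function names and data are hypothetical.

```python
import statistics


def min_max_scale(values):
    """Rescale so the minimum becomes 0 and the maximum becomes 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; range is zero")
    return [(v - lo) / (hi - lo) for v in values]


def z_scores(values):
    """Standardize: (value - mean) / standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population SD; an assumption of this sketch
    if sd == 0:
        raise ValueError("standard deviation is zero")
    return [(v - mean) / sd for v in values]


data = [10, 20, 30, 40, 50]  # made-up values
scaled = min_max_scale(data)
standardized = z_scores(data)
print(scaled)  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(standardized)
```

After standardization the values have a mean of 0 and a standard deviation of 1, so variables measured in different units become comparable.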
how do you know when you need to standardize your data?
know your data before normalizing it. Normalizing unrelated data is like mixing apples and oranges. It makes fruit salad, not a good analysis
not all variables need to be normalized
results can be proportions or percentages
data classification considerations
grouping of numerical data into classes for mapping, with each class represented by an individual symbol
class interval: where to put breaks in the data
number of intervals : 4-7
describe equal intervals
equal intervals or steps along the number line
determine data range
not very good
susceptible to outliers
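A rough sketch of equal-interval classification, showing why a single outlier hurts it. The helper names and the data are hypothetical.

```python
def equal_interval_breaks(values, n_classes):
    """Return the upper bound of each class, using equal steps over the range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + i * width for i in range(1, n_classes)] + [hi]


def count_per_class(values, upper_bounds):
    """Count observations in each class (a value belongs to the first bound it fits under)."""
    counts = [0] * len(upper_bounds)
    for v in values:
        for i, bound in enumerate(upper_bounds):
            if v <= bound:
                counts[i] += 1
                break
    return counts


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # made-up values with one outlier
breaks = equal_interval_breaks(data, 4)
counts = count_per_class(data, breaks)
print(breaks)  # [25.75, 50.5, 75.25, 100]
print(counts)  # [9, 0, 0, 1]
```

The outlier stretches the range, so nine of the ten values fall into the first class and two classes stay empty.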
describe quantiles
each class contains the same number of observations/values
easy to understand
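A simple sketch of quantile classification, assuming a rank-based split of the sorted values into groups of equal size. Mapping software may handle ties and leftover values differently. Names and data are hypothetical.

```python
def quantile_classes(values, n_classes):
    """Split the sorted values into n_classes groups of (nearly) equal count."""
    ordered = sorted(values)
    n = len(ordered)
    classes = []
    for i in range(n_classes):
        start = i * n // n_classes
        end = (i + 1) * n // n_classes
        classes.append(ordered[start:end])
    return classes


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50, 60, 100]  # made-up values
classes = quantile_classes(data, 4)
print(classes)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9], [50, 60, 100]]
```

Every class holds three observations, even though the last class spans a far wider range of values than the others.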
describe mean standard deviation
derive classes from the descriptive statistics of overall data distribution
worst method
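A minimal sketch of mean-standard deviation classification, assuming class breaks are placed at the mean and at one standard deviation either side of it. The choice of multiples, the function name, and the data are assumptions for illustration.

```python
import statistics


def mean_sd_breaks(values, multiples=(-1, 0, 1)):
    """Place class breaks at mean + k * SD for each k in multiples."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population SD; an assumption of this sketch
    return [mean + k * sd for k in multiples]


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values: mean 5, SD 2
breaks = mean_sd_breaks(data)
print(breaks)  # [3.0, 5.0, 7.0]
```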
maximum breaks (defined interval)
derive classes from groups of similar data values according to local criterion
calculation of classes: order data from low to high
use largest differences as class breaks
can be good
susceptible to outliers
you don't see contrast
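The maximum breaks steps (order the data, then use the largest differences as class breaks) can be sketched like this. Placing each break at the midpoint of the gap is an assumption of this sketch, and the function name and data are hypothetical.

```python
def maximum_breaks(values, n_classes):
    """Put class breaks in the largest gaps between neighbouring sorted values."""
    ordered = sorted(values)
    if not 1 <= n_classes <= len(ordered):
        raise ValueError("n_classes must be between 1 and the number of values")
    # Each gap is stored with the index of the value on its lower side.
    gaps = [(ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)]
    largest = sorted(gaps, reverse=True)[: n_classes - 1]
    cut_positions = sorted(i for _, i in largest)
    # Assumption: the break sits halfway across each chosen gap.
    return [(ordered[i] + ordered[i + 1]) / 2 for i in cut_positions]


data = [1, 2, 3, 10, 11, 12, 30, 31]  # made-up values
breaks = maximum_breaks(data, 3)
print(breaks)  # [6.5, 21.0]
```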
natural breaks
subjective, visual/manual determination of logical breaks in data distribution in dispersion graph or histogram
depends on what you want to highlight
geometrical intervals
class breaks are based on a geometric series
good for highly skewed data
good for computers
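A sketch of class breaks that follow a geometric series, assuming strictly positive data and a constant ratio between successive breaks. GIS packages may use a more elaborate algorithm under the same name; this only illustrates the idea. The function name and range are hypothetical.

```python
def geometric_breaks(lo, hi, n_classes):
    """Upper class bounds that grow by a constant ratio from lo to hi."""
    if lo <= 0 or hi <= lo:
        raise ValueError("requires 0 < lo < hi")
    ratio = (hi / lo) ** (1 / n_classes)
    return [lo * ratio ** i for i in range(1, n_classes + 1)]


breaks = geometric_breaks(1, 1000, 3)
print([round(b, 6) for b in breaks])  # approximately [10, 100, 1000]
```

Class widths grow with the values, which is why this approach suits highly skewed data.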
optimal (fisher-jenks)
computational approaches to minimizing classification error
most common method
identifies low points in data
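A compact sketch of the optimal-classification idea, written as a dynamic program. It assumes classification error is measured as the sum of squared deviations from each class mean, which is one common formulation; other versions use a different central measure. All names and data are hypothetical.

```python
def optimal_classes(values, n_classes):
    """Partition sorted values into n_classes groups with minimal within-class error."""
    ordered = sorted(values)
    n = len(ordered)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and the number of values")

    # Prefix sums let us compute the error of any slice in constant time.
    prefix, prefix_sq = [0.0], [0.0]
    for v in ordered:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def slice_error(i, j):
        """Sum of squared deviations from the mean for ordered[i:j]."""
        total = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: least error when the first j values form k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = cost[k - 1][i] + slice_error(i, j)
                if candidate < cost[k][j]:
                    cost[k][j] = candidate
                    split[k][j] = i

    # Walk the stored split points backwards to recover the classes.
    classes, j = [], n
    for k in range(n_classes, 0, -1):
        i = split[k][j]
        classes.append(ordered[i:j])
        j = i
    classes.reverse()
    return classes


data = [1, 2, 3, 10, 11, 12, 30, 31]  # made-up values
classes = optimal_classes(data, 3)
print(classes)  # [[1, 2, 3], [10, 11, 12], [30, 31]]
```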
rating of classification methods major points
quantiles is only good for ordinal data
optimal is the only good one for helping select the number of classes
enumeration and spatial fallacies
areal aggregation
census tracts
best when units are similar sizes
MAUP (modifiable areal unit problem)
the geometry of the spatial organization impacts your outcome and how it will look on the map
change the area = change results
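A toy illustration of "change the area = change results": the same row of cell values is aggregated under three hypothetical zonings, and the zone means differ each time. Values and zone sizes are made up.

```python
def zone_means(cell_values, zone_sizes):
    """Average consecutive cells into zones of the given sizes."""
    if sum(zone_sizes) != len(cell_values):
        raise ValueError("zone sizes must cover every cell exactly once")
    means, start = [], 0
    for size in zone_sizes:
        zone = cell_values[start:start + size]
        means.append(sum(zone) / len(zone))
        start += size
    return means


cells = [2, 8, 2, 8, 2, 8]  # made-up values along a transect

pairs = zone_means(cells, (2, 2, 2))
triples = zone_means(cells, (3, 3))
shifted = zone_means(cells, (1, 2, 2, 1))
print(pairs)    # [5.0, 5.0, 5.0]
print(triples)  # [4.0, 6.0]
print(shifted)  # [2.0, 5.0, 5.0, 8.0]
```

The underlying data never change, yet one zoning shows a uniform surface and the others show a gradient.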
jenks (optimal) tends to stay away from using mean as a central measure T/F?
True
what is multivariate mapping
encoding two or more variables into the symbolization
trade-off between the information content and the complexity of the map
two main groups
inter-symbol encoding
intra-symbol encoding
bivariate choropleth maps
bivariate normalization (value by alpha)
what is inter-symbol encoding
symbolize 2 symbols concurrently (complementary symbols)
what is intra-symbol encoding
multiple visual variables in one symbol
combination of size and hue could be applied
selection of colour for classed maps
kind of data
colour vision impairment
simultaneous contrast
colour associations - cognitive
aesthetics
purpose - exploration vs presentation
cost of production
unipolar vs bipolar data characteristics
unipolar - sequential
lightness usually preferred - varying saturation can enhance visual contrast
bipolar - diverging
two hues diverge from a common light hue or grey
representing uncertainty
accuracy and precision
uncertainty - difference between the real geographic phenomenon and the user's understanding of the phenomenon
ex. inaccuracy of interpolated maps
completeness of census data
nature of data collection - standardized methods or not
grouping techniques
intrinsic-extrinsic
coincident-adjacent
intrinsic - extrinsic
intrinsic = vary existing object
extrinsic = use new object
coincident-adjacent
coincident = shown in same frame
adjacent = small multiple
static-dynamic
dynamic = animations or interactivity
colour schemes and classified maps
map readers process colour by seeing differences in hue, saturation, and value
mapping different “things”
saturation (changing lightness)
hues (best used to identify map features and differentiate)
value (lightness)
Look up tables
LUTs are used in lots of ways
data structures that map values of attributes to something else
colour scale can be lengthened by adding saturation
keep hue constant but vary saturation and lightness
look up tables = enhancement
what type of data is pseudo colour table used on
unclassed data
what is an anthrome?
human controlled ecosystem
3 types of histogram graphs with LUTs
linear stretch (min-max enhancements)
exponential stretch
logarithmic stretch
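A hedged sketch of the three stretches as 256-entry look up tables for 8-bit values. The exact curve shapes (a `log1p` curve and an `expm1` curve with an arbitrary gain of 5) are assumptions of this sketch, as are the function name and the input range 50 to 200.

```python
import math


def stretch_lut(in_min, in_max, transfer=lambda x: x, levels=256):
    """Build a LUT mapping input values in [in_min, in_max] onto 0..levels-1."""
    top = levels - 1
    lut = []
    for v in range(levels):
        x = (v - in_min) / (in_max - in_min)
        x = min(1.0, max(0.0, x))  # clamp values outside the input range
        lut.append(round(top * transfer(x)))
    return lut


# Linear (min-max) stretch: the data range is spread evenly over the output.
linear = stretch_lut(50, 200)
# Logarithmic stretch: brightens dark values, compresses bright ones.
logarithmic = stretch_lut(50, 200, lambda x: math.log1p(255 * x) / math.log1p(255))
# Exponential stretch: the opposite, expanding contrast among bright values.
exponential = stretch_lut(50, 200, lambda x: math.expm1(5 * x) / math.expm1(5))

pixels = [40, 50, 125, 200, 230]  # made-up pixel values
print([linear[p] for p in pixels])
print([logarithmic[p] for p in pixels])
print([exponential[p] for p in pixels])
```

Applying a stretch is then a single table lookup per pixel, which is why LUTs are a convenient way to implement enhancement.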
are stretch and enhancement synonymous terms?
yes
what method is best used with contrast stretch
standard deviation
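A minimal sketch of a standard deviation contrast stretch, assuming the input range is set to the mean plus or minus two standard deviations and values outside it are clipped. The cutoff of 2, the function name, and the pixel values are assumptions.

```python
import statistics


def std_dev_stretch(pixels, n_sd=2.0, levels=256):
    """Linearly stretch values between mean - n_sd*SD and mean + n_sd*SD."""
    mean = statistics.mean(pixels)
    sd = statistics.pstdev(pixels)
    if sd == 0:
        raise ValueError("standard deviation is zero; nothing to stretch")
    lo, hi = mean - n_sd * sd, mean + n_sd * sd
    top = levels - 1
    stretched = []
    for p in pixels:
        x = (p - lo) / (hi - lo)
        x = min(1.0, max(0.0, x))  # clip the tails beyond the cutoffs
        stretched.append(round(top * x))
    return stretched


pixels = [100, 110, 120, 130, 140, 255]  # made-up values with one bright outlier
stretched = std_dev_stretch(pixels)
print(stretched)
```

Because the cutoffs come from the mean and standard deviation rather than the raw minimum and maximum, one extreme pixel does not compress the contrast of the others as much.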