14 - Data exploration and classification Flashcards
The power of GIS resides in
its ability to process spatial data
and its associated attributes, providing answers and solutions to
real-life spatial issues
Data exploration def
the process of examining data prior to formal structured data analysis
Spatial data exploration
the process of examining the
attribute data of spatial features prior to a formal structured
data analysis
What is the data classification tool based on?
descriptive stats
How is data exploration different to statistics?
- Both spatial and attribute data involved
- Media used in GIS are maps, graphs and tables
Data visualization involves?
- Rendering
- Manipulation
Rendering def
what to show in a graphic plot & what type of plot to make
Manipulation def
how to operate on individual plots and how to organise multiple plots.
Organise graphs so that they are easy to interpret
3 NB tasks when exploring data
- Find patterns
- Pose queries (characteristics and subsets)
- Make comparisons
Inferential stats def
generalizing from a sample to a population with a calculated degree of certainty
Descriptive stats def
statistics that summarise a dataset, such as measures of central tendency; also called summary stats
Descriptive stats (4)
- Measures of central tendency
- Measures of dispersion
- Skewness
- Kurtosis
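The four groups above can be sketched in a few lines of Python. This is only an illustration: the sample lists are made up, and the population (not sample) formulas are an assumed choice.

```python
import statistics

def describe(values):
    """Return mean, population standard deviation, skewness and excess kurtosis."""
    n = len(values)
    mean = statistics.fmean(values)          # central tendency
    sd = statistics.pstdev(values)           # dispersion (population formula)
    # Standardised third and fourth moments (population formulas).
    skew = sum(((x - mean) / sd) ** 3 for x in values) / n
    kurt = sum(((x - mean) / sd) ** 4 for x in values) / n - 3  # excess kurtosis
    return {"mean": mean, "sd": sd, "skewness": skew, "kurtosis": kurt}

symmetric = [1, 2, 3, 4, 5]            # made-up, symmetric about 3
right_skewed = [1, 1, 2, 2, 3, 20]     # made-up, long tail to the right

print(describe(symmetric))
print(describe(right_skewed))
```

Skewness comes out at zero for the symmetric list and positive for the list with a long right tail.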
Measures of dispersion
- Standard deviation
- Variance
- Z score
- Range
- Standard difference
A low standard deviation indicates
the data points tend to be very close to the mean
A high standard deviation indicates
the data points are spread out over a large range of values
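A small illustration of low versus high standard deviation, using two made-up lists with the same mean (the population formula is an assumed choice):

```python
import statistics

tight = [48, 49, 50, 51, 52]     # values close to the mean
spread = [10, 30, 50, 70, 90]    # same mean, much wider spread

for name, data in (("tight", tight), ("spread", spread)):
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)   # population standard deviation
    print(name, "mean =", mean, "sd =", round(sd, 2))
```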
Standard dev def
Shows how much variation or "dispersion" exists from the average
Variance
Measure of how far a set of numbers is spread out (Standard
deviation squared)
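A quick check of the "standard deviation squared" relationship on a made-up list, assuming population formulas for both measures:

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]           # made-up values with mean 5
variance = statistics.pvariance(data)     # population variance
std_dev = statistics.pstdev(data)         # population standard deviation

print("variance =", variance, "std dev =", std_dev)
print("std dev squared equals variance:", math.isclose(std_dev ** 2, variance))
```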
Standard score (z score) def
The standardized or z score informs how many standard deviations a
reading is above or below the mean
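A minimal sketch of the z score calculation, (value - mean) / standard deviation. The readings are made up and the population standard deviation is an assumed choice.

```python
import statistics

def z_scores(values):
    """Number of standard deviations each value lies above (+) or below (-) the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)   # population standard deviation
    return [(x - mean) / sd for x in values]

readings = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up: mean 5, population sd 2
for value, z in zip(readings, z_scores(readings)):
    print(value, round(z, 2))
```

Here the reading 9 lies two standard deviations above the mean and the reading 2 lies one and a half below it.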
Classification def
the process of reducing a large number of individual quantitative values to a smaller number of ordered categories, each of which comprises a portion of the original data value range
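As an illustration of reducing values to ordered categories, the sketch below looks up the class index of each value from a list of class limits. The limits and values are hypothetical.

```python
from bisect import bisect_left

def classify(value, upper_bounds):
    """Return the 0-based class index; class i covers (upper_bounds[i-1], upper_bounds[i]]."""
    return bisect_left(upper_bounds, value)

upper_bounds = [10, 20, 30]          # hypothetical class limits: <=10, 11-20, 21-30
values = [3, 10, 11, 19, 27, 30]     # made-up data values

print([classify(v, upper_bounds) for v in values])   # [0, 0, 1, 1, 2, 2]
```

A value above the last limit would get an index past the final class, so the last limit should be at least the data maximum.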
Classification types
- Natural breaks
- Equal interval classes
- Geometric interval
- Mean and standard deviation
- Quantile
- User defined
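Two of the types above, equal interval and quantile, can be sketched as follows. The function names and data are hypothetical, and GIS packages may handle ties and rounding differently.

```python
def equal_interval_breaks(values, k):
    """Upper class limits for k classes of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k)] + [hi]

def quantile_breaks(values, k):
    """Upper class limits so each class holds roughly the same number of values.

    Assumes len(values) >= k.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // k - 1] for i in range(1, k + 1)]

data = [1, 2, 3, 4, 5, 6, 7, 8, 100]     # made-up values with one outlier

print(equal_interval_breaks(data, 3))    # [34.0, 67.0, 100]
print(quantile_breaks(data, 3))          # [3, 6, 100]
```

With the outlier present, equal interval puts eight of the nine values in the first class, while quantile puts three values in each class.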
Fundamental principle of classification
– ALWAYS mutually exclusive AND exhaustive
– Monochrome - 5-7 classes
– Multi-hue - 9 classes or fewer
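The mutually exclusive and exhaustive rule can be checked mechanically: every value must fall in exactly one class. A small sketch, with a hypothetical helper and made-up class limits:

```python
def check_classes(values, classes):
    """classes: list of (lower, upper) pairs, both bounds inclusive."""
    for v in values:
        hits = [c for c in classes if c[0] <= v <= c[1]]
        if len(hits) == 0:
            return f"not exhaustive: {v} falls in no class"
        if len(hits) > 1:
            return f"not mutually exclusive: {v} falls in {len(hits)} classes"
    return "ok"

values = [0, 5, 10, 15, 20]   # made-up data

print(check_classes(values, [(0, 10), (10, 20)]))   # 10 sits in two classes
print(check_classes(values, [(0, 9), (11, 20)]))    # 10 sits in no class
print(check_classes(values, [(0, 10), (11, 20)]))   # every value in exactly one class
```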
Considerations for classification
- Available symbols
- Communication goal
- Complexity of spatial pattern
Quantitative precision
- larger no. of classes
- represent a small range
– too much info
– indistinct symbols
Immediate graphic impact
- small no. of classes
- graphically clear, quantitatively imprecise
– oversimplification
– class may have varying data values
Natural breaks def
method seeks to reduce the variance within classes and
maximize the variance between classes
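The idea can be sketched by brute force: try every way of cutting the sorted values into k groups and keep the split with the smallest total within-class sum of squares. For a fixed dataset this also maximizes the between-class sum of squares. This is a simplified illustration for small made-up data, not the optimised routine a GIS package would use.

```python
from itertools import combinations

def within_class_ss(group):
    """Sum of squared deviations from the group mean."""
    mean = sum(group) / len(group)
    return sum((x - mean) ** 2 for x in group)

def natural_breaks(values, k):
    """Return the k contiguous groups with the smallest total within-class sum of squares."""
    ordered = sorted(values)
    best_groups, best_ss = None, float("inf")
    # Each combination is a set of k-1 cut positions in the sorted list.
    for cuts in combinations(range(1, len(ordered)), k - 1):
        edges = (0, *cuts, len(ordered))
        groups = [ordered[a:b] for a, b in zip(edges, edges[1:])]
        total = sum(within_class_ss(g) for g in groups)
        if total < best_ss:
            best_groups, best_ss = groups, total
    return best_groups

data = [1, 2, 3, 10, 11, 12, 50, 51, 52]   # made-up values with three clear clusters
print(natural_breaks(data, 3))
```

The class limits land in the gaps between the clusters, which is where the "natural" breaks in the data lie.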
Why are equal intervals not good for a rectangular data distribution?
- Each bin has an equal width, and the uniform distribution puts approximately the same number of data points in each bin, so the resulting visualization might misleadingly suggest that the data is more evenly spread than it actually is.
- In reality, the uniform distribution means each data point is equally likely, but equal interval binning does not convey any particular areas of interest within the data.
Why are user-defined intervals not good for skewed data?
Many classes will be empty and not mapped
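A small illustration of empty classes, assuming the user picks round-number limits without looking at the distribution. The values and limits are made up.

```python
from bisect import bisect_left
from collections import Counter

skewed = [1, 1, 2, 2, 2, 3, 3, 4, 5, 95]    # made-up, right-skewed values
upper_bounds = [20, 40, 60, 80, 100]        # hypothetical user-chosen class limits

# Class i covers (upper_bounds[i-1], upper_bounds[i]].
counts = Counter(bisect_left(upper_bounds, v) for v in skewed)
per_class = [counts.get(i, 0) for i in range(len(upper_bounds))]

print(per_class)   # [9, 0, 0, 0, 1]
```

Three of the five classes hold no values, so they never appear on the map.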
Why is mean & std dev not good for skewed data?
For skewed data, the mean is pulled toward the skew, and the standard deviation may not accurately reflect the dispersion around the central values, resulting in misleading class boundaries
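A sketch of the problem on made-up right-skewed values, assuming class limits at the mean and one standard deviation either side (population formula):

```python
import statistics

skewed = [1, 1, 2, 2, 2, 3, 3, 4, 5, 95]   # made-up, right-skewed values

mean = statistics.fmean(skewed)
median = statistics.median(skewed)
sd = statistics.pstdev(skewed)             # population standard deviation
breaks = [mean - sd, mean, mean + sd]      # assumed class limits for illustration

print("mean =", round(mean, 2), "median =", median, "sd =", round(sd, 2))
print("breaks =", [round(b, 2) for b in breaks])
print("values below the mean:", sum(v < mean for v in skewed), "of", len(skewed))
```

The single large value pulls the mean well above the median, the lowest limit falls below the smallest data value, and nine of the ten values end up in one class.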