Data exploration and classification Flashcards
lecture 14
What is data exploration?
The process of examining data prior to formal, structured data analysis.
What is data classification in ArcGIS?
The data classification tool is used to explore spatial data and is based on descriptive statistics.
What does data exploration include in GIS?
- In GIS it involves both spatial and attribute data (how & where?)
- Media used in GIS includes maps (spatial), graphs, and tables.
What is the crime rate like in Gauteng and where is the highest crime found in this province?
projected on a map with stats
What does data visualisation involve?
- Rendering – what to show in a graphic plot & what type of plot to make
- Manipulation – how to operate on individual plots and how to organise multiple plots
What are the fundamental tasks for data exploration?
- Finding patterns
- Posing queries, i.e. exploring data characteristics and data subsets
- Making comparisons, i.e. between variables or data subsets
Q – Which portion of my field produces the highest / lowest yield?
Q2 – Why do certain portions of my land produce higher yields?
Q – Which areas of Tanzania are most suitable for growing Pinotage?
Q – How does wildfire susceptibility vary across a nature reserve?
Q – What is the groundwater recharge potential of the Winelands municipality?
Q – How do deforestation rates vary across the Peruvian Amazon?
What types of statistics are used in spatial data exploration?
They can be:
Descriptive
Inferential
What are descriptive statistics?
Statistics that summarise a dataset (summary statistics):
1. Measures of central tendency - describe data by identifying its central position.
2. Measures of dispersion - describe the spread of the data.
3. Skewness
4. Kurtosis
What are inferential statistics?
Generalizing from a sample to a population with a calculated degree of certainty, i.e. drawing conclusions.
What are measures of central tendency?
Median, mode, mean
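The three measures can be illustrated with a minimal sketch using Python's standard `statistics` module. The yield values are invented for illustration only and do not come from the lecture.

```python
import statistics

# Hypothetical crop-yield readings (tonnes per hectare), for illustration only.
yields = [2.1, 2.4, 2.4, 3.0, 3.6, 4.8, 9.5]

mean_yield = statistics.mean(yields)      # arithmetic average of all readings
median_yield = statistics.median(yields)  # middle value of the sorted readings
mode_yield = statistics.mode(yields)      # most frequently occurring reading

print(mean_yield, median_yield, mode_yield)
```

Note how the single high reading (9.5) pulls the mean above the median, while the median and mode are unaffected by it.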
What are measures of dispersion?
They describe the statistical spread or distribution of a dataset.
Include:
1. Standard deviation
2. Variance
3. Standardised score (z score)
They show the spread of, or trends in, the data and can be used to identify outliers.
What is the standard deviation?
Shows how much variation or “dispersion” exists from the average.
What is the variance?
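A measure of how far a set of numbers is spread out: the average of the squared deviations from the mean. The standard deviation is the square root of the variance.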
What is the standard score (z score)?
The standardized or z score indicates how many standard deviations a reading is above or below the mean: z = (reading - mean) / standard deviation.
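A minimal sketch of the three dispersion measures, assuming the readings are treated as a complete population (so the population formulas `pvariance` and `pstdev` are used rather than the sample versions). The readings and the outlier threshold of 2 standard deviations are illustrative assumptions, not values from the lecture.

```python
import statistics

# Hypothetical readings, for illustration only.
readings = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = statistics.mean(readings)           # 5.0
variance = statistics.pvariance(readings)  # mean squared deviation from the mean: 4.0
std_dev = statistics.pstdev(readings)      # square root of the variance: 2.0

# z score: number of standard deviations each reading lies above or below the mean.
z_scores = [(x - mean) / std_dev for x in readings]

# Flag possible outliers using an assumed threshold of |z| >= 2.
outliers = [x for x, z in zip(readings, z_scores) if abs(z) >= 2]
print(z_scores, outliers)
```

Here the reading 9.0 has a z score of 2.0 (two standard deviations above the mean), so it is the only value flagged under the assumed threshold.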
What is classification?
The process of reducing a large number of individual quantitative values to a smaller number of ordered categories, each of which comprises a portion of the original data value range.
What are the different types of classification?
Each classification type divides the data value range in a different way. They are used mostly for the classification of interval and ratio data:
1. Natural breaks
2. Equal interval classes
3. User defined
4. Quantiles
5. Mean and Standard Deviation
6. Geometric Interval
What is the fundamental principle of classification?
- Each of the original (un-classed) data values must fall into only one of the classes
- None of the original data values falls into more than one class
- Classes are always mutually exclusive (a value cannot belong to two classes at once) & exhaustive (every value belongs to some class).
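The principle can be sketched in code. In this hypothetical helper (the name `assign_class` and the convention that each class includes its upper bound are assumptions made for illustration), every value inside the data range maps to exactly one class index, and a value outside the range is rejected instead of being silently dropped.

```python
from bisect import bisect_left

def assign_class(value, lower, upper_bounds):
    """Return the index of the single class containing value.

    Class 0 covers [lower, upper_bounds[0]];
    class i covers (upper_bounds[i - 1], upper_bounds[i]].
    """
    if value < lower or value > upper_bounds[-1]:
        # A value with no class would mean the classes are not exhaustive.
        raise ValueError("value lies outside the classified range")
    # bisect_left finds the first upper bound >= value, so a value that
    # sits exactly on a break belongs only to the lower class.
    return bisect_left(upper_bounds, value)

# Three classes: [0, 10], (10, 20], (20, 30]
bounds = [10, 20, 30]
print([assign_class(v, 0, bounds) for v in (0, 10, 10.5, 30)])
```

Because a break value such as 10 is assigned to one class only, no value can be counted twice, which is what mutual exclusivity requires.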
Deciding the number of classes:
Rules of thumb:
* Monochrome color schemes: No more than 5 to 7 classes.
* Multi-hue map: No more than 9
Need to consider:
* Communication goal?
* Complexity of Spatial Pattern
* Available Symbol Types
What is quantitative precision?
Communication goal:
* Use a larger number of class intervals.
* Each class will represent a relatively small range of the original data values and will therefore represent those values more precisely.
Trade offs:
* Too much information
* Indistinct symbols
What is immediate graphic impact?
Communication goal:
* Use a smaller number of class intervals.
* Each class will be graphically clear, but will be imprecise quantitatively.
Trade offs:
* Potential for oversimplification
* One class may include wildly varying data values
What is Jenks natural breaks?
- Jenks natural breaks is the default classification method in ArcGIS
- Minimum variation in value within classes.
- Maximum variation in value between classes.
- The method seeks to reduce the variance within classes and maximize the variance between classes.
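The idea of minimising within-class variation can be shown with a small brute-force sketch. This is not the optimised Jenks algorithm and not ArcGIS's implementation; it simply tries every way of cutting the sorted values into contiguous classes and keeps the split with the smallest total of squared deviations from the class means. The function names and sample values are hypothetical, and the approach is only practical for small datasets.

```python
from itertools import combinations

def sum_squared_deviations(values):
    """Sum of squared deviations of values from their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def natural_breaks_brute_force(values, n_classes):
    """Return the contiguous split of sorted values with least within-class variation."""
    data = sorted(values)
    best_cost, best_classes = None, None
    # Each choice of cut positions splits the sorted data into n_classes groups.
    for cuts in combinations(range(1, len(data)), n_classes - 1):
        bounds = (0, *cuts, len(data))
        classes = [data[a:b] for a, b in zip(bounds, bounds[1:])]
        cost = sum(sum_squared_deviations(c) for c in classes)
        if best_cost is None or cost < best_cost:
            best_cost, best_classes = cost, classes
    return best_classes

# Hypothetical values with three obvious clusters.
values = [1, 2, 3, 10, 11, 12, 30, 31, 32]
print(natural_breaks_brute_force(values, 3))
# → [[1, 2, 3], [10, 11, 12], [30, 31, 32]]
```

The breaks fall in the gaps between the clusters, which is why natural breaks tends to group similar values together.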
What are the advantages of natural breaks?
- Maximizes the similarity of values within each class
- Increases the precision of the map given the number of classes
What are the disadvantages of natural breaks?
- Class breaks often look random
- Need to explain the method
- Method will be difficult to grasp for those lacking a background in statistical methods.
What is equal interval classification?
- Each class represents an equal portion of original data range.
- Also called equal size or equal width classification
Calculation:
1. Determine the range of the original values: {Range = Max – Min}
2. Decide the number of classes: {N}
3. Calculate the class width: {CW = Range / N}
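The calculation can be sketched as follows. The function name `equal_interval_breaks` and the sample values are assumptions for illustration; the function returns the upper bound of each class.

```python
def equal_interval_breaks(values, n_classes):
    """Return the upper bound of each equal-width class."""
    minimum, maximum = min(values), max(values)
    data_range = maximum - minimum          # Range = Max - Min
    class_width = data_range / n_classes    # CW = Range / N
    return [minimum + class_width * i for i in range(1, n_classes + 1)]

# Hypothetical data: Range = 100 - 0 = 100, N = 4, so CW = 25.
values = [0, 12, 25, 40, 100]
print(equal_interval_breaks(values, 4))
# → [25.0, 50.0, 75.0, 100.0]
```

Note that the classes are equal in width, not in the number of values they hold: here most of the values fall in the lowest two classes and none fall in the third.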