14 - Data exploration and classification Flashcards by Raphaelle Moors

The power of GIS resides in

its ability to process spatial data and their associated attributes,
providing answers and solutions to real-life spatial issues

Data exploration def

the process of examining data prior to formal structured data analysis

Spatial data exploration

the process of examining the
attribute data of spatial features prior to a formal structured
data analysis

What is the data classification tool based on?

descriptive stats

How is data exploration different to statistics?

Both spatial and attribute data are involved
The media used in GIS are maps, graphs and tables

Data visualization involves?

Rendering
Manipulation

Rendering def

what to show in a graphic plot & what type of plot to make

Manipulation def

how to operate on individual plots and how to organise multiple plots.
Organise graphs so that they are easy to interpret

3 NB tasks when exploring data

Find patterns
Pose queries (characteristics and subsets)
Make comparisons

Inferential stats def

generalizing from a sample to a population with a calculated degree of certainty

Descriptive stats def

statistics that provide a summary of a dataset, e.g. measures of central tendency and other summary stats

Descriptive stats (4)

Measures of central tendency
Measures of dispersion
Skewness
Kurtosis
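
The four groups above can be computed for a small sample as a rough sketch. The function name `describe` is made up for illustration, the population (not sample) formulas are an assumption, and kurtosis is reported as excess kurtosis (0 for a normal distribution).

```python
import statistics

def describe(values):
    """Return population-based descriptive statistics for a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation (must be > 0 below)
    # Skewness and kurtosis as the 3rd and 4th standardized moments.
    skew = sum(((v - mean) / sd) ** 3 for v in values) / n
    excess_kurt = sum(((v - mean) / sd) ** 4 for v in values) / n - 3
    return {
        "mean": mean,                       # central tendency
        "median": statistics.median(values),
        "range": max(values) - min(values),  # dispersion
        "std_dev": sd,
        "skewness": skew,
        "excess_kurtosis": excess_kurt,
    }

symmetric = describe([1, 2, 3, 4, 5])
right_skewed = describe([1, 1, 2, 2, 3, 10])
print(symmetric["skewness"], right_skewed["skewness"])
```

The symmetric list gives a skewness of about zero, while the list with one large value gives a positive (right) skew.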

Measures of dispersion

Standard deviation
Variance
Z score
Range
Standard difference

A low standard deviation indicates

the data points tend to be very close to the mean

A high standard deviation indicates

the data points are spread out over a large range of values
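
To make the low/high contrast concrete, here is a small sketch with two invented lists that share the same mean but differ in spread. The data and the choice of the population standard deviation are assumptions for illustration only.

```python
import statistics

# Two made-up datasets with the same mean (50) but very different spread.
tight = [49, 50, 51, 50, 50]
spread = [10, 90, 30, 70, 50]

for name, data in (("tight", tight), ("spread", spread)):
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)  # population standard deviation
    print(f"{name}: mean={mean:.1f} std_dev={sd:.2f}")
```

The first list hugs the mean and has a standard deviation below 1; the second covers a wide range and has a standard deviation above 28.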

Standard dev def

Shows how much variation or dispersion exists from the average

Variance

Measure of how far a set of numbers is spread out (Standard
deviation squared)
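
The "standard deviation squared" relationship can be checked by hand. This is a minimal sketch assuming the population form of the variance (dividing by n); `population_variance` is an illustrative helper, not a library function.

```python
import math
import statistics

def population_variance(values):
    """Mean of the squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

data = [2, 4, 4, 4, 5, 5, 7, 9]
variance = population_variance(data)
std_dev = statistics.pstdev(data)

# Variance and squared standard deviation should agree.
print(variance, std_dev ** 2)
```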

Standard score (z score) def

The standardized or z score informs how many standard deviations a
reading is above or below the mean
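
As a worked sketch, a z score is (reading - mean) / standard deviation. The helper name `z_scores` and the use of the population standard deviation are assumptions made for this example.

```python
import statistics

def z_scores(values):
    """Return how many standard deviations each value lies from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / sd for v in values]

readings = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, standard deviation 2
scores = z_scores(readings)
print(scores[0], scores[-1])
```

Here the reading 2 sits 1.5 standard deviations below the mean and the reading 9 sits 2 standard deviations above it.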

Classification def

the process of reducing a large number of individual quantitative values to a smaller number of ordered categories, each of which comprises a portion of the original data value range

Classification types

Natural breaks
Equal interval classes
Geometric interval
Mean and standard deviation
Quantile
User defined
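
Two of the methods listed above, equal interval and quantile, can be sketched with the standard library. The function names are hypothetical, each break is treated as the inclusive upper bound of its class, and GIS packages may place breaks slightly differently.

```python
import bisect
import statistics

def equal_interval_breaks(values, k):
    """Upper bounds of k classes that each span the same width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # Pin the last break to the maximum to avoid floating-point drift.
    return [lo + width * i for i in range(1, k)] + [hi]

def quantile_breaks(values, k):
    """Upper bounds of k classes holding roughly equal numbers of values (k >= 2)."""
    cuts = statistics.quantiles(values, n=k, method="inclusive")
    return cuts + [max(values)]

def class_counts(values, breaks):
    """Count values per class; a value equal to a break goes in the lower class."""
    counts = [0] * len(breaks)
    for v in values:
        counts[bisect.bisect_left(breaks, v)] += 1
    return counts

values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(class_counts(values, equal_interval_breaks(values, 3)))
print(class_counts(values, quantile_breaks(values, 3)))
```

With one extreme value in the data, equal interval puts nearly everything in the first class and leaves the middle class empty, while quantile spreads the values almost evenly across classes.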

Fundamental principle of classification

– Classes are ALWAYS mutually exclusive AND exhaustive
– Monochrome colour scheme: 5-7 classes
– Multi-hue colour scheme: 9 or fewer classes
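
The "mutually exclusive and exhaustive" rule can be tested mechanically: every value must fall in exactly one class. The sketch below assumes half-open classes `[lower, upper)`, and the function names are invented for illustration.

```python
def membership_counts(values, classes):
    """For each value, count how many [lower, upper) classes contain it."""
    return [sum(lo <= v < hi for lo, hi in classes) for v in values]

def is_exclusive_and_exhaustive(values, classes):
    """True when every value belongs to exactly one class."""
    return all(count == 1 for count in membership_counts(values, classes))

values = [0, 5, 11, 12, 19, 29]
good = [(0, 10), (10, 20), (20, 30)]
with_gap = [(0, 10), (15, 30)]               # 11 and 12 fall in no class
overlapping = [(0, 12), (10, 20), (20, 30)]  # 11 falls in two classes

print(is_exclusive_and_exhaustive(values, good))
print(is_exclusive_and_exhaustive(values, with_gap))
print(is_exclusive_and_exhaustive(values, overlapping))
```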

Considerations for classification

Available symbols
Communication goal
Complexity of spatial pattern

Quantitative precision

a larger no. of classes,
each representing a small range
– too much info
– indistinct symbols

Immediate graphic impact

a small no. of classes
graphically clear, but quantitatively imprecise
– oversimplification
– a class may contain widely varying data values

Natural breaks def

method seeks to reduce the variance within classes and maximize the variance between classes
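
The idea can be sketched by brute force for a tiny dataset: try every way of cutting the sorted values into k contiguous classes and keep the one with the smallest total within-class sum of squared deviations. Because the total variation is fixed, this also maximizes the between-class part. This is an illustrative exhaustive search, not the optimized routine GIS software uses, and `natural_breaks` is a hypothetical name.

```python
from itertools import combinations
from statistics import fmean

def within_class_ss(group):
    """Sum of squared deviations of a class from its own mean."""
    mean = fmean(group)
    return sum((v - mean) ** 2 for v in group)

def natural_breaks(values, k):
    """Exhaustively find the k contiguous classes with minimal within-class variation."""
    data = sorted(values)
    n = len(data)
    best_cost, best_groups = None, None
    # Each combination is a set of k-1 cut positions in the sorted list.
    for cuts in combinations(range(1, n), k - 1):
        edges = (0, *cuts, n)
        groups = [data[a:b] for a, b in zip(edges, edges[1:])]
        cost = sum(within_class_ss(g) for g in groups)
        if best_cost is None or cost < best_cost:
            best_cost, best_groups = cost, groups
    return best_groups

classes = natural_breaks([50, 1, 11, 2, 52, 10, 3, 12, 51], 3)
print(classes)
```

The search is only practical for small inputs, since the number of candidate partitions grows quickly with the number of values.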

Why are equal intervals not good for a rectangular (uniform) data distribution?

- Since each bin will have an equal width, but the uniform distribution ensures that each bin will have approximately the same number of data points, the resulting visualization might misleadingly suggest that the data is more evenly spread than it actually is.
- In reality, the uniform distribution means each data point is equally likely, but equal interval binning does not convey any particular areas of interest within the data.

Why are user-defined intervals not good for skewed data?

Many classes will be empty and not mapped

Why is the mean & std dev method not good for skewed data?

For skewed data, the mean is pulled toward the skew, and the standard deviation may not accurately reflect the dispersion around the central values, resulting in misleading class boundaries
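
A small made-up example shows the effect. The values below are invented, with a single large outlier creating the right skew, and the population standard deviation is assumed.

```python
import statistics

# Invented right-skewed data: nine small values and one large outlier.
values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 60]

mean = statistics.fmean(values)
median = statistics.median(values)
sd = statistics.pstdev(values)

below_mean = sum(v < mean for v in values)
lower_break = mean - sd  # class boundary one standard deviation below the mean

print(f"mean={mean:.1f} median={median} std_dev={sd:.1f}")
print(f"{below_mean} of {len(values)} values lie below the mean")
print(f"mean - 1 sd = {lower_break:.1f}, minimum = {min(values)}")
```

The mean is pulled well above the median, nine of the ten values end up on one side of it, and the boundary at one standard deviation below the mean falls below the smallest value, so that class would contain no data.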

14 - Data exploration and classification Flashcards

(29 cards)