lecture 9 - spatial analysis and statistics Flashcards
what is spatial data?
something you can assign an x and a y coordinate to, e.g. a map or an image
vector data
points, lines and areas
raster data
regular grids of cells, e.g. pixels
what order do the axes go in the (Cartesian) coordinate system?
x then y, written (x, y)
what additional axis is sometimes added in datasets such as images?
z axis
how do you calculate the distance between points on a diagonal?
use Pythagoras’ theorem
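A minimal sketch of the Pythagoras calculation, assuming the two points are given as plain (x, y) coordinates in the same units. The function name `diagonal_distance` is made up for illustration.

```python
import math

def diagonal_distance(x1, y1, x2, y2):
    """Straight-line distance between (x1, y1) and (x2, y2)."""
    dx = x2 - x1  # horizontal leg of the right-angled triangle
    dy = y2 - y1  # vertical leg of the right-angled triangle
    return math.hypot(dx, dy)  # sqrt(dx**2 + dy**2)

print(diagonal_distance(0, 0, 3, 4))  # 5.0
```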
what do trigonometry and Pythagoras allow us to do?
work out the dimensions of individual pixels
Pythagoras works out distances on images
trigonometry works out widths
speed =
distance / time
what does a cluster analysis look for?
spatial patterns
spatial patterns in the cluster analysis can be described as?
- dispersed
- clustered
- random
define dispersed
closest to uniform
define random
any visually apparent cluster is due to chance
define clustered
pattern or grouping
what method is used for testing uniform distribution?
chi squared
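A rough sketch of how the chi squared statistic could be computed for a uniformity test, assuming the study area has been split into equal-sized quadrats and the points in each one counted. The function name and the counts are invented for illustration.

```python
def chi_squared_uniform(counts):
    """Chi squared statistic comparing quadrat counts with a uniform expectation."""
    total = sum(counts)
    expected = total / len(counts)  # uniform: same expected count in every quadrat
    return sum((obs - expected) ** 2 / expected for obs in counts)

quadrat_counts = [8, 12, 10, 10]  # made-up counts for four equal-sized quadrats
statistic = chi_squared_uniform(quadrat_counts)
print(statistic)
# Compare with the critical value for (number of quadrats - 1) degrees of freedom,
# e.g. 7.815 for 3 degrees of freedom at the 5% significance level.
```

A statistic below the critical value means the counts do not differ significantly from a uniform distribution.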
in a 3D histogram, the flatter the graph looks =
the more uniform the distribution is
what method is used for testing for random distribution?
chi squared, comparing the observed counts with those expected under a Poisson distribution
*if the chi squared statistic is lower than the critical value, the data are consistent with a Poisson distribution and the pattern is therefore treated as random
what is the poisson distribution used for?
to calculate the probability of a given number of events (e.g. points) occurring within a fixed distance (or time interval/area/value range)
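A small sketch of the standard Poisson probability formula, P(k) = mean^k * e^(-mean) / k!, which gives the probability of exactly k events in a fixed interval when the average count per interval is known. The function name `poisson_pmf` and the example mean are assumptions for illustration.

```python
import math

def poisson_pmf(k, mean):
    """Probability of exactly k events when the average count per interval is `mean`."""
    return mean ** k * math.exp(-mean) / math.factorial(k)

# Hypothetical example: an average of 2 points per quadrat.
for k in range(4):
    print(k, poisson_pmf(k, 2.0))
```

Multiplying each probability by the number of quadrats gives the expected counts to use in the chi squared comparison.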
what method is used to test for clustered distribution?
nearest-neighbour criterion
*to reject the null hypothesis (at the 5% significance level), the z value has to be > 1.96 OR < -1.96
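A hedged sketch of one common way to get the nearest-neighbour z value. It assumes the Clark and Evans form of the test, which the card itself does not spell out, and it ignores edge effects. The function name and the sample points are hypothetical.

```python
import math

def nearest_neighbour_z(points, area):
    """z value comparing the observed mean nearest-neighbour distance with
    the value expected for a random pattern of the same density."""
    n = len(points)
    # distance from each point to its closest other point
    nearest = [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    observed = sum(nearest) / n
    density = n / area
    expected = 1 / (2 * math.sqrt(density))       # mean distance under randomness
    std_error = 0.26136 / math.sqrt(n * density)  # Clark and Evans standard error
    return (observed - expected) / std_error

# Made-up example: a regular 5 x 5 grid of points in a 25-unit area.
grid = [(x, y) for x in range(5) for y in range(5)]
print(nearest_neighbour_z(grid, area=25))
```

With this formula a large positive z points towards a dispersed pattern and a large negative z towards a clustered one.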
what methods can be used to test for similarities between clusters?
- hierarchical clustering
- nearest-neighbour clustering
- k-means clustering
- Gaussian mixture clustering
hierarchical clustering
- dendrogram
- starts by treating each individual data point as its own cluster
- the closest clusters are then grouped together based on the distance between data points, and this is repeated until everything is in one group
- can easily pick the level of clustering you would like
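The merging steps above can be sketched as follows. This is an illustrative version only: it assumes single linkage (distance between the closest members of two clusters), which the cards do not specify, and it assumes `n_clusters` is at least 1. The function name is made up.

```python
import math

def agglomerative_clusters(points, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[p] for p in points]  # every point starts as its own cluster
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single linkage: distance between the closest pair of members
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters

points = [(0, 0), (0, 1), (10, 10), (10, 11)]  # made-up coordinates
print(agglomerative_clusters(points, 2))
```

Changing `n_clusters` corresponds to cutting the dendrogram at a different level.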
Gaussian mixture clustering
- detects peaks in the datapoint concentration
* drawback of this method: you have to define the number of clusters in advance