GEOG364 Final Flashcards

1
Q

runs count

A

a one dimensional autocorrelation measure

2
Q

joins count

A

a two dimensional autocorrelation measure

3
Q

spatial autocorrelation generally explained

A

the correlation of a variable to itself through space

similarity in position vs similarity in attributes

4
Q

free sampling and example

A

each outcome is random and independent of previous results
example being flipping a coin

5
Q

non-free sampling and example

A

when the outcome is affected by the previous result

example being a card being picked from a deck. each card taken affects the probability of the next card

6
Q

4 factors that can dramatically influence spatial autocorrelation results

A

a sample size smaller than 30
one category of values occurs in less than 20% of the data
the region is elongated and has few joins
there are a couple of features with many joins and some with very few

7
Q

name a limitation of joins counts

A

it does not work for numeric data

numbers can be reclassed as “high/low,” but this throws away much information

8
Q

what are the two alternatives to joins counts for measuring spatial autocorrelation

A

moran’s i

geary’s c

9
Q

in general, what do moran’s i and geary’s c measure?

A

they compare the differences between neighboring values with the differences among values across the entire study area

10
Q

in moran’s i or geary’s c what does it mean if the difference between neighboring features is less than between all other features

A

it would mean that the neighboring features could be considered clustered

11
Q

which spatial autocorrelation uses squared differences between adjacent cases

A

geary’s c

12
Q

which spatial autocorrelation measure uses a covariance term

A

moran’s i

13
Q

name two similarities between geary’s c and moran’s i FORMULAs

A

they both divide by total “w” to account for the number of pairs of cases
they both divide by a variance term in order to account for range of data

14
Q

explain what -1, 0, and 1 would mean in a spatial autocorrelation analysis

A

it would mean you are using moran’s i
-1 means negative autocorrelation and the data is dispersed
0 means there is no autocorrelation and pattern is random
1 would mean positive autocorrelation and attributes are clustered

15
Q

explain what 0, 1, and 2 would mean in a spatial autocorrelation analysis

A

it would mean you are using geary’s c
0 means positive autocorrelation and values are clustered
1 means no autocorrelation with random values and no apparent pattern
2 means negative spatial autocorrelation with dispersed values (high-low)
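
As a rough illustration of the two measures, the sketch below computes moran’s i and geary’s c for six cells in a row. The helper names (`morans_i`, `gearys_c`, `chain_weights`), the binary edge-sharing weights, and the sample values are assumptions made for this example, not something taken from the course material.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and an n x n weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_w = sum(sum(row) for row in weights)
    # Covariance-style term: product of deviations for each weighted pair.
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    # Variance-style term: sum of squared deviations from the mean.
    ss = sum(d * d for d in dev)
    return (n / total_w) * (cross / ss)


def gearys_c(values, weights):
    """Global Geary's c, based on squared differences between neighbors."""
    n = len(values)
    mean = sum(values) / n
    total_w = sum(sum(row) for row in weights)
    sq_diff = sum(weights[i][j] * (values[i] - values[j]) ** 2
                  for i in range(n) for j in range(n))
    ss = sum((v - mean) ** 2 for v in values)
    return ((n - 1) / (2 * total_w)) * (sq_diff / ss)


def chain_weights(n):
    """Binary adjacency for n cells in a row (assumed neighbor rule)."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


w = chain_weights(6)
clustered = [1, 1, 1, 5, 5, 5]   # similar values sit next to each other
dispersed = [1, 5, 1, 5, 1, 5]   # high and low values alternate

print(morans_i(clustered, w), gearys_c(clustered, w))
print(morans_i(dispersed, w), gearys_c(dispersed, w))
```

With these made-up values the clustered row should give a moran’s i above 0 and a geary’s c below 1, and the alternating row the reverse.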

16
Q

match the numbers of moran’s i to geary’s c

A
-1 = 2 = negative autocorrelation
0 = 1 = no autocorrelation
1 = 0 = positive autocorrelation
17
Q

what does the w represent in a spatial autocorrelation analysis?

A

the weight given to a measure to set adjacency

for example, what distance/time/cost would make two features neighbors?

18
Q

what is the alternative method for testing the significance of geary’s c or moran’s i?

A

the monte carlo simulation

19
Q

what does monte carlo simulation do?

A

it generates a sampling distribution for a given test statistic. the observed test statistic is then compared against this distribution to assess significance
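
A minimal sketch of the idea, assuming a one-sided permutation test for positive autocorrelation with moran’s i: shuffle the values across locations many times, recompute the statistic each time, and count how often a shuffled statistic is at least as large as the observed one. The function names, the 999 permutations, the fixed seed, and the 20-cell row of values are all illustrative assumptions.

```python
import random


def morans_i(values, weights):
    """Global Moran's I for a list of values and an n x n weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_w = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / total_w) * (cross / sum(d * d for d in dev))


def monte_carlo_p(values, weights, n_perm=999, seed=0):
    """Return the observed statistic and a pseudo p-value from shuffling."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any spatial arrangement
        if morans_i(shuffled, weights) >= observed:
            at_least_as_large += 1
    # The observed arrangement counts as one of the possible outcomes.
    return observed, (at_least_as_large + 1) / (n_perm + 1)


n = 20
weights = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]
values = list(range(n))  # a smooth gradient, so neighbors are very similar

observed, p_value = monte_carlo_p(values, weights)
print(observed, p_value)
```

The simulated distribution stands in for a theoretical one, so no normality assumption is needed; the smallest p-value this sketch can report is 1 / (n_perm + 1).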

20
Q

global statistics

A

a single value that summarizes a characteristic for an entire study region

21
Q

why is it important to use measures of autocorrelation in a region?

A

spatial homogeneity does not exist over global regions/entire study area

22
Q

what do you call it when autocorrelation is low in one area of a region and high in another

A

spatial heterogeneity

23
Q

LISA

A

local indicators of spatial autocorrelation

local versions of geary’s c and moran’s i

24
Q

what does LISA measure that is different than geary’s c or moran’s i?

A

LISA measures levels of particular clusters, not overall clustering

25
Q

what is the preferred test of choice for local clustering measures

A

moran’s i

26
Q

name 4 objectives of a regression analysis

A

to determine whether a relationship exists
to describe the nature of the relationship mathematically
to assess the degree of accuracy with which the model represents the relationship
in the case of multiple regression, to understand the relative importance of individual independent variables

27
Q

regression VS correlation

A

correlation provides us with the extent of a relationship between two variables
a regression analysis provides us with the nature of that relationship

28
Q

y in regression

A

the dependent variable

29
Q

x in regression

A

the independent variable

30
Q

a and b in regression

A

the regression coefficients: a is the intercept and b is the slope

31
Q

e in regression

A

the random error or residual that the model does not account for

32
Q

what is the line and what does it show in regression

A

the line is a statistical model that shows the expected mean value of y for each value of x

33
Q

how do we create a regression line

A

by applying the least squares criterion

34
Q

what does the least squares criterion do

A

it chooses the line that minimizes the sum of squared differences (residuals) between the line and the observed data points

35
Q

what are the 4 steps to a regression analysis

A
  1. specify independent and dependent variables
  2. use sample data to estimate a and b in the model
  3. estimate model error and check assumptions
  4. evaluate the statistical usefulness of the model
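
The first three steps can be sketched for a simple regression y = a + bx + e. The data values and the function name `fit_line` are made up for illustration; the formulas are the standard least squares estimates.

```python
def fit_line(x, y):
    """Least squares estimates of intercept a and slope b for y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Step 1: x is the independent variable, y the dependent variable.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Step 2: estimate a and b from the sample.
a, b = fit_line(x, y)

# Step 3: residuals are observed y minus the value predicted by the line.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
mean_residual = sum(residuals) / len(residuals)

print(a, b, mean_residual)
```

For a least squares line with an intercept, the mean residual should come out at essentially 0, which matches the first regression assumption.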
36
Q

can regression describe causality?

A

NO, it only helps describe the nature of the relationship

37
Q

what are the 4 assumptions made in a regression analysis

A
  1. mean error is 0
  2. variance of the error is constant across x values
  3. error is normally distributed
  4. no relationship exists between y and the residual/error
38
Q

is regression an extension of correlation or is correlation an extension of regression?

A

regression is an extension of correlation

39
Q

what does ANOVA measure

A

it measures the variance and overall significance of a regression model

40
Q

how does the size of residuals affect a regression model

A

smaller residuals mean that the line is a good fit and the model is accurate

41
Q

what is the range for r squared values

A

0-1

42
Q

what does an r squared value of 0 or 1 mean

A

0 means the line explains none of the variation in y

1 means the line fits perfectly and explains all of the variation in y
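
A small sketch of the usual definition, r squared = 1 - (sum of squared residuals / total sum of squares). The function name and the two sets of predictions are assumptions chosen to hit the two ends of the range.

```python
def r_squared(observed, predicted):
    """Share of the variation in the observed values explained by the model."""
    mean_obs = sum(observed) / len(observed)
    ss_total = sum((o - mean_obs) ** 2 for o in observed)
    ss_error = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1 - ss_error / ss_total


observed = [2.0, 4.0, 6.0, 8.0]
perfect = [2.0, 4.0, 6.0, 8.0]    # predictions hit every point
mean_only = [5.0, 5.0, 5.0, 5.0]  # predictions ignore x entirely

print(r_squared(observed, perfect))
print(r_squared(observed, mean_only))
```

Predictions that match every point leave no residual variation, while predicting the mean of y everywhere explains nothing.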

43
Q

what may r squared look like in a software output

A

ESS

44
Q

what does standard error of estimates show

A

it estimates the standard deviation of the errors/residuals
how close are the observed values to the line?
how many values fall within 95% of the value of the fitted line

45
Q

what is a regression model not good for?

A

estimating a value outside the range of observed values (EXTRAPOLATION)

46
Q

what is the difference between multiple regression and simple regression

A

multiple regression uses multiple independent variables

47
Q

name an example of a multiple regression

A

a linear trend surface

48
Q

multicollinearity

A

high correlation among the independent variables in a multiple regression analysis

multiple regression assumes that the independent variables do not exhibit high correlation among each other

49
Q

what is trend surface analysis an example of

A

how regression analysis can be applied to spatial problems

50
Q

what does ANOVA stand for

A

analysis of variance

51
Q

what is a synonym of ANOVA

A

statistical analysis, but ANOVA goes over the top

52
Q

what does ANOVA address?

A

different types of variance and then relates them to overall variance

53
Q

how could you apply ANOVA to the following regression

predicting plant growth by fertilizer application

A

you could additionally ask whether different types of fertilizer have varying effects on plant growth

54
Q

what is a name for two or more categorical predictor variables in ANOVA

A

factors

55
Q

in terms of columns and rows what does ANOVA compare?

A

the variation among values within one column to the overall variation between different columns

56
Q

name the 4 probability distributions

A

normal
z
t
f

57
Q

what probability distribution does ANOVA use

A

f distribution

58
Q

what is the f statistic

A

a measure for the ratio of the first sample variance to the second sample variance

59
Q

what is a HUGE factor in determining variance values

A

sample size

60
Q

the larger the sample size, the _____ the sample variance

A

smaller

61
Q

how do we determine degrees of freedom

A

sample size minus 1

62
Q

when is ANOVA very useful

A

when predictor variables are categorical

gender, regions, beer labels

63
Q

what two groups is variance split into in an ANOVA distribution

A

within group variance and between group variance

INTRA AND INTER
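
A hedged sketch of how the two pieces combine into an f statistic in a one-way ANOVA: between group variance divided by within group variance, each scaled by its degrees of freedom. The three groups of numbers and the function name are invented for the example.

```python
def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Between (inter) group: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within (intra) group: how far each value sits from its own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_anova_f(groups)
print(f_stat)
```

A large value means the group means differ by more than the scatter inside the groups would suggest; significance still has to be read from the f distribution for the two degrees of freedom.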

64
Q

why are interaction effects important

A

think about the gender and beer goggles example. beer has a very different effect on different genders. but when looking at the beer goggles effect free of gender the results are very different

65
Q

what is the main advantage of using anova

A

it allows for individual studies to be replaced by one study that compares more factors

66
Q

what happens when your test statistics are not continuous, but categorical

A

use non parametric stats

67
Q

name 2 parameters

A

mean and standard deviation

68
Q

give three examples of non parametric data

A

raw counts
number of protected plants in a forest that are stable or declining
number of people receiving social security or not
number of crimes in spring VS summer vs Winter

69
Q

what is another way to describe the chi square test

A

a goodness of fit test

70
Q

what are two variables in the chi square formula

A

expected and observed frequencies
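
The statistic itself is the sum of (observed - expected) squared divided by expected, taken over the categories. A short sketch with invented crime counts for three seasons, where the expected counts assume crimes are spread evenly:

```python
def chi_square(observed, expected):
    """Chi square statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [18, 22, 20]   # hypothetical counts: spring, summer, winter
expected = [20, 20, 20]   # what an even split of 60 crimes would give

stat = chi_square(observed, expected)
print(stat)
```

The closer the observed counts are to the expected counts, the closer the statistic is to 0, which is why the test is described as a goodness of fit test.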

71
Q

what is the most popular non parametric test

A

chi square

72
Q

what is the arguable 5th scale of measurement?

A

cyclic
compass directions, months of year
what would the avg direction be between north and south?

73
Q

main objective of descriptive stats

A

organization and summary of data

74
Q

what is the main difference between descriptive and inferential stats

A

inferential stats provide insights of a population on the basis of SAMPLES and test a hypothesis

75
Q

the three measures of central tendency

A

mean
median
mode

76
Q

the three measures of dispersion

A

range
iqr
variance and/or stand dev

77
Q

how do you find range of data

A

it is the difference between the highest and lowest valued observation

78
Q

what is the IQR

A

the difference between the first and third quartile

79
Q

variance

A

the average squared difference between each value and the mean

80
Q

what is stand dev

A

the square root of the variance
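
The measures of dispersion from the last few cards can be tried out with Python's built-in `statistics` module. The data values are made up, and the population versions of variance and stand dev are used here as an assumption; the sample versions (`statistics.variance`, `statistics.stdev`) divide by n - 1 instead. Quartile conventions also differ between textbooks and software, so the IQR may not match a hand calculation exactly.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: highest minus lowest observation.
data_range = max(data) - min(data)

# IQR: third quartile minus first quartile.
q1, _median, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Variance: average squared difference from the mean (population form).
variance = statistics.pvariance(data)

# Stand dev: square root of the variance.
stand_dev = statistics.pstdev(data)

print(data_range, iqr, variance, stand_dev)
```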

81
Q

what is first order variation

A

changes in observation in spatial autocorrelation are due to changes in local environment

82
Q

what is second order variation

A

variation in spatial autocorrelation is due to relationship with other attributes - not the environment itself

83
Q

ecological fallacy

A

assuming that a relationship observed for aggregated data (groups or areas) also holds for the individuals within them

84
Q

MAUP

A

modifiable areal unit problem: changing the classification, boundaries, or extent of the areal units can change the results and display of the data

85
Q

non uniformity of space

A

coastal areas may have more cases of the flu not because they are near the water, but because they also often have higher population density.

86
Q

edge effects

A

entities may only have a neighbor on one side. think of a crime map of mexico along the US border without US data

87
Q

what is the difference between euclidean and manhattan block distance

A

we can consider euclidean as the crow flies

manhattan must go around edges
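
A quick sketch of both distances for two points on a grid. The function names and the coordinates are assumptions for illustration.

```python
import math


def euclidean(p, q):
    """Straight-line distance, as the crow flies."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def manhattan(p, q):
    """Distance along the grid: total east-west plus north-south travel."""
    return abs(q[0] - p[0]) + abs(q[1] - p[1])


start, end = (0, 0), (3, 4)
print(euclidean(start, end))
print(manhattan(start, end))
```

Manhattan distance is never shorter than euclidean distance between the same two points, and the two are equal only when the points line up along one axis.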

88
Q

quantile classification

A

every class contains the same number of entities

89
Q

equal interval classification

A
dividing your data into equal intervals
the difference between the highest and lowest value in each class is the same
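
A rough comparison of quantile and equal interval classification on the same made-up, unevenly distributed values. The function names and the simple splitting rules are assumptions for this sketch; GIS software may handle ties and class boundaries differently.

```python
def equal_interval_breaks(values, k):
    """Upper bound of each of k classes that all have the same width."""
    lowest, highest = min(values), max(values)
    width = (highest - lowest) / k
    return [lowest + width * i for i in range(1, k + 1)]


def quantile_classes(values, k):
    """Split the sorted values into k classes of (nearly) equal count."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[i * n // k:(i + 1) * n // k] for i in range(k)]


values = [1, 2, 3, 4, 5, 6, 50, 100]

breaks = equal_interval_breaks(values, 4)
classes = quantile_classes(values, 4)

# Count how many values land in each equal interval class.
counts = [0] * len(breaks)
for v in values:
    for i, upper in enumerate(breaks):
        if v <= upper:
            counts[i] += 1
            break

print(breaks, counts)
print(classes)
```

With data like this, equal interval packs most values into the first class, while quantile keeps the counts even at the cost of putting very different values (50 and 100) in the same class.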
90
Q

advantage and disadvantage of natural breaks

A

advantage is that it is good for unevenly distributed data

disadvantage is that datasets cannot be easily compared

91
Q

quantile advantage and disadvantage

A

advantage is that relative positions (top 20%) can be shown; GOOD for evenly distributed data
disadvantage is that the breaks are unnatural

92
Q

equal interval advantage and disadvantage

A

good for mapping continuous data and is easy to understand

disadvantage is that if data is clustered, a few classes will hold most of the values and others will be nearly empty

93
Q

goods and bads of stand dev classification

A

it is good for normally distributed data and getting an idea of how data compares to mean
disadvantage is that the actual values are not displayed and outliers strongly influence mean

94
Q

what classification scheme should be used for evenly/unevenly distributed data

A

for evenly distributed data use equal interval, stand dev, or quantile
for uneven use natural breaks

95
Q

about how many classes should be used

A

use between 3 and 7 classes

96
Q

mean center

A

simply the average of the x and y coordinates

center of gravity

97
Q

what is the problem with mean center

A

outliers affect the hell out of it

98
Q

what is an example of weighted mean center

A

rather than simply finding the mean center of the national parks, weight each park's coordinates by the number of visitors it has per year
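
A minimal sketch of both calculations. The park coordinates and visitor counts are invented, and the function names are assumptions for this example.

```python
def mean_center(points):
    """Average of the x coordinates and average of the y coordinates."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def weighted_mean_center(points, weights):
    """Mean center where each point counts in proportion to its weight."""
    total = sum(weights)
    x = sum(w * px for (px, _), w in zip(points, weights)) / total
    y = sum(w * py for (_, py), w in zip(points, weights)) / total
    return (x, y)


parks = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]  # hypothetical locations
visitors = [100, 300, 600]                      # hypothetical yearly visitors

print(mean_center(parks))
print(weighted_mean_center(parks, visitors))
```

The weighted center is pulled toward the park with the most visitors, while the unweighted center treats all three parks the same.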

99
Q

median center

A

the coordinate with the shortest total distance to all features in the study

100
Q

central feature

A

the FEATURE with the shortest distance to all other features

101
Q

median center vs central feature

A

median center does not need to exist
central feature must exist
median calculates the most accessible location while central feature finds the most accessible entity
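
To make the difference concrete, the sketch below finds both for four made-up points. The median center is approximated with Weiszfeld's iterative method, which is one common way to do it and is an assumption here (the course may use a different procedure); the function names are also illustrative.

```python
import math


def total_distance(location, points):
    """Sum of straight-line distances from a location to every point."""
    return sum(math.dist(location, p) for p in points)


def central_feature(points):
    """The existing feature with the smallest total distance to the others."""
    return min(points, key=lambda p: total_distance(p, points))


def median_center(points, iterations=100):
    """Approximate the location minimizing total distance (Weiszfeld)."""
    x = sum(p[0] for p in points) / len(points)  # start at the mean center
    y = sum(p[1] for p in points) / len(points)
    for _ in range(iterations):
        num_x = num_y = denom = 0.0
        for px, py in points:
            d = math.dist((x, y), (px, py))
            if d == 0:
                continue  # crude guard: skip a feature exactly at the estimate
            num_x += px / d
            num_y += py / d
            denom += 1 / d
        x, y = num_x / denom, num_y / denom
    return (x, y)


points = [(0, 0), (4, 0), (0, 4), (10, 10)]
central = central_feature(points)
median = median_center(points)
print(central, median)
```

Here the median center falls at a location where no feature exists, while the central feature has to be one of the four points.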

102
Q

what are the three defining parameters of a standard deviational ellipse

A

the dispersion along the major axis
the dispersion along the minor axis
the angle of rotation

103
Q

what is the difference between absolute and relative frequencies

A

relative frequencies are absolute frequencies divided by the total number of observations
all of them will add up to 1

104
Q

what is the link between observed data and the normal distribution curve

A

the z score

105
Q

population

A

total set of elements under examination in a study

106
Q

sample

A

group of elements actually studied

107
Q

census

A

when an entire population is studied

108
Q

sampling error

A

when uncertainty arises from working with a sample rather than a population

109
Q

sampling bias

A

when the samples used over-represent certain population characteristics

110
Q

central limit theorem

A

if many samples of the same size are taken, the distribution of their sample means will be approximately normal
the mean of the sample means should be the same as the population mean
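
A small simulation can illustrate this. It draws many samples from a uniform population between 0 and 1 (population mean 0.5), which is clearly not normal, and looks at the sample means. The sample size, number of samples, and seed are arbitrary assumptions.

```python
import random
import statistics

rng = random.Random(42)   # fixed seed so the run is repeatable
sample_size = 30
n_samples = 2000

# Take many samples of the same size and keep only each sample's mean.
sample_means = [
    statistics.fmean([rng.random() for _ in range(sample_size)])
    for _ in range(n_samples)
]

mean_of_means = statistics.fmean(sample_means)
spread_of_means = statistics.pstdev(sample_means)

print(mean_of_means, spread_of_means)
```

The mean of the sample means should land very close to 0.5, and their spread should be much narrower than the spread of the population itself.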

111
Q

what is a type 1 error

A

the null hypothesis is true, but we reject it

112
Q

what is a type 2 error

A

the null hypothesis is false, but we do not reject it

113
Q

what type of error is it if the alternative hypothesis is true, but we accept the null

A

type 2

114
Q

what type of error is it if the alternative hypothesis is false, but we reject the null

A

type 1

115
Q

what kind of associations can there be between 2 variables?

A

experimental and correlational

116
Q

experimental correlation

A

we are in charge of one of the variables

117
Q

correlational correlation

A

we simply observe both variables without controlling either

118
Q

what does pearson’s r measure?

A

the strength of a linear relationship between two variables

119
Q

what is the value range for pearson’s r?

A

-1 to 1

120
Q

what would the pearson value be if both x and y increase simultaneously

A

near 1

positive
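
A short sketch of the calculation, using the standard formula (covariation of x and y divided by the square root of the product of their sums of squares). The function name and data are made up.

```python
import math


def pearson_r(x, y):
    """Pearson's r: strength and direction of a linear relationship."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]     # y increases as x increases
y_down = [10, 8, 6, 4, 2]   # y decreases as x increases

print(pearson_r(x, y_up))
print(pearson_r(x, y_down))
```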

121
Q

what are 2 conditions that should be had if using pearson’s r

A

the data should not contain extreme outliers

the variance of x and the variance of y should be roughly equal - homoscedasticity

122
Q

what happens to mean and variation when data is aggregated?

A

variation is minimized, but mean remains constant
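
A tiny sketch of the effect, assuming equal-sized zones built by averaging neighboring pairs of made-up values. With unequal zone sizes the unweighted mean of the zone values would not necessarily stay the same.

```python
import statistics

values = [1, 3, 5, 7, 2, 8]

# Aggregate each pair of neighboring observations into one zone average.
zones = [statistics.fmean(values[i:i + 2]) for i in range(0, len(values), 2)]

mean_before = statistics.fmean(values)
mean_after = statistics.fmean(zones)
variance_before = statistics.pvariance(values)
variance_after = statistics.pvariance(zones)

print(mean_before, mean_after)
print(variance_before, variance_after)
```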

123
Q

problem with MAUP and data aggregation

A

if you aggregate data n/s vs e/w the aggregated results will be different

124
Q

what kind of distance can have multiple shortest routes?

A

manhattan distance

125
Q

is adjacency a binary concept?

A

yes

126
Q

how do you calculate margin of error

A

plus or minus 1/(SQRT(N))
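
The rule of thumb on this card is easy to try for a few sample sizes. The function name is an assumption for this sketch.

```python
import math


def margin_of_error(n):
    """Rule-of-thumb margin of error from the card: 1 / sqrt(N)."""
    return 1 / math.sqrt(n)


for n in (100, 400, 1600):
    print(n, margin_of_error(n))
```

Quadrupling the sample size only halves the margin of error, because the sample size sits under a square root.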

127
Q

what happens to margin of error as the sample size increases

A

it lowers

128
Q

when will a distribution be two tailed

A

when the alternative hypothesis is non-directional (it only says "not equal")

129
Q

what is the empirical rule

A

68% of data lies within 1 stand dev of the mean, 95% within 2, and 99.7% within 3

130
Q

define type 1 and 2 errors simply

A

if ho is true type 1

if ho is false type 2

131
Q

why must we use INVERSE distance weighting

A

if we used the raw data then features with greater distances would have a greater effect on features, but we want them to have less of an effect because they are far away

132
Q

Does a value of .8 mean moran’s I is significant?

A

no, moran’s i indicates the strength of a correlation and significance must be addressed in an entirely different manner

133
Q

explain clusters vs clustering

A

when we say clusters we are referring to a specific cluster of high values (for example counties)
if we speak of clustering we may be discussing the general amount of clusters all throughout Pennsylvania

134
Q

what is the difference between pearson’s r and spearman’s correlation coefficient?

A

pearson’s r refers to a parametric test involving two quantitative variables
spearman’s refers to a non parametric test used for qualitative or ordinal data

135
Q

when is mean not a good measure of center?

A

when the data is not normally distributed, for example skewed left or right

136
Q

give an example of a run in coins

A

having 8 heads in a row would be a run
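
Counting runs in a sequence of flips takes only a few lines. The flip sequence and function names are made up for illustration.

```python
from itertools import groupby


def count_runs(sequence):
    """Number of runs: unbroken stretches of the same outcome."""
    return sum(1 for _ in groupby(sequence))


def longest_run(sequence):
    """Length of the longest unbroken stretch of one outcome."""
    return max(len(list(group)) for _, group in groupby(sequence))


flips = "HHHHHHHHTTHT"
print(count_runs(flips), longest_run(flips))
```

Here the 8 heads at the start form one run, followed by a run of 2 tails and two runs of length 1.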

137
Q

give an example of a join using coins

A

a join would be having a head, then a tail

138
Q

what determines whether a test will be one or two tailed

A

the alternative hypothesis

139
Q

when will you have a two tailed distribution

A

when the alternative hypothesis only states that the observed value does not equal the control, without giving a direction

140
Q

how do you standardize a row

A

divide the weight in question by the sum of the entire row.

basically it’s getting the percentage
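
A minimal sketch of row standardization for a small weights matrix. The matrix and function name are invented; rows with no neighbors are left as zeros, which is an assumption of this sketch.

```python
def row_standardize(weights):
    """Divide every weight by the sum of its row so each row adds up to 1."""
    standardized = []
    for row in weights:
        total = sum(row)
        standardized.append([w / total if total else 0.0 for w in row])
    return standardized


# Feature 0 has two neighbors; features 1 and 2 have one neighbor each.
weights = [
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
]

standardized = row_standardize(weights)
print(standardized)
```

After standardizing, a feature with many neighbors no longer carries more total weight than a feature with few neighbors.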

141
Q

what is factorial ANOVA

A

this is used when you want to measure the effects two or more independent variables have on a dependent variable