Spatial Data and Statistics (1) Flashcards

Question 1

Q

What is R?

Answer

A

R is an integrated suite of software facilities for data manipulation, simulation, calculation, and graphical
display. It handles and analyzes data very effectively and contains a suite of operators for calculations
on arrays and matrices. It also has graphical capabilities for very sophisticated graphs and
data displays. Finally, it is an elegant, object-oriented programming language.
- R is based on the S programming language.
- The R project was started by Robert Gentleman and Ross Ihaka (the name R is derived from their first initials)
in the Statistics Department at the University of Auckland in 1995. The software quickly gained a
widespread audience. It is currently maintained by the R core development team, a hard-working,
international group of volunteer developers. The R project web page is:

Question 2

Q

S Language

Answer

A

The S language, which was written in the mid-1970s, was a product of Bell Labs (of AT&T and now Lucent Technologies)
and was originally a program for the Unix operating system.

Question 3

Q

SPSS stands for

Answer

A

SPSS (Statistical Package for the Social Sciences)

Question 4

Q

Descriptive statistics

Answer

A

The organization and summary of data so that a few values represent the whole dataset.
Whenever the full data are replaced by a summary, there is inevitably some loss of information; descriptive statistics aims to minimize this loss.
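As a minimal sketch of what such a summary looks like, the Python snippet below reduces a small dataset to a handful of descriptive statistics. The snow-depth values are invented for illustration and are not part of the original cards.

```python
import statistics

# Hypothetical snow depths (cm) at seven stations; values are invented.
snow_depths_cm = [12.0, 15.5, 9.0, 30.0, 14.0, 16.5, 15.0]

# Replace the seven raw values with a short summary.
summary = {
    "n": len(snow_depths_cm),
    "mean": statistics.mean(snow_depths_cm),
    "median": statistics.median(snow_depths_cm),
    "stdev": statistics.stdev(snow_depths_cm),  # sample standard deviation
    "min": min(snow_depths_cm),
    "max": max(snow_depths_cm),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The summary hides some detail (for example, which station recorded the 30 cm value), which is the loss of information the card refers to.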

Question 5

Q

statistics

Answer

A

- In everyday use, statistics refers to any collection of numerical data.
- As a discipline, statistics is the methodology for collecting, presenting,
and analyzing data.

Question 6

Q

inferential statistics

Answer

A

applies probability theory to descriptive statistics so that an
investigator can generalize the results of a study of a few individuals to some larger group
- we are usually interested in one or more characteristics of the population
- needed because you cannot measure EVERY element of a population

Question 7

Q

Population Census vs. Sample

Answer

A

• population census: a complete tabulation of a particular population
characteristic for all elements in the population
• sample: a subset of the elements in the population which is used to make
inferences about certain characteristics of the population as a whole
• for practical considerations, usually time and/or cost, it is convenient to sample
rather than enumerate the entire population

Question 8

Q

since a population characteristic is likely to take on different values for different
elements of the population, it is usually called a ______

Answer

A

variable

- a characteristic that varies in space and time

Question 9

Q

sampling error

nonsampling or data acquisition errors:

Answer

A

sampling error: the difference between the value of a population characteristic and
the value of that characteristic inferred from the sample
- e.g., outliers that are not included in the sample can cause this error

nonsampling or data acquisition errors: errors that arise in the acquisition, recording,
and editing of statistical data
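The definition of sampling error can be illustrated with a short Python sketch. The population here is simulated (commute distances drawn from an assumed uniform distribution), so the numbers are hypothetical; in practice the population value is usually unknown, which is why the error has to be estimated rather than computed directly.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: commute distances (km) for 10,000 residents.
population = [rng.uniform(0.0, 40.0) for _ in range(10_000)]
population_mean = statistics.mean(population)

# Simple random sample of 50 residents, drawn without replacement.
sample = rng.sample(population, 50)
sample_mean = statistics.mean(sample)

# Sampling error: value inferred from the sample minus the population value.
sampling_error = sample_mean - population_mean

print(f"population mean: {population_mean:.2f} km")
print(f"sample mean:     {sample_mean:.2f} km")
print(f"sampling error:  {sampling_error:+.2f} km")
```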

Question 10

Q

probability theory

Answer

A

the link between the sample and the population

- a way of understanding the errors that arise between sample and population
- small samples tend to be less accurate/representative
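The claim about small samples can be checked with a small simulation. The sketch below assumes a hypothetical, normally distributed population and compares how much the sample mean varies from one sample to the next for two sample sizes; the function name and parameters are illustrative only.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population of 20,000 measurements.
population = [rng.gauss(100.0, 15.0) for _ in range(20_000)]

def spread_of_sample_means(sample_size, repeats=300):
    """Standard deviation of the sample mean across repeated random samples."""
    means = [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(repeats)
    ]
    return statistics.stdev(means)

spread_small = spread_of_sample_means(5)
spread_large = spread_of_sample_means(500)

print(f"spread of means, n=5:   {spread_small:.2f}")
print(f"spread of means, n=500: {spread_large:.2f}")
```

Under these assumptions the means of the small samples scatter much more widely around the population mean than the means of the large samples.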

Question 11

Q

beginning in 1950, a new paradigm based on the scientific method and ______ _____
began to dominate geographical thought

Answer

A

logical positivism

statistical analyses are relatively new to geography, starting in the mid-20th century

-the only meaningful problems are those that can
be answered by logical analysis

Question 12

Q

Scientific Method: 6 steps

Answer

A

Concepts
Description
Hypothesis
Model
Theory
Law

Question 13

Q

exploratory methods

Answer

A

analyses used to suggest a
hypothesis
o we might look at a map of snow depths, and
see that snow is deepest near lakes
o usually based on visual or descriptive analyses
(e.g., GIS, preliminary data)

- the idea is to create a figure and identify patterns in it

Question 14

Q

confirmatory methods:

Answer

A

analyses used to confirm
a hypothesis
o a statistical method is used to test if the
snowfall patterns arise purely by chance, or if
there is a cause-and-effect process at work

- the hypothesis is identified first, and then the data are collected

- confirmatory methods rarely confirm or refute a hypothesis outright, but they are useful to
structure our understanding of the processes in question

Question 15

Q

Experimental Probability vs Assumed Probability

Answer

A

As the number of trials increases, the experimental probability will most likely come closer and closer to the assumed probability
- e.g., a coin flip is assumed to be 50/50, but a small number of flips may not turn out that way
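A quick simulation makes the coin-flip example concrete. This is a sketch using Python's pseudo-random generator as a stand-in for a fair coin; the function name is illustrative.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable
assumed_probability = 0.5  # assumed probability of heads for a fair coin

def experimental_probability(num_flips):
    """Fraction of simulated flips that come up heads."""
    heads = sum(1 for _ in range(num_flips) if rng.random() < assumed_probability)
    return heads / num_flips

results = {n: experimental_probability(n) for n in (10, 100, 10_000, 100_000)}

for n, p in results.items():
    print(f"{n:>7} flips: experimental probability of heads = {p:.4f}")
```

The approach toward 0.5 is a tendency, not a guarantee at every step: a run of 100 flips can happen to land further from 0.5 than a run of 10.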

Question 16

Q

probability

Answer

A

o probability is a measure of certainty, or uncertainty
o something with a probability of 1 is certain to occur
o something with a probability of 0 is certain not to occur

Question 17

Q

statistics is not merely plugging numbers into equations and presenting the results –
a lot of consideration must be given to what the problem is and how can data and
statistical methods be used to help solve the problem (5)

Answer

A

consider what data are relevant to the problem
consider how the relevant data can be obtained
explain the basis for all assumptions
lay out the arguments on all sides of the issue
formulate questions that can be addressed by statistical methods

Question 18

Q

there are 4 primary types of data; each is different, and each requires a specific
statistical treatment

Answer

A

1. Nominal/Categorical Data (Lowest Form)
- no way to order the values on a numerical basis
- hard to apply statistical analysis
2. Ordinal Data
- the values can be ranked in order
* types 1 and 2 are both categorical data; nominal and ordinal data are typically analyzed with
nonparametric statistical techniques, which are designed to work with categorical data
3. Interval Data
- continuous data that has regular, interpretable differences between
values
- the zero point is arbitrary: 0 degrees Celsius does not mean there is no temperature
- calendar time (e.g., dates) is interval data
* the Fahrenheit scale was originally calibrated in part against human body temperature
4. Ratio Data (Highest Form)
- continuous data that has regular, interpretable differences between values, and the
zero value is natural
- 0 means absence of the phenomenon
- the Celsius (interval) scale can be converted to Kelvin (ratio)
- e.g., a location 10 km away is twice as far as something 5 km away
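The difference between interval and ratio data shows up when you take ratios. The sketch below uses the standard Celsius-to-Kelvin offset of 273.15; the helper name is illustrative.

```python
def celsius_to_kelvin(temp_c):
    """Convert an interval-scale Celsius value to the ratio-scale Kelvin value."""
    return temp_c + 273.15

warm_c, cool_c = 20.0, 10.0

# Dividing Celsius values is misleading because 0 degrees C is an arbitrary zero.
naive_ratio = warm_c / cool_c

# The same two temperatures on the Kelvin scale, where 0 means no thermal energy.
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

# Distance is ratio data, so its ratio is directly meaningful.
distance_ratio = 10.0 / 5.0

print(f"20 C / 10 C (not meaningful): {naive_ratio:.2f}")
print(f"293.15 K / 283.15 K:          {kelvin_ratio:.4f}")
print(f"10 km / 5 km:                 {distance_ratio:.1f}")
```

So 20 degrees Celsius is not "twice as warm" as 10 degrees Celsius, while 10 km really is twice as far as 5 km.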

Question 19

Q

Discrete Vs. Continuous Data

Answer

A

o discrete data is usually represented in whole numbers, contains a finite number of
values, and/or fits into categories
o e.g., how many trips to a doctor does a patient make? (they can't take ½ a trip, or
0.166 of a trip)

o continuous data can be measured on a continuum or scale, and there can be an infinite
number of values
o e.g., the elevation of Mt. Everest is 8844.43 m (the number of decimals is limited only by
the precision of the measurement device)

Question 20

Q

spatial data

Answer

A

spatial data implies that you are interested in understanding the ways that things
change in space or over distances

Question 21

Q

Problems to consider when using spatial data (4)

Answer

A

1. the modifiable areal unit problem: “a problem arising from the imposition of artificial
units of spatial reporting on continuous geographic phenomena resulting in the
generation of artificial spatial patterns” (Heywood, 1998)
- depending on how you divide a region into reporting units, you can change (or manipulate) the results
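A toy example shows how the choice of reporting units alone can create or hide a pattern. The values and zone boundaries below are invented; the sketch aggregates the same eight observations along a line using two different zonings.

```python
def zone_means(values, boundaries):
    """Mean of the values in each zone.

    boundaries holds the start index of every zone after the first.
    """
    edges = [0, *boundaries, len(values)]
    return [sum(values[a:b]) / (b - a) for a, b in zip(edges, edges[1:])]

# Hypothetical measurements at eight evenly spaced locations along a transect.
values = [10, 10, 50, 50, 50, 50, 10, 10]

two_zones = zone_means(values, [4])       # split down the middle
three_zones = zone_means(values, [2, 6])  # edges and centre reported separately

print("two zones:  ", two_zones)
print("three zones:", three_zones)
```

With two zones both units report a mean of 30 and the region looks uniform; with three zones the same data report 10, 50, 10 and a strong central peak appears.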

2. boundary problems: when you establish a sample location, you include everything
within the boundary and exclude everything outside the boundary
- but sometimes the things outside the boundary have an influence on things
inside the boundary
- the size and/or shape of a sample area can strongly affect the importance of
out-of-bounds influences

3. spatial sampling procedures: how do you collect an accurate sample of a population
when the characteristics of the population are unknown?

o if you were studying transportation habits in communities across Canada, you would
get very different results if you only looked at cities as opposed to rural
communities
o a sampling strategy must be determined before sampling in order to get a complete
and representative sample
o there are several types of sampling strategies (random, stratified, systematic,
etc.), each with its own advantages and disadvantages
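The three named strategies can be sketched in a few lines of Python. The sampling frame below (100 communities, 20 urban and 80 rural) is hypothetical, and proportional allocation is only one of several ways to stratify.

```python
import random

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical sampling frame: 100 communities, 20 urban and 80 rural.
communities = [
    {"id": i, "type": "urban" if i < 20 else "rural"} for i in range(100)
]
sample_size = 10

# Random: every community has the same chance of selection.
random_sample = rng.sample(communities, sample_size)

# Systematic: a random start, then every k-th community in the frame.
step = len(communities) // sample_size
start = rng.randrange(step)
systematic_sample = communities[start::step]

# Stratified: sample each stratum in proportion to its share of the frame.
stratified_sample = []
for stratum in ("urban", "rural"):
    members = [c for c in communities if c["type"] == stratum]
    quota = round(sample_size * len(members) / len(communities))
    stratified_sample.extend(rng.sample(members, quota))

for name, chosen in [("random", random_sample),
                     ("systematic", systematic_sample),
                     ("stratified", stratified_sample)]:
    urban = sum(1 for c in chosen if c["type"] == "urban")
    print(f"{name:>10}: {urban} urban, {len(chosen) - urban} rural")
```

Only the stratified draw guarantees that urban and rural communities appear in their frame proportions; the other two can over- or under-represent either group by chance.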

4. spatial autocorrelation: things that are close together can influence each other and
behave in similar ways
- this violates one of the fundamental assumptions of statistical analysis: individual
entities must be independent of each other
- spatial autocorrelation is likely to exist in every geographic data set available; it is
the way in which you deal with it that is important

Tobler's first law of geography: everything is related to everything
else, but near things are more
related than distant things
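One commonly used measure of spatial autocorrelation is Moran's I, which is not named on the card and is used here only as an illustration. The sketch below computes it for values along a line, assuming the simplest possible neighbourhood definition (each location is a neighbour of the locations immediately beside it).

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a square spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    squared = sum(d * d for d in dev)
    return (n / total_weight) * (cross / squared)

def transect_weights(n):
    """Weight 1 for immediately adjacent locations on a line, 0 otherwise."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]

weights = transect_weights(5)

smooth = morans_i([1, 2, 3, 4, 5], weights)       # neighbours are similar
alternating = morans_i([1, 5, 1, 5, 1], weights)  # neighbours are dissimilar

print(f"smooth gradient:     I = {smooth:.2f}")
print(f"alternating pattern: I = {alternating:.2f}")
```

Positive values indicate that near things resemble each other (here 0.50 for the gradient), while negative values indicate that neighbours tend to differ (here -1.00 for the alternating pattern).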

(21 cards)