Spatial Data and Statistics (1) Flashcards

1
Q

What is R?

A

R is an integrated suite of software facilities for data manipulation, simulation, calculation, and graphical
display. It handles and analyzes data very effectively, and it contains a suite of operators for calculations
on arrays and matrices. In addition, it has graphical capabilities for very sophisticated graphs and
data displays. Finally, it is an elegant, object-oriented programming language.
- R is based on the S programming language
- The R project was started in 1995 by Robert Gentleman and Ross Ihaka (whose first initials give R its name)
of the Statistics Department at the University of Auckland. The software has quickly gained a
widespread audience. It is currently maintained by the R core development team – a hard-working,
international group of volunteer developers. The R project web page is:

2
Q

S Language

A

The S language, which was written in the mid-1970s, was a product of Bell Labs (of AT&T and now Lucent Technologies)
and was originally a program for the Unix operating system.

3
Q

SPSS stands for

A

SPSS (Statistical Package for the Social Sciences)

4
Q

Descriptive statistics

A
  • organization and summary of data so that a few values represent the whole dataset
  • whenever the raw data are replaced by a summary, there is inevitably some loss of information; descriptive statistics aims to minimize this loss
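
As a small illustration of that trade-off, the sketch below reduces a list of snow depths to a handful of summary numbers using Python's standard `statistics` module. The depths and the variable names are made up for illustration only.

```python
import statistics

# Hypothetical snow depths (cm) at ten stations; the values are invented.
snow_depths = [12, 15, 9, 30, 14, 11, 13, 45, 10, 16]

# Replace the ten raw values with a few descriptive statistics.
summary = {
    "n": len(snow_depths),
    "mean": statistics.mean(snow_depths),
    "median": statistics.median(snow_depths),
    "stdev": statistics.stdev(snow_depths),  # sample standard deviation
    "min": min(snow_depths),
    "max": max(snow_depths),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Here the mean (17.5) sits above the median (13.5) because two deep readings pull it upward. That detail is lost if only one of the two numbers is reported, which is the loss of information the card describes.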
5
Q

statistics

A

statistics refers to any collection of numerical data
- it is also the methodology for collecting, presenting, and analyzing data

6
Q

inferential statistics

A

applies probability theory to descriptive statistics so that an
investigator can generalize the results of a study of a few individuals to some larger group
- we are usually interested in one or more characteristics of the population
- needed because you usually can't measure EVERY element of a population

7
Q

Population Census vs. Sample

A

• population census: a complete tabulation of a particular population
characteristic for all elements in the population
• sample: a subset of the elements in the population which is used to make
inferences about certain characteristics of the population as a whole
• for practical considerations, usually time and/or cost, it is convenient to sample
rather than enumerate the entire population

8
Q

since a population characteristic is likely to take on different values for different
elements of the population, it is usually called a ______

A

variable

- a characteristic that varies in space and time

9
Q

sampling error

nonsampling or data acquisition errors:

A

sampling error: the difference between the value of a population characteristic and
the value of that characteristic inferred from the sample
- e.g., outliers that are not included in the sample can cause error

nonsampling or data acquisition errors: errors that arise in the acquisition, recording,
and editing of statistical data
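
A minimal sketch of sampling error, assuming a made-up population of household sizes: it compares the population mean with the mean of one random sample and reports the difference. The function name `sampling_error`, the seeds, and the data are all hypothetical choices for illustration.

```python
import random
import statistics

def sampling_error(population, sample_size, seed=0):
    """Return (population mean, sample mean, sample mean - population mean)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sample = rng.sample(population, sample_size)  # sample without replacement
    pop_mean = statistics.mean(population)
    sample_mean = statistics.mean(sample)
    return pop_mean, sample_mean, sample_mean - pop_mean

# Hypothetical population: sizes of 1,000 households (invented values).
rng = random.Random(42)
population = [rng.randint(1, 6) for _ in range(1000)]

pop_mean, sample_mean, error = sampling_error(population, sample_size=25)
print(f"population mean = {pop_mean:.3f}")
print(f"sample mean     = {sample_mean:.3f}")
print(f"sampling error  = {error:+.3f}")

# A census measures every element, so it has no sampling error.
_, _, census_error = sampling_error(population, sample_size=len(population))
print(f"census error    = {census_error}")
```

Note that a census can still contain nonsampling errors (for example, a value typed in wrongly); those are not something this sketch models.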

10
Q

probability theory

A

the link between the sample and the population

  • way of understanding the errors involved between sample and population
  • small samples tend to be less accurate/representative
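
The point about small samples can be checked with a short simulation. The sketch below draws many samples of two different sizes from an invented population and measures how much the sample mean varies from one sample to the next; the population, seeds, and the helper name `spread_of_sample_means` are assumptions made for the example.

```python
import random
import statistics

def spread_of_sample_means(population, sample_size, repeats=500, seed=1):
    """Standard deviation of the sample mean across many repeated samples."""
    rng = random.Random(seed)
    means = [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(repeats)
    ]
    return statistics.stdev(means)

# Hypothetical population of 5,000 measurements (invented values).
rng = random.Random(7)
population = [rng.gauss(50, 10) for _ in range(5000)]

spread_small = spread_of_sample_means(population, sample_size=5)
spread_large = spread_of_sample_means(population, sample_size=100)

print(f"spread of sample means, n=5:   {spread_small:.2f}")
print(f"spread of sample means, n=100: {spread_large:.2f}")
```

The means of the small samples scatter much more widely around the population mean than the means of the large samples, which is what "less accurate/representative" amounts to in practice.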
11
Q

beginning in 1950, a new paradigm based on the scientific method and ______ _____ ,
began to dominate geographical thought

A

logical positivism

- statistical analyses are relatively new to geography, starting in the mid-20th century
- logical positivism holds that the only meaningful problems are those that can
be answered by logical analysis

12
Q

Scientific Method: 6 steps

A
  1. Concepts
  2. Description
  3. Hypothesis
  4. Model
  5. Theory
  6. Law
13
Q

exploratory methods

A

analyses used to suggest a hypothesis
• we might look at a map of snow depths and see that snow is deepest near lakes
• usually based on visual or descriptive analyses (e.g., GIS, preliminary data)
• the idea is to create a figure and identify patterns in it

14
Q

confirmatory methods:

A

analyses used to confirm a hypothesis
• a statistical method is used to test whether the snowfall patterns arise purely by chance, or whether
there is a cause-and-effect process at work
• the hypothesis is identified first, and the data are collected afterwards
• confirmatory methods rarely confirm or refute a hypothesis outright, but they are useful for
structuring our understanding of the processes in question

15
Q

Experimental Probability vs Assumed Probability

A

As the number of trials increases, the experimental probability will most likely come closer and closer to the assumed probability
- e.g., a coin flip is assumed to be 50/50, but a small number of flips may not turn out that way
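
A minimal simulation of the coin-flip example, assuming a fair coin modelled with Python's `random` module. The function name `experimental_probability` and the seed are illustrative choices.

```python
import random

ASSUMED_PROBABILITY = 0.5  # assumed probability of heads for a fair coin

def experimental_probability(flips, seed=0):
    """Fraction of heads observed in `flips` simulated coin tosses."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < ASSUMED_PROBABILITY for _ in range(flips))
    return heads / flips

for n in (10, 100, 1_000, 10_000, 100_000):
    p = experimental_probability(n)
    print(f"{n:>7} flips: experimental = {p:.4f}, "
          f"gap from assumed = {abs(p - ASSUMED_PROBABILITY):.4f}")
```

With only 10 flips the observed fraction can be far from 0.5; with 100,000 flips it is typically within a fraction of a percent. The gap does not have to shrink at every step, only in the long run.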

16
Q

probability

A

• probability is a measure of how certain or uncertain an event is, on a scale from 0 to 1
• something with a probability of 1 is certain to occur
• something with a probability of 0 is certain not to occur

17
Q

statistics is not merely plugging numbers into equations and presenting the results;
a lot of consideration must be given to what the problem is and how data and
statistical methods can be used to help solve it (5)

A
  1. consider what data are relevant to the problem
  2. consider how the relevant data can be obtained
  3. explain the basis for all assumptions
  4. lay out the arguments on all sides of the issue
  5. formulate questions that can be addressed by statistical methods
18
Q

there are 4 primary types of data; each is different, and each requires a specific
statistical treatment

A
  1. Nominal/Categorical Data (Lowest Form)
    - no way to order the values on a numerical basis
    - hard to apply statistical analysis
  2. Ordinal Data
    - the values can be ranked in order
    * 1 and 2 are both categorical-type data
    - nominal and ordinal data are typically analyzed with nonparametric
    statistical techniques, which are designed to
    work with categorical data
  3. Interval Data
    - continuous data that has regular, interpretable differences between
    values
    - the zero point is arbitrary: 0 degrees Celsius does not mean there is no temperature
    - time data (dates and clock times) is interval
    * Fahrenheit is also an interval scale; its reference points (historically including human body temperature) are arbitrary
  4. Ratio Data (Highest Form)
    - continuous data that has regular, interpretable differences between values, and the
    zero value is natural
    - 0 means absence of the phenomenon
    - Celsius (interval) values can be converted to Kelvin (ratio)
    - e.g., a location 10 km away is twice as far as something 5 km away
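
The difference between interval and ratio scales shows up as soon as you take a ratio. The sketch below, using two invented temperatures, shows that differences agree on the Celsius and Kelvin scales while ratios only make sense in Kelvin. The function name `celsius_to_kelvin` is just an illustrative helper.

```python
def celsius_to_kelvin(celsius):
    """Convert an interval-scale Celsius value to the ratio-scale Kelvin."""
    return celsius + 273.15

cool_c, warm_c = 10.0, 20.0  # hypothetical temperatures in degrees Celsius

# Differences are interpretable on both scales.
diff_c = warm_c - cool_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are only meaningful when zero means "none of the phenomenon".
naive_ratio = warm_c / cool_c  # 2.0, but 20 C is not "twice as hot" as 10 C
true_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(f"difference: {diff_c:.2f} C = {diff_k:.2f} K")
print(f"ratio in Celsius: {naive_ratio:.3f}")
print(f"ratio in Kelvin:  {true_ratio:.3f}")
```

In Kelvin the warmer value is only about 3.5% larger, not double. Distance has a natural zero, so no such conversion is needed there: 10 km really is twice 5 km.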
19
Q

Discrete Vs. Continuous Data

A

• discrete data is usually represented in whole numbers, contains a finite number of
values, and/or fits into categories
  - e.g., how many trips to a doctor does a patient make? (they can’t take ½ a trip, or
  0.166 of a trip)

• continuous data can be measured on a continuum or scale, and there can be an infinite
number of values
  - e.g., the elevation of Mt. Everest is 8844.43 m (the number of decimals is limited only by
  the precision of the measurement device)

20
Q

spatial data

A

spatial data implies that you are interested in understanding the ways that things
change in space or over distances

21
Q

Problems to consider when using spatial data (4)

A
  1. the modifiable areal unit problem: “a problem arising from the imposition of artificial
    units of spatial reporting on continuous geographic phenomena resulting in the
    generation of artificial spatial patterns” (Heywood, 1998)
    - depending on how you draw the units used to sample a region, you can change (or manipulate) the results
  2. boundary problems: when you establish a sample location, you include everything
    within the boundary and exclude everything outside the boundary
    - but sometimes the things outside the boundary have an influence on things
    inside the boundary
    - the size and/or shape of a sample area can strongly affect the importance of
    out-of-bounds influences
  3. spatial sampling procedures: how do you collect an accurate sample of a population
    when the characteristics of the population are unknown?
    - if you were studying transportation habits in communities across Canada, you would
    get very different results if you only looked at cities as opposed to rural
    communities
    - a sampling strategy must be determined before sampling in order to get a complete
    and representative sample
    - there are several types of sampling strategies (random, stratified, systematic,
    etc.), each with its own advantages and disadvantages
  4. spatial autocorrelation: things that are close together can influence each other and
    behave in similar ways
    - this violates a fundamental assumption of many statistical analyses: individual
    entities must be independent of each other
    - spatial autocorrelation is likely to exist in every geographic data set available; it is
    the way in which you deal with it that is important
    - Tobler's first law of geography: “everything is related to everything
    else, but near things are more related than distant things”
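
One common way to put a number on spatial autocorrelation is global Moran's I. The card does not name this statistic, so treat the sketch below as an illustrative choice rather than the method the course uses. It assumes sites along a single transect where only immediate neighbours are connected (binary weights); the data and the helper names `morans_i` and `chain_weights` are invented for the example.

```python
def morans_i(values, weights):
    """Global Moran's I for `values` given a square spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    # Cross-products of deviations for every pair of connected sites.
    numerator = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    denominator = sum(d * d for d in dev)
    return (n / w_total) * numerator / denominator

def chain_weights(n):
    """Binary weights for sites along a line: 1 if adjacent, else 0."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]

w = chain_weights(8)
clustered = [1, 2, 3, 4, 5, 6, 7, 8]    # neighbours have similar values
alternating = [1, 8, 1, 8, 1, 8, 1, 8]  # neighbours have opposite values

print(f"clustered:   I = {morans_i(clustered, w):.3f}")
print(f"alternating: I = {morans_i(alternating, w):.3f}")
```

The smoothly changing transect gives a clearly positive value (about 0.71), meaning near things are alike, while the alternating transect gives -1.0. Values near zero would suggest no spatial pattern, which is the independence that many standard tests assume.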