Spatial Data and Statistics (1) Flashcards
What is R?
R is an integrated suite of software facilities for data manipulation, simulation, calculation, and graphical
display. It handles and analyzes data effectively and contains a suite of operators for calculations
on arrays and matrices. It can also produce sophisticated graphs and data displays. Finally, it is an
elegant, object-oriented programming language.
- based on the S programming language
-The R project was started by Robert Gentleman and Ross Ihaka (the name R comes from their first initials)
of the Statistics Department at the University of Auckland in 1995. The software quickly gained a
widespread audience. It is currently maintained by the R core development team, a hard-working,
international group of volunteer developers. The R project web page is:
S Language
The S language, developed in the mid-1970s, was a product of Bell Labs (of AT&T, later Lucent Technologies)
and was originally a program for the Unix operating system.
SPSS stands for
SPSS (Statistical Package for the Social Sciences)
Descriptive statistics
- the organization and summary of data so that a few values represent the whole dataset
- whenever such a summary replaces the full data, some information is inevitably lost; descriptive statistics aims to minimize this loss
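A minimal Python sketch of descriptive statistics follows. It reduces ten hypothetical snow-depth readings (made up for illustration) to six summary values. The individual readings cannot be recovered from those six numbers, which is the "loss of information" described above. The helper name `describe` is hypothetical. `statistics.mean`, `statistics.median` and `statistics.stdev` are from Python's standard library.

```python
import statistics

# Hypothetical snow-depth readings (cm) at ten stations; illustrative only.
depths = [12.0, 15.5, 9.0, 22.0, 18.5, 11.0, 30.0, 14.0, 16.5, 13.0]

def describe(values):
    """Reduce a list of numbers to a few summary statistics (hypothetical helper)."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }

summary = describe(depths)
for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```

The mean (16.15) sits above the median (14.75) because of the single 30 cm reading. Reporting both helps limit how much the summary hides.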
statistics
statistics refers to any collection of numerical data
- also the methodology for collecting, presenting, and analyzing data
inferential statistics
applies probability theory to descriptive statistics so that an investigator can generalize the results of a study of a few individuals to some larger group
-we are usually interested in one or more characteristics of the population
-needed because you usually can't measure every element of the population
Population Census vs. Sample
• population census: a complete tabulation of a particular population characteristic for all elements in the population
• sample: a subset of the elements in the population which is used to make inferences about certain characteristics of the population as a whole
• for practical considerations, usually time and/or cost, it is convenient to sample rather than enumerate the entire population
since a population characteristic is likely to take on different values for different elements of the population, it is usually called a ______
variable
-a characteristic that varies in space and time
sampling error
the difference between the value of a population characteristic and the value of that characteristic inferred from the sample
-e.g., outliers not included in the sample can cause this error
nonsampling or data acquisition errors
errors that arise in the acquisition, recording, and editing of statistical data
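The Python sketch below makes sampling error concrete under stated assumptions. The "population" is 1,000 hypothetical rainfall totals drawn from a normal distribution. Its mean plays the role of the census value, and a random sample of 30 gives the inferred value. The population size, sample size and distribution are all illustrative choices, not course data. The gap between the two means is the sampling error. Nonsampling errors such as recording mistakes are not modelled here.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population: 1,000 annual rainfall totals (mm); illustrative only.
population = [random.gauss(800, 150) for _ in range(1000)]

# Census: the true population characteristic (here, the mean).
population_mean = statistics.mean(population)

# Sample: 30 elements drawn at random without replacement.
sample = random.sample(population, 30)
sample_mean = statistics.mean(sample)

# Sampling error: inferred value minus true value.
sampling_error = sample_mean - population_mean
print(f"population mean: {population_mean:.1f}")
print(f"sample mean:     {sample_mean:.1f}")
print(f"sampling error:  {sampling_error:+.1f}")
```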
probability theory
the link between the sample and the population
- a way of understanding the errors involved in generalizing from a sample to the population
- small samples tend to be less accurate/representative
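One way to see why small samples tend to be less accurate is to repeat the sampling many times at several sample sizes. The Python simulation below does this and averages the absolute sampling error at each size. It assumes a hypothetical normally distributed population of 10,000 values, and `mean_abs_error` is a hypothetical helper name. The average error should shrink roughly in proportion to 1/√n, which matches the standard error of the mean, σ/√n. Individual runs will vary slightly.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000 values; illustrative only.
population = [random.gauss(50, 10) for _ in range(10_000)]
true_mean = statistics.mean(population)

def mean_abs_error(n, trials=500):
    """Average |sample mean - population mean| over repeated samples of size n."""
    errors = []
    for _ in range(trials):
        sample = random.sample(population, n)
        errors.append(abs(statistics.mean(sample) - true_mean))
    return statistics.mean(errors)

for n in (5, 20, 80, 320):
    print(f"n = {n:>3}: mean absolute sampling error = {mean_abs_error(n):.2f}")
```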
beginning in 1950, a new paradigm based on the scientific method and ______ ______ began to dominate geographical thought
logical positivism
statistical analyses are relatively new to geography, starting in the mid-20th century
-the only meaningful problems are those that can be answered by logical analysis
Scientific Method: 6 steps
- Concepts
- Description
- Hypothesis
- Model
- Theory
- Law
exploratory methods
analyses used to suggest a hypothesis
o we might look at a map of snow depths and see that snow is deepest near lakes
o usually based on visual or descriptive analyses (e.g., GIS maps, preliminary data)
-the idea is to create a figure and identify patterns
confirmatory methods
analyses used to confirm a hypothesis
o a statistical method is used to test whether the snowfall patterns arise purely by chance or whether a cause-and-effect process is at work
-the hypothesis is identified first, and then data are collected
-confirmatory methods rarely confirm or refute a hypothesis outright, but they are useful for structuring our understanding of the processes in question
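The snowfall question above can be tested with a permutation test, shown here as a minimal Python sketch. This is one common confirmatory method, chosen for illustration. The course may instead use a different test, such as a t-test. The depths for "near lake" and "far from lake" stations are hypothetical values, and `permutation_test` is a hypothetical helper. The test shuffles the location labels many times. A small share of shuffles producing a difference at least as large as the observed one suggests the pattern is unlikely to be pure chance. It does not prove cause and effect.

```python
import random

# Hypothetical snow depths (cm); illustrative only, not real observations.
near_lake = [48, 52, 55, 61, 47, 58, 50, 63]
far_from_lake = [41, 39, 45, 44, 50, 38, 42, 46]

def mean(values):
    return sum(values) / len(values)

def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in means.

    Returns the observed difference and the share of random relabellings
    whose difference is at least as extreme (an estimated p-value).
    """
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # break any link between depth and location
        diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_permutations

observed, p_value = permutation_test(near_lake, far_from_lake)
print(f"observed difference: {observed:.3f} cm, estimated p-value: {p_value:.4f}")
```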
Experimental Probability vs Assumed Probability
Experimental probability tends to come closer and closer to the assumed probability as the number of trials increases
-ex: a coin flip is assumed to be 50/50, but a small number of flips may not turn out that way
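The coin-flip idea can be simulated with the Python sketch below. It assumes a fair coin, modelled as `rng.random() < 0.5`, and uses an arbitrary seed. The helper `proportion_heads` is a hypothetical name. The printed gap between the experimental and assumed probability usually shrinks as the number of flips grows, though it is not guaranteed to shrink at every step.

```python
import random

ASSUMED = 0.5  # assumed (theoretical) probability of heads for a fair coin

def proportion_heads(n_flips, rng):
    """Flip a simulated fair coin n_flips times and return the share of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

rng = random.Random(2024)  # seeded so the run is repeatable

for n in (10, 100, 1_000, 10_000, 100_000):
    p = proportion_heads(n, rng)
    print(f"{n:>7} flips: experimental P(heads) = {p:.4f}, gap = {abs(p - ASSUMED):.4f}")
```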