W06 - Database Concepts and Data Sources Flashcards
how is spatial and attribute data used with GIS?
spatial data relate to the geometries of spatial features
attribute data describe the characteristics of the spatial features
how does the georelational data model (eg. a coverage) store spatial and attribute data?
stores them separately and links the two by the feature ID. the 2 datasets are synchronized so they can be queried, analyzed, and displayed in unison
how does the object-based data model (eg. a geodatabase) store spatial and attribute data?
combines both geometries and attributes in a single system. each spatial feature has a unique object ID and an attribute to store its geometry
how does the raster data model work?
cell value corresponds to the value of a continuous feature at the cell location
the value attribute table summarizes cell values and their frequencies in the raster.
how is attribute data stored?
in tables, organized by rows (record) and columns (field).
what are the 2 types of attribute tables in GIS?
feature attribute tables and tables of nonspatial data
what is a feature attribute table?
an attribute table that has access to the geometries of features.
every vector data set must have a feature attribute table
in the georelational data model, the feature attribute table uses the feature ID to link to the feature’s geometry
in the object-based data model, the feature attribute table has a field that stores the feature’s geometry
have default fields that summarize the feature geometries (ex. length for line features and area & perimeter for polygon features)
what are tables of non-spatial data?
these tables do not have direct access to the feature geometry but have a field linking the table to the feature attribute table.
ex. delimited text files, dBASE files, excel files, access files, other db files from SQL, oracle, etc.
what is a database management system (DBMS)?
software package that lets us build and manipulate a database.
provides tools for data input, search, retrieval, manipulation and output
ArcGIS for Desktop uses Access for managing personal geodatabases
how is the geodatabase implemented?
implemented in a relational database management system and stores both geometries and attributes in a single database
what is a client-server distributed database system?
a client sends a request to the server, retrieves data from the server, and processes the data on the local computer
what are methods of classifying attribute data?
by data type, by measurement scale
what are the different data types?
determines how an attribute is stored, typically included in the metadata of geospatial data
ex. number, text (string), date, binary large object (BLOB)
how can numbers (data type) be stored?
integers (no decimal digits), float/floating point
integers can be short or long.
float can be single precision or double precision
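A rough sketch of what these storage choices mean in practice, using Python's standard struct module. The byte widths (2-byte short integer, 4-byte long integer, 4-byte single precision, 8-byte double precision) are assumptions that match common GIS file formats; these notes do not state them.

```python
import struct

# Assumed widths: short integer = 2 bytes, long integer = 4 bytes.
SHORT_MAX = 2 ** 15 - 1   # largest signed 2-byte integer
LONG_MAX = 2 ** 31 - 1    # largest signed 4-byte integer

def fits_short(value: int) -> bool:
    """True if the value can be stored in a signed 2-byte integer field."""
    return -SHORT_MAX - 1 <= value <= SHORT_MAX

def as_single(value: float) -> float:
    """Round-trip a value through 4-byte (single precision) storage."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

def as_double(value: float) -> float:
    """Round-trip a value through 8-byte (double precision) storage."""
    return struct.unpack("<d", struct.pack("<d", value))[0]

elevation = 1234.56789           # hypothetical attribute value
single_copy = as_single(elevation)
double_copy = as_double(elevation)
```

The single precision copy keeps only about 7 significant digits, so it differs slightly from the original value, while the double precision copy comes back unchanged.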
what do BLOBs store?
store images, multimedia and feature geometries as long sequences of binary numbers
what are the ways to classify data by measurement scale?
nominal, ordinal, interval, and ratio data
what is nominal data
different kinds / categories of data, such as land-use types or soil types
what is ordinal data
differentiates data by a ranking relationship
what is interval data
have known intervals between values (ex. 60F vs 70F differ by 10F)
what is ratio data
same as interval data but ratio data are based on a meaningful zero value (ex. population densities)
categorical data
includes nominal and ordinal scales
numerical data
includes interval and ratio scales
what are the types of database designs?
- flat file
- hierarchical
- network
- relational
what is a flat file?
stores all data in a large table (ex. spreadsheet)
what is a hierarchical database?
organizes its data at different levels and uses only one-to-many associations between levels (ex. zoning > parcel > owner)
what is a network database?
builds connections across tables
what is a common problem with hierarchical and network databases?
the linkages between tables must be known in advance and built into the database at design time. could make the database complicated and inflexible
what is a relational database?
collection of tables (or relations) that can be connected to each other by keys
what is a primary key?
represents one or more attributes whose values can uniquely identify a record in a table
cannot be null and should never change
what is a foreign key?
one or more attributes that refer to a primary key in another table
common field
primary and foreign key with the same name
what are the benefits of a relational database?
simple and flexible
each table in the database can be prepared, maintained and edited separately from the other tables
tables can remain separate until a query or analysis requires attribute data from different tables to be linked together (efficient for data management and data processing)
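The key rules above can be tried out with Python's built-in sqlite3 module. This is only a sketch: the zoning and parcel tables, their fields, and the sample values are hypothetical, and SQLite stands in for whatever relational database a GIS actually uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked

# zone_code is the primary key of zoning and a foreign key in parcel.
conn.execute(
    "CREATE TABLE zoning (zone_code TEXT PRIMARY KEY NOT NULL, zone_name TEXT)"
)
conn.execute(
    "CREATE TABLE parcel ("
    " pin TEXT PRIMARY KEY NOT NULL,"
    " acres REAL,"
    " zone_code TEXT REFERENCES zoning(zone_code))"
)
conn.execute("INSERT INTO zoning VALUES ('R1', 'residential')")
conn.execute("INSERT INTO parcel VALUES ('P101', 1.2, 'R1')")

def try_insert(sql: str) -> bool:
    """Run an insert and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

# A repeated primary key value is rejected (keys must be unique).
duplicate_key_ok = try_insert("INSERT INTO parcel VALUES ('P101', 3.0, 'R1')")
# A foreign key value with no matching primary key is rejected.
dangling_key_ok = try_insert("INSERT INTO parcel VALUES ('P102', 0.8, 'C9')")
```

Both test inserts fail, which is the point: the keys keep the two separately maintained tables consistent with each other.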
what is the SSURGO and who produces it?
the Soil Survey Geographic database, produced by the Natural Resources Conservation Service (NRCS)
SSURGO data are collected through field mapping and archived in 7.5-minute quadrangle units. they are organized by soil survey area, which may consist of a county, multiple counties, or parts of multiple counties
database consists of spatial and tabular data
for each soil survey area, the spatial data contain a detailed soil map made of soil map units (each of which may consist of one or more noncontiguous polygons).
a soil map unit represents a set of geographic areas for which a common land-use management strategy is suitable.
what is normalization?
process of decomposition, taking a table with all the attribute data and breaking it down into small tables while maintaining the links between them
what are the objectives of normalization?
- avoid redundant data in tables that waste space and can cause data integrity problems
- ensure attribute data in separate tables can be maintained and updated separately and linked when necessary
- facilitate a distributed database
normalization performance issues
higher normal forms than the third can slow down data access and create higher maintenance costs.
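A minimal sketch of the decomposition idea, in plain Python. The flat table, its field names, and its values are made up for illustration. The zone name is repeated in every parcel row of the flat table, so it is moved into its own table and linked back by zone_code.

```python
# Hypothetical flat table: zone_name is repeated for every parcel in a zone.
flat = [
    {"pin": "P101", "owner": "Wang", "zone_code": 1, "zone_name": "residential"},
    {"pin": "P102", "owner": "Smith", "zone_code": 1, "zone_name": "residential"},
    {"pin": "P103", "owner": "Costello", "zone_code": 2, "zone_name": "commercial"},
]

# Zoning table: one entry per zone_code, so each name is stored once.
zoning = {row["zone_code"]: row["zone_name"] for row in flat}

# Parcel table: keeps zone_code as the link (foreign key) to the zoning table.
parcel = [
    {"pin": row["pin"], "owner": row["owner"], "zone_code": row["zone_code"]}
    for row in flat
]

def rebuild(parcel_rows, zoning_lookup):
    """Link the small tables back together to recover the flat table."""
    return [{**p, "zone_name": zoning_lookup[p["zone_code"]]} for p in parcel_rows]
```

Because the link is kept, rebuild(parcel, zoning) returns the same records as the flat table, but a zone name now only needs to be edited in one place.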
what are the different types of relationships between records in tables?
one to one
one to many
many to one
many to many
origin and destination
one to one
one record in a table is related to only one record in another table
one to many
one record in a table may be related to many records in another table
many to one relationship
many records in a table may be related to one record in another table (ex. several households may share the same street address)
many to many
many records in a table may be related to many records in another table
what is a join
brings together 2 tables by using a common field or a primary key + foreign key
ex. joining attribute data from a nonspatial data table to a feature attribute table
recommended for one to one or many to one relationships
doesn’t work for one to many or many to many because only the first matching record from the destination will be assigned to the origin record
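The first-match behaviour can be sketched in plain Python. The join function below is a hypothetical stand-in for a GIS join, not the actual implementation of any package, and the tables are made up.

```python
def join(origin, destination, key):
    """Append destination fields to each origin record, using the first match only."""
    first_match = {}
    for rec in destination:
        first_match.setdefault(rec[key], rec)  # later records with the same key are ignored
    return [{**first_match.get(o[key], {}), **o} for o in origin]

parcels = [{"pin": "P101", "zone": "R1"}, {"pin": "P102", "zone": "R1"}]
zones = [{"zone": "R1", "zone_name": "residential"}]

# Many to one: many parcels (origin) share one zone (destination). Nothing is lost.
many_to_one = join(parcels, zones, "zone")

# One to many: one zone (origin) has many parcels (destination). P102 is dropped.
one_to_many = join(zones, parcels, "zone")
```

In the one-to-many direction the zone record only picks up P101, which is why a relate is the safer choice for that kind of relationship.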
what is a relate?
operation that temporarily connects 2 tables but keeps the tables physically separate
works for all types of relationships, but slows down data access
what is a relationship class?
relationships between objects, predefined and stored in a geodatabase. for the object-based data model
can be one to one, many to one, one to many and many to many
for the first 3, records in the origin are directly linked to records in the destination
for many to many, an intermediate table sorts out the associations between records
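A small sketch of the intermediate table for a many-to-many relationship, in plain Python. The owner and parcel tables and the owns association table are hypothetical.

```python
owners = {1: "Wang", 2: "Smith"}        # owner ID -> owner name
parcels = {"P101": 1.2, "P102": 0.8}    # PIN -> acres

# Intermediate table: one row per (owner ID, PIN) association.
owns = [(1, "P101"), (1, "P102"), (2, "P102")]

def parcels_of(owner_id):
    """All parcels associated with one owner."""
    return sorted(pin for oid, pin in owns if oid == owner_id)

def owners_of(pin):
    """All owner names associated with one parcel."""
    return sorted(owners[oid] for oid, p in owns if p == pin)
```

Owner 1 holds two parcels and parcel P102 has two owners, and both directions are answered from the same intermediate table.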
field definition
define each field in the table, usually include
- field name
- data width (# of spaces reserved for a field)
- data type
- number of decimal digits (part of the definition for the float type)
field definition becomes a property of the field so it is important to consider how the field will be used before defining it
methods of data entry
import existing attribute files. if they don’t already exist, the data must be typed in.
for map unit symbols or feature IDs, best to enter them directly in a GIS. for nonspatial data, better to use word processing or spreadsheet packages (excel, notepad)
what are the 2 steps to attribute data verification?
1) make sure that attribute data are properly linked to spatial data (feature ID should be unique and contain no null values)
2) verify the accuracy of attribute data
what is an effective method for preventing data entry errors?
use attribute domains in the geodatabase
attribute domains allow the user to define a valid range of values or a valid set of values for an attribute
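The two kinds of domain (a valid range and a valid set of values) can be sketched as a simple check in Python. The field names, the range limits, and the codes are hypothetical, and a real geodatabase enforces domains itself rather than through user code like this.

```python
ELEVATION_RANGE = (0.0, 4500.0)         # hypothetical range domain, in meters
LAND_USE_CODES = {"AG", "RES", "COM"}   # hypothetical set of valid values

def validate(record):
    """Return the names of fields whose values fall outside their domain."""
    bad = []
    low, high = ELEVATION_RANGE
    if not low <= record["elevation"] <= high:
        bad.append("elevation")
    if record["land_use"] not in LAND_USE_CODES:
        bad.append("land_use")
    return bad

good = validate({"elevation": 1200.0, "land_use": "RES"})
typo = validate({"elevation": 12000.0, "land_use": "REZ"})
```

The second record has an extra zero in the elevation and a misspelled code, and both are caught at entry time instead of surfacing later in an analysis.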
what does field management entail?
adding or deleting fields and creating new attributes through classification and computation of existing attribute data
why is it good to delete unnecessary fields after downloading data from the internet?
reduces confusion in using the data set and also saves computer time for data processing
creating new attribute data by classification
data classification reduces a data set to a small number of classes (ex. reclassifying elevations into groups)
1) define a new field for saving the classification result
2) select a data subset using a query
3) assign a value to the selected data subset
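The three steps above can be sketched in plain Python. The records, the elevation breaks, and the elev_class field name are made up for illustration.

```python
records = [
    {"id": 1, "elevation": 350.0},
    {"id": 2, "elevation": 780.0},
    {"id": 3, "elevation": 1420.0},
]

# (lower bound, upper bound, class value); the breaks are hypothetical.
CLASS_BREAKS = [(0.0, 500.0, 1), (500.0, 1000.0, 2), (1000.0, float("inf"), 3)]

# 1) define a new field for saving the classification result
for rec in records:
    rec["elev_class"] = None

for low, high, class_value in CLASS_BREAKS:
    # 2) select a data subset using a query
    subset = [rec for rec in records if low <= rec["elevation"] < high]
    # 3) assign a value to the selected data subset
    for rec in subset:
        rec["elev_class"] = class_value
```

Steps 2 and 3 repeat once per class until every record has a value in the new field.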
creating new attribute data by computation
1) define a new field
2) compute the new field values from the values of existing attributes
what is the purpose of data exploration?
allows you to examine the general trends in the data, take a look at subsets, focus on possible relationships between data sets
purpose is to better understand the data and provide a starting point for formulating research questions and hypotheses
data visualization
discipline that uses a variety of exploratory techniques and graphics to understand and gain insight into data
how does data exploration in GIS differ from data exploration in statistics?
1) data exploration in GIS involves both spatial and attribute data
2) includes map and map features
besides descriptive statistics and graphics, data exploration in GIS must also cover map-based data manipulation, attribute data query, and spatial data query
range
difference between the minimum and the maximum
median
the midpoint value (50th percentile)
first quartile
the 25th percentile
third quartile
the 75th percentile
mean
average of data values
variance
measure of the spread of the data about the mean
sum of (value - mean) ^2 divided by # of values
standard deviation
square root of the variance
z score
standardized score
(x - mean) / standard deviation
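A worked example of the variance, standard deviation, and z score formulas above, in plain Python. The data values are made up. The variance here divides by the number of values, as in the card above, not by n - 1.

```python
import math

def mean(values):
    return sum(values) / len(values)

def variance(values):
    """Sum of (value - mean)^2 divided by the number of values."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / len(values)

def std_dev(values):
    """Square root of the variance."""
    return math.sqrt(variance(values))

def z_score(x, values):
    """(x - mean) / standard deviation."""
    return (x - mean(values)) / std_dev(values)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data values
data_range = max(data) - min(data)
```

For this data set the mean is 5, the variance is 32 / 8 = 4, the standard deviation is 2, and the value 9 has a z score of (9 - 5) / 2 = 2.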
cumulative distribution graph
line graph that plots the ordered data values against the cumulative distribution values
the cumulative distribution value is (i - 0.5)/n
the values fall between 0 and 1
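A short sketch of how the points of a cumulative distribution graph are computed, assuming i is the rank of a value in the ordered data (starting at 1) and n is the number of values. The data values are made up.

```python
def cumulative_distribution(values):
    """Pair each ordered data value with its cumulative value (i - 0.5) / n."""
    ordered = sorted(values)
    n = len(ordered)
    return [(x, (i - 0.5) / n) for i, x in enumerate(ordered, start=1)]

points = cumulative_distribution([12, 7, 9, 15])
# points == [(7, 0.125), (9, 0.375), (12, 0.625), (15, 0.875)]
```

Subtracting 0.5 keeps every cumulative value strictly between 0 and 1, even for the smallest and largest data values.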
bubble plots
a variation of scatterplots that uses varying-sized bubbles that represent a third variable
boxplots
show min, first quartile, median, third quartile, max
used to tell if the distribution is symmetric or skewed or if there are any outliers
QQ plots
quantile-quantile plots
compare the cumulative distribution of a data set with some theoretical distribution (ex. a normal distribution)
points in a QQ plot fall in a straight line if the data set follows the theoretical distribution
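A sketch of how the points of a normal QQ plot could be computed with Python's standard statistics module. It assumes the cumulative values (i - 0.5) / n are used as plotting positions and that the data are standardized before the comparison. Both are common choices, not something these notes specify.

```python
from statistics import NormalDist, mean, pstdev

def qq_points(values):
    """Return (theoretical quantile, standardized data value) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    m, s = mean(ordered), pstdev(ordered)
    standard_normal = NormalDist()          # mean 0, standard deviation 1
    points = []
    for i, x in enumerate(ordered, start=1):
        p = (i - 0.5) / n                   # cumulative distribution value
        theoretical = standard_normal.inv_cdf(p)
        observed = (x - m) / s              # z score of the data value
        points.append((theoretical, observed))
    return points

pairs = qq_points([48, 52, 50, 55, 45])     # hypothetical data values
```

Plotting the pairs and checking how closely they follow the line y = x shows how close the data are to a normal distribution.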
dynamic graphs
graphics displayed in multiple and dynamically linked windows where we can directly manipulate data points
brushing
allows the user to graphically select a subset of points from one chart and view related data points in other graphics
geovisualization
data visualization that focuses on geospatial data and the integration of cartography, GIS, image analysis, and exploratory data analysis
what are the different types of map-based data manipulations?
data classification, spatial aggregation, and map comparison
what are the different methods of doing map comparisons?
1) superimpose layers on top of one another and have them be represented on the map differently, or turn the layers on and off, or use transparency
2) use map symbols that can show two data sets
ex. bivariate choropleth map
ex. cartogram, where the unit areas are sized proportional to a variable (ex. state population) and the area symbols are used to represent the second variable
3) temporal animation can be used if there is time-dependent data
attribute data query
process of retrieving data by working with attributes (ex. SQL commands)
SQL
data query language designed for manipulating relational databases, used in the GIS to communicate with a database
select
from
where
ex. select Parcel.Sale_date
from Parcel
where Parcel.PIN = 'P101'
ex. select Parcel.Sale_date
from Parcel, Owner
where Parcel.PIN = Owner.PIN AND Owner_name = 'Costello'
the second query first joins the two tables on PIN and then applies the condition on the owner name
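The two example queries can be run against a throwaway database using Python's built-in sqlite3 module. The table layouts and the sample rows are assumptions made up for this sketch. Only the table names, field names, 'P101', and 'Costello' come from the examples above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data for the Parcel and Owner tables.
conn.executescript("""
CREATE TABLE Parcel (PIN TEXT PRIMARY KEY, Sale_date TEXT);
CREATE TABLE Owner (PIN TEXT, Owner_name TEXT);
INSERT INTO Parcel VALUES ('P101', '1998-01-10'), ('P102', '1994-10-06');
INSERT INTO Owner VALUES ('P101', 'Wang'), ('P102', 'Costello');
""")

# Query on a single table.
single_table = conn.execute(
    "SELECT Parcel.Sale_date FROM Parcel WHERE Parcel.PIN = 'P101'"
).fetchall()

# Query on two tables: the join condition comes first, then the attribute condition.
two_tables = conn.execute(
    "SELECT Parcel.Sale_date FROM Parcel, Owner "
    "WHERE Parcel.PIN = Owner.PIN AND Owner_name = 'Costello'"
).fetchall()
```

Each query returns one row here: the sale date of parcel P101, and the sale date of the parcel owned by Costello.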
procedural differences when querying a local database in a GIS package
1) only have to enter WHERE in the query expression box because typically the field and table have already been selected
2) an attribute query dialog is typically designed for a single table, so if the query involves attributes from two tables, they have to be joined first.
query expressions
the where conditions with Boolean expressions and connectors
Boolean expression
contains 2 operands and a logical operator
operands can be a field, number, or text
logical operators can be =, >, <, >=, <=, <> (not equal to)
can also contain arithmetic operators
boolean connectors
AND, OR, XOR, NOT
XOR is the exclusive OR. only records that satisfy one and only one of the expressions are selected
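A small sketch of the four connectors applied to two Boolean expressions, in plain Python. The parcel records and the two expressions (zone = 'residential', acres > 1.0) are made up for illustration.

```python
parcels = [
    {"pin": "P101", "zone": "residential", "acres": 1.5},  # satisfies both
    {"pin": "P102", "zone": "residential", "acres": 0.5},  # first only
    {"pin": "P103", "zone": "commercial", "acres": 2.0},   # second only
    {"pin": "P104", "zone": "commercial", "acres": 0.3},   # neither
]

def is_residential(rec):
    return rec["zone"] == "residential"

def is_large(rec):
    return rec["acres"] > 1.0

def select(records, condition):
    """Return the PINs of the records that satisfy the condition."""
    return [rec["pin"] for rec in records if condition(rec)]

and_result = select(parcels, lambda r: is_residential(r) and is_large(r))
or_result = select(parcels, lambda r: is_residential(r) or is_large(r))
# XOR: exactly one of the two expressions is true.
xor_result = select(parcels, lambda r: is_residential(r) != is_large(r))
not_result = select(parcels, lambda r: not is_residential(r))
```

AND selects only P101, OR selects P101 through P103, XOR selects P102 and P103, and NOT residential selects P103 and P104.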
what are the types of operations that can act on a data set?
add more records to a subset
remove records from a subset
select a smaller subset
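These three operations map naturally onto set operations. A minimal sketch in Python, with made-up record IDs:

```python
selected = {1, 2, 3, 4}            # currently selected record IDs (hypothetical)

# add more records to the subset
added = selected | {7, 8}

# remove records from the subset
removed = selected - {2}

# select a smaller subset (keep only selected records that also match a new query)
matches_new_query = {3, 4, 9}
narrowed = selected & matches_new_query
```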
relational database query
works with a relational database, selects a data subset in the table and also selects records related to the subset in other tables
what is the difference between join operation and relate operation?
a join operation combines the attribute data from 2 or more tables into a single table. a relate operation dynamically links the tables but keeps the tables separate
spatial data query
process of retrieving a data subset from a layer by working directly with feature geometries. the results can be simultaneously inspected in the map, linked to the records in the table and displayed in charts
can select features spatially using a cursor, a graphic or the spatial relationship between features
feature selection by graphic
draw a shape (graphic) to select objects of interest (ex. restaurants within a 1 mile radius of a hotel)
feature selection by spatial relationship
selects features based on their spatial or topological relationships to other features
ex. roadside rest areas within a 50-mile radius of a selected rest area. rest areas within each county
spatial relationships used for querying include containment, intersect, and proximity
containment (spatial query)
selects features that fall completely within features for selection
intersect (spatial query)
selects features that intersect features for selection
proximity (spatial query)
selects features that are within a specified distance of features for selection
spatial adjacency
features to be selected and features for selection share common boundaries and the specified distance is 0
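A simplified sketch of containment, intersect, proximity, and adjacency tests in plain Python. As an assumption made only to keep the example short, every feature is reduced to an axis-aligned box (xmin, ymin, xmax, ymax), with a point stored as a box of zero size. A real GIS works with the full feature geometries. All coordinates are made up.

```python
import math

def contains(outer, inner):
    """Containment: inner falls completely within outer."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])

def intersects(a, b):
    """Intersect: the two boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def distance(a, b):
    """Shortest distance between two boxes (0 if they overlap or touch)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return math.hypot(dx, dy)

def within_distance(a, b, limit):
    """Proximity: b is within the specified distance of a."""
    return distance(a, b) <= limit

county = (0, 0, 10, 10)
rest_area_a = (2, 2, 2, 2)      # point inside the county
rest_area_b = (12, 5, 12, 5)    # point 2 units east of the county
road = (8, 4, 15, 6)            # crosses the county boundary
neighbor = (10, 0, 20, 10)      # shares the county's eastern boundary
```

With a distance of 0, within_distance(county, neighbor, 0) is true because the two boxes share a boundary, which mirrors the adjacency case above.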
raster data query - query by cell value
the raster itself, rather than a field, is used as the operand in the query expression
can query multiple rasters, which may be integer, floating point, or a mix of both. querying multiple rasters directly is unique to raster data
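A sketch of a query by cell value on two rasters at once, in plain Python. The rasters are stored as nested lists, the slope raster is integer and the aspect raster is floating point, and all values as well as the query condition are made up. Cells that do not satisfy the query get no data (None here).

```python
NODATA = None

# Hypothetical rasters covering the same 2 x 3 grid.
slope = [[5, 12, 30],
         [8, 25, 40]]                   # integer raster
aspect = [[180.0, 90.0, 200.0],
          [270.0, 190.0, 45.0]]         # floating-point raster

def query(rasters, condition):
    """Evaluate the condition cell by cell and return 1 where it is true."""
    rows, cols = len(rasters[0]), len(rasters[0][0])
    return [
        [1 if condition(*(r[i][j] for r in rasters)) else NODATA
         for j in range(cols)]
        for i in range(rows)
    ]

# Cells with slope > 10 AND a roughly south-facing aspect.
result = query([slope, aspect], lambda s, a: s > 10 and 135 <= a <= 225)
# result == [[None, None, 1], [None, 1, None]]
```

Each raster takes the place that a field would have in an attribute query, so the same Boolean expressions and connectors apply.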
raster data query - query by select features
features can be used to query a raster and it returns an output raster with values for cells that correspond to the query and no data in the other cells