Chapter 4 Test Deck Flashcards
Data Warehouse
A subject-oriented, integrated, time-variant, and nonvoliatile collection of data in support of manament’s decision-making process.
Data warehousing
The process of constructing and using data warehouses
Subject-Oriented
organized around Major subjects such as customer, product, sales
Focusing on modeling and analysis of data for decision makers
(OLTP - Online Transaction Processing)
Integrated
Constructed by integrated multiple, heterogeneous data sources
relational databases, flat files, online transaction records
data cleaning and data integration techniques are applied
TIme variant
Time horizon significantly longer than operational systems: current value data
data warehouse data: provide info from historical perspective 5-10 years
summarized and historical records, patterns
Contains an element of time, explicitly or implicitly
Nonvolatile
A physically separate store of data from the operational environment
Update of data does not occur in the data warehouse environment
not require transaction processing, recovery, and concurrence control mechanisms
requires initial loading of data and access of data
OLAP- Online Analytical Processing
OLTP
Clerk, IT professional day to day operations application-oriented E-R model current, up-to-date detailed repetitive read/write short simple transaction tens thousands 100MB-GB transaction throughput
Olap
knowledge worker decision support subject-oriented star schema historical multidimensional ad-hoc lots of scans complex query millions hundreds 100Gb-TB query throughput, response
DBMS
Tuned for OLTP access methods, indexing, concurrency control, recovery
warehouse
tuned for OLAP: complex OLAP queries, multidimensional view, consolidation
Enterprise warehouse
collects all of the information about subjects spanning the entire organization
Data Mart
A subset of corporate-wide data that is of value to a specific group of users. Scope confined to specific, selected groups, such as marketing data market. Independent vs dependent data marts
virtual warehouse
A set of views over operational databases
only some of the possible summary views may be materialized
Data extractiom
Get data from multiple, heterogenous, and external sources
Data cleaning
detect errors in the data and rectify them when possible
Data Transformation
Convert data from legacy or host format to warehouse format
Load
sort summarize consolidate, compute views, check integrity, and build indices and partitions
Refresh
propagate the updates from the data sources to the warehouse
Meta data
the data defining warehouse objects
stores: description of the structure of the data warehouse: schema, view, dimensions etc.
The algorithms used for summarization
the mapping from operational environment to the data warehouse
data related to system performance
operational meta-data
business data, terms, ownership of data charging policies
operational meta-data
data lineage, history of migrated data and transformation path,
currency of data, active, archived, or purged,
monitoring information, warehouse usage statistics, error reports, audit trails
A data cube
allows data to be modled and viewd in multiple dimensions
dimension tables and fact tables
it is n-dimensional, 4 d cubes can be a series of 3-d cubes
physical storage may differ from its logical representation
Cuboid
data cube often referred as a cuboid
The lattice of cuboids forms a data cube
base cuboid
n-d base cube
apex cuboid
the top most 0-D cuboid which holds the highest-level of summarization