Chapter 4 Test Deck Flashcards
Data Warehouse
A subject-oriented, integrated, time-variant, and nonvoliatile collection of data in support of manament’s decision-making process.
Data warehousing
The process of constructing and using data warehouses
Subject-Oriented
organized around Major subjects such as customer, product, sales
Focusing on modeling and analysis of data for decision makers
(OLTP - Online Transaction Processing)
Integrated
Constructed by integrated multiple, heterogeneous data sources
relational databases, flat files, online transaction records
data cleaning and data integration techniques are applied
TIme variant
Time horizon significantly longer than operational systems: current value data
data warehouse data: provide info from historical perspective 5-10 years
summarized and historical records, patterns
Contains an element of time, explicitly or implicitly
Nonvolatile
A physically separate store of data from the operational environment
Update of data does not occur in the data warehouse environment
not require transaction processing, recovery, and concurrence control mechanisms
requires initial loading of data and access of data
OLAP- Online Analytical Processing
OLTP
Clerk, IT professional day to day operations application-oriented E-R model current, up-to-date detailed repetitive read/write short simple transaction tens thousands 100MB-GB transaction throughput
Olap
knowledge worker decision support subject-oriented star schema historical multidimensional ad-hoc lots of scans complex query millions hundreds 100Gb-TB query throughput, response
DBMS
Tuned for OLTP access methods, indexing, concurrency control, recovery
warehouse
tuned for OLAP: complex OLAP queries, multidimensional view, consolidation
Enterprise warehouse
collects all of the information about subjects spanning the entire organization
Data Mart
A subset of corporate-wide data that is of value to a specific group of users. Scope confined to specific, selected groups, such as marketing data market. Independent vs dependent data marts
virtual warehouse
A set of views over operational databases
only some of the possible summary views may be materialized
Data extractiom
Get data from multiple, heterogenous, and external sources
Data cleaning
detect errors in the data and rectify them when possible
Data Transformation
Convert data from legacy or host format to warehouse format
Load
sort summarize consolidate, compute views, check integrity, and build indices and partitions
Refresh
propagate the updates from the data sources to the warehouse
Meta data
the data defining warehouse objects
stores: description of the structure of the data warehouse: schema, view, dimensions etc.
The algorithms used for summarization
the mapping from operational environment to the data warehouse
data related to system performance
operational meta-data
business data, terms, ownership of data charging policies
operational meta-data
data lineage, history of migrated data and transformation path,
currency of data, active, archived, or purged,
monitoring information, warehouse usage statistics, error reports, audit trails
A data cube
allows data to be modled and viewd in multiple dimensions
dimension tables and fact tables
it is n-dimensional, 4 d cubes can be a series of 3-d cubes
physical storage may differ from its logical representation
Cuboid
data cube often referred as a cuboid
The lattice of cuboids forms a data cube
base cuboid
n-d base cube
apex cuboid
the top most 0-D cuboid which holds the highest-level of summarization
Star schema
A fact table in the middle connected, via foreign key to primary key relationship, to a ser of dimension tables
Snowflake schema
a refinement of star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables forming a shape similar to snowflake
Reduce redundancy
Fact Constellations
multiple fact tables share dimension tables, viewed as a collection of stars, therefore, called galaxy schema pr fact constellation
Data warehouses vs datamart
data wearhouse collects information about subjects that span the entire organization, it is enterprise-wide
for warehouse, the fact constellation schema is commonly used
a data mart is a departmental subset of the data warehouse, it is department-wide
the star or snowflake is commonly used
star schema is more popular and efficient, less joins
Roll up or drill up
summarize data by clibing up hierarchy or by dimension reduction
drill down or roll down
the reverse of roll up
from higher-level summary to lower level summary or detailed data or introducing new dimensions
slice and dice
slice: select form 1 D
Dice: select 2 or more Ds from a subcube
pivot or rotate
reorient the cube, visualization 3d series of 2d blanes
drill across
involving more than one fact table
drill through
through the bottom level of the cube to its back-end related tables using SQl
Information processing
supports querying, basic statistical analysis, and reproting using crosstabs, tables, charts and graphs
analytical processing
multidimensional analysis of data warehouse data
support basic olap operations, slice-dice, drilling,pivoting
Data mining
knowledge discovery from hidden patterns
supports associations, constructing analytical models, performing classification and prediction, and presenting the mining results using visualization tools
Relational OLAP or ROLAP
use relational or extended relaional DBMS to store and manage waehosue dat and client front end using OLAP middleware
greater scalability than MOLAP technology
Multidimensional OLAP or MOLAP
sparse array-based multidimensional storage engine
fast indexing to precomputed summarized data
Hybrid OLAP or HOLAP
e.g. microsoft SQLServer
combines ROLAP and MOLAP technology
flexible e.g. low level relational, high level array
Specialized SQL servers
e.g. Redbricks
specialized support for SQL queries over star.snowflake schemas