Date Warehousing Flashcards
Data mart
A data mart represents data from a single business process.
Third normal form (3NF)
Database design approach that eliminates redundancy and therefore facilitates insertion of new rows into tables in an OLTP application without introducing excessive data locking problems. Sometimes referred to as normalized.
Quickly can turn into a spider web of tables and sometimes consists of hundreds of tables. Great for transaction operational systems because an update only touches the database in one place, however all the joins across tables make it very confusing and inefficient for query performance to answer business questions.
Atomic data
the most detailed granular data captured by a business process. Atomic data must be made available in the data presentation area to respond to unpredictable ad hoc queries.
data warehouse bus architecture
The architecture for the data warehouse’s presentation area based on conformed dimensions and facts. Without adherence to the bus architecture, a data mart is a standalone stovepipe application.
Conformed dimensions
Dimensions are conformed when they are either exactly the same (including the keys) or one is a perfect subset of the other. Most important, the row header produced in answer sets from two different conformed dimensions must be able to be matched perfectly.
Conformed facts
Facts from multiple fact tables are conformed when the technical definitions of the facts are equivalent. Conformed facts are allowed to have the same name in separate tables and can be combined and compared mathematically. If facts do not conform, then the different interpretations must be given different names.
Operational System (data)
Optimized to process transactions quickly. Typically do not maintain history, but rather update data to reflect the most current state.
DW/BI Systems
Designed for efficient query performance to answer any sort of business question - how many new orders signed up, compared to last week’s orders, are the operational systems working correctly, etc. Although they need detailed data, they almost never deal with one transaction at a time. Often require that hundreds of thousands of transactions be searched and compressed into an answer set.
OLAP Performance vs. Relational Performance
OLAP cubes have traditionally been noted for extreme performance advantages over RDBMSs, but that distinction has become less important with advances in computer hardware, such as appliances and in-memory databases, and RDBMS software, such as columnar databases.
OLAP Structures vs. Relational Structures
OLAP cube data structures are more variable across different vendors than relational DBMSs, thus the final deployment details often depend on which OLAP vendor is chose. It is typically more difficult to port BIT applications between different OLAP tools than to port BI applications across different relational databases.
Two Key Components of a Star Schema
Fact tables for measurements and dimension tables for descriptive context.
Fact tables with a single level of detail
The idea that a measurement event in the physical world has a one-to-one relationship to a single row in the corresponding fact table is a bedrock principle for dimensional modeling. Everything else builds from this foundations.