BIS II - Data Warehouses Flashcards
Data warehouse – Definition
- Collection of databases
- Pool of data to support decision making
- Repository of current and historical data
- “Single version of truth”
Data Warehouse – Characteristics
- Subject oriented
- Integrated
- Time-variant (time series)
- Nonvolatile
- Metadata
- Web based, client/server
- Relational/multi-dimensional
- Real-time
Major Components of Data Warehousing Process
- Data sources: data are sourced from multiple independent operational “legacy” systems
- Data extraction and transformation: data are extracted and properly transformed using custom-written or commercial software called ETL
- Data loading: data are loaded into staging area, where they are transformed and cleansed; the data is then ready to load into the DW or data marts
- Comprehensive database: essentially the enterprise data warehouse to support all decision analysis; provides summarized and detailed information from multiple sources
- Metadata: are maintained so that they can be easily accessed by IT personnel and end users; incl. software programs about data and rules for summaries, that are easy to index and search
- Middleware tools: enable access to the data warehouse; power users may write their own scripts (e.g. SQL); Many front-end applications allow interaction with data stored in the DW
- Applications (Visualization), e.g. with OLAP,
Data Integration and the ETL Process: Data Integration (Second Step in Data Warehousing Process)
ETL = Extract Transform Load
- A decision maker needs access to multiple sources of data that must be integrated
- Data Integration comprises three major processes:
o Data access
o Data federation
o Change capture
Types of Data Warehouses
Data mart
- Departmental small-scale “DW” that stores only limited/relevant data
- Two types:
o Dependent data mart: subset that is created directly from a data warehouse
o Independent data mart: small data warehouse designed for strategic business unit or a department
Operational data Stores (ODS)
- Type of database used as an interim area for a data warehouse
- Similar to short-term memory (DW is a “long-term memory”)
Enterprise Data Warehouse (EDW)
- A data warehouse for the enterprise
Metadata
- Data about data
- In a data warehouse, metadata describe the contents of a data warehouse and the manner of its acquisition and use
Analysis of data in data warehouse
OLTP (Online Transaction Processing)
• Capturing and storing data from ERP, CRM, POS
• The main focus is on efficiency of routine tasks
OLAP (Online Analytical Processing)
• Converting Data into information for decision support
• Data cubes, drill-down/rollup, slice & dice, …
• Requesting ad hoc reports
• Conducting statistical and other analyses
• Developing multimedia-based applications
(!) Analysis of data in data warehouse - OLAP vs. OTLP
OLTP:
Purpose: to carry out day-to-day business functions
Data Source: transaction database, normalized data repository, primarily focused on efficiency and consistency
Reporting: routine, periodic, narrowly focused reports
Resource Requirements: ordinary relational database
Execution Speed: fast; recording of business transactions and routine reports
OLAP:
Purpose: to support decision making, to provide answers for business and management queries
Data Source: data warehouse/data mart; a non-normalized data repository, primarily focused on accuracy and completeness
Reporting: ad hoc, multidimensional, broadly focused reports and queries
Resource Requirements: multiprocessor, large-capacity, specialized databases
Execution Speed: slow, resource intensive, complex, large-scale queries
OLAP Operations: Slice
- • Definition: a subset of data corresponding to a single value set for one or more dimensions not in the subset
OLAP Operations: Dice
- Definition: a slice on more than two dimensions
* (e.g. putting filters on two columns in an excel or “SELECT * FROM xxx WHERE xxx AND xxx” in SQL
Other OLAP Operations
Drill Down/Up: Navigating among levels of data ranging from the most summarized (up) to the most detailed (down)
Roll Up: Computing all of the data relationships for one or more dimensions. A formula might be needed.
Pivot: used to change the dimensional orientation of a report or an ad hoc query-page display