data warehousing Flashcards
What is a data warehouse?
A decision-support database, separate from operational databases, providing consolidated, historical data for analysis.
What are the four key characteristics of a data warehouse?
Subject-Oriented: Organized around subjects like customers, products, or sales.
Integrated: Combines data from multiple sources into a consistent format.
Time-Variant: Stores historical data for long-term analysis.
Non-Volatile: Data is read-only and does not undergo frequent updates.
How does a data warehouse differ from an operational database?
Operational Database (OLTP): Focuses on day-to-day transactions, contains current and detailed data.
Data Warehouse (OLAP): Focuses on historical data analysis, contains aggregated and consolidated data.
What are fact and dimension tables in a data warehouse?
Fact Table: Stores measurements or metrics of a business process (e.g., sales amounts).
Dimension Tables: Store descriptive attributes for filtering and labeling query results (e.g., product names, dates).
Why is it necessary to separate a data warehouse from operational databases?
Optimized for different purposes: OLTP handles transactions, while OLAP supports analysis.
Consolidation of data from multiple sources.
Maintains historical data not available in operational databases.
Reconciles inconsistencies in data formats and representations.
What are the design approaches for building a data warehouse?
Top-Down: Structured and comprehensive design starting with overall planning.
Bottom-Up: Rapid prototyping starting with smaller projects.
Hybrid: Combines both approaches.
What are the three data warehouse models?
Enterprise Warehouse: Unified data for the entire organization.
Data Mart: Subset of a data warehouse for specific business units.
Virtual Warehouse: Logical view of operational databases.
What is ETL in data warehousing?
Extract: Gather data from multiple sources.
Transform: Convert data into a consistent format.
Load: Store data into the warehouse.
What are the two OLAP server architectures?
ROLAP (Relational OLAP): Uses relational databases for data storage and analysis.
MOLAP (Multidimensional OLAP): Uses arrays for fast access to summarized data.
HOLAP (Hybrid OLAP): Combines features of ROLAP and MOLAP.
What are the main uses of a data warehouse?
Information Processing: Querying, reporting, and statistical analysis.
Analytical Processing: Multidimensional analysis using slicing, dicing, and pivoting.
Data Mining: Discovering hidden patterns, associations, and predictions.
What is metadata in a data warehouse?
Metadata describes warehouse objects, including schema, data lineage, and performance metrics.