Lecture 2: Introduction to Data Warehousing Flashcards
Front
Back
What is a Data Warehouse?
A system designed to collect, store, and manage data from multiple sources for analysis and business intelligence.
How does a Data Warehouse differ from a regular database?
A regular database (OLTP) is optimized for day-to-day transactions; a Data Warehouse (OLAP) focuses on historical data and analytics.
What are the four key characteristics of a Data Warehouse?
Subject-oriented, Integrated, Time-variant, and Nonvolatile.
Explain ‘subject-oriented’ in Data Warehousing.
Data is organized by business topics (sales, marketing, finance) rather than daily transactions.
What does ‘integrated’ mean in the context of a Data Warehouse?
Data from multiple sources is standardized and combined, ensuring consistency in formats, naming, and units of measure.
Define ‘time-variant’ in a Data Warehouse.
Data is associated with specific time periods, allowing historical analysis and trend identification.
Why is a Data Warehouse considered ‘nonvolatile’?
Once data is loaded, it is not updated in real-time; it is primarily read-only, preserving historical snapshots.
Name three types of Data Warehouses.
Enterprise Data Warehouse (EDW), Operational Data Store (ODS), and Data Mart.
What is an Enterprise Data Warehouse (EDW)?
A centralized, large-scale data repository that provides a unified approach to organizing and accessing data across the entire organization.
Describe an Operational Data Store (ODS).
A near real-time data store that helps with immediate reporting needs, bridging the gap between operational databases and the Data Warehouse.
What is a Data Mart?
A subset of a Data Warehouse focused on a specific department or function (e.g., sales, marketing) for faster, targeted analysis.
List some common Data Warehousing tools.
IBM Datastage, Oracle, Amazon Redshift, SAP, Google BigQuery, and DOMO.
What are the three main DW architecture types?
Single-tier, two-tier, and three-tier. Three-tier (with database, OLAP server, and client tools) is the most widely used.
Why is selecting the right DW architecture important?
It depends on factors like organizational needs, resources, the complexity of data, and the reporting/analysis requirements.