Introduction to Data Warehousing Flashcards
What are some reasons for using a data warehouse?
Identifying potential customers and best-selling products.
Analyzing region-wise preferences and competitor products.
Evaluating revenue impact of promotion schemes.
Projecting sales and understanding the effect of selling more quantities.
What is the definition of a data warehouse?
A subject-oriented, integrated, non-volatile, and time-variant collection of data to support management decisions.
What does “Subject Oriented” mean in a data warehouse?
Data is organised around major subjects, making the design data-driven rather than application-driven.
What does “Integrated” mean in a data warehouse?
Data from various sources is stored in a single, consistent format.
What does “Time Variant” mean in a data warehouse?
Data is stored as snapshots or views over time, tracking changes.
How do OLTP systems differ from data warehouses?
OLTP: Application-oriented, current data, used for running business operations.
Data Warehouse: Subject-oriented, historical data, used for analysis and decision-making.
What is a centralised architecture?
A single data warehouse storing all data for business analysis.
What is a federated architecture?
Data is stored in separate physical databases but logically consolidated.
What is a tiered architecture?
A distributed data approach for faster response times and reduced search volumes
What is a star schema?
A central fact table connected to de-normalized dimension tables.
Easy to query but less efficient for large datasets.
Q: What is a snowflake schema?
A refinement of the star schema, where dimension hierarchies are normalised.
Reduces redundancy but increases query complexity.
What is a galaxy schema?
Multiple fact tables linked by shared dimensions.
Useful for complex datasets involving multiple business processes.
What is an SCD in data warehousing?
It refers to capturing changes in dimension attributes over time.
What are the three types of SCDs?
Type 1: Overwrites old data, no history retained.
Type 2: Adds a new record for each change, retaining full history.
Type 3: Adds columns to track original and current values.
What are the main steps in Kimball’s data lifecycle?
Extract data to a staging area.
Transform and clean data.
Load data into a data warehouse.
Provide access through data marts or cubes.
Name examples of ETL tools.
Informatica, IBM DataStage, Ab Initio.
What are examples of OLAP tools?
Oracle Express Suite, Business Objects, MicroStrategy.
What are BI reporting tools?
OBIEE, Pentaho, QlikView.
What is a fact table?
Stores measurements or metrics (e.g., sales revenue).
Contains foreign keys linking to dimension tables.
What is a dimension table?
Describes business aspects (e.g., time, location).
Contains descriptive text and is linked to fact tables by foreign keys.
What are types of dimension tables?
Slowly Changing Dimensions
Junk Dimensions
Conformed Dimensions
Late Arriving Dimensions
What are the three steps in ETL?
Extract: Retrieve data from source systems.
Transform: Validate, clean, and convert data for analysis.
Load: Insert transformed data into the target system.
Why is the ETL loading step critical?
It focuses on efficiency and minimizes downtime during updates.
What are the differences between star and snowflake schemas?
Star Schema: De-normalized, simpler structure, faster queries.
Snowflake Schema: Normalized, reduces redundancy, more complex queries.