Introduction to Data Warehousing Flashcards

1
Q

What are some reasons for using a data warehouse?

A

Identifying potential customers and best-selling products.

Analyzing region-wise preferences and competitor products.

Evaluating revenue impact of promotion schemes.

Projecting sales and understanding the effect of selling more quantities.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is the definition of a data warehouse?

A

A subject-oriented, integrated, non-volatile, and time-variant collection of data to support management decisions.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What does “Subject Oriented” mean in a data warehouse?

A

Data is organised around major subjects, making the design data-driven rather than application-driven.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What does “Integrated” mean in a data warehouse?

A

Data from various sources is stored in a single, consistent format.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What does “Time Variant” mean in a data warehouse?

A

Data is stored as snapshots or views over time, tracking changes.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

How do OLTP systems differ from data warehouses?

A

OLTP: Application-oriented, current data, used for running business operations.

Data Warehouse: Subject-oriented, historical data, used for analysis and decision-making.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is a centralised architecture?

A

A single data warehouse storing all data for business analysis.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What is a federated architecture?

A

Data is stored in separate physical databases but logically consolidated.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What is a tiered architecture?

A

A distributed data approach for faster response times and reduced search volumes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What is a star schema?

A

A central fact table connected to de-normalized dimension tables.

Easy to query but less efficient for large datasets.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Q: What is a snowflake schema?

A

A refinement of the star schema, where dimension hierarchies are normalised.

Reduces redundancy but increases query complexity.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What is a galaxy schema?

A

Multiple fact tables linked by shared dimensions.
Useful for complex datasets involving multiple business processes.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What is an SCD in data warehousing?

A

It refers to capturing changes in dimension attributes over time.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What are the three types of SCDs?

A

Type 1: Overwrites old data, no history retained.
Type 2: Adds a new record for each change, retaining full history.
Type 3: Adds columns to track original and current values.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What are the main steps in Kimball’s data lifecycle?

A

Extract data to a staging area.

Transform and clean data.
Load data into a data warehouse.

Provide access through data marts or cubes.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Name examples of ETL tools.

A

Informatica, IBM DataStage, Ab Initio.

17
Q

What are examples of OLAP tools?

A

Oracle Express Suite, Business Objects, MicroStrategy.

18
Q

What are BI reporting tools?

A

OBIEE, Pentaho, QlikView.

19
Q

What is a fact table?

A

Stores measurements or metrics (e.g., sales revenue).
Contains foreign keys linking to dimension tables.

20
Q

What is a dimension table?

A

Describes business aspects (e.g., time, location).

Contains descriptive text and is linked to fact tables by foreign keys.

21
Q

What are types of dimension tables?

A

Slowly Changing Dimensions
Junk Dimensions
Conformed Dimensions
Late Arriving Dimensions

22
Q

What are the three steps in ETL?

A

Extract: Retrieve data from source systems.

Transform: Validate, clean, and convert data for analysis.

Load: Insert transformed data into the target system.

23
Q

Why is the ETL loading step critical?

A

It focuses on efficiency and minimizes downtime during updates.

24
Q

What are the differences between star and snowflake schemas?

A

Star Schema: De-normalized, simpler structure, faster queries.

Snowflake Schema: Normalized, reduces redundancy, more complex queries.

25
What is a galaxy schema used for?
To handle complex datasets by linking multiple fact tables with shared dimensions.
26
How do data warehouses support decision-making?
By consolidating historical and current data for insights into sales, customer behavior, and business trends.