Business Intelligence and Data Warehouses Flashcards
What is business intelligence
BI: A comprehensive, cohesive, and integrated set of tools and processes used to capture, collect, integrate,
store, and analyse data with the purpose of generating and presenting information to support business decision
making.
Differentiate between operational data and decision support data - Time span
- Time span: Operational data (OD) covers a short time frame, while decision support data (DSD) tends to
cover a longer time frame.
Differentiate between operational data and decision support data - Granularity
- Granularity: Describes the levels of aggregation. Decision support data must be presented at various
levels of aggregation, from highly summarized (DSD) to nearly atomic (OD).
Differentiate between operational data and decision support data - Dimensionality
- Dimensionality: Operational data focuses on representing individual transactions, rather than the effect of
those transactions over time (DSD). Decision support data includes multiple dimensions to show how data
relates across those dimensions.
What is a data warehouse, and what are its main characteristics?
How does it differ from a data mart?
Data warehouse: An integrated, subject-oriented, time-variant, nonvolatile collection of data that provides
support for decision making.
Components of a data warehouse are 1. Integrated
- Integrated. The data warehouse is a centralized, consolidated database that integrates data derived
from the entire organization and from multiple sources with diverse formats.
Components of a data warehouse are 2. Subject-oriented
- Subject-oriented. Data warehouse data is arranged and optimized to provide answers to questions from
diverse functional areas within a company.
Components of a data warehouse are 3. Time-variant
- Time-variant. In contrast to operational data, which focuses on current transactions, warehouse data
represents the flow of data through time.
Components of a data warehouse are 4. Non-Volatile
- Non-volatile. Once data enters the data warehouse, it is never removed.
what is data mart.
Data mart: A small, single-subject data warehouse subset that provides decision support to a small group or
function, created from a larger data warehouse for faster data access
Using a star schema in a data warehouse environment
The star schema consists of a central fact table that stores transactional or measured data. Surrounding the
fact table are one or more dimension tables, which are linked to the fact table through ER (Entity-Relationship)
connections. The fact table contains the measurements or numerical data, while the dimension tables
describe these facts by answering key questions such as “Who,” “What,” “Where,” “When,” and “How.”
Performance improvement techniques used in a star schema - 1. Normalizing Dimensional Tables:
- Normalizing Dimensional Tables: Simplifies semantic representation and user navigation through the
data by normalizing the dimension tables.
Performance improvement techniques used in a star schema - 2. Maintaining Multiple Fact Tables
- Maintaining Multiple Fact Tables: Stores fact tables at different aggregation levels (e.g., region, state, city)
to speed up query operations.
Performance improvement techniques used in a star schema - 3. Denormalizing Fact Tables
- Denormalizing Fact Tables: Improves performance by storing data more efficiently, reducing the number
of records needed to be accessed during queries.
Performance improvement techniques used in a star schema - 4. Partitioning and Replicating Tables
- Partitioning and Replicating Tables: Improves data access by splitting tables into subsets and replicating
them across different locations.