Data Warehousing Flashcards
Data warehouse
A collection of data that helps analysts to make decisions
True or false: Data warehouses store operational databases
False! Data warehouses store historical data.
Features of a data warehouse
Subject-oriented - provides info around a subject rather than operations
Integrated - data from many different sources
Time-variant - data is identified with a particular period (e.g. last 12 months)
Non-volatile – data is not erased when new data is added
Information processing
Processing data via queries or statistical analysis
Analytical processing
Processing data via Online Analytical Processing (OLAP) tools
Data mining
Finding hidden patterns and associations in data
Features of an enterprise data warehouse
o An EDW
o An operational data store
o Data marts
Operational data store
A hybrid data warehouse containing integrated information
Data extraction
Gathering data from a variety of sources
Data cleaning
Finding and correcting errors
Data transformation
Converting data to warehouse format
Data loading
Sorting/summarising/consolidating/checking integrity and building patterns accordingly
Refreshing
Updating data sources to the warehouse
Dimensional modelling
Different individual models (e.g. separate models for sales and inventory)
True or false: Dimensional modelling leads to fewer tables
True! Data is grouped together and includes redundancies.
Fact table
Measurement of business (e.g. sales)
True or false: Fact tables are usually small
False! Fact tables usually range from 1 to 5 TB.
Transaction fact table
Information related to events (e.g. product sales)
Snapshot fact table
Information from specific moments in time (e.g. year-end accounts)
Accumulating snapshot
A running tally (e.g. year to year sales figures)
Dimension table
Describes dimensions of the business (e.g. product, customer)
True or false: Dimension tables are larger than fact tables
False! They are usually smaller than fact tables.
Types of dimension table
o Time o Geography (address/location) o Product o Customer o Range (range of values for time, price, etc.)
Star schema
Dimension tables connected to a central fact table
True or false: A star schema is normalised
Both! The fact table is normalised but the dimension tables aren’t, resulting in faster read queries and analysis.
How does a star schema work?
Fact data is pulled from dimension tables, duplicated and stored in the fact table
Advantages of star schema
o Simpler queries – dimension tables are treated as one large table
o Easier business insights
o Better query performance
o No bottlenecks from normalised schemas
o Faster queries
o Read only
o Can be used to build OLAP cubes
Disadvantages of star schema
o Less accurate data due to normalisation
o Less capable of handling complex queries
o No many-to-many relationships
Snowflake schema
Like a star schema, but the dimension tables connect to other dimension tables
True or false: A snowflake schema is normalised
True! Its purpose is to normalise data in a star schema.
Advantages of snowflake schema
o No integrity issues
o Faster write queries
o Uses less space than star schema
Why are read queries slower in a snowflake schema?
The DBMS needs to search through multiple tables
Roll-up
Removes dimensions in a data cube by:
o Climbing up a hierarchy for a dimension
o Dimension reduction (e.g. orders grouped by city rather than country)
Drill-down
Adds dimensions to a data cube to create more detailed data
Slice
Selects a dimension and creates a sub-cube
Dice
Slicing multiple dimensions
Pivot/rotation
Swapping data axes to provide an alternative presentation of data