Chapter 2 - Normalization Flashcards
Normalization
RAOC
- The process of organizing tables in a relational database
- Eliminates data redundancy
- Decreases record locking
- Increases efficiency in concurrency
- Reduces data anomalies
Characteristics of a Normalized Database
DEIL
- detailed data
- data efficiently
- data integrity
- Results in a lot of tables
- Excellent for managing our day to day activities
Drawbacks of a Normalized Database
- Inefficient for data extraction, poor data analysis (Because it is stored in the most detailed format)
- stored in multiple databases more data is added queries become even more inefficient
Dimensional Modeling
data warehouse databases are designed using the Dimensional Model while Relational databases (OLTP) are designed using the Entity Relationship (ER) model
star schema
- A single central table called a fact table
- Fact table is surrounded by multiple tables called dimensions
- A fact table represents a multi dimensional hypercube
Shared or Conformed dimensions
Dimensions with connections to multiple fact tables are called shared or conformed
dimensions
Snowflake vs Star schema
If you imagine multiple dimensions brought to 3NF with a central fact table you
end up with a Snowflake schema. If you take a Star schema and normalize it, you end up with a
Snowflake schema.
Hybrid Schema
A hybrid schema occurs when you normalize only part of the design to accommodate shared
dimensions.
Star vs Snowflake vs Hybrid
• Use the Star schema if you can
• Use the Hybrid schema if you need to share a dimension with multiple dimensions
• Only use the Snowflake schema for quick Proof of Concept projects
o Because the design is closest to your source database that is already in 3NF and will
therefore require less time to construct and will be easier to load data
Granularity
the level of detail that data is stored in fact tables /dimensions.
The lower the level of detail, the larger the data amount in the fact table.
Dimensionality
The dimensional granularity of fact table is dependent on the number of the dimensions connected to
the fact table.