L5/6: Database Design Theory Flashcards
What is
Design Theory
about?
How to represent your data
in such a way that you
avoid Anomalies
caused by data redundancy, null values, etc
What are
Armstrongs 4 Rules
used for?
Discovering Functional Dependencies
from Functional Dependencies that are already known
Armstrong’s Rules
(names only)
- Reduction (trivial)
- Transitivity
- Augmentation
- Split/Combine
Armstrong’s Rules:
Reduction
Given a set of attributes: X, Y, Z
If Y is a subset of X (Y is in X),
Then X is Functionally Dependent on Y
X → Y
This is kind of trivial.
Armstrong’s Rules:
Transitivity
Given a set of attributes: X, Y, Z
If X is functionally dependent on Y (X → Y),
and Y is functionally dependent on Z ( Y → Z),
then X is functionally dependent on Z:
X → Y → Z, therefore X → Z
Armstrong’s Rules:
Augmentation
Given a set of attributes: X, Y, Z
If X is functionally dependent on Y (X → Y),
then XZ is functionally dependent on YZ ( XZ → YZ):
X → Y, therefore XZ → YZ for any Z
Armstrong’s Rules:
Split/Combine
Given a set of attributes: X, Y, Z
Combine:
If X is functionally dependent on Y (X → Y),
and X is functionally dependent on Z ( X → Z),
then X is functionally dependent on YZ:
X → Y , and X→Z, therefore X → YZ
Split:
If X is functionally dependent on YZ,(X → YZ),
then X is functionally dependent on Y ( X → Y)
and X is functionally dependent on Z ( X → Z),
X → YZ therefore: X→Y and X → Z
Good Functional Depenedencies
vs
Bad Functional Depenedencies
Good Functional Depenedency:
- Minimal Redundancy
- Less possibility of anomalies
- Example: EmployeeID → Name, Phone
Bad Functional Depenedency:
- A lot of redundancy
- May tie unrelated things together, allowing data anomalies
- Example: Position → PhoneNumber
Functional Dependencies
as Constraints
- A Functional Dependency will hold on some instances, and will not hold on others
- If designed as part of the schema, a FD can help define a valid instance
- Can check if a FD is violated by checking a single instance
- But cannot prove that a FD exists by checking a single instance
Using Functional Dependencies
for Database Schema Design:
General Steps
- Start with some Relational Schema
- Determine it’s Functional Dependencies (FDs)
- Use the FDs to design a better schema
- One which minimizes the possibility of anomalies
Definition:
Closure of a
Set of Attributes
Given a set of attributes: {A1, …, An }
and a set of FDs, F
The Closure: {A1, …, An }+
is the set of attributes B such that
{A1, …, An } → B
So, the closure is the set of attributes dependent on attributes of A
Algorithm for
Computing Attribute Closure
closure = x;
repeat until there is no change:
if there is an FD: U → V in F
such that U is a subset of the Closure,
then set closure = closure UNION V
Translation:
If an FD implies another attribute that is not in the closure, add it to the closure
Types of Anomalies
- Update Anomaly
- Deletion Anomaly
- Insertion Anomaly
- Redundancy
Anomaly Types:
Update Anomaly
Updating one tuple causes inconsistent data
Anomaly Types:
Deletion Anomaly
Deleting one kind of information makes you lose another
Anomaly Types:
Insertion Anomaly
Storing one kind of info depends on storing another
Anomaly Types:
Redundancy
Storage of multiple instances
of redundant information
What is a
Functional Dependency (FD) ?
- An Integrity Constraint
- not explicity expressed in the E/R diagram
- Occurs when one attribute in a relation uniquely determines another attribute
- Two attributes always occur in the same pair
- Example: classroom and course
- A → B
- If A, then B
Superkeys
and
Keys
Superkey
a set of attributes {A1, …, An} such that
for any other attribute B in a relation R,
{A1, …, An } → B
- All attributes are functionally dependent by a superkey
- A Key is a minimal superkey,
- no subset of the key is a superkey
Steps for finding
Keys and Superkeys
For each set of attributes X:
- Compute X+ (The Closure)
- If X+ = the set of all attributes, then X is a Superkey
- If X is minimal, then X is a Key
Functional Dependency:
Formal Definition
Let A, B be sets of attributes
We write A → B, or say “A functionally determines B”
If, for any tuples t1 and t2 :
t1[A] = t2[A] implies t1[B] = t2[B]
We call A → B a Functional Dependency
First Normal Form
(1NF)
A Relation is in First Normal Form (1NF) if:
Every field contains only Atomic Values
(No lists or sets)
This is an implicit requirement in the Relational Model.
Pretty simple: One column, one value
Second Normal Form (2NF)
A Relation R is in Second Normal Form (2NF) if:
Every non-key attribute in R is has Full Funtional Dependency on the Candidate Key of R
A non-2NF schema can be decomposed to 2NF
Full Functional Dependency
vs
Partial Functional Dependency
X → Y is a:
Full Functional Dependency
if removing attribute A from X makes X → Y not hold
Partial Functional Dependency
if after removing attribute A from X,
X → Y still holds
Decomposing a
Non-2NF Schema
to 2NF
To be in Second Normal Form, all attributes must be Fully Dependent on the Candidate Key
Steps:
- Identify which attributes are not Fully Dependent on a key
- Seperate into tables that are Fully Dependent
Decompose the following non-2NF schema:
Employee(ssn, pno, hours, ename, plocation)
into 2NF
- Identify candidate key: (ssn, pno)
- Identify Full Dependencies:
- (ssn, pno) → (hours, ename, plocation)
- ssn → ename
- pno → plocation
- Split into other tables;
- T1(ssn, pno, hours)
- T2(ssn, ename)
- T3(pno, plocation)
Third Normal Form
(3NF)
A Relation R, with Functional Dependencies F,
is in 3NF if:
For all Dependencies X→A in the Closure F+ :
- A is a part of X (trivial FD)
- X contains a key for R (X is a superkey) or
- A is part of some key for R
R should not have a non-key attribute that is functionally determined by another non-key attribute
Normal Forms:
1NF, 2NF, 3NF Summaries
- 1NF
- Remove non-atomic values
- 2NF
- Remove partial dependencies
- 3NF
- Remove transitive dependencies
First Normal Form (1NF):
- Test
- Remedy
Test
Should NOT have non-atomic values
Remedy
Create relations for each non-atomic attribute
Second Normal Form( 2NF):
- Test
- Remedy
Test
No non-key attributes are functional dependencies on just a part of primary key
Remedy
Decompose and create a new relation for each partial dependency
Third Normal Form (3NF):
- Test
- Remedy
Test
Should NOT have non-key attribute FD by another non-key attribute
Remedy
Decompose, create relation including non-key attributes that are functionally dependent on other non-keys
Boyce-Codd Normal Form (BCNF)
A Relation R is in BCNF if:
For all Functional Dependencies(X→A) in the Closure(F+):
- A is part of X (trivial FD)
- X contains a key for R, X is a superkey
Intuitively:
The only non-trivial dependencies are those in which a key determines some attributes.
Each tuple can be seen as an entity or relationship, identified by a key and described by attributes.
BCNF:
Relation Diagram
*Need an image
Each attribute is exclusively functionally dependent on the key
Motivation for
decomposing data into BCNF
- Redundancies in data can lead to data anomalies
- Bad functional dependencies can be a factor
- We want to detect and remove redundancies
- Sometimes, decomposition can lead to subtle and unwanted effects
Decomposition:
Two Types
- Lossless/Lossless-Join
- Dependency Preserving
Decomposition:
Checks
It is important to check that a decomposition doesn’t introduce new problems
Checks:
Good decomposition should allow us to:
- Recover the original relation
- Check integrity constraints efficiently
Decompositions:
- Basic Structure
- Projections
For an original relation:
R( [key attributes] , [Attr. Set A] , [Attr. Set B] )
R can be decomposed into two new relations, each using the Key of R as their own Keys:
R1( [key attributes], [Attr. Set A] )
R2( [key attributes], [Attr. Set B] )
These are called the Projections of R
Dependency Preservation
during Decomposition
A Functional Dependency(FD) X→Y in a relation R,
is Preserved in a projection Rn if:
Rn contains all of the attributes of X and Y
A FD can therefore be checked by accessing only R
True or False:
BNCF Decomposition is
always lossless
TRUE!
All Functional Dependencies in a BNCF table are already dependent on the superkey
This means that all tables would have the same key:
R = R1 Join R2 Join … Join Rn
Dependency Checking
after
Decomposition
A decomposition is Dependency Preserving if:
the set of new Functional Dependencies, F++
is the same as the set F+ from the original table
F++ = F+
BCNF
Normalization
Algorithm
Starting with a relation R, not in BCNF:
Let X in R,A be a single attribute in R,
and X → A be an FD that causes a violation of BCNF
Decompose into R-A and XA
If either R-A or XA is not in BCNF, decompose those Relations
Functional Dependencies:
Useful Rules derived from
Armstrong’s Axioms
-
Decomposition
- If (A1, …, An) -> (B1, …, Bm) then
- (A1, …, An) -> B1 , (A1, …, An)
- (A1, …, An) -> B2 , (A1, …, An)
- etc
- If (A1, …, An) -> (B1, …, Bm) then
-
Union
- If (A1, …, An) -> B1, …, (A1, …, An)-> Bm
- then (A1, …, An) -> (B1, …, Bm)
-
Pseudotransitivity
- If (A1, …, An) -> (B1, …, Bm)
- and (B1, …, Bm), (C1, …, Ck) -> (D1, …, Dp)
- then
- (A1, …, An)(C1, …, Ck) -> (D1, …, Dp)