CHAPTER 15 Normalization Flashcards
What is relational database design?
The grouping of attributes to form “good” relation schemas
Two levels of relation schemas
The logical “user view” level
The storage “base relation” level
Logical level
how users interpret the relation schemas
and the meaning of their attributes.
Storage level
how the tuples in a base relation are stored and updated
Database design approaches
bottom-up design
top-down design
bottom up design
considers the basic relationships among individual attributes as the starting point and uses those to construct relation schemas. not popular
top down design
starts with a number of groupings of
attributes into relations that exist together naturally
The implicit goals
of the design activity
information preservation, and minimum redundancy
informal guidelines for good relational design
Making sure that the semantics of the attributes is clear in the schema
Reducing redundant info. in tuples
Reducing the Null values in tuples
Disallowing the possibility of generating spurious tuples
GUIDELINE 1: Informally, each tuple in a relation should represent one entity or relationship instance. What are the problems if this is not followed?
Mixing attributes of multiple entities may cause problems
Information is stored redundantly wasting storage
Problems with update anomalies
Insertion anomalies
Deletion anomalies
Modification anomalies
Guideline 2: Design the base relation schemas so that no insertion, deletion, or modification anomalies are present in the relations. If there are anomalies what should you do?
If any anomalies are present, note them
clearly and make sure that the programs that update the database will operate correctly
Guildeline 3: As far as possible, avoid placing attributes in a base relation whose values may frequently be NULL. If there are null values what should you do?
If NULLs are unavoidable, make sure that they apply in exceptional cases only and do not apply to a majority of tuples in the relation.
spurious tuples
Bad designs for a relational database may result in erroneous results for certain JOIN operations
GUIDELINE 4: The relations should be designed to satisfy the lossless join condition.
No spurious tuples should be generated by doing a natural-join of any relations.
There are two important properties of decompositions
non-additive or losslessness of the corresponding join
preservation of the functional dependencies.
Functional dependencies
constraint between two sets of attributes from the
database
are used to define normal forms for relations
A set of FDs is minimal if it satisfies the following conditions:
Every dependency in F has a single attribute for its RHS
We cannot remove any dependency from F and have a set of dependencies that is equivalent to F.
We cannot replace any dependency X -> A in F with a dependency Y -> A, where Y proper-subset-of X ( Y subset-of X) and still have a set of dependencies that is equivalent to F.
Normalization
The process of decomposing unsatisfactory “bad” relations by breaking up their attributes into smaller relations
Normal form
Condition using keys and FDs of a relation to certify whether a relation schema is in a particular normal form
Normalization of data
process of analyzing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of (1) minimizing redundancy and (2) minimizing the insertion, deletion, and update anomalies
lossless join property
guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.
dependency preservation property
ensures that each functional dependency is represented in some individual relation resulting after decomposition.
Inference Rules for FD
Decomposition If X -> YZ, then X -> Y and X -> Z
Union If X -> Y and X -> Z, then X -> YZ
Psuedotransitivity If X -> Y and WY -> Z, then WX -> Z
Normalization use
carried out in practice so that the resulting designs are of high quality and meet the desirable properties
Denormalization
the process of storing the join of higher normal form relations as a base relation
superkey
set of attributes that makes sure the tuples are unique.
key
superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more
candidate key
If a relation schema has more than one key, each is called a candidate key
1st Normal Form
Disallows composite attributes, multivalued attributes, and nested relations; attributes whose values for an individual tuple are non-atomic
2nd Normal Form
if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
3rd Normal Form
if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key
BCNF
A relation schema R is in BCNF if whenever a nontrivial functional dependency X → A holds in R, then X is a superkey of R.