Normalization Flashcards
Data cleaning, First Normal Form, Second Normal Form, Third Normal Form
data cleaning
fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset
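As a rough illustration of the definition above, the following minimal sketch cleans a small, made-up list of rows. The sample data, the `clean` helper, and the choice to drop rather than repair bad rows are all assumptions for illustration.

```python
# Hypothetical raw rows: (name, age) pairs as they might arrive from a spreadsheet.
raw_rows = [
    (" Alice ", "34"),   # incorrectly formatted: stray whitespace
    ("Bob", "abc"),      # corrupted age
    ("alice", "34"),     # duplicate of the first row once formatting is fixed
    ("Carol", ""),       # incomplete: missing age
    ("Dave", "41"),
]

def clean(rows):
    """Fix formatting, then drop corrupted, incomplete, and duplicate rows."""
    seen = set()
    cleaned = []
    for name, age in rows:
        name = name.strip().title()          # fix inconsistent formatting
        if not name or not age.strip().isdigit():
            continue                         # remove incomplete or corrupted rows
        record = (name, int(age))
        if record in seen:
            continue                         # remove duplicates
        seen.add(record)
        cleaned.append(record)
    return cleaned

cleaned_rows = clean(raw_rows)
print(cleaned_rows)
```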
First Normal Form
Table format, no repeating groups, and PK identified
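A small sketch of what removing a repeating group can look like, assuming a made-up student/course dataset. The `to_1nf` helper and the column names are hypothetical.

```python
# Hypothetical unnormalized rows: each student carries a repeating group of courses.
unnormalized = [
    {"student_id": 1, "name": "Ana", "courses": ["DB101", "CS200"]},
    {"student_id": 2, "name": "Ben", "courses": ["DB101"]},
]

def to_1nf(rows):
    """Flatten the repeating group so every cell holds a single value."""
    flat = []
    for row in rows:
        for course in row["courses"]:
            flat.append((row["student_id"], row["name"], course))
    return flat

enrollment = to_1nf(unnormalized)

# The identified primary key here is the composite (student_id, course);
# it must be unique across rows.
keys = [(student_id, course) for student_id, _, course in enrollment]
assert len(keys) == len(set(keys))
```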
Second Normal Form
1NF and no partial dependencies
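One way to picture a partial dependency is a table whose key is composite while some attribute depends on only part of that key. The sketch below uses invented enrollment data and splits that attribute into its own table.

```python
# Hypothetical 1NF table with composite key (student_id, course).
# student_name depends on student_id alone: a partial dependency.
enrollment = [
    (1, "DB101", "Ana", "A"),
    (1, "CS200", "Ana", "B"),
    (2, "DB101", "Ben", "C"),
]

# Move the partially dependent attribute into a table keyed by student_id.
students = {student_id: name for student_id, _, name, _ in enrollment}

# What remains (grade) depends on the whole key (student_id, course).
grades = {(student_id, course): grade for student_id, course, _, grade in enrollment}
```

After the split, each student's name is stored once instead of once per enrollment.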
Third Normal Form
2NF and no transitive dependencies
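A transitive dependency can be sketched with an invented employee table in which emp_id determines dept_id and dept_id determines dept_name. The data and names below are assumptions for illustration.

```python
# Hypothetical 2NF table keyed by emp_id: (emp_id, name, dept_id, dept_name).
# dept_name depends on dept_id, which depends on emp_id: a transitive dependency.
employees = [
    (10, "Kim", "D1", "Sales"),
    (11, "Lee", "D1", "Sales"),
    (12, "Max", "D2", "Support"),
]

# Split the non-key determinant (dept_id) and its dependent into a separate table.
departments = {dept_id: dept_name for _, _, dept_id, dept_name in employees}
employee = {emp_id: (name, dept_id) for emp_id, name, dept_id, _ in employees}

# Joining the two tables on dept_id recovers the original rows.
rejoined = [
    (emp_id, name, dept_id, departments[dept_id])
    for emp_id, (name, dept_id) in employee.items()
]
```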
Boyce-Codd normal form (BCNF)
every determinant is a candidate key (special case of 3NF)
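The BCNF condition can be checked mechanically when the functional dependencies are known. The sketch below is one possible approach, not a standard library routine: `closure` and `is_bcnf` are hypothetical helpers, and the test used is that each determinant is a superkey (a candidate key or a superset of one), which is the usual formal statement of the rule.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_bcnf(attributes, fds):
    """True if the determinant of every nontrivial FD determines all attributes."""
    all_attrs = set(attributes)
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial dependency, always allowed
        if closure(lhs, fds) != all_attrs:
            return False
    return True

# Hypothetical relation: instructor determines course, but instructor is not a key.
teaching = ("student", "course", "instructor")
teaching_fds = [
    (("student", "course"), ("instructor",)),
    (("instructor",), ("course",)),
]
print(is_bcnf(teaching, teaching_fds))

# Hypothetical relation where the only determinant is the key.
print(is_bcnf(("emp_id", "name"), [(("emp_id",), ("name",))]))
```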
Fourth normal form (4NF)
3NF and no independent multivalued dependencies
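Independent multivalued dependencies show up when one table records two unrelated many-valued facts about the same key, which forces every combination to be stored. A minimal sketch with invented employee, skill, and language data:

```python
# Hypothetical table: an employee's skills and languages are unrelated facts,
# so every skill/language combination has to appear.
emp_skill_lang = {
    ("Ana", "SQL", "English"),
    ("Ana", "SQL", "Spanish"),
    ("Ana", "Python", "English"),
    ("Ana", "Python", "Spanish"),
}

# Decompose into one table per independent multivalued fact.
emp_skill = {(emp, skill) for emp, skill, _ in emp_skill_lang}
emp_lang = {(emp, lang) for emp, _, lang in emp_skill_lang}

# Joining the two tables on employee recovers the original rows.
rejoined = {
    (emp, skill, lang)
    for emp, skill in emp_skill
    for emp2, lang in emp_lang
    if emp == emp2
}
```

The two smaller tables hold four rows in total and grow additively as skills or languages are added, while the combined table grows multiplicatively.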
What is normalization?
process for evaluating and decomposing each table (relation) into multiple tables (relations) to minimize data redundancies
Why is normalization required?
when we build the database, we have raw data that’s redundant, inconsistent, and lacks data integrity.
ex: spreadsheets don’t enforce datatypes or ranges
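By contrast, a database table can reject values of the wrong type or outside a range. The sketch below uses Python's built-in `sqlite3` module with a CHECK constraint. The `employee` table, its columns, and the 16 to 100 age range are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        age    INTEGER NOT NULL
               CHECK (typeof(age) = 'integer' AND age BETWEEN 16 AND 100)
    )
    """
)

# A valid row is accepted.
conn.execute("INSERT INTO employee (emp_id, age) VALUES (?, ?)", (1, 34))

# A wrong datatype and an out-of-range value are both rejected.
rejected = []
for bad_age in ("abc", 250):
    try:
        conn.execute("INSERT INTO employee (emp_id, age) VALUES (?, ?)", (2, bad_age))
    except sqlite3.IntegrityError:
        rejected.append(bad_age)

row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```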
each table should consist of what 2 things?
primary key, set of mutually independent attributes
safety-level checks through the normal forms
(least important –> most important)
1NF –> 2NF –> 3NF –> 4NF –> 5NF
Fifth Normal Form (5NF) and domain-key normal form are defined but not . . .
generally enforced