Lecture 5 - A Flashcards
What is normalization?
- Goal of removing any possibility of redundancy when designing a database.
If the data has data redundancy and is not normalized, then:
- It it is difficult to trust and updated it without facing data anomalies.
- Insertion, update and deletion anomalies are very frequent if a database is not normalised.
Definition: Direct Redundancy
If data in the database exist in two or more places.
Data Redundancy
If data can be calculated from other data items.
Data Integrity
The data in the database is consistent and satisfy integrity constraints.
Why is redundancy a bad thing?
When you modify data in case of redundancy, you must do so in more than one place, opening the possibility of data becoming inconsistent across the database.
Definition: Primary Key
A column or set of columns that identify a particular row in a table.
Minimum candidate key.
Super Key
Single key or set of multiples keys that identifies rows of a table. Super key is a superset of a candidate key. It is any group of columns that have unique values together.
Candidate Key
A set of attributes which identify tuples of a table uniquely. It is essentially a super key without repeated attributes.
Let us say we have several columns that individually have unique rows each: Any of them can therefore be a candidate key.
If any proper subset of a super key is a super key then that key cannot be a candidate key.
Foreign Key
It is an attribute in a table which is used to define its relationship with another table.
If you attempt to change the value of a foreign key in another table, the database will not allow it.
Composite and Compound key
A composite key is derived from a combination of two or more columns that combined make a unique column, which individually does not provide uniqueness.
Alternate keys
Candidate keys that are not primary keys.
Functional dependency
Knowing the value of attribute A you can immediately look up the value of attribute B. eg: If you have the primary key, you can look up any other column-value in that row.
Categorical Data
Collection of data that is divided into distinct groups.
1NF
4 basic rules:
- Each column must be single-valued
- A columns should contain values that are of the same type. Make sure other columns don’t hold the same type.
- Each column should have a unique name.
- Order in which data is saved doesn’t matter.
2NF
2 rules:
- Respect 1NF
- Should not have partial dependency (each non key field must reference the same thing as the primary key)
- Each non key field must depend on the entire primary key.
Creating another table would solve the issue.
3NF
2 rules:
- Respect 2NF
- No transitive dependency on the primary key.
- Each non prime attribute in a table should depend on every candidate key, it should never depend non part of a candidate key, and it should never depend on other non-prime attributes.
Trick: Can you tell the attribute of that column from the primary key only? or can you get that information from other columns? non prime to non prime derivation.
Boyce-Codd NF
A stronger version of 3NF
- Each attribute in a table should depend on every candidate key, it should never depend non part of a candidate key, and it should never depend on other non-prime attributes.
4NF
- 3NF
- Multivalued dependencies in a table must be multivalued dependencies on the key.
5NF
- Can the table be the result of a few inner joins? Then it is not in 5NF.
- In order to be in 5NF, a table must be in its intact state so that it is not able to be divided.
- If it is able to be divided then it does not even repeat 4NF in the first place.
- Car example, buying pieces from a company. (SPC TABLE)
Denormalisation
Addresses a fundamental fact in databases: Read and join operations are slow.
Pros and cons of Denormalisation:
Pros:
- Faster data reads
- Simpler queries
- Data available quickly
Cons:
- Slower writes
- More database complexity
- Requires more storage
Denormalisation properties:
- Maintaining history (keeps old values stored)
- Improving query perfomance
- Speeding up reporting
- Computing commonly-needed values up front