Normalisation Flashcards
3
Normalisation
It’s the process of organizing data efficiently to avoid redundancy and maintain accuracy.
Normal forms
Normalization is divided into different “normal forms” (NF) to guide the process of organizing data efficiently.
First Normal Form (1NF)
Rule: Eliminate repeating groups of data and ensure each column holds atomic values (indivisible).
Example: Consider a table for storing customer orders. Instead of having multiple columns for different items ordered (e.g., Item1, Item2, Item3), each order should have its own row with a separate column for each item ordered
Second Normal Form (2NF)
Rule: Meet the requirements of 1NF and ensure that non-key attributes depend on the entire primary key.
Example: Continuing with the customer orders example, if we have a composite primary key (e.g., OrderID and ProductID), each non-key attribute (e.g., OrderDate, ProductPrice) should depend on both the OrderID and ProductID together, not just one of them.
Third Normal Form (3NF)
Rule: Meet the requirements of 2NF and ensure that non-key attributes depend only on the primary key, not on other non-key attributes.
Example: Extending the customer orders table, if we have a non-key attribute like CustomerName, it should depend solely on the OrderID (primary key), not on any other attributes like ProductID.
Redudancy
Redundancy refers to the unnecessary repetition or duplication of information within a dataset or system.
Update anomalies
Insertion
Deletion
Modification
Insertion anomalies
It means you can’t insert new data into a database without having to provide additional, possibly unnecessary information.
Deletion anomalies
Happen when removing data from a database leads to unintentional loss of other related data.
Modification anomalies
When making changes to data in a database leads to inconsistencies or errors.
Insertion anomalies example
Imagine you have a table to store information about students and the courses they are enrolled in. Each row represents a student-course combination. Let’s say the table has columns for StudentID, StudentName, CourseID, and CourseName.
An insertion anomaly might occur if you try to add a new student who hasn’t enrolled in any courses yet. Since the table is designed to store both student and course information together, you can’t insert the new student without also specifying at least one course they are enrolled in. This forces you to provide unnecessary course information just to add the new student
Deletion anomalies examples
Imagine you have a table storing employee information, including their department. Each row represents an employee, with columns for EmployeeID, EmployeeName, and Department. Now, let’s say an employee leaves the company, and you delete their row from the table.
If this employee was the only one in their department, deleting their record would also remove the department information from the table. This unintended loss of department data is a deletion anomaly because you didn’t mean to lose department information, but it happened because it was tied to the employee being deleted.
To prevent deletion anomalies, you might need to redesign the database to store department information separately or handle deletion operations more carefully to ensure related data isn’t lost unintentionally.
Modification anomalies example
Imagine you have a table storing customer orders, including order details like OrderID, CustomerName, and ProductName. Now, let’s say a customer changes their name, and you need to update it in the database.
If you only update the customer’s name in one place (e.g., in the table where orders are stored), you might end up with inconsistencies if the same customer’s name appears in other tables. For instance, if the customer’s name is also stored in a separate table for customer information, and you forget to update it there, you’d have inconsistent data. This inconsistency is a modification anomaly because the modification (updating the customer’s name) didn’t propagate properly across all relevant places.
Transitive dependency
A relationship between between three columns: A (the
PK), B and C.
Hence, a transitive dependency occurs when a non-PK
column is functionally dependent on another non-PK
column