Normalisation Chapter Flashcards
What does normalisation ensure?
No data redundancy
Thus removing the possibility of update anomalies
Normalisation helps to identify a suitable set of relations to represent data in the database
A stronger definition of 3NF was subsequently introduced
Boyce-Codd Normal Form
Database schema
Consists of a group of relations
Relation
Consists of a set of attributes
When the data requirements of an organisation are identified, how are these attributes grouped into suitable relations?
The common sense of the database designer.
By mapping ER diagrams onto relations.
Functional dependencies
A functional dependency is a constraint between two sets of attributes in a relation: the values of one set determine the values of the other
Let R be
NewStudent(stuId, lastName, major, credits, status, socSecNo)
FDs in R include
{stuId}→{lastName}, but not the reverse
{stuId} →{lastName, major, credits, status, socSecNo, stuId}
{socSecNo} →{stuId, lastName, major, credits, status, socSecNo}
{credits}→{status}, but not {status}→{credits}
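The dependencies listed above can be tested against sample data. The following is a minimal sketch, assuming a hypothetical helper `fd_holds` and invented NewStudent rows; note that sample rows can only refute a dependency, since whether it truly holds depends on the meaning of the data.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in rows: equal lhs values imply equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical sample rows for NewStudent (values invented for illustration).
new_student = [
    {"stuId": "S1", "lastName": "Smith", "major": "Maths", "credits": 90, "status": "Senior", "socSecNo": "111"},
    {"stuId": "S2", "lastName": "Smith", "major": "CSC", "credits": 15, "status": "Freshman", "socSecNo": "222"},
    {"stuId": "S3", "lastName": "Jones", "major": "Art", "credits": 95, "status": "Senior", "socSecNo": "333"},
]

print(fd_holds(new_student, ["stuId"], ["lastName"]))   # True
print(fd_holds(new_student, ["lastName"], ["stuId"]))   # False: two students named Smith
print(fd_holds(new_student, ["credits"], ["status"]))   # True
print(fd_holds(new_student, ["status"], ["credits"]))   # False: two Seniors with different credits
```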
Normalisation
Is the process of testing the correctness of a logical data model
How are attributes classed?
Key attributes
Non-key attributes
Normalisation is a formal method
It identifies relations based on:
Primary key and the functional dependencies between their attributes
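Update anomalies
Relations that hold redundant data may suffer from update anomalies. Normalisation minimises data redundancy, which avoids these anomalies and also reduces file storage space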
What are the three categories of update anomalies
Insertion anomalies
Delete anomalies
Modification anomalies
Insert a branch that currently has no members of staff into the staffBranch relation
To do this you must enter NULL in the staff attributes, but Staff_No is the primary key and a primary key may not be NULL
If we remove a staff member from the StaffBranch relation we also remove information about the branch at which they work.
If the staff member happened to be the last member at this branch
What will happen?
We lose all details of that branch from the database
Change the telephone number for branch B3 in the StaffBranch relation
We must update the rows of all staff located at branch B3
If some of the rows are not updated, this results in inconsistent data
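The insertion and modification anomalies described above can be reproduced with a small in-memory table. This is a sketch only: the column names other than Staff_No, and all the row values, are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical un-normalised StaffBranch relation: branch details repeat on every staff row.
conn.execute(
    "CREATE TABLE StaffBranch ("
    "Staff_No TEXT PRIMARY KEY NOT NULL, name TEXT, branchNo TEXT, branchTel TEXT)"
)
conn.executemany(
    "INSERT INTO StaffBranch VALUES (?, ?, ?, ?)",
    [("S1", "Ann", "B3", "555-0100"),
     ("S2", "Bob", "B3", "555-0100"),
     ("S3", "Cat", "B5", "555-0200")],
)

# Insertion anomaly: a branch with no staff needs a NULL primary key, which is rejected.
try:
    conn.execute("INSERT INTO StaffBranch VALUES (NULL, NULL, 'B7', '555-0300')")
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True

# Modification anomaly: only one of the two B3 rows is updated.
conn.execute("UPDATE StaffBranch SET branchTel = '555-0199' WHERE Staff_No = 'S1'")
b3_numbers = [row[0] for row in conn.execute(
    "SELECT DISTINCT branchTel FROM StaffBranch WHERE branchNo = 'B3' ORDER BY branchTel")]

print(insert_rejected)  # True
print(b3_numbers)       # ['555-0100', '555-0199'] -> branch B3 now has two phone numbers
```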
A -> B
What does this tell us?
A is said to be the determinant
B is said to be the dependent
Unnormalised form (UNF)
Contains one or more repeating groups
Attribute values are non-atomic
A relation is in 1NF if
It contains no repeating groups
All non-key attributes are functionally dependent on the primary key
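Removing a repeating group can be sketched as flattening a nested record into one row per value. The attribute names and values below are hypothetical.

```python
# Unnormalised records: staffNos is a repeating group (a non-atomic value).
unf = [
    {"branchNo": "B3", "staffNos": ["S1", "S2"]},
    {"branchNo": "B5", "staffNos": ["S3"]},
]

# 1NF: one row per (branchNo, staffNo) pair, every value atomic.
first_nf = [
    {"branchNo": record["branchNo"], "staffNo": staff_no}
    for record in unf
    for staff_no in record["staffNos"]
]

for row in first_nf:
    print(row)
```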
A relation is in 2NF if
It is in 1NF
All non-key attributes are fully functionally dependent on the primary key
B is fully functionally dependent on A if
B is functionally dependent on A but not on any proper subset of A
B is partially dependent on A if
some attribute can be removed from A and the dependency still holds
A relation is in 3NF if
It is in 2NF
No non-key attribute is transitively dependent on the primary key
If A,B and C are attributes of a relation and
A->B and B->C
Then C is transitively dependent on A via B
Key attributes/non-key attributes example
Key attributes: A key attribute is an attribute that forms part of a key, so it helps to identify each entity uniquely. For example, stuId is a key attribute of NewStudent
Non-key attributes: Non-key attributes are attributes that are not part of any key, for example first name, last name and birth date
Full functional dependency example
Definition: A full functional dependency occurs when you already meet the requirements for a functional dependency and the set of attributes on the left side of the functional dependency statement cannot be reduced any further
Examples: For example, “{SSN, age} -> name” is a functional dependency, but it is not a full functional dependency because you can remove age from the left side of the statement without impacting the dependency relationship.
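The check in the example above (can an attribute be dropped from the left side?) can be sketched with an attribute-closure computation. The function names are hypothetical, and the only dependency assumed is {SSN} → {name}.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_full_dependency(lhs, rhs, fds):
    """True if lhs -> rhs holds and no proper, non-empty subset of lhs determines rhs."""
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    if not rhs <= closure(lhs, fds):
        return False  # not a functional dependency at all
    for size in range(1, len(lhs)):
        for subset in combinations(sorted(lhs), size):
            if rhs <= closure(subset, fds):
                return False  # a smaller left side is enough: partial dependency
    return True

fds = [(frozenset({"SSN"}), frozenset({"name"}))]
print(is_full_dependency({"SSN", "age"}, {"name"}, fds))  # False: age can be removed
print(is_full_dependency({"SSN"}, {"name"}, fds))         # True
```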
Transitive dependency example
A transitive dependency is a functional dependency which holds by virtue of transitivity. A → C is a transitive dependency when all of the following hold:
A → B
It is not the case that B → A
B → C
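The three conditions above can be checked mechanically over a set of single-attribute dependencies. This is a minimal sketch; the function name and the staffNo, branchNo and branchTel attributes are assumptions for illustration.

```python
def transitive_dependencies(fds):
    """fds is a set of (determinant, dependent) pairs of single attributes.

    Returns triples (A, B, C) where A -> B, B -> C and B does not determine A,
    so C depends on A transitively via B.
    """
    found = []
    for a, b in sorted(fds):
        for b2, c in sorted(fds):
            if b2 == b and c != a and (b, a) not in fds:
                found.append((a, b, c))
    return found

fds = {("staffNo", "branchNo"), ("branchNo", "branchTel")}
print(transitive_dependencies(fds))  # [('staffNo', 'branchNo', 'branchTel')]
```

A relation holding such a triple is not in 3NF if A is the primary key; moving B and C into their own relation removes the transitive dependency.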
Insertion anomaly
The user is unable to insert a new record when it should be possible to do so, because not all of the other information is available
Deletion anomaly
When a record is deleted, other information that is tied to it is also deleted
Update anomaly
A record is updated, but other appearances of the same item are not updated
Anomaly
An anomaly is an inconsistent, incomplete, or contradictory state of the database
Database
A database is a collection of records stored on some type of media. Storage in the past has included punch cards, paper tape, magnetic tapes and disks.
Advantages and limitations of a database
Advantages
Reduced data redundancy
Reduced updating errors and increased consistency
Greater data integrity and independence from applications programs
Improved data access to users through use of host and query languages
Improved data security
Reduced data entry, storage, and retrieval costs
However, the following can be viewed as some of the limitations of a database:
Disadvantages
Database systems are complex, difficult, and time-consuming to design
Substantial hardware and software start-up costs
Damage to database affects virtually all applications programs
Extensive conversion costs in moving from a file-based system to a database system
Initial training required for all programmers and users
Stages in Creating a Database
Data analysis, physical implementation
An entity
An entity is an instance of a given entity type
An entity occurrence example
An entity occurrence is an instance of an entity
e.g. Billy Jones (i.e. SN12345, Billy, Jones, 18/08/1950)
Attribute
An attribute is an item of information which is stored about an entity
Entity Integrity
Entity integrity is a basic constraint of the relational model (RM) that concerns the form of the primary key: every relation must have a primary key, and its values must be unique and never NULL.
Referential Integrity
Referential integrity is a relational database concept, which states that table relationships must always be consistent. In other words, any foreign key field must agree with the primary key that is referenced by the foreign key.
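Referential integrity can be seen in action with Python's built-in sqlite3 module. This is a sketch under assumed table and column names (Branch, Staff, branchNo, staffNo); SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # foreign key enforcement is off by default in SQLite
conn.execute("CREATE TABLE Branch (branchNo TEXT PRIMARY KEY NOT NULL)")
conn.execute(
    "CREATE TABLE Staff ("
    "staffNo TEXT PRIMARY KEY NOT NULL, "
    "branchNo TEXT REFERENCES Branch(branchNo))"
)
conn.execute("INSERT INTO Branch VALUES ('B3')")
conn.execute("INSERT INTO Staff VALUES ('S1', 'B3')")  # B3 exists: accepted

try:
    conn.execute("INSERT INTO Staff VALUES ('S2', 'B9')")  # B9 does not exist
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True: the foreign key must match an existing primary key
```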
Cardinality (of a column)
The uniqueness of the data values contained in a particular column (attribute) of a database table
Participation
A relationship instance is two entities of one or two types associated by virtue of a defined relationship between them
Structural independence
Structural independence exists when it is possible to make changes in the file structure without affecting the application program's ability to access the data
Relational model
The relational model (RM) for database management is an approach to managing data using a structure and language consistent with first-order predicate logic, first described in 1969 by Edgar F. Codd.[1][2] In the relational model of a database, all data is represented in terms of tuples, grouped into relations.
The purpose of the relational model is to provide a declarative method for specifying data and queries: users directly state what information the database contains and what information they want from it, and let the database management system software take care of describing data structures for storing the data and retrieval procedures for answering queries.
Phantom Reads
Phantom reads occur when an insert or delete action is performed against a row that belongs to a range of rows being read by a transaction.
For example, an editor makes changes to a document submitted by a writer, but when the changes are incorporated into the master copy of the document by the production department, they find that new unedited material has been added to the document by the author.
Locking Protocol
A locking protocol is a set of rules followed by all transactions while requesting and releasing locks. Locking protocols restrict the set of possible schedules.
Deadlock: how it is avoided and prevented
In a database, a deadlock is a situation in which two or more transactions are waiting for one another to give up locks. For example, Transaction A might hold a lock on some rows in the Accounts table and need to update some rows in the Orders table to finish, while Transaction B holds locks on those Orders rows and needs the Accounts rows held by Transaction A. Neither transaction can proceed.
Avoided: a request is granted only if it cannot lead to deadlock. To decide this, the system needs to know:
the resources currently available;
the resources currently allocated to each process;
the resources that will be required and released by these processes in the future.
Prevented: to prevent any deadlock situation in the system, the DBMS aggressively inspects all the operations that transactions are about to execute
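The mutual waiting in the deadlock definition above corresponds to a cycle in a wait-for graph. The sketch below shows deadlock detection, a different technique from the avoidance and prevention approaches on this card; the function name and transaction labels are hypothetical.

```python
def has_deadlock(waits_for):
    """waits_for maps each transaction to the set of transactions it is waiting on.

    Returns True if the wait-for graph contains a cycle (depth-first search).
    """
    visiting, done = set(), set()

    def visit(txn):
        if txn in done:
            return False
        if txn in visiting:
            return True  # reached a transaction already on the current path: cycle
        visiting.add(txn)
        for other in waits_for.get(txn, ()):
            if visit(other):
                return True
        visiting.discard(txn)
        done.add(txn)
        return False

    return any(visit(txn) for txn in list(waits_for))

print(has_deadlock({"A": {"B"}, "B": {"A"}}))  # True: A and B wait on each other
print(has_deadlock({"A": {"B"}, "B": set()}))  # False: B can finish, then A
```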
Entity
Is an instance of a given entity type
Entity type
Is a category of thing or object, for example students or horses
Relationships
A relationship is an association between entities
Transaction and its properties
A transaction symbolizes a unit of work performed within a database management system (or similar system) against a database
Properties: ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee that database transactions are processed reliably.
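Atomicity, the first ACID property, can be illustrated with Python's sqlite3 module: when one statement in a transaction fails, the earlier statements are rolled back too. The Accounts schema and balances here are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Accounts ("
    "id TEXT PRIMARY KEY NOT NULL, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO Accounts VALUES (?, ?)", [("A", 100), ("B", 0)])
conn.commit()

try:
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.execute("UPDATE Accounts SET balance = balance + 150 WHERE id = 'B'")
        # Taking 150 from A would leave a negative balance, violating the CHECK constraint.
        conn.execute("UPDATE Accounts SET balance = balance - 150 WHERE id = 'A'")
except sqlite3.IntegrityError:
    pass

balances = dict(conn.execute("SELECT id, balance FROM Accounts ORDER BY id"))
print(balances)  # {'A': 100, 'B': 0}: the credit to B was undone as well
```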
Lost updates examples
Lost updates occur when two or more transactions select the same row and then update the row based on the value originally selected. Each transaction is unaware of other transactions. The last update overwrites updates made by the other transactions, which results in lost data
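For example, two editors make an electronic copy of the same document. Each changes their copy independently and saves it over the original, so the editor who saves last overwrites the changes made by the other.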
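The interleaving that causes a lost update can be traced by hand. In this sketch a dictionary stands in for the shared row, and the amounts are invented.

```python
# A shared row; both transactions read it before either writes.
account = {"balance": 100}

t1_read = account["balance"]  # T1 reads 100
t2_read = account["balance"]  # T2 reads 100, unaware of T1

account["balance"] = t1_read + 50  # T1 writes 150
account["balance"] = t2_read - 30  # T2 writes 70, overwriting T1's update

print(account["balance"])  # 70, whereas running T1 then T2 serially would give 120
```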
Uncommitted Dependency
Uncommitted Dependency (Dirty Read): Uncommitted dependency occurs when a second transaction selects a row that is being updated by another transaction. The second transaction is reading data that has not been committed yet and may be changed by the transaction updating the row. For example, an editor is making changes to an electronic document.
Inconsistent Analysis
Inconsistent Analysis (Nonrepeatable Read): Inconsistent analysis occurs when a second transaction accesses the same row several times and reads different data each time. Inconsistent analysis is similar to uncommitted dependency in that another transaction is changing the data that a second transaction is reading. However, in inconsistent analysis, the data read by the second transaction was committed by the transaction that made the change. Also, inconsistent analysis involves multiple reads (two or more) of the same row and each time the information is changed by another transaction; thus, the term nonrepeatable read. For example, an editor reads the same document twice, but between each reading, the writer rewrites the document.
Attributes
An entity is characterised by a number of attributes
Cardinality
Cardinality concerns the number of instances of an entity involved in a relationship
Weak entity type
Is an entity type whose existence depends on the existence of another entity type
Participation (give an example)
A relationship instance is two entities of one or two types associated by virtue of a defined relationship between them
Eg. Catherine Horgan cares people’s
Mandatory membership example
Membership of an entity type in a relationship is mandatory if each entity of a type must participate in an instance under that relationship
Eg. A lecturer teaches at least one module
Optional membership example
Membership of an entity type in a relationship is optional if entities of a type can exist without participating in an instance under that relationship
Ex. A lecturer may teach some module.
Ex. A lecturer may not teach any modules
Data independence
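Data independence is the type of data transparency that matters for a centralized DBMS. It means that user applications are insulated from changes made to the definition and organisation of the data
Why normalise a relational schema?
It helps to identify a suitable set of relations to represent the data in the database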
2PL locking protocol
Two-phase locking (2PL) is a concurrency control method that guarantees serializability
Under the 2PL protocol, locks are acquired and released in two phases:
Expanding phase: locks are acquired and no locks are released.
Shrinking phase: locks are released and no locks are acquired.
Two types of locks are utilized by the basic protocol: Shared and Exclusive locks. Refinements of the basic protocol may utilize more lock types. Using locks that block processes, 2PL may be subject to deadlocks that result from the mutual blocking of two or more transactions.
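The two-phase rule can be sketched as a small class that refuses to acquire a lock once any lock has been released. This is an illustration of the rule for a single transaction only: the class and item names are hypothetical, and shared versus exclusive modes and blocking between transactions are not modelled.

```python
class TwoPhaseTransaction:
    """Tracks one transaction's locks and enforces the expanding/shrinking rule."""

    def __init__(self):
        self.held = set()
        self.shrinking = False  # becomes True at the first release

    def acquire(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: cannot acquire a lock after releasing one")
        self.held.add(item)

    def release(self, item):
        self.shrinking = True  # the shrinking phase has begun
        self.held.remove(item)

txn = TwoPhaseTransaction()
txn.acquire("x")  # expanding phase
txn.acquire("y")
txn.release("x")  # shrinking phase starts here

try:
    txn.acquire("z")
    violated = False
except RuntimeError:
    violated = True

print(violated)          # True: acquiring after a release breaks the protocol
print(sorted(txn.held))  # ['y']
```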
SQL
SQL (Structured Query Language) is a standard interactive and programming language for getting information from and updating a database
Database management system
A database management system (DBMS) is system software for creating and managing databases