1.3.2 Databases Flashcards
Flatfile database
- consists of only a single file [1]
- requires little expertise to maintain [1]
- may have redundant data which can lead to increased storage requirements [1]
- flat file database is harder to update so data might be inconsistent [1]
Relational database
- a relational database is one that uses different tables for different entities
- consists of linked tables [1]
- maintains data integrity [1]
Entity
an item of interest about which information is stored
Primary key
- a unique identifier for each record [1]
- used to establish relationships between tables
- can be used to fetch any record from a table
Foreign key
- the field/column which links two tables together [1]
- it is the field/column that refers to the primary key in another table [1]
Secondary key
- specifying another attribute to be indexed for faster search [1]
- an indexed field that is not unique
- the index consumes additional storage space in the database [1]
Entity- relationships modelling
- one -to one
- one - to many
- many to many
One - to - one
Each entity can only be linked to one other entity
One- to- many
One table can be associated with many other tables
Many-to-many
Many tables can be associated with many other tables
Normalisation
- the process of coming up with the best possible layout for a relational database
- normalisation tries to accomplish the following :
- no redundancy so data integrity is maintained
- consistent data throughout linked tables
- records can be added and removed without issues
- complex queries can be carried out
First normal form 1NF
(To get to 1NF a tables must follow 5 rules)
- all field names must be unique ( to avoid confusion about which record should be retrieved or updated)
- values in fields must be from the same domain (e.g data type)
- values in fields should be atomic so for every cell there must only be a single value
- no two records can be identical
- each table needs a primary key
No duplicate records in 1NF
this is because duplicate records take up unnecessary space and can cause inconsistencies if updates are not performed on all of them
Second normal form (2NF)
- the data is already in 1NF
- any partial dependencies have been removed
- meaning no fields can depend on only part of a composite primary key
- so each table should serve its own single purpose
Partial dependency
- Means one of more fields depend on only part of the primary key
- this issue can arise if the primary key is a composite key
Fixing a many to-many relationship
- create a linking table
- assign primary key from the 2 initial tables as the composite primary key for the new linking table
- this flips the M:M relationship to become to 1:M relationships
Third normal form (3NF)
- data is already in 2NF
- any non-key dependencies are removed
- all keys are dependant on the primary key, the whole primary key and nothing but the primary key
Non key dependencies
When the value of a field depends on the value of another field which is not the primary key
Indexing
- creates a Data structure on specific fields/columns allowing the database engine to quickly locate and access relevant records during queries
- primary key is automatically indexed
- adding an index to a field speeds up searches on that field which would otherwise have to be searched sequentially
Indexing advantages
- allows faster search + data retrieval from the indexed field[1]
Indexing disadvantages
- consume additional storage space
- for small tables, benefits of indexing may be negligible or even counter productive
Handling data
Consists of 4 parts :
- capturing data
- selecting data
- managing data
- exchanging data
Data capturing methods
- paper based forms
- OCR (optical character recognition)
- OMR (optical mark recognition)
- MICR (magnetic ink character recognition)
- sensors
- smart card readers
- barcode scanners
Paper based forms
- involves a human reading the form and manually typing the information into a computer based system
- to avoid errors :
- every part of the form must be labelled clearly
- instructions to complete the form in black pen
- instructions to complete the form in capital letters
- use of tick boxes
- squares for entering each letter separately
OCR
- automatically reads text by interpreting the shape of letters
- works better with printed text than handwriting
- post offices use OCR to read postcodes and route mail
- road cameras use OCR to recognise number plates to identify drivers who are speeding
OMR
- often used for multiple choice tests and lottery tickets
- significantly reduces risk of human error
MICR
Used to scan cheques
Selecting and managing data
- involves only selecting data that meets a certain criteria eg speed camera automatically only photograph vehicles which are exceeding the speed limit
- collected data can be managed using SQL to sort/select data from different tables which match the criteria
- using selected data :
- reports may be produced
- letters sent out by post or email
- new stock items automatically reordered
- records added, updated or deleted
Exchanging data
- the process of transferring data which has been collected
- common way to exchange data is EDI
EDI ( electronic data interchange )
- using standardised message formatting documents can be exchanged electronically
- doesn’t require human interaction
- significantly increases speed of data transmission
- however this means any error in the data will be replicated across multiple systems
Methods of exchanging data
- CSV
- JSON
- XML
- EDI
- SQL
- APIs
SQL
Is a declarative programming language used to manipulate databases
Referential integrity
- ensures that changes across a database are consistent [1]
- if a record is removed, all references to it are removed (cascade delete can be used) [1]
- a foreign key must have a corresponding primary key in another table [1]
- prevents orphaned records
Transaction processing
- a transaction is a single operation executed on data
- each transaction must succeed or fail as a single unit - can never be only partially complete
- eg in bank transfers the transaction is withdrawal from one account and deposit into another
- both operations should happen or neither should happen
ACID
- a set of rules that ensures that transactions are processed reliably and data integrity is maintained
- Atomicity
- Consistency
- Isolation
- Durability
Record Locking
- the technique of preventing simultaneous access to records in a database in order to prevent inconsistencies or loss of updates
- so while one person is editing a record, the record is locked preventing others from accessing the same record until the transaction is complete or cancelled
Record locking benefits
- maintains data consistency and integrity by preventing conflicting records
- allows concurrent read access to records without comprising data consistency
Record locking cons
- a deadlock could occur and this happens when two or more transactions are waiting for each other to release resources causing them to wait indefinitely
Deadlock resolution
- serialisation :
- timestamp ordering
- or commitment ordering may be used
Serialisation
- involves assigning a unique serial order to transactions ensuring that transactions execute one after the other rather than concurrently
Time stamp ordering
- Assigns a timestamp to each transaction based on its start time
- transactions are then ordered based on the timestamps
- if a deadlock is detected, the DBMS can compare the timestamps of conflicting transactions
- the transaction with the lower timestamp is rolled back allowing the others to proceed
Commitment ordering
- transactions are ordered in terms of their dependencies on each other as well as the time they were initiated
- can be used to prevent a deadlock by blocking one request until another is completed
Redundancy
- The process of having one or more copies of data in physically different locations
- this means if there is damage to one copy, the others will remain unaffected and can be recovered
Atomicity
- A transaction must be processed in its entirety or not at all [1]
- can never be only partially complete
Consistency
- a transaction must maintain the referential integrity rules between linked tables
- ensuring that data starts with a consistent state and ends with a consistent state
Isolation
- ensures that the outcome of concurrent transactions is the same as if the transactions were completed sequentially [1]
- transactions are isolated from one another preventing conflicts and data inconsistencies
- record locking can be used to ensure isolation
Durability
- ensures that committed data/transaction is not lost even in the case of power cut/ system failure [1]
- durability can be achieved by storing completed transaction in secondary storage [1]