Distributed Data Concepts Flashcards
Transaction
Completes changes successfully or not at all. This lets multiple users access the database without fear of data inconsistencies.
Rollback
Undoes all changes in a transaction due to an error
True or false: A transaction can only have one read operation and one write operation
False! A transaction can have as many operations as needed.
Online Transaction Processing (OLTP)
Real-time access to data
Online Analytical Processing (OLAP)
Involves fewer, more intensive transactions (e.g. banks processing large amounts of money) than OLTP
Lost update
When multiple users attempt a transaction at the same time, only one completes successfully
Dirty read
Data loss (e.g. due to a power cut)
True or false: In the event of data loss, data is always reset to its original value
False! Whether the data is reset depends on whether the changes made were committed beforehand.
Inconsistent analysis
One user reads data while another user is updating it, resulting in a mix of old and new values
Locking
Transactions lock part of the database before updating via a shared lock (read only) or an exclusive lock (read/write)
Timeout
A transaction rolls back after a certain amount of time
Deadlock detection
The smallest transaction is rolled back
Distributed databases
Data is stored in different physical locations
True or false: Distributed databases can lead to performance issues
True! More locations = higher risk of performance issues
CAP Theorem
Consistency - transactions ensure this by rolling back if an error occurs
Availability - every query request gets a response
Partition tolerance - coping with network failures/delays
True or false: A good database needs all CAP aspects
False! According to Brewer’s Theorem, only 2 CAP aspects are needed at one time (one being partition tolerance).
Why is partition tolerance necessary?
Occasional network issues is better than data anomalies
Logging
Transactions are logged, containing original and new values in case of rollback
Checkpoint
Logs which transactions are running, committed, etc.
What are some issues that can damage a database?
o Sudden crash or loss of power to servers
o Hardware problems (e.g. corrupt/broken disks)
o Flood/fire/etc. in server room
o Accidents
o Malicious damage
True or false: In the event of damage, all incomplete transactions must be undone
True! These transactions will have neither COMMIT nor ROLLBACK in the log.
True or false: If the database is damaged, completed transactions must be undone
False! Completed transactions (that weren’t marked as complete at the last checkpoint) are redone in case the DBMS didn’t finish writing the changes to the disk.
Where should archives and log files be stored?
At a different location to the database, preferably not on a server
Why is indexing important?
It makes query processing more efficient; without it, a DBMS would have to search a whole table to look for data.
Clustered index
Controls how data is stored on a disk
Non-clustered index
Contains a pointer (row locator) to each row