Distributed data concepts Flashcards
what is a transaction
a logical unit of work which either completes in its entirety or not at all, its aim is to keep the database consistant
what do transactions end with
COMMIT – commit the changes and successfully end
ROLLBACK – an error has occurred so undo everything from the transaction
Programmatic SQL – normal termination of the program is a COMMIT
Programmatic SQL – abnormal/error termination is a ROLLBACK
ACID properties of a transaction
Atomicity, Consistency, Isolation, Durability
What does Atomicity mean
completes in its entirety or not at all
what does consistency mean
the database must be in a consistent state - a DBMS or application developers need to ensure this
what does Isolation mean
each transaction executes independently of others
what does durability mean
changes made by a transaction must persist - recovery systems must ensure this
OLTP
Online transaction processing
Quick real-time access to data to read or modify it
OLAP
Online Analytical Processing
involves fewer, more intensive transactions than OLTP
Problems with transactions
The lost update problem
The dirty read or uncommitted dependency problem
the inconsistent analysis problem
The lost update problem
an update done to a data item by a transaction is lost as it is overwritten by the update done by another transaction
The dirty read problem (uncommitted dependancy problem)
A dirty read occurs when a transaction reads data that has not yet been committed
The inconsistent analysis problem
when one user is reading the several pieces of data when another is updating them data the user reading the data may end up with a mix of old and new values
How to fix the problems with transactions
making all transactions run serially, this causes performance issues if only one transaction is run at once
read operations can be run in parallel
if a transaction is writing to one part of the database, it can still access other parts which arent affected by the update
what is locking
transactions lock part of the database before updating
what is a shared lock
the data item can only be read by the transaction
what is an exclusive lock
read or write
what is a deadlock
it is possible that 2 transactions are each waiting for the other to release a lock
how can a deadlock be resolved
timeouts - transaction rolls back after a certain amount of time
deadlock detection - rollback the transaction which would cost the least to stop
deadlock prevention - try look for the problem in advance (not common as is tricky to do)
what is a distributed database
the data is stored in different physical locations but the DBMS makes it invisible to the end-user
this causes more performance issues but has more storage
CAP theorem
Consistency
Availability
Partition tolerance
Consistency(CAP)
the database must be in a consistent state, transactions ensure this by rolling back if an error occurs when in an inconsistent state
availability
every query request gets a response
partition tolerance
distributed databases can cope with network failures / network delays
Brewers theorem
Can only expect to have 2 of these CAP aspects at any given time, partition tolerance is a necessity, therefore designers have to trade-off between consistency and availability
logging
the database keeps a log of all transactions made, including before and after values,
checkpoints
these are made periodically noting which transactions are running, which are committed, etc. transactions are suspended as the checkpoint is made.
All shown as committed are fully complete and written to the disk
archiving
databases are regularly archived by its administrator into offline storage
what issues can a database be suddenly hit by
Sudden crash or loss of power to servers
Hardware problems (e.g. corrupt/broken disks)
Flood/fire, etc. (in the server room)
Accidental issues (e.g. bug in a program accessing the DB, the user doing something silly)
Malicious damage
how can database be recovered
using the log file, as it shows what transactions have been made.
any incomplete transactions need to be undone
completed transactions will need to be redone in case the DBMS had not finished writing the changes to the disk
transactions marked as complete at the last checkpoint don’t need to be redone
Depending on the problem, the database may need to be reloaded from the previous archive before the log file entries are redone
where should the log file be stored/archived
on a different storage medium to the database, archive has to be stored somewhere other than the server location