Reliability Management Flashcards
Reliability manager
Responsible for atomicity and durability, two of the ACID properties.
Implements these transactional commands:
- begin transaction (B)
- commit work (C)
- rollback work (A, abort)
Recovery primitives:
- cold restart
- warm restart (main memory failures)
Interacts with the buffer manager to ensure the reliability of read/write requests, and may generate additional read/write requests for reliability purposes.
It exploits the log file: a log of the DBMS activity stored in stable memory.
It prepares data for recovery by using checkpoints and dumps.
ACID
Atomicity: changes to multiple data items are made as if they were a single operation -> either all changes are performed or none.
Consistency: any change must be consistent with the rules (from constraints, triggers, cascades), e.g. to avoid creating money out of thin air (in banks).
Isolation: every transaction is executed in an isolated and independent way.
Durability: after the transaction is committed the changes endure.
Stable memory
Memory that is resistant to failure:
- it does not exist in real life
- approximated by robust write protocols and redundancy
Failures in stable memory are THE END OF THE F*CKING WORLD (catastrophic).
Log file
Records transaction activities in chronological order.
Two types of record:
- transaction
- system
Writing:
- records are written in current block in sequential order
- records belonging to different transactions are interleaved
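A minimal sketch of such a log, assuming a simple in-memory list as a stand-in for the stable log. The `LogRecord` fields and the one-letter kinds `I`, `U`, `D`, `CKPT` and `DUMP` are hypothetical names chosen for illustration (only B, C and A come from the commands above); they show how transaction and system records share one sequential, interleaved file.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class LogRecord:
    # Transaction records: "B" begin, "C" commit, "A" abort,
    # "I" insert, "U" update, "D" delete (names are assumptions).
    # System records: "CKPT" checkpoint, "DUMP" dump.
    kind: str
    txn: Optional[str] = None     # owning transaction, None for system records
    obj: Optional[str] = None     # object touched by I/U/D
    before: Any = None            # before state (BS)
    after: Any = None             # after state (AS)

log = []  # append-only: records are kept in chronological order

def append(record):
    """Append a record at the end of the log and return its position."""
    log.append(record)
    return len(log) - 1

# Two transactions whose records are interleaved in the same log.
append(LogRecord("B", "T1"))
append(LogRecord("B", "T2"))
append(LogRecord("U", "T1", "x", before=1, after=2))
append(LogRecord("I", "T2", "y", after=10))
append(LogRecord("C", "T1"))

def records_of(txn):
    """Pick one transaction's records out of the interleaved log."""
    return [r for r in log if r.txn == txn]
```

Keeping the before and after state in the same record is what later makes both undo and redo possible from the log alone.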
Undo/Redo
Undo
insert O -> delete O
update O -> write the before state (BS) of O
delete O -> write the BS of O
Redo
insert O -> write the after state (AS) of O
update O -> write the AS of O
delete O -> delete O
Idempotency property:
Undo or Redo can be applied an arbitrary number of times without changing the outcome:
UNDO(UNDO(A)) = UNDO(A)
REDO(REDO(A)) = REDO(A)
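The rules above can be sketched as two small functions. This is an illustration only: the database is assumed to be a plain dict and a log record a hypothetical tuple `(kind, obj, before, after)`. Because undo and redo write a stored state instead of applying a delta, repeating them changes nothing.

```python
def undo(db, rec):
    """Undo one logged action on db (a dict standing in for the database)."""
    kind, obj, before, after = rec
    if kind == "I":
        db.pop(obj, None)    # undo of insert: delete O (no-op if already gone)
    else:
        db[obj] = before     # undo of update/delete: write the BS of O

def redo(db, rec):
    """Redo one logged action on db."""
    kind, obj, before, after = rec
    if kind == "D":
        db.pop(obj, None)    # redo of delete: delete O (no-op if already gone)
    else:
        db[obj] = after      # redo of insert/update: write the AS of O

# Idempotency check: undoing the same update twice equals undoing it once.
update_x = ("U", "x", 1, 2)
db = {"x": 2}
undo(db, update_x)
once = dict(db)
undo(db, update_x)
twice = dict(db)
```

Idempotency matters because a failure may occur during recovery itself, in which case the same actions are undone or redone again on the next restart.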
Checkpoint
Operation periodically requested by the reliability manager to the buffer manager.
During a checkpoint the DBMS:
- writes the data of committed and aborted transactions to disk (synchronous write, using the force primitive)
- records the set of active transactions
- then writes the checkpoint record, containing the set of active transactions, to the log (using the force primitive)
After the checkpoint, all committed transactions are permanently stored on disk.
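A rough sketch of the checkpoint steps, under strong simplifying assumptions: the buffer is a dict mapping each dirty object to `(value, txn)`, the disk is a dict, a plain assignment stands in for a forced write, and the function name and record layout are hypothetical.

```python
def checkpoint(buffer, disk, log, active):
    """Flush pages of finished transactions, then log the active set.

    buffer: {obj: (value, txn)} dirty pages still in main memory
    active: set of transactions that have neither committed nor aborted
    """
    for obj, (value, txn) in list(buffer.items()):
        if txn not in active:        # page belongs to a finished transaction
            disk[obj] = value        # stands in for a synchronous (force) write
            del buffer[obj]
    # The checkpoint record is written only after the data writes are done.
    log.append(("CKPT", frozenset(active)))

buffer = {"x": (2, "T1"), "y": (10, "T2")}   # T1 has committed, T2 is still active
disk, log = {}, []
checkpoint(buffer, disk, log, active={"T2"})
```

The order is the point of the sketch: if the checkpoint record is in the log, the data of every transaction that finished before it is already on disk.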
Dump
It creates a complete copy of the state of the database:
- performed (typically) when the system is offline
- stored in stable memory (off-line)
- copy may be incremental
At the end of the dump, a dump record is written in the log file:
- date and time of the dump
- dump device used
Writing the log: rules
WAL (write-ahead log)
- The before state of data in a log record is written to stable memory before the database data is written to disk -> allows undo of data already written to disk
Commit precedence
- The after state of data in a log record is written to stable memory before commit -> allows redo of already committed transactions whose data has not yet been written to disk.
In practice:
- The log is written synchronously (force) for data modifications written to disk, and on commit
- The log is written asynchronously for abort and rollback.
Commit record on the log is a milestone:
- If not written in the log, upon failure the transaction should be undone.
- If written, upon failure the transaction should be redone.
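The two rules can be illustrated with a toy model. Everything here is an assumption made for the sketch: the class `MiniDBMS`, its method names, and the split between a log tail in main memory and a stable log are hypothetical, and lists/dicts stand in for real storage.

```python
class MiniDBMS:
    def __init__(self):
        self.stable_log = []   # part of the log already in stable memory
        self.log_tail = []     # log records still only in main memory
        self.disk = {}         # database pages on disk

    def log(self, record):
        """Asynchronous log write: the record stays in main memory for now."""
        self.log_tail.append(record)

    def force_log(self):
        """Synchronous (force) write of all pending log records."""
        self.stable_log.extend(self.log_tail)
        self.log_tail.clear()

    def write_page(self, obj, value):
        # WAL: the log records (with the before state) reach stable
        # memory before the data itself is written to disk.
        self.force_log()
        self.disk[obj] = value

    def commit(self, txn):
        # Commit precedence: after states and the commit record are
        # forced to stable memory as part of the commit.
        self.log(("C", txn))
        self.force_log()

dbms = MiniDBMS()
dbms.log(("B", "T1"))
dbms.log(("U", "T1", "x", 1, 2))
dbms.write_page("x", 2)
log_when_page_written = list(dbms.stable_log)
dbms.commit("T1")
```

If a failure hits after `write_page` but before `commit`, the stable log already holds the before state of `x`, so the write can be undone.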
Writing the log guaranteeing properties
Using robust protocols to guarantee reliability is costly, but it is required to guarantee the ACID properties.
Log writing is optimized through: compact format, parallelism and commit of groups of transactions.
Protocols for writing log and database
- All data writes performed before commit
  - does not require redo of committed transactions
- All data writes performed after commit
  - does not require undo of uncommitted transactions
- Disk writes take place both before and after commit
  - requires both undo and redo operations
  - mixed approach adopted in real systems
Types of failures
Part of recovery management
System failure:
- Caused by software problems or power supply interruptions
- It causes the loss of the main memory content (buffer), but not of the disk (database and log)
Media failure:
- Caused by failure of devices managing secondary memory
- It causes the loss of the database content on disk, but not of the log content (stored in stable storage)
Fail-stop model and recovery
Failure -> system stop
Recovery depends on failure type:
- system failures -> warm restart
- media failures -> cold restart
When recovery ends, the system becomes available again to perform transactions.
Warm Restart, Transaction categories
- Completed (t1) before checkpoint
  - no recovery needed
- Committed (t2, t4), but for which some writes on disk may not have been done yet
  - redo needed
- Active (t3, t5) transactions at the time of failure
  - they did not commit
  - undo is needed
The checkpoint record is not strictly needed to enable recovery, but it makes warm restart faster. Without a checkpoint record, the log must be read from the last dump.
Warm restart algorithm
Algo:
- Read the log backwards until the last checkpoint record
- Detect the transactions which should be undone/redone
  - At the last checkpoint: insert into the UNDO set all active transactions listed in the checkpoint record (the REDO set starts empty)
  - Read the log forward: insert into the UNDO set every transaction whose begin record is found, and move a transaction from the UNDO set to the REDO set when its commit record is found
  - Transactions ending with rollback remain in the UNDO set
- Data recovery
  - The log is read backwards from the time of failure to the beginning of the oldest transaction in the UNDO set
    - all actions of transactions in the UNDO set are undone
    - for each of these transactions the begin record must be reached (even if it is earlier than the checkpoint)
  - The log is read forward from the beginning of the oldest transaction in the REDO set
    - actions of transactions in the REDO set are applied to the database
    - the starting point for each transaction is its begin record
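The algorithm can be sketched as follows. This is an illustration under stated assumptions, not a real recovery manager: the log is a Python list, the database on disk is a dict, and the `Rec` record type with its field names is hypothetical. For brevity the undo and redo passes scan the whole log instead of stopping at the oldest begin record, and with no checkpoint record the sketch simply starts from the beginning of the log.

```python
from typing import Any, NamedTuple, Optional

class Rec(NamedTuple):
    kind: str                        # "B", "C", "A", "I", "U", "D" or "CKPT"
    txn: Optional[str] = None
    obj: Optional[str] = None
    before: Any = None               # before state (BS)
    after: Any = None                # after state (AS)
    active: frozenset = frozenset()  # only used by CKPT records

def warm_restart(log, db):
    # 1. Find the last checkpoint record.
    ckpt = max((i for i, r in enumerate(log) if r.kind == "CKPT"), default=None)
    # 2. Build the UNDO and REDO sets.
    undo_set = set(log[ckpt].active) if ckpt is not None else set()
    redo_set = set()
    for r in log[(ckpt + 1 if ckpt is not None else 0):]:
        if r.kind == "B":
            undo_set.add(r.txn)
        elif r.kind == "C":
            undo_set.discard(r.txn)
            redo_set.add(r.txn)
        # "A" (rollback) records: the transaction stays in the UNDO set.
    # 3a. Undo, reading the log backwards.
    for r in reversed(log):
        if r.txn in undo_set and r.kind in ("I", "U", "D"):
            if r.kind == "I":
                db.pop(r.obj, None)
            else:
                db[r.obj] = r.before
    # 3b. Redo, reading the log forward.
    for r in log:
        if r.txn in redo_set and r.kind in ("I", "U", "D"):
            if r.kind == "D":
                db.pop(r.obj, None)
            else:
                db[r.obj] = r.after
    return undo_set, redo_set

log = [
    Rec("B", "T1"), Rec("U", "T1", "a", 0, 1), Rec("C", "T1"),
    Rec("B", "T2"), Rec("U", "T2", "b", 0, 2),
    Rec("B", "T3"), Rec("U", "T3", "c", 0, 3),
    Rec("CKPT", active=frozenset({"T2", "T3"})),
    Rec("C", "T2"),
    Rec("B", "T4"), Rec("I", "T4", "d", after=4), Rec("C", "T4"),
    Rec("B", "T5"), Rec("U", "T5", "e", 0, 5),
]   # failure happens here

# Disk content at failure: some uncommitted writes are on disk (c, e),
# some committed ones are not (b, d).
db = {"a": 1, "b": 0, "c": 3, "e": 5}
undo_set, redo_set = warm_restart(log, db)
```

In this run T1 needs no recovery, T2 and T4 end up in the REDO set, and T3 and T5 end up in the UNDO set.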
Cold Restart
Manages failures damaging (a portion of) the database on disk.
Steps:
- Access the last dump to restore the damaged portion of the database on disk
- Starting from the last dump record, read the log forward and redo all actions on the database, including transaction commits/aborts
- Perform a warm restart
Alternative to the second and third steps:
- Perform only the actions of committed transactions
- Requires two log reads
  - first read: detect committed transactions and build the REDO list
  - second read: redo the actions of transactions in the REDO list
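A small sketch of the two-read alternative, under these assumptions: the dump is a dict copy of the database, log records are hypothetical 5-tuples `(kind, txn, obj, before, after)`, the log contains at least one `DUMP` record, and no transaction was active while the dump was taken (offline dump).

```python
def cold_restart(dump, log):
    """Rebuild the database from the last dump plus the log that follows it."""
    db = dict(dump)                                   # restore from the dump
    start = max(i for i, r in enumerate(log) if r[0] == "DUMP") + 1
    tail = log[start:]
    # First log read: detect committed transactions (the REDO list).
    committed = {txn for kind, txn, *_ in tail if kind == "C"}
    # Second log read: redo only the actions of committed transactions.
    for kind, txn, obj, before, after in tail:
        if txn not in committed:
            continue
        if kind in ("I", "U"):
            db[obj] = after
        elif kind == "D":
            db.pop(obj, None)
    return db

dump = {"x": 1}
log = [
    ("DUMP", None, None, None, None),
    ("B", "T1", None, None, None),
    ("U", "T1", "x", 1, 2),
    ("B", "T2", None, None, None),
    ("I", "T2", "y", None, 10),
    ("C", "T1", None, None, None),
]
restored = cold_restart(dump, log)
```

T2 never committed, so its insert of `y` is skipped and nothing has to be undone afterwards.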