Checkpoints Flashcards
Checkpoints
Used to declare a point before which the DBMS was in a consistent state, and all transactions were committed.
Advantages of Checkpoints
- faster recovery
- less space used for log-file
Simple checkpointing
Checkpoint the log periodically.
No need to undo transactions before a checkpoint.
Simple Checkpointing process
- Stop accepting new transactions
- Wait until all active transactions finish and have written commit/abort record on log
- Flush the log to disk
- Write log record checkpoint
- Flush again
- Resume accepting transactions
Transactions interrupting checkpoints
Other transactions cannot interrupt a checkpoint, and the transactions within a checkpoint have to finish before we can start anything else.
ARIES Checkpoints
Two things are required for ARIES checkpoints:
- Undo and redo logging
- Transactions do not write to buffer before they are sure they want to commit
Writing ARIES Checkpoints
We write:
<CHECKPOINT(T1, T2, …)>
in the log and flush it, with these transactions not being committed or aborted as they are in progress.
We then write the content of the buffer to disk.
Then we write
<END>
in the log and flush it.
</END>
How writing ARIES checkpoints work
- we have a checkpoint with a set amount of transactions.
- when using undo/redo, we look for the first checkpoint with a corresponding end checkpoint.
- we check that checkpoint’s transactions to see which ones have committed.
- we then undo to the earliest transaction that has not committed.
Writing checkpoints example
If T1 begins before T2, and neither has committed, we start at T1 and undo both.
However, lets say you have T1 and T2 again, and T2 has committed but T1 has not. We then redo the committed transactionsafter the cehckpoint and undo the uncommitted transactions before the checkpoint.
Conflict serialisability v Recovery
CS:
- equivalent to serial schedules
- ensures consistency and correctness
- can be enforced using two-phase locking
Logging and Recovery:
- ensures we can restore desired database states
- undo incomplete transactions, redo complete transactions
- robust; works even after system failure
Dirty reads
The isolation property is not fully enforced for efficiency reasons, which leads to us reading uncommitted transactions.
Can slow the system when an abort occurs, since you have to rollback multiple transactions (since one transaction reads from another uncommitted one, both may have to be rollbacked).
Cascading Rollbacks
Lets say we have two transactions which happen in a serial manner (T1 -> T2). Both commits happen after T2.
If we have to rollback T1, then if T2 has used something from T1, then we also rollback T2.
The issue is that if we do not rollback everything, it leads to isolation problems.
If we do abort them, we are rolling back a committed transaction, leading to durability breaks.
Recoverable Schedules
A schedule s is recoverable if the following is true:
If a transactions T1 commits and has read item X that was written before by a different transaction T2, then T2 must commit before T1.
Recoverable Schedules Example
Lets say we have:
T2 ; Write X
T1; Read X
T2 ; Commit
SYSTEM ERROR
T1; Commit
If we were to perform cascading rollbacks, since they should only effect active transactions, only T1 will be rolled back. This is not an issue since it is performing the read again with the same data since we committed beforehand.
Cascade-less Schedules
Only reads values that were written by committed transactions.
Cascade-less Schedule Properties
- Recoverable
- (In general) not serialisable
- Can lead to deadlocks
Strict Schedules
Each transaction reads and writes values that have already been committed.
Strict Two-Phase Locking
Enforces both:
- Conflict-serialisability
- Strict schedules
A transaction must not release any simple or exclusive lock until it has committed or aborted, and that the commit or abort log has been wrote to disk (does not apply to shared locks).
Conflict Serialisable Schedule Relations
- always serialisable
- can be recoverable
- can be cascade-less
- can be serial
- can be strict
Serialisable Schedule Relations
- can be conflict-serialisable
- can be recoverable
- can be cascade-less
- can be serial
- can be strict
Recoverable Schedule Relations
- can be serialisable
- can be cascade-less
- can be conflict-serialisable
- can be strict
- can be serial
Strict Schedules Relations
- always cascade-less
- always serialisable
- always conflict-serialisable
- always recoverable
- can be serial
Serial Schedules Relations
- always cascade-less
- always serialisable
- always conflict-serialisable
- always recoverable
- always strict
Strict 2PL Deadlock
Strict two-phase locking is still susceptible to deadlock.
Dealing with deadlocks
- Assume a transaction is in a deadlock if it exceeds a time limit.
- Wait-for graphs (restart certain transactions)
- Using timestamps
There are more general ways:
1) Detect deadlock and fix it
2) Prevent deadlock with deadlock free schedules
Timestamps
Each transaction is assigned a unique integer TS(T) upon arrival.
For example, if T1 arrives earlier than T2, we require TS(T1) < TS(T2) (we say that T1 is older than T2).
Does not allow resets.
Wait-Die Scheme
Old transactions will wait for unlocks:
- If t1 is older than t2, then t1 is allowed to wait further for t2 to unlock
- If t1 is younger than t2, then t1 is rollbacked and dies
Wound-Wait Scheme
Old transactions never wait for unlocks:
- If t1 is older than t2, then t2 is rollbacked unless it has finished
- If t1 is younger than t2, then t1 is allowed to wait further for t2 to unlock
Timestamp-based Schedulers
For these schedulers, there are three possible actions per transaction:
- grant request
- abort transaction
- delay transaction
Allows resets.
Maintaining timestamps of transactions for item X
For each database item X, maintain:
- Read time of X: RT(X) -> timestamp of youngest transactions which read X
- Write time of X: WT(X) -> timestamp of youngest transactions that wrote X
Examples of maintaining timestamps
If T1 requests to read X, either:
- abort and restart T1 if WT(X) > TS(T1)
- grant request otherwise
If T1 requests to write X, either:
- abort and restart T1 if WT(X) > TS(T1) or if RT(X) > TS(T1)
- grant request otherwise
Starvation
Can happen if cascading rollbacks occur in timestamp-based schedulers.
If we have:
T1 -> r(X), w(x), r(y)
T2 -> r(y), w(y), r(x)
with TS(T1) = 1, and TS(T2 = 2), we can have starvation occur, with the order being.
Line1(T1)
Line2(T1)
Line3(T1)
Line1(T2)
Line2(T2)
Line3(T2)
Line4(T1)
RESTART
Line1(T1)
Line2(T1)
Line3(T1)
Line1(T2)
RESTART
Timestamp-based Scheduling overall
- Enforces conflict-serialisable schedules
- Prevents deadlocks
- Allows cascading rollbacks
- Allows starvation
Multi-version Concurrency Control
The idea is the same as time-stamp based approaches to scheduling, but you have multiple versions of the database.
So instead of overwriting things, we create a new version of the item at time TS(T_i).
Also, whenever you read something, you read the latest version if the timestamp if bigger.
Rules
- Writes ; abort and restart T1 if RT(X > TS(T1) and otherwise, grant request
- Reads ; always grant.
There is also a strict variant, where you delay reads until the transaction you read from commits.