13. Recovery Flashcards

Question 1

Q

Force/No Force policies - trade-off, goal

Answer

A

The force policy states when a transaction finishes, force all modified data pages to disk before the transaction commits.

Enforces durability but with bad performance (a lot of unnecessary writes).

The no-force polity states to only write back to disk when the page needs to be evicted from the buffer pool.

Performance is great, but enforcing durability is tricky (e.g. redo-logging implementation).

The goal is to implement a no-force policy with good durability.

Question 2

Q

Steal/No-Steal policies - trade-off, goal

Answer

A

The no-steal policy states that pages cannot be evicted from memory (and thus written to disk) until the transaction commits.

Ensures atomicity (no partial writes until commit), but handcuffs how we can use memory.

The goal is to implement a steal policy, which allows modified pages to be written to disk before a transaction finishes.

Ensuring atomicity becomes complex (e.g. implementing undo-logging to rollback incomplete transactions)

Question 3

Q

What is a database log?

Answer

A

A log is a sequence of log records that describe the operations that the database has done.

Question 4

Q

Log record types - describe

Answer

A

UPDATE: for SQL insert/delete/update

fields - {XID, pageID, offset, length, old data, new data}

COMMIT: signifies that a transaction is starting the commit process

ABORT: signifies that a transaction is starting the aborting process

END: signifies that a transaction is finished (usually means that it finished committing or aborting)

Question 5

Q

How log records are stored?

Answer

A

Log records are stored like DB records - combined into log pages. Log pages need to be operated on in memory but need to be written to disk to be stored permanently.

Question 6

Q

Write-Ahead Logging (WAL) - 2 rules

Answer

A

Log records must be written to disk before the corresponding data page gets written to disk. This is how we will achieve atomicity. The intuition for this is that if a data page is written first and then the database crashes we have no way of undoing the operation because we don’t know what operation happened!
All log records must be written to disk when a transaction commits. This is how we will achieve durability. The intuition is that we need to persistently track what operations a committed transaction has performed. Otherwise, we would have no idea what operations we need to redo. By writing all the logs to disk, we know exactly which operations we need to redo in the event that the database crashes before the modified data pages are written to disk!

Question 7

Q

What is LSN (for DB logging)?

What are pageLSN and flushedLSN?

Answer

A

Stands for Log Sequence Number. The LSN is a unique increasing number that helps signify the order of the operations (if you see a log record with LSN = 20 then that operation happened after a record with LSN = 10).

The flushedLSN is a counter in RAM that keeps track of the LSN of the last log record that has been flushed to disk.

The pageLSN is a piece of page metadata that stores the LSN of the operation that last modified the page.

Question 8

Q

Before page i is allowed to be flushed to disk, what inequality must hold? Why?

pageLSNi ___ flushedLSN

Answer

A

≤, This comes from our first rule for WAL - we must flush the corresponding log records before we can flush the data page to disk. A data page is only flushed to disk if the LSN of the last operation to modify it is less than or equal to the flushedLSN. In other words, before page i can be flushed to disk, the log records for all operations that have modified page i must have been flushed to disk.

Question 9

Q

How to abort a transaction using a write-ahead log?

Answer

A

The first thing we will do is write an ABORT record to the log to signify that we are starting the process.

Then we will start at the last operation in the log for that transaction. We will undo each operation in the transaction and write a CLR record to the log for each undone operation.

A CLR (Compensation Log Record) is a new type of record signifying that we are undoing a specific operation. It is essentially the same thing as an UPDATE record (it stores the previous state and the new state), but it tells us that this write operation happened due to an abort.

Question 10

Q

Fill in the equality below to enforce the WAL rule that all the logs must be flushed to disk before a transaction T can commit:

flushedLSN __ lastLSN of T

Answer

A

≥

If the flushedLSN is is greater than the last operation of the transaction then we know all of the logs for that transaction are on disk.

Question 11

Q

What is UNDO logging? Which recovery policy does it implement (if it’s the only recovery mechanism)?

Describe how it works

Answer

A

We undo the effects of all the transactions that have not yet been committed, while not doing so for those that have.

Implements FORCE-STEAL policy

On recovery, we run the recovery manager before accepting any new queries. We scan the log from the end to determine whether each of the transactions is completed or not. The action that we take based on the log record we encounter is as follows:

COMMIT/ABORT T: mark T as completed
UPDATE T, X, old, new: if T is not completed, write X=old to disk, else ignore

… proceed until the start of the log or the latest checkpoint (if we have them)

Question 12

Q

What is REDO logging? Which recovery policy does it implement (if it’s the only recovery mechanism)?

Describe the process

Answer

A

we redo the actions of all the transactions that were committed

NO-STEAL-NO-FORCE policy

On recovery, we run the recovery manager before accepting any new queries. We just read the log from the beginning (or from the latest checkpoint if we have them) and redo all updates of committed transactions.

Question 13

Q

ARIES Recovery Algorithm - main idea, 3 phases

Answer

A

When a database crashes, the only things it has access to are the logs that made it to disk and the data pages on disk. From this information, it should restore itself so that all committed transactions’ operations have persisted (durability) and all transactions that didn’t finish before the crash are properly undone (atomicity). The recovery algorithm consists of 3 phases that execute in the following order:

Analysis Phase: reconstructs the Xact Table and the DPT (dirty page table)
Redo Phase: repeats operations to ensure durability
Undo Phase: undoes operations from transactions that were running during the crash to ensure atomicity

Question 14

Q

What is a transaction table in ARIES recovery protocol?

Answer

A

An in-memory data structure that stores information on the active transactions. The transaction table has three fields:

XID: transaction ID
status: either running, committing, or aborting
lastLSN: the LSN of the most recent operation for this transaction

Question 15

Q

What is a DPT in ARIES recovery protocol?

Answer

A

Dirty Page Table. The DPT keeps track of what pages are dirty (recall from many modules ago that dirty means the page has been modified in memory, but has not been flushed to disk yet). This information will be useful because it will tell us what pages have operations that have not yet made it to disk.

The DPT only has two columns:

Page ID
recLSN: the first operation to dirty the page

Question 16

Q

For a page P that is in the DPT, fill in the following inequality for what must always be true:

recLSN of P __ in memory pageLSN of P

Answer

A

≤

If a page is in the dirty page table then it must be dirty, so the last update must not have made it to disk. The recLSN is the first operation to dirty the page, so it must be smaller than the last operation to modify that page.

Question 17

Q

For a page P that is in the DPT, fill in the following inequality for what must always be true:

recLSN of P __ on disk pageLSN of P

Answer

A

>

If the page is dirty then the operation that dirtied the page (recLSN) must not have made it to disk, so it must have came after the operation that did make it to disk for that page.

Question 18

Q

ARIES analysis phase - goal, describe the steps

Answer

A

The entire purpose of the analysis phase is to rebuild what the Xact Table and the DPT looked like at the time of the crash.

To do this, we scan through all of the records in the log beginning from the start.

On any record that is not an END record: add the transaction to the the Xact Table. Set the lastLSN of the transaction to the LSN of the current record
If the record is a COMMIT or an ABORT record, change the status of the transaction in the Xact Table accordingly
If the record is an UPDATE record, if the page is not in the DPT add the page to the DPT and set recLSN equal to the LSN
If the record is an END record, remove the transaction from the Xact Table.

Question 19

Q

What is checkpointing? Describe the process.

Answer

A

Checkpointing writes the contents of the Xact Table and the DPT to the log. This way on recovery, instead of starting from the beginning of the log, we can start from the last checkpoint.

Checkpointing actually writes two records to the log, a < BEGIN CHECKPOINT > record that says when checkpointing started and an < END CHECKPOINT >record that says when we finished writing the tables to the log.

The tables written to the log can be the state of tables at any point between the < BEGIN CHECKPOINT > and < END CHECKPOINT >. This means we need to start at the < BEGIN CHECKPOINT > because we’re not sure if the records after it are actually reflected in the tables that were written to the log.

Question 20

Q

ARIES REDO phase. Goal. Where to start? Describe the process.

Answer

A

Redo Phase ensures durability. We will repeat history in order to reconstruct the state at the crash. We start at the smallest recLSN in the DPT because that is the first operation that may not have made it to disk. We will redo all UPDATE and CLR operations unless one of the following conditions is met:

The page is not in the DPT. If the page is not in the DPT it implies that all changes (and thus this one!) have already been flushed to disk.
recLSN > LSN. This is because the first update that dirtied the page occurred after this operation. This implies that the operation we are currently at has already made it to disk, otherwise it would be the recLSN.
pageLSN (disk) ≥ LSN. If the most recent update to the page that made it to disk occurred after the current operation, then we know the current operation must have made it to disk.

Question 21

Q

ARIES UNDO phase. Goal. Where to start? Describe.

Answer

A

Undo phase ensures atomicity. The undo phase will start at the end of the log and works its way towards the start of the log. It undoes every UPDATE (only UPDATEs!) for each transaction that was active (either running or aborting) at the time of the crash so that we do not leave the database in an intermediate state. It will not undo an UPDATE if it has already been undone (and thus a CLR record is already in the log for that UPDATE).