Lecture 23-24 Flashcards

Question 1

Q

Why is recovery needed?

Answer

A

Disks do not reflect the true state of the commited database. The contents of main memory could be lost by system crashes, transaction failures, disk failurs or other problems. If there is a failure some transactions will need to be undone or redone.

Question 2

Q

What is the system log file? What entries can there be?

Answer

A

A file maintained on the disk of all changes to the database, it is a sequential, append-only file that is only affected by disk or catastrophic failures.
Entries can be
[Start, Ti]
[Write_item, Ti, X, old_value, new_value]
[Commit, Ti]
[Abort, Ti]

Question 3

Q

What is the commit point of a transaction? What occurs in system failure?

Answer

A

The commit point is when all operations were executed successfully and recorded in the log file. When this happens we need to force write a commit record to the log file. Beyond this the transaction is commited and if system failure occurs a redo must be done, if the commit point was not reached an undo must be done.

Question 4

Q

What occurs in disk block caching? What are some key terms?

Answer

A

disk blocks cached in main memory buffers, then updated, then written back to disk. In place buffering involves writing the buffer to the original disk location.
The before image(BFIM) is the value before updating, the after image(AFIM) is the value afterwards.
In write ahead logging the BFIM is recorded in the log entry and flushed to disk before the BFIM is overwritten with the AFIM on disk.

Question 5

Q

What is a checkpoint? What is involved?

Answer

A

The point at which a record is written into the log and all modified buffers are written to the disk.
They involve: temporarily suspending transaction execution, Force writing contents of the buffers to disk(logging the force-write), write a checkpoint record to log and force write log to disk. Finally we resume transaction execution.

Question 6

Q

What is the general recovery process?

Answer

A

On startup users are locked out and the log file is checked, if a failure occured users remain locked out and the recovery process is performed before activation of users.

Question 7

Q

What are steal/No-steal and Force/No-force? What is most common?

Answer

A

Steal allows writing an updated buffer to the disk before the transaction is commited, No-steal means undos never have to be done.
Force, all pages updated by a transaction are immediately written to disk upon commit. Redo will never be needed if force is done.
However, steal/no-force is the most common as it reduces buffer usage(steal) and I/O usage of Force.

Question 8

Q

What is deferred update? How do we recover a single user?

Answer

A

deferred update is how no steal is done. Postpone any actual updates to the database until successful transaction completion and the commit point is reached. During the transaction, updates are recorded only in the log file and cache buffers. After the transaction reaches its commit point and the log is force-written, the updates are recorded.

To recover we create a commit list: the committed transactions since last checkopint and active list, the active transactions. Search forward from most recent checkpoint, if a start transaction is found add to active, add any write item records to active. If a commit record is found, move it and write items to committed list. When end of log is reached redo all write items records in committed list.

Question 9

Q

What is immediate update? How do we recover?

Answer

A

steal, database updated without needing to wait for commit point. Update operation must still be recorded before being applied. To recover we do the same as for deferred, but we work backwards through active list also, and undo all write item entries.

Question 10

Q

What is a catastrophic failure? What can we do?

Answer

A

Disk failure has occured and system log is lost, we should have database backups, periodically copying the whole database to a cheap storage medium and frequently backing up the system log, with a new log started every database backup.

Question 11

Q

What is the optimisation process for queries? What is typically used for this?

Answer

A

Convert to relation algebra, generate execution plans and the choose the cheapest. A query tree can aid this process, it is a tree that corresponds to a relational algebra expession, order of execution starts at leaves and goes to root. Leaf nodes are the input tables, meaning joins will end up close to bottom.

Question 12

Q

What are some algorithms for simple selection?

Answer

A

linear search, test every record
Binary search, can be done if search involves equality comparison on key attribute in ordered file.
Use primary index or hash key, for when selection condition involves equality on key attribute with primary key or hash.
Use primary index to retrieve multiple records, if comparison is >, >=, , >=.

Question 13

Q

Which algorithms for simple selection can be used for range of value queries?

Answer

A

B+ tree and primary index for multiple records.

Question 14

Q

How are joins implemented?

Answer

A

Can be done using a nested-loop join(brute force), single-loop join(access structure to retrieve matching records), or sort-merge join(sort both lists and then check).

Question 15

Q

How are PROJECTs implemented?

Answer

A

Simple if attribute list includes a key, eliminates duplicates if DISTINCT is used.

Question 16

Q

How do we generate query plans? What is important we consider with relation to cheapness?

Answer

Study These Flashcards

A

A set of query execution plans is constructed by combining a set of candidate procedures, its important we don’t create too many plans, as this has its own cost. Heuristics keep the number of generated plans within reason. We then assign a cost to each plan based on size of data set and estimated intermediate tables. The cheapest is then chosen.

Question 17

Q

What are the main heuristics in query optimisation?

Answer

Study These Flashcards

A

SELECT as early as possible to reduce tuples, PROJECT as early as possible to reduce attributes, JOINs should be later

Question 18

Q

What are the main heuristics in algebraic optimisation?

Answer

Study These Flashcards

A

Any SELECT operations with conjunctive conditions should be broken into a cascade, this allows them to be moved down different branches of the query tree.
Each SELECT operation should be as far down the query tree as permitted by attributes.
Using associativity, leaf nodes should be rearranged so most restrictive SELECT operations are first.
Break down and move lists of projection attributes down the tree as far as possible by creating new PROJECT operations as needed.

Question 19

Q

How do we estimate plan costs?

Answer

Study These Flashcards

A

Access to secondary storage, storage costs, computation costs, and communication costs.

Lecture 23-24 Flashcards

(19 cards)