2. Disk and Files Flashcards

Question 1

Q

How databases use memory and disks? Why?

Answer

A

Whenever a database processes data, that data must exist in memory. Accessing this data is relatively fast, but once the data becomes very large, it becomes impossible to fit all of it within memory. Disks are used to cheaply store all of a database’s data, but they incur a large cost whenever data is accessed or new data is written.

Question 2

Q

Basic units of data for databases and disks

Answer

A

The basic unit of data for relational databases is a record (row). These records are organized into relations (tables) and can be modified, deleted, searched, or created in memory.

The basic unit of data for disk is a page, the smallest unit of transfer from disk to memory and vice versa

Question 3

Q

How relational databases are represented on a disk?

Answer

A

In order to represent relational databases in a format compatible with disk, each relation is stored in its own file and its records are organized into pages in the file.

Question 4

Q

How relations are organized on disk?

4 main decisions to make?

Based on what are they made?

Answer

A

Based on the relation’s schema and access pattern, the database will determine:

(1) type of file used, (2) how pages are organized in the file, (3) how records are organized on each page, (4) and how each record is formatted.

Question 5

Q

2 main types of database files

Answer

A

Heap Files and Sorted Files

Question 6

Q

What’s meant by “one I/O” in databases?

Answer

A

1 I/O is equivalent to 1 page read from disk or 1 page write to disk

Question 7

Q

What is a heap file?

Answer

A

A heap file is a file type with no particular ordering of pages or of the records on pages

Question 8

Q

2 main implementations of a heap file

Answer

A

Linked List

Page Directory

Question 9

Q

Linked List Heap File - describe the structure

Answer

A

In the linked list implementation, each data page contains records, a free space tracker, and pointers (byte offsets) to the next and previous page. There is one header page that acts as the start of the file and separates the data pages into full pages and free pages.

When free data pages become full, they are moved from the free space portion to the front of the full pages portion of the linked list.

Question 10

Q

Page Directory Heap File - describe the structure

Answer

A

Linked list for header pages.

Each header page contains a pointer (byte offset) to the next header page, and its entries contain both a pointer to a data page and the amount of free space left within that data page.

Question 11

Q

The main advantage of page directories over linked lists for heap files?

Answer

A

Fast inserts.

Needs to read only header pages to find a page with free space.

Question 12

Q

Heap files vs. sorted files - the main trade-off

Answer

A

Heap files - fast insert

Sorted files - fast search

Question 13

Q

What is a sorted file?

Answer

A

A sorted file is a file type where pages are ordered and records within each page are sorted by key(s).

Question 14

Q

How sorted files are commonly implemented?

Answer

A

These files are implemented using Page Directories and enforce an ordering upon data pages based on how records are sorted.

Question 15

Q

The average cost of insert and search for sorted files

Answer

A

Searching through sorted files takes logN I/Os where N = # of pages since binary search can be used to find the page containing the record. Meanwhile, insertion, in the average case, takes logN + N I/Os since binary search is needed to find the page to write to and that inserted record could potentially cause all later records to be pushed back by one. On average, N / 2 pages will need to be pushed back, and this involves a read and a write IO for each of those pages, which results in the N I/Os term.

Question 16

Q

2 main record types

Answer

Study These Flashcards

A

Fixed Length Records (FLR) and Variable Length Records (VLR)

FLRs only contain fixed length fields (integer, boolean, date, etc.)

VLRs contain both fixed length and variable length fields (eg. varchar),

Question 17

Q

How Variable Length Records store their fields?

Answer

Study These Flashcards

A

VLRs store all fixed length fields before variable length fields and use a record header that contains pointers to the end of the variable length fields.

Question 18

Q

Record ID - purpose, components

Answer

Study These Flashcards

A

every record can be uniquely identified by its record id -

[page #, record # on page].

Question 19

Q

Packed pages with Fixed Length Records - describe

Answer

Study These Flashcards

A

Pages containing FLRs always use page headers to store the number of records currently on the page.

If the page is packed, there are no gaps between records. This makes insertion easy as we can calculate the next available position within the page using the # of existing records and the length of each record. Once this value is calculated, we insert the record at the computed offset. Deletion is slightly trickier as it requires moving all records after the deleted record towards the top of the page by one position to keep the page packed.

Question 20

Q

Unpacked pages with Fixed Length Records - describe

Answer

Study These Flashcards

A

Pages containing FLRs always use page headers to store the number of records currently on the page.

If the page is unpacked, the page header typically stores an additional bitmap that breaks the page into slots and tracks which slots are open or taken. Using the bitmap, insertion involves finding the first open bit, setting the new record in the corresponding slot, and then setting the bit for that slot. With deletion, we clear the deleted record’s corresponding bit so that future inserts can overwrite that slot.

Question 21

Q

Pages with Variable Length Records - describe

Answer

Study These Flashcards

A

Each page uses a page footer that maintains a slot directory tracking slot count, a free space pointer, and entries. The footer starts from the bottom of the page rather than the top so that the slot directory has room to grow when records are inserted.

Each entry in the slot directory consists of a [record pointer, record length] pair.

insert - at the free space pointer, a new [pointer, length] pair is set in any available null entry (or a new one if no nulls)

delete - set both the record pointer and record length to null.

Re-grouping/packing depends on the implementation.

Question 22

Q

Layered organization of DBMS - describe all layers

Answer

Study These Flashcards

A

2. Disk and Files Flashcards

(22 cards)