Storing Data: Disks and Files Flashcards

1
Q

Why can’t you store everything in main memory

A
  • buying enough storage costs too much
  • main memory is volatile to power off (ie. not persistent)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Typical storage hierarchy has 3 storage types. Name them and what they are used for

A

Main memory: for currently used data

Disk: for main database (frequently used)

Tapes: for archiving older versions (infrequently used)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

DBMS stores DB where?

what are the major implications for DBMS designs?

A

DBMS store DB on hard disks

Major implications are planning how to do read/write (I/O operations). They are high-cost operations when compared to CPU operations, so they need to be carefully considered

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

what is the main advantage disks have over tapes

A

disks are random access and tapes are sequential. Therefore faster

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

true or false: pages are the smallest unit of data retrieval you can read and write to

A

true: you must read and write an entire page, even if you make a minor change to a file/record

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

True or false: a plater has two surfaces

A

true

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

how does the arm assembly move to position

A

it moves in and out to position the disk heads on a desired track

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

what is a cylinder in a disk structure

A

Same track on all surfaces (ie. top and bottom of each platter)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

true or false: only one head can read/write at one time

A

true, can’t have more than one head working at a time

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

what is a track in a disk structure

A

A disk drive track is a circular path on the surface of a platter where you read and write information to.

A track is a physical division of data in a disk drive

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

what is a sector in a disk structure

A

A disk sector is the smallest storage unit on a disk

Typically It’s a subdivision of a track. it acts as a division of storage on a disk

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

what is a page in a disk structure

how do you locate a page on a disk?

A

a continuous set of sectors of a track.

locate by: page id
where

page id = (b, s, c, d)
block b of surface s of cylinder c of disk d

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

the time to access (read/write) a disk page is determined by 3 actions. What are they?

If you want to speed up the access time, which of the three actions can you affect to actually speed it up?

A

seek time (moving arm to position disk head on track)

rotational delay (waiting for page to rotate under head)

transfer time (actually moving data from main memory and disk)

if you want to speed things up you can only speed up the first two processes. transfer time cannot be further sped up.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q
  • Seek time varies from about 1 to 20msec
    – Rotational delay varies from 0 to 10msec
    – Transfer rate is about 1msec per 4KB page

transfer rate time varies so much, why is this? How can we reduce this

A

it varies so much b/c it depends on the best and worst case of where the disk head is currently positioned.

Best: the disk head is already above the track you want
Worse: we need to move very far to get to the next needed track

can reduce the time by reducing the seek and rotation time by careful arrangement of data pages on the disk

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

describe what the ‘next page’ concept is

A

have the data organized in a way where you always have the next page ready/prepped/next in line. Pages should be stored sequentially to minimize seek/rotation delay.

the order:
Pages on same track, followed by pages on
same cylinder, followed by pages on
adjacent cylinder

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

In practice, data pages are not always sequentially stored on the disk even though it will minimize seek/rotational delays, why?

A

If we delete data, it will create an unusable hole

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

There are layers to the calls a DBMS will make on a DB. What is the hierarchy?

A
  1. application, like sql (highest)
  2. Memory Management
  3. Disk Management
  4. DB on Disk (lowest)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

files mean two different things depending on whether you are a user or DBMS. What is the difference?

A

user: a file is a collection of logical records

DBMS: a file is a collection of disk pages

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

Memory managers/disk managers must translate
user’s search or update into disk page

how is this translation done

A

by indexing

20
Q

The memory manager relies on the disk manager to do 3 jobs, what are they?

A

–Allocate/de-allocate a page for a file
– Read/write a page
– Keep track of data pages and free pages

21
Q

true or false: data must be in RAM to operate on it

A

true

22
Q

After a page request is made from higher level application programs, describe the sequence of steps/checks we must make before we can operate on the page

Alternatively, what is done after a page request is made

A
  1. check if the needed page is already in main memory
    (specifically, in the buffer pool, also note the buffer pool has frames that are either free or contain a page)
  2. if we are not in main memory we must bring it in from the database
23
Q

How do we know if a requested page already exists in main memory? How do we know we don’t need to access the DB to obtain the page?

A

We do a table lookup that tells us what page is in what frame of the buffer pool <frame# , page id>.

If the page id is in this table you can find it in frame# of the buffer pool.

Otherwise, it does not exist in main memory and you will need to retrieve it from the disk

24
Q

Assuming a full buffer pool, what policy dictates which frame in the buffer pool to replace when introducing a new frame?

A

the replacement policy decided which frame to replace first

25
Q

what is the term for when you need to request a page and it does not currently exist in main memory. ie. not found in the table look up

A

page fault

26
Q

describe the process we go through after we page faulted

A
  1. choose a frame for replacement (by replacement policy)
  2. If frame is dirty, write it to the disk
  3. read in requested page into the selected frame
  4. update look up table with new page id
  5. pin the page and return its address
27
Q

what does it mean if a frame is “dirty”

A

the page in this frame is modified but not yet written to the disk.

there are two copies, one in main memory and one in disk

28
Q

what does it mean to pin a page when it is brought into main memory and given its own frame in the buffer pool

A

we pin a page to indicate that it is needed by someone/some process

29
Q

define
1. dirty bit
2. whose job it is to unpin a page
3. pin count
4. when a page is a candidate for replacement

A
  1. used to indicate if a frame/page is dirty
  2. the requestor of the page will unpin when it no longer needs the page
  3. pages may be wanted by many requestors, the pin count will tell us how many processes still need it
  4. when pin count is 0, nobody needs it, we can consider replacing it
30
Q

does the type of replacement policy have a big or small impact on the # of I/O operations? Why

A

big, because some policies work better with different access patterns

31
Q

what is sequential flooding?

A

a bad situation caused by the replacement policy: least recently used (LRU) with repeated sequential scans

32
Q

DBMS and an OS File System both do disk space & buffer mgmt too. why not let OS manage these tasks?

A

OS cannot
1. support multiple OS platforms
2. span disks
3. pin pages in a buffer pool
4. force a page to a disk
5. adjust replacement policies

33
Q

(smallest) fields, records, pages, and files (largest)

how do they relate

A

A field is the smallest unit of data in a database. It represents a single attribute of an entity. For example, in a database of employees, fields could include “Name,” “Employee ID,” “Salary,” etc.

Records represent single rows in the table. it represents a complete set of information about an entity. Using the employee database example, a record might contain all the information about a specific employee. Records represent single rows in the table.

Pages are the basic building block of files. A file is generally made up of many pages. It is a fixed-size block of data that is read from or written to the disk as a single I/O operation.

Files are stored on the disk and represent one table. it is a collection of pages with related records. The file contains multiple records, each with its fields.

34
Q

Describe the record format for a fixed-length

A
  • because everything is a fixed size, we don’t need to scan the record to find the i’th field, we can jump to it
  • length of the fields in each record has been set to be a certain maximum number of characters long
35
Q

Describe the record format for a variable-length

A

there are two formats

  1. fields eliminated by special symbols
    - needs to do sequential access
    - separated by symbols
    - has a field count
  2. array of field offsets
    - can directly access i’th field
    - have a small directory overhead to identify different fields
    - no field count
36
Q

Describe the page format for a fixed-length records

A

Every record has the same length therefore one slot is able to store one record.
Record id within a page is identified by <page id, slot #(where on the page)>

there are two formats for fixed length-length

  1. packed
    - all the free space/slots are at the bottom.
    -If you delete something you need to shuffle all records up to fill the hole
    - frequently update the record ids of everything moved
    - one pointer to the top of the free space
    - has a number of record count
  2. unpacked, bitmap
    - If you delete something, you leave the space empty and indicate with he bitmap that the spot is free.
    - bitmap has one bit for every slot to indicate if it is occupied or not
    - has a number of slots count
    - will maintain multiple pointers to point to all the free space holes
37
Q

Describe the page format for variable-length records

A

a page that contains multiple records will be structured as follows

  • a slot directory for N slots. The slot directory contains the offset of all records from the front of the page and will also contain how many bytes each record has
  • will have a number of slots counter
  • all records are pushed to the top of the page, and will be shuffled up to keep everything compact, all free space is at the bottom
  • must update the slot directory to indicate unused slot
  • are able to shuffle the records in the event of deletion without changing the slot directory, the only thing that changes is the offset of the record from the front of the page after a deletion
  • pointer to the start of the free space
    record id = <page i, slot directory #>
38
Q

why do we shuffle all the records up after a deletion in a page that has variable-length records?

A

If we keep holes, since the lengths of each record vary, we can’t guarantee that new entries can fit the old holes. we need to maximize the space used

39
Q

There are two types of file structures, what are they?

A

heap file

index file

40
Q

one of the types of file structures is heap files. Describe the details of a heap file

A
  1. information
    - Contain records in no particular order
    - support file scan
    - support search/delete by rid
    - keep track of pages with data and free space
    - cannot find records by field values without a file scan
  2. structure
    - heap file can be as a LINKED LIST
    - where there is a header page that points to a list of full pages and free-spaced pages
    - every page in the linked list has 2 pointers (front/back)
  3. structure
    -heap file can be as a PAGE DIRECTORY
    - keep a directory of header pages, where each directory entry points to a page and the number of free bytes on the page
    - directory is a linked list
    - quicker to add something into a heap file this way than if the heap file was a linked list
41
Q

one of the types of file structures is indexed files. Describe the details of a indexed file

A

we want to be able to search based of value, not just rid
- want to be able to do value-based search, we can use indexed files to do this

  • indexed file can be a sorted file
    – where records are sorted by search key
    – good for equality and range search.
    – Supported by sorted index.
  • indexed file can be a hashed file
    – records are grouped by hash value of search key. Good for equality search.
    – Supported by hashed index
42
Q

what is the difference between equality search and range search

A

equality search: search based of a value being equal to the requested
range search: search based of all values that satisfy the range

43
Q

explain how you can be sorted but not stored sequentially

A

the order will be sorted but the pages will not be, therefore, you need pointers to point to the next page needed to maintain sorted order

44
Q

what is a system catalog in a DBMS

A

a collection of tables and views that contain important information about a database (metadata)

45
Q

what does the system catalog keep metadata on?

A

For each file:
– name, structure(heap {Linked list/Page direcotry} / indexed {sorted/hashed}), attributes and types, indexes, integrity constraints, etc

For each index:
- Name, structure (e.g., sorted or hashed) and search key

For each view:
- view name and definition

  • statistics, authorization, buffer pool size, page size, etc.
46
Q

catalogs themselves are stored as _____________?

A

Catalogs are themselves stored as relations