Storage Flashcards

1
Q

Hard Disk

A
  • Persistent storage; all data is preserved when the power is turned off
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Disk Inefficiency

A
  • In the time it takes to access the disk, the processor can execute millions of instructions
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Char vs Varchar

A
  • Char: a fixed-length string of exactly n characters, padded with spaces as needed
  • Varchar: a variable-length string of up to n characters with no padding
  • Values will be truncated if they are too long
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Fixed-Length Records

A
  • Always allocate the maximum possible length
  • Use a delimiter if the value is shorter than the max
  • Every record has the same length, and fields are in a consistent location
  • Potential waste of space
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Variable-Length Records

A
  • Only allocate the space that each record needs
  • Option 1: terminate field values with a special delimiter character
  • Option 2: precede each field by its length
  • Option 3: put offsets and other metadata in a record header (-1 for NULL), and include an offset for the end of the record
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

B-Tree (Order m)

A
  • Each node has between m and 2m items
  • Each internal node has between m + 1 and 2m + 1 children (except the root node)
  • Has a perfect balance
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Search in B-Trees

A
  • Never need to search more than one of the subtrees of a node
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Insertion in B-Trees

A
  1. Search for k until you reach a leaf node
  2. If the leaf node has fewer than 2m items, add the new item to the leaf node, else split up the node, and insert the middle key into the parent
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

B-Tree Efficiency

A
  • # of nodes accessed <= tree height + 1
  • Tree height <= log_{m}(n) where n is the number of items
  • Search and insertion are O(log_{m}(n))
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

B+Tree (Order m)

A
  • Data items are only found in the leaf nodes
  • Shorter tree requires less disk I/O
  • Leaves are linked together to improve efficiency
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Search in B+Trees

A
  • Keep searching until a leaf node is reached, even if the key is seen in an internal node
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Inserting in B+Trees

A
  1. Search for k until we reach a leaf node, even if k is an internal key
  2. If the leaf node has more than 2m items, split the leaf node
  3. Put the first m items in the left node split, and the remaining m + 1 items in the right node split (including the middle item)
  4. Copy the key of the middle item into the parent
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Deletion in B-Trees and B+Trees

A
  • Search for the item and remove it
  • When a node has fewer than m items, if a sibling node has more than m items, take items from it, otherwise merge it with the sibling
  • If the key of the removed item is in an internal node, don’t remove the node
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Static Hashing

A
  • The number of buckets never changes
  • Use overflow buckets when full
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Simple Dynamic Hashing

A
  • When the hash table gets too full, double the number of buckets and rehash all existing items
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Linear Hashing

A
  • Treats the hash value as a binary number, using the i rightmost bits of that number
  • i = ceil(log_{2}(n))
  • If there is a bucket with the index given by the i rightmost bits put the key there, else use the bucket specified by the rightmost i - 1 bits
17
Q

Linear Hashing - Adding a Bucket

A
  • Add only one new bucket
  • If the new bucket’s binary index is 1xyz, only rehash the bucket with binary index 0xyz
18
Q

Choosing an Index Structure

A
  • If the set of frequently accessed pages can fit in the cache, use a B-tree/B+tree
  • If subsequent queries are for nearby keys, use a B-Tree/B+tree
  • If the set of frequently accessed pages is very large, use a hash table