Indexing Flashcards

Question

What is the range of children that an internal node can have in a B+ tree?

Answer 1

Between the ceiling of n/2 and n children

Answer 2

Between the ceiling of (n-1)/2 and n-1 values
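
The two bounds above can be checked with a few lines of arithmetic. This is only a sketch: the function name `bplus_occupancy` and the dictionary keys are made up for illustration, and n is taken to be the maximum number of pointers per node.

```python
import math

def bplus_occupancy(n: int) -> dict:
    """Occupancy bounds for a B+ tree whose nodes hold at most n pointers."""
    return {
        "min_children": math.ceil(n / 2),      # lower bound on children
        "max_children": n,                     # upper bound on children
        "min_values": math.ceil((n - 1) / 2),  # lower bound on values
        "max_values": n - 1,                   # upper bound on values
    }

bounds_n4 = bplus_occupancy(4)
print(bounds_n4)
```

With n = 4 this gives 2 to 4 children and 2 to 3 values.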

Answer 3

At least 2

Answer 4

Records or pointer to next leaf node

Answer 5

A multi level sparse index

Answer 6

Logarithmic

Answer 7

Floor of log base n/2 of K

Answer 8

Add the record to main file and create a bucket if necessary
If there is room in the relevant leaf node, add the new (key value, pointer) pair
Otherwise, split the node and add the new (key value, pointer) pair

Answer 9

Sort entries first and insert in sorted order
Sort entries first and create tree layer by layer, starting with leaf level

Answer 10

Sort the values
Create the linked list representing the leaf nodes
Create the next level, left to right
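
The three steps above can be sketched roughly as follows. The names `Node`, `chunk`, `bulk_load` and `leaf_values` are illustrative assumptions, n is assumed to be at least 3 and the input non-empty, and the naive chunking may leave the last node of a level under-full, which a real implementation would rebalance.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys          # values (leaf) or separator keys (internal)
        self.children = children  # None marks a leaf
        self.next = None          # leaf-level linked list pointer

def chunk(items, size):
    """Split a list into consecutive pieces of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def smallest_key(node):
    """Smallest value stored under a node (leftmost leaf, first key)."""
    while node.children is not None:
        node = node.children[0]
    return node.keys[0]

def bulk_load(values, n):
    if not values:
        raise ValueError("need at least one value")
    # Step 1: sort the values.
    ordered = sorted(values)
    # Step 2: build the leaves (up to n-1 values each) and link them.
    level = [Node(keys) for keys in chunk(ordered, n - 1)]
    for left, right in zip(level, level[1:]):
        left.next = right
    # Step 3: build each higher level left to right (up to n children each).
    while len(level) > 1:
        level = [
            Node([smallest_key(c) for c in group[1:]], group)
            for group in chunk(level, n)
        ]
    return level[0]

def leaf_values(root):
    """Walk the leaf linked list from the leftmost leaf."""
    node = root
    while node.children is not None:
        node = node.children[0]
    out = []
    while node is not None:
        out.extend(node.keys)
        node = node.next
    return out

root = bulk_load([5, 1, 9, 3, 7, 2, 8], n=4)
print(root.keys, leaf_values(root))
```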

Answer 11

Log structured merge tree
Buffer tree

Answer 12

Use bottom up construction

Answer 13

Insert multiple ordered indices at once

Answer 14

Comprises multiple trees of increasing size (L0, L1, ...), with at least L0 fitting in memory

Answer 15

Values are inserted into L0 until L0 is full. Its records are then moved to L1 on disk, which is rebuilt using bottom-up construction
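
A toy sketch of this insert path, assuming only two levels. `TinyLSM` and its methods are hypothetical names; a plain dict stands in for the in-memory L0 and a sorted list stands in for the on-disk L1 tree, so the sort here only imitates the bottom-up rebuild.

```python
class TinyLSM:
    def __init__(self, l0_capacity):
        self.l0_capacity = l0_capacity
        self.l0 = {}   # in-memory level
        self.l1 = []   # stand-in for the on-disk tree: sorted (key, value) pairs

    def insert(self, key, value):
        self.l0[key] = value
        if len(self.l0) >= self.l0_capacity:   # L0 is full
            self._flush()

    def _flush(self):
        merged = dict(self.l1)
        merged.update(self.l0)              # newer L0 entries win
        self.l1 = sorted(merged.items())    # sorted run, as a bottom-up build would consume
        self.l0 = {}

    def get(self, key):
        # Queries check the newest level first, then the older one.
        if key in self.l0:
            return self.l0[key]
        return dict(self.l1).get(key)

lsm = TinyLSM(l0_capacity=2)
for k, v in [("a", 1), ("b", 2), ("c", 3)]:
    lsm.insert(k, v)
print(lsm.l0, lsm.l1)
```

After the third insert, "a" and "b" have been moved to L1 and "c" is still in L0.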

Answer 16

If data permanently matures over time, with no additional changes to come back and make

Answer 17

Handled by special "delete" entries, which are dummy values that mark a record as deleted. When trees are merged, entries are omitted if they contain the delete value.
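
A minimal sketch of a merge that honours delete entries. `TOMBSTONE` and `merge_levels` are illustrative names, levels are modelled as dicts, and the sketch assumes the merge target is the last level (otherwise the delete entry would have to be kept to mask older copies further down).

```python
TOMBSTONE = object()   # dummy value meaning "this key was deleted"

def merge_levels(newer, older):
    """Merge two levels; the newer level wins, deleted keys are dropped."""
    merged = dict(older)
    merged.update(newer)
    return sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE)

older_level = {"a": 1, "b": 2}
newer_level = {"b": TOMBSTONE, "c": 3}   # "b" was deleted after being written
merged_level = merge_levels(newer_level, older_level)
print(merged_level)
```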

Answer 18

Inserts are done using only sequential I/O operations, minimal block access
Leaves are full, no space wastage
Reduced number of I/O operations per record insert

Answer 19

Queries have to search multiple trees
Entire copy of each level made multiple times

Answer 20

Each internal node of the B+ tree has a buffer to store inserts, when buffer is full, records are sorted on search key and moved to appropriate child
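
A rough sketch of the buffering idea with a single internal node over two leaves. `Leaf` and `BufferedNode` are hypothetical names, and node splitting is left out; a lookup in this design would also have to check the buffers on the way down.

```python
import bisect

class Leaf:
    def __init__(self):
        self.records = []

    def insert(self, key):
        bisect.insort(self.records, key)   # keep leaf records sorted

class BufferedNode:
    def __init__(self, separators, children, buffer_capacity):
        self.separators = separators   # child i+1 holds keys >= separators[i]
        self.children = children
        self.buffer_capacity = buffer_capacity
        self.buffer = []               # pending inserts

    def insert(self, key):
        self.buffer.append(key)
        if len(self.buffer) >= self.buffer_capacity:
            self.flush()

    def flush(self):
        # Sort on the search key, then hand each record to the right child.
        for key in sorted(self.buffer):
            child = self.children[bisect.bisect_right(self.separators, key)]
            child.insert(key)
        self.buffer = []

left, right = Leaf(), Leaf()
node = BufferedNode([10], [left, right], buffer_capacity=3)
node.insert(12)
node.insert(3)
pending = list(node.buffer)   # nothing has reached the leaves yet
node.insert(10)               # buffer is full, so it is flushed
print(pending, left.records, right.records)
```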

Answer 21

Less overhead on queries
Flexible: can be used with any tree-based structure

Answer 22

More random I/O than LSM trees

Answer 23

Unit of storage containing one or more entries

Answer 24

Function from the set of all search key values to the set of all bucket addresses

Answer 25

Used to locate entries for access, insertion and deletion

Answer 26

Entries with pointers to records

Answer 27

Map all the search keys to the same bucket

Answer 28

Uniform
Pseudo-random

Answer 29

Where each bucket is assigned the same number of search key values from the set of all possible values

Answer 30

Each bucket will have the same number of records irrespective of the actual distribution of search key values

Answer 31

The internal binary representation of the search key
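
As a small illustration of hashing on the binary representation, the sketch below sums the bytes of a string key and takes the result modulo the number of buckets. The name `bucket_for` and the choice of UTF-8 are assumptions; this is a simple textbook-style function, not a recommendation.

```python
def bucket_for(key: str, num_buckets: int) -> int:
    """Hash a key by summing the bytes of its binary representation."""
    return sum(key.encode("utf-8")) % num_buckets

ab_bucket = bucket_for("ab", 8)   # bytes 97 + 98 = 195, and 195 % 8 = 3
print(ab_bucket)
```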

Answer 32

Maps search key values to a fixed set of bucket addresses

Answer 33

Space is wasted, too many buckets

Answer 34

Too many values map to a given bucket; performance suffers because overflow buckets are created and have to be searched linearly

Answer 35

(number of records that we think we will have)/(number of records that fit in a bucket) * (1+d) where d is a "fudge factor"
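
The formula above as a one-line calculation. `estimate_buckets` is a made-up name, and rounding up to a whole bucket is an added assumption.

```python
import math

def estimate_buckets(expected_records, records_per_bucket, d):
    """(expected records / records per bucket) * (1 + d), rounded up."""
    return math.ceil(expected_records / records_per_bucket * (1 + d))

buckets_needed = estimate_buckets(1000, 10, d=0.25)
print(buckets_needed)
```

For 1000 expected records, 10 records per bucket and d = 0.25, this asks for 125 buckets.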

Answer 36

Insufficient buckets
Skew in actual distribution of records

Answer 37

Use of overflow buckets and overflow chaining

Answer 38

Linked list
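
Overflow chaining can be sketched as a linked list of fixed-capacity buckets. The `Bucket` class and its method names are illustrative only.

```python
class Bucket:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = []      # (key, value) pairs stored in this bucket
        self.overflow = None   # next bucket in the overflow chain

    def insert(self, key, value):
        bucket = self
        # Walk the chain until a bucket with free space is found.
        while len(bucket.entries) >= bucket.capacity:
            if bucket.overflow is None:
                bucket.overflow = Bucket(bucket.capacity)
            bucket = bucket.overflow
        bucket.entries.append((key, value))

    def find(self, key):
        # Linear search through the bucket and every overflow bucket.
        bucket = self
        while bucket is not None:
            for k, v in bucket.entries:
                if k == key:
                    return v
            bucket = bucket.overflow
        return None

    def chain_length(self):
        length, bucket = 0, self
        while bucket is not None:
            length += 1
            bucket = bucket.overflow
        return length

home = Bucket(capacity=2)
for k, v in [("a", 1), ("b", 2), ("c", 3)]:
    home.insert(k, v)
print(home.chain_length(), home.find("c"))
```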

Answer 39

Hash function can't easily be changed once database contains data, can't account for changes in the number of records

Answer 40

Periodically restructuring by changing the hash function, but this is very expensive and disrupts normal operation

Answer 41

Allows the database to change size by splitting and combining buckets. Reorganisation performed one bucket at a time, so overhead is incremental.

Answer 42

Hash function generates values over a large range, represented by b-bit integers
Buckets created on demand
Not all b bits of the hash used

Answer 43

An index into an additional table of bucket addresses

Answer 44

Calculate the hash value, take the i most significant bits and use the decimal value of these bits as a numerical index into the bucket address table
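
Taking the i most significant bits of a b-bit hash is a single right shift. The sketch assumes b = 32 and uses `zlib.crc32` only as a convenient deterministic 32-bit hash; `bat_index` is a hypothetical helper name.

```python
import zlib

B = 32   # number of bits produced by the hash function (assumed)

def bat_index(hash_value: int, i: int, b: int = B) -> int:
    """Use the i most significant of b bits as the bucket address table index."""
    return hash_value >> (b - i)

example_hash = 0b1011 << 28                 # 32-bit value whose top bits are 1011
idx_2_bits = bat_index(example_hash, 2)     # top two bits: 10
idx_3_bits = bat_index(example_hash, 3)     # top three bits: 101
key_index = bat_index(zlib.crc32(b"Perryridge"), 3)
print(idx_2_bits, idx_3_bits, key_index)
```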

Answer 45

Lookup to find bucket, if there is enough room insert it, or if there is not, split the bucket and redistribute entries

Answer 46

Compute the hash function, use the first i bits as a decimal index into the bucket address table, follow pointer to correct bucket and do sequential search

Answer 47

ij of the desired bucket and the number of bits used for i

Answer 48

ij < i
ij = i

Answer 49

Can split the bucket without increasing the size of the bucket address table

Answer 50

Allocate a new bucket z and set ij and iz to ij+1. Rehash all records in bucket j and allocate them to the correct bucket

Answer 51

Can't split the bucket without expanding the size of BAT

Answer 52

Increment i by one (doubling size of BAT)
Each BAT cell is replaced by two cells, both pointing to the same bucket that the original cell pointed to
Restart the insertion
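
The lookup and the two split cases (ij < i and ij = i) can be put together in a compact sketch. All class and method names are illustrative, b is assumed to be 32, `zlib.crc32` is just a stand-in hash, and running out of hash bits raises an error where a real system would fall back to overflow buckets.

```python
import zlib

B = 32   # bits in the hash value (assumed)

class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth   # ij in the cards above
        self.entries = {}

class ExtendableHash:
    def __init__(self, bucket_capacity=2, hash_fn=None):
        self.i = 0                        # bits currently used by the BAT
        self.capacity = bucket_capacity
        self.hash_fn = hash_fn or (lambda k: zlib.crc32(str(k).encode("utf-8")))
        self.bat = [Bucket(0)]            # bucket address table

    def _index(self, key):
        return self.hash_fn(key) >> (B - self.i)   # i most significant bits

    def lookup(self, key):
        return self.bat[self._index(key)].entries.get(key)

    def insert(self, key, value):
        while True:   # "restart the insertion" after every split
            bucket = self.bat[self._index(key)]
            if key in bucket.entries or len(bucket.entries) < self.capacity:
                bucket.entries[key] = value
                return
            if bucket.local_depth == B:
                raise RuntimeError("hash bits exhausted")
            if bucket.local_depth == self.i:
                # ij = i: double the BAT, each cell becomes two adjacent cells.
                self.bat = [b for b in self.bat for _ in (0, 1)]
                self.i += 1
            self._split(bucket)   # now ij < i, so the bucket can be split

    def _split(self, bucket):
        bucket.local_depth += 1
        new_bucket = Bucket(bucket.local_depth)
        # Second half of the cells that pointed at the old bucket move over.
        cells = [c for c, b in enumerate(self.bat) if b is bucket]
        for c in cells[len(cells) // 2:]:
            self.bat[c] = new_bucket
        # Rehash every record of the old bucket into the correct bucket.
        old_entries, bucket.entries = bucket.entries, {}
        for k, v in old_entries.items():
            self.bat[self._index(k)].entries[k] = v

# Demo with 4-bit integer keys placed in the top bits of the hash.
table = ExtendableHash(bucket_capacity=2, hash_fn=lambda k: k << 28)
for key in (0b0000, 0b0100, 0b1000, 0b0010, 0b1100):
    table.insert(key, bin(key))
shared_before = table.bat[2] is table.bat[3]   # one bucket with ij < i
table.insert(0b1010, bin(0b1010))              # splits it without doubling
print(table.i, len(table.bat), shared_before)
```

In the demo the last insert hits a full bucket whose ij is smaller than i, so the bucket is split while the table stays at four cells.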

Answer 53

Does not degrade with growth of file
Minimal space overhead
Rehashing is incremental and done one bucket at a time, so changes are very local

Answer 54

Extra level of indirection to find desired record (BAT)
BAT may itself become very large

Answer 55

B+ structure to locate desired record in BAT

Answer 56

Can you deal with having to reorganise hash?
How often will you be inserting or deleting?
Is average or worst case time more important?
What type of queries are most common?

Answer 57

Ordered index

Average time and worst case proportional to the log of the number of values of that attribute

Hash

Average time constant
Worst case proportional to the number of values of that attribute

Answer 58

Ordered index

Answer 59

No easy way to find the next bucket in order

(87 cards)