Indexing Flashcards

Question

Is ISAM a dynamic or a static tree structure?

Answer 1

Static. Inserts and deletes affect only the leaf pages and their overflow pages; the index levels of the tree never change.

Answer 2

Dynamic. The tree changes in response to changes in data size.

Answer 3

The tree must be height-balanced and must maintain at least 50% occupancy in every node except the root.

Answer 4

d is the order of the tree, and it determines the minimum occupancy allowed. Every node except the root holds m entries, where d <= m <= 2d, so the minimum occupancy is d entries and the maximum is 2d.

Answer 5

To avoid wasting space.

Answer 6

The height of the tree, whether the index uses Alternative 1 or Alternative 2/3 for its data entries, and whether it is clustered or unclustered.

Answer 7

The trees are fatter than they are tall: a high fan-out keeps the height small.

Answer 8

1. Find the correct leaf L.
2. Put the data entry into L.
   a. If L has enough room, we are done.
   b. If L does not have enough room, split it into L and a new leaf L2: redistribute the entries evenly between L and L2, and copy the middle key up to the parent.
3. Check whether the parent also needs to be split, and if so follow the same logic up the tree.

Answer 9

In a parent (index) node, redistribute the entries evenly but push up the middle key. In a leaf, redistribute the entries evenly but copy up the middle key.
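The difference between copying up and pushing up can be sketched as follows. This is a minimal illustration, not a full B+ tree: nodes are plain Python lists, and the function names `split_leaf` and `split_index` are hypothetical.

```python
def split_leaf(entries):
    """Split an overfull leaf (a sorted list of keys).

    The middle key is COPIED up: it goes to the parent and also
    stays in the right-hand leaf, because leaves hold the data entries.
    """
    mid = len(entries) // 2
    left, right = entries[:mid], entries[mid:]
    return left, right[0], right


def split_index(keys, children):
    """Split an overfull index node with len(keys) + 1 children.

    The middle key is PUSHED up: it goes to the parent and is
    removed from both halves.
    """
    mid = len(keys) // 2
    up_key = keys[mid]
    left = (keys[:mid], children[:mid + 1])
    right = (keys[mid + 1:], children[mid + 1:])
    return left, up_key, right
```

For example, splitting the leaf [2, 3, 5, 7, 8] leaves 5 in the right leaf, while splitting an index node with keys [5, 13, 17, 24, 30] removes 17 from both halves.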

Answer 10

1. Start at the root and find the leaf L where the entry belongs.
2. Remove the entry. If L is still at least half full, we are done.
3. Otherwise, try to redistribute by borrowing entries from a sibling node.
4. If redistribution fails, merge L and its sibling.
5. If a merge occurred, delete the entry pointing to L or to the sibling from the parent.
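The leaf-level part of these steps can be sketched as below. It is a simplified illustration under stated assumptions: leaves are sorted Python lists, only the right sibling is considered, a single entry is borrowed, and the parent update is described in comments rather than performed. The name `delete_from_leaf` is hypothetical.

```python
def delete_from_leaf(leaf, right_sibling, key, d):
    """Remove `key` from `leaf` and rebalance against the right sibling.

    `d` is the order of the tree (minimum entries per node).
    Returns (action, leaf, right_sibling); the sibling is None after a merge.
    """
    leaf = [k for k in leaf if k != key]
    if len(leaf) >= d:
        # Still at least half full: nothing else to do.
        return "done", leaf, right_sibling
    if len(right_sibling) > d:
        # The sibling can spare an entry: borrow its smallest key.
        # The parent's separator must then become the sibling's new smallest key.
        return "redistributed", leaf + [right_sibling[0]], right_sibling[1:]
    # The sibling is at minimum occupancy: merge the two leaves.
    # The parent must then drop the entry that pointed to the sibling.
    return "merged", leaf + right_sibling, None
```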

Answer 11

When you create a tree, inserting one record at a time is slow and does not give you sequential storage for the leaves. Bulk loading builds the B+ tree efficiently instead. To initialize, sort all the data entries and insert a pointer to the first leaf page into a new root page. Then keep inserting entries for the remaining leaf pages into the rightmost index page; when that page fills up, split it along the rightmost path.
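A rough sketch of the idea follows. Note that it swaps in a simpler technique: it packs the sorted entries into leaves and then builds each index level in one pass, rather than inserting into the rightmost index page and splitting. The function name `bulk_load` and its parameters are hypothetical, and the sketch does not fix up an underfull trailing node.

```python
def bulk_load(keys, leaf_capacity, fanout):
    """Build a B+-tree-like structure bottom-up from unsorted keys.

    Returns a list of levels: the leaf level first, the root level last.
    Each index node is stored as its list of separator keys.
    """
    keys = sorted(keys)  # step 1: sort all data entries
    leaves = [keys[i:i + leaf_capacity]
              for i in range(0, len(keys), leaf_capacity)]
    levels = [leaves]
    # Summarize every node by the smallest key beneath it.
    low_keys = [leaf[0] for leaf in leaves]
    while len(low_keys) > 1:
        groups = [low_keys[i:i + fanout]
                  for i in range(0, len(low_keys), fanout)]
        # An index node with n children stores n - 1 separator keys.
        levels.append([group[1:] for group in groups])
        low_keys = [group[0] for group in groups]
    return levels
```

Because the input is sorted once and every page is written once, the leaves come out in key order, which is the sequential layout that one-at-a-time insertion does not give.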

Answer 12

Index entries are written without *. Remember that * tells you that you are looking at a data entry. A data entry contains a pointer to the data in the form of a rid, which is made up of a page id and a slot number. Entries with * just need to hold more information.

Answer 13

The rids could change. A rid is (page number, slot number), so if you insert, delete, merge, or split, you could be changing the page numbers.

Answer 14

Fan-out is equal to the number of values in a node plus 1. For example, if a node has 9 values, it has 10 pointers.
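The arithmetic can be written out in a couple of lines. The helper names `fanout` and `max_leaves` are hypothetical, and `max_leaves` assumes every index node is completely full.

```python
def fanout(num_values):
    """A node with n values has n + 1 child pointers."""
    return num_values + 1


def max_leaves(num_values, height):
    """Leaf pages reachable through `height` levels of full index nodes."""
    return fanout(num_values) ** height
```

With 9 values per node, three index levels already reach 10 * 10 * 10 = 1000 leaf pages.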

Answer 15

- The number of primary pages is fixed.
- Primary pages are allocated sequentially.
- Primary pages are never deallocated.
- Overflow pages are added if needed.
- Buckets are identified with h(k) mod N = bucket.
- h(k) is a hash function that works on the search key k.
- Long overflow chains can develop, and they degrade performance.
- If we are looking for an entry in a bucket with a long overflow chain, we need to look at all the overflow pages to find the entry.
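A small sketch of these properties, assuming integer search keys and an identity hash function by default. The class name `StaticHashFile` and its methods are hypothetical; pages are modeled as Python lists.

```python
class StaticHashFile:
    def __init__(self, num_buckets, page_capacity, h=int):
        self.n = num_buckets          # N is fixed for the life of the file
        self.cap = page_capacity
        self.h = h                    # hash function on the search key
        # Each bucket is a chain of pages: one primary page, then overflow pages.
        self.buckets = [[[]] for _ in range(num_buckets)]

    def _chain(self, key):
        return self.buckets[self.h(key) % self.n]

    def insert(self, key):
        chain = self._chain(key)
        if len(chain[-1]) == self.cap:
            chain.append([])          # last page is full: add an overflow page
        chain[-1].append(key)

    def search(self, key):
        """Return (found, pages_read)."""
        chain = self._chain(key)
        for pages_read, page in enumerate(chain, start=1):
            if key in page:
                return True, pages_read
        return False, len(chain)      # a miss reads the whole chain
```

With 4 buckets and 2 entries per page, inserting 0, 4, 8, 12, and 16 puts all five keys in bucket 0, so finding 16 costs three page reads while any key in another bucket costs one.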

Answer 16

EH uses a directory of pointers to buckets. When a bucket overflows, we split it and rehash its entries; this may or may not lead to directory doubling.
Good because:
- it is cheaper to double only the directory
- only one page is split and rehashed
- no overflow pages
We need to keep track of global and local depth. The directory must be doubled when an insert causes a split that would make the local depth greater than the global depth.
For delete: if a deletion leaves a bucket empty, the bucket is merged with its 'split image' (the pair of buckets that differ only in the leftmost bit).
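The insert path can be sketched as below. This is an illustration under several assumptions: keys are distinct non-negative integers used directly as hash values, the directory is indexed by the low-order `global_depth` bits, and there are no overflow pages, so many keys sharing the same bits are not handled. The class names `Bucket` and `ExtendibleHash` are hypothetical, and deletion is omitted.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.keys = []


class ExtendibleHash:
    def __init__(self, bucket_capacity):
        self.cap = bucket_capacity
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]

    def _slot(self, key):
        # Directory index = the last `global_depth` bits of the key.
        return key & ((1 << self.global_depth) - 1)

    def insert(self, key):
        bucket = self.directory[self._slot(key)]
        if len(bucket.keys) < self.cap:
            bucket.keys.append(key)
            return
        if bucket.local_depth == self.global_depth:
            # A split would make local depth exceed global depth,
            # so double the directory first (only pointers are copied).
            self.directory = self.directory + self.directory
            self.global_depth += 1
        # Split only the overflowing bucket and rehash its keys on one more bit.
        bucket.local_depth += 1
        image = Bucket(bucket.local_depth)
        high_bit = 1 << (bucket.local_depth - 1)
        for i, b in enumerate(self.directory):
            if b is bucket and i & high_bit:
                self.directory[i] = image
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            self.directory[self._slot(k)].keys.append(k)
        self.insert(key)  # retry; may split again if the keys still collide
```

With a capacity of 2, inserting 0, 2, 4 doubles the directory from 2 to 4 slots, while the later split caused by inserting 1, 3, 5 does not, because that bucket's local depth was still below the global depth.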

Answer 17

If we have a skewed distribution, where many entries have the same hash value, it can lead to an overflow chain: the directory can grow large while those data entries still cannot be separated by splitting.

Answer 18

We split buckets in round-robin fashion, without using a directory. We have a collection of hash functions that we swap in depending on how many bits we need to tell buckets apart. A next pointer points to the next bucket in line to be split. Regardless of which bucket overflowed, we split only the bucket that next indicates. Performance degrades if the distribution is skewed.
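A compact sketch of this scheme, assuming integer keys and the hash family h_i(k) = k mod (2^i * N), which is one common choice and not something the card specifies. The class name `LinearHash` is hypothetical, and each bucket is a single Python list, so any entries beyond `page_capacity` stand in for overflow pages.

```python
class LinearHash:
    def __init__(self, initial_buckets, page_capacity):
        self.n0 = initial_buckets  # N: number of buckets at the start
        self.cap = page_capacity
        self.level = 0             # which hash function is current
        self.next = 0              # next bucket in line to be split
        self.buckets = [[] for _ in range(initial_buckets)]

    def _address(self, key):
        n = self.n0 * 2 ** self.level
        b = key % n                # h_level
        if b < self.next:          # that bucket was already split this round
            b = key % (2 * n)      # so use h_(level+1)
        return b

    def insert(self, key):
        bucket = self.buckets[self._address(key)]
        bucket.append(key)
        if len(bucket) > self.cap:
            self._split()          # splits bucket `next`, not necessarily this one

    def _split(self):
        n = self.n0 * 2 ** self.level
        old = self.buckets[self.next]
        self.buckets[self.next] = [k for k in old if k % (2 * n) == self.next]
        # The split image lands at index next + n, which is the end of the list.
        self.buckets.append([k for k in old if k % (2 * n) != self.next])
        self.next += 1
        if self.next == n:         # every bucket of this round has been split
            self.level += 1
            self.next = 0
```

Starting with 2 buckets of capacity 2, inserting 0, 2, 4 and then 8 overflows bucket 0 twice; the second overflow splits bucket 1, which is empty, while bucket 0 keeps its overflow. That is the round-robin behavior and also why skewed data hurts.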

(46 cards)