Indexing Flashcards
What are the two methods for reducing access costs?
<ol>
<li>Ordered indexing</li>
<li>Hash indexing</li>
</ol>
Describe the basic overview of indices
Speeds up access to desired data by creating an index file consisting of records containing a search key and a pointer
What is the search key?
Attribute or set of attributes used to look up records in a file
What are the factors in index design?
<ul> <li>Access types (specific vs range)</li> <li>Access time</li> <li>Insertion time</li> <li>Deletion time</li> <li>Space taken by the index</li> </ul>
What is the primary index?
The index whose search key specifies the order of records in the data file
What is a primary index also knowna as
Clustering index
Describe dense primary indices
Index record appears for every search key value in the file
Describe sparse primary indices
A sparse index has index records for only some of the key values
How are records found with sparse primary indices?
Choose the index record with the greatest value <= search key and scan sequentially from that record
What is a benefit of sparse indices?
Takes less space and requires less maintenance, can insert without much bother
What is a benefit of dense indices?
Gives faster lookups, can go directly to the record
If the primary index doesn’t fit in memory, how can sparse indices be used?
Treat primary index on a disk as a sequential file and construct a sparse index on it
How is insertion done for dense indices?
If the search key value does not appear in the index, insert it
How is insertion done for sparse indices?
No changes need to be made unless a new block is created, and if new block created then first search key value appearing in the block is inserted into the index
How does deletion happen for dense indices?
If only one record at that search key then delete, otherwise update the pointer to point towards the values that need to continue to be stored only
How does deletion happen for sparse indices if the deleted value was the only record in the file with it’s search key value?
If an entry for the next search key value already exists in the index, delete the index entry for the deleted record’s search key
Otherwise
Update the index entry to point to the first record with the next search key
How does deletion happen for sparse indices if the deleted value was not the only record in the file with it’s search key value?
Update the index entry to point to the next record with the same search key value
Where do pointers in secondary index point to?
Buckets
What are the buckets in terms of secondary index referencing?
Grouping of pointers to records of the same search key
Why do secondary indices need to be dense?
File is not stored in the search key (secondary index) order
What is the main disadvantage of index sequential file organisation?
Performance degrades over time, periodic reorganisation may be necessary to maintain performance
What is the advantage of a B+ tree?
Automatically reorganised with small, local changes
What are the disadvantages of B+ trees?
Insertion and deletions are costlier, space overhead
Describe the length of paths from root to leaf in a B+ tree
All paths from root to leaf are of the same length
What is the range of children that an internal node can have in a B+ tree?
Between the ceiling of n/2 and n children
How many values can a leaf node have between?
Ceiling of (n-1)/2 and n-1 values
If a root is not a leaf how many children must it have?
At least 2
What do the pointers point to in a non leaf node?
Children
What do the pointers point to in a leaf node?
Records or pointer to next leaf node
What do non leaf nodes form in a B+ tree?
A multi level sparse index
What order of magnitude is the number of levels in a B+ tree?
Logarithmic
What is the maximum path length in lookup in a B+ tree?
Floor of logn/2(K)
What are the steps for insertion in B+ tree if search key is not present?
<ol><li>Add the record to main file and create a bucket if necessary</li><li>If there is room in relevant leaf node, add new key value,pointer pair</li><li>Otherwise, split the node and add the new key value pointer pair</li></ol>
What are the two strategies to bypass the write inefficiency of B+ trees?
<ol>
<li>Sort entries first and insert in sorted order</li>
<li>Sort entries first and create tree layer by layer, starting with leaf level</li>
</ol>