Exam 2 Flashcards
(41 cards)
Database file index
similar in theme to an index in a textbook
data structure allowing fast lookups for data
Types of indexes
Ordered Indexes
Hash Indexes
ordered indexes
values are stored sequentially within the file
Hash indexes
values uniformly distributed across a range of buckets (blocks) by a hash function
Why is the word values used instead of records when talking about ordered and hash indexes
because it is the values which determine the location of the records
What is an index
metadata that helps access record data
stored in blocks. The blocks contain pointers into other blocks ( which contain records)
what are the 2 orderings of indexes
primary and secondary
primary index
for a sequentially ordered file, the index whose search key specifies record order
also called clustering index
the search index is usually the primary key
secondary index
an index whose search key specifies an order different than the file order
records are not sorted by the search key
also called non-clustering index
what are the 2 Formats for indexes
Dense and sparse
Dense indexes
the index points to sequential records stored within a file.
one index entry per search key
Sparse index
contains index records for only some of the search-key values
only applicable when records are sequentially ordered by the search key (clustering indexes)
pros and cons for dense and sparse indexes
dense
-entry per key value
-faster lookup time
-more space required
-more maintenance
sparse
-fewer entries
-slower lookup time
-less space required
-less maintenance
good tradeoff is one sparse index entry per block
B+ Tree
Multilevel index structure
A balanced, tree-based data structure
High fanout
automatically reorganizes itself on inserts/deletions
reorganizations of the entire file is not needed to maintain performance
structure of B+ Trees
n pointers
n-1 search keys
search keys are organized such that
if i < j then Ki < Kj
internal nodes - an internal level of nodes provide a sparse index on the level of nodes below it
leaf nodes - leaf nodes provide a dense index on the file
what are the rules for internal nodes in a B+ Tree
may hold up to n pointers
must hold at least n/2 pointers (round up)
what are the rules for leaf nodes in a B+ Tree
may hold up to n-1 pointers
must hold at least (n-1) / 2 values (round up)
what happens when a node is at capacity when trying to insert
split up the node
what happens when a leaf node falls below the (n-1)/2 (round up) values
we coalesce nodes by either
merging the node with a sibling node
redistribute the entries between the nodes to ensure that each is half full
cost of select - full file scan - no index, a heap
cost = br * tT + tS
br denoting all block in relation r
cost of select - linear search on a sorted file
cost = br/2 * tT + tS
read half the blocks on average (br/2)
cost of lookups on B+ Tree indexes
hi + 1
hi defines the tree height
+1 for the data block beyond the leaf node
cost of select on a B+ Tree ordered by search key
cost = (hi + 1) * (tT + tS)
what are the steps of query processing
1 parsing & translation
2 optimization
3 evaluation