Week 7 Flashcards
Tree structures - a reminder
What is a Search tree?
Special type of tree used to guide search for a record, given a value of one of a field’s records (aka the search field)
- Consider search tree, order p
- Node contains at most p - 1 search values (K) and p pointers (P)
Search tree Example
To search for value X we follow the appropriate pointer P, according to the constraints
- Tree here is of order p=3
Search tree characteristics
- Can be used as a mechanism to search for records stored in a disk file
a. Values in tree are values of search fields
for file (e.g. an indexing field)
i. Each value is associated with a pointer to
a record or a disk block containing record - Search tree can itself be stored on disk
a. Algorithms required for inserting and deleting values whilst maintaining constraints
b. Such algorithms typically do not ensure tree is balanced
Unbalanced tree
- Does not have all leaf nodes at the same level of the tree
- Depth of tree is not minimised for a given set of keys
- As records updated and deleted, tree may become skewed, with some branches going much deeper than others
a. May result in wasted storage space
b. Unpredictable search times as some keys
may require far more tree traversal than
others
What is a B-tree?
- Specialised case of search tree with additional constraints to:
a. Ensure tree always remains balanced
b. Minimise space wasted by deletion
What are the properties that need to be satisfied for a B-tree of order m
- Every node has at most m children
- Every internal node has at least m/2 children
- Every non-leaf node has at least 2 children
- All leaves on same level
- Non-leaf node with k children contains k-1 keys
Is B-tree a binary tree?
B-tree is NOT a binary tree
- Binary tree is a special case of tree structure with at most 2 child nodes per node
- B stands for balanced.
B-tree Formal Definition
B-tree of order p, when used as access structure on a key field to search for records can be defined as:
- Each internal node is of the form:
a. <P1, <K1, Pr1 >, P2, <K2, Pr2>, … , <Kq-1, Prq-1>, Pq>
- Where q <= p
- P1 is a tree pointer to another node in the tree
- Pri is a data pointer to record with search key value = Ki (or to data block containing record)
- Within each node K1 < K2 < … < Kq-1
- For all search key values X in subtree Pi (the ith subtree), we have:
a. Ki-1 < X < Ki for 1 < i < q; X < Ki for I = 1; and Ki-1 < X for i=q
- Each node has at most p tree pointers
-
Node in a B-tree with q - 1 search values:
B-tree, order p = 3
Compare with original search tree
1. Tree is balanced all leave nodes on level 1
2. All tree pointers at level 1 are NULL
Root node insertion
- B-tree starts with a single root node (which is also a leaf)
- When the root is full with p-1 search key values and insertion attempted
a. Root splits into two nodes at level 1
b. Middle value is kept in root node
c. Rest of values split evenly between other two ndoes
Non-root node insertion
When non-root node is full and new entry inserted:
1. Node split into two nodes at the same level
2. Middle entry is moved to parent node along with two pointers to new split nodes
3. If parent node is full it is also split
4. Splitting can thus propagate all the way to root node creating a new level if root is split
Deletion on B-tree
If deleting values causes node to be less than half full
a. It is combined with neighbouring nodes
b. Combination of nodes can also propagate to root
c. Node deletion can reduce number of tree levels
d. This allows more efficient usage of storage space
What is a B+trees
- Variation of the B-tree
- Commonly used for implementing dynamic, multi-level indexes
- In B-tree, every value of search field appears at some level in tree along with data pointer
- In B+tree, data pointers only stored at leaf nodes of tree along with a value of search field ∴
a. Structure of leaf nodes differs from structure of internal nodes - Leaf nodes usually linked to provide ordered access on search fields to records
- Thus leaf nodes similar to base level of an index
- Internal nodes of B+ tree correspond to other levels of multi-level index
- Some search field values from leave nodes may be repeated in internal nodes to guide search
B+ trees internal nodes
- Each internal node has at most P pointers
- Each internal node (excluding root) has at leas [p/2] tree pointers. Root node has at least two tree pointers if it is an internal node
- An internal node with q tree pointers q<=p, has q-1
Internal node structure
B+ trees leaf nodes.
- No more P pointers in node structure pointing at subtrees (it is a leaf after all)
- Each leaf node has at least [p/2] values
- All leaf nodes at same level
- A Pprevious pointer can also be included