DBMS - B-Trees Flashcards
Tree
Data structure whose components are a set of nodes and edges. Each node has values, each edge connects two nodes.
Binary Tree
Directed edge (i.e child to parent), each child has only one parent, each node is either a parent of one or two nodes or a leaf. The root has no parents.
Search Tree Properties
- Always one more pointer than keys
- Each key value represents an indexing key
- Each non-null pointer refers to a sub-tree of node structures.
Structure of Binary Search Tree
Two pointers, one key value
B-Tree structure
Large number of pointers and keys in a node. The values of ki in a page are ordered. Duplicate key values are allowed.
Strict data structure and data placement leads to
overflows
Requirements and Issues of Tree
- Balance
- Leafs must be on same level
- Block Based
- Allow direct and sequential reads
- High space utilisation per block
- Resilient to some abuse
- Activities have similar cost basis
- Sharing mechanism is possible and reasonable
Why do we use B-Trees over BSTs
Can choose to build trees from bottom up instead of top down. Allow better root to emerge from the key’s parent while keeping the structure balanced.
B-Trees and Paging
B-Trees are perfect for paging. Performance enhanced when root page is kept in ultra fast memory storage.
B-Trees Computational Requirement
Deterministic side
- Can work out index size and no. of levels given no. of keys and page size.
- Can work out min/max time to compute operations over B-Tree index data structure.
Order (in B-Tree)
Maximum number of descendants a node or page should have
Root (in B-Tree)
First page to read and where first multi-way split. Higher order = more partitions possible.
Leaves (In B-Tree)
Are the pages at the lowest level of the B-Tree, with no descendants?
B-Tree Properties for an M-Order Tree
1) Every page has a maximum of M descendants
2) Root has at least 2 descendants unless its a leaf
3) All leaves are on same level
4) A nonleaf page with k descendants contains k-1 keys
5) A leaf page has at least (ceiling(m/2)-1) keys and not more than m-1 keys
6)Every page beside root and leaves has at least ceiling(m/2).
7) Any keys present in a page are in order
8) For all search keys of value X in sub-tree pointed by P: Ki-1<xKi for 1<q, x<Ki for i=1, Ki-1<x for i=q
B-Trees
- Balanced
- Shallow and require less disk reads as traversal is short
- Random inserts/deletes accommodated at reasonable and predictable cost
- At least load 50% use of claimed space - average 66%
- Sequential search not implicit but possible
- Flexible to accommodate non-key search field values
Searching
Efficient as each page is a multi-way discriminator (at best order m). Could be implemented by reading a block into single main memory buffer.
Insertion + Splitting
Relatively involved procedure, especially when page becomes saturated and needs to be split into two pages with the medium key promoted.
Promoted key might entail other splits recursively.
Small number of disk pages need to be concurrently read into main memory and at worst there can be as many writes as tree depth.
Deletion
If non leaf key must be pages, the next key must raise to fill the gap. If a page’s key count is below acceptable level, then re-grouping to the preceding page must happen.
Small number of pages must be read concurrently into main memory
Number of writes in delete
= number of inserts
All Split and Promote Operations
Searching, Insert+Splitting and Deletion all take O(logn m) time in worst case.
Given B-Tree order m with n keys.
Observation 1 (Operations)
Number of descendants a certain tree level has is equal to number of keys present in that level and all other higher levels + 1.
Observation 2 (Operations)
Minimum number of descendants that any level can have; permutation yields a B-Tree that has max height and min breadth.
Specifically: - Root Page has 2 as min
- 2nd level has ceiling(m/2) per page i.e 2.
- The nth level has ceiling(m/2)^(d-1)*2
Create Index in Postgres
CREATE Table emp(…) ;
CREATE INDEX idx_emp_job ON emp(job);
CREATE INDEX idx_emp_deptno_job ON emp(deptno,job);
Index Selection Issues
Indexes help if lots of data present even in non-key fields. Indexes need updating.