DSA - B-Trees Flashcards
DEFINE B-Tree:
B-Tree of order m is a type of n-ary tree
- A Multi array tree each node has many children
- A memory data structure that is ALWAYS balanced
NOT BINARY TREE !!
What are the Properties of a B-TREE
- Every node has at MOST m children (IF ORDER m)
- Every NON leaf node, bare root, has at least m/2 children
- Root Node has at least 2 Children
- All leaf nodes appear on same Level
- ALWAYS HEIGHT BALANCED
- Update & Search are O(log(n))
- NON- leaf with c children have c-1 search key values, that act as separators or discriminators, which guide searches down correct subtree
Why is this:
“Every NON leaf node, bare root, has at least m/2 children” Ideal?
Guarantees:
-Not having long paths from Root => Leaf
- Order log(n) ==> Guarantees a Balanced Tree
Why is this:
“All leaf nodes appear on same Level”
Ideal?
Guarantees:
- EVERY Path From Root => Leaf is the same Length
- This also guarantees order log(n)
What is the Purpose of :
“ NON- leaf with c children have c-1 search key values which guide searches down correct subtree” ?
Decides which path we take by comparing values we are searching for with these values.
Thus no need to correspond to actual records we are storing
(DIRECTION THINGS)
What causes the Definition of B-Tree to vary amongst Authors?
m
as it could be:
- no. children
- min no. children
- max no. children
- max/min no. keys in NODE
Does B-Tree have any SIGNIFICANT advantages over AVL or other Height Balanced Binary trees, when used as in-memory data structure?
NO
- Its just more complex to implement
What variant of B-Tree is used as the main-stay of Databases and the most comment external data structure in use?
B+ Tree
What does Data Record mean?
An element of data information to be managed by the B+Tree
- It is what we are storing and what we want managed by the Tree
Define Key Value:
Value by which we identify the record
- B+ Tree will use the Key value to allow us to find the block
our data is in - SIMILARY to PRIMARY KEY
NOTE: can have key values with no records
Define Discriminator:
Value used to decide which path to take down the tree in searching for record
- ALMOST always the key value of some record, but doesn’t
need to be
– Take KEY value and put into index to provide discriminator
NOTE: can have key values with no records
Define Disk Address:
Offset from the start of the disk file to a particular block in file
THINK of it as a pointer to Disk Memory
- Tells us which block number on disc are we storing our block
Describe the External Data Structure?
Is one which is Stored on eternal or secondary memory rather than internal memory RAM.
MEANING that the Data Structure:
- Persists beyond the end of the program execution without
the programmer having to explicitly save/ load the
structure to/from disk (Program Crashes the DATA is still in
B+ Tree)
- Grows to Size larger than can fit in internal memory, being
limited only by the amount of secondary storage available - Accessed & Updated by multiple different programs at the
same time so long as the suitable coordination protocols
are observed.
What happens if Program ENDs?
It will remain in the disc so next time all data in B+ Trees is still there
What is Record-Embedded?
- Store the data and records in the leaf node of B+ tree
- Suitable DS when you only need to find record by 1 search
- Self-contained ds with all index info & data records in one file
- DELETE THIS WOULD ALSO DELETE RECORDS
What is an Index B+ Tree?
Stores only key values & disk addresses in the leaf nodes, Identifies the disk block in the separate file of data where the record with corresponding key value resides
- Has 2 files : Record & internal Nodes of B+ tree
- Throw away index without deleting record
- Can have 2 index files that index to same set of records
- 2 Types - Primary & secondary
Primary Index B+ Trees
- Records are kept sorted by key value used in B+ Tree in data file containing the block of records
1 Pointer for each block on lowest level
E.g 1000 discriminator values then you have 1001 disc pointers to 1001 record files
Secondary Index B+ Trees
- Records are not necessarily sorted in the data file of blocks
- Need a pointer that jumps to each record
- For each record in the record file you going to have a pointer to point to block record is in
- 1 Pointer for evey record on lowest level
- Key values may appear higher in the tree
Search Operations in RECORD EMBEDDING?
SEARCH :
Start at root & follow path down indicated by discriminator values.
RANGE:
Search for record with a key at one end of the range and iterate using next/pre disk addresses in leaf level, over all records in range
Complexity of B+ Trees?
n-ary tree, with height balanced, so search & Insertion is going to be O(log n)
Cost of the disk read is much large than cost of the processing of in-memory aspects of the data structure that we can ignore this cost