Search Trees Flashcards
Ordered Data Sets
hashing stores items in random order
data should be ordered by key to support insert, remove, lookup
operations: lower bound, upper bound, range scan
Standard Binary Search (arrays)
logn steps if array is sorted
0.5 branch misses per comparison on avg for random searches
access pattern is not cache friendly
on large arrays is slower than in B-tree
Branch-Free Binary Search
conditional branch can be transformed to data dependency
no branch misses for small arrays
Interpolation Search
assumes keys are uniformly distributed(distance between Elems is same/close to it) 10 13 15 26 28 30
a single, very large or very small outlier will destroy interpolation
can be combined with binary search
Linear Search
O(n) steps
can be improved using SIMD instructions
good locality
array doesn’t have to be sorted for point search
best solution for small arrays
Blocking Layout
store hot keys together in blocks
recursive structure
improves locality
allows use of SIMD instructions (k-ary search)
Growing Arrays
better locality than linked lists
one downsize is growth because of limited size
standard approach: exponential growth plus copy
Growing Arrays without Copying
64 pointers pointing to array of values
good locality
Comparison-based Search Trees (binary search trees)
compare one elem at time
each comparison rules out half the elems (if tree is balanced)
log2n comparisons and height
Binary Search Trees
each node stores one key/value pair and two child pointers
complex rebalancing logic
bad cache locality
pointer based: elem doesnt have to be moved physically during rebalancing
Adv:
if you insert an element, you just allocate a new entry and point into tree
if tree rebalanced: no need to copy, you just change pointer structure of the tree
Skip List
elems in sorted linked list + additional shortcuts on top
probabilistic data structure (no worst case guarantees)
all items are stored in a sorted order by the key
B+-tree
variant of B tree where inner nodes only store pointers, no values
leaf nodes are often chained to simplify range scan
Sorted Arry
worst case for n elems:
lookup log2n
insert n
Sorted Blocks of Fixed Size
split array into small sorted arrays of b elems, b is fixed
Inner Nodes
indicate path to leaf node that has actual data
tree height: logb(n)
comparisons: log2n