Search Trees Flashcards by Oleksandra Talalaieva

Ordered Data Sets

hashing stores items in random order
data should be ordered by key to support insert, remove, lookup
operations: lower bound, upper bound, range scan

How well did you know this?

Not at all

Perfectly

Standard Binary Search (arrays)

logn steps if array is sorted
0.5 branch misses per comparison on avg for random searches
access pattern is not cache friendly
on large arrays is slower than in B-tree

How well did you know this?

Not at all

Perfectly

Branch-Free Binary Search

conditional branch can be transformed to data dependency
no branch misses for small arrays

How well did you know this?

Not at all

Perfectly

Interpolation Search

assumes keys are uniformly distributed(distance between Elems is same/close to it) 10 13 15 26 28 30
a single, very large or very small outlier will destroy interpolation
can be combined with binary search

How well did you know this?

Not at all

Perfectly

Linear Search

O(n) steps
can be improved using SIMD instructions
good locality
array doesn’t have to be sorted for point search
best solution for small arrays

How well did you know this?

Not at all

Perfectly

Blocking Layout

store hot keys together in blocks
recursive structure
improves locality
allows use of SIMD instructions (k-ary search)

How well did you know this?

Not at all

Perfectly

Growing Arrays

better locality than linked lists
one downsize is growth because of limited size
standard approach: exponential growth plus copy

How well did you know this?

Not at all

Perfectly

Growing Arrays without Copying

64 pointers pointing to array of values
good locality

How well did you know this?

Not at all

Perfectly

Comparison-based Search Trees (binary search trees)

compare one elem at time
each comparison rules out half the elems (if tree is balanced)
log2n comparisons and height

How well did you know this?

Not at all

Perfectly

Binary Search Trees

each node stores one key/value pair and two child pointers
complex rebalancing logic
bad cache locality
pointer based: elem doesnt have to be moved physically during rebalancing

Adv:
if you insert an element, you just allocate a new entry and point into tree
if tree rebalanced: no need to copy, you just change pointer structure of the tree

How well did you know this?

Not at all

Perfectly

Skip List

elems in sorted linked list + additional shortcuts on top
probabilistic data structure (no worst case guarantees)
all items are stored in a sorted order by the key

How well did you know this?

Not at all

Perfectly

B+-tree

variant of B tree where inner nodes only store pointers, no values
leaf nodes are often chained to simplify range scan

How well did you know this?

Not at all

Perfectly

Sorted Arry

worst case for n elems:
lookup log2n
insert n

How well did you know this?

Not at all

Perfectly

Sorted Blocks of Fixed Size

split array into small sorted arrays of b elems, b is fixed

How well did you know this?

Not at all

Perfectly

Inner Nodes

indicate path to leaf node that has actual data
tree height: logb(n)
comparisons: log2n

How well did you know this?

Not at all

Perfectly

B-tree logic decomposition

Study These Flashcards

two parts:
1. structure-modification operations (split, merge, new root) determine shape of B tree
2. per-node operations (lower bound, insert, delete, iterate) work on nodes locally
(2) can be changed without affecting high level logic (1)

Physical Node Layouts

Study These Flashcards

most common layout: array of sorted key/value pairs
binary search within each node
insert/delete moves half of elems on avg(to the right)

alternatives: store keys and values separately, unsorted leaf nodes but sorted immer nodes, encode binary search tree, trie or hash table within node

what is the best node size?

Study These Flashcards

in main memory, node size (b) can be freely chosen
node size should be at least cache line size
optimal page size for lookup is bigger that for inserts

B trees with variable-length data

Study These Flashcards

bad cache locality, because need ti dereference the pointer which generally will not be in cache
lots of memory allocation
memmory fragmentation from memory allocator

Slotted page layout

Study These Flashcards

enables variable-lenght data in sorted order
slots decouple logical from physical ordering
slots store offset and length

Compaction

Study These Flashcards

deletion in slotted page leaves holes
keep holes after deletion, but keep track of how many bytes are wasted
when running out of space, one can trigger compaction if its beneficial
best implemented by copying all keys to new empty page

Separator Choice

Study These Flashcards

when splitting node: choose shortest key around median as separator

Suffix Truncation of Separator

Study These Flashcards

keys in inner nodes are only used as signposts, do not have to be stored in leaf nodes
if separator is ABCDE and next key is AXYZ, separator can be truncated to AX

Prefix Truncation, its difficulty

Study These Flashcards

extract common prefix between smallest and largest key on node
on lookup, compare prefix first
speeds up comparison and saves space

issue: insertion of key may cause prefix to shrink -> all keys need to be enlarged, may not fit into node anymore, causing a split
solution: store fence keys on each page

Fence keys for prefix truncation

use common prefix of fence keys those are immutable after page was created by split and merge can be stored on page or derived on demand

key head in slot optimization

with slotted page layout, binary search requires loading key for every comparison extract or copy first 4 bytes of key into slot

Hints

extract a fixed number of equi-distant heads and store them consecutively

Textbook merge (deletion and space management in b trees)

to avoid underfull nodes after deletion, one must merge or rebalance if node is underfull (<50% fill factor): find a neighboring node if neighboring node has low enough fill factor: merge two nodes into one, remove separator in parent, if parent is underfull merge parent the same way else rebalance elems between two nodes

B*tree

guarantees 66% fill factor. insert and delete rebalance between 3 neighboring nodes

Xmerge

insead of balancing during insert and delete, increase fill factor in background if fill factor is low, merge X neighboring nodes into X-1 nodes

B-tree: Cascading Insert

Scenario: * leaf node is full, have to split and insert separator into parent * parent is also full, have to split and insert separator into its parent * repeat… → Insert can cascade upwards until a new root has to be created Alternative: eager split * when walking the tree during insert, split full inner nodes eagerly * thus, when hitting the root node, the parent always has space * simpler, can be faster, but average tree fill factor a bit lower

why is a B-Tree usually faster than a binary tree

Instead of storing one key and having two children, B-tree nodes have n keys and n+1 children, where n can be large. This shortens the tree (in terms of height) and requires much less disk access than a binary search tree would.

* in which scenario can binary search be slower than linear search?

if data is unsorted

what data structure is a probabilistic alternative to a binary tree?

Skip lists

* what is the main difference between a B+ -Tree and a classical B-Tree?

insert and delete rebalance between 3 neighboring nodes

* how many comparisons does binary search require on an array with N entries?

logN

* how can you speed up lookup in a B-Tree?

link leaf nodes together, because range queries are faster

Search Trees Flashcards

(37 cards)