Indexing Flashcards
What is the purpose of database indices?
To enable efficient query processing by avoiding full relation scans
What are the two basic types of indices?
Ordered indices and Hash indices
What are the key factors for evaluating index techniques?
Access type support, lookup time, insertion time, deletion time, and space overhead
What is a clustering/primary index?
An index where the search key defines the sequential order of the file
What is a secondary index?
An index whose search key specifies an order different from the sequential order of the file
What is a dense index?
An index with entries for every search-key value in the file
What is a sparse index?
An index with entries for only some search-key values, usable only when records are sorted by search key
What is a multilevel index?
An index that creates a sparse outer index on the original (inner) index to improve search efficiency
How does B+-tree handle duplicates?
By either storing multiple entries or using buckets to store pointers to all records with the same key
What is the main advantage of B+-tree over static indexing?
Maintains efficiency despite insertions and deletions without requiring reorganization
What are the key properties of B+-trees?
All paths from root to leaf are same length, nodes have between ⌈n/2⌉ and n children
What is bulk loading in B+-trees?
Efficient technique to build index by sorting entries first, then building tree bottom-up
What is the main limitation of hash indices?
They don’t support range queries effectively
How does overflow chaining work in hash indices?
Records that hash to a full bucket are stored in overflow buckets linked to the original bucket
What is an LSM tree?
Write-optimized index with multiple B+-trees starting from in-memory to on-disk structures
What is the key idea of buffer trees?
Associate buffers with internal B+-tree nodes to reduce write operations
What are bitmap indices used for?
Efficient querying on multiple keys using bit arrays for each possible value
What is the main advantage of learned indices?
Can transform log(n) lookup cost to constant time by learning data distribution
What is ALEX?
An updatable learned index that combines ML models with traditional indexing techniques
How does ALEX handle dynamic workloads?
Uses Gapped Arrays for efficient insertions and Model-based Inserts for placement prediction
Why is sorting important for indexing?
Enables efficient index creation and maintenance, especially for B+-trees
What is the complexity difference between comparison-based and distribution-based sorting?
Comparison-based: O(N log N), Distribution-based: O(NW) where W is the number of digits
What is prefix compression in B+-trees?
Technique to increase fanout by storing only distinguishing prefixes of search keys in nonleaf nodes
What happens during B+-tree node splitting?
When a node becomes too full, it’s split into two nodes with ⌈n/2⌉ entries in each
How does a B+-tree maintain balance?
Through splitting when nodes are too full and merging when too empty
What is the cost of B+-tree operations?
Proportional to the height of the tree, which is log⌈n/2⌉(N)
What’s the difference between B-tree and B+-tree?
B-tree eliminates redundant storage of search keys but requires additional pointer fields
How does an R-tree handle spatial data?
Uses rectangular bounding boxes to index spatial objects in a hierarchical structure
What are the limitations of ALEX?
Cannot handle adversarial workloads, duplicate keys, or concurrent updates
What is the k-d tree structure?
A spatial index that recursively partitions space along different dimensions
How does bitmap index handle multiple keys?
Uses logical AND/OR operations between relevant bitmaps
What is temporal data indexing?
Uses specialized indices or R-trees to handle time intervals in data
Why use stepped-merge in LSM trees?
Reduces insert cost by allowing multiple trees per level but increases query cost
How does recursive model index improve over naive ML approach?
Creates hierarchy of smaller models, each responsible for specific key ranges
What is the purpose of Gapped Arrays in ALEX?
Provides space for future insertions without requiring frequent reorganizations
How does counting sort work?
Builds histogram of data and uses it to place elements in final position
What determines the choice between sparse and dense indices?
Trade-off between space overhead and lookup efficiency
Why are secondary indices always dense?
Because records with intermediate search-key values could be anywhere in file
How does flash storage affect indexing?
Requires special consideration for write operations due to page erasure costs
What are the benefits of in-memory indexing?
Allows for smaller nodes that fit in cache lines for better performance
How does quadtree differ from k-d tree?
Divides 2D space into four quadrants instead of one dimension at a time
When should bitmap indices be used?
When attributes have low cardinality and multiple-key queries are common