Indexing Flashcards

1
Q

What is the purpose of database indices?

A

To enable efficient query processing by avoiding full relation scans

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What are the two basic types of indices?

A

Ordered indices and Hash indices

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What are the key factors for evaluating index techniques?

A

Access type support, lookup time, insertion time, deletion time, and space overhead

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What is a clustering/primary index?

A

An index where the search key defines the sequential order of the file

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What is a secondary index?

A

An index whose search key specifies an order different from the sequential order of the file

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is a dense index?

A

An index with entries for every search-key value in the file

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is a sparse index?

A

An index with entries for only some search-key values, usable only when records are sorted by search key

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What is a multilevel index?

A

An index that creates a sparse outer index on the original (inner) index to improve search efficiency

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

How does B+-tree handle duplicates?

A

By either storing multiple entries or using buckets to store pointers to all records with the same key

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What is the main advantage of B+-tree over static indexing?

A

Maintains efficiency despite insertions and deletions without requiring reorganization

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What are the key properties of B+-trees?

A

All paths from root to leaf are same length, nodes have between ⌈n/2⌉ and n children

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What is bulk loading in B+-trees?

A

Efficient technique to build index by sorting entries first, then building tree bottom-up

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What is the main limitation of hash indices?

A

They don’t support range queries effectively

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

How does overflow chaining work in hash indices?

A

Records that hash to a full bucket are stored in overflow buckets linked to the original bucket

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What is an LSM tree?

A

Write-optimized index with multiple B+-trees starting from in-memory to on-disk structures

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What is the key idea of buffer trees?

A

Associate buffers with internal B+-tree nodes to reduce write operations

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

What are bitmap indices used for?

A

Efficient querying on multiple keys using bit arrays for each possible value

18
Q

What is the main advantage of learned indices?

A

Can transform log(n) lookup cost to constant time by learning data distribution

19
Q

What is ALEX?

A

An updatable learned index that combines ML models with traditional indexing techniques

20
Q

How does ALEX handle dynamic workloads?

A

Uses Gapped Arrays for efficient insertions and Model-based Inserts for placement prediction

21
Q

Why is sorting important for indexing?

A

Enables efficient index creation and maintenance, especially for B+-trees

22
Q

What is the complexity difference between comparison-based and distribution-based sorting?

A

Comparison-based: O(N log N), Distribution-based: O(NW) where W is the number of digits

23
Q

What is prefix compression in B+-trees?

A

Technique to increase fanout by storing only distinguishing prefixes of search keys in nonleaf nodes

24
Q

What happens during B+-tree node splitting?

A

When a node becomes too full, it’s split into two nodes with ⌈n/2⌉ entries in each

25
How does a B+-tree maintain balance?
Through splitting when nodes are too full and merging when too empty
26
What is the cost of B+-tree operations?
Proportional to the height of the tree, which is log⌈n/2⌉(N)
27
What's the difference between B-tree and B+-tree?
B-tree eliminates redundant storage of search keys but requires additional pointer fields
28
How does an R-tree handle spatial data?
Uses rectangular bounding boxes to index spatial objects in a hierarchical structure
29
What are the limitations of ALEX?
Cannot handle adversarial workloads, duplicate keys, or concurrent updates
30
What is the k-d tree structure?
A spatial index that recursively partitions space along different dimensions
31
How does bitmap index handle multiple keys?
Uses logical AND/OR operations between relevant bitmaps
32
What is temporal data indexing?
Uses specialized indices or R-trees to handle time intervals in data
33
Why use stepped-merge in LSM trees?
Reduces insert cost by allowing multiple trees per level but increases query cost
34
How does recursive model index improve over naive ML approach?
Creates hierarchy of smaller models, each responsible for specific key ranges
35
What is the purpose of Gapped Arrays in ALEX?
Provides space for future insertions without requiring frequent reorganizations
36
How does counting sort work?
Builds histogram of data and uses it to place elements in final position
37
What determines the choice between sparse and dense indices?
Trade-off between space overhead and lookup efficiency
38
Why are secondary indices always dense?
Because records with intermediate search-key values could be anywhere in file
39
How does flash storage affect indexing?
Requires special consideration for write operations due to page erasure costs
40
What are the benefits of in-memory indexing?
Allows for smaller nodes that fit in cache lines for better performance
41
How does quadtree differ from k-d tree?
Divides 2D space into four quadrants instead of one dimension at a time
42
When should bitmap indices be used?
When attributes have low cardinality and multiple-key queries are common