Indexing Flashcards

1
Q

What is the purpose of database indices?

A

To enable efficient query processing by avoiding full relation scans

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What are the two basic types of indices?

A

Ordered indices and Hash indices

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What are the key factors for evaluating index techniques?

A

Access type support, lookup time, insertion time, deletion time, and space overhead

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What is a clustering/primary index?

A

An index where the search key defines the sequential order of the file

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What is a secondary index?

A

An index whose search key specifies an order different from the sequential order of the file

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is a dense index?

A

An index with entries for every search-key value in the file

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is a sparse index?

A

An index with entries for only some search-key values, usable only when records are sorted by search key

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What is a multilevel index?

A

An index that creates a sparse outer index on the original (inner) index to improve search efficiency

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

How does B+-tree handle duplicates?

A

By either storing multiple entries or using buckets to store pointers to all records with the same key

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What is the main advantage of B+-tree over static indexing?

A

Maintains efficiency despite insertions and deletions without requiring reorganization

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What are the key properties of B+-trees?

A

All paths from root to leaf are same length, nodes have between ⌈n/2⌉ and n children

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What is bulk loading in B+-trees?

A

Efficient technique to build index by sorting entries first, then building tree bottom-up

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What is the main limitation of hash indices?

A

They don’t support range queries effectively

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

How does overflow chaining work in hash indices?

A

Records that hash to a full bucket are stored in overflow buckets linked to the original bucket

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What is an LSM tree?

A

Write-optimized index with multiple B+-trees starting from in-memory to on-disk structures

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What is the key idea of buffer trees?

A

Associate buffers with internal B+-tree nodes to reduce write operations

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

What are bitmap indices used for?

A

Efficient querying on multiple keys using bit arrays for each possible value

18
Q

What is the main advantage of learned indices?

A

Can transform log(n) lookup cost to constant time by learning data distribution

19
Q

What is ALEX?

A

An updatable learned index that combines ML models with traditional indexing techniques

20
Q

How does ALEX handle dynamic workloads?

A

Uses Gapped Arrays for efficient insertions and Model-based Inserts for placement prediction

21
Q

Why is sorting important for indexing?

A

Enables efficient index creation and maintenance, especially for B+-trees

22
Q

What is the complexity difference between comparison-based and distribution-based sorting?

A

Comparison-based: O(N log N), Distribution-based: O(NW) where W is the number of digits

23
Q

What is prefix compression in B+-trees?

A

Technique to increase fanout by storing only distinguishing prefixes of search keys in nonleaf nodes

24
Q

What happens during B+-tree node splitting?

A

When a node becomes too full, it’s split into two nodes with ⌈n/2⌉ entries in each

25
Q

How does a B+-tree maintain balance?

A

Through splitting when nodes are too full and merging when too empty

26
Q

What is the cost of B+-tree operations?

A

Proportional to the height of the tree, which is log⌈n/2⌉(N)

27
Q

What’s the difference between B-tree and B+-tree?

A

B-tree eliminates redundant storage of search keys but requires additional pointer fields

28
Q

How does an R-tree handle spatial data?

A

Uses rectangular bounding boxes to index spatial objects in a hierarchical structure

29
Q

What are the limitations of ALEX?

A

Cannot handle adversarial workloads, duplicate keys, or concurrent updates

30
Q

What is the k-d tree structure?

A

A spatial index that recursively partitions space along different dimensions

31
Q

How does bitmap index handle multiple keys?

A

Uses logical AND/OR operations between relevant bitmaps

32
Q

What is temporal data indexing?

A

Uses specialized indices or R-trees to handle time intervals in data

33
Q

Why use stepped-merge in LSM trees?

A

Reduces insert cost by allowing multiple trees per level but increases query cost

34
Q

How does recursive model index improve over naive ML approach?

A

Creates hierarchy of smaller models, each responsible for specific key ranges

35
Q

What is the purpose of Gapped Arrays in ALEX?

A

Provides space for future insertions without requiring frequent reorganizations

36
Q

How does counting sort work?

A

Builds histogram of data and uses it to place elements in final position

37
Q

What determines the choice between sparse and dense indices?

A

Trade-off between space overhead and lookup efficiency

38
Q

Why are secondary indices always dense?

A

Because records with intermediate search-key values could be anywhere in file

39
Q

How does flash storage affect indexing?

A

Requires special consideration for write operations due to page erasure costs

40
Q

What are the benefits of in-memory indexing?

A

Allows for smaller nodes that fit in cache lines for better performance

41
Q

How does quadtree differ from k-d tree?

A

Divides 2D space into four quadrants instead of one dimension at a time

42
Q

When should bitmap indices be used?

A

When attributes have low cardinality and multiple-key queries are common