1 - 145: Query Evaluation Flashcards

1
Q

What is CLOCK replacement strategy and how does it work?

A

CLOCK is a page replacement strategy that modifies FIFO by giving pages a “second chance.” Each page has a “used” bit that is set to 1 when accessed. When looking for a page to evict, if a page’s bit is 1, it’s set to 0 and skipped. If a page’s bit is 0, it’s evicted. The strategy uses a circular buffer with a pointer moving like a clock hand.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What are the main differences between clustering and non-clustering indices?

A

Clustering indices: Data records are physically ordered according to the index key. A table can only have one clustering index.

Non-clustering indices: Data records are not physically ordered by the index key. A table can have multiple non-clustering indices.
Clustering indices provide more efficient sequential access and range queries.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Why is fanout important in B+ trees, and what determines it? (Multiple Choice)
A) It determines storage requirements only
B) It affects CPU processing time
C) It determines the number of disk accesses needed to reach leaf nodes
D) It only impacts insertion operations

A

C - Fanout is crucial because it determines the height of the tree and thus the number of disk accesses needed to reach leaf nodes. The fanout is determined by the disk page size.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What is the significance of the 69% space utilization in B+ trees?

A

This is the expected (average) space utilization in B+ tree nodes. This means that on average, nodes will be about 69% full, which represents a balance between efficient space usage and room for insertions/deletions.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

When creating a composite key index, why does the order of columns matter? (Multiple Choice)
A) It only affects storage space
B) It determines which queries can efficiently use the index
C) It only impacts update operations
D) It has no practical impact

A

B - The order determines which queries can efficiently use the index. Queries are most efficient when they have equality conditions on leftmost columns.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What makes B+ trees particularly suitable for database systems compared to binary trees?

A

-Higher fanout reduces disk I/O by decreasing tree height
-Leaf nodes are linked for efficient sequential access
-Node size matches disk page size
-All data is stored in leaf nodes, making searches predictable

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

In buffer management, what is the “5-minute rule” and why is it important?

A

The 5-minute rule states that pages referenced every five minutes should be memory resident. This helps in determining which pages should be kept in memory vs. stored on disk, optimizing buffer management decisions.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What are the three possibilities for information contained in an index? (Multiple Choice)
A) Only primary keys, foreign keys, and unique constraints
B) Only links to data records, actual records, and metadata
C) The actual data record, a link to the data record, or a list of links to data records
D) Only hash values, tree structures, and binary data

A

C - An index can contain the actual data record, a single link (rid) to the data record, or a list of links to data records.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

When is a non-clustering index scan less efficient than sorting the entire relation?

A

-A large portion of the table needs to be accessed
-The physical order of records differs significantly from the logical order
-Each record access requires a random I/O
-The cost of random I/Os exceeds the cost of sequential scan and sort

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What is Block Nested Loop Join and what is its cost formula?

A

Block Nested Loop Join reads n₍B₎-2 blocks from the outer relation and then goes block-wise over the inner relation. Cost formula: N₍outer₎ + (⌈N₍outer₎/(n₍B₎-2)⌉) * N₍inner₎
where N₍outer₎ and N₍inner₎ are number of blocks, and n₍B₎ is buffer size in blocks.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

When implementing a Hash Join, why do we need two different hash functions?

A

The first hash function is used to partition the relations into buckets that fit in memory. The second hash function is used during the probing phase to build and probe the in-memory hash table. Using the same hash function would defeat the purpose of partitioning.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Multiple Choice: Which join implementation can handle any type of join condition?
a) Hash Join
b) Merge Join
c) Nested Loop Join
d) Index-based Nested Loop Join

A

c) Nested Loop Join - it can handle any type of join condition but may be expensive.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

In Blocked I/O for external sorting, what is the formula for the number of runs that can be merged per phase?

A

⌊(n₍B₎ - b)/b⌋ runs can be merged per phase, where n₍B₎ is the buffer size in blocks and b is the size of each buffer block.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What are the three main phases of Sort-Merge Join?

A

1) Sort the first relation on join attribute
2) Sort the second relation on join attribute
3) Merge the sorted relations while matching on join attribute

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Multiple Choice: What is the main advantage of Index-based Nested Loop Join over regular Nested Loop Join?
a) It works with any join condition
b) It reduces the number of disk accesses for the inner relation
c) It automatically sorts the result
d) It uses less memory

A

b) It reduces the number of disk accesses for the inner relation by using an index

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

In external sorting with Blocked I/O, what is the tradeoff between buffer block size and merge efficiency?

A

Larger buffer blocks reduce cost per block access (better I/O efficiency), but decrease the number of runs that can be merged simultaneously, potentially requiring more merge phases.

17
Q

Multiple Choice: Which type of join requires equality as the join condition?
a) Nested Loop Join
b) Merge Join
c) Hash Join
d) Both b and c

A

d) Both Merge Join and Hash Join require equality conditions

18
Q

What are the two main conditions where indices over multiple attributes should be considered?

A

1) When the WHERE clause has criteria involving multiple attributes
2) When “index only” processing is possible (the index contains all needed attributes)

19
Q

In Grace Hash Join, what happens if the partitions are too large to fit in memory?

A

The partitioning phase may need to be repeated recursively (repartitioning) until the partitions are small enough to fit in memory. This is known as recursive partitioning.