8. Iterators and Joins Flashcards

Question 1

Q

Describe a join in terms of tuples operations

Answer

A

take two relations, R and S, and create one new relation out of their matches on the join condition – that is, for each record r_i in R, find all records s_j in S that match the join condition we have specified and write < r_i , s_j > as a new row in the output (all the fields of r followed by all the fields of s).

Question 2

Q

Simple Nested Loop Join - describe

Analyze cost

Answer

A

take each record in R, search for all its matches in S, and then we yield each match

Cost: [R]+|R|[S], where [R] is the number of pages in R and |R| is the number of records in R

i.e. scan R once ([R]) and then scan the whole S for each record of R

Question 3

Q

Page Nested Loop Join - describe, analyze cost

Answer

A

for a page of R, take all the records and match them against each record in S, and do this for every page of R.

Cost: [R] + [R][S]

[R] - number of pages in R

i.e. Scan all pages of R ([R]) and then for each page of R, scan all pages of S.

Question 4

Q

Chunk Nested Loop Join - describe, analyze cost

Answer

A

B - number of pages in a memory buffer

We can have B-2 pages of R and 1 page of S in a buffer at the same time (the last 1 page is used as an output buffer). So we can join R by such chunks instead of pages or records.

For such a chunk, take all the records and match them against each record in S, and do this for every chunk.

Cost: [R] + ([R] / B−2) * [S]

[R] - number of pages in R

i.e. scan all pages of R, then for each chunk (there are [R] / B−2 of them) scan all pages of S.

Question 5

Q

Index Nested Loop Join - describe, analyze cost

Answer

A

if we have an index on S that is on the appropriate field (i.e. the field we are joining on), it can be very fast to look up matches of r_i in S.

Cost: [R] + |R|∗(cost to look up matching records in S).

[R] - # pages in R

|R| - # records in R

Question 6

Q

Naive Hash Join - describe, analyze cost

Answer

A

B - number of pages in a memory buffer

construct a hash table that is B-2 pages big on the records of R, fit it into memory, and then read in each record of S and look it up in R’s hash table to see if we can find any matches on it.

Cost: [R] + [S] I/Os

However, it relies on R <= B - 2, which is very often not true

Question 7

Q

Grace Hash Join - describe

the main drawback

Answer

A

B - number of pages in the in-memory buffer.

repeatedly hash R and S into B-1 buffers so that we can get partitions that are ≤ B−2 pages big
Use naive hash join on partitions

Drawback: sensitive to key skew. Key skew is when we try to hash but many of the keys go into the same bucket.

Question 8

Q

Sort-Merge Join - describe, the average cost

Answer

A

sort R and S first
we begin at the start of R and S and advance one or the other until we get to a match
let’s assume we’ve gotten to a match. We mark this spot in S as marked(S) and check each subsequent record in S until we find something that is not a match.
Now, go to the next record in R and go back to the marked spot in S and begin again at step 1. Idea: Because R and S are sorted, any match for any future record of R cannot be before the marked spot in S.

Average cost: cost to sort R + cost to sort S + ([R] + [S])