Chapter 28 Flashcards
What is purpose of join
Used to retrieve data from multiple tables.
What are 3 fast join algorithms
1- sort-merge
2- hash-based
3- index-based
What is nested-loop join
Nested-loop join works like a nested loop
What is the time-complexity of the nested loop
O(RS)
Does time complexity should be independent of the order of tables. i.e. O(RS) is same as O(SR)
Yes
What are 3 Variants of Nested-Loop Join
- Temporary index nested-loop join
- Index nested-loop join
- Naive nested-loop join
What is Naive nested-loop join
There are many variants of the traditional nested-loop join. The simplest case is when an entire table is scanned; this is called a naive nested-loop join.
What is Index nested-loop join
If there is an index, and that index is exploited, then it is called an index nested-loop join.
What is Temporary index nested-loop join
If the index is built as part of the query plan and subsequently dropped, it is called as a temporary index nested-loop join.
What is sort-merge join
A “sort merge” join is performed by sorting the two data sets to be joined according to the join keys and then merging them together. The merge is very cheap, but the sort can be prohibitively expensive.
Does Sort-Merge Join very fast
Yes
What is hash join
A hash join is performed by hashing one data set into memory based on join columns and reading the other one and probing the hash table for matches. The hash join is very low cost when the hash table can be held entirely in memory. It is good for VLDB.
What are 2 phases of hash join
hashed (build) phase and probed phase
What is partitioning fan-out
In hash join, initially, the two tables are entirely consumed and partitioned (using a hash function on the hash keys) into multiple partitions. The number of such partitions is sometimes called the partitioning fan-out.
What is Hash-Based Join: Partition Skew problem and its solution
Partition skew can become a problem in hash-join. In the first step of hash join, records are hashed into the main memory into their corresponding bucket. This being done based on the hash function used. However, an attribute being hashed may not be uniformly distributed within the relation, and some buckets may then contain more records than other buckets. When this difference becomes large, the corresponding bucket may no longer fit in the main memory. As a consequence, hash-based join degrades into performance of a nested-loop join. The only possible solution is to make available other hash functions to be chosen by the optimizer; that better distribute the input.