9. Query Optimization Flashcards

Question 1

Q

The main goal of query optimization

Answer

A

finding the query plan that minimizes the number of I/Os it takes to execute the query.

Question 2

Q

What is a query plan? Which language is used to express it?

Answer

A

A query plan is just a sequence of operations that will get us the correct result for a query. We use relational algebra to express it.

Question 3

Q

Operator’s selectivity - def.

Why do we need to know it?

Answer

A

The selectivity of an operator is an approximation for what percentage of pages will make it through the operator onto the operator above it.

We need to count it to estimate a query plan’s cost.

Question 4

Q

Selectivity formulas for the following selection conditions:

X=a, X=Y, X>a, cond1 AND cond2

capital letters are for columns and lowercase letters represent constants

Answer

A

X=a: 1/(unique vals in X)
X=Y: 1/max(unique vals in X, unique vals in Y)
X>a: (max(X) - a) / (max(X) - min(X) + 1)
cond1 AND cond2: Selectivity(cond1) * Selectivity(cond2)

Question 5

Q

Selectivity and the total number of tuples of equijoin

(tables A and B on the condition A.id = B.id)

Answer

A

Total number: |A|∗|B| / max(unique vals f or A.id, unique vals f or B.id)

Selectivity: 1 / max(unique vals f or A.id, unique vals f or B.id)

Question 6

Q

3 main heuristics to prune non-optimal query plans

Answer

A

Push down projects (π) and selects (σ) as far as they can go
Only consider left deep plans
Do not consider cross joins unless they are the only option

Question 7

Q

Why query optimizers commonly consider only left deep plans?

Answer

A

these plans can be fully pipelined, meaning that we can pass the pages up one at a time to the next join operator – we don’t actually have to write the result of a join to disk.

Question 8

Q

What is a System R?

Answer

A

The System R Optimizer is the cost-based query optimizer. It pioneered several optimization techniques, including using dynamic programming for bottom-up join tree construction, and the concept of interesting orderings for exploiting ordering in intermediate results.

Question 9

Q

Pass 1 of System R -

goal, result

Answer

A

The first pass of System R determines how to access separate tables optimally or interestingly.

Optimally = min number of I/Os

Interestingly = produce output in an interesting (useful for further computations) order

Result: For each table produce a list of access patterns that are either optimal or interesting and prune out others

Question 10

Q

two options for how to access tables during the pass 1 of system R

Answer

A

Full Scan
Index Scan (for every index the table has built on it)

For both of these scans, we only return a row if it matches all of the single table conditions pertaining to its table because of the push-down selects heuristic.

Question 11

Q

Which orders are considered interesting in analyzing table access plans in System R pass 1?

Answer

A

An interesting order is when the table is sorted on a column that is either:

Used in an ORDER BY
Used in a GROUP BY
Used in a downstream join (a join that hasn’t yet been evaluated. For pass 1, this is all joins).

Question 12

Q

Can a full table scan produce an interesting order for System R pass 1?

Answer

A

A full scan will never produce an interesting order because its output is not sorted.

Question 13

Q

Let’s say we are evaluating the following query:

SELECT * FROM players

INNER JOIN teams ON players.teamid = teams.id

ORDER BY fname;

And we have the following potential access patterns:

Full Scan players (100 I/Os)
Index Scan players.age (90 I/Os)
Index Scan players.teamid (120 I/Os)
Full Scan teams (300 I/Os)
Index Scan teams.record (400 I/Os)

Which access patterns will be pruned during System R pass 1 and which will move on?

Answer

A

Patterns 2, 3, and 4 will move on. Patterns 2 and 4 are the optimal pattern for their respective table, and pattern 3 has an interesting order because teamid is used in a downstream join.

Question 14

Q

System R passes 2..n -

the goal, which data do they use as input?

Answer

A

The rest of the passes of the System R algorithm are concerned with joining the tables together. For each pass i, we attempt to join i tables together, using the results from pass i-1 and pass 1.

For example, on pass 2 we will attempt to join two tables together, each from pass 1.

On pass 5, we will attempt to join a total of 5 tables together. We will get 4 of those tables from pass 4 (which figured out how to join 4 tables together), and we will get the remaining table from pass 1.

Question 15

Q

What is produced by R system pass i?

Answer

A

Pass i will produce at least one query plan for all sets of tables of length i that can be joined without a cross join (assuming there is at least one such set). Just like in pass 1, it will advance the optimal plan for each set, and also the optimal plan for each interesting order for each set (if one exists).

Question 16

Q

SELECT * FROM A

INNER JOIN B ON A.aid = B.bid

INNER JOIN C ON b.did = c.cid ORDER BY c.cid;

Which sets of tables will we return query plans for in pass 2?

Answer

Study These Flashcards

A

The only sets of tables we will consider on this pass are {A, B} and {B, C}. We do not consider {A, C} because there is no join condition and our heuristics tell us not to consider cross joins.

9. Query Optimization Flashcards

(16 cards)