9. Query Optimization Flashcards

1
Q

The main goal of query optimization

A

finding the query plan that minimizes the number of I/Os it takes to execute the query.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is a query plan? Which language is used to express it?

A

A query plan is just a sequence of operations that will get us the correct result for a query. We use relational algebra to express it.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Operator’s selectivity - def.

Why do we need to know it?

A

The selectivity of an operator is an approximation for what percentage of pages will make it through the operator onto the operator above it.

We need to count it to estimate a query plan’s cost.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Selectivity formulas for the following selection conditions:

X=a, X=Y, X>a, cond1 AND cond2

capital letters are for columns and lowercase letters represent constants

A
  • X=a: 1/(unique vals in X)
  • X=Y: 1/max(unique vals in X, unique vals in Y)
  • X>a: (max(X) - a) / (max(X) - min(X) + 1)
  • cond1 AND cond2: Selectivity(cond1) * Selectivity(cond2)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Selectivity and the total number of tuples of equijoin

(tables A and B on the condition A.id = B.id)

A

Total number: |A|∗|B| / max(unique vals f or A.id, unique vals f or B.id)

Selectivity: 1 / max(unique vals f or A.id, unique vals f or B.id)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

3 main heuristics to prune non-optimal query plans

A
  1. Push down projects (π) and selects (σ) as far as they can go
  2. Only consider left deep plans
  3. Do not consider cross joins unless they are the only option
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Why query optimizers commonly consider only left deep plans?

A

these plans can be fully pipelined, meaning that we can pass the pages up one at a time to the next join operator – we don’t actually have to write the result of a join to disk.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What is a System R?

A

The System R Optimizer is the cost-based query optimizer. It pioneered several optimization techniques, including using dynamic programming for bottom-up join tree construction, and the concept of interesting orderings for exploiting ordering in intermediate results.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Pass 1 of System R -

goal, result

A

The first pass of System R determines how to access separate tables optimally or interestingly.

Optimally = min number of I/Os

Interestingly = produce output in an interesting (useful for further computations) order

Result: For each table produce a list of access patterns that are either optimal or interesting and prune out others

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

two options for how to access tables during the pass 1 of system R

A
  1. Full Scan
  2. Index Scan (for every index the table has built on it)

For both of these scans, we only return a row if it matches all of the single table conditions pertaining to its table because of the push-down selects heuristic.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Which orders are considered interesting in analyzing table access plans in System R pass 1?

A

An interesting order is when the table is sorted on a column that is either:

  • Used in an ORDER BY
  • Used in a GROUP BY
  • Used in a downstream join (a join that hasn’t yet been evaluated. For pass 1, this is all joins).
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Can a full table scan produce an interesting order for System R pass 1?

A

A full scan will never produce an interesting order because its output is not sorted.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Let’s say we are evaluating the following query:

SELECT * FROM players

INNER JOIN teams ON players.teamid = teams.id

ORDER BY fname;

And we have the following potential access patterns:

  1. Full Scan players (100 I/Os)
  2. Index Scan players.age (90 I/Os)
  3. Index Scan players.teamid (120 I/Os)
  4. Full Scan teams (300 I/Os)
  5. Index Scan teams.record (400 I/Os)

Which access patterns will be pruned during System R pass 1 and which will move on?

A

Patterns 2, 3, and 4 will move on. Patterns 2 and 4 are the optimal pattern for their respective table, and pattern 3 has an interesting order because teamid is used in a downstream join.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

System R passes 2..n -

the goal, which data do they use as input?

A

The rest of the passes of the System R algorithm are concerned with joining the tables together. For each pass i, we attempt to join i tables together, using the results from pass i-1 and pass 1.

For example, on pass 2 we will attempt to join two tables together, each from pass 1.

On pass 5, we will attempt to join a total of 5 tables together. We will get 4 of those tables from pass 4 (which figured out how to join 4 tables together), and we will get the remaining table from pass 1.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What is produced by R system pass i?

A

Pass i will produce at least one query plan for all sets of tables of length i that can be joined without a cross join (assuming there is at least one such set). Just like in pass 1, it will advance the optimal plan for each set, and also the optimal plan for each interesting order for each set (if one exists).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

SELECT * FROM A

INNER JOIN B ON A.aid = B.bid

INNER JOIN C ON b.did = c.cid ORDER BY c.cid;

Which sets of tables will we return query plans for in pass 2?

A

The only sets of tables we will consider on this pass are {A, B} and {B, C}. We do not consider {A, C} because there is no join condition and our heuristics tell us not to consider cross joins.