L15:Query Optimization(2019) Flashcards

1
Q

Evaluation Plan

A

An Evaluation Plan

defines what algorithm is used for each operation,

and how execution of these operations is coordinated.

  • There can be equivalent plans for a single query
  • Represented as a Tree of Operations on relations
    • Tree uses relational algebra symbols as nodes
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Cost-Based

Query Optimization

Steps (3)

A
  1. Generate logically equivalent expressions using equivalence rules
  2. Annotate resultant expressions to get alternative query plans
  3. Choose the cheapest plan based on estimated cost
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Query Costs:

Factors

A

There are multiple statistics on a DBMS:

  • Statistical information about relations
  • Statistical Estimation for Intermediate Results
  • Cost of Algorithms, using the above statistics
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Equivalence of

Relational Algebra Statements

A

Two RA statements are said to be Equivalent IF:

They produce the same tuples on every database instance.

  • If the statements are equivalent, we can use them interchangably
  • Equivalence can be quickly determined through the equivalence rules
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Relational Algebra Equivalence Rules:

Set of Rules

A
  • Conjunctive Selections
  • Commutative Selection Operations
  • Series of Projections
  • Selections combined with Theta Join or Cartesian Product
  • Join Commutative Property
  • Natural Join Associativity
  • Theta Join Conditioin Associativity
  • Set Union
  • Set Intersection
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Relational Algebra Equivalence Rules:

3 Most Useful Join Rules

A
  • Join Commutativity
  • Natural Join Associativity
  • Theta Join Associativity
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Cost Estimation:

Basics

A

Goal:

Compare an execution plan’s cost,

not exactly measure computation time.

*More of an art, not exact.

Variables and Symbols:

  • S - a relation
  • B(S) - Blocks of S
  • T(S) - Tuples in S
  • V(S, a) - number of distinct values of a in relation S
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

General

Query Optimization

Rules

A
  • Push Selection** and **Projection operations as far as possible down the tree
  • Put Joins on the left
    • Allows pipelining without using intermediate files
  • Avoid Cartesian Products
    • If they cannot be avoided, delay as long as possible
      • Push UP the tree
      • Performed on smaller relations
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Cost Estimation:

Selection:

Cost of Scanning Disk with Index,

  • Clustered
  • Unclustered
A

If indexed over an attribute, use a function F(R,a) to determine how many Input/Output operations required.

The Value of F(R,a) depends on the condition in the selection

Costs:

If Data is clustered:

Cost = F(R,a) * B(R)

(Reading Blocks at a time)

Unclustered:

Cost = F(R,a) * T(R)

(Reading Tuples at a time)

B(R) = Blocks in the relation

T(R) = Tuples in the relation

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Cost Estimation:

Selection:

Cost of Scanning Disk without Index

A

Cost = B(R) + Seek Time

B(R) = Blocks in the relation

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Cost Estimation:

Selection Cost Function, F(R,a):

Functions for common predicates

A

The Cost Function used depends on the Condition(Predicate) in the Selection Operation

Equals : σa=v(R)

F(R,a) = 1 / V(R,a)

V(R, a) = number of distinct values in R

Less than : σa<v>(R)</v>

F(R,a) = (v - min(a) ) /

( max(a)-min(a) )

Range : σx<a><span>(R)</span></a>

F(R,a) = (v - x ) /

( max(a)-min(a) )

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Cost Estimation:

Selection Cost Function, F(R,a):

Equals Predicate :

σa=v(R)

A

F(R,a) = 1 / V(R,a)

V(R, a) = number of distinct values in R

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Cost Estimation:

Selection Cost Function, F(R,a):

Less than predicate:

σa<v>(R)</v>

A

σa<v>(R)</v>

F(R,a) = (v - min(a) ) /

( max(a)-min(a) )

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Cost Estimation:

Selection Cost Function, F(R,a):

Range Predicate:

σx<a><span>(R)</span></a>

A

σx<a><span>(R)</span></a>

F(R,a) =

(v - x ) / ( max(a)-min(a) )

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Cost Estimation:

Joins

A
  • Assumption: Containment of Values
    • If V(S,a) <= V(R,a) , then A values in S is a subset of A values in R
    • Less Tuples in S, R contains all those tuples
  • With this assumption, say that each tuple “t” joins with “x” tuples in R
    • x = T(R) / V(R,a)
  • Therefore:
    • T(S joina R) = T(S) * T(R) / V(R,a)
  • Consider the cost of joins as simply the NUMBER of TUPLES OUTPUT, for simplicity
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Optimizers:

Enumerating Equivalences

A

Optimizers use Equivalence Rules to systematically generate Equivalent Expressions

Process

  • Repeat until no new expressions are found
  • Apply all applicable equivalence rules to every subexpression of every equivalent expression found so far
  • Add newly generated expressions to set of Equivalent Expressions

This process is extremely expensive, optimized with:

  • Optimized generation based on some rules
  • Special Cases for queries with only Selection, Projection and/or Joins
17
Q

Equivalence Rules:

Conjunctive Selections

A

Conjunctive Selections are

Selection Operations with Multiple Conditions:

σθ1 ^ θ2 (R)

Is equivalent to a

Sequence of Individual Selections

σθ1 ( σθ2 (R) )

18
Q

Equivalence Rules:

Commutative Selection Operations

A

Selection Operations are Commutative:

The Selections can be performed in any order to get the same result:

σθ1 ( σθ2 (R) )

=

σθ2 ( σθ1 (R) )

19
Q

Equivalence Rules:

Series of Projections

A

Within a series of Projections,

only the LAST Projection is needed.

All earlier Projections can be omitted:

πθn ( πθn-1 (…( πθ1 (R)…))

=

πθ1 (R)

20
Q

Equivalence Rules:

Combining Selection with

Cartesian Product

A

Performing a Selection after a Cartesian Product is equivalent to a Theta Join with the same condition:

σθ (R1 X R2)

=

R1 ⋈θ R2

21
Q

Equivalence Rules:

Join Commutative Property

A

Both Theta Joins and Natural Joins are Commutative:

The joins can be performed in any order

R1 ⋈θ R2

=

R2 ⋈θ R1

22
Q

Equivalence Rules:

Natural Join Associativity

A

Natural Joins are Associative:

Can be optimized by performing the SMALLER join first

R1 ⋈ (R2 ⋈ R3)

=

(R1 ⋈ R2 ) ⋈ R3

23
Q

Equivalence Rules:

Theta Join Associativity

-Special Case

A

Theta Joins are associative in the following case:

(R1 ⋈θ1 R2 ) ⋈θ23 R3

Where the second condition, θ2 , ONLY involves attributes from R2 and R3

=

R1⋈θ13 (R2 ⋈θ2 R3 )

24
Q

Equivalence Rules:

Set Union

Set Intersection

A

Both operations are Commutative:

E1 ∪ E2 = E2 ∪ E1

E1 ∩ E2 = E2 ∩ E1

Both operations are Associative:

(E1∪E2)∪E3 = E2∪(E1 ∪ E3)

(E1∩E2)∩E3 = E2∩(E1 ∩ E3)

25
Q

Equivalence Rules:

Selection and Set Union/Intersection/Difference

A

In Set Union, Intersection and Difference operations,

A Selection Operation is Distributive:

σθ (E1 - E2)

=

σθ(E1 - σθ(E2)

26
Q

Query Optimization:

Basic RA Commutators

(Operations on Tree)

A
  • Can “push” Projections through:
    • Selection
    • Join
  • Push Selection through:
    • Selection
    • Projection
    • Join
  • Joins can be re-ordered