L15:Query Optimization(2019) Flashcards
Evaluation Plan
An Evaluation Plan
defines what algorithm is used for each operation,
and how execution of these operations is coordinated.
- There can be equivalent plans for a single query
- Represented as a Tree of Operations on relations
- Tree uses relational algebra symbols as nodes
Cost-Based
Query Optimization
Steps (3)
- Generate logically equivalent expressions using equivalence rules
- Annotate resultant expressions to get alternative query plans
- Choose the cheapest plan based on estimated cost
Query Costs:
Factors
There are multiple statistics on a DBMS:
- Statistical information about relations
- Statistical Estimation for Intermediate Results
- Cost of Algorithms, using the above statistics
Equivalence of
Relational Algebra Statements
Two RA statements are said to be Equivalent IF:
They produce the same tuples on every database instance.
- If the statements are equivalent, we can use them interchangably
- Equivalence can be quickly determined through the equivalence rules
Relational Algebra Equivalence Rules:
Set of Rules
- Conjunctive Selections
- Commutative Selection Operations
- Series of Projections
- Selections combined with Theta Join or Cartesian Product
- Join Commutative Property
- Natural Join Associativity
- Theta Join Conditioin Associativity
- Set Union
- Set Intersection
Relational Algebra Equivalence Rules:
3 Most Useful Join Rules
- Join Commutativity
- Natural Join Associativity
- Theta Join Associativity
Cost Estimation:
Basics
Goal:
Compare an execution plan’s cost,
not exactly measure computation time.
*More of an art, not exact.
Variables and Symbols:
- S - a relation
- B(S) - Blocks of S
- T(S) - Tuples in S
- V(S, a) - number of distinct values of a in relation S
General
Query Optimization
Rules
- Push Selection** and **Projection operations as far as possible down the tree
- Put Joins on the left
- Allows pipelining without using intermediate files
- Avoid Cartesian Products
- If they cannot be avoided, delay as long as possible
- Push UP the tree
- Performed on smaller relations
- If they cannot be avoided, delay as long as possible
Cost Estimation:
Selection:
Cost of Scanning Disk with Index,
- Clustered
- Unclustered
If indexed over an attribute, use a function F(R,a) to determine how many Input/Output operations required.
The Value of F(R,a) depends on the condition in the selection
Costs:
If Data is clustered:
Cost = F(R,a) * B(R)
(Reading Blocks at a time)
Unclustered:
Cost = F(R,a) * T(R)
(Reading Tuples at a time)
B(R) = Blocks in the relation
T(R) = Tuples in the relation
Cost Estimation:
Selection:
Cost of Scanning Disk without Index
Cost = B(R) + Seek Time
B(R) = Blocks in the relation
Cost Estimation:
Selection Cost Function, F(R,a):
Functions for common predicates
The Cost Function used depends on the Condition(Predicate) in the Selection Operation
Equals : σa=v(R)
F(R,a) = 1 / V(R,a)
V(R, a) = number of distinct values in R
Less than : σa<v>(R)</v>
F(R,a) = (v - min(a) ) /
( max(a)-min(a) )
Range : σx<a><span>(R)</span></a>
F(R,a) = (v - x ) /
( max(a)-min(a) )
Cost Estimation:
Selection Cost Function, F(R,a):
Equals Predicate :
σa=v(R)
F(R,a) = 1 / V(R,a)
V(R, a) = number of distinct values in R
Cost Estimation:
Selection Cost Function, F(R,a):
Less than predicate:
σa<v>(R)</v>
σa<v>(R)</v>
F(R,a) = (v - min(a) ) /
( max(a)-min(a) )
Cost Estimation:
Selection Cost Function, F(R,a):
Range Predicate:
σx<a><span>(R)</span></a>
σx<a><span>(R)</span></a>
F(R,a) =
(v - x ) / ( max(a)-min(a) )
Cost Estimation:
Joins
- Assumption: Containment of Values
- If V(S,a) <= V(R,a) , then A values in S is a subset of A values in R
- Less Tuples in S, R contains all those tuples
- With this assumption, say that each tuple “t” joins with “x” tuples in R
- x = T(R) / V(R,a)
- Therefore:
- T(S joina R) = T(S) * T(R) / V(R,a)
- Consider the cost of joins as simply the NUMBER of TUPLES OUTPUT, for simplicity
Optimizers:
Enumerating Equivalences
Optimizers use Equivalence Rules to systematically generate Equivalent Expressions
Process
- Repeat until no new expressions are found
- Apply all applicable equivalence rules to every subexpression of every equivalent expression found so far
- Add newly generated expressions to set of Equivalent Expressions
This process is extremely expensive, optimized with:
- Optimized generation based on some rules
- Special Cases for queries with only Selection, Projection and/or Joins
Equivalence Rules:
Conjunctive Selections
Conjunctive Selections are
Selection Operations with Multiple Conditions:
σθ1 ^ θ2 (R)
Is equivalent to a
Sequence of Individual Selections
σθ1 ( σθ2 (R) )
Equivalence Rules:
Commutative Selection Operations
Selection Operations are Commutative:
The Selections can be performed in any order to get the same result:
σθ1 ( σθ2 (R) )
=
σθ2 ( σθ1 (R) )
Equivalence Rules:
Series of Projections
Within a series of Projections,
only the LAST Projection is needed.
All earlier Projections can be omitted:
πθn ( πθn-1 (…( πθ1 (R)…))
=
πθ1 (R)
Equivalence Rules:
Combining Selection with
Cartesian Product
Performing a Selection after a Cartesian Product is equivalent to a Theta Join with the same condition:
σθ (R1 X R2)
=
R1 ⋈θ R2
Equivalence Rules:
Join Commutative Property
Both Theta Joins and Natural Joins are Commutative:
The joins can be performed in any order
R1 ⋈θ R2
=
R2 ⋈θ R1
Equivalence Rules:
Natural Join Associativity
Natural Joins are Associative:
Can be optimized by performing the SMALLER join first
R1 ⋈ (R2 ⋈ R3)
=
(R1 ⋈ R2 ) ⋈ R3
Equivalence Rules:
Theta Join Associativity
-Special Case
Theta Joins are associative in the following case:
(R1 ⋈θ1 R2 ) ⋈θ2^θ3 R3
Where the second condition, θ2 , ONLY involves attributes from R2 and R3
=
R1⋈θ1^θ3 (R2 ⋈θ2 R3 )
Equivalence Rules:
Set Union
Set Intersection
Both operations are Commutative:
E1 ∪ E2 = E2 ∪ E1
E1 ∩ E2 = E2 ∩ E1
Both operations are Associative:
(E1∪E2)∪E3 = E2∪(E1 ∪ E3)
(E1∩E2)∩E3 = E2∩(E1 ∩ E3)
Equivalence Rules:
Selection and Set Union/Intersection/Difference
In Set Union, Intersection and Difference operations,
A Selection Operation is Distributive:
σθ (E1 - E2)
=
σθ(E1 - σθ(E2)
Query Optimization:
Basic RA Commutators
(Operations on Tree)
- Can “push” Projections through:
- Selection
- Join
- Push Selection through:
- Selection
- Projection
- Join
- Joins can be re-ordered