L15:Query Optimization(2019) Flashcards
Evaluation Plan
An Evaluation Plan
defines what algorithm is used for each operation,
and how execution of these operations is coordinated.
- There can be equivalent plans for a single query
- Represented as a Tree of Operations on relations
- Tree uses relational algebra symbols as nodes
Cost-Based
Query Optimization
Steps (3)
- Generate logically equivalent expressions using equivalence rules
- Annotate resultant expressions to get alternative query plans
- Choose the cheapest plan based on estimated cost
Query Costs:
Factors
There are multiple statistics on a DBMS:
- Statistical information about relations
- Statistical Estimation for Intermediate Results
- Cost of Algorithms, using the above statistics
Equivalence of
Relational Algebra Statements
Two RA statements are said to be Equivalent IF:
They produce the same tuples on every database instance.
- If the statements are equivalent, we can use them interchangably
- Equivalence can be quickly determined through the equivalence rules
Relational Algebra Equivalence Rules:
Set of Rules
- Conjunctive Selections
- Commutative Selection Operations
- Series of Projections
- Selections combined with Theta Join or Cartesian Product
- Join Commutative Property
- Natural Join Associativity
- Theta Join Conditioin Associativity
- Set Union
- Set Intersection
Relational Algebra Equivalence Rules:
3 Most Useful Join Rules
- Join Commutativity
- Natural Join Associativity
- Theta Join Associativity
Cost Estimation:
Basics
Goal:
Compare an execution plan’s cost,
not exactly measure computation time.
*More of an art, not exact.
Variables and Symbols:
- S - a relation
- B(S) - Blocks of S
- T(S) - Tuples in S
- V(S, a) - number of distinct values of a in relation S
General
Query Optimization
Rules
- Push Selection** and **Projection operations as far as possible down the tree
- Put Joins on the left
- Allows pipelining without using intermediate files
- Avoid Cartesian Products
- If they cannot be avoided, delay as long as possible
- Push UP the tree
- Performed on smaller relations
- If they cannot be avoided, delay as long as possible
Cost Estimation:
Selection:
Cost of Scanning Disk with Index,
- Clustered
- Unclustered
If indexed over an attribute, use a function F(R,a) to determine how many Input/Output operations required.
The Value of F(R,a) depends on the condition in the selection
Costs:
If Data is clustered:
Cost = F(R,a) * B(R)
(Reading Blocks at a time)
Unclustered:
Cost = F(R,a) * T(R)
(Reading Tuples at a time)
B(R) = Blocks in the relation
T(R) = Tuples in the relation
Cost Estimation:
Selection:
Cost of Scanning Disk without Index
Cost = B(R) + Seek Time
B(R) = Blocks in the relation
Cost Estimation:
Selection Cost Function, F(R,a):
Functions for common predicates
The Cost Function used depends on the Condition(Predicate) in the Selection Operation
Equals : σa=v(R)
F(R,a) = 1 / V(R,a)
V(R, a) = number of distinct values in R
Less than : σa<v>(R)</v>
F(R,a) = (v - min(a) ) /
( max(a)-min(a) )
Range : σx<a><span>(R)</span></a>
F(R,a) = (v - x ) /
( max(a)-min(a) )
Cost Estimation:
Selection Cost Function, F(R,a):
Equals Predicate :
σa=v(R)
F(R,a) = 1 / V(R,a)
V(R, a) = number of distinct values in R
Cost Estimation:
Selection Cost Function, F(R,a):
Less than predicate:
σa<v>(R)</v>
σa<v>(R)</v>
F(R,a) = (v - min(a) ) /
( max(a)-min(a) )
Cost Estimation:
Selection Cost Function, F(R,a):
Range Predicate:
σx<a><span>(R)</span></a>
σx<a><span>(R)</span></a>
F(R,a) =
(v - x ) / ( max(a)-min(a) )
Cost Estimation:
Joins
- Assumption: Containment of Values
- If V(S,a) <= V(R,a) , then A values in S is a subset of A values in R
- Less Tuples in S, R contains all those tuples
- With this assumption, say that each tuple “t” joins with “x” tuples in R
- x = T(R) / V(R,a)
- Therefore:
- T(S joina R) = T(S) * T(R) / V(R,a)
- Consider the cost of joins as simply the NUMBER of TUPLES OUTPUT, for simplicity