L15:Query Optimization(2019) Flashcards by Emma Frost

Evaluation Plan

An Evaluation Plan

defines what algorithm is used for each operation,

and how execution of these operations is coordinated.

There can be equivalent plans for a single query
Represented as a Tree of Operations on relations
- Tree uses relational algebra symbols as nodes

Cost-Based

Query Optimization

Steps (3)

Generate logically equivalent expressions using equivalence rules
Annotate resultant expressions to get alternative query plans
Choose the cheapest plan based on estimated cost

Query Costs:

Factors

There are multiple statistics on a DBMS:

Statistical information about relations
Statistical Estimation for Intermediate Results
Cost of Algorithms, using the above statistics

Equivalence of

Relational Algebra Statements

Two RA statements are said to be Equivalent IF:

They produce the same tuples on every database instance.

If the statements are equivalent, we can use them interchangeably
Equivalence can be quickly determined through the equivalence rules

Relational Algebra Equivalence Rules:

Set of Rules

Conjunctive Selections
Commutative Selection Operations
Series of Projections
Selections combined with Theta Join or Cartesian Product
Join Commutative Property
Natural Join Associativity
Theta Join Condition Associativity
Set Union
Set Intersection

Relational Algebra Equivalence Rules:

3 Most Useful Join Rules

Join Commutativity
Natural Join Associativity
Theta Join Associativity

Cost Estimation:

Basics

Goal:

Compare an execution plan’s cost,

not exactly measure computation time.

*More of an art, not exact.

Variables and Symbols:

S - a relation
B(S) - Blocks of S
T(S) - Tuples in S
V(S, a) - number of distinct values of a in relation S

General

Query Optimization

Rules

Push Selection and Projection operations as far down the tree as possible
Put Joins on the left (left-deep join tree)
- Allows pipelining without using intermediate files
Avoid Cartesian Products
- If they cannot be avoided, delay as long as possible
  - Push UP the tree
  - Performed on smaller relations
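A minimal sketch of why pushing a Selection down the tree helps, assuming two hypothetical toy relations (`customers`, `orders`) and simple list-based helpers that are not part of the original cards. Both plans return the same rows, but the second one feeds far fewer tuples into the join.

```python
def select(predicate, relation):
    """σ: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def natural_join(left, right, attr):
    """⋈ on one shared attribute, written as a simple nested loop."""
    return [{**l, **r} for l in left for r in right if l[attr] == r[attr]]


# Hypothetical toy relations (illustrative data only).
customers = [
    {"cid": 1, "country": "NZ"},
    {"cid": 2, "country": "AU"},
    {"cid": 3, "country": "AU"},
]
orders = [
    {"oid": 10, "cid": 1}, {"oid": 11, "cid": 1},
    {"oid": 12, "cid": 2}, {"oid": 13, "cid": 2},
    {"oid": 14, "cid": 3}, {"oid": 15, "cid": 3},
]


def is_nz(row):
    return row["country"] == "NZ"


# Plan A: join first, select afterwards (large intermediate result).
joined_all = natural_join(orders, customers, "cid")
plan_a = select(is_nz, joined_all)

# Plan B: selection pushed below the join (small intermediate result).
nz_customers = select(is_nz, customers)
plan_b = natural_join(orders, nz_customers, "cid")

print(len(joined_all), len(nz_customers))  # intermediate sizes of each plan
print(plan_a == plan_b)                    # both plans agree on the answer
```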

Cost Estimation:

Selection:

Cost of Scanning Disk with Index,

Clustered
Unclustered

If R is indexed on attribute a, use a function F(R,a), the estimated fraction of R that satisfies the condition, to determine how many Input/Output operations are required.

The Value of F(R,a) depends on the condition in the selection

Costs:

If Data is clustered:

Cost = F(R,a) * B(R)

(Reading Blocks at a time)

Unclustered:

Cost = F(R,a) * T(R)

(Reading Tuples at a time)

B(R) = Blocks in the relation

T(R) = Tuples in the relation
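The two index-scan formulas can be sketched as a small function. The relation sizes and the value of F(R,a) below are hypothetical, and `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction


def index_scan_cost(f, blocks, tuples, clustered):
    """Estimated I/Os for an indexed selection.

    f      -- F(R,a), the fraction of R expected to match
    blocks -- B(R)
    tuples -- T(R)
    """
    # Clustered data is read a block at a time, unclustered a tuple at a time.
    return f * (blocks if clustered else tuples)


# Hypothetical relation: B(R) = 1,000, T(R) = 20,000, F(R,a) = 1/50.
f = Fraction(1, 50)
clustered_cost = index_scan_cost(f, 1_000, 20_000, clustered=True)
unclustered_cost = index_scan_cost(f, 1_000, 20_000, clustered=False)
print(clustered_cost, unclustered_cost)
```

With these made-up numbers the unclustered index costs 20 times as many I/Os as the clustered one, which is the ratio T(R) / B(R).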

Cost Estimation:

Selection:

Cost of Scanning Disk without Index

Cost = B(R) + Seek Time

B(R) = Blocks in the relation

Cost Estimation:

Selection Cost Function, F(R,a):

Functions for common predicates

The Cost Function used depends on the Condition(Predicate) in the Selection Operation

Equals : σ_{a=v}(R)

F(R,a) = 1 / V(R,a)

V(R, a) = number of distinct values of a in R

Less than : σ_{a<v}(R)

F(R,a) = (v - min(a) ) /

( max(a)-min(a) )

Range : σ_{x<a<v}(R)

F(R,a) = (v - x ) /

( max(a)-min(a) )
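The three formulas above can be written as small functions. This is a sketch, not a definitive implementation: the function names and the example statistics are hypothetical, and the formulas implicitly assume that values of a are spread evenly between min(a) and max(a).

```python
from fractions import Fraction


def f_equals(distinct_values):
    """F(R,a) for σ_{a=v}(R): one value out of V(R,a)."""
    return Fraction(1, distinct_values)


def f_less_than(v, min_a, max_a):
    """F(R,a) for σ_{a<v}(R)."""
    return Fraction(v - min_a, max_a - min_a)


def f_range(x, v, min_a, max_a):
    """F(R,a) for σ_{x<a<v}(R)."""
    return Fraction(v - x, max_a - min_a)


# Hypothetical statistics: a ranges over 20..70 with 50 distinct values.
print(f_equals(50))             # fraction matching a = v
print(f_less_than(30, 20, 70))  # fraction matching a < 30
print(f_range(30, 40, 20, 70))  # fraction matching 30 < a < 40
```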

Cost Estimation:

Selection Cost Function, F(R,a):

Equals Predicate :

σ_{a=v}(R)

F(R,a) = 1 / V(R,a)

V(R, a) = number of distinct values of a in R

Cost Estimation:

Selection Cost Function, F(R,a):

Less than predicate:

σ_{a<v}(R)

F(R,a) = (v - min(a) ) /

( max(a)-min(a) )

Cost Estimation:

Selection Cost Function, F(R,a):

Range Predicate:

σ_{x<a<v}(R)

F(R,a) =

(v - x ) / ( max(a)-min(a) )

Cost Estimation:

Joins

Assumption: Containment of Values
- If V(S,a) <= V(R,a), then the a values in S are a subset of the a values in R
- S has fewer distinct a values, and R contains all of them
With this assumption, each tuple t in S joins with x tuples in R
- x = T(R) / V(R,a)
Therefore:
- T(S ⋈_a R) = T(S) * T(R) / V(R,a)
Consider the cost of joins as simply the NUMBER of TUPLES OUTPUT, for simplicity
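A short sketch of the join-size estimate. The function name and the statistics are hypothetical. Dividing by the larger of the two distinct-value counts is an assumption added here so the function works whichever relation has fewer values; when V(S,a) <= V(R,a) it reduces to the formula on the card.

```python
def estimate_join_size(t_s, t_r, v_s, v_r):
    """Estimate T(S ⋈_a R) under containment of values.

    t_s, t_r -- T(S), T(R)
    v_s, v_r -- V(S,a), V(R,a)
    """
    # Each tuple on the side with fewer distinct values matches
    # T(other) / V(other, a) tuples on the other side.
    return t_s * t_r / max(v_s, v_r)


# Hypothetical statistics: T(S) = 1,000, T(R) = 5,000,
# V(S,a) = 50, V(R,a) = 100.
print(estimate_join_size(1_000, 5_000, 50, 100))
```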

Optimizers:

Enumerating Equivalences

Optimizers use Equivalence Rules to systematically generate Equivalent Expressions

Process

Repeat until no new expressions are found
Apply all applicable equivalence rules to every subexpression of every equivalent expression found so far
Add newly generated expressions to set of Equivalent Expressions

This process is extremely expensive, so it is optimized with:

Optimized generation based on some rules
Special Cases for queries with only Selection, Projection and/or Joins
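The repeat-until-no-new-expressions loop can be sketched as follows. This is a simplified illustration, not how any particular optimizer implements it: expressions are nested tuples of hypothetical relation names, every join is assumed to be a natural join, and only join commutativity and associativity are applied.

```python
def rewrites(expr):
    """Yield every expression reachable from expr by one rule application."""
    if isinstance(expr, str):  # a base relation cannot be rewritten
        return
    left, right = expr
    yield (right, left)                  # join commutativity
    if isinstance(left, tuple):          # (A ⋈ B) ⋈ C  ->  A ⋈ (B ⋈ C)
        a, b = left
        yield (a, (b, right))
    if isinstance(right, tuple):         # A ⋈ (B ⋈ C)  ->  (A ⋈ B) ⋈ C
        b, c = right
        yield ((left, b), c)
    for new_left in rewrites(left):      # apply rules to subexpressions too
        yield (new_left, right)
    for new_right in rewrites(right):
        yield (left, new_right)


def enumerate_equivalent(expr):
    """Repeat until no new expressions are found."""
    found = {expr}
    frontier = [expr]
    while frontier:
        current = frontier.pop()
        for candidate in rewrites(current):
            if candidate not in found:
                found.add(candidate)
                frontier.append(candidate)
    return found


plans = enumerate_equivalent((("R1", "R2"), "R3"))
print(len(plans))  # number of distinct join trees over three relations
```

Even for three relations the set already holds 12 join trees, which hints at why exhaustive enumeration becomes expensive as queries grow.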

Equivalence Rules:

Conjunctive Selections

Conjunctive Selections are

Selection Operations with Multiple Conditions:

σ_{θ1 ∧ θ2} (R)

Is equivalent to a

Sequence of Individual Selections

σ_θ1( σ_θ2 (R) )
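A quick check of this rule on a hypothetical toy relation, using a list-based `select` helper that is not part of the original cards. It also runs the two selections in the opposite order to show that the order does not matter.

```python
def select(predicate, relation):
    """σ: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


# Hypothetical relation R (illustrative data only).
R = [
    {"id": 1, "dept": "CS", "salary": 50},
    {"id": 2, "dept": "CS", "salary": 70},
    {"id": 3, "dept": "Math", "salary": 80},
    {"id": 4, "dept": "CS", "salary": 90},
]


def theta1(row):
    return row["dept"] == "CS"


def theta2(row):
    return row["salary"] > 60


combined = select(lambda row: theta1(row) and theta2(row), R)  # σ_{θ1 ∧ θ2}(R)
cascaded = select(theta1, select(theta2, R))                   # σ_θ1(σ_θ2(R))
swapped = select(theta2, select(theta1, R))                    # σ_θ2(σ_θ1(R))

print(combined == cascaded == swapped)
```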

Equivalence Rules:

Commutative Selection Operations

Selection Operations are Commutative:

The Selections can be performed in any order to get the same result:

σθ₁ ( σθ₂ (R) )

σθ₂ ( σθ₁ (R) )

Equivalence Rules:

Series of Projections

Within a series of Projections,

only the LAST Projection is needed.

All earlier Projections can be omitted:

πθₙ ( πθₙ₋₁ ( … ( πθ₁ (R) ) … ) )

πθₙ (R)
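A small sketch of the rule, assuming a hypothetical relation and a list-based `project` helper that removes duplicates. The outer projection is the last one applied, and its attributes must be a subset of the inner projection's attributes for the cascade to be valid.

```python
def project(attrs, relation):
    """π: keep only attrs from each row, dropping duplicate rows."""
    seen, result = set(), []
    for row in relation:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attrs, key)))
    return result


# Hypothetical relation R (illustrative data only).
R = [
    {"name": "Ann", "dept": "CS", "salary": 50},
    {"name": "Bob", "dept": "CS", "salary": 60},
    {"name": "Ann", "dept": "Math", "salary": 70},
]

# Two projections in series versus only the last (outermost) one.
cascaded = project(["name"], project(["name", "dept"], R))
direct = project(["name"], R)

print(cascaded == direct)
```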

Equivalence Rules:

Combining Selection with

Cartesian Product

Performing a Selection after a Cartesian Product is equivalent to a Theta Join with the same condition:

σθ (R1 X R2)

R1 ⋈θ R2
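The equivalence can be checked on toy data. The relations and helper functions below are hypothetical, and the two relations are assumed to have distinct attribute names so that merged rows do not collide.

```python
from itertools import product


def select(predicate, relation):
    """σ: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def cartesian(r1, r2):
    """R1 X R2: every pairing of a row from r1 with a row from r2."""
    return [{**a, **b} for a, b in product(r1, r2)]


def theta_join(r1, r2, theta):
    """R1 ⋈θ R2: pair rows and keep only the pairs satisfying theta."""
    return [{**a, **b} for a in r1 for b in r2 if theta({**a, **b})]


# Hypothetical relations (illustrative data only).
R1 = [{"a": 1}, {"a": 2}, {"a": 3}]
R2 = [{"b": 2}, {"b": 3}]


def theta(row):
    return row["a"] < row["b"]


via_product = select(theta, cartesian(R1, R2))  # σθ (R1 X R2)
via_join = theta_join(R1, R2, theta)            # R1 ⋈θ R2

print(via_product == via_join)
```

The results match, but the first form materializes all 6 pairs before filtering, while the join form never stores the pairs that fail the condition.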

Equivalence Rules:

Join Commutative Property

Both Theta Joins and Natural Joins are Commutative:

The two operands of a join can be swapped without changing the result

R1 ⋈θ R2

R2 ⋈θ R1

Equivalence Rules:

Natural Join Associativity

Natural Joins are Associative:

Can be optimized by performing the SMALLER join first

R1 ⋈ (R2 ⋈ R3)

(R1 ⋈ R2 ) ⋈ R3
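One way to decide which join is the smaller one is to compare estimated output sizes. The sketch below assumes the common estimate T(S) * T(R) / max(V(S,a), V(R,a)) and entirely made-up statistics for R1(a,b), R2(b,c) and R3(c,d); none of these numbers come from the cards.

```python
def join_size(t_left, t_right, v_left, v_right):
    """Estimated number of tuples produced by a join on one shared attribute."""
    return t_left * t_right / max(v_left, v_right)


# Hypothetical statistics: T(R) per relation, V(R, attr) per join attribute.
T = {"R1": 1_000, "R2": 2_000, "R3": 5_000}
V = {("R1", "b"): 20, ("R2", "b"): 50, ("R2", "c"): 100, ("R3", "c"): 500}

size_r1_r2 = join_size(T["R1"], T["R2"], V[("R1", "b")], V[("R2", "b")])
size_r2_r3 = join_size(T["R2"], T["R3"], V[("R2", "c")], V[("R3", "c")])

# Perform the join with the smaller estimated result first.
first_join = ("R1", "R2") if size_r1_r2 <= size_r2_r3 else ("R2", "R3")
print(size_r1_r2, size_r2_r3, first_join)
```

With these numbers R2 ⋈ R3 is estimated to be smaller, so the plan R1 ⋈ (R2 ⋈ R3) would be preferred.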

Equivalence Rules:

Theta Join Associativity

-Special Case

Theta Joins are associative in the following case:

(R1 ⋈θ₁ R2) ⋈θ₂∧θ₃ R3

Where the second condition, θ₂, ONLY involves attributes from R₂ and R₃

R1 ⋈θ₁∧θ₃ (R2 ⋈θ₂ R3)

Equivalence Rules:

Set Union

Set Intersection

Both operations are Commutative:

E₁ ∪ E₂ = E₂ ∪ E₁

E₁ ∩ E₂ = E₂ ∩ E₁

Both operations are Associative:

(E₁∪E₂)∪E₃ = E₁∪(E₂∪E₃)

(E₁∩E₂)∩E₃ = E₁∩(E₂∩E₃)

Equivalence Rules: Selection and Set Union/Intersection/Difference

A Selection Operation distributes over Set Union, Intersection and Difference: σθ(E1 - E2) = σθ(E1) - σθ(E2)
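A quick check of the distributive rule using Python sets. The sets and the predicate are hypothetical, with plain integers standing in for tuples.

```python
def select(predicate, relation):
    """σ over a relation modelled as a set of tuples (here, plain integers)."""
    return {t for t in relation if predicate(t)}


# Hypothetical inputs (illustrative data only).
E1 = {1, 2, 3, 4, 5, 6}
E2 = {4, 5, 6, 7, 8}


def is_even(t):
    return t % 2 == 0


# σθ(E1 op E2) compared with σθ(E1) op σθ(E2) for each set operation.
difference_ok = select(is_even, E1 - E2) == select(is_even, E1) - select(is_even, E2)
union_ok = select(is_even, E1 | E2) == select(is_even, E1) | select(is_even, E2)
intersection_ok = select(is_even, E1 & E2) == select(is_even, E1) & select(is_even, E2)

print(difference_ok, union_ok, intersection_ok)
```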

Query Optimization: Basic RA Commutators (Operations on Tree)

Can "push" Projections through:
- Selection
- Join
Push Selection through:
- Selection
- Projection
- Join
Joins can be re-ordered

L15:Query Optimization(2019) Flashcards

(26 cards)