- Query optimisation for select needed for conjunctive conditions when different simple conditions offer an access path - Access path that results in the fewest record retrieval operations should be chosen

If include a key of R, then result has same number of tuples as R but only the attributes given in - If not, we must detect duplicate tuples and eliminate them - Two possible elimination techniques are: - Sort the result and then eliminated duplicates (which will now be duplicate tuples) - Hash each projected tuple,check against those already in the bucket and discard if a double - Remember in SQL, we use the DISTINCT keyword to force the elimination of project duplicates

- Composite (or concatenated) index is one based on more than one attribute. It can be clustering or otherwise - E.g. Person (NI_No, Surname, Forename, DOB...) - We can specify a composite index on surname, forename - Advantages over single attribute indexes: a. Consider the query "how many people are there with last name 'Smith' and first name 'John'? b. A query on all attributes of a compound query will likely return far fewer records than a query on only some of those attributes.

What happens when we execute a query on a database? Like all languages, high level statements must be interpreted (or compiled) prior to execution Runtime database processor performs sequence of operations before returning query result

WEEK 8 Flashcards by Unknown Unknown

Scenario S1

Access Routine S1

How well did you know this?

Not at all

Perfectly

Scenario S2

Access Routine S2

How well did you know this?

Not at all

Perfectly

Scenario S3

Access Routine S3

Can use hash search
- Since we are doing equality search
- On a key field
  a) Hash the search value to find pointer
  to the bucket
  b) May involve search of overflow
  buckets

How well did you know this?

Not at all

Perfectly

Scenario S4

In our sample data, all values for age are unique, but in a realistic data set we would not expect this to be so
In this case we may be retrieving multiple records
S4 Primary index, may be multiple records if condition involves <, <=, =>, > on the search field then we find record satisfying the equality condition (age = 36) and then retrieve all subsequent records in ordered file

How well did you know this?

Not at all

Perfectly

Scenario S5

Access Routine S5
* Clustering index search
* One index entry for each distinct value of clustering field
* Binary search of index file to find first block of data containing matching search record

How well did you know this?

Not at all

Perfectly

Scenario S6

Secondary index (B+ tree) equality and others;
- multiple records may match
Operates in much the same way as S4
Use index to find required record(s) in main file
- This time traverse tree rather than search
  ordered file
Retrieval behaviour depends on operator
< : Find first occurrence of search value, retrieve all preceding
records
> : Find last occurrence of value, retrieve all subsequent records
<= : Find last occurrence of value, retrieve it and all preceding
records
> = : Find first occurrence of value, retrieve it and all subsequent
records

How well did you know this?

Not at all

Perfectly

S6 example - consider leaf nodes

Remember a B+ tree usually has pointer from each leaf block to next leaf block
- Once first matching record located, can traverse as a linked list to obtain other records for >= comparison
- If nodes have back pointer, can traverse as doubly linked-list for <= comparison

How well did you know this?

Not at all

Perfectly

What is the Conjunctive Selection Conditions

Boolean expressions involving AND operator
General form
- (simple condition) AND (simple condition) AND…
S7: If we can use one of methods S2-S6 for an attribute in one of the simple conditions; do so; examine the records thus retrieved to see if they meet other conditions

-S8: Composite index: if two or more attributes involved in equality conditions also appear in a composite index, use the index directly

How well did you know this?

Not at all

Perfectly

Scenario S9

Access Routine S9

Intersection of record pointers: If secondary indices available on all fields involved in equality conditions and indices include record pointers (i.e. inverted files): retrieve set of record pointers from each index involved and take intersection as final result set

How well did you know this?

Not at all

Perfectly

Optimisation

Query optimisation for select needed for conjunctive conditions when different simple conditions offer an access path
Access path that results in the fewest record retrieval operations should be chosen

How well did you know this?

Not at all

Perfectly

Disjunctive conditions (OR)

Little can be done to optimise
If any one of the simple conditions involve an attribute with no access path, we must use serial access (brute force)

How well did you know this?

Not at all

Perfectly

Join Operations

One of the most time consuming operations in query processing
M*N operation
a. Cartesian product followed by a select

How well did you know this?

Not at all

Perfectly

Methods for implementing Joins

J1: Nested-loop join
- Default (brute force) algorithm
- Used when no access paths available on
either file in the join
- For each record t in R (outer loop)
- Retrieve every record s from S (inner loop)
- Test whether the two records satisfy the join condition
- t[A] = s[B]

How well did you know this?

Not at all

Perfectly

Explain J2: Single loop join

Using an access structure to retrieve the matching records
a) If index (or hash key) exists for one of the two join attributes e.g. B or file S
- Retrieve each record of t in R (loop over file R)
- Use access structure for B to retrieve all matching records from S that satisfy s[B] = t[A]

How well did you know this?

Not at all

Perfectly

Explain J3: Sort-merge join

Possible if records of R and S are physically ordered by the value of the join attribute A, B
Can implement join in most efficient way possible
Both files are scanned concurrently in order of the join attributes
a. Match up records that have same values
for A and B
Pairs of file blocks copied into memory buffers in order and records of each file only scanned once

How well did you know this?

Not at all

Perfectly

Explain J4: Hash-join

Records (or record pointers) of R and S are hashed to same hash file using same hash function on attributes R[A] AND S[B] as hash keys
Single pass through each file fills the hash buckets
Each bucket then examined from records R and S that match to produce the join result

How well did you know this?

Not at all

Perfectly

PROJECT Operations

If <attribute> include a key of R, then result has same number of tuples as R but only the attributes given in <attribute></attribute></attribute>

If not, we must detect duplicate tuples and eliminate them
Two possible elimination techniques are:
- Sort the result and then eliminated duplicates (which will now be duplicate tuples)
- Hash each projected tuple,check against those already in the bucket and discard if a double
Remember in SQL, we use the DISTINCT keyword to force the elimination of project duplicates

How well did you know this?

Not at all

Perfectly

Composite index

Composite (or concatenated) index is one based on more than one attribute. It can be clustering or otherwise
E.g. Person (NI_No, Surname, Forename, DOB…)
We can specify a composite index on surname, forename
Advantages over single attribute indexes:
a. Consider the query “how many people are there with last name ‘Smith’ and first name ‘John’?
b. A query on all attributes of a compound query will likely return far fewer records than a query on only some of those attributes.

How well did you know this?

Not at all

Perfectly

Query processing

Study These Flashcards

What happens when we execute a query on a database?
Like all languages, high level statements must be interpreted (or compiled) prior to execution
Runtime database processor performs sequence of operations before returning query result

Query processing
Query process operations

Study These Flashcards

Scanning, parsing and validation

Study These Flashcards

Scanner
Identifies language tokens such as SQL keywords, attribute names, relations etc in text of query
Parser
Checks query syntax to determine if it is formulated according to the grammar rules of
the query language
Validation
Check all attribute and relation names are valid and semantically meaningful names in
the database schema
Internal representation of query
Usually a query tree, may be a query graph

Query processing
Query Optimiser, Code Generator, Runtime database processor

Study These Flashcards

Query optimiser
- Responsible for devising an execution plan for retrieving the result of the query from the DB
- Query can have several possible execution plans, optimiser chooses a suitable one
Code generator
- Generates the code to execute the plan produced by the optimiser
Runtime database processor
- Runs query code to produce the query result
- Code may be executed in compiled or interpreted mode
- Error message generated if runtime error occurs

Query optimisation
Process of selecting a suitable execution plan for the query, Three Main Techniques

Study These Flashcards

Process of selecting a suitable execution plan for the query
- May or may not be ‘optimal’ plan
- Seeks to provide a reasonably efficient strategy
- Finding optimal strategy is usually too costly except for simplest of queries
Three main techniques
- Cost based
- Semantics based
- Heuristic based
- May be used in conjunction with each other

Cost based optimisation

Study These Flashcards

Search solution space to find the solution that minimises cost
- Compare costs of executing a query using different execution strategies and choose the lowest cost estimate
- Cost functions are estimates, so a solution that is not the optimal may be chosen

Cost based optimisation * What is cost and how do we calculate it?

Cost: The amount of time to complete an operation

Cost based optimisation Cost components

* Access cost to secondary storage * Storage cost * Computation cost * Memory usage cost * Communication cost

Cost components Access cost to secondary storage

* Cost of searching for, reading and writing data blocks residing on secondary storage * Depends on type of access structures on the datafile * Also on whether block allocation is contiguous (more relevant for HDD than SSD)

Cost components Storage cost

Cost of storing intermediate files generated by an execution strategy

Cost components Computation cost

* Cost of performing in memory operations on data buffers during execution * Include operations such as searching for and sorting records, merging records for JOIN and performing computations on field values

Cost components Memory cost

Cost pertaining to number of memory buffers needed during execution

Cost components Communication cost

* Cost of shipping query and it’s results between originating client and database (and any inter-database communication)

Cost minimisation aims for differing scenarios

* Large databases * Minimise access cost to secondary storage * Minimise no of block transfers between disk and main memory : reduce volume of data moved * Smaller databases * Minimise computation cost * Most of data involved in query can be stored completely in main memory * Distributed databases * Minimise communication cost

Semantic query optimisation

* Use constraints specified on the schema to inform query execution * E.g. unique attributes, nullable fields * Consider: * SELECT * FROM Customers WHERE CName IS NULL * If Cname is declared as NOT NULL then this query will never return any results * Semantic optimisation says don’t run the query at all! * As well as schema constraints, custom constraints can be implemented

Heuristic based optimisation Query process operations

Intermediate query form is usually a ‘query tree’ whose leaves are the relations involved in the query and whose nodes are the RA operators

Query tree example - sample data

Canonical query tree

Query tree notation

* Leaf nodes: input relations of query * Internal nodes: RA operations * Execution of query tree * Execute an internal node operation whenever it’s operands are available * Replace internal node by resulting relation * Execution terminates when route node executed and query result is obtained

Relational operators as a query mechanism

Multi relational queries * Data we require may not be present in any single existing relation. * The relational operators allow us to transform and combine existing relations in order to produce a single relation which holds the answer to our question * "What are the names of the crew members on board the C-37D?”

* "What are the names of the crew members on board the C37D?” * This involves the Pnames and Snames fields; we do not currently have a single relation holding these fields. In order to answer this query, we must combine all three relations in the database portion. * Personnel -> PID, Pname, Rank, Bplace * Ship -> SID, Sname, Class, Crew

Optimisation of high level query language expressions

* Low level navigational languages (hierarchic and network)  hand optimised by the programmer. * High level relational languages  Declarative (what rather than how) requires optimisation

Heuristic optimisation of query trees

* Main (most obvious?) heuristic is to apply select and project before joins * Select and project usually reduce the size of a relation, they never increase it * As join is M * N operation where M,N are size of relations involved, decreasing M and/or N is an optimisation

Heuristic Optimisation of query trees Example

Intermediate step

* Second two splits are more complex * They refer to two relations each * Can’t apply these directly * Have to first combine the two relations they refer to

End of step 1

Step2

Final steps * Finally we can replace any cartesian product operations with JOIN operations

General heuristic rules

* Select and Project operations moved down the tree * Most restrictive joins and selects also moved down * Project optimisations are also applied (not shown)

Heuristic query optimisation Summary

* Previous example shows that a query tree can be transformed step by step into another query tree that is more efficient to execute * Must make sure that transformation steps always lead to an equivalent query tree * Query optimiser needs equivalence preserving transformation rules * See Elmasri and Navathe for list of a number of transformation rules and a simple optimisation algorithm employing those rule

WEEK 8 Flashcards

(47 cards)