WEEK 8 Flashcards
Scenario S1
Access Routine S1
Scenario S2
Access Routine S2
Scenario S3
Access Routine S3
- Can use hash search
- Since we are doing equality search
- On a key field
a) Hash the search value to find pointer
to the bucket
b) May involve search of overflow
buckets
Scenario S4
- In our sample data, all values for age are unique, but in a realistic data set we would not expect this to be so
- In this case we may be retrieving multiple records
- S4 Primary index, may be multiple records if condition involves <, <=, =>, > on the search field then we find record satisfying the equality condition (age = 36) and then retrieve all subsequent records in ordered file
Scenario S5
Access Routine S5
* Clustering index search
* One index entry for each distinct value of clustering field
* Binary search of index file to find first block of data containing matching search record
Scenario S6
- Secondary index (B+ tree) equality and others;
- multiple records may match
- Operates in much the same way as S4
- Use index to find required record(s) in main file
- This time traverse tree rather than search
ordered file
- This time traverse tree rather than search
- Retrieval behaviour depends on operator
- < : Find first occurrence of search value, retrieve all preceding
records - > : Find last occurrence of value, retrieve all subsequent records
- <= : Find last occurrence of value, retrieve it and all preceding
records - > = : Find first occurrence of value, retrieve it and all subsequent
records
S6 example - consider leaf nodes
- Remember a B+ tree usually has pointer from each leaf block to next leaf block
- Once first matching record located, can traverse as a linked list to obtain other records for >= comparison
- If nodes have back pointer, can traverse as doubly linked-list for <= comparison
What is the Conjunctive Selection Conditions
- Boolean expressions involving AND operator
- General form
- (simple condition) AND (simple condition) AND…
- S7: If we can use one of methods S2-S6 for an attribute in one of the simple conditions; do so; examine the records thus retrieved to see if they meet other conditions
-S8: Composite index: if two or more attributes involved in equality conditions also appear in a composite index, use the index directly
Scenario S9
Access Routine S9
- Intersection of record pointers: If secondary indices available on all fields involved in equality conditions and indices include record pointers (i.e. inverted files): retrieve set of record pointers from each index involved and take intersection as final result set
Optimisation
- Query optimisation for select needed for conjunctive conditions when different simple conditions offer an access path
- Access path that results in the fewest record retrieval operations should be chosen
Disjunctive conditions (OR)
- Little can be done to optimise
- If any one of the simple conditions involve an attribute with no access path, we must use serial access (brute force)
Join Operations
- One of the most time consuming operations in query processing
- M*N operation
a. Cartesian product followed by a select
Methods for implementing Joins
J1: Nested-loop join
- Default (brute force) algorithm
- Used when no access paths available on
either file in the join
- For each record t in R (outer loop)
- Retrieve every record s from S (inner loop)
- Test whether the two records satisfy the join condition
- t[A] = s[B]
Explain J2: Single loop join
Using an access structure to retrieve the matching records
a) If index (or hash key) exists for one of the two join attributes e.g. B or file S
- Retrieve each record of t in R (loop over file R)
- Use access structure for B to retrieve all matching records from S that satisfy s[B] = t[A]
Explain J3: Sort-merge join
- Possible if records of R and S are physically ordered by the value of the join attribute A, B
- Can implement join in most efficient way possible
- Both files are scanned concurrently in order of the join attributes
a. Match up records that have same values
for A and B - Pairs of file blocks copied into memory buffers in order and records of each file only scanned once
Explain J4: Hash-join
- Records (or record pointers) of R and S are hashed to same hash file using same hash function on attributes R[A] AND S[B] as hash keys
- Single pass through each file fills the hash buckets
- Each bucket then examined from records R and S that match to produce the join result
PROJECT Operations
If <attribute> include a key of R, then result has same number of tuples as R but only the attributes given in <attribute></attribute></attribute>
- If not, we must detect duplicate tuples and eliminate them
- Two possible elimination techniques are:
- Sort the result and then eliminated duplicates (which will now be duplicate tuples)
- Hash each projected tuple,check against those already in the bucket and discard if a double
- Remember in SQL, we use the DISTINCT keyword to force the elimination of project duplicates
Composite index
- Composite (or concatenated) index is one based on more than one attribute. It can be clustering or otherwise
- E.g. Person (NI_No, Surname, Forename, DOB…)
- We can specify a composite index on surname, forename
- Advantages over single attribute indexes:
a. Consider the query “how many people are there with last name ‘Smith’ and first name ‘John’?
b. A query on all attributes of a compound query will likely return far fewer records than a query on only some of those attributes.