Cost Estimation of Selection Algorithms Flashcards

Question 1

Q

Describe measures of query cost

Answer

A

Cost is generally measured as the total time it takes to answer a query. Several factors contribute to query costs eg disk accesses, CPU, and network communication

Question 2

Q

Discuss disk access as measure of query cost

Answer

A

Disk access is the most predominant measure of query cost.
It is calculated by
no of disk accessesdisk access cost
no of blocks readaverage block-read-cost
no of blocks written*average block-read cost

Question 3

Q

Compare the cost to read and write a block

Answer

A

The cost to write a block is higher than the cost to read a block since once a block is written it must be read back for confirmation

Question 4

Q

What things will our cost formulae ignore for simplicity

Answer

A

The cost formulae will just have the number if block transfers form the disk as the cost measure
It will ignore CPU costs and Cost of writing output to disk.
It will also ignore the difference between sequential and random IO

Question 5

Q

Relate cost and the size of the buffer

Answer

A

-An increased size of memory buffer reduces the number of disk accesses required hence reducing the cost.
-The Amount of real memory available to buffer available changes with time based on the concurrent processes, however, we often use the worst case when calculating costs

Question 6

Q

What is a file scan

Answer

A

Search algorithms that locate and retrieve records that fulfil a selection condition

Question 7

Q

Describe Algorithm S1, explaining how it works, and its approximate costs

Answer

A

Also known as linear search
Searches every file and tests all records to see if they fulfill the search condition
The cost of S1 is br (which represents the total number of blocks containing relation r)
If the search condition is on a key attribute then the cost is (br/2) since the search will stop once the record is found

Question 8

Q

What are the benefits of S1

Answer

A

It can be performed regardless of :
the presence of indexes
type of selection condition
ordering of records in the file

Question 9

Q

Do example on 17

Question 10

Q

Describe S2

Answer

A

Also known as binary search
Applicable only when the condition is an equality condition

*See notes for cost estimate
But basically:
cost= log2(br)+ (number of matches for equality condition/br)-1
All decimals are rounded up

*Do questions in a document called ADQOQ in downloads p6

Question 11

Q

Describe S3a

Answer

A

Used when equality condition is on a key attribute with a primary index.
Index is used to locate the record
Cost-> Number of index levels accessed + 1

Question 12

Q

Describe S3b

Answer

A

Used when equality condition is on an attribute with a hash key.
*Retrieves a single record
The hashing algorithm is used to locate the address of the record.

If there is no overflow, the cost of S3b is 1.
If there is overflow, additional accesses may be necessary

Question 13

Q

Describe S4

Answer

A

Uses a primary index to access multiple records.
- Used when a <, <=, > or >= comparison is used on an attribute with a primary index
-The primary index is first used to locate the record satisfying the equality condition then the result is either the records before or after it it depending,
-It assumes the records are sorted.

Assuming uniform distribution, then we would expect half the tuples to satisfy the inequality, so
the estimated cost is:
Number of levels accessed + number of blocks/2

Question 14

Q

Describe S5

Answer

A

Uses the clustering index to locate multiple records
-If the condition is on an attribute A that is not the primary key but has a secondary clustering index, then the index can be used to locate the required records.

-See notes 22 for cost
-But basically, it is the number of levels accessed+ the number of blocks determined to contain the records with the cluster index
-To get the second part its records with the index/bfr
(Similar to last part of S2)

Question 15

Q

Describe S6

Answer

A

Uses a secondary (B+ tree index)
-If the index is on a key attribute, then it is used to retrieve a single attribute and if it is not on a key attribute, then it retrieves multiple values

The cost depends on whether or not it is on a key attribute.
If it is the cost is
number of levels accessed +1 (like S3a)
If it is not the cost is
number if levels accessed + number of records with index (very costly)

Question 16

Q

Which Algorithms are used for disjunctive selections

Answer

Study These Flashcards

A

These are basically selections with OR

Index/Binary- Used when all conditions have an access path (they are ordered or have an index). The conditions are processed separately and then a union is done on the results
Linear- If one or more of the conditions do not have a corresponding access path or index,

Question 17

Q

Describe S7

Answer

Study These Flashcards

A

Conjunctive (AND) selection using an individual index

-If any of the conditions in the complex condition is on an attribute that has an access path S2 to S6 is used then the remaining conditions are tested on the obtained records

Question 18

Q

Describe S8

Answer

Study These Flashcards

A

Conjunctive Selection using a composite index

-In this case two or more equality conditions make up the complex condition. These simple conditions are on attributes that have a composite index on them. For this case use the composite index for the selection

Question 19

Q

Do the question on 29

Answer

Study These Flashcards

A

kjsd

Cost Estimation of Selection Algorithms Flashcards

(19 cards)