Cost Estimation of Selection Algorithms Flashcards

1
Q

Describe measures of query cost

A

Cost is generally measured as the total time it takes to answer a query. Several factors contribute to query costs eg disk accesses, CPU, and network communication

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Discuss disk access as measure of query cost

A

Disk access is the most predominant measure of query cost.
It is calculated by
no of disk accessesdisk access cost
no of blocks read
average block-read-cost
no of blocks written*average block-read cost

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Compare the cost to read and write a block

A

The cost to write a block is higher than the cost to read a block since once a block is written it must be read back for confirmation

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What things will our cost formulae ignore for simplicity

A

The cost formulae will just have the number if block transfers form the disk as the cost measure
It will ignore CPU costs and Cost of writing output to disk.
It will also ignore the difference between sequential and random IO

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Relate cost and the size of the buffer

A

-An increased size of memory buffer reduces the number of disk accesses required hence reducing the cost.
-The Amount of real memory available to buffer available changes with time based on the concurrent processes, however, we often use the worst case when calculating costs

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is a file scan

A

Search algorithms that locate and retrieve records that fulfil a selection condition

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Describe Algorithm S1, explaining how it works, and its approximate costs

A

Also known as linear search
Searches every file and tests all records to see if they fulfill the search condition
The cost of S1 is br (which represents the total number of blocks containing relation r)
If the search condition is on a key attribute then the cost is (br/2) since the search will stop once the record is found

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What are the benefits of S1

A

It can be performed regardless of :
the presence of indexes
type of selection condition
ordering of records in the file

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Do example on 17

A

*

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Describe S2

A

Also known as binary search
Applicable only when the condition is an equality condition

*See notes for cost estimate
But basically:
cost= log2(br)+ (number of matches for equality condition/br)-1
All decimals are rounded up

*Do questions in a document called ADQOQ in downloads p6

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Describe S3a

A

Used when equality condition is on a key attribute with a primary index.
Index is used to locate the record
Cost-> Number of index levels accessed + 1

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Describe S3b

A

Used when equality condition is on an attribute with a hash key.
*Retrieves a single record
The hashing algorithm is used to locate the address of the record.

If there is no overflow, the cost of S3b is 1.
If there is overflow, additional accesses may be necessary

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Describe S4

A

Uses a primary index to access multiple records.
- Used when a <, <=, > or >= comparison is used on an attribute with a primary index
-The primary index is first used to locate the record satisfying the equality condition then the result is either the records before or after it it depending,
-It assumes the records are sorted.

Assuming uniform distribution, then we would expect half the tuples to satisfy the inequality, so
the estimated cost is:
Number of levels accessed + number of blocks/2

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Describe S5

A

Uses the clustering index to locate multiple records
-If the condition is on an attribute A that is not the primary key but has a secondary clustering index, then the index can be used to locate the required records.

-See notes 22 for cost
-But basically, it is the number of levels accessed+ the number of blocks determined to contain the records with the cluster index
-To get the second part its records with the index/bfr
(Similar to last part of S2)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Describe S6

A

Uses a secondary (B+ tree index)
-If the index is on a key attribute, then it is used to retrieve a single attribute and if it is not on a key attribute, then it retrieves multiple values

The cost depends on whether or not it is on a key attribute.
If it is the cost is
number of levels accessed +1 (like S3a)
If it is not the cost is
number if levels accessed + number of records with index (very costly)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Which Algorithms are used for disjunctive selections

A

These are basically selections with OR

  1. Index/Binary- Used when all conditions have an access path (they are ordered or have an index). The conditions are processed separately and then a union is done on the results
  2. Linear- If one or more of the conditions do not have a corresponding access path or index,
17
Q

Describe S7

A

Conjunctive (AND) selection using an individual index

-If any of the conditions in the complex condition is on an attribute that has an access path S2 to S6 is used then the remaining conditions are tested on the obtained records

18
Q

Describe S8

A

Conjunctive Selection using a composite index

-In this case two or more equality conditions make up the complex condition. These simple conditions are on attributes that have a composite index on them. For this case use the composite index for the selection

19
Q

Do the question on 29

A

kjsd