Efficiency, Indexing, Physical Design Flashcards
Why should we study efficiency
If databases and DBMS don’t run fast enough to be useful, then the point is missed entirely.
Main Memory (RAM)
Volatile, fast, small, and expensive
Secondary Memory (DISK)
Permanent, slow, big, and cheap
Main Memory Access Time
30ns (.3 x 10^-7 sec)
Disk Access Time
10ms (1 x 10^-2)
Only this cost (I/O) is calculated
Parts of the Disk
Read/Write Head Actuator Arm Spindle Track/Cylinder Sector Block Platters
Spanned Representation
Where a record is split up between two blocks of disk memory
Unspanned Representation
Where a record exists on a single block on disk memory
Why not fill up a block with data?
We may want to leave space at the end of a block in case we need to insert new records.
Target is 80% filled.
File
A series of blocks linked by address pointers.
Seek Time
Time it takes to find a block on disk. Costs 3-8ms
Rotation Delay
Time it takes for the disk to rotate to a block. Costs 2-3ms
Transfer Time
Time it takes for I/O to deliver the data via data bus to Main Memory. Costs .5-1ms
LRU buffer management strategy
When we run out of buffer space and need to free some, the least recently used space will be overwritten.
Excellent for Merge Joins
Kills Nested loop joins
Heap
Unsorted file of data.
Heap Lookup Time
N/2 | N = # data blocks
(ex. 200000 / 2 * .01s = 16.6 min)
Sometimes we’re lucky and the target is in the first location.
Sometimes we’re unlucky and its in the last location.
Sorted Binary Search
Inspect records halfway through file and determine if we should search before or after this record. Repeat until record is found.
Sorted Linear Search Time
N/2 | N = # data blocks
ex 200000 / 2 * .01s = 16.6 min
Sorted Binary Search Time
log2(N) | N = # data blocks
ex log2(200000) * .01s = 18 * .01s = .18s
Primary Index
Create a copy of the key of a record into a new block, and create a pointer to the address of the block of data the key belongs to.
Block design:
K = Key
P = Pointer
+—–+——+——+——+——+——+
| K1 | P1 | K2 | P2 | K3 | P3 |
+—–+——+——+——+——+——+
Primary Index Lookup Time
log2(n) + 1 | n = # index blocks
ex: log2(n / fanout) + 1 | fanout = 60
Sparse (200,000 records):
(log2(200/000 / 60) + 1) * 0.01s = (12+1)*0.01s = 0.13s
Dense (4M):
(log2(4,000,000 / 60) + 1) * 0.01s = (16+1)*0.01s = 0.17s
Index Block Calculation
data blocks / fanout
Fanout
The number of overall blocks the total records fall into.
Sparse (Single Level) Index
index records are not created for every search key value.
Dense Index
an index record is created for every search key value in the database. Requires more space.
Multi Level Index
Where we have an index for the indices of our blocks of data.
Multi Level Lookup Time
log(fanout)(n) + 1 | n = # index blocks
Multi Level Index B-Tree
All data is at the bottom, and all nodes above that are index node.
If the tree deteriorates over time, so does the search time.
It’s very important that the distance from the root to the base level never change over time.
Static Hashing
Hashing function hashes keys from a keyspace which resolves an address in a designated address space. This address holds a pointer for a linked data block.
Properties of a Good Hash Function
1) Distribute values uniformly over the address space
2) Fill buckets as much as possible
3) Avoid collisions
Properties of Good Static Hashing
Linked data blocks are uniform, address space is appropriate size so linked lists don’t get too deep.
Blocking Factor
Number of blocks in a file