Efficiency, Indexing, Physical Design Flashcards

1
Q

Why should we study efficiency

A

If databases and DBMS don’t run fast enough to be useful, then the point is missed entirely.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Main Memory (RAM)

A

Volatile, fast, small, and expensive

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Secondary Memory (DISK)

A

Permanent, slow, big, and cheap

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Main Memory Access Time

A

30ns (.3 x 10^-7 sec)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Disk Access Time

A

10ms (1 x 10^-2)

Only this cost (I/O) is calculated

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Parts of the Disk

A
Read/Write Head
Actuator
Arm
Spindle
Track/Cylinder
Sector
Block
Platters
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Spanned Representation

A

Where a record is split up between two blocks of disk memory

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Unspanned Representation

A

Where a record exists on a single block on disk memory

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Why not fill up a block with data?

A

We may want to leave space at the end of a block in case we need to insert new records.
Target is 80% filled.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

File

A

A series of blocks linked by address pointers.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Seek Time

A

Time it takes to find a block on disk. Costs 3-8ms

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Rotation Delay

A

Time it takes for the disk to rotate to a block. Costs 2-3ms

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Transfer Time

A

Time it takes for I/O to deliver the data via data bus to Main Memory. Costs .5-1ms

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

LRU buffer management strategy

A

When we run out of buffer space and need to free some, the least recently used space will be overwritten.

Excellent for Merge Joins
Kills Nested loop joins

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Heap

A

Unsorted file of data.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Heap Lookup Time

A

N/2 | N = # data blocks
(ex. 200000 / 2 * .01s = 16.6 min)

Sometimes we’re lucky and the target is in the first location.
Sometimes we’re unlucky and its in the last location.

17
Q

Sorted Binary Search

A

Inspect records halfway through file and determine if we should search before or after this record. Repeat until record is found.

18
Q

Sorted Linear Search Time

A

N/2 | N = # data blocks

ex 200000 / 2 * .01s = 16.6 min

19
Q

Sorted Binary Search Time

A

log2(N) | N = # data blocks

ex log2(200000) * .01s = 18 * .01s = .18s

20
Q

Primary Index

A

Create a copy of the key of a record into a new block, and create a pointer to the address of the block of data the key belongs to.

Block design:
K = Key
P = Pointer
+—–+——+——+——+——+——+
| K1 | P1 | K2 | P2 | K3 | P3 |
+—–+——+——+——+——+——+

21
Q

Primary Index Lookup Time

A

log2(n) + 1 | n = # index blocks

ex: log2(n / fanout) + 1 | fanout = 60

Sparse (200,000 records):
(log2(200/000 / 60) + 1) * 0.01s = (12+1)*0.01s = 0.13s

Dense (4M):
(log2(4,000,000 / 60) + 1) * 0.01s = (16+1)*0.01s = 0.17s

22
Q

Index Block Calculation

A

data blocks / fanout

23
Q

Fanout

A

The number of overall blocks the total records fall into.

24
Q

Sparse (Single Level) Index

A

index records are not created for every search key value.

25
Q

Dense Index

A

an index record is created for every search key value in the database. Requires more space.

26
Q

Multi Level Index

A

Where we have an index for the indices of our blocks of data.

27
Q

Multi Level Lookup Time

A

log(fanout)(n) + 1 | n = # index blocks

28
Q

Multi Level Index B-Tree

A

All data is at the bottom, and all nodes above that are index node.

If the tree deteriorates over time, so does the search time.
It’s very important that the distance from the root to the base level never change over time.

29
Q

Static Hashing

A

Hashing function hashes keys from a keyspace which resolves an address in a designated address space. This address holds a pointer for a linked data block.

30
Q

Properties of a Good Hash Function

A

1) Distribute values uniformly over the address space
2) Fill buckets as much as possible
3) Avoid collisions

31
Q

Properties of Good Static Hashing

A

Linked data blocks are uniform, address space is appropriate size so linked lists don’t get too deep.

32
Q

Blocking Factor

A

Number of blocks in a file