Final Flashcards

1
Q

Linear Search (A1) Cost

A

b_r + t_S
One initial seek
b_r block transfers, where b_r is the number of blocks in a file

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Linear Search (A1) Cost, Equality on Key Average Case.

A

t_s+ (b_r/2). Average case, scan can be terminated as soon as a record is found.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Primary B+-tree Index search, Equality on Key (A2)

A

(h_i + 1) * t_T
h_i - tree height.
Index traverses tree height and 1 I/O to fetch the record.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Primary B+-tree Index, Equality on Nonkey (A3)

A

h_i * t_T + b * t_T
Height of the tree and then however many blocks we get

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

SQL Query Processor Steps

A

Parser -> Translator -> Optimizer -> Evaluator -> Query Output

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Predominant Cost in Evaluation

A

Disk Access

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

t_T

A

Time to transfer one block

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

t_S

A

Time for one seek

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Generic Cost Algorithm

A

bt_T + St_S

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

b_r

A

Number of blocks containing records from relation r

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Secondary Index, Equality on Key (A4)

A

(h_i+1)*t_t

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

TF: Index Scans are Search Algorithms

A

True

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

b

A

Blocks with records with the specified search key

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Secondary Index, Equality on Nonkey (A4)

A

(h_i + n) * t_T

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Primary Index Comparison (A5)

A

h_i*t_T + b * t_T
Less than: Start at first tuple and scan until end
Greater than: Use index to find first tuple and scan onwards

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Secondary Index Comparison (A6)

A

(h_i+n) * t_T
Same as A5 but you’re finding pointers to records
Linear file scan might be cheaper

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

Conjunctive Selection with One Index (A7)

A

Select algos in A1-7 for lowest cost

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

Composite Index Matching with Tree Indexes

A

Involves attributes in prefix of search key. Ex, given a,b,c.
a=1, b=2 is good
a=1, c=3 is bad

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

Composite Index Matching with Hash Indexes

A

every term must have a value, order does not matter

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

Disjunctive Selection (A10)

A

Can use if all conditions have indices, else use Linear Scan

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

Negation

A

Use linear scan

22
Q

Nested Loop Algorithm

A

for t in t_r
for u in t_s
if cond(t,u); then add to result

23
Q

Nested Loop Cost

A

n_r * b_s + b_r

24
Q

Nested Loop Cost if inner relation fits in memory

A

b_r + b_s

25
Q

Block Nested Loop Cost

A

b_r * b_s + b_r

26
Q

Improved Block Nested Loop Cost

A

[b_r / (M-2)] * b_s + b_r
3 partitions of M needed for outer, inner, and output

27
Q

TF: In a block nested-loop join, if equi-join attribute forms a key on the inner relation, stop the inner loop on the first match

A

True

28
Q

Index Nested-Loop Join Conditions

A

Join is equi or natural join
Index is available on inner relation’s join attribute

29
Q

Index Nested-Loop Join Process

A

For each tuple in t_r, use the index to look up tuples in s that satisfy the outer join condition

30
Q

Index Nested-Loop Join Process Worst Case

A

Buffer has space for one page of r and for each tuple in r, we do an index lookup on s
b_r + n_r * c
c = cost of fetching all matching s tuples for r. Cost of a single selection on s using the join condition.

31
Q

TF: If indices are available on join attributes on r and s, use the relation with fewer tuples as the inner relation

A

False, as the outer

32
Q

Merge-Join Conditions

A

Join is equi or natural join. If R and S are sorted, then join can be computed in a similar manner as the merge-sort algo

33
Q

Merge-Join Cost

A

b_r + b_s + sort cost
Worst Case Sort: M log M + N log N

34
Q

Hybrid Merge Join Process

A

One relation is sorted, and the other is unsorted but has a secondary index on the join attribute. Merge sorted relation with leaf entries of the secondary index

35
Q

Hash Join Process

A

Partition s using hash function h, reserving 1 block of memory as the output buffer for each partition
Do the same with r
for each i:
- load s_i into memory
- build an in-memory hash index on s_i using the join attribute and a different h
- Read tuples from r_i one by one
- Find the matching s_i using the hash index

36
Q

Hash Join Build Input

A

Used to build in-memory hash index (s)

37
Q

Hash Join Probe Input

A

Used to find matches of in-built index (r)

38
Q

When is recursive partitioning needed in the hash join?

A

When n_h + 1 >= M
Also sqrt(b_s) >= M

39
Q

When does hash table overflow occur?

A

If hash index on s_i is larger than main memory

40
Q

When is hash table overflow common?

A
  • Many tuples in s with same join value. This can be subverted with block nested loop join
  • Bad hash function
41
Q

Hash Join Cost

A

3(b_r + b_s). Just b_r + b_s if build input can be kept in memory.

42
Q

Hybrid Hash Join

A

Keeps first partition of build relation in memory. Most useful when M is much greater than sqrt(b_s)

43
Q

Joins with a conjunctive / disjunctive condition

A

Nested block joins

44
Q

Duplicate elimination

A

Sorting - next to each other
Hashing - in same bucket

45
Q

TF: Secondary indices must always be dense

A

True

46
Q

Difference between primary and secondary indices

A

Primary stored in sequential order, secondary are not

47
Q

TF: The Result of an SQL statement is always a relation

A

True

48
Q

Cost of External File Sort

A

2N*passes, where N is the number of pages

49
Q

Runs in first pas of external file sort

A

N/B, where N is the number of pages, and B is the number of available buffer pages

50
Q

Number of passes needed to sort a file completely

A

1+ceil(log(N/B,B-1))