Exam 2 Flashcards

1
Q

Database file index

A

similar in theme to an index in a textbook

data structure allowing fast lookups for data

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Types of indexes

A

Ordered Indexes

Hash Indexes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

ordered indexes

A

values are stored sequentially within the file

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Hash indexes

A

values uniformly distributed across a range of buckets (blocks) by a hash function

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Why is the word values used instead of records when talking about ordered and hash indexes

A

because it is the values which determine the location of the records

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is an index

A

metadata that helps access record data

stored in blocks. The blocks contain pointers into other blocks ( which contain records)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

what are the 2 orderings of indexes

A

primary and secondary

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

primary index

A

for a sequentially ordered file, the index whose search key specifies record order

also called clustering index
the search index is usually the primary key

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

secondary index

A

an index whose search key specifies an order different than the file order

records are not sorted by the search key

also called non-clustering index

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

what are the 2 Formats for indexes

A

Dense and sparse

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Dense indexes

A

the index points to sequential records stored within a file.

one index entry per search key

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Sparse index

A

contains index records for only some of the search-key values

only applicable when records are sequentially ordered by the search key (clustering indexes)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

pros and cons for dense and sparse indexes

A

dense
-entry per key value
-faster lookup time
-more space required
-more maintenance

sparse
-fewer entries
-slower lookup time
-less space required
-less maintenance

good tradeoff is one sparse index entry per block

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

B+ Tree

A

Multilevel index structure

A balanced, tree-based data structure
High fanout

automatically reorganizes itself on inserts/deletions

reorganizations of the entire file is not needed to maintain performance

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

structure of B+ Trees

A

n pointers
n-1 search keys

search keys are organized such that
if i < j then Ki < Kj

internal nodes - an internal level of nodes provide a sparse index on the level of nodes below it

leaf nodes - leaf nodes provide a dense index on the file

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

what are the rules for internal nodes in a B+ Tree

A

may hold up to n pointers

must hold at least n/2 pointers (round up)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

what are the rules for leaf nodes in a B+ Tree

A

may hold up to n-1 pointers

must hold at least (n-1) / 2 values (round up)

18
Q

what happens when a node is at capacity when trying to insert

A

split up the node

19
Q

what happens when a leaf node falls below the (n-1)/2 (round up) values

A

we coalesce nodes by either

merging the node with a sibling node

redistribute the entries between the nodes to ensure that each is half full

20
Q

cost of select - full file scan - no index, a heap

A

cost = br * tT + tS

br denoting all block in relation r

21
Q

cost of select - linear search on a sorted file

A

cost = br/2 * tT + tS

read half the blocks on average (br/2)

22
Q

cost of lookups on B+ Tree indexes

A

hi + 1

hi defines the tree height
+1 for the data block beyond the leaf node

23
Q

cost of select on a B+ Tree ordered by search key

A

cost = (hi + 1) * (tT + tS)

24
Q

what are the steps of query processing

A

1 parsing & translation
2 optimization
3 evaluation

25
Q

in what order are sql queries expressed in relational algebra

A

1 cross product of relations
2 selection
3 projection

26
Q

what is the generic query plan cost

A

cost = b * tT + s * tS

tT time to transfer a block from disk
tS time to perform a seek (access time)

b number of blocks to transfer from disk
s number for seeks that must occur

27
Q

what are the 2 methods of query evaluation

A

materialized and pipelined

28
Q

materialized query eval

A

In materialized evaluation (materialization), each operator reads data page-at-a-timeand then writes each producedpage to a temporary file.

The next operator can read from thatfile (intermediate data) and produce its own materialized output fileand so on.

29
Q

Pipelined query eval

A

In pipelined evaluation, each operator will read a single block at a time and deliver its result directly to the operator above it.

30
Q

what are the operator interface methods

A

void open(datasource, args)

page next()

void close()

31
Q

what is the pseudo code for project

A

For each record in table
Keep certain attr
Return record

32
Q

what is the pseudo code for select

A

For each record in table
Test predicate
If true
Return record

33
Q

what is the pseudo code for join

A

Simple nested loop

For each record Rs in Table S
For each Record Rt in Table T
If predicate on Rs and Rt is true
Return Record(Rs + Rt)

Block Nested loop join

For each block Bs in Table S
For each block Bt in Table T
For each Rs in Block Bs
For each Rt in block Bt
If predicate on Rs and Rt is true
Return Rs + Rt

34
Q

how does the System Catalog plays into cost analysis

A

the system catalog contains statistics about the database that is used to calculate the cost of a query plan.

35
Q

what is a transaction

A

a transaction is a unit of execution in which data is read and modified. The steps within it appear to the user as indivisible

a transaction will execute entirely or not at all. If failure is imminent, changes due to partial execution will be rolled back

36
Q

what does A.C.I.D stand for

A

atomicity

consistency

isolation

durability

37
Q

atomicity

A

either all operations of the transaction are applied to the data or none are

38
Q

consistency

A

execution results in the data staying consistent/valid as relating to integrity constraints on the data

39
Q

isolaiton

A

transactions don’t “appear” to run concurrently. Any other transaction runs “before” or “after” the current transaction

Isolation guarantees that the intermediate states of a transaction are not visible to other transactions until the transaction is complete.

40
Q

durability

A

a completed transaction persists even if there are system failures

41
Q

explain Concurrent Query Execution

A

Concurrent Query Execution refers to the ability of a database system to execute multiple queries simultaneously

Concurrency: In multi-user environments, many transactions may be processed simultaneously. Isolation ensures correctness even when transactions overlap.