Exam 2 Flashcards

(41 cards)

1
Q

Database file index

A

similar in theme to an index in a textbook

data structure allowing fast lookups for data

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Types of indexes

A

Ordered Indexes

Hash Indexes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

ordered indexes

A

values are stored sequentially within the file

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Hash indexes

A

values uniformly distributed across a range of buckets (blocks) by a hash function

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Why is the word values used instead of records when talking about ordered and hash indexes

A

because it is the values which determine the location of the records

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is an index

A

metadata that helps access record data

stored in blocks. The blocks contain pointers into other blocks ( which contain records)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

what are the 2 orderings of indexes

A

primary and secondary

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

primary index

A

for a sequentially ordered file, the index whose search key specifies record order

also called clustering index
the search index is usually the primary key

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

secondary index

A

an index whose search key specifies an order different than the file order

records are not sorted by the search key

also called non-clustering index

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

what are the 2 Formats for indexes

A

Dense and sparse

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Dense indexes

A

the index points to sequential records stored within a file.

one index entry per search key

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Sparse index

A

contains index records for only some of the search-key values

only applicable when records are sequentially ordered by the search key (clustering indexes)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

pros and cons for dense and sparse indexes

A

dense
-entry per key value
-faster lookup time
-more space required
-more maintenance

sparse
-fewer entries
-slower lookup time
-less space required
-less maintenance

good tradeoff is one sparse index entry per block

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

B+ Tree

A

Multilevel index structure

A balanced, tree-based data structure
High fanout

automatically reorganizes itself on inserts/deletions

reorganizations of the entire file is not needed to maintain performance

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

structure of B+ Trees

A

n pointers
n-1 search keys

search keys are organized such that
if i < j then Ki < Kj

internal nodes - an internal level of nodes provide a sparse index on the level of nodes below it

leaf nodes - leaf nodes provide a dense index on the file

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

what are the rules for internal nodes in a B+ Tree

A

may hold up to n pointers

must hold at least n/2 pointers (round up)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

what are the rules for leaf nodes in a B+ Tree

A

may hold up to n-1 pointers

must hold at least (n-1) / 2 values (round up)

18
Q

what happens when a node is at capacity when trying to insert

A

split up the node

19
Q

what happens when a leaf node falls below the (n-1)/2 (round up) values

A

we coalesce nodes by either

merging the node with a sibling node

redistribute the entries between the nodes to ensure that each is half full

20
Q

cost of select - full file scan - no index, a heap

A

cost = br * tT + tS

br denoting all block in relation r

21
Q

cost of select - linear search on a sorted file

A

cost = br/2 * tT + tS

read half the blocks on average (br/2)

22
Q

cost of lookups on B+ Tree indexes

A

hi + 1

hi defines the tree height
+1 for the data block beyond the leaf node

23
Q

cost of select on a B+ Tree ordered by search key

A

cost = (hi + 1) * (tT + tS)

24
Q

what are the steps of query processing

A

1 parsing & translation
2 optimization
3 evaluation

25
in what order are sql queries expressed in relational algebra
1 cross product of relations 2 selection 3 projection
26
what is the generic query plan cost
cost = b * tT + s * tS tT time to transfer a block from disk tS time to perform a seek (access time) b number of blocks to transfer from disk s number for seeks that must occur
27
what are the 2 methods of query evaluation
materialized and pipelined
28
materialized query eval
In materialized evaluation (materialization), each operator reads data page-at-a-time and then writes each produced page to a temporary file. The next operator can read from that file (intermediate data) and produce its own materialized output file and so on.
29
Pipelined query eval
In pipelined evaluation, each operator will read a single block at a time and deliver its result directly to the operator above it.
30
what are the operator interface methods
void open(datasource, args) page next() void close()
31
what is the pseudo code for project
For each record in table Keep certain attr Return record
32
what is the pseudo code for select
For each record in table Test predicate If true Return record
33
what is the pseudo code for join
Simple nested loop For each record Rs in Table S For each Record Rt in Table T If predicate on Rs and Rt is true Return Record(Rs + Rt) Block Nested loop join For each block Bs in Table S For each block Bt in Table T For each Rs in Block Bs For each Rt in block Bt If predicate on Rs and Rt is true Return Rs + Rt
34
how does the System Catalog plays into cost analysis
the system catalog contains statistics about the database that is used to calculate the cost of a query plan.
35
what is a transaction
a transaction is a unit of execution in which data is read and modified. The steps within it appear to the user as indivisible a transaction will execute entirely or not at all. If failure is imminent, changes due to partial execution will be rolled back
36
what does A.C.I.D stand for
atomicity consistency isolation durability
37
atomicity
either all operations of the transaction are applied to the data or none are
38
consistency
execution results in the data staying consistent/valid as relating to integrity constraints on the data
39
isolaiton
transactions don't "appear" to run concurrently. Any other transaction runs "before" or "after" the current transaction Isolation guarantees that the intermediate states of a transaction are not visible to other transactions until the transaction is complete.
40
durability
a completed transaction persists even if there are system failures
41
explain Concurrent Query Execution
Concurrent Query Execution refers to the ability of a database system to execute multiple queries simultaneously Concurrency: In multi-user environments, many transactions may be processed simultaneously. Isolation ensures correctness even when transactions overlap.