Exam 2 Flashcards by Carson Krogulski

Database file index

similar in theme to an index in a textbook

data structure allowing fast lookups for data

How well did you know this?

Not at all

Perfectly

Types of indexes

Ordered Indexes

Hash Indexes

How well did you know this?

Not at all

Perfectly

ordered indexes

values are stored sequentially within the file

How well did you know this?

Not at all

Perfectly

Hash indexes

values uniformly distributed across a range of buckets (blocks) by a hash function

How well did you know this?

Not at all

Perfectly

Why is the word values used instead of records when talking about ordered and hash indexes

because it is the values which determine the location of the records

How well did you know this?

Not at all

Perfectly

What is an index

metadata that helps access record data

stored in blocks. The blocks contain pointers into other blocks ( which contain records)

How well did you know this?

Not at all

Perfectly

what are the 2 orderings of indexes

primary and secondary

How well did you know this?

Not at all

Perfectly

primary index

for a sequentially ordered file, the index whose search key specifies record order

also called clustering index
the search index is usually the primary key

How well did you know this?

Not at all

Perfectly

secondary index

an index whose search key specifies an order different than the file order

records are not sorted by the search key

also called non-clustering index

How well did you know this?

Not at all

Perfectly

what are the 2 Formats for indexes

Dense and sparse

How well did you know this?

Not at all

Perfectly

Dense indexes

the index points to sequential records stored within a file.

one index entry per search key

How well did you know this?

Not at all

Perfectly

Sparse index

contains index records for only some of the search-key values

only applicable when records are sequentially ordered by the search key (clustering indexes)

How well did you know this?

Not at all

Perfectly

pros and cons for dense and sparse indexes

dense
-entry per key value
-faster lookup time
-more space required
-more maintenance

sparse
-fewer entries
-slower lookup time
-less space required
-less maintenance

good tradeoff is one sparse index entry per block

How well did you know this?

Not at all

Perfectly

B+ Tree

Multilevel index structure

A balanced, tree-based data structure
High fanout

automatically reorganizes itself on inserts/deletions

reorganizations of the entire file is not needed to maintain performance

How well did you know this?

Not at all

Perfectly

structure of B+ Trees

n pointers
n-1 search keys

search keys are organized such that
if i < j then Ki < Kj

internal nodes - an internal level of nodes provide a sparse index on the level of nodes below it

leaf nodes - leaf nodes provide a dense index on the file

How well did you know this?

Not at all

Perfectly

what are the rules for internal nodes in a B+ Tree

may hold up to n pointers

must hold at least n/2 pointers (round up)

How well did you know this?

Not at all

Perfectly

what are the rules for leaf nodes in a B+ Tree

Study These Flashcards

may hold up to n-1 pointers

must hold at least (n-1) / 2 values (round up)

what happens when a node is at capacity when trying to insert

Study These Flashcards

split up the node

what happens when a leaf node falls below the (n-1)/2 (round up) values

Study These Flashcards

we coalesce nodes by either

merging the node with a sibling node

redistribute the entries between the nodes to ensure that each is half full

cost of select - full file scan - no index, a heap

Study These Flashcards

cost = br * tT + tS

br denoting all block in relation r

cost of select - linear search on a sorted file

Study These Flashcards

cost = br/2 * tT + tS

read half the blocks on average (br/2)

cost of lookups on B+ Tree indexes

Study These Flashcards

hi + 1

hi defines the tree height
+1 for the data block beyond the leaf node

cost of select on a B+ Tree ordered by search key

Study These Flashcards

cost = (hi + 1) * (tT + tS)

what are the steps of query processing

Study These Flashcards

1 parsing & translation
2 optimization
3 evaluation

in what order are sql queries expressed in relational algebra

1 cross product of relations 2 selection 3 projection

what is the generic query plan cost

cost = b * tT + s * tS tT time to transfer a block from disk tS time to perform a seek (access time) b number of blocks to transfer from disk s number for seeks that must occur

what are the 2 methods of query evaluation

materialized and pipelined

materialized query eval

In materialized evaluation (materialization), each operator reads data page-at-a-timeand then writes each producedpage to a temporary file. The next operator can read from thatfile (intermediate data) and produce its own materialized output fileand so on.

Pipelined query eval

In pipelined evaluation, each operator will read a single block at a time and deliver its result directly to the operator above it.

what are the operator interface methods

void open(datasource, args) page next() void close()

what is the pseudo code for project

For each record in table Keep certain attr Return record

what is the pseudo code for select

For each record in table Test predicate If true Return record

what is the pseudo code for join

Simple nested loop For each record Rs in Table S For each Record Rt in Table T If predicate on Rs and Rt is true Return Record(Rs + Rt) Block Nested loop join For each block Bs in Table S For each block Bt in Table T For each Rs in Block Bs For each Rt in block Bt If predicate on Rs and Rt is true Return Rs + Rt

how does the System Catalog plays into cost analysis

the system catalog contains statistics about the database that is used to calculate the cost of a query plan.

what is a transaction

a transaction is a unit of execution in which data is read and modified. The steps within it appear to the user as indivisible a transaction will execute entirely or not at all. If failure is imminent, changes due to partial execution will be rolled back

what does A.C.I.D stand for

atomicity consistency isolation durability

atomicity

either all operations of the transaction are applied to the data or none are

consistency

execution results in the data staying consistent/valid as relating to integrity constraints on the data

isolaiton

transactions don't "appear" to run concurrently. Any other transaction runs "before" or "after" the current transaction Isolation guarantees that the intermediate states of a transaction are not visible to other transactions until the transaction is complete.

durability

a completed transaction persists even if there are system failures

explain Concurrent Query Execution

Concurrent Query Execution refers to the ability of a database system to execute multiple queries simultaneously Concurrency: In multi-user environments, many transactions may be processed simultaneously. Isolation ensures correctness even when transactions overlap.

Exam 2 Flashcards

(41 cards)