Exam 2 Flashcards
Database file index
similar in theme to an index in a textbook
data structure allowing fast lookups for data
Types of indexes
Ordered Indexes
Hash Indexes
ordered indexes
values are stored sequentially within the file
Hash indexes
values uniformly distributed across a range of buckets (blocks) by a hash function
Why is the word values used instead of records when talking about ordered and hash indexes
because it is the values which determine the location of the records
What is an index
metadata that helps access record data
stored in blocks. The blocks contain pointers into other blocks ( which contain records)
what are the 2 orderings of indexes
primary and secondary
primary index
for a sequentially ordered file, the index whose search key specifies record order
also called clustering index
the search index is usually the primary key
secondary index
an index whose search key specifies an order different than the file order
records are not sorted by the search key
also called non-clustering index
what are the 2 Formats for indexes
Dense and sparse
Dense indexes
the index points to sequential records stored within a file.
one index entry per search key
Sparse index
contains index records for only some of the search-key values
only applicable when records are sequentially ordered by the search key (clustering indexes)
pros and cons for dense and sparse indexes
dense
-entry per key value
-faster lookup time
-more space required
-more maintenance
sparse
-fewer entries
-slower lookup time
-less space required
-less maintenance
good tradeoff is one sparse index entry per block
B+ Tree
Multilevel index structure
A balanced, tree-based data structure
High fanout
automatically reorganizes itself on inserts/deletions
reorganizations of the entire file is not needed to maintain performance
structure of B+ Trees
n pointers
n-1 search keys
search keys are organized such that
if i < j then Ki < Kj
internal nodes - an internal level of nodes provide a sparse index on the level of nodes below it
leaf nodes - leaf nodes provide a dense index on the file
what are the rules for internal nodes in a B+ Tree
may hold up to n pointers
must hold at least n/2 pointers (round up)
what are the rules for leaf nodes in a B+ Tree
may hold up to n-1 pointers
must hold at least (n-1) / 2 values (round up)
what happens when a node is at capacity when trying to insert
split up the node
what happens when a leaf node falls below the (n-1)/2 (round up) values
we coalesce nodes by either
merging the node with a sibling node
redistribute the entries between the nodes to ensure that each is half full
cost of select - full file scan - no index, a heap
cost = br * tT + tS
br denoting all block in relation r
cost of select - linear search on a sorted file
cost = br/2 * tT + tS
read half the blocks on average (br/2)
cost of lookups on B+ Tree indexes
hi + 1
hi defines the tree height
+1 for the data block beyond the leaf node
cost of select on a B+ Tree ordered by search key
cost = (hi + 1) * (tT + tS)
what are the steps of query processing
1 parsing & translation
2 optimization
3 evaluation
in what order are sql queries expressed in relational algebra
1 cross product of relations
2 selection
3 projection
what is the generic query plan cost
cost = b * tT + s * tS
tT time to transfer a block from disk
tS time to perform a seek (access time)
b number of blocks to transfer from disk
s number for seeks that must occur
what are the 2 methods of query evaluation
materialized and pipelined
materialized query eval
In materialized evaluation (materialization), each operator reads data page-at-a-timeand then writes each producedpage to a temporary file.
The next operator can read from thatfile (intermediate data) and produce its own materialized output fileand so on.
Pipelined query eval
In pipelined evaluation, each operator will read a single block at a time and deliver its result directly to the operator above it.
what are the operator interface methods
void open(datasource, args)
page next()
void close()
what is the pseudo code for project
For each record in table
Keep certain attr
Return record
what is the pseudo code for select
For each record in table
Test predicate
If true
Return record
what is the pseudo code for join
Simple nested loop
For each record Rs in Table S
For each Record Rt in Table T
If predicate on Rs and Rt is true
Return Record(Rs + Rt)
Block Nested loop join
For each block Bs in Table S
For each block Bt in Table T
For each Rs in Block Bs
For each Rt in block Bt
If predicate on Rs and Rt is true
Return Rs + Rt
how does the System Catalog plays into cost analysis
the system catalog contains statistics about the database that is used to calculate the cost of a query plan.
what is a transaction
a transaction is a unit of execution in which data is read and modified. The steps within it appear to the user as indivisible
a transaction will execute entirely or not at all. If failure is imminent, changes due to partial execution will be rolled back
what does A.C.I.D stand for
atomicity
consistency
isolation
durability
atomicity
either all operations of the transaction are applied to the data or none are
consistency
execution results in the data staying consistent/valid as relating to integrity constraints on the data
isolaiton
transactions don’t “appear” to run concurrently. Any other transaction runs “before” or “after” the current transaction
Isolation guarantees that the intermediate states of a transaction are not visible to other transactions until the transaction is complete.
durability
a completed transaction persists even if there are system failures
explain Concurrent Query Execution
Concurrent Query Execution refers to the ability of a database system to execute multiple queries simultaneously
Concurrency: In multi-user environments, many transactions may be processed simultaneously. Isolation ensures correctness even when transactions overlap.