QueryProcessing Flashcards by Luciano Imbimbo

What is query processing?

The range of activities involved in extracting data from a database, including translation of high-level database language queries into expressions usable at the physical file system level

How well did you know this?

Not at all

Perfectly

What are the three main steps in query processing?

Parsing and translation
Optimization
Evaluation

How well did you know this?

Not at all

Perfectly

What is a query-execution plan?

A sequence of primitive operations that can be used to evaluate a query. Each operation is annotated with instructions specifying how to evaluate it

How well did you know this?

Not at all

Perfectly

What is typically the dominant cost factor in query evaluation for large databases?

I/O cost (disk access) dominates other costs like CPU time and communication

How well did you know this?

Not at all

Perfectly

How is the cost of an operation that transfers b blocks and performs S random I/O accesses calculated?

b * tT + S * tS seconds where tT is time to transfer a block and tS is block-access time

How well did you know this?

Not at all

Perfectly

Why are block writes typically more expensive than block reads on magnetic disks?

Block writes are about twice as expensive because disk systems read sectors back after writing to verify the write was successful

How well did you know this?

Not at all

Perfectly

What percentage of B+-tree nodes are typically non-leaf nodes?

About 1% of nodes are non-leaf nodes

How well did you know this?

Not at all

Perfectly

What is the difference between materialized evaluation and pipelined evaluation?

Materialized evaluation creates and stores temporary results between operations while pipelined evaluation passes results directly from one operation to the next without storing intermediate results

How well did you know this?

Not at all

Perfectly

What are the two types of pipelining?

Demand-driven pipeline: operations wait for requests to produce tuples
Producer-driven pipeline: operations eagerly generate tuples without waiting for requests

How well did you know this?

Not at all

Perfectly

What is a blocking operation in query processing?

An operation that cannot output any results until all input tuples have been examined (e.g. sorting)

How well did you know this?

Not at all

Perfectly

What are the three main algorithms for computing joins?

Nested-Loop Join
Merge Join
Hash Join

How well did you know this?

Not at all

Perfectly

What is the main advantage of Block Nested-Loop Join over basic Nested-Loop Join?

In Block Nested-Loop Join each block of the inner relation is read only once for each block of the outer relation instead of once for each tuple

How well did you know this?

Not at all

Perfectly

When can Indexed Nested-Loop Join be used?

When computing equi-join or natural join and there exists an index on the join attribute

How well did you know this?

Not at all

Perfectly

What are the two main phases of external sort-merge?

Creation of sorted runs
Merging of runs

How well did you know this?

Not at all

Perfectly

What is a bitmap index scan in PostgreSQL?

An algorithm that first creates a bitmap marking blocks containing matching records then only fetches and scans those specific blocks avoiding reading unnecessary blocks

How well did you know this?

Not at all

Perfectly

What are the two main metrics used to estimate query costs in disk-based systems?

Number of blocks transferred from storage
Number of random I/O accesses

How well did you know this?

Not at all

Perfectly

How does a hybrid merge-join algorithm work?

Study These Flashcards

It merges a sorted relation with the leaf entries of a secondary B+-tree index producing a result file containing tuples from the sorted relation and addresses for tuples of the unsorted relation

What is the basic idea behind hash join?

Study These Flashcards

Partition tuples of both relations using a hash function on the join attributes so that matching tuples will be in the same partition and only tuples in corresponding partitions need to be compared

What are the steps in implementing duplicate elimination using sorting?

Study These Flashcards

Sort the relation
Scan the sorted result
Remove all but one copy of identical tuples that appear adjacent to each other

What is partial aggregation optimization?

Study These Flashcards

Combining tuples in the same group during run generation and intermediate merges by computing partial aggregate values. Only works for commutative and associative operations (count min max sum)

What are the three functions provided by an iterator in demand-driven pipelining?

Study These Flashcards

open()
next()
close()

What happens in the parsing and translation step of query processing?

Study These Flashcards

The system translates the query into its internal form and performs syntax checking and verification of relations

What is query optimization?

Study These Flashcards

The task of constructing a query-evaluation plan that minimizes the cost of query evaluation

What special consideration is needed for joins with spatial data?

Study These Flashcards

Spatial joins require special comparison operators for containment and overlap and usually use indexed nested loops join with spatial indices since no simple sort order exists

What is the purpose of double buffering in materialized evaluation?

It allows the algorithm to execute more quickly by performing CPU activity in parallel with I/O activity

In hash join what is the build input and probe input?

The relation s is the build input (used to build in-memory hash index) and r is the probe input

How is the hash join fudge factor typically calculated?

f is typically around 1.2 used to account for some skewness of the join values

What is the key advantage of demand-driven pipelining over producer-driven pipelining?

It is easier to implement though producer-driven pipelining can be more efficient on modern CPUs

What happens in the build-probe phase of hash join?

The algorithm loads si into memory builds an in-memory hash index then reads ri tuples and locates matching tuples using the index

What is an evaluation primitive?

A relational-algebra operation annotated with instructions specifying how to evaluate it

When is producer-driven pipelining particularly useful?

In parallel processing systems

What are the two main components of disk access cost?

1. Block transfer time (tT) 2. Block access/seek time (tS)

How can joins with conjunctive conditions be implemented?

Either using hash join with multiple hash functions or by computing one simple join first and then filtering results with remaining conditions

How can set difference be implemented efficiently?

By first sorting both relations then scanning through each of the sorted relations once to produce the result

What is the main advantage of implementing aggregation with partial aggregation?

It reduces the amount of data that needs to be stored and processed in subsequent steps

What determines the number of runs in external sort-merge?

The size of the relation (br) divided by the memory available (M): ⌈br/M⌉

What is the main limitation of implementing full outer join using nested-loop join?

It is difficult to extend the nested-loop join algorithm to compute full outer join

What happens if B+-tree internal nodes don't fit in memory?

A traversal of the index will require multiple I/O operations significantly increasing access cost

How can disjunctive join conditions be implemented?

By computing the union of the records from individual joins for each condition

What is the best case for nested-loop join cost?

When one relation fits entirely in memory requiring only br + bs block transfers and two seeks

How does sorted merge join work?

Once relations are sorted tuples with same join attribute values are consecutive allowing a single pass through both files

QueryProcessing Flashcards

(41 cards)