QueryProcessing Flashcards
What is query processing?
The range of activities involved in extracting data from a database, including translation of high-level database language queries into expressions usable at the physical file system level
What are the three main steps in query processing?
- Parsing and translation
- Optimization
- Evaluation
What is a query-execution plan?
A sequence of primitive operations that can be used to evaluate a query. Each operation is annotated with instructions specifying how to evaluate it
What is typically the dominant cost factor in query evaluation for large databases?
I/O cost (disk access) dominates other costs like CPU time and communication
How is the cost of an operation that transfers b blocks and performs S random I/O accesses calculated?
b * tT + S * tS seconds where tT is time to transfer a block and tS is block-access time
Why are block writes typically more expensive than block reads on magnetic disks?
Block writes are about twice as expensive because disk systems read sectors back after writing to verify the write was successful
What percentage of B+-tree nodes are typically non-leaf nodes?
About 1% of nodes are non-leaf nodes
What is the difference between materialized evaluation and pipelined evaluation?
Materialized evaluation creates and stores temporary results between operations while pipelined evaluation passes results directly from one operation to the next without storing intermediate results
What are the two types of pipelining?
- Demand-driven pipeline: operations wait for requests to produce tuples
- Producer-driven pipeline: operations eagerly generate tuples without waiting for requests
What is a blocking operation in query processing?
An operation that cannot output any results until all input tuples have been examined (e.g. sorting)
What are the three main algorithms for computing joins?
- Nested-Loop Join
- Merge Join
- Hash Join
What is the main advantage of Block Nested-Loop Join over basic Nested-Loop Join?
In Block Nested-Loop Join each block of the inner relation is read only once for each block of the outer relation instead of once for each tuple
When can Indexed Nested-Loop Join be used?
When computing equi-join or natural join and there exists an index on the join attribute
What are the two main phases of external sort-merge?
- Creation of sorted runs
- Merging of runs
What is a bitmap index scan in PostgreSQL?
An algorithm that first creates a bitmap marking blocks containing matching records then only fetches and scans those specific blocks avoiding reading unnecessary blocks
What are the two main metrics used to estimate query costs in disk-based systems?
- Number of blocks transferred from storage
- Number of random I/O accesses