Query Processing (2019) Flashcards
Query Processing:
Basic Steps
-
Query
- User inputs a query
-
Parser and Translator
- Translates query from human readable form to something DBMS understands, such as Relational Algebra Expression
-
Optimizer
- Finds optimized equivalent to RA expression
- Creates Execution Plan
-
Evaluation Engine
- Runs the Execution Plan to perform the query
-
Query Output
- Results returned to user
Query Processing:
Diagram of the Basic Steps
*See Diagram
Query Processing Steps:
Query
User inputs a query in some
human readable language,
such as SQL.
This query is passed to the Parser and Translator
Query Processing Steps:
Parser and Translator
- Recieves Query in some human readable form, such as SQL.
- Performs Syntax Checks
- Translates to a language that the DBMS understands internally(RA)
- Generates Relational Algebra Expression
Query Processing Steps:
Optimizer
- Recieves Relational Algebra Expression from Parser/Translator
- Finds an equivalent statement that is more optimal
- Same results, with less time and resources used
- From this optimal statement, generates an Execution Plan to be run by the Evaluation Engine
Query Processing Steps:
Evaluation Engine
- Receives an Execution Plan from the Optimizer
- Performs it on the Database, obtaining results to be output to the user
- Returns Query Output to the User
Cost of a Query
- Typical Measure is Total Elapsed Time
- Factors:
- Disk Access
- CPU
- Network Communication
- Disk access is usually the “bottleneck”
Disk Access:
How Relations are Stored on Disk
- DBMS stores its data in Files
- Files are broken into Blocks
- The number of tuples per block varies
- B(R) is the number of blocks used for relation R
- T(R) is the number of tuples used in relation R
Three Factors of
Disk Access cost
- Number of Seeks
- Number of Block Reads
- Number of Block Writes
Disk Access:
Why are Block Writes
more expensive than
Block Reads?
A block must be read again after writing,
so the actual time is:
(Time to Write) + (Time to Read)
Seek Time:
Basic Idea
The time spent
spinning the disk
to get to the specified data area
Seek Time:
Two types of Indices on Disk
- Clustered Index
- Data is stored in sorted order
- Data is very close together on disk
- Less Seek Time -> Faster
- Unclustered Index
- Data is stored farther apart
- More Seek Time -> Slower
Types of
External Memory Joins
(Extended) (9)
- Nested Join
- Nested Loop Join
- Block-Nest Loop Join
- Index Nested Loop Join
- Improved Nest Loop Join
- Reservoir Sampling
- Reservoir Join Sampling
- Hash Join
- Partitioned Hash Join
Three Types of
Join Algorithms (External Memory)
Nested Loop Joins
Hash Joins
Sort-Merge Joins
External Memory Join Algorithms:
Nested Loop Joins
General Cost
Nested Loop Joins
cost = O( |R| |S| )