Query Processing (2019) Flashcards

1
Q

Query Processing:

Basic Steps

A
  1. Query
    • User inputs a query
  2. Parser and Translator
    • Translates query from human readable form to something DBMS understands, such as Relational Algebra Expression
  3. Optimizer
    • Finds optimized equivalent to RA expression
    • Creates Execution Plan
  4. Evaluation Engine
    • Runs the Execution Plan to perform the query
  5. Query Output
    • Results returned to user
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Query Processing:

Diagram of the Basic Steps

A

*See Diagram

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Query Processing Steps:

Query

A

User inputs a query in some

human readable language,

such as SQL.

This query is passed to the Parser and Translator

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Query Processing Steps:

Parser and Translator

A
  • Recieves Query in some human readable form, such as SQL.
  • Performs Syntax Checks
  • Translates to a language that the DBMS understands internally(RA)
  • Generates Relational Algebra Expression
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Query Processing Steps:

Optimizer

A
  • Recieves Relational Algebra Expression from Parser/Translator
  • Finds an equivalent statement that is more optimal
    • Same results, with less time and resources used
  • From this optimal statement, generates an Execution Plan to be run by the Evaluation Engine
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Query Processing Steps:

Evaluation Engine

A
  • Receives an Execution Plan from the Optimizer
  • Performs it on the Database, obtaining results to be output to the user
  • Returns Query Output to the User
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Cost of a Query

A
  • Typical Measure is Total Elapsed Time
  • Factors:
    • Disk Access
    • CPU
    • Network Communication
  • Disk access is usually the “bottleneck”
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Disk Access:

How Relations are Stored on Disk

A
  • DBMS stores its data in Files
  • Files are broken into Blocks
  • The number of tuples per block varies
  • B(R) is the number of blocks used for relation R
  • T(R) is the number of tuples used in relation R
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Three Factors of

Disk Access cost

A
  • Number of Seeks
  • Number of Block Reads
  • Number of Block Writes
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Disk Access:

Why are Block Writes

more expensive than

Block Reads?

A

A block must be read again after writing,

so the actual time is:

(Time to Write) + (Time to Read)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Seek Time:

Basic Idea

A

The time spent

spinning the disk

to get to the specified data area

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Seek Time:

Two types of Indices on Disk

A
  • Clustered Index
    • Data is stored in sorted order
    • Data is very close together on disk
    • Less Seek Time -> Faster
  • Unclustered Index
    • Data is stored farther apart
    • More Seek Time -> Slower
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Types of

External Memory Joins

(Extended) (9)

A
  • Nested Join
  • Nested Loop Join
  • Block-Nest Loop Join
  • Index Nested Loop Join
  • Improved Nest Loop Join
  • Reservoir Sampling
  • Reservoir Join Sampling
  • Hash Join
  • Partitioned Hash Join
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Three Types of

Join Algorithms (External Memory)

A

Nested Loop Joins

Hash Joins

Sort-Merge Joins

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

External Memory Join Algorithms:

Nested Loop Joins

General Cost

A

Nested Loop Joins

cost = O( |R| |S| )

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

External Memory Join Algorithms:

Hash Joins

General Cost

A

Hash Joins

Cost = O( |S| )

if |R| < |S|

*I believe R = read cost, S=search Cost

17
Q

External Memory Join Algorithms:

Sort-Merge Joins

General Cost

A

Sort-Merge Joins

Cost = O( |S| * log(|s|) )

if |R| < |S|

*I believe R = read cost, S=search Cost

18
Q

Query Costs:

External Memory Joins

A
  • Happens when at least one relation does not fit into memory
  • I/O Cost is the dominating factor
  • Call the memory limit M
  • Cost is depended on the Memory Join Algorithm used
  • Internal Memory Joins are Simple:
    • B(s) + B(s)
19
Q

Nested Loop Join:

Overview

A

For each block of a relation R, and for each tuple r in block:

For each block of a relation S, and for each tuple s in block:

Output rs if the join condition evaluates to true

  • R is called the Outer Table
  • S is called the Inner Table

Cost = B(R) +T(R)*B(S)

Memory: 4 Blocks, if double buffering

20
Q

Block-Nest Loop Join:

Overview

A

For each block of R and for each block of S:

For each tuple r of R and s of S:

Output rs if join condition evaluates to true.

R is the outer table, S is the inner table

Cost:

B(R) + B(R)*B(S)

Memory:

4 Blocks (If double buffering)

21
Q

Improved Nest Loop Join

A

Outer table R and inner table S

  • Use available memory, read M-2 blocks from table R
  • Read blocks of S sequentially, join with R in Main Memory

Cost:

B(R) + B(R)*B(S) / (M-2)

Memory:

M Blocks

22
Q

Nested Join

A

Joining Outer table R and inner table S

(Used if B(R) + B(S) < M) (fits in memory)

  • Scan R, sort in Main Memory
  • Scan S, sort in Main Memory
  • Merge R and S

Cost: B(R) + B(S)

23
Q

Index Nested Join

A

R has an index over the Join Attribute

  • Scan S:
    • For each tuple of S, find matching tuples of R

V(R,a) = number of distinct values of a in R

Cost:

  • If data clustered
    • Cost = B(S) + T(S)*B(R) / V(R,a)
  • If data unclustered
    • Cost = B(S) + T(S)*T(R) / V(R,a)
24
Q

Selection:

Two Types of Scans

+ Their Costs

A

Scanning Without Index:

Cost = B(R) + Seek Time

Scanning With Index:

If there is an index over the attribute, use a function F to determine the # of I/O operations

Data Clustered:

Cost = F(R,a)*B(R)

Data Unclustered:

Cost = F(R,a)*T(R)

25
Q

Selection on Indexed Attribute:

I/O Cost Functions for possible

selection conditions

A

Equals : σa=v (R)

F(R,a) = 1 / V(R,a)

Less Than: σa<v> (R)</v>

F(R,a) = (v - min(R,a) ) / (max(R,a) - min(R,a) )

Range: σx<a> (R)</a>

F(R,a) = (v - x ) / (max(R,a) - min(R,a) )

26
Q

Assumption about I/O Cost Function F

for most DBMS Applications

A

Most DBMSs use a B+ Tree,

so for searching on a Clustered Index,

we can use the Height of the B+ tree.

Cost = H + B(R)

27
Q

Sampling:

Overview

A
  • Can be used to save time on large Databases
  • Sample from tables, join based on the sample instead
  • Less accurate, but much faster
  • Two Types:
    • Reservoir Sampling
    • Reservoir Join Sampling
28
Q

Sampling:

Reservoir Sampling

A

W <- 0 start with 0 weight

Initialize reservoir array A[1..r] with dummy values.

While tuple streams in:

  • Get next tuple t with weight w(t)
  • W <- W + w(t) //increase the weight
  • for j=1 to r
    • set A[j] to t with probability w(t)/W
29
Q

Sampling:

Reservoir Join Sampling

A

W <- 0 start with 0 weight

Initialize reservoir array A[1..r] with dummy values.

For each Candidate Network(CN) and every tuple t in CN:

If A has dummy values,

Insert k copies of t into A

Else

W <- W + w(t)

for i = 1 to k:

Insert t into A[i] with probability w(t)/W

Terminate if copy of t is inserted

30
Q

Hash Join

Overview

A
  • Scan R, build buckets in Main Memory
  • Scan S, then Join

Notes:

  • Build the hash table on whichever table is smaller: min( B(R), B(S) )
  • Only works if min( B(R), B(S) ) <= M-2
    • It has to fit inside of main memory

Cost = B(R) + B(S)

31
Q

Partitioned Hash Join

A

Used when a relation cannot fit into memory.

  • Using the first Hash Function, H1 :
    • Hash R into M-1 buckets, store on disk
    • Hash S into M-1 buckets, store on disk
  • Read one partition(bucket) of R into Main Memory
  • Join like normal hash join, using a second, separate hash function, H2

Cost:

3B(R) + 3B(S)

Memory:

Smaller bucket must fit in memory

Bucket size = B(R) / (M-1)