DBMS - Query Optimization and Query Processing Flashcards

1
Q

Query Language

A

Dedicated, High level language for accessing and updating the contents of a database

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Query Models

A

Closely associated to their underlying data model

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

SELECT Statement based on what model

A

Relational model

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Query Languages are either

A

Declarative or Procedural (RA)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

If a query is specified in text

A

DBMS transforms it to a number of graphs

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Why are declarative queries good

A
  • Design the meaning of a query rather than implementation details
  • Shift job of finding a better implementation to runtime
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Disadvantage of Declarative queries

A

DBMS needs to dedicate more resources to compile

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Declarative constructs must be converted to

A

Procedural constructs using RA

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Are there any declarative queries that cannot be converted to procedural?

A

SELECT and GROUP BY for example

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Canonical Translation from query to RA

A
  1. Run a sequence of products through all tables listed in the FROM clause
  2. Sift result with the WHERE clause conjuncts
  3. Project the output
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Can the canonical translation be proven correctly

A

Yes and easily

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Access Plan

A

Also referred to instead of the canonical translation and any RA expression in Query Processing

Access Plans are procedural and follow a data flow mechanism

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Query Optimizer’s Role

A

To manipulate access plans into better plans. Better is not neccesarily the best since QO is np hard

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What is the output of a query optimizer called

A

Query Execution Plan

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

DBMS components of QO and QP

A

Query Language interface, buffer manager (RAM buffer), Storage Manager, File Structures.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

QO Environment of Interest

A
  • One storage manager
  • If multicore, parallelism is provided by other layers
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

Compilation of a non procedural query have some added procedures, ex:

A
  • Typical scanning, parsing, type checking
  • QP and QO converts query into detailed representation (query graph) and choosing execution plan that safely and efficiently implements data access.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

QO Basic Plan

A

In Non-Trivial queries, there exists a number of possible solutions. For general queries it is an np problem. Heuristics and statistics may be used.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

When a solution requires extensive work

A

base a quick solution on heuristics to offer an approximate solution that effectively has limited optimality and completeness.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

Search space is

A

all the possible permutations of a query. Huge for most general queries.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

Algebraic select can be implemented with

A

serial scans, B Tree, hash, index…

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

How to decide which options to prune in search space

A

Use statistics

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

Why do we use heuristics and statistics

A

To evaluate potential of access plans to be selected as the optimal execution plan. QO must search and generate access plans, estimate their cost of parts and whole plan.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

No optimizer really optimizes

A

SQL QO can give good query execution plans, but not guaranteed to be the best.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
Q

Output of QO

A

passed to query processor to execute

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
26
Q

Normalization

A

Assume the input query is restricted.
- Does Lexical and syntax analysis, grammar and type checks
- If good query is converted to normal form (PRENEX)
- Query can be transformed to CNF or DNF

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
27
Q

CNF

A

Prefers AND

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
28
Q

DNF

A

Prefers ORs

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
29
Q

PRENEX

A

A query from where all quantifiers precede a quantifier free qualification.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
30
Q

Simplification

A

One can use different queries that will return the same result quicker for simplification. Use boolean rules.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
31
Q

Semantic Analysis

A

Refute incorrect queries (ex. contradictions)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
32
Q

An example of wrong semantics

A

Forgetting to use JOIN in a multi table query

33
Q

How to generate canonical relational algebra expression

A

Each algebra construct shown by a relational algebraic tree. Data flow directed from leaves to root

34
Q

RAT

A

A relational algebra tree describes a query where a leaf is a base relation, an internal node represents an intermediate relation obtained by applying a relational operation, and root represents result of query.

35
Q

RAT restructuring

A

Can manipulate RATs to get an easier structure to compute an efficient access plan for. Uses algebraic restructuring rules.
Although the expression is semantically identical, the cost may be different.

36
Q

Algebraic restructuring rules

A

Based on the algebraic properties of the operands; commutativity + associativity.

37
Q

Two common meta transformations on RAT restructuring

A

Pushing Down / Pushing Up the tree

38
Q

QO aided with costs

A

Choose processing strategy that aligns with given cost function.

39
Q

Cost function

A

Uses details of physical DB and statistics about database.
- Number of basic DB ops are directly supported
- A query execution strategy dictates access to base relations, choice of algorithms for basic ops and ordering of ops.

40
Q

Process Trees

A

Can help model execution strategies. Tree based modeling representations with base tables as leaves and basic operation holding the internal nodes.

PT helps us quantify cost of intermediate relations from basic ops and potentially exclude some.

41
Q

How many graphs are created for each query in a Processing Tree

A

2-3 graphs

42
Q

What is a variant of a process tree

A

JOIN processing tree.

43
Q

Selectivity

A

Defined as the ratio of number of records that satisfy the condition to total number of records in the file relation (between 0 and 1)

44
Q

Estimates of selectivity

A

Kept in DBMS catalog and used by optimizer

FOR PK: S1 - 1/[table tuples]
FOR Secondary Key: s1 = 1/i

45
Q

Data distribution if non-PK values

A
  • Uniform is ideal
  • If not, can have serious repercussions
  • In last decade, DBMS keep statistics through histogram of attribute data’s dist.
46
Q

Selectivity cond1 and cond2

A

s1(c1) * s1(c2)

47
Q

Cost Estimation

A

Traverse PT and sum each operation’s CPU and IO cost. Cost of intermediate results based on statistical info regarding the physical relations and formulas that predict the cardinality of the result of each operation type

48
Q

Basic Statistics in cost estimation

A
  • Domain cardinality (each attribute)
  • Num. of distinct values present for each attribute
  • The min and max values of each numerical attribute
  • Cardinality of each relation
49
Q

Selectivity factor

A

Proportion of tuples in a DB that satisfy a given condition. Difficult to predict. Most optimal plans are insensitive to the inaccuracy of the join selectiveness.

50
Q

System R approach

A

Exhaustive search approach performs static optimisation based on statistical info.

51
Q

Algorithm Factors of System R

A
  • Costs CPU, IO ops and existing paths
  • Restricts push selection down heuristic
  • Considers the “interesting ordering” of the result tuple
52
Q

System R; Choosing best candidate

A
  • Ignore PT with cartesian product
  • Select PT that have the cheapest join
  • Select and join at worst are lumped together
53
Q

Cost of Formula (system R)

A

IOs + Write*instruction

54
Q

Algorithm of System R

A
  • Cost effective
  • Complex
55
Q

JOINS

A

associative and commutative. Prefer left deep processing trees.

56
Q

Sorting is misused

A

QO might choose an ordered data placement processing tree knowing that eventual operations will benefit.

57
Q

Operators Implementation (Relational)

A

Not 1-1 but useful and more common combinations of RA operands.

58
Q

Do access method implementations exist for constructs with no algebraic equivalent

A

Yes ex. aggregate queries

59
Q

Costing Operators

A
  • File scans + Index Scans (Block Based)
    Complexity defined wrt. relational cardinality. Ignore indexing, clustering and merging for now. Basic Binary and Unary algebraic operands.
60
Q

Serial File Scan of Table

A

Read each file block in main memory and extract every tuple. Check each tuple against comparison condition.

61
Q

Binary search over sequential file

A

use the binary chop if searching attribute matches the binary tree ordering attribute.

62
Q

Primary Key Index search over serial file

A

Index to pinpoint a single record if searching attribute matches primary index ordering attribute. Sequential scan for all valid records.

63
Q

Cluster based retrieval of a number of records

A

If search attribute matches the clustering key, then retrieve all blocks with same value

64
Q

What do we do to select conjunctive queries

A

If an indexed approach exists for each conjunct then go for it and compare all conjuncts results. Otherwise we need to use a full table scan.

65
Q

Nested (Inner-Outer) Loop Join

A

For each tuple in 1st table, serially access each tuple in 2nd table and form joined tuple if they both match their common attribute values.

COST: T1 + (T1*T2) block IO ops

66
Q

Sort-Merge Join

A

First sort each table on their respective matching attributes, then merge the two sorted tables by matching the values from both tables.

COST: (2T1logT1)+(2T2logT2)+T1+T2

67
Q

Hash Join

A

First phase entails hashing the first and second tables on the joining attribute. When inserting second table keys, check for key matching with first table entries

68
Q

Semi-Join

A

Replaces the join of two relations by the join of their semi-joins, same complexity as selections

69
Q

Join with an Index

A

Indexes on the join attributes of the two relations.

70
Q

Projects

A
  • If duplication elimination not needed, then projection is straightforward and expanse is proportional to the cardinality of a relation
  • If duplication elim. is needed then weed out duplicates with sorting or hashing algorithms
71
Q

Set Operands

A

Like, Cartesian Product, Union, Set Diff, Intersection. Expensive to implement as extensive sorting/hashing is needed

72
Q

External Sorts

A

Heavily used in QP. Occurs in GROUP BY, ORDER BY, JOIN…

73
Q

Popular External Sort

A

Sort-Merge algorithm, depends on RAM buffers which hold an unpacked sort block

74
Q

Costs of external sorting a file of blocks with buffer space nB is

A
  • Sorting phase and sort internally - nR=roundup(b/nB)
  • Merging phases goes over the sorted runs and merged in a number of passes. dM is the number of sorted files that can be merged in one pass
  • Total costs (2b)+(2b*(logdM(nR))
75
Q

Aggregate

A

If an aggregate is needed on an attribute that is indexed then if index is ordered and dense we can pick end points for min+max.

76
Q

Group By

A

We can use both sorting and hashing

77
Q

Observations

A
  • First RDBMS had big QP efficiency problems
  • QP goal is to reduce time taken for main memory computation and IO ops for non-procedural queries.
  • Good, up to date DB stats help QO a lot
  • QO processes involving general queries with joins, unions and aggregate functions are much harder to tackle
78
Q

Troublesome queries, How to analyze statement’s execution

A
  • Are table’s indexes being used where they should be? If not then WHERE clause may be wrong
  • Are indexes being used when they should not be? Use FULL table scan hint.
  • What is cost of SQL Statement? See the value given in the position column of the first row of the explained SQL construct in the “explain plan”
  • Amount of overhead from SELECT or UPDATE.
  • Is statement using parallel architecture.