DBMS - Query Optimization and Query Processing Flashcards

Question

Output of QO

Answer 1

passed to query processor to execute

Answer 2

Assumes the input query is restricted. - Performs lexical and syntax analysis, plus grammar and type checks - If the query is valid, it is converted to a normal form (prenex) - The qualification can then be transformed to CNF or DNF

Answer 3

Prefers ANDs

Answer 4

Prefers ORs

Answer 5

A query form in which all quantifiers precede a quantifier-free qualification.

Answer 6

A query can be rewritten into an equivalent query that returns the same result faster. Simplify the qualification using Boolean rules.
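
As a small illustration of simplifying with Boolean rules, the sketch below checks by truth table that a rewritten qualification is equivalent to the original. The helper name `equivalent` and the example predicates are hypothetical; the rule shown is absorption, p AND (p OR q) = p.

```python
from itertools import product

def equivalent(f, g, n_vars):
    """Return True if predicates f and g agree on every truth assignment."""
    return all(
        f(*vals) == g(*vals)
        for vals in product([False, True], repeat=n_vars)
    )

# Absorption rule: p AND (p OR q) simplifies to p.
original = lambda p, q: p and (p or q)
simplified = lambda p, q: p

print(equivalent(original, simplified, 2))
```

A real optimizer applies such rules symbolically; the exhaustive check here is only practical for a handful of variables.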

Answer 7

Reject incorrect queries (e.g. contradictions)

Answer 8

Forgetting to use JOIN in a multi-table query

Answer 9

Each algebra expression is represented by a relational algebra tree. Data flows from the leaves to the root.

Answer 10

A relational algebra tree describes a query: a leaf is a base relation, an internal node represents an intermediate relation obtained by applying a relational operation, and the root represents the result of the query.
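
The tree structure described on this card can be sketched as follows. This is a minimal illustration, not a full algebra: the class names `Relation`, `Select`, and `Project` and the sample data are assumptions, and the projection does not eliminate duplicates.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Relation:
    """Leaf: a base relation held as a list of row dicts."""
    rows: list

@dataclass
class Select:
    """Internal node: keep the child's rows that satisfy the predicate."""
    child: object
    predicate: Callable[[dict], bool]

@dataclass
class Project:
    """Internal node: keep only the listed attributes of each row."""
    child: object
    attrs: list

def evaluate(node):
    """Evaluate bottom-up, so data flows from the leaves to the root."""
    if isinstance(node, Relation):
        return node.rows
    if isinstance(node, Select):
        return [r for r in evaluate(node.child) if node.predicate(r)]
    if isinstance(node, Project):
        return [{a: r[a] for a in node.attrs} for r in evaluate(node.child)]
    raise TypeError(f"unknown node type: {type(node).__name__}")

emp = Relation([
    {"name": "Ann", "dept": "IT"},
    {"name": "Bob", "dept": "HR"},
])
# Root = projection, internal node = selection, leaf = base relation.
tree = Project(Select(emp, lambda r: r["dept"] == "IT"), ["name"])
print(evaluate(tree))
```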

Answer 11

Relational algebra trees (RATs) can be restructured into a form for which an efficient access plan is easier to compute. Uses algebraic restructuring rules. Although the expressions are semantically identical, their costs may differ.

Answer 12

Based on the algebraic properties of the operators: commutativity and associativity.

Answer 13

Pushing Down / Pushing Up the tree

Answer 14

Choose the processing strategy that minimizes the given cost function.

Answer 15

Uses details of the physical database and statistics about it. - A number of basic database operations are directly supported - A query execution strategy dictates the access to base relations, the choice of algorithms for the basic operations, and the ordering of the operations.

Answer 16

Processing trees (PTs) help model execution strategies. They are tree-based representations with base tables as leaves and basic operations as internal nodes. A PT lets us quantify the cost of the intermediate relations produced by the basic operations and potentially exclude some strategies.

Answer 17

2-3 graphs

Answer 18

JOIN processing tree.

Answer 19

Defined as the ratio of the number of records that satisfy the condition to the total number of records in the file (relation); a value between 0 and 1.
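
The ratio on this card can be computed directly on a small relation. The sketch below is illustrative only; the function name `selectivity` and the sample records are assumptions, and `Fraction` is used so the ratio stays exact.

```python
from fractions import Fraction

def selectivity(records, condition):
    """Ratio of records satisfying the condition to all records (0 to 1)."""
    if not records:
        raise ValueError("selectivity is undefined for an empty relation")
    matching = sum(1 for r in records if condition(r))
    return Fraction(matching, len(records))

employees = [
    {"id": 1, "dept": "IT"},
    {"id": 2, "dept": "HR"},
    {"id": 3, "dept": "IT"},
    {"id": 4, "dept": "Ops"},
]

print(selectivity(employees, lambda r: r["dept"] == "IT"))  # 2 of 4 records
print(selectivity(employees, lambda r: r["id"] == 1))       # 1 of 4 records
```

An optimizer normally estimates this value from catalog statistics instead of scanning the data, since scanning would defeat the purpose of the estimate.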

Answer 20

Kept in the DBMS catalog and used by the optimizer. For a primary key: s1 = 1/[table tuples]. For a secondary key: s1 = 1/i, where i is the number of distinct values of the attribute.

Answer 21

- A uniform distribution is ideal - If the data is not uniform, the estimates can have serious repercussions - In the last decade, DBMSs have kept statistics as histograms of each attribute's data distribution.

Answer 22

s1(c1) * s1(c2)
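
Assuming this card refers to a conjunctive condition (c1 AND c2), the product is only an estimate that relies on the two conditions being independent. The sketch below compares the estimate with the measured selectivity on two tiny, made-up relations; all names and data are hypothetical.

```python
from fractions import Fraction
from itertools import product

def sel(rows, cond):
    """Measured selectivity of cond over rows, as an exact fraction."""
    return Fraction(sum(1 for r in rows if cond(r)), len(rows))

c1 = lambda r: r["a"] == 0
c2 = lambda r: r["b"] == 0
both = lambda r: c1(r) and c2(r)

# Independent attributes: every combination of a and b occurs once.
independent = [{"a": a, "b": b} for a, b in product([0, 1], [0, 1, 2])]
# Correlated attributes: b always equals a.
correlated = [{"a": 0, "b": 0}, {"a": 1, "b": 1}]

for rows in (independent, correlated):
    estimate = sel(rows, c1) * sel(rows, c2)
    actual = sel(rows, both)
    print(estimate, actual)
```

On the independent data the estimate matches the measured value; on the correlated data it underestimates, which is one reason optimizers keep richer statistics than single-attribute selectivities.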

Answer 23

Traverse the PT and sum each operation's CPU and I/O cost. The cost of intermediate results is based on statistical information about the physical relations and on formulas that predict the cardinality of the result of each operation type.
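
A rough sketch of such a traversal is shown below. The cost model is an assumption made for illustration, not a real DBMS formula: a selection costs one unit per input tuple, a join costs one unit per pair of input tuples, and result cardinalities are predicted from selectivities. The class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Base:
    cardinality: int          # taken from catalog statistics

@dataclass
class Select:
    child: object
    selectivity: float

@dataclass
class Join:
    left: object
    right: object
    join_selectivity: float

def cardinality(node):
    """Predict the number of tuples produced by a node."""
    if isinstance(node, Base):
        return node.cardinality
    if isinstance(node, Select):
        return node.selectivity * cardinality(node.child)
    if isinstance(node, Join):
        return (node.join_selectivity
                * cardinality(node.left) * cardinality(node.right))
    raise TypeError(f"unknown node type: {type(node).__name__}")

def cost(node):
    """Sum the assumed cost of every operation in the tree."""
    if isinstance(node, Base):
        return 0.0
    if isinstance(node, Select):
        return cost(node.child) + cardinality(node.child)
    if isinstance(node, Join):
        return (cost(node.left) + cost(node.right)
                + cardinality(node.left) * cardinality(node.right))
    raise TypeError(f"unknown node type: {type(node).__name__}")

r, s = Base(1000), Base(200)
join_then_select = Select(Join(r, s, 0.01), 0.1)
select_then_join = Join(Select(r, 0.1), s, 0.01)
print(cost(join_then_select), cost(select_then_join))
```

Under these assumptions the plan that applies the selection first is much cheaper, because the join then works on a smaller intermediate relation.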

Answer 24

- Domain cardinality (each attribute) - Num. of distinct values present for each attribute - The min and max values of each numerical attribute - Cardinality of each relation

Answer 25

Proportion of tuples in a DB that satisfy a given condition. Difficult to predict. Most optimal plans are insensitive to inaccuracy in the join selectivities.

Answer 26

Exhaustive search approach performs static optimisation based on statistical info.

Answer 27

- Costs CPU, I/O operations, and existing access paths - Restricts the push-selection-down heuristic - Considers the "interesting ordering" of the result tuples

Answer 28

- Ignore PTs with Cartesian products - Select the PTs that have the cheapest joins - Selections and joins are, at worst, lumped together

Answer 29

IOs + Write*instruction

Answer 30

- Cost effective - Complex

Answer 31

Associative and commutative. Prefers left-deep processing trees.

Answer 32

The QO might choose a processing tree that produces ordered data, knowing that later operations will benefit.

Answer 33

Not 1-1, but the useful and more common combinations of RA operators.

Answer 34

Yes, e.g. aggregate queries

Answer 35

- File scans and index scans (block based). Complexity is defined with respect to relation cardinality. Ignore indexing, clustering, and merging for now. Covers the basic binary and unary algebraic operators.

Answer 36

Read each file block into main memory and extract every tuple. Check each tuple against the comparison condition.

Answer 37

Use binary search (binary chop) if the search attribute matches the attribute on which the file is ordered.
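
A minimal sketch of a binary-search selection, assuming the file is modelled as an in-memory list of keys already sorted on the search attribute. The function name `binary_search_select` is hypothetical; `bisect_left` and `bisect_right` are from Python's standard `bisect` module.

```python
from bisect import bisect_left, bisect_right

def binary_search_select(sorted_keys, value):
    """Return the half-open index range [lo, hi) of entries equal to value.

    Only valid when sorted_keys is ordered on the search attribute.
    """
    lo = bisect_left(sorted_keys, value)
    hi = bisect_right(sorted_keys, value)
    return lo, hi

keys = [3, 7, 7, 9, 12, 15]
lo, hi = binary_search_select(keys, 7)
print(keys[lo:hi])
```

An empty range (lo equal to hi) means no record matches. In a real file the search would be over blocks on disk, so the saving is in block reads, not list accesses.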

Answer 38

Use the index to pinpoint a single record if the search attribute matches the primary index's ordering attribute, then scan sequentially for all qualifying records.

Answer 39

If the search attribute matches the clustering key, retrieve all blocks holding records with the same value.

Answer 40

If an index-based access path exists for each conjunct, use it and intersect the results of all conjuncts. Otherwise, a full table scan is needed.
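
One way to picture this card is the sketch below, which assumes equality conjuncts only and models each index as a dict from attribute value to a set of record ids. The helper names `build_index` and `conjunctive_select` are hypothetical.

```python
def build_index(rows, attr):
    """Map each value of attr to the set of record ids holding it."""
    index = {}
    for rid, row in enumerate(rows):
        index.setdefault(row[attr], set()).add(rid)
    return index

def conjunctive_select(rows, indexes, conditions):
    """Evaluate attr == value conjuncts given as a dict of attr -> value."""
    if conditions and all(attr in indexes for attr in conditions):
        # Every conjunct has an index: intersect the record-id sets.
        rid_sets = [indexes[attr].get(value, set())
                    for attr, value in conditions.items()]
        rids = set.intersection(*rid_sets)
        return [rows[rid] for rid in sorted(rids)]
    # Otherwise fall back to a full table scan.
    return [row for row in rows
            if all(row[a] == v for a, v in conditions.items())]

rows = [
    {"dept": "IT", "city": "Oslo"},
    {"dept": "IT", "city": "Rome"},
    {"dept": "HR", "city": "Oslo"},
]
indexes = {a: build_index(rows, a) for a in ("dept", "city")}
print(conjunctive_select(rows, indexes, {"dept": "IT", "city": "Oslo"}))
```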

Answer 41

For each tuple in the first table, serially access each tuple in the second table and form a joined tuple if the two match on their common attribute values. COST: T1 + (T1*T2) block I/O ops
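
A minimal in-memory sketch of the nested loop join, together with the cost formula from this card. The function names and sample tables are hypothetical, and T1 and T2 are taken to be the block counts of the two tables.

```python
def nested_loop_join(t1, t2, attr):
    """Join two lists of row dicts on a common attribute."""
    result = []
    for r1 in t1:                  # outer loop over the first table
        for r2 in t2:              # inner loop rescans the second table
            if r1[attr] == r2[attr]:
                result.append({**r1, **r2})
    return result

def nested_loop_cost(t1_blocks, t2_blocks):
    """Block I/O cost from the card: T1 + (T1 * T2)."""
    return t1_blocks + t1_blocks * t2_blocks

emp = [{"name": "Ann", "dept_id": 1}, {"name": "Bob", "dept_id": 2}]
dept = [{"dept_id": 1, "dept": "IT"}, {"dept_id": 3, "dept": "Ops"}]
print(nested_loop_join(emp, dept, "dept_id"))
print(nested_loop_cost(10, 5))
```

Because the first operand appears in both terms of the cost, using the smaller table as the outer one keeps the total lower.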

Answer 42

First sort each table on its matching attribute, then merge the two sorted tables by matching the values from both. COST: (2*T1*logT1)+(2*T2*logT2)+T1+T2
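
The two steps on this card can be sketched in memory as follows. This is an illustrative version that also handles duplicate join values on both sides; the function name and sample data are assumptions, and Python's built-in sort stands in for the external sort a DBMS would use.

```python
def sort_merge_join(t1, t2, attr):
    """Sort both inputs on attr, then merge them in a single pass."""
    left = sorted(t1, key=lambda r: r[attr])
    right = sorted(t2, key=lambda r: r[attr])
    result, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][attr], right[j][attr]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the end of the group of right rows with this key.
            j_end = j
            while j_end < len(right) and right[j_end][attr] == lk:
                j_end += 1
            # Pair every left row with this key against that group.
            while i < len(left) and left[i][attr] == lk:
                for r in right[j:j_end]:
                    result.append({**left[i], **r})
                i += 1
            j = j_end
    return result

t1 = [{"k": 2, "a": "x"}, {"k": 1, "a": "y"}, {"k": 2, "a": "z"}]
t2 = [{"k": 2, "b": "p"}, {"k": 3, "b": "q"}, {"k": 2, "b": "r"}]
for row in sort_merge_join(t1, t2, "k"):
    print(row)
```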

Answer 43

The first phase hashes the first and second tables on the joining attribute. When inserting the second table's keys, check for key matches with the first table's entries.
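
A simplified in-memory sketch of a hash join is shown below. It assumes the first table fits in memory, so it builds one hash table on the first input and probes it with the second, which is a common simplification and not necessarily the exact variant this card describes. Names and data are hypothetical.

```python
from collections import defaultdict

def hash_join(t1, t2, attr):
    """Build a hash table on t1, then probe it with each row of t2."""
    buckets = defaultdict(list)
    for r1 in t1:                              # build phase
        buckets[r1[attr]].append(r1)
    result = []
    for r2 in t2:                              # probe phase
        for r1 in buckets.get(r2[attr], []):
            result.append({**r1, **r2})
    return result

emp = [{"name": "Ann", "dept_id": 1}, {"name": "Bob", "dept_id": 2}]
dept = [{"dept_id": 2, "dept": "HR"}, {"dept_id": 3, "dept": "Ops"}]
print(hash_join(emp, dept, "dept_id"))
```

Each input is read once, which is why hashing on the join attribute can beat the nested loop approach for equality joins.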

Answer 44

Replaces the join of two relations by the join of their semi-joins, same complexity as selections

Answer 45

Indexes on the join attributes of the two relations.

Answer 46

- If duplicate elimination is not needed, projection is straightforward and its expense is proportional to the cardinality of the relation - If duplicate elimination is needed, weed out duplicates with sorting or hashing algorithms
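
Both cases on this card can be sketched in a few lines. The function name `project` and the sample rows are hypothetical; the duplicate elimination shown is the hashing approach, using a Python set.

```python
def project(rows, attrs, distinct=False):
    """Project rows onto attrs, optionally eliminating duplicates by hashing."""
    projected = [tuple(r[a] for a in attrs) for r in rows]  # one pass
    if not distinct:
        return projected
    seen, result = set(), []
    for t in projected:
        if t not in seen:          # hash lookup detects a duplicate
            seen.add(t)
            result.append(t)
    return result

rows = [
    {"name": "Ann", "dept": "IT"},
    {"name": "Bob", "dept": "IT"},
    {"name": "Cy", "dept": "HR"},
]
print(project(rows, ["dept"]))
print(project(rows, ["dept"], distinct=True))
```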

Answer 47

E.g. Cartesian product, union, set difference, intersection. Expensive to implement because extensive sorting or hashing is needed.

Answer 48

Heavily used in QP. Occurs in GROUP BY, ORDER BY, JOIN...

Answer 49

Sort-Merge algorithm, depends on RAM buffers which hold an unpacked sort block

Answer 50

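- Sorting phase: portions of the file are sorted internally and written out as sorted runs; the number of runs is nR = roundup(b/nB), where b is the number of file blocks and nB the number of buffer blocks - Merging phase: the sorted runs are merged in a number of passes; dM is the number of sorted files that can be merged in one pass - Total cost: (2*b) + (2*b*log_dM(nR))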
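
The cost formula on this card can be worked through numerically. The sketch below makes two assumptions that the card does not state: the degree of merging is taken as dM = min(nB - 1, nR), as in common textbook presentations, and the logarithm is rounded up to a whole number of merge passes. The function name is hypothetical.

```python
def external_sort_cost(b, nB):
    """Return (nR, dM, passes, total block I/Os) for an external sort.

    b  = number of file blocks
    nB = number of available buffer blocks (at least 3)
    """
    if nB < 3:
        raise ValueError("need at least 3 buffer blocks to merge")
    nR = (b + nB - 1) // nB            # roundup(b / nB) initial runs
    dM = min(nB - 1, nR)               # assumed degree of merging
    passes, runs = 0, nR
    while runs > 1:                    # equals roundup(log_dM(nR))
        runs = (runs + dM - 1) // dM
        passes += 1
    total = 2 * b + 2 * b * passes     # sort phase + merge phases
    return nR, dM, passes, total

print(external_sort_cost(1024, 5))
```

With 1024 blocks and 5 buffers this gives 205 initial runs merged 4 at a time, so the run count shrinks 205, 52, 13, 4, 1 over four passes.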

Answer 51

If an aggregate is needed on an indexed attribute and the index is ordered and dense, the end points of the index give MIN and MAX directly.

Answer 52

We can use both sorting and hashing

Answer 53

- The first RDBMSs had big QP efficiency problems - The goal of QP is to reduce the time taken by main memory computation and I/O operations for non-procedural queries - Good, up-to-date DB statistics help the QO a lot - QO for general queries with joins, unions, and aggregate functions is much harder to tackle

Answer 54

- Are the table's indexes being used where they should be? If not, the WHERE clause may be wrong - Are indexes being used when they should not be? If so, use a FULL table scan hint - What is the cost of the SQL statement? See the value in the position column of the first row of the "explain plan" output for the statement - How much overhead comes from the SELECT or UPDATE? - Is the statement using a parallel architecture?

(78 cards)