Query processing Flashcards

Question

How can we make querying even faster if we're using an index and the condition specifies two different attributes?

Answer 1

- Instead of using an index for only one of the attributes and then going through all of the rows it returns, we can search two indexes, one for each of the two specified attributes, and then take the intersection of the rows selected from both (assuming the two conditions are combined with AND).
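
A minimal sketch of the intersection idea, assuming two hypothetical in-memory indexes (plain Python dicts mapping an attribute value to a set of row ids). The table contents and all names here are invented for illustration.

```python
# Hypothetical rows of a students table, keyed by row id.
rows = {
    1: {"programme": "G401", "year": 1},
    2: {"programme": "G401", "year": 2},
    3: {"programme": "G500", "year": 1},
}


def build_index(rows, attribute):
    """Map each value of `attribute` to the set of row ids holding it."""
    index = {}
    for row_id, row in rows.items():
        index.setdefault(row[attribute], set()).add(row_id)
    return index


programme_index = build_index(rows, "programme")
year_index = build_index(rows, "year")


def select_and(programme, year):
    """Row ids matching programme = ... AND year = ..., using both indexes."""
    by_programme = programme_index.get(programme, set())
    by_year = year_index.get(year, set())
    return by_programme & by_year  # intersection of the two candidate sets
```

Only the row ids that appear in both candidate sets need to be fetched, so no rows are scanned just to check the second condition.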

Answer 2

Indexes that are defined over multiple attributes (multi-attribute indexes)

Answer 3

- B+ trees: CREATE INDEX ON students USING btree (programme, year);
- Hash tables: CREATE INDEX ON students USING hash (lower(name));

Answer 4

- When the selection we are after specifies a range (this is one of the most widely used kinds of index) - for instance a condition such as programme = G401 and year < 1

Answer 5

- B+ trees are a specific kind of multilevel index, which means there are indexes over indexes.
- In a multilevel index you can have any number of these levels and just branch out more and more.
- Each index at level i points to the index at level i+1, and the final level points to the actual records stored on the hard disk.

Answer 6

- A leaf is formed of two rows.
- The top row holds values and the bottom row holds the corresponding pointers, which point to the tuples with that value.
- There is also a next-leaf pointer.
- As in the multilevel index, the values are sorted in increasing order.

Answer 7

n is chosen so that a node fits into a single disk block

Answer 8

For every value there must be a pointer, and then we must add on the next-leaf pointer (which doesn't have a value) - so n can be 42: the 42 value/pointer pairs take 42 * (8+4) = 504 bytes, and the next-leaf pointer is added on top of that
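
The calculation can be sketched as a small function. The block size, value size and pointer size in the example call (512, 4 and 8 bytes) are assumptions made for illustration; the card itself does not state them.

```python
def largest_n(block_size, value_size, pointer_size):
    """Largest n such that n values and n + 1 pointers fit in one block.

    Solves n * value_size + (n + 1) * pointer_size <= block_size for n.
    """
    return (block_size - pointer_size) // (value_size + pointer_size)


# Assumed sizes: 512-byte block, 4-byte values, 8-byte pointers.
n = largest_n(512, 4, 8)
```

With those assumed sizes the function gives n = 42, since 42 * (4 + 8) + 8 = 512.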

Answer 9

- Same as leaves in that they have two rows - however, the pointers point to other nodes (one level further down) instead of tuples, so pointers lead to further pointers until they eventually reach the leaves

Answer 10

From left to right - not every field has to be filled

Answer 11

A node must have at least (n+1)/2 pointers (rounded down) in use (we always count the next-leaf pointer, even if there are no other nodes)
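
As a quick check of the rule, a one-line sketch (the function name is made up for illustration):

```python
def min_leaf_pointers(n):
    """Minimum number of pointers in use: (n + 1) / 2, rounded down."""
    return (n + 1) // 2


# A few sample values of n and the resulting minimum.
minimums = {n: min_leaf_pointers(n) for n in (3, 4, 42)}
```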

Answer 12

At least two

Answer 13

(n+1)/2 = (3+1)/2 = 2

Answer 14

max-1 - so 5 minus 1 = 4

Answer 15

We just follow the values down until we reach the lowest level (which has pointers to tuples) - once we find the value on the lowest level we can output the resulting tuple/tuples
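
A minimal lookup sketch, assuming a simplified in-memory tree where each inner node stores separator values and child nodes, and each separator is the smallest value in the subtree to its right. The `Leaf` and `Inner` classes and the tiny example tree are hypothetical, not taken from the course material.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, values, tuples, next_leaf=None):
        self.values = values        # sorted values
        self.tuples = tuples        # tuples[i] holds the tuples with values[i]
        self.next_leaf = next_leaf  # pointer to the next leaf, if any


class Inner:
    def __init__(self, values, children):
        self.values = values        # sorted separator values
        self.children = children    # len(children) == len(values) + 1


def lookup(node, value):
    """Follow the values down to a leaf and return the matching tuples."""
    while isinstance(node, Inner):
        # children[i] covers every v with values[i-1] <= v < values[i]
        node = node.children[bisect_right(node.values, value)]
    for v, tuples in zip(node.values, node.tuples):
        if v == value:
            return tuples
    return []  # value is not in the leaf, so nothing is returned


# Tiny example tree: one root with two leaves.
right = Leaf([5, 7], [["c"], ["d", "e"]])
left = Leaf([1, 3], [["a"], ["b"]], next_leaf=right)
root = Inner([5], [left, right])
```

Here `lookup(root, 7)` walks root, then the right leaf, and returns the two tuples stored under 7; a value that is absent, such as 4, yields an empty list.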

Answer 16

All tuples that satisfy the condition are output in order (from left to right, which is smallest to biggest)

Answer 17

- We traverse down the tree normally until we reach the last level.
- We see that the value is not in the leaf.
- So we return nothing.

Answer 18

- O(h * log2 n), where h is the height of the tree and n is the maximum number of values in a node (each node is searched with a binary search).
- The real running time is O(h * D): it is dominated by disk accesses, so it is the height of the tree multiplied by D, the time for a single disk operation.

Answer 19

First we search for that value in the tree: starting at the root, we traverse down to the bottom level and insert it there if there is space in the node - if it is now the smallest value in its node, we need to update the corresponding values in the nodes above

Answer 20

First we search for that value in the tree: starting at the root, we traverse down to the bottom level - since there's no space in the node, we need to split it into two - to do this we take half the values (with their pointers) from the old node and put them in a new node - however we now need a pointer to this new node, so we update the levels above, adding the smallest value of the new node and a pointer to it to the parent (which may itself need to split)
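
The split step can be sketched on the values alone. This is a simplified illustration that ignores the tuple pointers and the parent update; the function name and the choice to keep the larger half on the left are assumptions.

```python
from bisect import insort


def insert_into_leaf(values, value, n):
    """Insert `value` into a sorted leaf that holds at most n values.

    Returns (left, right, separator). When no split is needed,
    right and separator are both None.
    """
    values = list(values)              # work on a copy
    insort(values, value)              # keep the values sorted
    if len(values) <= n:
        return values, None, None
    middle = (len(values) + 1) // 2    # left node keeps the larger half
    left, right = values[:middle], values[middle:]
    return left, right, right[0]       # smallest value of the new node goes up
```

For a full leaf `[1, 3, 5]` with n = 3, inserting 4 produces the leaves `[1, 3]` and `[4, 5]`, and 4 is the value that has to be added to the parent.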

Answer 21

- We traverse down the tree as normal starting at the root, using a binary search in each node to find the position of the value we want to delete.
- Then we remove the value and check that the node still contains at least two pointers.
- If it does, we only need to check whether we deleted the smallest value in the node; if we did, we need to adjust the values in the higher nodes.

Answer 22

- We traverse down the tree as normal starting at the root, using a binary search in each node to find the position of the value we want to delete.
- Then we remove the value and check that the node still contains at least two pointers.
- Since it doesn't, we steal a value from an adjacent node, provided that node has more than the minimum number of pointers required.

Answer 23

- We traverse down the tree as normal starting at the root, using a binary search in each node to find the position of the value we want to delete.
- Then we remove the value and check that the node still contains at least two pointers.
- Since it doesn't, we steal a value from an adjacent node, provided that node has more than the minimum number of pointers required.
- Then we update the higher-level nodes if necessary.

Answer 24

- We remove the value as usual and are left with a node that doesn't satisfy the leaf condition.
- In this case there is only one adjacent node and it is not above the minimum number of pointers, so we cannot steal from it.
- So we merge the node with the adjacent node.
- We then remove the empty node.
- We then update the nodes in the higher levels to remove the pointers to the empty node.
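
The choice between stealing and merging can be sketched at leaf level. This is a simplified illustration on sorted lists of values, assuming the adjacent node is the right-hand sibling; the function name and the `min_values` parameter are hypothetical, and the update of the higher-level nodes is left out.

```python
def fix_underfull_leaf(leaf, sibling, min_values):
    """Repair `leaf` after a deletion using its right-hand `sibling`.

    Both arguments are sorted lists of values. Returns (leaf, sibling);
    sibling is None when the two nodes were merged.
    """
    if len(leaf) >= min_values:
        return leaf, sibling                      # nothing to repair
    if len(sibling) > min_values:
        # Sibling can spare a value: steal its smallest one.
        return leaf + [sibling[0]], sibling[1:]
    # Sibling is at the minimum: merge, and the sibling disappears.
    return leaf + sibling, None
```

In the merge case the caller would still have to remove the pointer to the vanished node from the parent, which can in turn leave the parent underfull.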

Answer 25

- O(h * log2 n), where h is the height of the tree and n is the maximum number of values in a node (each node is searched with a binary search).
- The real running time is O(h * D): it is dominated by disk accesses, so it is the height of the tree multiplied by D, the time for a single disk operation.

Answer 26

- That the height is only logarithmic in the number of records we have.

Answer 27

- because most of the B+tree can be kept in memory so it's efficient/quick to fetch from

Answer 28

- level 2: (n+1) x block size
- level 3: (n+1) x (n+1) x block size
- level 4: (n+1) x (n+1) x (n+1) x block size
- and so on
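
The pattern is one extra factor of (n+1) per level. A small sketch, reading the card as the space taken by all nodes on a given level when every node uses all n+1 pointers and the root is level 1 (an assumption, since the question is not shown). The values 42 and 512 are example inputs only.

```python
def level_size(level, n, block_size):
    """Bytes for all nodes on `level` of a full tree (root = level 1)."""
    return (n + 1) ** (level - 1) * block_size


# Example inputs (assumed): n = 42 and 512-byte blocks.
sizes = [level_size(level, 42, 512) for level in (1, 2, 3)]
```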

Answer 29

- The initial query plan might not be the optimal one.

Answer 30

The longer it will take to complete the cross product - that's why selecting the tuples with stores in Liverpool first is a faster, more efficient approach

Answer 31

We can combine a selection with a cross product into an equijoin, then use a sort join to evaluate it even faster.

Answer 32

- A query plan is evaluated bottom up, so its efficiency depends on the size of the intermediate results.
- So we rewrite the query plan, based on equivalence laws, so that the intermediate results are as small as they can be.

Answer 33

You can split them up and move them around, such as moving one inside the other, and this can improve the query speed (especially if you have indexes on both) - so push selections as far down the tree as possible: this removes as many irrelevant tuples as possible early in execution, and we can use indexing, which is very fast
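
A small sketch of why pushing a selection down helps, comparing the size of the intermediate result when the selection runs after versus before a cross product. The two relations and their contents are invented for illustration.

```python
from itertools import product

# Hypothetical relations.
stores = [
    {"store": "S1", "city": "Liverpool"},
    {"store": "S2", "city": "Leeds"},
    {"store": "S3", "city": "Liverpool"},
]
sales = [{"item": item} for item in ("pen", "ink", "pad", "cap")]


def late_selection(stores, sales, city):
    """Cross product first, selection afterwards."""
    pairs = list(product(stores, sales))
    result = [(st, sa) for st, sa in pairs if st["city"] == city]
    return result, len(pairs)  # len(pairs) = intermediate result size


def early_selection(stores, sales, city):
    """Selection pushed below the cross product."""
    filtered = [st for st in stores if st["city"] == city]
    pairs = list(product(filtered, sales))
    return pairs, len(pairs)


late_result, late_size = late_selection(stores, sales, "Liverpool")
early_result, early_size = early_selection(stores, sales, "Liverpool")
```

Both plans return the same pairs, but the late plan builds 12 intermediate pairs while the early plan builds only 8.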

Answer 34

You can move the selection down onto R, which is quite a bit more efficient - push projections as far down the tree as possible as well

Answer 35

You can combine them into an equijoin

Answer 36

- A physical query plan adds the information required to execute the optimised query plan, such as:
- Which algorithm to use for the execution of each operator?
- How to pass information from one operator to the other?
- It may also insert additional operations such as sorting, which in some cases makes things much more efficient (if done early).

Answer 37

- first we generate many different physical query plans - then we estimate the cost of execution of each plan - we select the physical query plan with the lowest estimated cost

Answer 38

- we focus on the number of disk access operations - and the important parameters of the database

Answer 39

◦ Selection of algorithms for the individual operators (equijoin is faster than natural join etc) ◦ Method for passing information around between the different parts of your query ◦ Size of intermediate results - one of the most critical factors

Answer 40

◦ Size of relations ◦ Number of distinct items per attribute in each relation (we can estimate this using statistics or go through the whole database)

Answer 41

You can estimate the size of the selection A = a on R, where a is some constant, as: - the size of R divided by the number of distinct values of column A in R
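
A minimal sketch of this estimate, taking the column A of R as a Python list. The function name and sample columns are made up; the estimate is exact only when every value occurs equally often.

```python
def estimate_selection_size(column):
    """Estimated number of tuples with A = a: |R| / (distinct values of A)."""
    return len(column) / len(set(column))


uniform = ["x", "y", "x", "y"]  # each value occurs twice
skewed = ["x", "x", "x", "y"]   # "x" occurs three times, "y" once

uniform_estimate = estimate_selection_size(uniform)
skewed_estimate = estimate_selection_size(skewed)
```

Both columns give an estimate of 2.0, which matches the uniform column exactly but is off for either value in the skewed one.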

Answer 42

- if each value in A occurs equally often

Answer 43

- A simple estimate for the natural join of R and S on a common attribute A: the size of R multiplied by the size of S, divided by the larger of the number of distinct values for A in R and the number of distinct values for A in S
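
The same estimate as a small function. The parameter names and the numbers in the example call are hypothetical.

```python
def estimate_join_size(size_r, size_s, distinct_r, distinct_s):
    """Estimate |R join S| as |R| * |S| / max(distinct A in R, distinct A in S)."""
    return size_r * size_s / max(distinct_r, distinct_s)


# Example inputs (assumed): |R| = 1000, |S| = 2000,
# 50 distinct A-values in R and 100 in S.
estimate = estimate_join_size(1000, 2000, 50, 100)
```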

Answer 44

◦ One way: explore all possibilities, so for each algorithm see how well it runs, then select the best of them. This will take a long time!
◦ More sensible approach: a top-down or bottom-up method. In the bottom-up method you start from the bottom and move to the top; whatever has been best so far is likely to still be the most efficient, so we continue with that.

Answer 45

Because it is one of the variables that determine the running time

Answer 46

Materialisation: write intermediate results out to the hard disk, then read them in again when you need to use them
Pipelining ("stream-based processing"): operators are connected so that the output of one feeds directly into the next

Answer 47

◦ The first operation passes the resulting tuples directly to the next operation without writing them out to the hard disk
◦ To do this we need extra buffer space for each pair of adjacent operations, to hold the tuples we're currently passing along from one operation to the other
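
Pipelining can be sketched with Python generators, where each operator pulls tuples from the one below it and nothing is written out in between. The operator names and the students data are invented for illustration; in this sketch the generator machinery plays the role of the buffer between adjacent operations.

```python
def scan(relation):
    """Leaf operator: produce the tuples of a stored relation one by one."""
    for row in relation:
        yield row


def select(rows, predicate):
    """Pass on only the tuples that satisfy the predicate."""
    for row in rows:
        if predicate(row):
            yield row


def project(rows, attributes):
    """Keep only the listed attributes of each tuple."""
    for row in rows:
        yield {a: row[a] for a in attributes}


# Hypothetical relation.
students = [
    {"name": "Ann", "year": 1},
    {"name": "Bob", "year": 2},
    {"name": "Cy", "year": 1},
]

# Each tuple flows through select and project before the next one is scanned.
pipeline = project(select(scan(students), lambda r: r["year"] == 1), ["name"])
result = list(pipeline)
```

A materialising version would instead build the full output list of `select` before `project` starts, which is the step that costs disk writes and reads in a real system.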

Answer 48

- Because it doesn't involve writing to or reading from the hard disk, which is much slower to access than main memory

Answer 49

It is better to do it on the side with the primary key, as this means there are no duplicate values, which results in fewer intermediate results

Answer 50

It's smart to run the sort algorithm after a relation has first been filtered with a select operation, as this leaves fewer tuples to sort.

Answer 51

- By using pipelining (because this lessens disk accesses)


(75 cards)