Parallel DDS Flashcards
1
Q
What is a Database system seeks to improve performance through parallelization of various operations, such as loading data, building indexes and evaluating queries.
A
A Parallel Database
2
Q
Describe Parallelism in databases
A
- Data can be partitioned across multiple disks for parallel I/O.
- Individual relational operations (e.g., sort, join, aggregation) can be executed in parallel
- data can be partitioned and each processor can work independently on its own partition.
- Queries are expressed in high level language (SQL, translated to relational algebra)
- makes parallelization easier.
- Different queries can be run in parallel with each other. Concurrency
control takes care of conflicts. - Thus, databases naturally lend themselves to parallelism.
3
Q
Describe I/O parallelism
A
- Reduce the time required to retrieve relations from disk by
partitioning - the relations on multiple disks.
- Horizontal partitioning – tuples of a relation are divided among many disks such that each tuple resides on one disk.
- Partitioning techniques (number of disks = n):
- Round-robin:
Send the ith tuple inserted in the relation to disk i mod n. - Hash partitioning:
* Choose one or more attributes as the partitioning attributes.
* Choose hash function h with range 0…n - 1- Let i denote result of hash function h applied to the partitioning attribute
value of a tuple. Send tuple to disk i.
- Let i denote result of hash function h applied to the partitioning attribute
- Round-robin:
4
Q
Describe Range Partitioning
A
- Choose an attribute as the partitioning attribute.
- A partitioning vector [vo, v1, …, vn-2] is chosen.
- Let v be the partitioning attribute value of a tuple. Tuples such that vi vi+1 go to
disk I + 1. Tuples with v < v0 go to disk 0 and tuples with v vn-2 go to disk n-1.
5
Q
Describe the comparison of partitioning Techniques
A
- Evaluate how well partitioning techniques support the following types of data access:
1.Scanning the entire relation.
2.Locating a tuple associatively – point queries.- E.g., r.A = 25.
3.Locating all tuples such that the value of a given attribute lies
within a specified range – range queries.
* E.g., 10 r.A < 25
6
Q
Describe Round Robin
A
- Advantages
- Best suited for sequential scan of entire relation on each query.
- All disks have almost an equal number of tuples; retrieval work is thus well balanced between disks.
- Range queries are difficult to process
- No clustering – tuples are scattered across all disks
7
Q
Describe hash partitioning
A
- Good for sequential access
—- Assuming hash function is good, and partitioning attributes form a key,
tuples will be equally distributed between disks
—- Retrieval work is then well balanced between disks. - Good for point queries on partitioning attribute
—– Can lookup single disk, leaving others available for answering other queries.
—– Index on partitioning attribute can be local to disk, making lookup and
update more efficient - No clustering, so difficult to answer range queries
8
Q
Describe Range partitioning
A
- Provides data clustering by partitioning attribute value.
- Good for sequential access
- Good for point queries on partitioning attribute: only one disk needs to be
accessed. - For range queries on partitioning attribute, one to a few disks may need to be accessed
- Remaining disks are available for other queries.
- Good if result tuples are from one to a few blocks.
- If many blocks are to be fetched, they are still fetched from one to a few disks, and potential parallelism in disk access is wasted
———- Example of execution skew.