Week 4 - Parallel Data Architecture Flashcards

Question 1

Q

Two types of parallelism in a parallel database system

Answer

A

1) Pipeline Parallelism

2) Partition Parallelism

Question 2

Q

What is Pipeline Parallelism

Answer

A

Many machines each doing one step in a multi-step process
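
Illustrative sketch (Python, not from the lecture): each step of a multi-step process runs in its own worker thread, standing in for one machine, and hands its output to the next step through a queue. The names `stage`, `run_pipeline`, and `DONE` are made up for this example.

```python
import queue
import threading

DONE = object()  # hypothetical sentinel marking the end of the stream

def stage(func, inbox, outbox):
    """Run one pipeline step: apply func to each item and pass it downstream."""
    while True:
        item = inbox.get()
        if item is DONE:
            outbox.put(DONE)  # tell the next step that the stream is over
            return
        outbox.put(func(item))

def run_pipeline(items, funcs):
    """Chain one worker thread per step; each thread stands in for a machine."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    workers = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for w in workers:
        w.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(DONE)
    results = []
    while True:
        out = queues[-1].get()
        if out is DONE:
            break
        results.append(out)
    for w in workers:
        w.join()
    return results

# Step 1 adds one, step 2 multiplies by ten.
print(run_pipeline([1, 2, 3], [lambda x: x + 1, lambda x: x * 10]))  # [20, 30, 40]
```

While step 2 is working on the first item, step 1 can already start on the second one.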

Question 3

Q

What is Partition Parallelism

Answer

A

Many machines doing the same thing to different pieces of data
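
Illustrative sketch (Python, not from the lecture): the data is cut into pieces and every worker runs the same function on its own piece. Threads stand in for machines here; `split` and `parallel_sum_of_squares` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

def split(data, n):
    """Cut data into at most n roughly equal contiguous pieces."""
    size = max(1, -(-len(data) // n))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]

def sum_of_squares(piece):
    """The one operation that every worker applies to its own piece."""
    return sum(x * x for x in piece)

def parallel_sum_of_squares(data, workers=4):
    pieces = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum_of_squares, pieces))
    return sum(partials)  # combine the per-piece results

print(parallel_sum_of_squares(list(range(1, 11))))  # 385
```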

Question 4

Q

What is speed-up?

Answer

A

More resources means proportionally less time for a given amount of data (the ideal, linear case is the 45-degree line on a speed-up graph)
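
Illustrative sketch (Python): speed-up is commonly computed as the elapsed time on the small system divided by the elapsed time on the larger system, for the same data. The timings below are invented for the example, not measurements.

```python
def speed_up(time_small_system, time_large_system):
    """Elapsed time on the small system divided by elapsed time on the large one."""
    return time_small_system / time_large_system

# Hypothetical timings (seconds) for the same data on 1, 2, 4 and 8 machines.
timings = {1: 80.0, 2: 40.0, 4: 20.0, 8: 10.0}
for machines, seconds in timings.items():
    # In the ideal (linear) case the speed-up equals the number of machines.
    print(machines, speed_up(timings[1], seconds))
```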

Question 5

Q

What is scale-up?

Answer

A

If resources are increased in proportion to the increase in data size, time is constant (no diminishing returns)
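
Illustrative sketch (Python): scale-up is commonly computed as the elapsed time for the small problem on the small system divided by the elapsed time for the N-times-larger problem on the N-times-larger system. A value of 1.0 means time stayed constant. The numbers are invented for the example.

```python
def scale_up(time_small, time_large):
    """Small problem on small system vs. N-times problem on N-times system."""
    return time_small / time_large

# Hypothetical: 100 GB on 1 machine takes 60 s.
# Ideal case: 400 GB on 4 machines also takes 60 s.
print(scale_up(60.0, 60.0))  # 1.0
# Less than ideal: the larger run takes 75 s, so scale-up drops below 1.
print(scale_up(60.0, 75.0))  # 0.8
```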

Question 6

Q

When is scale-up used in parallel databases?

1) To implement parallelism in databases for faster processing.
2) To have the same performance levels when workloads increase.
3) To break the processing in a sequential manner.

Answer

A

2) To have the same performance levels when workloads increase.

Question 7

Q

Shared Memory (SMP) means

Answer

A

Multiple CPUs that can run things in parallel but share the same memory space.

Question 8

Q

Shared Disk

Answer

A

In the shared disk architecture, you have multiple CPUs and each one has its own memory space, but they all share the same disk storage.

Question 9

Q

Shared Nothing

Answer

A

In the shared nothing architecture, each CPU has its own memory space and also its own secondary storage.

Question 10

Q

How do machines communicate in the shared nothing architecture?

Answer

A

The only way the machines communicate with each other is through the network.

Question 11

Q

Advantage of Shared Memory

Answer

A

Easy to program

Question 12

Q

2 Disadvantages of Shared Memory

Answer

A

1) expensive to build

2) Difficult to scale

Question 13

Q

2 Advantages of Shared Nothing

Answer

A

1) cheaper to build

2) easier to scale up

Question 14

Q

Disadvantage of Shared Nothing

Answer

A

Harder to program

Question 15

Q

Intra-operator Parallelism

Answer

A

Get all machines working to compute a given operation

(scan, sort, join)

Question 16

Q

Inter-operator Parallelism

Answer

A

each operator may run concurrently on a different site

exploits pipelining

Question 17

Q

Inter-query Parallelism

Answer

A

different queries run on different sites

Question 18

Q

3 Types of data partitioning

Answer

A

1) Range
2) Hash
3) Round Robin

Question 19

Q

Range Partitioning means

Answer

A

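Assigning each tuple to a partition according to which range its attribute value falls in (partitioning based on the logical sort order of the data), for example by age. Each machine stores its range and does the processing for it.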
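
Illustrative sketch (Python): rows are placed by comparing one attribute against sorted boundary values. The function name, the sample rows, and the boundaries 30 and 60 are all invented for the example.

```python
from bisect import bisect_right

def range_partition(rows, key, boundaries):
    """Place each row in the partition whose value range contains row[key].

    boundaries must be sorted; k boundaries produce k + 1 partitions.
    """
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        partitions[bisect_right(boundaries, row[key])].append(row)
    return partitions

people = [
    {"name": "Ann", "age": 17},
    {"name": "Bob", "age": 42},
    {"name": "Cy", "age": 70},
    {"name": "Di", "age": 30},
]
# Partition 0: age < 30, partition 1: 30 <= age < 60, partition 2: age >= 60.
parts = range_partition(people, "age", [30, 60])
print([[p["name"] for p in part] for part in parts])
# [['Ann'], ['Bob', 'Di'], ['Cy']]
```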

Question 20

Q

Hash Partitioning means

Answer

A

Hash partitioning runs a hash function,
and the hash function decides which tuple,
or row, in the table is assigned to which partition.
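
Illustrative sketch (Python): a hash of the partitioning attribute, taken modulo the number of partitions, picks the partition. CRC32 is used here only because it gives the same value on every run; a real system would use its own hash function. The function name is made up.

```python
import zlib

def hash_partition(rows, key, n_partitions):
    """Assign each row to partition hash(row[key]) mod n_partitions."""
    partitions = [[] for _ in range(n_partitions)]
    for row in rows:
        digest = zlib.crc32(str(row[key]).encode("utf-8"))
        partitions[digest % n_partitions].append(row)
    return partitions

rows = [{"id": i % 5, "seq": i} for i in range(20)]
parts = hash_partition(rows, "id", 3)
# Rows with equal "id" values always land in the same partition.
for number, part in enumerate(parts):
    print(number, sorted({row["id"] for row in part}))
```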

Question 21

Q

Round Robin Partitioning means

Answer

A

The first row in the table is assigned to the first partition, the second row to the second partition, and so on.
After the last partition, assignment wraps around to the first partition again.
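
Illustrative sketch (Python): row number i goes to partition i mod n, so the rows are dealt out like cards. The function name is made up for the example.

```python
def round_robin_partition(rows, n_partitions):
    """Deal rows out to partitions in turn: row i goes to partition i mod n."""
    partitions = [[] for _ in range(n_partitions)]
    for i, row in enumerate(rows):
        partitions[i % n_partitions].append(row)
    return partitions

print(round_robin_partition(list("abcdefg"), 3))
# [['a', 'd', 'g'], ['b', 'e'], ['c', 'f']]
```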

Question 22

Q

3 Items Parallel Sorting

Answer

A

1) Scan in parallel and range-partition as you go (on the sort attribute)
2) As tuples come in, begin “local” sorting on each machine
3) Resulting data is sorted and range-partitioned
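
Illustrative sketch (Python) of the three items above: values are range-partitioned, each partition is sorted locally by its own worker, and reading the partitions in order gives the fully sorted result. Threads stand in for machines; the function name and the boundaries are invented for the example.

```python
from bisect import bisect_right
from concurrent.futures import ThreadPoolExecutor

def parallel_sort(values, boundaries):
    """Range-partition values, then sort every partition locally."""
    # 1) Range-partition on the sort attribute while scanning.
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for v in values:
        partitions[bisect_right(boundaries, v)].append(v)
    # 2) Each worker sorts its own partition ("local" sorting).
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        sorted_parts = list(pool.map(sorted, partitions))
    # 3) The result is sorted and still range-partitioned.
    return sorted_parts

parts = parallel_sort([42, 7, 93, 15, 60, 3, 78, 31], [30, 60])
print(parts)  # [[3, 7, 15], [31, 42], [60, 78, 93]]
```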

Question 23

Q

Parallel Sorting Problem

Answer

A

skew!

Some partitions will have more data than others, unbalanced load

Question 24

Q

Parallel Sorting Solution:

Answer

A

sample the data at start to determine partition points (find data distribution so data can be sorted evenly in partitions)
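
Illustrative sketch (Python): partition points are taken from evenly spaced positions in a sorted random sample, so each partition should receive a similar share of the data even when the values are skewed. The function name, the default sample size, and the fixed seed are assumptions for the example; `values` is assumed to be a non-empty list.

```python
import random

def sample_partition_points(values, n_partitions, sample_size=100, seed=0):
    """Pick n_partitions - 1 boundaries from a sorted random sample."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(values, min(sample_size, len(values))))
    return [sample[i * len(sample) // n_partitions] for i in range(1, n_partitions)]

# Skewed data: many small values, few large ones.
data = [x * x for x in range(1000)]
points = sample_partition_points(data, 4)
print(points)  # boundaries crowd toward the small values, where the data is dense
```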

Question 25

Q

2 types of Parallel join

Answer

A

1) Nested loop

2) Sort Merge (plain merge join)

Question 26

Q

Nested loop

2 items

Answer

A

1) Each outer tuple must be compared with each inner tuple that might join
2) Easy with range partitioning on the join columns, hard otherwise
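
Illustrative sketch (Python): when both tables are already range-partitioned on the join column with the same boundaries, partition k of the outer table only needs to be compared with partition k of the inner table. The function names and sample rows are invented for the example.

```python
def nested_loop_join(outer, inner, key):
    """Compare every outer row with every inner row."""
    return [{**o, **i} for o in outer for i in inner if o[key] == i[key]]

def partitioned_nested_loop_join(outer_parts, inner_parts, key):
    """Join partition k of the outer table with partition k of the inner table."""
    result = []
    for o_part, i_part in zip(outer_parts, inner_parts):
        # Each of these local joins could run on a different machine.
        result.extend(nested_loop_join(o_part, i_part, key))
    return result

# Both tables partitioned on "id" with the same split: id < 10 and id >= 10.
orders = [[{"id": 1, "item": "pen"}, {"id": 4, "item": "ink"}], [{"id": 12, "item": "pad"}]]
buyers = [[{"id": 4, "name": "Ann"}], [{"id": 12, "name": "Bob"}, {"id": 15, "name": "Cy"}]]
print(partitioned_nested_loop_join(orders, buyers, "id"))
# [{'id': 4, 'item': 'ink', 'name': 'Ann'}, {'id': 12, 'item': 'pad', 'name': 'Bob'}]
```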

Question 27

Q

Sort Merge (plain merge join)

2 items

Answer

A

1) Sorting gives range partitioning

2) Merging partitioned tables is local
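
Illustrative sketch (Python): both tables are range-partitioned on the join column with the same boundaries and sorted, so each pair of matching partitions can be merged locally without looking at any other partition. All names, rows, and the boundary value are invented for the example.

```python
from bisect import bisect_right

def range_partition(rows, key, boundaries):
    """k sorted boundaries produce k + 1 partitions."""
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        partitions[bisect_right(boundaries, row[key])].append(row)
    return partitions

def merge_join(left, right, key):
    """Merge join of two lists that are already sorted on key."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows with this key, then pair it
            # with every left row that has the same key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            while i < len(left) and left[i][key] == lk:
                out.extend({**left[i], **r} for r in right[j:j_end])
                i += 1
            j = j_end
    return out

def partitioned_merge_join(left, right, key, boundaries):
    result = []
    pairs = zip(range_partition(left, key, boundaries),
                range_partition(right, key, boundaries))
    for l_part, r_part in pairs:
        # Local work only: sort and merge this pair of partitions.
        l_part.sort(key=lambda row: row[key])
        r_part.sort(key=lambda row: row[key])
        result.extend(merge_join(l_part, r_part, key))
    return result

left = [{"id": 5, "a": "w"}, {"id": 2, "a": "y"}, {"id": 1, "a": "x"}, {"id": 2, "a": "z"}]
right = [{"id": 7, "b": "s"}, {"id": 2, "b": "p"}, {"id": 5, "b": "r"}, {"id": 2, "b": "q"}]
print(len(partitioned_merge_join(left, right, "id", [3])))  # 5
```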

Question 28

Q

Complex Queries: Inter-Operator Parallelism

2 items

Answer

A

1) Pipeline between operators

2) Bushy Trees

Question 29

Q

What is the high-level query processing language used by database management systems?

1) SQL
2) HTML
3) XML
4) PL

Answer

A

1) SQL

Question 30

Q

Which of the following cannot be a goal in query processing?

1) Maximizing solution space
2) Minimizing processing time
3) Maximizing throughput
4) Minimizing transfers among distributed sites

Answer

A

1) Maximizing solution space

Question 31

Q

Which of the following search algorithms takes the longest processing time?

1) Exhaustive search
2) Heuristic algorithm
3) Simulated annealing
4) Genetic algorithm

Answer

A

1) Exhaustive search

Question 32

Q

What is the correct order of tasks in typical distributed query processing?

1) Decomposition, Localization, Optimization
2) Decomposition, Optimization, Localization
3) Localization, Decomposition, Optimization
4) Optimization, Decomposition, Localization

Answer

A

1) Decomposition, Localization, Optimization

Question 33

Q

What is the correct order of tasks in the decomposition step of distributed query processing?

1) Normalization, Eliminating Redundancy, Algebraic Rewriting
2) Normalization, Algebraic Rewriting, Eliminating Redundancy
3) Eliminating Redundancy, Normalization, Algebraic Rewriting
4) Eliminating Redundancy, Algebraic Rewriting, Normalization

Answer

A

1) Normalization, Eliminating Redundancy, Algebraic Rewriting