Week 4 - Parallel Data Architecture Flashcards

1
Q

Two types of parallelism in parallel database systems

A

1) Pipeline Parallelism

2) Partition Parallelism

2
Q

What is Pipeline Parallelism

A

Many machines, each doing one step in a multi-step process

3
Q

What is Partition Parallelism

A

Many machines doing the same thing to different pieces of data
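
A minimal sketch of partition parallelism, assuming threads on one machine stand in for separate machines. The names `count_matches` and `parallel_count` are hypothetical and chosen only for illustration: every worker runs the same operation on its own slice of the data, and the partial results are combined at the end.

```python
from concurrent.futures import ThreadPoolExecutor

def count_matches(partition):
    # The same operation runs on every partition: count the even values.
    return sum(1 for value in partition if value % 2 == 0)

def parallel_count(data, workers=4):
    # Split the data into one slice per worker.
    partitions = [data[i::workers] for i in range(workers)]
    # Threads stand in for machines here (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_matches, partitions))
    # Combine the per-partition results into the final answer.
    return sum(partial_counts)

total = parallel_count(list(range(100)))
```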

4
Q

What is speed-up?

A

More resources means proportionally less time for a given amount of data (ideal, linear speed-up plots as a straight line at a 45-degree angle)

5
Q

What is scale-up?

A

If resources increase in proportion to the data size, time stays constant (no diminishing returns)
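
The speed-up and scale-up definitions can be written as simple ratios of elapsed times. This is a sketch using the usual textbook ratios; the function and parameter names are hypothetical, and the timings are made-up inputs rather than measurements.

```python
def speed_up(time_small_system, time_large_system):
    # Same amount of data, more resources: ratio of the elapsed times.
    # With N times the resources, ideal (linear) speed-up is N.
    return time_small_system / time_large_system

def scale_up(time_small_job_small_system, time_large_job_large_system):
    # Data and resources grown by the same factor: ratio of the elapsed times.
    # Ideal scale-up is 1.0, meaning the elapsed time stayed constant.
    return time_small_job_small_system / time_large_job_large_system

# Hypothetical timings in seconds, for illustration only.
linear = speed_up(100.0, 25.0)    # e.g. 4x the machines on the same data
constant = scale_up(100.0, 100.0)  # e.g. 4x the machines on 4x the data
```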

6
Q

When is scale-up used in parallel databases?

1) To implement parallelism in databases for faster processing.
2) To have the same performance levels when workloads increase.
3) To break the processing in a sequential manner.

A

2) To have the same performance levels when workloads increase.

7
Q

Shared Memory (SMP) means

A

Multiple CPUs run work in parallel, but they all share the same memory space.

8
Q

Shared Disk

A

In the shared disk architecture, you have multiple CPUs and each one has its own memory space, but they all share the same disk storage.

9
Q

Shared Nothing

A

In the shared nothing architecture, each CPU has its own memory space and also its own secondary storage.

10
Q

How do machines communicate in the shared nothing architecture?

A

The only way the machines communicate with each other is through the network

11
Q

Advantage of Shared Memory

A

Easy to program

12
Q

2 Disadvantages of Shared Memory

A

1) Expensive to build

2) Difficult to scale

13
Q

2 Advantages of Shared Nothing

A

1) Cheaper to build

2) Easier to scale up

14
Q

Disadvantage of Shared Nothing

A

Harder to program

15
Q

Intra-operator Parallelism

A

Get all machines working to compute a given operation

(e.g. scan, sort, join)

16
Q

Inter-operator Parallelism

A

each operator may run concurrently on a different site

exploits pipelining

17
Q

Inter-query Parallelism

A

different queries run on different sites

18
Q

3 Types of data partitioning

A

1) Range
2) Hash
3) Round Robin

19
Q

Range Partitioning means

A

Tuples are assigned to partitions by ranges of an attribute's sorted values (for example, by age), and each machine does the processing for its own partition.
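
A minimal sketch of range partitioning by age. The function name `range_partition`, the sample rows, and the boundaries 30 and 60 are all hypothetical: two boundaries give three partitions (under 30, 30 to 59, 60 and over).

```python
from bisect import bisect_right

def range_partition(rows, key, boundaries):
    # boundaries must be sorted; they define len(boundaries) + 1 partitions.
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        # bisect_right finds which range the row's key value falls into.
        partitions[bisect_right(boundaries, row[key])].append(row)
    return partitions

people = [
    {"name": "Ann", "age": 17},
    {"name": "Bob", "age": 42},
    {"name": "Cy", "age": 70},
    {"name": "Di", "age": 30},
]
by_age = range_partition(people, "age", [30, 60])
```

Each partition could then be shipped to a different machine, which works only on the age range it owns.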

20
Q

Hash Partitioning means

A

Hash partitioning runs a hash function on each tuple's partitioning attribute,
and the hash function decides which tuple,
or row, in the table is assigned to which partition.
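
A minimal sketch of hash partitioning. The function name `hash_partition` and the sample rows are hypothetical, and `zlib.crc32` is used only as an example of a deterministic hash function; a real system would pick its own.

```python
import zlib

def hash_partition(rows, key, num_partitions):
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # Hash the partitioning attribute, then map the hash to a partition.
        digest = zlib.crc32(str(row[key]).encode("utf-8"))
        partitions[digest % num_partitions].append(row)
    return partitions

cities = ["Oslo", "Rome", "Oslo", "Lima", "Rome", "Oslo"]
rows = [{"id": i, "city": city} for i, city in enumerate(cities)]
parts = hash_partition(rows, "city", 3)
```

Rows with the same attribute value always land in the same partition, which is what makes this scheme useful for equality joins and grouping.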

21
Q

Round Robin Partitioning means

A

The first row in the table is assigned to the first partition,
the second row to the second partition, and so on, wrapping back to the first partition after the last one.
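
A minimal sketch of round robin partitioning; the function name `round_robin_partition` and the row labels are hypothetical.

```python
def round_robin_partition(rows, num_partitions):
    partitions = [[] for _ in range(num_partitions)]
    for position, row in enumerate(rows):
        # Row 0 goes to partition 0, row 1 to partition 1, and so on,
        # wrapping around once every partition has received a row.
        partitions[position % num_partitions].append(row)
    return partitions

parts = round_robin_partition(["r1", "r2", "r3", "r4", "r5"], 3)
```

The partitions end up within one row of each other in size regardless of the data values, but no attribute tells you where a given row lives.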

22
Q

3 Items Parallel Sorting

A

1) Scan in parallel and range-partition on the sort attribute as you go
2) As tuples come in, begin "local" sorting on each machine
3) The resulting data is sorted and range-partitioned

23
Q

Parallel Sorting Problem

A

skew!

Some partitions will have more data than others, unbalanced load

24
Q

Parallel Sorting Solution:

A

Sample the data at the start to determine partition points (estimate the data distribution so that the partitions receive roughly equal amounts of data)
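
A sketch of sample-based partition points for a parallel sort. The names `choose_split_points` and `parallel_sort`, the sample size, and the fixed seed are assumptions for illustration; the local sorts run one after another here, whereas each would run on its own machine in a real system.

```python
import random
from bisect import bisect_right

def choose_split_points(values, num_partitions, sample_size=100, seed=0):
    # Sort a random sample and take evenly spaced values as boundaries,
    # so dense regions of the data get narrower ranges.
    rng = random.Random(seed)
    sample = sorted(rng.sample(values, min(sample_size, len(values))))
    return [sample[(i * len(sample)) // num_partitions]
            for i in range(1, num_partitions)]

def parallel_sort(values, num_partitions=4):
    if not values:
        return []
    split_points = choose_split_points(values, num_partitions)
    # Range-partition on the sort attribute while scanning.
    partitions = [[] for _ in range(num_partitions)]
    for value in values:
        partitions[bisect_right(split_points, value)].append(value)
    # "Local" sort of each partition (sequential stand-in for one machine each).
    sorted_partitions = [sorted(partition) for partition in partitions]
    # Partitions are already ordered by range, so concatenation is sorted.
    return [value for partition in sorted_partitions for value in partition]

data = [(x * x) % 97 for x in range(500)]
result = parallel_sort(data, 4)
```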

25
Q

2 types of Parallel join

A

1) Nested loop

2) Sort Merge (plain merge join)

26
Q

Nested loop

2 items

A

1) Each outer tuple must be compared with each inner tuple that might join
2) Easy for range partitioning on the join columns, hard otherwise

27
Q

Sort Merge (plain merge join)

2 items

A

1) Sorting gives range partitioning

2) Merging partitioned tables is local
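
The two points above can be sketched as follows, assuming both tables are lists of `(key, payload)` pairs and are range-partitioned with the same boundaries. All names (`range_partition`, `merge_join`, `parallel_sort_merge_join`) and the sample data are hypothetical; the per-partition joins run in a loop here, whereas each would run on its own machine.

```python
from bisect import bisect_right

def range_partition(pairs, boundaries):
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for key, payload in pairs:
        partitions[bisect_right(boundaries, key)].append((key, payload))
    return partitions

def merge_join(left, right):
    # Local step: sort both inputs on the join key, then merge them.
    left = sorted(left, key=lambda pair: pair[0])
    right = sorted(right, key=lambda pair: pair[0])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][0] < right[j][0]:
            i += 1
        elif left[i][0] > right[j][0]:
            j += 1
        else:
            key = left[i][0]
            # Find the run of right tuples that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == key:
                j_end += 1
            # Pair every matching left tuple with that run.
            while i < len(left) and left[i][0] == key:
                for _, right_payload in right[j:j_end]:
                    out.append((key, left[i][1], right_payload))
                i += 1
            j = j_end
    return out

def parallel_sort_merge_join(left, right, boundaries):
    result = []
    # Matching keys fall in the same range, so each pair of partitions
    # can be joined without looking at any other partition.
    for left_part, right_part in zip(range_partition(left, boundaries),
                                     range_partition(right, boundaries)):
        result.extend(merge_join(left_part, right_part))
    return result

left_table = [(1, "a"), (5, "b"), (9, "c"), (5, "d")]
right_table = [(5, "x"), (9, "y"), (2, "z")]
joined = parallel_sort_merge_join(left_table, right_table, [4])
```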

28
Q

Complex Queries: Inter-Operator Parallelism

2 items

A

1) Pipeline between operators

2) Bushy Trees

29
Q

What is the high-level query processing language used by database management systems?

1) SQL
2) HTML
3) XML
4) PL

A

1) SQL

30
Q

Which of the following cannot be a goal in query processing?

1) Maximizing solution space
2) Minimizing processing time
3) Maximizing throughput
4) Minimizing transfers among distributed sites

A

1) Maximizing solution space

31
Q

Which of the following search algorithms takes the longest processing time?

1) Exhaustive search
2) Heuristic algorithm
3) Simulated annealing
4) Genetic algorithm

A

1) Exhaustive search

32
Q

What is the correct order of tasks in typical distributed query processing?

1) Decomposition, Localization, Optimization
2) Decomposition, Optimization, Localization
3) Localization, Decomposition, Optimization
4) Optimization, Decomposition, Localization

A

1) Decomposition, Localization, Optimization

33
Q

What is the correct order of tasks in the decomposition step of distributed query processing?

1) Normalization, Eliminating Redundancy, Algebraic Rewriting
2) Normalization, Algebraic Rewriting, Eliminating Redundancy
3) Eliminating Redundancy, Normalization, Algebraic Rewriting
4) Eliminating Redundancy, Algebraic Rewriting, Normalization

A

1) Normalization, Eliminating Redundancy, Algebraic Rewriting