Module 8b - Graph Processing Flashcards

Question 1

Q

What is the motivation for Google Pregel?

Answer

A

Many important data sets look like graphs.

- Some graph sets are massive, and computations can be parallelized

Question 2

Q

A Pregel computation consists of a sequence of iterations called a superstep.

What are the 6 things that occur during a superstep?

Answer

A

Workers asynchronously execute a user-defined function on each vertex
Vertices can receive messages sent in the previous superstep
Vertices can modify their value, modify the values of their edges, as well as add/remove edges
Vertices can send messages to be received in the next superstep
Each vertex may also deactivate itself (i.e., vote to halt)
An inactive vertex is reactivated when it receives a message

Question 3

Q

The model of computation in pregel is vertex-centric. What does this mean?

Answer

A

“think like a vertex”

Question 4

Q

During a superstep in pregel, what can a vertex do?

Answer

A

A vertex can read messages sent to it in superstep S − 1,
A vertex can send messages to other vertices that will be received at superstep S + 1
A vertex can modify the state of V and its outgoing edges.

Question 5

Q

What does pregel maintain in the state for each vertex?

Answer

A

A problem specific value (ex: PageRank value)
A list of messages sent to the vertex
A list of outgoing edges
A binary active/inactive state

Question 6

Q

In pregel, supersteps are computed _______, whereas workers compute _______ within each superstep, and communicate in _______ supersteps

Answer

A

synchronously
asynchronously
between

Question 7

Q

What is a vertex partition in Pregel?

Answer

A

A subset of a directed graph’s vertices. Each worker is responsible for a vertex partition

Question 8

Q

In Pregel, vertices can be either “active” or “inactive”.

How does a vertex switch from active to inactive? vice versa?

Answer

A

If an active vertex votes to halt, then it becomes inactive
If an inactive vertex receives a message, then it becomes active

Question 9

Q

When does the pregel program stop executing?

Answer

A

A distributed execution stops when all vertices are inactive and where are no more messages to process

Question 10

Q

Do computing units (vertices) in Pregel use shared memory?

Answer

A

No. They use message passing. Each vertex uses message passing to communicate with other vertices

Question 11

Q

Upon initialization of a Pregel program, how is vertex ownership determined?

Answer

A

The vertex ownership is determined by a partitioner, which by default is a simple hash function over vertices.
This ensures a fairly even distribution of data across workers, but does not always balance the computation

Question 12

Q

In Pregel, what does a worker do upon initialization?

Answer

A

Each worker reads its section of the input
Stores vertices that belong to it
Forwards the remaining vertices to the appropriate workers

Question 13

Q

In Pregel, what does the master do upon initialization?

Answer

A

Assigns a section of the input to each worker, (similarly to the way input splits are determined in Hadoop)

Question 14

Q

Upon initialization of the Pregel program, can the user override the default partitioning scheme?

Answer

A

Yes, it can be overwritten to exploit locality by co-locating graph components or dense subgraphs

Question 15

Q

In Pregel, combiners are supported (similar to Hadoop).

What is the benefit of combiners in Pregel? What is the drawback?

Answer

A

Combiners reduce the amount of data exchanged over the network and the number of messages
This trades CPI cycles against network I/O

Question 16

Q

In Pregel, for what cases are combiners applicable?

How are combiners enabled?

Answer

Study These Flashcards

A

Combiners are applicable for when the function applied at each vertex is communicative and associative (ex: min, max, sum)

Combiners are user-defined and explicitly enabled

Question 17

Q

In Pregel, aggregators are supported.

What do aggregators do? What can they be used for?

Answer

Study These Flashcards

A

Aggregators are used to compute aggregate statistics from vertex-reported values
Aggregators can be used to evaluate a convergence criterion in an iterative computation

Question 18

Q

In Pregel, aggregators are supported.

Explain the mechanism for workers and masters using aggregators in between supersteps

Answer

Study These Flashcards

A

Workers aggregate values from their vertices during each superstep
At the end of each superstep, the values from the workers are aggregated in a tree structure, and the value from the root of the tree is sent to the master
The master shares the value with all vertices in the next superstep

Question 19

Q

As a part of fault tolerance in pregel systems, what does the master do at the start of a superstep?

How is the frequency of these operations determined?

Answer

Study These Flashcards

A

For fault tolerance, at the start of a superstep:
- The master tells the workers to save their state to a persistent storage. Just like a checkpoint

This includes vertex values, edge values, and list of incoming messages
The master also saves any aggregator values
The frequency of these operations is user-determined

Question 20

Q

Fault tolerance in Pregel systems. What happens when the master detects one or more worker failures?

Answer

Study These Flashcards

A

When the master detects a failure:

Rolls back all workers to the most recent checkpoint
Computation is repeated from this checkpoint

Question 21

Q

In a Pregel system, what how can the fault tolerance be made more effiicient?

Answer

Study These Flashcards

A

A more efficient recovery system:

Only the failed workers revert to the checkpoint
This requires deterministic replay of any messages sent to those workers at each superstep since the checkpoint

Question 22

Q

Pregel is a model for ______ distributed _____ computation

Answer

Study These Flashcards

A

scalable

graph

Module 8b - Graph Processing Flashcards

(22 cards)