Module 8b - Graph Processing Flashcards

1
Q

What is the motivation for Google Pregel?

A
  • Many important data sets look like graphs.

- Some graph sets are massive, and computations can be parallelized

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

A Pregel computation consists of a sequence of iterations called a superstep.

What are the 6 things that occur during a superstep?

A
  1. Workers asynchronously execute a user-defined function on each vertex
  2. Vertices can receive messages sent in the previous superstep
  3. Vertices can modify their value, modify the values of their edges, as well as add/remove edges
  4. Vertices can send messages to be received in the next superstep
  5. Each vertex may also deactivate itself (i.e., vote to halt)
  6. An inactive vertex is reactivated when it receives a message
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

The model of computation in pregel is vertex-centric. What does this mean?

A

“think like a vertex”

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

During a superstep in pregel, what can a vertex do?

A
  • A vertex can read messages sent to it in superstep S − 1,
  • A vertex can send messages to other vertices that will be received at superstep S + 1
  • A vertex can modify the state of V and its outgoing edges.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What does pregel maintain in the state for each vertex?

A
  1. A problem specific value (ex: PageRank value)
  2. A list of messages sent to the vertex
  3. A list of outgoing edges
  4. A binary active/inactive state
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

In pregel, supersteps are computed _______, whereas workers compute _______ within each superstep, and communicate in _______ supersteps

A

synchronously
asynchronously
between

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is a vertex partition in Pregel?

A

A subset of a directed graph’s vertices. Each worker is responsible for a vertex partition

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

In Pregel, vertices can be either “active” or “inactive”.

How does a vertex switch from active to inactive? vice versa?

A
  • If an active vertex votes to halt, then it becomes inactive
  • If an inactive vertex receives a message, then it becomes active
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

When does the pregel program stop executing?

A

A distributed execution stops when all vertices are inactive and where are no more messages to process

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Do computing units (vertices) in Pregel use shared memory?

A

No. They use message passing. Each vertex uses message passing to communicate with other vertices

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Upon initialization of a Pregel program, how is vertex ownership determined?

A
  • The vertex ownership is determined by a partitioner, which by default is a simple hash function over vertices.
  • This ensures a fairly even distribution of data across workers, but does not always balance the computation
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

In Pregel, what does a worker do upon initialization?

A
  • Each worker reads its section of the input
  • Stores vertices that belong to it
  • Forwards the remaining vertices to the appropriate workers
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

In Pregel, what does the master do upon initialization?

A
  • Assigns a section of the input to each worker, (similarly to the way input splits are determined in Hadoop)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Upon initialization of the Pregel program, can the user override the default partitioning scheme?

A

Yes, it can be overwritten to exploit locality by co-locating graph components or dense subgraphs

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

In Pregel, combiners are supported (similar to Hadoop).

What is the benefit of combiners in Pregel? What is the drawback?

A
  • Combiners reduce the amount of data exchanged over the network and the number of messages
  • This trades CPI cycles against network I/O
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

In Pregel, for what cases are combiners applicable?

How are combiners enabled?

A

Combiners are applicable for when the function applied at each vertex is communicative and associative (ex: min, max, sum)

Combiners are user-defined and explicitly enabled

17
Q

In Pregel, aggregators are supported.

What do aggregators do? What can they be used for?

A
  • Aggregators are used to compute aggregate statistics from vertex-reported values
  • Aggregators can be used to evaluate a convergence criterion in an iterative computation
18
Q

In Pregel, aggregators are supported.

Explain the mechanism for workers and masters using aggregators in between supersteps

A
  • Workers aggregate values from their vertices during each superstep
  • At the end of each superstep, the values from the workers are aggregated in a tree structure, and the value from the root of the tree is sent to the master
  • The master shares the value with all vertices in the next superstep
19
Q

As a part of fault tolerance in pregel systems, what does the master do at the start of a superstep?

How is the frequency of these operations determined?

A

For fault tolerance, at the start of a superstep:
- The master tells the workers to save their state to a persistent storage. Just like a checkpoint

  • This includes vertex values, edge values, and list of incoming messages
  • The master also saves any aggregator values
  • The frequency of these operations is user-determined
20
Q

Fault tolerance in Pregel systems. What happens when the master detects one or more worker failures?

A

When the master detects a failure:

  • Rolls back all workers to the most recent checkpoint
  • Computation is repeated from this checkpoint
21
Q

In a Pregel system, what how can the fault tolerance be made more effiicient?

A

A more efficient recovery system:

  • Only the failed workers revert to the checkpoint
  • This requires deterministic replay of any messages sent to those workers at each superstep since the checkpoint
22
Q

Pregel is a model for ______ distributed _____ computation

A

scalable

graph