CS7210 Flashcards

1
Q

What is a distributed system?

A

A collection of computing units that interact by exchanging messages via an interconnection network and appear to external users as a single coherent computing facility

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is the CAP theorem?

A

A distributed system can be two of:

  • Consistent
  • Available
  • Partitioned
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What are some difficulties in client-server systems?

A
  • Discovery and bundling
  • Identifying the interface and parameter types
  • Agreeing on the data representation
  • Explicit data management
  • Unpredictable delays
  • Unknown cause of failures
  • Must be explicitly handled
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What are the goals of an RPC system?

A

Hide complexity of distributed programming, and make it look more similar to local node programming.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

When we can look at two timestamps from a logical clock, and the timestamp of one event is greater than another event, if we can assume that the event with a greater timestamp happened later, we say that logical clock has ____ ____ ____

A

Strong clock consistency

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What type of value do Lamport clocks for their timestamp?

A

A scalar value

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Are Lamport clocks strongly consistent?

A

No, it is only consistent

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

In the context of logical clocks, what is the difference between consistency and strong consistency?

A

Consistent: Difference in the order of two events implies difference in timestamps, but not vice versa

Strongly Consistent: Difference in timestamps implies the order of two events

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

How does a vector clock work?

A

Each process maintains its own view of time at other nodes.

It keeps a vector (array) of values, which represent Lamport clocks for each node.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

When a vector clock receives a message, what does it do with the passed in vector value?

A

For each value in the vector, it updates its own with the maximum of the value in its own or the message vector. Then, after triggering an event, it increments its own value in its own vector.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Suppose vector clock 0 has clock values [2,1,1] and receives a message with clock values [2,2,0]. What will its vector values be after processing the message?

A

[3,2,1]

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Are vector clocks consistent?

A

Yes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Are vector clocks strongly consistent?

A

Yes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Does a Lamport clock require more space as more nodes are added?

A

No

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Does a vector clock require more space as more nodes are added?

A

Yes, each node has its own element in the vector (array)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

If a scalar clock represents a node’s belief about the current time, and a vector clock represents a node’s belief about each node’s belief about the current time, what does a matrix clock represent?

A

A node’s belief, about each node’s belief, about each node’s belief about the current time.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

What benefit does a matrix clock provide over a vector clock?

A

It allows for garbage collection. If every node knows that every node has passed a time t , then every node knows about everything that has happened prior to t

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

What is the Global State of a distributed system comprised of?

A
  • Processes and Channels
  • Process state defined by most recent event
    • Channel state defined by inflight messages
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

Each event in a distributed system changes the state of at least one of the entities, therefore changing the ____ of the distributed system.

A

State

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

In the context of distributed system state, what is a “run”?

A

A sequence of events. It can be actual, or observed.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

In the context of state in a distributed system, what is a cut?

A

The state of the system at a point in time.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

In the context of state in a distributed system, is a cut at the same point in time for all nodes?

A

No, it might be curved like this:

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

In the context of state in a distributed system, what is a consistent cut?

A

A snapshot that corresponds to some possible point in the execution which could have been represented by a consistent ordering of the events.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

In the context of state in a distributed system, what is a prerecording event?

A

Events that occurred before a cut

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
Q

In the context of state in a distributed system, what is a postrecording event?

A

Events that happen after a cut

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
26
Q

Why is it difficult to capture state in a distributed system?

A
  • Instant recording is impossible
    • We can’t assume a global clock
    • Networks have delays
      • They’re not deterministic
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
27
Q

How does the Chandy and Lamport snapshot algorithm to capture distributed system state work?

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
28
Q

Does the Chandy and Lamport algorithm promise to give a consistent state?

A

Yes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
29
Q

In the context of distributed system state, what is a “stable property”?

A

Iff when it becomes true in a state S , it remains true for all states S’ reachable from S

30
Q

What are some examples of Stable Properties?

A

Deadlock

Termination

31
Q

In the context of distributed system state, what is an “unstable property”?

A

If when it becomes true in a state S , it is possibly (but not necessarily) true for a later state reachable from S

32
Q

What is consensus?

A

Agreement among distributed processes about something, e.g. the outcome of a transaction

33
Q

What are the key properties of a consensus?

A

All non-faulty processes eventually decide on a value, all processes decide on a single value, and the value that’s decided on must have been proposed by some process

34
Q

In the context of consensus in distributed systems, what is an Admissible Run?

A

Run with 1 faulty processor and all messages eventually delivered (matches system model)

35
Q

In the context of consensus in distributed systems, what is a Deciding Run?

A

An admissible run where some non faulty processors reach a decision

36
Q

In the context of consensus in distributed systems, what is a Totally Correct Consensus Protocol?

A

When all admissible runs are also deciding runs

37
Q

In the context of consensus in distributed systems, what is a uni-valent configuration?

A

When only one decision is possible

38
Q

In the context of consensus in distributed systems, what is a bi-valent configuration?

A

When more than one decision is possible

39
Q

What is the FLP theorem?

A

The Fischer/Lynch/Patterson theorem says that in a system with one fault, no consensus protocol can be totally correct.

40
Q

How do existing consensus systems not violate the FLP theorem?

A

They change some of the assumptions and system properties

41
Q

In distributed systems, what is the goal of replication?

A

To make state available at more than one node

42
Q

What is the difference between active and standby (primary-backup) replication?

A

In active, both replicas can handle requests. In stand-by, one replica takes requests and others are kept consistent in preparation for a failover.

43
Q

What are the two main techniques to implement replication?

A

State replication, and replicated state machine

44
Q

What is the difference between state replication, and a replicated state machine?

A

State replication is where one replica processes a request, and then copies the new resulting state to other replicas.

Replicated state machine is where a copy of each operation is sent to and executed at each replica, to produce the same state update.

45
Q

What are the pros and cons of the state replication approach to replication?

A

Pro: no need to re-execute multiple times

Con: state may be large or hard to identify where all updates are

46
Q

What are the pros and cons of the replicated state machine approach to replication?

A

Pro: no need to send large state, operation logs may be smaller

Con: must re-execute, and it requires that the operations be deterministic

47
Q

How does chain replication work?

A

The head node receives a write request, and forwards it down a chain of replicas. This prevents the head replica from having to interact with every other replica.

48
Q

With chain replication, which replica serves read requests?

A

The tail replica (last one in the chain), because it ensures that all of the replicas have received the updates.

49
Q

What are the pros and cons of chain replication?

A

Pros: the leader write node is scalable, it can handle a lot of writes, and it makes strong consistency possible

Cons: many workloads are read heavy, and inefficient in that middle nodes may be underutilized

50
Q

How does CRAQ improve on chain replication?

A

It keeps both the old and new versions of data at each replica while the update propagates through the chain. If both versions are present, it checks with the tail to see if the new version has finished propagating. If it hasn’t, it returns the old version. This makes it possible to send reads to the replicas in the middle of the chain.

51
Q

When a fault is ____ it leads to an error

A

activated

52
Q

When an error ____ it becomes a failure

A

propogates

53
Q

What is a transient fault?

A

A fault that appears and then disappears

54
Q

What is an intermittent fault?

A

A fault that appears on and off

55
Q

What is a permanent fault?

A

A fault that stays once it occurs

56
Q

What is a fail-stop failure?

A

One or more components of the distributed system stop working/responding, like a crash

57
Q

What is a timing failure?

A

The system components behave outside of some timing expectations, which can interfere with things like timeouts

58
Q

What is an omission failure?

A

When some action is missing

59
Q

What are some practical ways of managing failures?

A

Detection, Removal, and Recovery

60
Q

What are a couple basic techniques for tracking state change so that a node can roll back to a known good state?

A

Checkpoints and logging

61
Q

What is the checkpointing technique for tracking state change so that a node can roll back to a known good state?

A

Where the node periodically saves its state, so that it can reload from there

62
Q

What is the logging technique for tracking state change so that a node can roll back to a known good state?

A

Where the node logs information about the operations performed, so that it can undo or redo changes. The log is kept on persistent storage, but the changes are smaller so less I/O time is needed. The downside is that recovery takes longer than it does in checkpointing.

63
Q

How can we combine the checkpointing and logging techniques for recovering a failed node?

A

Checkpoint every so often, and keep logs since the last checkpoint

64
Q

What are the three different approaches to determining when to capture a checkpoint?

A

Uncoordinated, coordinated, and communication-induced

65
Q

What is uncoordinated checkpointing?

A

When processes take checkpoints independently

66
Q

What are the risks/downsides of uncoordinated checkpointing?

A

You might have to go very far back before finding a combination of processes’ checkpoints that make up a consistent cut. aka domino effect

Processes may capture checkpoints that can’t be part of a consistent cut

May need to keep more than the most recent checkpoints

Creates the need to identify obsolete checkpoints

67
Q

What is coordinated checkpointing?

A

When the processes coordinate their checkpoints so that they get a consistent cut.

68
Q

What are the two kinds of communicaton-induced checkpoints?

A

Blocking: where an initiator uses 2-phase-commit or some other consensus algorithm to stop underlying application work and take a snapshot

Non blocking: Using the global snapshot algorithm, piggybacking the info instead of using a marker to eliminate the need for FIFO networking

69
Q

What is the fundamental tradeoff of logging vs checkpointing?

A

Spending compute vs I/O

70
Q

What is pessimistic logging technique for failure recovery?

A

Logging everything to persistent storage before allowing an event to propagate and commit

71
Q

What is optimistic logging technique for failure recovery?

A

Assuming that log will be persisted before a failure occurs, but making it possible to remove effects if abort needed

72
Q

What is causality-tracking logging technique for failure recovery?

A

Ensuring causally related events are deterministically recorded