Algo Related Flashcards

Question 1

Q

Why is it impossible in distributed systems to judge the exact time of events occuring on different machines?

Answer

A

Each machine has a seperate processor(and corresponding clock)

it is not possible to sync these clocks preceisely and keep them synchronize

propagation dlauy for “start-clock” are different on different machines
Slight variations existing in all crystal frequencies
Systems crashes are difficult to deal with and would require resynchronization

Question 2

Q

What impact can lack of synchronization have in distributed sytems

Answer

A

Fairness: We can’t provide service in strict order of request

Debugging: It is not possible to judge the order of distributed events so, what about race conditions?

Question 3

Q

What is “logical time”

Answer

A

We don’t need a precise notion of time (because we can’t measure and synchronize it), we are really only interest in the relative order of events.

This leads to the notion of “logical time”, which specifys the relative order of related events of importance to some specific task (debugging for example)

Question 4

Q

What is Lamports “happens-before” relation?

Answer

A

Lamport defines the “Happens-before” relation A->B (read “A happens before B”)

Where A and B are events in the same process and A executed before B, as judged by the local system clock, then A -> B

If A is the event of sending a message to some process which receives it via event B, then A -> B. (a message cannot be received before it is sent)

Transitivity holds, A->B and B->C then A->C

If two events are not related by “happens-before” (because one depends on another), then they are said to occur “concurrently”.

We can’t judge that they don’t happen concurrently so must assume they may

Lamport’s logical clock is implemented by “piggy backing” timestamp values on outgoing messages and incorporating the maintenance of logical time into the messaing process.

This related to only certain events in processes on different machines that are communication.

Happens-before is therefore only a partial order on the events in processes running on different machines

Question 5

Q

What is logical clock and what is it for?

Answer

A

It solves the problem of judging orderings on different machines.

E.g. A occurs on M₁ and B occurs on M₂ then we want to be able to answer the question “Did A come before B, B before A, or were they potentially concurrent?

Messaging between processors indeuces specific event orderings allows us to use time stamped messages to answer the above question

Each machine will keep a count or “Logical clock” (LC_i)

The LC on each machine is incremented whenever an event of interest occurs and also adjustted when a message is recieved from a process running on another machine.

Example: M_i receives a message from machine M_j and LC_i=50 while LC_j=200 then after receipt of the message, LC_i must be set to 201

We aren’t concerned that we skipped over some LC values;: All we are interested in is even order.

Question 6

Q

What is the alternative to Lamports Happens-before relation

Answer

A

Colin Fidge correclt answers “could to events in different processes have occurred concurrently?”

However the cost is a significant increase in the amount of “logical time” information exchanged and involves piggybacking a process’ idea of time in all other machines on every message.

Question 7

Q

What is “reaching consensus”?

Answer

A

Coming to agreement (replicated servers)

Controlling when some process can do something (with the agreement of others)

Question 8

Q

What are the 3 algorithms concerns with Distributed Consensus?

Answer

A

(almost) Fully Distributed
Centralized Server
Token Passing

Question 9

Q

What are the two properties used to judge the quality of different distributed algorithms?

Answer

A

Performance and Reliability

Question 10

Q

In general how does a Fully Dsitributed Consensus algorithm work?

Answer

A

When a machine wants to do soemthing it sends a time stamped message to every other related machine and waits for all to reply before acting.

When a machine recieves a request it:

replied immediately if it agrees and has no desire to act itself, or
replies immediately iif it wants to act but the recieved timestamp is small (ie.older) than it’s timestamp
deferes its reply until it has finished acting if the received timestamp is larger (ie. younger/more-recent) than it’s timestamp.

Question 11

Q

In the distributed consensus, what is the cost of reaching consensus

Answer

A

There are N machines.

To act, a machine sents N-1 messages

This machine waits to recieve N-1 messages back

This is a total of 2N-2 messages

Question 12

Q

In general what is Central Server based Consensus?

Answer

A

When a machine wants to act it sends a request to a centralized server and waits for a reply.

Once it recieves the reply it acts and then sends a “release” message back to the server

When the centralized server recieves a request message, it check if any other machine is acting. If there is a server acting, it queues the request (behind any other request) otherwise if a server is not acting it give the go message.

When the central server recieves a release emssage, the server looks for the next queued request and gives that server the go ahead.

Question 13

Q

What is the cost for Central server based consensus?

Answer

A

Assuming N processes

Each process must send a request to a server

Must recieve a “go ahea” from the server

Must sent a release message back when it is done

That is 3 messages.

Question 14

Q

What is the algo for dsitributed consensus called :

Concensus by Token Passing

Answer

A

In order to act a machine must hold a token.

The token constanly circulates from machine to machine in a round robin fashion

A machine waits for the token before acting (if it doesn’t have the token already)

Once the machine finishes it passes the token on to the next machine in the sequence

Question 15

Q

What is the cost for concensus by token passing?

Answer

A

It’s harder to assess than the other algorithms and each process pays a fixed overhead regardless of whether or not it is accessing its critical section.

Question 16

Q

Whate the reliability / fairness concerns for each of the concensus algorithms?

Answer

Study These Flashcards

A

Fully Distributed

If a process fails, this must be detected so that any waiting on replies from it may continue
Who will do this? What about anonymity?

Central Server Based Concensus

Reliability
- If process fails, this must be detected so that the server can update its stored state information
- What if the server fails (“elect” a new server)

Token Passing

Reliability
- What if a machine fails or the token gets lost, or worse duplicated?
Fairness
- What about the service order that is enforces
- What if a machine holds the token too long?

Question 17

Q

What is a centralized server often referred to?

Answer

Study These Flashcards

A

A “coordinator”

Question 18

Q

What are election algorithms for?

Answer

Study These Flashcards

A

After detecting a failture of a distributed system component we must somehow recover, a key challenge is the centralized server.

If that fails we need to run an algorithm to elect a new one.

Question 19

Q

What are the two classic election algorithms?

Answer

Study These Flashcards

A

Bully Algo

Ring Algo

Question 20

Q

How does the bully algorithm work?

Answer

Study These Flashcards

A

Question 21

Q

What is the Ring Algorithm and how does it work?

Answer

Study These Flashcards

A

It basically goes in a ring, until all the priorities are recieved and then selects the one with the highest.

When any process notices that the coordinator is not functioning, it builds an ELECTION message containing its own process number and sends the message to its successor. If the successor is down, the sender skips over the successor and goes to the next member along the ring, or the one after that, until a running process is located. At each step, the sender adds its own process number to the list in the message.

Eventually, the message gets back to the process that started it all. That process recognizes this event when it receives an incoming message containing its own process number. At that point, the message type is changed to COORDINATOR and circulated once again, this time to inform everyone else who the coordinator is (the list member with the highest number) and who the members of the new ring are. When this message has circulated once, it is removed and everyone goes back to work.

Question 22

Q

Byzantine consensus is far more resilient than normal consensus mechanisms. How, generally speaking, is this achieved? Why might it be necessary in a distributed system?

Answer

Study These Flashcards

A

Any algorithm to solve the Byzantine general’s problem must have the following two properties:

All loyal generals must come to the same decision
A small number of traitorous generals cannot cause the loyal generals to adopt a bad plan

We need to have enough valid information that it outnumbers traitorous information so that the system can reach a consensus.

We need to make 3 assumptions

Every message is delivered correctly
- Reasonable: reliabile messaging systems exist
The reciever of a message knows who sent it
- Reasonable: cryptographically sign the messages
The abscence of a message can be deterected
- Resonable: just use timeouts

The generals need to reach a consensus decision, using enough correct information that the consensus reached is actually correct.

In short, as long as we have greater less than 1/3 of traitors to loyals (3M+1) we can reach a consensus of correct information.

Each general takes a turn as leutenant and sends the orders to each general. This goes around in a round robin type fashion, each redistributing information. As long as there are no more than 1/3 traitors the correct information will outwheigh the incorrect information and thus reaching a concensus.

This is necessary in a distributed system because we have different parts of the systems/software running on seperate remote machines. Either by a fault or intentionally, it is possible for these parts of the system to output incorrect information, thus causing a failure of the process to be achieved. depending on the implementation, it may be necessary to plan for this type of problem to ensure correct implementations of our software.

Algo Related Flashcards

(22 cards)