Algo Related Flashcards
Why is it impossible in distributed systems to judge the exact time of events occuring on different machines?
Each machine has a seperate processor(and corresponding clock)
it is not possible to sync these clocks preceisely and keep them synchronize
- propagation dlauy for “start-clock” are different on different machines
- Slight variations existing in all crystal frequencies
- Systems crashes are difficult to deal with and would require resynchronization
What impact can lack of synchronization have in distributed sytems
Fairness: We can’t provide service in strict order of request
Debugging: It is not possible to judge the order of distributed events so, what about race conditions?
What is “logical time”
We don’t need a precise notion of time (because we can’t measure and synchronize it), we are really only interest in the relative order of events.
This leads to the notion of “logical time”, which specifys the relative order of related events of importance to some specific task (debugging for example)
What is Lamports “happens-before” relation?
Lamport defines the “Happens-before” relation A->B (read “A happens before B”)
Where A and B are events in the same process and A executed before B, as judged by the local system clock, then A -> B
If A is the event of sending a message to some process which receives it via event B, then A -> B. (a message cannot be received before it is sent)
Transitivity holds, A->B and B->C then A->C
If two events are not related by “happens-before” (because one depends on another), then they are said to occur “concurrently”.
- We can’t judge that they don’t happen concurrently so must assume they may
Lamport’s logical clock is implemented by “piggy backing” timestamp values on outgoing messages and incorporating the maintenance of logical time into the messaing process.
This related to only certain events in processes on different machines that are communication.
Happens-before is therefore only a partial order on the events in processes running on different machines
What is logical clock and what is it for?
It solves the problem of judging orderings on different machines.
- E.g. A occurs on M1 and B occurs on M2 then we want to be able to answer the question “Did A come before B, B before A, or were they potentially concurrent?
Messaging between processors indeuces specific event orderings allows us to use time stamped messages to answer the above question
Each machine will keep a count or “Logical clock” (LCi)
The LC on each machine is incremented whenever an event of interest occurs and also adjustted when a message is recieved from a process running on another machine.
- Example: Mi receives a message from machine Mj and LCi=50 while LCj=200 then after receipt of the message, LCi must be set to 201
We aren’t concerned that we skipped over some LC values;: All we are interested in is even order.

What is the alternative to Lamports Happens-before relation
Colin Fidge correclt answers “could to events in different processes have occurred concurrently?”
However the cost is a significant increase in the amount of “logical time” information exchanged and involves piggybacking a process’ idea of time in all other machines on every message.
What is “reaching consensus”?
Coming to agreement (replicated servers)
Controlling when some process can do something (with the agreement of others)
What are the 3 algorithms concerns with Distributed Consensus?
- (almost) Fully Distributed
- Centralized Server
- Token Passing
What are the two properties used to judge the quality of different distributed algorithms?
Performance and Reliability
In general how does a Fully Dsitributed Consensus algorithm work?
When a machine wants to do soemthing it sends a time stamped message to every other related machine and waits for all to reply before acting.
When a machine recieves a request it:
- replied immediately if it agrees and has no desire to act itself, or
- replies immediately iif it wants to act but the recieved timestamp is small (ie.older) than it’s timestamp
- deferes its reply until it has finished acting if the received timestamp is larger (ie. younger/more-recent) than it’s timestamp.

In the distributed consensus, what is the cost of reaching consensus
There are N machines.
To act, a machine sents N-1 messages
This machine waits to recieve N-1 messages back
This is a total of 2N-2 messages
In general what is Central Server based Consensus?
When a machine wants to act it sends a request to a centralized server and waits for a reply.
Once it recieves the reply it acts and then sends a “release” message back to the server
When the centralized server recieves a request message, it check if any other machine is acting. If there is a server acting, it queues the request (behind any other request) otherwise if a server is not acting it give the go message.
When the central server recieves a release emssage, the server looks for the next queued request and gives that server the go ahead.

What is the cost for Central server based consensus?
Assuming N processes
Each process must send a request to a server
Must recieve a “go ahea” from the server
Must sent a release message back when it is done
That is 3 messages.
What is the algo for dsitributed consensus called :
Concensus by Token Passing
In order to act a machine must hold a token.
The token constanly circulates from machine to machine in a round robin fashion
A machine waits for the token before acting (if it doesn’t have the token already)
Once the machine finishes it passes the token on to the next machine in the sequence

What is the cost for concensus by token passing?
It’s harder to assess than the other algorithms and each process pays a fixed overhead regardless of whether or not it is accessing its critical section.
Whate the reliability / fairness concerns for each of the concensus algorithms?
Fully Distributed
- If a process fails, this must be detected so that any waiting on replies from it may continue
- Who will do this? What about anonymity?
Central Server Based Concensus
- Reliability
- If process fails, this must be detected so that the server can update its stored state information
- What if the server fails (“elect” a new server)
Token Passing
- Reliability
- What if a machine fails or the token gets lost, or worse duplicated?
- Fairness
- What about the service order that is enforces
- What if a machine holds the token too long?
What is a centralized server often referred to?
A “coordinator”
What are election algorithms for?
After detecting a failture of a distributed system component we must somehow recover, a key challenge is the centralized server.
If that fails we need to run an algorithm to elect a new one.
What are the two classic election algorithms?
Bully Algo
Ring Algo
How does the bully algorithm work?

What is the Ring Algorithm and how does it work?
It basically goes in a ring, until all the priorities are recieved and then selects the one with the highest.

When any process notices that the coordinator is not functioning, it builds an ELECTION message containing its own process number and sends the message to its successor. If the successor is down, the sender skips over the successor and goes to the next member along the ring, or the one after that, until a running process is located. At each step, the sender adds its own process number to the list in the message.
Eventually, the message gets back to the process that started it all. That process recognizes this event when it receives an incoming message containing its own process number. At that point, the message type is changed to COORDINATOR and circulated once again, this time to inform everyone else who the coordinator is (the list member with the highest number) and who the members of the new ring are. When this message has circulated once, it is removed and everyone goes back to work.
Byzantine consensus is far more resilient than normal consensus mechanisms. How, generally speaking, is this achieved? Why might it be necessary in a distributed system?
Any algorithm to solve the Byzantine general’s problem must have the following two properties:
- All loyal generals must come to the same decision
- A small number of traitorous generals cannot cause the loyal generals to adopt a bad plan
We need to have enough valid information that it outnumbers traitorous information so that the system can reach a consensus.
We need to make 3 assumptions
- Every message is delivered correctly
- Reasonable: reliabile messaging systems exist
- The reciever of a message knows who sent it
- Reasonable: cryptographically sign the messages
- The abscence of a message can be deterected
- Resonable: just use timeouts
The generals need to reach a consensus decision, using enough correct information that the consensus reached is actually correct.
In short, as long as we have greater less than 1/3 of traitors to loyals (3M+1) we can reach a consensus of correct information.
Each general takes a turn as leutenant and sends the orders to each general. This goes around in a round robin type fashion, each redistributing information. As long as there are no more than 1/3 traitors the correct information will outwheigh the incorrect information and thus reaching a concensus.
This is necessary in a distributed system because we have different parts of the systems/software running on seperate remote machines. Either by a fault or intentionally, it is possible for these parts of the system to output incorrect information, thus causing a failure of the process to be achieved. depending on the implementation, it may be necessary to plan for this type of problem to ensure correct implementations of our software.