Distributed Systems Flashcards

1
Q

What is a distributed system?

A

A collection of independent and dynamic components, including hardware, software components and web services which work together and appear to users as a single coherent system. They help us to share resources of components through communication.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is middleware?

A

Software technology that enables distributed system components to work together and communicate as if they were virtually non-distributed. It hides DS developers from low-level networking details, and provides abstraction and infrastructure for constructing distributed applications.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What is a remote procedure call?

A

Where we adopt the traditional paradigm of splitting a system into procedures, and mask remote function calls as being local instead. This is usually implemented with message passing. The programmer executes a remote function without coding the network communication; uses C.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

How does RPC work?

A

The caller sends arguments to the client-side stub, which marshals them, generates ID, starts timer, sends message to server-side stub, which unmarshals and feeds the arguments to the remote function.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Limitations of RPC

A

Synchronous request/reply interaction means that client may be clocked for a long time if server is overloaded, and slow clients may delay servers. Host information is required, which means location transparency cannot be facilitated. It’s also not object oriented.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Components of RPC

A

Server: defines the service interface using an interface definition language, which specifies names, parameters, and types for all client-callable procedures

Stub compiler: reads the IDL declarations and produces two stub functions, one server-side and the other client-side

Linking: server programmer implements the service’s functions and links with the server-side stubs, client programmer implements the client program and links it with client-side stubs

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is Object Oriented Middleware?

A

Remote objects are visible through remote interfaces, RMI masks remote objects as being local using proxy objects, skeleton object directs incoming calls from clients to the appropriate object. The object request broker identifies / discovers remote objects.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Remote Method Invocation

A

RMI originated from Java to allow object-to-object communication among Java objects for realizing a distributed system. RMI allows us to distribute our objects on various machines and invoke methods on the objects located on remote sites.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What is a registry

A

A registry is a running process on a host machine, which maintains names of remote objects and helps with looking up those objects. Servers can register their objects, clients can find server objects by name and obtain stubs for them.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What is message-oriented middleware?

A

Communication is done using messages, which are stored in message queues. Message servers decouple client and server.
Asynchronous interaction, meaning client and server are only loosely coupled compared to RMI and COBRA. Messages are queued and may be processed/filtered/transformed by the message server. The queues are also in persistent storage, and may be processed by intermediate message servers

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Properties of OOM

A

Follows object-oriented programming model, synchronous request/reply interaction, location transparency (the ability to access objects without the knowledge of their location)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What are web services?

A

Web services provide a service interface enabling users to communicate with its servers through the internet. The service interface describes operations of web services. They use web-based protocols based on HTTP, designed to work over the public internet. This allows these protocols to traverse firewalls and work in a heterogeneous environment. They count as a middleware technology.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What is WSDL

A

Web Services Description Language: interface description for web services, also has details of communication method and URL

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What is replication and what are the two different types?

A

Replication provides multiple copies of the same data or functionalities (services) in a distributed system. It improves system capabilities in terms of performance, availability and load distribution.

Computation (service) replication: multiple instances of the same functional process are executed, but may run on different hardware and be implemented by different algorithms.

Data replication: same piece of information is being stored on multiple devices.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

How are incoming requests received when we have replication?

A

The front-end received the request, forwards it to replica servers
Rs accept a request and decide the ordering of a request relative to other requests
They process the request
They reach consensus on the effect of the requests
They reply to the front end, optionally processing the response beforehand

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Fault-tolerant services

A

Provide a correct service despite up to N process failures
Each replica is assumed to behave according to the specification of the distributed system, when they have not crashed

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

When is a service based on replication correct?

A

If it keeps responding despite failures (failure transparency), and if clients can’t tell the difference between the service they obtain from an implementation with replicated data and one provided by a single correct replica manager.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

How are incoming requests received when we have passive replication?

A

Request: a FE issues the request, containing a unique identifier, to the primary R
The primary processes each request atomically in the order in which they were received
It checks the unique ID, and resends the response if it has already done that request
The primary executes the request and stores the response
If the request is an update, the primary sends the updated state, the response, and the unique identifier to all the backups, which then send an acknowledgement
Response: the primary responds to the FE

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

Advantages of passive replication?

A

This type of system can survive up to n replica crashes, when the system comprises of n+1 replicas. It requires very little front-end functionality, only needing to lookup new primary replica when the current one is not available.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

What is active replication?

A

Where the Rs are all state machines playing the same role, organised as a group. They all start in the same state and perform the same task in the same order so that their state remains identical (synchronisation).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

How are incoming requests managed when we have active replication?

A
  • Request: a FE attaches a unique ID and uses totally ordered reliable multicast to send requests to Rs. At worst, FE crashes but never sends requests in parallel.
  • Coordination: the multicast delivers requests to all the Rs in the same order
  • Every R executes the request, and the ID is put in the response
  • No agreement is requird because they all execute the same operations in the same order
  • The FEs collect responses from Rs. If the FE is only trying to tolerate crash failures, it just gives the client the first response
22
Q

What is a Byzantine fault?

A

An arbitrary fault that occurs during the execution of an algorithm when multiple servers are involved to produce results based on the same client request. It’s when inconsistent results are obtained from the servers because of omission failures (crashes, failing to receive/send stuff) or commission failures (processing a request incorrectly, corrupting local state, sending an incorrect response).

23
Q

How does active replication mask Byzantine failures?

A

The system can mask up to n Byzantine failures if the system incorporates at least 2n+1 replicas. This is because the front end waits until it has collected n+1 identical responses and then passes that response back to the client, ignoring the others.

24
Q

Read-only requests and active replication

A

If there’s a read-only request, then the front end might only send it to one replica. We lose fault tolerance, but remain sequentially consistent and can easily mask replica failure by just passing the read-only request to another replica.

25
Q

What are deadlocks?

A

When two transactions are both stuck waiting for each other to give up the lock on an item

26
Q

Why is centralised deadlock detection a bad idea?

A

It depends on a single server, suffering from poor availability, lack of fault tolerance and no ability to scale. If the global graph is collected less frequently, deadlocks may take longer to be detected. There are also phantom deadlocks, which are detected but aren’t really deadlocks. The global deadlock detector requires time to receive local wait-for graphs for constructing the global one.

27
Q

What is edge chasing?

A

Where each server involved attempts to find cycles in the wait-for graph by forwarding messages called probes. A probe message consists of transaction wait-for relationships representing a local path in the global wait-for graph.

28
Q

What does it mean if operations are linearisable in distributed systems?

A

The interleaved sequence of operations meets the specification of a (single) correct copy of the objects.
The order of operations in the interleaving is consistent with the real times at which the operations occurred in the actual execution.

29
Q

Modeling of consistency control

A

An object is coherent if the value returned by a read operation is always the value that the user expected
The consistency model defines rules for the apparent order and visibility of updates, and it is a continuum with trade-offs

30
Q

What is strict consistency?

A

The strongest consistency model, where any read on a data item X returns a value corresponding to the result of the most recent write on X. This requires an absolute global time and practically can only be implemented within one single machine.

31
Q

What is sequential consistency?

A

For any transaction, there is some interleaving of the series of operations that satisfies these criteria:
- The interleaved sequence of operations meets the specification of a (single) correct copy of the objects
- The order of operations in the interleaving is consistent with the program order in which each individual process executed them. For each process, the interleaving of operations has the same order for the process’ operations as in the process description.
MEETS SPEC, RIGHT ORDER
No reference to most recent time, absolute global time does not play a role

32
Q

When is an execution of operations sequentially consistent?

A

It can be rearranged into a sequence that respects the order of each process, and each read operation returns the value of the last preceding write operation over the same variable.
If one process has R(x)b and R(x)a, and the other has R(x)a and R(x)b then it’s not sequentially consistent

33
Q

Causal Consistency

A

If event B is caused or influenced by an earlier event A, causality requires that everyone first see A and then B
Concurrent: operations that are not causally related. If W(x)b comes before W(x)c and there is no intervening read operation, then we’re allowed to have:
R(x)c R(x)b
R(x)b R(X)c

34
Q

Time redundancy for fault tolerance

A

Perform same operation multiple times
Detects temporary faults but not permanent ones, also impacts system performance

35
Q

Component redundancy

A

Introduce two or more independent running components which provide the same functionalities
N-version programming implements multiple versions of the program

36
Q

Information redundancy

A

Encode outputs with error detecting or correcting code (e.g. parity check, checksum)
Less hardware required than replicating module, supports fault detection
Added complexity, limited fault recovery

37
Q

Backward and forward recovery

A

Backward recovery: move the system back to a failure-free state
Forward recovery: find a new state from which the system can continue operation

38
Q

Checkpointing for backward recovery

A

Each DS component periodically saves its state, which contains sufficient information to restart component execution. Global checkpoints are when every component makes a local checkpoint. The most recent one is called the recovery line.

39
Q

Types of Checkpointing

A

Uncoordinated checkpointing
Each process takes its checkpoints independently

Coordinated checkpointing
Process coordinate process checkpoints in order to save a system-wide (global) consistent state, avoids domino effect

Communication-induced checkpointing
Force each process to take checkpoints based on information piggybacked on the application messages it receives from other processes (add forced checkpoints -> recovery line)

40
Q

Domino effect

A

Cascaded rollback which causes the system to roll back too far

41
Q

Implementing forward recovery

A

Switch from a failed to a non-failed component executing the same task-code
Error compensation is continuously applied,e.g., Voting schemes (address Byzantine fault)
Simulate application response (e.g. in online gaming)

42
Q

Measuring reliability and availability

A

Reliability: mean time between failures
Availability: percentage of time it’s ready to use
A = MTBF/(MTBF+MTTR)
MTTR = mean time to repair

43
Q

Load balancing with JSQ

A

Joining the server with the shortest queue. This has a high implementation cost for large systems like datacenters.

44
Q

Power-of-d algorithm

A

The algorithm chooses d out of N servers, and route the job to the least loaded one

45
Q

Join-idle-queue algorithm

A

Servers can inform the front-end or loadbalancer when they have no jobs (become idle). Then loadbalancer can send an incoming job to one of these idle servers.

46
Q

Estimating the load

A

Queue length of waiting tasks
CPU and CPU utilisation
Storage read/write bandwidth utilisation
Network bandwidth utilisation
Application dependent factors

This information is collected centrally

47
Q

Layer 4 routing vs Layer 7 routing

A

4: Server selection is purely based on information from the IP header
7: Examine a request at the application level and select a server accordingly.

48
Q

Types of load distribution algorithm

A

Static: decisions are hard-coded into algorithm, requires a priori knowledge of system

Dynamic: make decision during runtime based off system states

Adaptive: enhance dynamic approach by allowing a choice of decision algorithms and the frequency of collecting load information

49
Q

Transfer and selection policies

A

Transfer policy: decides whether a server needs to transfer tasks
Selection policy: determines which tasks to transfer

50
Q

Location and information policies

A

Location policy: Decides which receiving server for transferring a task
Information policy: decides which information to collect. Demand-driven: server collects state of other servers when it becomes sender/receiver, or periodic

51
Q

Problems with sender-initiated algorithm

A

Can become unstable at high loads, hard to find receivers, increased polling activity may make the system more unstable