Distributed Systems Flashcards

1
Q

What are the advantages for organizing concurrent computation as threads instead of processes?

A

Low overhead on creation
Low overhead for context switching
Allows for easier sharing of resources

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What are the advantages for organizing concurrent computation as processes instead of threads?

A

Allows for scalability and fault-tolerance (processes are more independent)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What is involved in converting a single, multi-threaded application into a collection of single-threaded processes. What could change about the application’s implementation and why?

A
  • Shared resources are not shared when converted to processes, so this will need to be managed in some way.
  • Communication will change to pipes/sockets. Computation that runs concurrently may need to be changed based on the way it is implemented with the threads.
  • Overhead will need to be considered on startup of multiple processes and how many can run at once.
  • Method of concurrent computation may need to be changed (barrier, signals, etc. instead of waiting on threads to finish)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Describe some low-level message framing strategies.

A

Fixed length

  • you know the exact size of each message
  • can easily determine if you have failed to receive the entire message
  • could increase network traffic as more messages need to be sent from component to component

Variable length

  • allows for more message flexibility
  • need a way to determine the end of the message
  • still may need to send multiple messages if a message is larger than the network allows or component allows
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What are quorum-based decisions?

A

A minimum number of votes before a distributed transaction is carried out. Typically this is more than half of the nodes on the system.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is a race condition?

A

The system’s behavior is dependent upon the timing of the logic. This can lead to inconsistencies (different results depending upon execution time) in results that the system provides.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is the difference between two-phase and strict two-phase locking?

A

Both acquire a lock before critical code is executed and release a lock when it has completed critical code. Strict two-phase disallows the release of a lock until a “moderator” sends a signal for release.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

How would switching from proxy to broker simplify an application?

A

It would not simplify the application and it could make it more complicated. A broker is more generic than a proxy and while it could lead to more flexibility fro messages within the system, it is typically more difficult to implement this kind of flexibility. You could potentially gain functionality if moving from a single proxy and nothing else to a single broker and nothing else.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

How would switching from broker to proxy simplify an application?

A

Less flexibility in messaging would make the component’s interfaces easier to implement. The proxy is all about access control so that is all it is required to do. You could potentially lose functionality if moving from a single broker with nothing else to a single proxy and nothing else.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What is the CAP theorem?

A

You cannot have all three in a distributed system

Consistency: Every read gets the most recent write
Availability: Every request receives a non-error response
Partition tolerance: The system continues to operate despite an arbitrary number of drops

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What are some issues with network-based communication?

A

Performance - latency and data transfer rate
Scalability - what hardware/software is needed to scale
Security - typical network-based security concerns

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What is RPC?

A

Remote Procedure Call

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What are some issues with RPC?

A
  • Call by value or call by reference?
    By value - a copy is passed
    By reference - the actual value is passed
  • Byte ordering for parameters (big or little endian)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What is the difference in big- and little-endian?

A

Big endian - most significant bytes are first

Little endian - least significant bytes are first

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What byte-ordering is important in distributed systems?

A

Network order - always big endian

Host byte order - depends on the machine

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Access point definition

A

a means of access to an entity

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

Address definition

A

a location for an access point

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

Name definition

A

a string that references an entity

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

What is flat naming?

A

Names are unstructured bit strings that have no obvious connection to their referents

Example
MAC address

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

What is structured naming?

A

Names composed of multiple, supporting names

Example
IP addresses

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

What is attribute-based naming?

A

Names derived from the attributes of their referents (directory service). Given a service category, find a node that provides that service.

Example
LDAP

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

How to provide fault tolerance in a distributed system?

A

Exploit redundancy

Hot swap (backup that is up to date with the current component) ready to go on failure
Distributed information (one or more components can fail but the system can be reconstructed from the remaining components)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

What is page-based DSM?

A

Distributed shared memory based on paging.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

Why used DSM over RPC?

A
  • Entire distributed system is logically one muti-threaded application
  • No need to worry about passing parameters from machine to machine
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
Q

Why use RPC over DSM?

A
  • Easier to deal with failure
  • Method calls should be the same across all components of the system
  • No need to worry about typical shared memory constructs
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
26
Q

Name the different architectural styles of distributed computing according to Phil.

A

Unstructured - styles with no theoretical limits on how entities are coupled

Structured - styles that organize entities in ways that limit what entities a given entity can interact with directly

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
27
Q

What are the pros and cons for unstructured styles?

A

Advantages

  • simplicity of system’s internal organization
  • ease of access among system’s entities

Disadvantages

  • increased difficulty of monitoring and maintaining system operation, due to range of possible interactions among the entities
  • worst case it can be a fully connected (mesh) network that requires O(n^2) connections for interactions
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
28
Q

What is a distributed hash table?

A
  • schemes that distribute content across objects using a function that’s computed from items’ keys
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
29
Q

Give 2 ways to represent an unstructured style.

A

object-based - unordered collection of components

resource-based - unordered collection of resources

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
30
Q

How are object-based styles implemented?

A
  • definitive version of object, including its state, positioned at a single, base node in a distributed network
  • to use an object, other nodes bind to it, then access it
  • the bind creates proxy representation of the remote object at local host
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
31
Q

What is a proxy?

A

a software construct that controls access to other objects; it redirects access to the remote object and relays remote object’s responses to local hosts

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
32
Q

Assess object-based style.

A

Advantages

  • can improve application transparency by blurring distinction between remote and local objects
  • natural model for session-based interaction with remote objects
  • natural model for encapsulating data

Disadvantages

  • can’t fully mask impact of network on communications (network failure, remote host failure, impracticality of duplicating large amounts of local host state at remote object site)
  • excessive use of objects can degrade host performance, creating computational bottlenecks
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
33
Q

Assess session-based interaction.

A

Advantages

  • simplifies an application’s coding
  • reduces flow of traffic across the network

Disadvantages

  • state information of the remote object must be maintained
  • lends itself to high duplication of data
  • increases complexity of the messages (every message must be self-contained so this may lead to reauthentication for every message in session)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
34
Q

How are resource-based styles implemented?

A
  • resources are identified via a standard naming scheme
  • services offer a uniform, message-based interface
  • messages are self-describing
  • resources maintain no state on callers
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
35
Q

Give an example of resource-based implementation.

A

REpresentational State Transfer (REST)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
36
Q

Describe REST.

A
  • resources named using Uniform Resource Indicators (URIs)
  • offers a CRUD-style, PUT-GET-POST-DELETE protocol for managing state
  • communication protocol (HTTPS) is still stateless
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
37
Q

Assess resource-based interaction.

A

Advantages

  • reduces performance impact on remote object host by eliminating need to for host to maintain state
  • simplifies service implementation by removing need for remote object host to recover from host and network failure

Disadvantages
- potentially complicates client-side implementation

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
38
Q

What are the different types of structured styles?

A
  • Horizontal
  • Commons-based
  • Layered
  • Hierarchical
  • Other miscellaneous
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
39
Q

Assess structured-based styles.

A

Advantages

  • facilitates maintaining system operation due to limiting of direct object interaction
  • accommodates networks where some indirect interactions are unsupported

Disadvantages

  • restrictions complicate system fabric by introducing need for routing of communications between indirectly connected entities
  • restrictions increase probability of network partitioning
40
Q

Describe horizontal styles.

A
  • roughly, a linked-list representation of objects
  • can be strict (left and right neighbor) or loose (left and right neighbor with O(1) shortcut to other objects in the architecture)
  • can be singly- or doubly-linked (one-way or two-way request flow)
41
Q

Assess horizontal styles.

A

Advantages
- conceptually and operationally simple

Disadvantages
- specialized applicability

42
Q

Give 2 examples of horizontal styles.

A

Linear: strict horizontal style where start and end node are not connected
- something like a message parser (decrypt, authenticate, remove duplication, etc.)

Ring: loose horizontal style where start and end node are connected with potential chords throughout the structure
- distributed hash table

43
Q

Give 2 ways to implement distributed hash tables.

A

Regular ring

  • you have nodes used for lookup
  • each lookup node has keys associated with it
  • if node is in the current node’s lookup table, great, jump to it
  • if not, go to next lookup node

Chordal

  • each chord contains a finger table that references other nodes on the network
  • to look for node n, visit the first node with a finger table
  • if node n is in the finger table, great, jump there
  • if not, visit the largest node in the finger table that does not exceed the desired node
  • repeat until node is found
  • if nodes in the finger table are all larger, hop to next node (linear search) until node is found or another valid finger table is found
44
Q

Describe commons-based style.

A
  • any style that is referentially decoupled
  • components interact via a central information hold
  • the central information center contains information about the computational environment
45
Q

Describe referential coupling.

A
  • referentially coupled requires entities to explicitly name each other when communicating
  • referentially decoupled entities do not name each other when communicating
46
Q

Describe 2 different ways for centralized commons to be organized.

A

Entity-oriented (temporally decoupled)

  • the commons functions as a shared store
  • items in the commons can be accessed indefinitely until removed

Event-oriented (temporally coupled)

  • items in the commons have a limited useful lifespan
  • each item can only be accessed by processes that are active when that item was added to the commons
47
Q

Describe layered styles.

A
  • components are organized into groups and stacked such that modules in a layer depend on modules beneath them
  • full/strict/closed (layer n is limited to reference n-1)
  • partial/open (layer n can reference all layers below)
48
Q

Assess layered styles.

A

Advantages

  • aids understanding when clear decompositions of components into layers is achievable
  • aids maintainability by isolating concerns by layer
  • aids portability and adaptability by allowing for substitutions by layer

Disadvantages

  • functionality can be difficulty to layer
  • layering potentially decreases performance
49
Q

Describe hierarchical styles.

A
  • generalization of layered style where system’s components interact along lines of communication that are structured as trees

Advantages

  • applicable to networks with limited element-to-element connectivity
  • strikes a balance between component proximity and number of connectors
  • intuitive
  • algorithmic techniques for tree manipulation are well-known
50
Q

Define architectural style.

A

strategy for organizing elements and their interactions

51
Q

Give an architectural example of an unstructured, object-based style.

A

Unstructured Peer

52
Q

Describe an unstructured peer system.

A
  • system organized as a graph of random connections

- requests and data flow through system using various ad-hoc techniques

53
Q

Discuss 4 peer system routing strategies.

A

Flooding - each node distributes all messages to all neighboring nodes

Random walk - messages follow random pathways through the network

Anti-entropy - push-pull on random nodes

Gossiping - nodes distribute messages to their neighbors with a given probability, either fixed or based on history (informed gossip)

54
Q

Assess unstructured peer.

A

Advantages

  • readily distributes over a network
  • easy to scale
  • easy to reuse

Disadvantages

  • difficult to analyze and debug due to relative lack of structure
  • difficult to provide interoperability in networks of heterogeneous components
55
Q

Give an architectural example of an unstructured, resource-based style.

A

Microservice architecture

56
Q

Describe a microservice architecture.

A
  • graph of small services defined by their APIs

- each microservice has arbitrary connections to the others

57
Q

Assess microservice architectures.

A

Advantages

  • low coupling, autonomous, flexible
  • maps well to DevOps
  • independently scalable
  • easier deployment per service
  • system should be more available

Disadvantages

  • end-to-end testability is more difficult
  • logging and overall performance monitoring is harder
  • can get very complex if there are many components
  • performance suffers, due to latencies from repeated message passing
58
Q

Give an example of a horizontal architectural style.

A

Pipe-and-filter

59
Q

Describe a pipe-and-filter architecture.

A
  • upstream components accept data, process it, and pass it downstream
  • architecture must coordinate data flow (asynchronous and synchronous flows)

Example - compilers

60
Q

Assess pipe-and-filter architectures.

A

Advantages

  • simple to implement
  • lends itself toward reusable components
  • easy to maintain
  • provides a natural basis for concurrency

Disadvantages

  • error handling (no global state, restart is often difficult)
  • data flow must be uniform along pipes for best performance
  • forces batch mode processing
61
Q

Give 2 examples of data-oriented architectures.

A

Repository and tuple space

62
Q

Describe a repository architecture.

A
  • supports communication by multiple processes through a shared, persistent store

Examples - files, registry keys, databases

63
Q

Describe a tuple space architecture.

A
  • specialized, distributed repository architecture when processes store, read, and retrieve tuples
64
Q

Give 3 examples of event-oriented architectures.

A

Blackboard, publish-subscribe, service-oriented

65
Q

Discuss blackboard architecture.

A
  • like repository, but with active central store (changes to blackboard trigger events in processes)
  • can either have central (events trigger in blackboard) or distributed (events trigger in processes) control
66
Q

Discuss publish-subscribe architecture.

A
  • like a hybrid between blackboard and repository patterns
  • publishers store data to central store (looks like a write-only repository to publishers)
  • subscribers register for data from server which pushes messages of interest on receipt (looks like a read-only repository to subscribers)
67
Q

What are the advantages of publish-subscribe architectures?

A
  • decouples publishers from subscribers (publishes can publish independently of subscribers, and subscribers can obtain content asynchronously)
  • can support anonymity of both publishers and subscribers
68
Q

Describe a service-oriented architecture.

A
  • created to support integration of heterogeneous systems

- allows multiple peer components to interact via a “software backplane”

69
Q

Give 4 examples of layered architectures.

A

Client-server, tiered computing, broker, and proxy

70
Q

Describe a broker.

A
  • provided by some middleware
  • allows for dynamic binding between client and service provider at beginning of interaction

Interaction sequence

  • before interaction, the server registers with the broker
  • a client requests a service by type
  • broker responds with the address of a server that can provide the requested service
  • client and server communicate directly after that
71
Q

Describe a proxy.

A
  • all communication occurs via an intermediate process

- shields server(s) from client

72
Q

Compare client-server, broker, and proxy

A

Client-server vs. broker, proxy

  • simpler
  • faster: no initial or intermediate communication

Broker

  • vs. client-server: more flexible and scalable (broker can route all requests to any server at the start of a session)
  • vs. proxy: faster (less overhead for multi-message session)

Proxy

  • vs. client-server: more flexible and scalable (allows proxy to route any requests to any server at any point during a session)
  • vs. broker: more secure (hides server(s) from client throughout session)
73
Q

Discuss service provisioning.

A

3 types

  • Infrastructure as a Service (IaaS): network-accessible resources that function as machines
  • Platform as a Service (PaaS): network-accessible resources that function as operating systems
  • Software as a Service (SaaS): network-accessible resources that function as individual services

Management

  • On-premises: you manage everything
  • IaaS: you manage everything up to the OS
  • PaaS: you manage the applications and data
  • SaaS: you manage nothing (only use the service)

Standardization

  • IaaS: instruction sets (achieved through virtualization)
  • PaaS: application environments; Linux, Windows, etc.
  • SaaS: applications
74
Q

What is the difference between virtual machines and virtualization?

A
  • virtual machines do not mirror read hardware

- virtualization mirrors real hardware

75
Q

What is the difference between asynchronous and synchronous network communication?

A
  • with synchronous communication, the client blocks further processing until the request is handled (wastes cycles but the handling of the request – success/failure – is easily known)
  • with asynchronous communication, the client spins a thread/process to make the request while continuing other processing (no wasted cycles but the handling of the request is not easily known)
76
Q

What is the TCP/IP model?

A
Application
Transport
Internet Protocol
Network Access
Physical
77
Q

What is marshalling/unmarshalling?

A
  • the serialization/deserialization of objects/data
78
Q

Describe middleware in a distributed system.

A
  • provides extra support for distributing functionality across a network, typically through providing adapters for relaying messages

Example - MPI

79
Q

What is the main issue with delivering multimedia in a distributed system?

A
  • how does the system ensure that the flow of data is timely enough to meet user expectations for presentation quality in real-time
80
Q

What are some basic options for reducing communications overhead in a distributed system?

A
  • reduce the overall number of messages
  • reduce time to send via caching
  • eliminate needless content
81
Q

What is the difference between authentication and authorization?

A
  • authentication provides access control to the overall system via passwords, usernames, digital fingerprints, etc.
  • authorization provides access control to content/resources within the system via user groups
82
Q

Describe hierarchical routing.

A
  • overlay routing network arranged in the form of a tree
  • tree is balanced (as much as possible)
  • to find a node on the network, start at leaves and work up
83
Q

Describe asymmetric (public-key) cryptography.

A
  • uses two keys
  • public key can be distributed to anyone
  • private key only known by the holder
  • to encrypt a message, the sender uses the receiver’s public key
  • to decrypt a message, the receiver uses their private key
84
Q

Describe coordination in a distributed system.

A
  • ensure that relationships between components hold true at specified points in time
  • two different strategies (centralized and distributed)
85
Q

What are the two basic approaches to time in a distributed system?

A

Absolute (physical) time - using time to coordinate action

Relative (logical) time - using exchanges of messages among cooperating processes to sequence actions

86
Q

Discuss some difficulties with physical time.

A
  • difficult to measure
  • transmissions of time are subject to jitter
  • clocks drift, creating skew
87
Q

Define jitter.

A
  • variability in message latency
88
Q

Define latency.

A
  • time between message send and receipt
89
Q

Define clock drift.

A
  • tendency of a clock to run at a different rate from a reference clock
90
Q

Define clock skew.

A
  • difference between a reference clock and other clocks
91
Q

How do we adjust for jitter and skew in time?

A
  • Cristian’s algorithm - a.k.a. Network Time Protocol (NTP)
  • Berkeley algorithm
  • Reference Broadcast Synchronization (RBS)
  • Google’s TrueTime
92
Q

Describe Cristian’s algorithm.

A
  • host requests UTC from time server, remembers as T1
  • time server sends T2 (receipt of request) and T3 (when response was sent)
  • T4 is when the response was received by the host
  • host adjusts T3 by ((T2 - T1) + (T4 - T3))/2
93
Q

What is mutual exclusion?

A
  • limiting number of processes that can access a pool of resources at any one time
94
Q

What are two basic strategies for assuring mutual exclusion?

A
  • token-based: process seeking access acquires a token and keeps access until surrendering token
  • permission-based: process seeking access obtains permission before acting
95
Q

What are two basic strategies for implementing each mutual exclusion assurance measure?

A
  • centralized: one coordinating process manages access to resource
  • decentralized: competing processes coordinate access together
96
Q

What are the two basic levels of consistency?

A
  • strict consistency: each read retrieves the most recent write to the resource
  • eventual consistency: reads are not guaranteed to contain the most recent writes to the resource but they will eventually have the most recent updates over time