Distributed Systems Flashcards
What are the advantages of organizing concurrent computation as threads instead of processes?
Low overhead on creation
Low overhead for context switching
Allows for easier sharing of resources
What are the advantages of organizing concurrent computation as processes instead of threads?
Better scalability and fault tolerance: processes are more independent, so one can fail without taking the others down, and they can be spread across machines
What is involved in converting a single, multi-threaded application into a collection of single-threaded processes? What could change about the application’s implementation, and why?
- Resources that threads shared implicitly are no longer shared between processes, so shared state must be managed explicitly.
- Communication changes to pipes or sockets. Computation that ran concurrently may need restructuring, depending on how the threaded version was implemented.
- Startup overhead for multiple processes, and how many can run at once, must be considered.
- The synchronization method may need to change (barriers, signals, etc. instead of waiting on threads to finish).
Describe some low-level message framing strategies.
Fixed length
- you know the exact size of each message
- can easily determine if you have failed to receive the entire message
- could increase network traffic: short messages must be padded to the fixed size, and long ones must be split across several messages
Variable length
- allows for more message flexibility
- need a way to determine the end of the message
- may still need to send multiple messages if a message is larger than the network or a component allows
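One common way to "determine the end of the message" for variable-length framing is a length prefix. The following is a minimal sketch, assuming a 4-byte big-endian length header; the names `frame` and `unframe` are illustrative, not from any particular library.

```python
import struct

# Assumed header layout: one 4-byte unsigned length in network byte order.
HEADER = struct.Struct("!I")

def frame(payload):
    """Prefix the payload with its length so the receiver knows where it ends."""
    return HEADER.pack(len(payload)) + payload

def unframe(buffer):
    """Split a byte buffer into complete messages plus any incomplete remainder."""
    messages = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        (length,) = HEADER.unpack_from(buffer, offset)
        start = offset + HEADER.size
        end = start + length
        if end > len(buffer):
            break  # the rest of this message has not arrived yet
        messages.append(buffer[start:end])
        offset = end
    return messages, buffer[offset:]
```

The leftover bytes returned by `unframe` would be kept and prepended to the next chunk read from the socket, which is how a receiver can tell that it has not yet received an entire message.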
What are quorum-based decisions?
A minimum number of votes that must be collected before a distributed operation is carried out. Typically this is more than half of the nodes in the system.
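For the common "more than half" case, the threshold can be computed directly. A minimal sketch, with hypothetical helper names:

```python
def majority_quorum(node_count):
    """Smallest number of votes that is strictly more than half of the nodes."""
    return node_count // 2 + 1

def has_quorum(votes, node_count):
    """True when enough votes have been collected to proceed."""
    return votes >= majority_quorum(node_count)
```

With 5 nodes the quorum is 3, and with 4 nodes it is also 3. Because any two majorities overlap in at least one node, two conflicting decisions cannot both reach quorum.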
What is a race condition?
The system’s behavior depends on the relative timing or ordering of events, such as how concurrent operations interleave. This can lead to inconsistent results (different outcomes depending on execution timing).
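A minimal, deterministic sketch of a race (a lost update), assuming two hypothetical workers, A and B, that each add 10 to a shared balance using a separate read step and write step. The `run` helper replays an explicit schedule instead of using real threads, so the outcome is reproducible.

```python
def run(schedule, balance=100):
    """Replay (worker, step) pairs; each worker reads, then writes read value + 10."""
    last_read = {}
    for worker, step in schedule:
        if step == "read":
            last_read[worker] = balance
        else:  # "write"
            balance = last_read[worker] + 10
    return balance

# A finishes before B starts: both updates are applied.
serial = [("A", "read"), ("A", "write"), ("B", "read"), ("B", "write")]

# Both read before either writes: B overwrites A's update.
interleaved = [("A", "read"), ("B", "read"), ("A", "write"), ("B", "write")]
```

Here `run(serial)` gives 120 while `run(interleaved)` gives 110: the same logic produces different results depending only on ordering.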
What is the difference between two-phase and strict two-phase locking?
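Both divide a transaction into a growing phase, in which it acquires locks, and a shrinking phase, in which it releases them; once a transaction has released any lock, it may not acquire another. In basic two-phase locking, locks can be released one by one as the transaction finishes with each item. Strict two-phase locking holds the transaction’s exclusive (write) locks until it commits or aborts.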
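In two-phase locking, a transaction never acquires a lock after it has released one; the strict variant additionally defers releases until commit or abort. The sketch below only does the bookkeeping for that rule. It assumes every lock is exclusive and does not block or coordinate between transactions; the class names are hypothetical.

```python
class TwoPhaseTxn:
    """Tracks the growing/shrinking phases of one transaction."""

    def __init__(self):
        self.held = set()
        self.shrinking = False

    def acquire(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: acquire after a release")
        self.held.add(item)

    def release(self, item):
        self.shrinking = True  # the first release ends the growing phase
        self.held.discard(item)


class StrictTwoPhaseTxn(TwoPhaseTxn):
    """Strict variant: no early release; everything is freed at commit."""

    def release(self, item):
        raise RuntimeError("strict 2PL: locks are released only at commit or abort")

    def commit(self):
        self.shrinking = True
        self.held.clear()
```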
How would switching from proxy to broker simplify an application?
It would not simplify the application, and it could make it more complicated. A broker is more generic than a proxy, and while it could allow more flexibility for messages within the system, this kind of flexibility is typically more difficult to implement. You could potentially gain functionality if moving from a single proxy and nothing else to a single broker and nothing else.
How would switching from broker to proxy simplify an application?
Less flexibility in messaging would make the component’s interfaces easier to implement. The proxy is all about access control so that is all it is required to do. You could potentially lose functionality if moving from a single broker with nothing else to a single proxy and nothing else.
What is the CAP theorem?
A distributed system cannot guarantee all three of the following at the same time:
Consistency: Every read gets the most recent write
Availability: Every request receives a non-error response
Partition tolerance: The system continues to operate despite an arbitrary number of messages being dropped or delayed by the network between nodes
What are some issues with network-based communication?
Performance - latency and data transfer rate
Scalability - what hardware/software is needed to scale
Security - typical network-based security concerns
What is RPC?
Remote Procedure Call: a program calls a procedure that executes on another machine as if it were a local call
What are some issues with RPC?
- Call by value or call by reference?
By value - a copy is passed
By reference - a reference to the caller's actual value is passed, which is not directly usable in the remote machine's address space
- Byte ordering for parameters (big or little endian)
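The call-by-value issue can be illustrated by simulating marshalling. This is a sketch only: JSON stands in for a wire format, and `remote_append` is a hypothetical procedure that runs "remotely" on a decoded copy of the arguments.

```python
import json

def remote_append(marshalled_args):
    """Pretend remote procedure: decodes its own copy of the list and modifies it."""
    items = json.loads(marshalled_args)
    items.append("added remotely")
    return json.dumps(items)

def call_demo():
    local = ["a"]
    reply = remote_append(json.dumps(local))  # arguments cross the "wire" as bytes/text
    return local, json.loads(reply)
```

After `call_demo()`, the caller's list is still `["a"]`; the change is visible only in the returned copy. Getting call-by-reference behavior would require extra work, such as copying the result back into the caller's data.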
What is the difference between big- and little-endian?
Big endian - the most significant byte comes first
Little endian - the least significant byte comes first
Which byte orderings are important in distributed systems?
Network order - always big endian
Host byte order - depends on the machine
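The byte orders above can be seen with Python's standard `struct` module. A small sketch; the helper name `encode_u32` is illustrative.

```python
import struct
import sys

HOST_ORDER = sys.byteorder  # "big" or "little", depending on the machine

def encode_u32(value):
    """Encode one unsigned 32-bit integer in each byte order."""
    return {
        "big": struct.pack(">I", value),      # most significant byte first
        "little": struct.pack("<I", value),   # least significant byte first
        "network": struct.pack("!I", value),  # network order (big endian)
        "host": struct.pack("=I", value),     # whatever this machine uses
    }
```

For `0x12345678`, the big-endian and network encodings are the bytes `12 34 56 78`, and the little-endian encoding is `78 56 34 12`.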
Access point definition
a means of access to an entity
Address definition
a location for an access point
Name definition
a string that references an entity
What is flat naming?
Names are unstructured bit strings that have no obvious connection to their referents
Example
MAC address
What is structured naming?
Names composed of multiple, supporting names
Example
IP addresses
What is attribute-based naming?
Names derived from the attributes of their referents (directory service). Given a service category, find a node that provides that service.
Example
LDAP
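A toy sketch of attribute-based lookup ("given a service category, find a node that provides that service"). The directory entries and the `lookup` helper are made up for illustration; this is not the LDAP protocol or any LDAP client API.

```python
# Hypothetical directory: each entry describes a node by its attributes.
DIRECTORY = [
    {"name": "node-a", "service": "printing", "location": "floor-1"},
    {"name": "node-b", "service": "storage", "location": "floor-1"},
    {"name": "node-c", "service": "printing", "location": "floor-2"},
]

def lookup(directory, **wanted):
    """Return the names of all entries whose attributes match every requested value."""
    return [
        entry["name"]
        for entry in directory
        if all(entry.get(attr) == value for attr, value in wanted.items())
    ]
```

For example, `lookup(DIRECTORY, service="printing")` finds `node-a` and `node-c`, and adding `location="floor-2"` narrows the result to `node-c`.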
How to provide fault tolerance in a distributed system?
Exploit redundancy
Hot swap (backup that is up to date with the current component) ready to go on failure
Distributed information (one or more components can fail but the system can be reconstructed from the remaining components)
What is page-based DSM?
Distributed shared memory based on paging.
Why use DSM over RPC?
- Entire distributed system is logically one multi-threaded application
- No need to worry about passing parameters from machine to machine
Why use RPC over DSM?
- Easier to deal with failure
- Method calls should be the same across all components of the system
- No need to worry about typical shared memory constructs
Name the different architectural styles of distributed computing according to Phil.
Unstructured - styles with no theoretical limits on how entities are coupled
Structured - styles that organize entities in ways that limit what entities a given entity can interact with directly
What are the pros and cons for unstructured styles?
Advantages
- simplicity of system’s internal organization
- ease of access among system’s entities
Disadvantages
- increased difficulty of monitoring and maintaining system operation, due to range of possible interactions among the entities
- in the worst case, it can be a fully connected (mesh) network that requires O(n^2) connections for interactions
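The O(n^2) figure for a full mesh comes from counting one connection per pair of entities, n(n-1)/2. A one-function sketch (the name is illustrative):

```python
def full_mesh_links(n):
    """Number of pairwise connections when every one of n entities links to every other."""
    return n * (n - 1) // 2
```

Going from 4 entities to 10 raises the count from 6 to 45, which is why monitoring every possible interaction becomes difficult as the system grows.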
What is a distributed hash table?
- schemes that distribute content across nodes using a hash function that’s computed from items’ keys
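A deliberately simplified sketch of the idea: hash the key, then use the hash to pick the node that stores the item. It assumes a fixed node list and plain modulo placement, held in one process for illustration; real DHTs typically use schemes such as consistent hashing so that nodes can join and leave cheaply. `node_for` and `TinyDHT` are hypothetical names.

```python
import hashlib

def node_for(key, nodes):
    """Pick a node from the key's hash (simple modulo placement)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]

class TinyDHT:
    """Simulates per-node storage with one dict per node."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.storage = {node: {} for node in self.nodes}

    def put(self, key, value):
        self.storage[node_for(key, self.nodes)][key] = value

    def get(self, key):
        return self.storage[node_for(key, self.nodes)].get(key)
```

Because placement depends only on the key, any participant can compute where an item lives without consulting a central index.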
Give 2 ways to represent an unstructured style.
object-based - unordered collection of components
resource-based - unordered collection of resources
How are object-based styles implemented?
- definitive version of object, including its state, positioned at a single, base node in a distributed network
- to use an object, other nodes bind to it, then access it
- the bind creates proxy representation of the remote object at local host
What is a proxy?
a software construct that controls access to other objects; it redirects access to the remote object and relays remote object’s responses to local hosts
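A minimal sketch of the proxy idea, assuming a hypothetical `RemoteCounter` object on the base node and a `send` function that stands in for the network. Here `make_loopback` delivers requests in-process; a real system would serialize them and send them to the remote host.

```python
class RemoteCounter:
    """The definitive object, imagined to live on the base node."""

    def __init__(self):
        self.value = 0

    def increment(self, amount):
        self.value += amount
        return self.value


class CounterProxy:
    """Local stand-in: turns method calls into request messages and relays replies."""

    def __init__(self, send):
        self._send = send

    def increment(self, amount):
        return self._send({"method": "increment", "args": [amount]})


def make_loopback(target):
    """Assumed transport: dispatches a request dict to the target object directly."""
    def send(request):
        method = getattr(target, request["method"])
        return method(*request["args"])
    return send
```

Usage: `proxy = CounterProxy(make_loopback(RemoteCounter()))`, after which `proxy.increment(2)` looks like a local call even though the state lives in the remote object.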
Assess object-based style.
Advantages
- can improve application transparency by blurring distinction between remote and local objects
- natural model for session-based interaction with remote objects
- natural model for encapsulating data
Disadvantages
- can’t fully mask impact of network on communications (network failure, remote host failure, impracticality of duplicating large amounts of local host state at remote object site)
- excessive use of objects can degrade host performance, creating computational bottlenecks
Assess session-based interaction.
Advantages
- simplifies an application’s coding
- reduces flow of traffic across the network
Disadvantages
- state information of the remote object must be maintained
- lends itself to high duplication of data
- increases complexity of the messages (every message must be self-contained so this may lead to reauthentication for every message in session)
How are resource-based styles implemented?
- resources are identified via a standard naming scheme
- services offer a uniform, message-based interface
- messages are self-describing
- resources maintain no state on callers
Give an example of resource-based implementation.
REpresentational State Transfer (REST)
Describe REST.
- resources named using Uniform Resource Identifiers (URIs)
- offers a CRUD-style, PUT-GET-POST-DELETE protocol for managing state
- the communication protocol (HTTP/HTTPS) is stateless
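A toy sketch of the CRUD-style mapping, using an in-memory dict keyed by URI in place of a real HTTP server. The `handle` function, the status codes chosen, and the way POST names new resources are assumptions for illustration. Each request carries everything needed to process it, and the handler keeps no per-caller state.

```python
def handle(store, method, uri, body=None):
    """Apply one request to the resource store and return (status, result)."""
    if method == "GET":  # read
        return (200, store[uri]) if uri in store else (404, None)
    if method == "PUT":  # create or replace at a known URI
        status = 200 if uri in store else 201
        store[uri] = body
        return (status, body)
    if method == "POST":  # create a new resource under a collection URI
        n = 1
        while f"{uri}/{n}" in store:
            n += 1
        store[f"{uri}/{n}"] = body
        return (201, f"{uri}/{n}")
    if method == "DELETE":  # remove
        if uri in store:
            del store[uri]
            return (204, None)
        return (404, None)
    return (405, None)  # method not supported
```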
Assess resource-based interaction.
Advantages
- reduces performance impact on remote object host by eliminating the need for the host to maintain state
- simplifies service implementation by removing need for remote object host to recover from host and network failure
Disadvantages
- potentially complicates client-side implementation
What are the different types of structured styles?
- Horizontal
- Commons-based
- Layered
- Hierarchical
- Other miscellaneous