Part 2: Architectures Flashcards
Difference between system architecture and software architecture
- System Architecture:
Refers to the actual realization of a distributed system, requiring the instantiation and placement of software components on real machines.
The final instantiation of a software architecture is also referred to as a system architecture.
Discusses traditional centralized architectures in which a single server implements most of the software components, while remote clients can access that server using simple communication means.
Also considers decentralized peer-to-peer architectures in which all nodes more or less play equal roles.
Many real-world distributed systems are organized in a hybrid fashion, combining elements from both centralized and decentralized architectures.
- Software Architecture:
Refers to the logical organization of a distributed system into software components.
Research on software architectures has matured considerably, and it’s now commonly accepted that designing or adopting an architecture is crucial for the successful development of large software systems.
An architectural style is formulated in terms of components, the way that components are connected to each other, the data exchanged between components, and finally how these elements are jointly configured into a system.
Examples of architectural styles include layering, object-based styles, resource-based styles, and styles in which the handling of events is prominent.
What is a component?
A component is defined as a modular unit with well-defined required and provided interfaces that is replaceable within its environment. The fact that a component can be replaced, especially while a system continues to operate, is of significant importance. This is because, in many scenarios, it’s not feasible to shut down a system for maintenance. At most, only parts of it may be temporarily disabled. The replacement of a component can only be done if its interfaces remain unchanged. A component’s replaceability is crucial in distributed systems where continuous operation is often a requirement.
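Replaceability can be sketched with Python's structural interfaces: as long as a new component satisfies the same provided interface, the application code that depends on it is untouched. The `Storage` component and its methods below are hypothetical, invented purely for illustration:

```python
from typing import Protocol

class Storage(Protocol):
    """Provided interface: any component offering these methods is a drop-in replacement."""
    def put(self, key: str, value: str) -> None: ...
    def get(self, key: str) -> str: ...

class InMemoryStorage:
    """One concrete component implementing the Storage interface."""
    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> str:
        return self._data[key]

def run_app(store: Storage) -> str:
    # The application depends only on the interface, so the component
    # behind it can be swapped without changing this code.
    store.put("greeting", "hello")
    return store.get("greeting")

result = run_app(InMemoryStorage())
```

Replacing `InMemoryStorage` with, say, a disk-backed component would require no change to `run_app`, which is exactly the property that makes components replaceable while a system keeps operating.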
Software architecture
Software architecture is about the organization of distributed systems, focusing on the software components that constitute the system. These software architectures detail how various software components are organized and how they interact. The final instantiation of a software architecture is also referred to as a system architecture.
Which are the most important Styles of Architectures for Distributed Systems
- Layered architectures
- Object-based architectures
- Data-centered architectures
- Event-based architectures
Layered architecture
The fundamental concept behind the layered style is straightforward: components are organized in a layered manner. A component at one layer can make a downcall to a component at a lower-level layer and generally expects a response. Upcalls to a higher-level component are made only in exceptional cases.
Layered architectures are universally applied and are often combined with other architectural styles. For instance, many distributed applications are divided into three layers: user interface layer, processing layer, and data layer. This division suggests various possibilities for physically distributing a client-server application across multiple machines.
A well-known application of layered architectures is in communication-protocol stacks. Each layer in these stacks implements one or several communication services, allowing data to be sent from a source to one or several targets. Each layer offers an interface specifying the functions that can be called, ideally hiding the actual implementation of a service. Another essential concept in communication is that of a protocol, which describes the rules that parties will follow to exchange information. It’s crucial to understand the difference between a service offered by a layer, the interface by which that service is made available, and the protocol used for communication.
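The downcall pattern of a protocol stack can be sketched in a few lines: each layer exposes a `send` interface, hides how it frames the data, and only makes downcalls to the layer directly below. All class names and the `TRN|` header are invented for this sketch:

```python
class PhysicalLayer:
    """Bottom layer; the list stands in for the physical medium."""
    def __init__(self):
        self.wire = []

    def send(self, frame: bytes) -> None:
        self.wire.append(frame)

class TransportLayer:
    """Middle layer: its 'protocol' is to prepend a header before the downcall."""
    def __init__(self, lower: PhysicalLayer):
        self.lower = lower

    def send(self, payload: bytes) -> None:
        self.lower.send(b"TRN|" + payload)

class ApplicationLayer:
    """Top layer: callers see only a plain string interface."""
    def __init__(self, lower: TransportLayer):
        self.lower = lower

    def send(self, message: str) -> None:
        self.lower.send(message.encode())

phys = PhysicalLayer()
app = ApplicationLayer(TransportLayer(phys))
app.send("hello")
```

The caller of `app.send` never sees the transport header; that separation of interface from implementation is the point of the layered style.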
Object-based architecture
Object-based architectures follow a looser organization than other architectural styles.
In essence, each object in this architecture corresponds to what is defined as a component. These components are modular units with well-defined interfaces.
The components in object-based architectures are connected through a procedure call mechanism. This means that one component can call a procedure or function of another component.
In the context of distributed systems, a procedure call can also take place over a network. This implies that the calling object doesn’t necessarily have to be executed on the same machine as the called object. This flexibility allows for distributed processing and interaction between objects located on different machines.
Object-based architectures provide a natural way of encapsulating data (referred to as an object’s state) and the operations that can be performed on that data. This encapsulation ensures that the internal details of an object are hidden from other objects, promoting modularity and maintainability.
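A minimal sketch of the procedure-call mechanism crossing a (here simulated) network boundary: a local proxy exposes the same interface as the remote object and forwards each call. `Counter`, `CounterProxy`, and `fake_network` are hypothetical stand-ins for a real RPC stub and transport:

```python
class Counter:
    """The 'remote' object: encapsulated state plus operations on it."""
    def __init__(self):
        self._value = 0

    def increment(self, amount: int) -> int:
        self._value += amount
        return self._value

def fake_network(obj, method: str, args: tuple):
    # Stand-in for marshalling a call over the network to another machine.
    return getattr(obj, method)(*args)

class CounterProxy:
    """Local stub: offers the same interface, forwards every call."""
    def __init__(self, remote):
        self._remote = remote

    def increment(self, amount: int) -> int:
        return fake_network(self._remote, "increment", (amount,))

proxy = CounterProxy(Counter())
proxy.increment(2)
result = proxy.increment(3)
```

The calling code uses `proxy` exactly as it would use a local `Counter`, which is how object-based architectures let the calling object run on a different machine than the called one.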
Data-centered architecture
The data level in data-centered architectures contains the programs that maintain the actual data on which the applications operate. Processes communicate through a common (active or passive) repository. An essential property of this level is that data is often persistent, meaning that even if no application is running, the data will be stored somewhere for the next use.
In its simplest form, the data level consists of a file system, but it’s also common to use a full-fledged database.
Besides merely storing data, the data level is generally also responsible for keeping data consistent across different applications. When databases are being used, maintaining consistency means that metadata such as table descriptions, entry constraints, and application-specific metadata are also stored at this level.
Event-based Architecture
In event-based coordination, processes are referentially decoupled but temporally coupled. Referential decoupling means that processes do not know each other explicitly; temporal coupling means they must be up and running at the same time for a notification to be delivered.
A process can publish a notification describing the occurrence of an event, such as wanting to coordinate activities or producing some interesting results. Given the variety of notifications, processes may subscribe to specific kinds of notifications.
In an ideal event-based coordination model, a published notification will be delivered exactly to those processes that have subscribed to it. However, it is generally required that the subscriber is active and running at the time the notification is published.
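The publish/subscribe coordination just described might be sketched as follows; the `EventBus` class and the notification kinds are invented for illustration:

```python
from collections import defaultdict

class EventBus:
    """Minimal publish/subscribe. Publishers and subscribers never
    reference each other (referential decoupling), but a subscriber
    must already be registered when a notification is published,
    since nothing is buffered (temporal coupling)."""
    def __init__(self):
        self._subs = defaultdict(list)

    def subscribe(self, kind, callback):
        self._subs[kind].append(callback)

    def publish(self, kind, payload):
        for callback in self._subs[kind]:
            callback(payload)

bus = EventBus()
received = []
bus.subscribe("order-created", received.append)
bus.publish("order-created", {"id": 1})
bus.publish("order-shipped", {"id": 1})  # no subscriber yet: dropped
```

Note that the second notification is silently lost, which illustrates why the subscriber must be active when the notification is published.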
A well-known coordination model in this context is the combination of referentially and temporally decoupled processes, leading to what is known as a shared data space. The key idea here is that processes communicate entirely through tuples, which are structured data records consisting of several fields, similar to a row in a database table.
Shared data-space architecture
(Similar to data-centered and event-based architectures)
Shared data spaces provide a coordination model in which processes are both referentially and temporally decoupled. This means that processes do not know each other explicitly and need not be running at the same time for communication to take place.
Processes communicate entirely through tuples, which are structured data records consisting of several fields, similar to a row in a database table.
Processes can insert any type of tuple into the shared data space. To retrieve a tuple, a process provides a search pattern that is matched against the tuples present. Any tuple that matches the pattern is returned.
When a process wants to extract a tuple from the data space, it specifies the values of the fields it’s interested in. Any tuple that matches that specification is then removed from the data space and passed to the process.
Shared data spaces are often combined with event-based coordination. In this model, a process subscribes to certain tuples by providing a search pattern. When another process inserts a tuple into the data space, matching subscribers are notified.
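A minimal tuple-space sketch, assuming `None` acts as the wildcard in a search pattern; the `TupleSpace` API (`put`/`read`/`take`) is a hypothetical stand-in for Linda-style operations:

```python
class TupleSpace:
    """Tuples are plain Python tuples; a pattern is a tuple of the
    same length where None matches any field value."""
    def __init__(self):
        self._tuples = []

    def put(self, tup):
        self._tuples.append(tup)

    def _match(self, tup, pattern):
        return len(tup) == len(pattern) and all(
            p is None or p == t for t, p in zip(tup, pattern))

    def read(self, pattern):
        # Return a matching tuple without removing it.
        return next(t for t in self._tuples if self._match(t, pattern))

    def take(self, pattern):
        # Extract: the matching tuple is removed from the space.
        t = self.read(pattern)
        self._tuples.remove(t)
        return t

space = TupleSpace()
space.put(("temperature", "room-1", 21))
space.put(("temperature", "room-2", 19))
hit = space.take(("temperature", "room-2", None))  # wildcard on the value
```

The inserting process and the extracting process never reference each other, and the tuple can sit in the space arbitrarily long before being taken, which is the decoupling the model provides.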
meaning of:
referentially (de)coupled
temporally (de)coupled
Referentially Coupled:
When processes are referentially coupled, they have explicit references to each other. This means that a process knows the name or identifier of the other processes it wants to exchange information with. This form of coupling generally appears in the form of explicit referencing in communication. [Pages: 81]
Referentially Decoupled:
In referentially decoupled systems, processes do not know each other explicitly. For instance, in event-based coordination, a process can publish a notification describing the occurrence of an event, and other processes can subscribe to specific kinds of notifications without directly knowing the publisher. [Pages: 81-82]
Temporally Coupled:
Temporal coupling means that processes that are communicating will both have to be up and running at the same time for communication to take place. In direct coordination, when processes are both temporally and referentially coupled, communication happens directly between them. [Pages: 81]
Temporally Decoupled:
When processes are temporally decoupled, there is no need for two communicating processes to be executing at the same time to let communication take place. For example, in mailbox coordination, communication takes place by putting messages in a (possibly shared) mailbox, and the recipient can retrieve the message later even if the sender is not currently active. [Pages: 81]
Service-Oriented Architecture (SOA)
(SOA emphasizes the importance of designing and organizing distributed systems as a collection of services that can operate independently yet can be composed to achieve complex functionalities)
In a service-oriented architecture, a distributed application or system is essentially constructed as a composition of many different services. Not all of these services may belong to the same administrative organization.
The service as a whole is realized as a self-contained entity, although it can possibly make use of other services. By clearly separating various services such that they can operate independently, the path is paved toward service-oriented architectures.
Each service offers a well-defined (programming) interface. In practice, this also means that each service offers its own interface, possibly making the composition of services far from trivial.
An example provided is that of a Web shop selling goods such as e-books. A simple implementation may consist of an application for processing orders, which operates on a local database containing the e-books. Order processing typically involves selecting items, registering and checking the delivery channel, and ensuring payment. The payment can be handled by a separate service run by a different organization. In this way, developing a distributed system is partly about service composition and ensuring that those services operate in harmony.
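The Web-shop example might be sketched as a composition of hypothetical services, each hidden behind its own interface; in practice the payment service could be run by a different organization entirely:

```python
def payment_service(amount: float) -> dict:
    """Hypothetical external payment service."""
    return {"status": "paid", "amount": amount}

def catalog_service(item: str) -> float:
    """Hypothetical catalog service: looks up the price of an item."""
    prices = {"e-book": 9.99}
    return prices[item]

def order_service(item: str) -> dict:
    # Composition: the order service only calls the other services'
    # interfaces, never their internals.
    price = catalog_service(item)
    receipt = payment_service(price)
    return {"item": item, "payment": receipt}

order = order_service("e-book")
```

Swapping in a different payment provider would only require that the replacement honor the same interface, which is the sense in which SOA development is "partly about service composition".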
Centralized architectures
- Centralized architectures often involve a single server that implements most of the software components, while remote clients can access that server using simple communication means. [Pages: 69-70]
Client-Server Architectures:
Many researchers and practitioners agree that thinking in terms of clients requesting services from servers helps in understanding and managing the complexity of distributed systems.
Simple Client-Server Architecture: This is a traditional way of modularizing software where a module (client) calls the functions available in another module (server). By placing different components on different machines, a natural physical distribution of functions across a collection of machines is achieved.
- Client-Server Communication:
The client-server model is fundamental to distributed systems. Clients send requests to servers, which then process these requests and return the results to the clients.
In some cases, a server may act as a client, forwarding requests to other servers responsible for specific tasks. For instance, a database server might forward requests to file servers that manage specific database tables. [Pages: 91,92]
- Application Layering:
Many distributed applications are divided into three layers:
User interface layer: contains everything needed to directly interface with the user
Processing layer: typically contains the core functionality of the applications
Data layer: manages the actual data being acted on
These layers can be distributed across different machines. For instance, the user interface might be on the client machine, while the processing and data layers might be on the server. [Pages: 92]
- Multitiered Architectures:
The distinction into three logical levels suggests various possibilities for physically distributing a client-server application across multiple machines.
The simplest organization is a two-tiered architecture, where a client machine contains only the user-interface level and a server machine contains the processing and data levels.
In some cases, a three-tiered architecture is used, especially in transaction processing. Here, a separate process, known as the transaction processing monitor, coordinates all transactions across different data servers. [Pages: 91,92,94]
Physically Three-Tier Architectures:
In a three-tiered architecture, programs that form part of the processing layer are executed by a separate server. This architecture might also distribute some parts of the processing layer across both client and server machines.
An example of this architecture is in the organization of websites. A web server acts as an entry point, passing requests to an application server where the actual processing occurs. This application server then interacts with a database server. [Pages: 94]
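The three tiers might be sketched as separate handlers, each of which would run on its own machine in a real deployment; all names and data are invented for illustration:

```python
# Data tier: the database server and its (toy) data.
DATABASE = {"42": {"title": "Distributed Systems"}}

def database_server(book_id: str) -> dict:
    return DATABASE[book_id]

def application_server(request: dict) -> dict:
    # Processing tier: the actual application logic; it acts as a
    # client of the data tier.
    record = database_server(request["book_id"])
    return {"status": 200, "body": record["title"]}

def web_server(path: str) -> dict:
    # Entry point: parses the incoming request and forwards it to the
    # application server, acting as a client in turn.
    book_id = path.rsplit("/", 1)[-1]
    return application_server({"book_id": book_id})

response = web_server("/books/42")
```

Note how the web server and application server each act as a server toward the tier above and as a client toward the tier below, mirroring the server-acting-as-client behavior described earlier.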
vertical vs horizontal distribution
Vertical Distribution:
Vertical distribution refers to the organization of a distributed system by placing logically different components on different machines. This type of distribution is achieved by aligning with the logical organization of applications. For instance, in many business environments, distributed processing is equivalent to organizing a client-server application as a multitiered architecture.
The term “vertical distribution” is related to the concept of vertical fragmentation used in distributed relational databases, where tables are split columnwise and then distributed across multiple machines. [Pages: 95]
Horizontal Distribution:
Horizontal distribution is a way of organizing client-server applications where a client or server may be physically split up into logically equivalent parts. Each part operates on its own share of the complete data set, thus balancing the load.
In horizontal distribution, processes that constitute a system are all equal, meaning the functions that need to be carried out are represented by every process in the distributed system. As a result, much of the interaction between processes is symmetric, with each process acting as both a client and a server. [Pages: 95]
Structured peer-to-peer architectures
Structured peer-to-peer systems are a type of distributed system where nodes (processes) are organized in a specific, deterministic topology, such as a ring, binary tree, grid, etc. This deterministic topology is used to efficiently look up data. A key characteristic of structured peer-to-peer systems is that they typically use a semantic-free index. This means that each data item maintained by the system is uniquely identified without relying on the meaning of the data.
In structured peer-to-peer systems, the overlay network (a logical network where nodes represent processes and links represent possible communication channels) adheres to a specific topology. This topology is used to route messages efficiently between nodes. The organization of nodes in a structured overlay is deterministic, which means that given a particular data item or key, there is a specific node responsible for that key.
The structured nature of these systems allows for efficient data lookup. However, maintaining the structure requires additional overhead, especially when nodes join or leave the system. Despite this, structured peer-to-peer systems offer advantages in terms of scalability and efficiency compared to unstructured systems.
Deterministic Procedure to Build Overlay Network
Overlay Network: In distributed systems, especially in peer-to-peer architectures, an overlay network is a virtual network of nodes and logical links. The nodes represent processes, and the logical links represent possible communication channels, often realized as TCP connections. The overlay network is constructed on top of the physical network, and it abstracts the underlying infrastructure to offer services like routing, data storage, and search.
Distributed Hash Table (DHT): DHT is a key component of structured peer-to-peer systems. It provides a lookup service similar to a hash table; key-value pairs are stored in a DHT, and any participating node can efficiently retrieve the value associated with a given key. Both data items and nodes are assigned random keys from a large key space (e.g., 128-bit space). The challenge is to map these keys to nodes in a manner that ensures efficient lookup.
Key Assignment: Data items are given a random key from a vast key space. Similarly, nodes in the network are also assigned a random key from the same space. The challenge is to determine how to map these keys to specific nodes to ensure efficient data lookup.
Efficient Lookup: The primary goal of DHTs is to enable efficient lookup. When a node wants to find a data item, it queries the DHT with the item’s key. The DHT then determines which node holds that item and routes the query to that node. This process should be efficient, often aiming for logarithmic time complexity in relation to the number of nodes.
The construction of the overlay network and the organization of nodes in it is crucial for the performance and robustness of distributed systems. In structured overlays, nodes have a well-defined set of neighbors, and the organization can be in forms like a logical ring or tree. The deterministic nature of these structures ensures that operations like data lookup can be performed efficiently.
In the context of peer-to-peer systems, the overlay network’s organization requires special effort, and sometimes it’s one of the more intricate parts of distributed-systems management. The goal is to ensure that the overlay network remains connected, meaning that there’s always a communication path between any two nodes, allowing them to route messages to each other.
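The deterministic key-to-node mapping at the heart of a DHT can be sketched with a small identifier space and a Chord-style "successor" rule: a key is stored at the first node whose identifier is greater than or equal to the key, wrapping around the ring. All names and the 16-bit space are assumptions made for this sketch:

```python
import hashlib

SPACE = 2 ** 16  # toy identifier space shared by nodes and data items

def ident(name: str) -> int:
    """Hash a name into the identifier space."""
    return int(hashlib.sha256(name.encode()).hexdigest(), 16) % SPACE

def successor(key_id: int, node_ids: list[int]) -> int:
    """Deterministic mapping: first node at or after the key on the ring."""
    ring = sorted(node_ids)
    for node in ring:
        if node >= key_id:
            return node
    return ring[0]  # wrap around past the highest identifier

nodes = [ident(f"node-{i}") for i in range(8)]
home = successor(ident("my-data-item"), nodes)
```

Because every node computes the same mapping, any participant can determine which node is responsible for a key without consulting a central index; a real DHT such as Chord additionally maintains routing tables so the responsible node is reached in a logarithmic number of hops.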