Part 2: Architectures Flashcards

1
Q

Difference between system architecture and software architecture

A
  • System Architecture:
    Refers to the actual realization of a distributed system, requiring the instantiation and placement of software components on real machines.
    The final instantiation of a software architecture is also referred to as a system architecture.
    Discusses traditional centralized architectures in which a single server implements most of the software components, while remote clients can access that server using simple communication means.
    Also considers decentralized peer-to-peer architectures in which all nodes more or less play equal roles.
    Many real-world distributed systems are often organized in a hybrid fashion, combining elements from both centralized and decentralized architectures.
  • Software Architecture:
    Refers to the logical organization of a distributed system into software components.
    Research on software architectures has matured considerably, and it’s now commonly accepted that designing or adopting an architecture is crucial for the successful development of large software systems.
    An architectural style is formulated in terms of components, the way that components are connected to each other, the data exchanged between components, and finally how these elements are jointly configured into a system.
    Examples of architectural styles include layering, object-based styles, resource-based styles, and styles in which the handling of events is prominent.
2
Q

What is a component?

A

A component is defined as a modular unit with well-defined required and provided interfaces that is replaceable within its environment. The fact that a component can be replaced, especially while a system continues to operate, is of significant importance. This is because, in many scenarios, it’s not feasible to shut down a system for maintenance. At most, only parts of it may be temporarily disabled. The replacement of a component can only be done if its interfaces remain unchanged. A component’s replaceability is crucial in distributed systems where continuous operation is often a requirement.

3
Q

Software architecture

A

Software architecture is about the organization of distributed systems, focusing on the software components that constitute the system. These software architectures detail how various software components are organized and how they interact. The final instantiation of a software architecture is also referred to as a system architecture.

4
Q

Which are the most important Styles of Architectures for Distributed Systems

A
  • Layered architectures
  • Object-based architectures
  • Data-centered architectures
  • Event-based architectures
5
Q

Layered architecture

A

The fundamental concept behind the layered style is straightforward: components are organized in a layered manner. A component at one layer can make a downcall to a component at a lower-level layer and generally expects a response. Upcalls to a higher-level component are made only in exceptional cases.
Layered architectures are universally applied and are often combined with other architectural styles. For instance, many distributed applications are divided into three layers: user interface layer, processing layer, and data layer. This division suggests various possibilities for physically distributing a client-server application across multiple machines.
A well-known application of layered architectures is in communication-protocol stacks. Each layer in these stacks implements one or several communication services, allowing data to be sent from a source to one or several targets. Each layer offers an interface specifying the functions that can be called, ideally hiding the actual implementation of a service. Another essential concept in communication is that of a protocol, which describes the rules that parties will follow to exchange information. It’s crucial to understand the difference between a service offered by a layer, the interface by which that service is made available, and the protocol used for communication.
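The three-layer division can be sketched in a few lines. This is a minimal illustration with hypothetical class and method names, not an implementation from the text: each layer makes only downcalls into the layer directly below it.

```python
class DataLayer:
    """Data layer: holds the data the application operates on."""
    def __init__(self):
        self._store = {"alice": 42}

    def read(self, key):
        return self._store.get(key)

class ProcessingLayer:
    """Processing layer: application logic; downcalls into the data layer."""
    def __init__(self, data):
        self._data = data

    def compute_score(self, user):
        raw = self._data.read(user)          # downcall to the data layer
        return raw * 2 if raw is not None else 0

class UserInterfaceLayer:
    """User-interface layer: downcalls into the processing layer only."""
    def __init__(self, proc):
        self._proc = proc

    def show(self, user):
        return f"score({user}) = {self._proc.compute_score(user)}"

ui = UserInterfaceLayer(ProcessingLayer(DataLayer()))
print(ui.show("alice"))  # score(alice) = 84; each request flows strictly downward
```

Because each layer depends only on the one below it, the three layers can later be placed on different machines without changing the calling code.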

6
Q

Object-based architecture

A

Object-based architectures follow a looser organization than other architectural styles.
In essence, each object in this architecture corresponds to what is defined as a component. These components are modular units with well-defined interfaces.
The components in object-based architectures are connected through a procedure call mechanism. This means that one component can call a procedure or function of another component.
In the context of distributed systems, a procedure call can also take place over a network. This implies that the calling object doesn’t necessarily have to be executed on the same machine as the called object. This flexibility allows for distributed processing and interaction between objects located on different machines.
Object-based architectures provide a natural way of encapsulating data (referred to as an object’s state) and the operations that can be performed on that data. This encapsulation ensures that the internal details of an object are hidden from other objects, promoting modularity and maintainability.
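A tiny sketch of this encapsulation (hypothetical class, not from the text): the caller interacts with the object only through its procedure-call interface, and in a distributed setting that same call could equally travel over the network to another machine.

```python
class Account:
    """An object encapsulating state (the balance) behind operations."""
    def __init__(self, balance=0):
        self._balance = balance   # hidden state: other objects never touch it

    def deposit(self, amount):
        self._balance += amount
        return self._balance

# Another component interacts only through the interface; whether the
# call is local or a remote procedure call is transparent to the caller.
acct = Account(100)
print(acct.deposit(50))  # 150
```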

7
Q

Data-centered architecture

A

The data level in data-centered architectures contains the programs that maintain the actual data on which the applications operate. Processes communicate through a common (active or passive) repository. An essential property of this level is that data is often persistent, meaning that even if no application is running, the data will be stored somewhere for the next use.
In its simplest form, the data level consists of a file system, but it’s also common to use a full-fledged database.
Besides merely storing data, the data level is generally also responsible for keeping data consistent across different applications. When databases are being used, maintaining consistency means that metadata such as table descriptions, entry constraints, and application-specific metadata are also stored at this level.
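The coordination style can be sketched as follows (a toy example with invented names): two "applications" never call each other directly; they communicate only by reading and writing the common repository.

```python
# A toy shared repository standing in for a file system or database.
repository = {}

def order_app(item):
    repository["last_order"] = item      # producer writes to the data level

def shipping_app():
    return repository.get("last_order")  # consumer reads from the data level

order_app("e-book #42")
print(shipping_app())  # e-book #42: the data outlives the producing call
```

In a real system the dictionary would be a persistent store, so the data would survive even when no application is running.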

8
Q

Event-based Architecture

A

In event-based coordination, processes are referentially decoupled and temporally coupled. This means that processes do not know each other explicitly.
A process can publish a notification describing the occurrence of an event, such as wanting to coordinate activities or producing some interesting results. Given the variety of notifications, processes may subscribe to specific kinds of notifications.
In an ideal event-based coordination model, a published notification will be delivered exactly to those processes that have subscribed to it. However, it’s generally required that the subscriber is active and running at the time the notification is published.
A well-known coordination model in this context is the combination of referentially and temporally decoupled processes, leading to what is known as a shared data space. The key idea here is that processes communicate entirely through tuples, which are structured data records consisting of several fields, similar to a row in a database table.
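A minimal publish/subscribe sketch (hypothetical function names): publishers and subscribers know only notification kinds, never each other, which is exactly the referential decoupling described above.

```python
from collections import defaultdict

subscriptions = defaultdict(list)   # notification kind -> list of handlers

def subscribe(kind, handler):
    subscriptions[kind].append(handler)

def publish(kind, payload):
    # Delivered only to handlers registered right now: the subscriber
    # must be active when the notification is published (temporal coupling).
    for handler in subscriptions[kind]:
        handler(payload)

received = []
subscribe("result-ready", received.append)
publish("result-ready", {"value": 7})
print(received)  # [{'value': 7}]
```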

9
Q

Shared data-space architecture

A

(Similar to data-centered and event-based architectures)
Shared data spaces provide a coordination model in which processes are both referentially and temporally decoupled: processes do not know each other explicitly, and they need not be running at the same time.
Processes communicate entirely through tuples, which are structured data records consisting of several fields, similar to a row in a database table.
Processes can insert any type of tuple into the shared data space. To retrieve a tuple, a process provides a search pattern that is matched against the tuples present. Any tuple that matches the pattern is returned.
When a process wants to extract a tuple from the data space, it specifies the values of the fields it’s interested in. Any tuple that matches that specification is then removed from the data space and passed to the process.
Shared data spaces are often combined with event-based coordination. In this model, a process subscribes to certain tuples by providing a search pattern. When another process inserts a tuple into the data space, matching subscribers are notified.
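The insert/extract behaviour can be sketched with a toy tuple space. The operation names follow Linda's `out`/`in` convention (`inp` here, since `in` is a Python keyword); `None` acts as the wildcard in a search pattern, and a matching tuple is removed on extraction.

```python
space = []                         # the shared data space

def out(tup):                      # insert a tuple (Linda's "out")
    space.append(tup)

def inp(pattern):                  # extract the first match (Linda's "in")
    for tup in space:
        if len(tup) == len(pattern) and all(
                p is None or p == t for p, t in zip(pattern, tup)):
            space.remove(tup)      # extraction removes the tuple
            return tup
    return None

out(("temperature", "room-1", 21))
out(("temperature", "room-2", 19))
print(inp(("temperature", "room-2", None)))  # ('temperature', 'room-2', 19)
```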

10
Q

meaning of:
referentially (de)coupled
temporally (de)coupled

A

Referentially Coupled:
When processes are referentially coupled, they have explicit references to each other. This means that a process knows the name or identifier of the other processes it wants to exchange information with. This form of coupling generally appears in the form of explicit referencing in communication. [Pages: 81]
Referentially Decoupled:
In referentially decoupled systems, processes do not know each other explicitly. For instance, in event-based coordination, a process can publish a notification describing the occurrence of an event, and other processes can subscribe to specific kinds of notifications without directly knowing the publisher. [Pages: 81-82]
Temporally Coupled:
Temporal coupling means that processes that are communicating will both have to be up and running at the same time for communication to take place. In direct coordination, when processes are both temporally and referentially coupled, communication happens directly between them. [Pages: 81]
Temporally Decoupled:
When processes are temporally decoupled, there is no need for two communicating processes to be executing at the same time to let communication take place. For example, in mailbox coordination, communication takes place by putting messages in a (possibly shared) mailbox, and the recipient can retrieve the message later even if the sender is not currently active. [Pages: 81]

11
Q

Service-Oriented Architecture (SOA)

A

(SOA emphasizes the importance of designing and organizing distributed systems as a collection of services that can operate independently yet can be composed to achieve complex functionalities)
In a service-oriented architecture, a distributed application or system is essentially constructed as a composition of many different services. Not all of these services may belong to the same administrative organization.
The service as a whole is realized as a self-contained entity, although it can possibly make use of other services. By clearly separating various services such that they can operate independently, the path is paved toward service-oriented architectures.
Each service offers a well-defined (programming) interface. In practice, this also means that each service offers its own interface, possibly making the composition of services far from trivial.
An example provided is that of a Web shop selling goods such as e-books. A simple implementation may consist of an application for processing orders, which operates on a local database containing the e-books. Order processing typically involves selecting items, registering and checking the delivery channel, and ensuring payment. The payment can be handled by a separate service run by a different organization. In this way, developing a distributed system is partly about service composition and ensuring that those services operate in harmony.

12
Q

Centralized architectures

A
  • Centralized architectures often involve a single server that implements most of the software components, while remote clients can access that server using simple communication means. [Pages: 69-70]
    Client-Server Architectures:
    Many researchers and practitioners agree that thinking in terms of clients requesting services from servers helps in understanding and managing the complexity of distributed systems.
    Simple Client-Server Architecture: This is a traditional way of modularizing software where a module (client) calls the functions available in another module (server). By placing different components on different machines, a natural physical distribution of functions across a collection of machines is achieved.
  • Client-Server Communication:
    The client-server model is fundamental to distributed systems. Clients send requests to servers, which then process these requests and return the results to the clients.
    In some cases, a server may act as a client, forwarding requests to other servers responsible for specific tasks. For instance, a database server might forward requests to file servers that manage specific database tables. [Pages: 91,92]
  • Application Layering:
    Many distributed applications are divided into three layers:
    User interface layer: contains everything needed to directly interface with the user
    Processing layer: typically contains the core functionality of the applications
    Data layer: manages the actual data being acted on
    These layers can be distributed across different machines. For instance, the user interface might be on the client machine, while processing and data layers might be on the server. [Pages: 92]
  • Multitiered Architectures:
    The distinction into three logical levels suggests various possibilities for physically distributing a client-server application across multiple machines.
    The simplest organization is a two-tiered architecture, where a client machine contains only the user-interface level and a server machine contains the processing and data levels.
    In some cases, a three-tiered architecture is used, especially in transaction processing. Here, a separate process, known as the transaction processing monitor, coordinates all transactions across different data servers. [Pages: 91,92,94]
    Physically Three-Tier Architectures:
    In a three-tiered architecture, programs that form part of the processing layer are executed by a separate server. This architecture might also distribute some parts of the processing layer across both client and server machines.
    An example of this architecture is in the organization of websites. A web server acts as an entry point, passing requests to an application server where the actual processing occurs. This application server then interacts with a database server. [Pages: 94]
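The basic request/reply pattern behind all of these organizations can be sketched without a real network (hypothetical operation names; the function call stands in for message transport):

```python
def server_handle(request):
    """Server side: process a named request and return the result."""
    operations = {"square": lambda x: x * x, "negate": lambda x: -x}
    op, arg = request
    return operations[op](arg)

def client_call(op, arg):
    """Client side: 'send' a request and 'receive' the reply."""
    request = (op, arg)            # in a real system: marshalled and sent
    return server_handle(request)  # in a real system: reply message

print(client_call("square", 6))  # 36
```

In a multitiered organization the same pattern is simply chained: the web server acts as a client of the application server, which in turn acts as a client of the database server.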
13
Q

vertical vs horizontal distribution

A

Vertical Distribution:
Vertical distribution refers to the organization of a distributed system by placing logically different components on different machines. This type of distribution is achieved by aligning with the logical organization of applications. For instance, in many business environments, distributed processing is equivalent to organizing a client-server application as a multitiered architecture.
The term “vertical distribution” is related to the concept of vertical fragmentation used in distributed relational databases, where tables are split columnwise and then distributed across multiple machines. [Pages: 95]
Horizontal Distribution:
Horizontal distribution is a way of organizing client-server applications where a client or server may be physically split up into logically equivalent parts. Each part operates on its own share of the complete data set, thus balancing the load.
In horizontal distribution, processes that constitute a system are all equal, meaning the functions that need to be carried out are represented by every process in the distributed system. As a result, much of the interaction between processes is symmetric, with each process acting as both a client and a server. [Pages: 95]

14
Q

Structured peer-to-peer architectures

A

Structured peer-to-peer systems are a type of distributed system where nodes (processes) are organized in a specific, deterministic topology, such as a ring, binary tree, grid, etc. This deterministic topology is used to efficiently look up data. A key characteristic of structured peer-to-peer systems is that they typically use a semantic-free index. This means that each data item maintained by the system is uniquely identified without relying on the meaning of the data.

In structured peer-to-peer systems, the overlay network (a logical network where nodes represent processes and links represent possible communication channels) adheres to a specific topology. This topology is used to route messages efficiently between nodes. The organization of nodes in a structured overlay is deterministic, which means that given a particular data item or key, there is a specific node responsible for that key.

The structured nature of these systems allows for efficient data lookup. However, maintaining the structure requires additional overhead, especially when nodes join or leave the system. Despite this, structured peer-to-peer systems offer advantages in terms of scalability and efficiency compared to unstructured systems.

15
Q

Deterministic Procedure to Build Overlay Network

A

Overlay Network: In distributed systems, especially in peer-to-peer architectures, an overlay network is a virtual network of nodes and logical links. The nodes represent processes, and the logical links represent possible communication channels, often realized as TCP connections. The overlay network is constructed on top of the physical network, and it abstracts the underlying infrastructure to offer services like routing, data storage, and search.
Distributed Hash Table (DHT): DHT is a key component of structured peer-to-peer systems. It provides a lookup service similar to a hash table; key-value pairs are stored in a DHT, and any participating node can efficiently retrieve the value associated with a given key. Both data items and nodes are assigned random keys from a large key space (e.g., 128-bit space). The challenge is to map these keys to nodes in a manner that ensures efficient lookup.
Key Assignment: Data items are given a random key from a vast key space. Similarly, nodes in the network are also assigned a random key from the same space. The challenge is to determine how to map these keys to specific nodes to ensure efficient data lookup.
Efficient Lookup: The primary goal of DHTs is to enable efficient lookup. When a node wants to find a data item, it queries the DHT with the item’s key. The DHT then determines which node holds that item and routes the query to that node. This process should be efficient, often aiming for logarithmic time complexity in relation to the number of nodes.
The construction of the overlay network and the organization of nodes in it is crucial for the performance and robustness of distributed systems. In structured overlays, nodes have a well-defined set of neighbors, and the organization can be in forms like a logical ring or tree. The deterministic nature of these structures ensures that operations like data lookup can be performed efficiently.

In the context of peer-to-peer systems, the overlay network’s organization requires special effort, and sometimes it’s one of the more intricate parts of distributed-systems management. The goal is to ensure that the overlay network remains connected, meaning that there’s always a communication path between any two nodes, allowing them to route messages to each other.
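The key-assignment idea can be sketched with consistent hashing (a toy sketch with a deliberately small key space and invented node names): both node addresses and data-item names are hashed into the same space, and a data key is mapped to the first node identifier at or above it.

```python
import hashlib

M = 16                                  # small m-bit key space for illustration

def key_of(name):
    """Hash any name (node address or data item) into the 2^M key space."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (2 ** M)

nodes = sorted(key_of(f"node-{i}") for i in range(8))

def responsible_node(item_key):
    """Map a data key to the first node id >= the key, wrapping around."""
    for nid in nodes:
        if nid >= item_key:
            return nid
    return nodes[0]                     # wrap around the key space

k = key_of("some-file.txt")
print(k, "->", responsible_node(k))
```

Because node and data identifiers live in the same space, any node can compute which peer is responsible for a key without a central index.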

16
Q

Chord

A

Chord is a structured peer-to-peer system that organizes nodes in a logical ring. This system is designed to efficiently locate the node responsible for storing a particular data item, identified by a key.

Key and Node Assignment:
Chord uses an m-bit identifier space.
Data items and nodes are assigned random identifiers in this space.
Typically, keys and identifiers are 128 or 160 bits long.
A data item with a key k is mapped to the node with the smallest identifier id such that id ≥ k. This node is referred to as the successor of key k and is denoted as succ(k).
Lookup Mechanism:
In Chord, each node maintains shortcuts to other nodes. These shortcuts help in efficient lookup.
To look up a key, a node will try to forward the request “as far as possible” without passing it beyond the node responsible for that key.
For example, in a Chord ring, if node 9 is asked to look up the node responsible for key 3 (which is node 4), node 9 will use its shortcuts to find the closest node without surpassing the target.
Overlay Network Construction:
The nodes are logically organized in a ring.
Each node maintains a finger table that helps in efficient routing and lookup.
The finger table provides shortcuts to other nodes in the system, allowing for faster lookups.
Scalability and Robustness:
Chord is designed to be scalable and can handle a large number of nodes.
It provides a mechanism to efficiently route queries even in the presence of node failures or when nodes join/leave the system.
The use of shortcuts and the finger table ensures that the length of the shortest path between any pair of nodes is of order O(log N), where N is the total number of nodes.
Applications:
Chord is commonly used in distributed systems for tasks like file sharing, distributed storage, and more.
Its ability to efficiently locate data in a distributed environment makes it suitable for large-scale applications.
In summary, Chord is a structured P2P system that uses a ring topology and a unique identifier space to efficiently route queries and locate data. Its design ensures scalability and robustness in distributed environments.
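The succ(k) rule and finger-table lookup can be sketched on a toy ring. This is a simplified illustration (global view of the ring, recursion instead of real messages), using the node identifiers 1, 4, 9, ... from the example above in a 5-bit space:

```python
M = 5                                   # 5-bit identifier space: ids 0..31
RING = [1, 4, 9, 11, 14, 18, 20, 21, 28]   # node identifiers, sorted

def succ(k):
    """First node id >= k on the identifier circle, with wrap-around."""
    k %= 2 ** M
    for nid in RING:
        if nid >= k:
            return nid
    return RING[0]

def between(x, a, b):
    """True if x lies in the ring interval (a, b]."""
    x, a, b = x % 2 ** M, a % 2 ** M, b % 2 ** M
    return (a < x <= b) if a < b else (x > a or x <= b)

def fingers(n):
    """Finger table of node n: FT[i] = succ(n + 2^(i-1)), i = 1..M."""
    return [succ(n + 2 ** (i - 1)) for i in range(1, M + 1)]

def lookup(n, k):
    """Route a lookup for key k, starting at node n."""
    nxt = fingers(n)[0]                 # n's immediate successor
    if between(k, n, nxt):
        return nxt                      # succ(k) found
    for f in reversed(fingers(n)):      # farthest finger not passing key k
        if between(f, n, k - 1):
            return lookup(f, k)
    return nxt

print(lookup(9, 3))   # 4: node 4 is responsible for key 3
```

Each hop roughly halves the remaining distance on the ring, which is where the O(log N) lookup cost comes from.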

17
Q

Superpeers

A

Superpeers are nodes that play a special role in certain peer-to-peer networks. They are often responsible for tasks that regular peers might not be able to handle efficiently.
In many cases, the association between a regular peer (often referred to as a “weak peer”) and its superpeer is fixed. When a weak peer joins the network, it attaches to one of the superpeers and remains attached until it leaves the network.
Superpeers are expected to be long-lived processes with high availability. To account for potential unstable behavior of a superpeer, backup schemes can be deployed. For instance, every superpeer might be paired with another one, requiring weak peers to attach to both.
A fixed association with a superpeer might not always be the best solution. For example, in file-sharing networks, it might be better for a weak peer to attach to a superpeer that maintains an index of files that the weak peer is currently interested in.
In the context of the Skype network, when a weak peer wants to establish a connection, it first connects to a superpeer. The superpeer can assist in finding other users or facilitating communication, especially when peers are behind firewalls or NATs.
Superpeer selection in large-scale systems needs to meet certain requirements:
Normal nodes should have low-latency access to superpeers.
Superpeers should be evenly distributed across the overlay network.
There should be a predefined portion of superpeers relative to the total number of nodes in the overlay network.
Each superpeer should not need to serve more than a fixed number of normal nodes.

18
Q

Cloud Computing

A

Definition and Characteristics:
“Cloud computing is characterized by an easily usable and accessible pool of virtualized resources. Which and how resources are used can be configured dynamically, providing the basis for scalability. The link to utility computing is formed by the fact that cloud computing is generally based on a pay-per-use model with guarantees offered by means of customized service level agreements (SLAs).” [Page 44]
Organization of Clouds:
Clouds are organized into four layers: Hardware, Infrastructure, Platform, and Applications. The hardware layer manages the necessary hardware, including processors, routers, power, and cooling systems. [Page 44]
Levels of abstraction:
  • Hardware: processors, routers, power and cooling. Customers don’t usually get to see these.
  • Infrastructure: virtualization techniques; allocating and managing virtual storage devices and virtual servers.
  • Platform: higher-level abstractions for storage, etc.
  • Application: actual applications, e.g., office suites (text processors, spreadsheets).
Challenges and Concerns:
There are obstacles to cloud computing, including provider lock-in, security and privacy issues, and dependency on service availability. Different providers may also show varying performance profiles. [Page 45-46]

19
Q

Edge computing

A

“This concept of edge-server systems is now often taken a step further: taking cloud computing as implemented in a data center as the core, additional servers at the edge of the network are used to assist in computations and storage, essentially leading to distributed cloud systems. In the case of fog computing, even end-user devices form part of the system and are (partly) controlled by a cloud-service provider [Yi et al., 2015].” [Page 105]
Motivation:
  • Latency (and bandwidth): important for real-time applications, e.g., augmented reality. Latency to the cloud is often underestimated (and bandwidth overestimated).
  • Reliability: the connection to the cloud can be unreliable, while high connectivity guarantees are often required.
  • Security and privacy: resources are not always better protected in edge data centers than in clouds, but security handling in the cloud is trickier than within an organization.

20
Q

Middleware

A

“In a sense, middleware is the same to a distributed system as what an operating system is to a computer: a manager of resources offering its applications to efficiently share and deploy those resources across a network.”
Middleware offers services similar to operating systems, including:
Facilities for interapplication communication.
Security services.
Accounting services.
Masking of and recovery from failures.
The main difference between middleware and operating-system services is that middleware services are offered in a networked environment. Middleware can also be viewed as a container of commonly used components and functions that don’t need to be implemented by applications separately.
It provides transparency.
Middleware plays a role in achieving openness in distributed systems. Two important design patterns often applied to the organization of middleware are wrappers and interceptors.
A wrapper or adapter is a special component that offers an interface acceptable to a client application, transforming the functions to be suitable for the client. Wrappers have played a significant role in extending systems with existing components. Middleware can reduce the number of wrappers needed by implementing a broker, which handles accesses between different applications. [Pages: 85,86,87]
An interceptor is a software construct that can be used to extend and adapt middleware. Interceptors offer a means to adapt the standard behavior of middleware. [Pages: 86,87,88,89]

21
Q

Unstructured peer-to-peer architectures

A

Unstructured P2P systems do not impose a particular structure on the overlay network. Nodes in these systems connect randomly to each other. The primary advantage of such systems is their simplicity and ability to adapt to the dynamic addition, departure, and failure of nodes. File sharing systems like Gnutella and the early versions of Napster are examples of unstructured P2P systems. In these systems, searches are typically performed using flooding, where a query is sent to all neighbors, who then forward it to their neighbors, and so on. This can be inefficient as it might generate a lot of network traffic.
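Flooding can be sketched on a small hand-made overlay (invented node names and file; a TTL limits how far the query spreads). Note how many messages a single query generates even in this tiny network:

```python
overlay = {                     # adjacency list of a small overlay
    "A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
    "D": ["B", "C", "E"], "E": ["D"],
}
files = {"E": {"song.mp3"}}     # node E holds the wanted file

def flood(start, wanted, ttl):
    """Forward the query to all neighbours, round by round, until the TTL expires."""
    messages = 0
    frontier, seen = [start], {start}
    for _ in range(ttl + 1):
        for node in frontier:                  # check nodes reached this round
            if wanted in files.get(node, ()):
                return node, messages
        nxt = []
        for node in frontier:                  # then forward to all neighbours
            for nb in overlay[node]:
                messages += 1                  # one message per forwarded query
                if nb not in seen:
                    seen.add(nb)
                    nxt.append(nb)
        frontier = nxt
    return None, messages

print(flood("A", "song.mp3", ttl=3))  # ('E', 9): found at E after 9 messages
```

A too-small TTL misses the file entirely, which illustrates the trade-off between search cost and coverage in unstructured systems.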

22
Q

interceptor

A

An interceptor is conceptually a software construct that interrupts the usual flow of control, allowing other (possibly application-specific) code to be executed.
Interceptors are a primary means for adapting middleware to the specific needs of an application, playing a crucial role in making middleware open.
Use in Object-Based Distributed Systems:
In many object-based distributed systems, the basic idea of interception is that an object A can call a method that belongs to another object B, which resides on a different machine.
For instance, if object B is replicated, each replica should be invoked. A request-level interceptor can handle this by calling the method for each of the replicas. The advantage is that object A doesn’t need to be aware of B’s replication, and the middleware doesn’t need special components to deal with this replicated call. Only the request-level interceptor, which may be added to the middleware, needs to know about B’s replication. [Pages 88]
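The replicated-call scenario can be sketched as follows (hypothetical class names): the caller invokes what it believes is a single object B, while the request-level interceptor fans the invocation out to every replica.

```python
class Replica:
    """One replica of remote object B; records the invocations it receives."""
    def __init__(self, name):
        self.name, self.log = name, []

    def invoke(self, method, *args):
        self.log.append((method, args))
        return f"{self.name}:{method}"

class RequestLevelInterceptor:
    """Sits in the middleware between the caller and the object reference."""
    def __init__(self, replicas):
        self._replicas = replicas

    def invoke(self, method, *args):
        results = [r.invoke(method, *args) for r in self._replicas]
        return results[0]               # caller sees one logical result

b = RequestLevelInterceptor([Replica("B1"), Replica("B2")])
print(b.invoke("store", 42))            # B1:store (both replicas were invoked)
```

The caller's code is unchanged: only the interceptor knows that B is replicated.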

23
Q

wrapper or adapter

A

A wrapper or adapter is a special component that offers an interface acceptable to a client application. The functions of this interface are transformed to be suitable for the client.
When building a distributed system out of existing components, a fundamental problem arises: the interfaces offered by legacy components might not be suitable for all applications. Wrappers or adapters address this issue by providing a transformed interface that is acceptable to client applications.
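A minimal adapter sketch (invented legacy component and units): the legacy interface cannot be changed, so the wrapper offers the interface the client expects and translates between the two.

```python
class LegacyTemperatureSensor:
    """Existing component with a fixed, unsuitable interface."""
    def read_fahrenheit(self):
        return 68.0

class CelsiusAdapter:
    """Offers the interface the client expects, delegating to the legacy component."""
    def __init__(self, legacy):
        self._legacy = legacy

    def read_celsius(self):
        # Transform the legacy result into the form the client needs.
        return (self._legacy.read_fahrenheit() - 32) * 5 / 9

sensor = CelsiusAdapter(LegacyTemperatureSensor())
print(sensor.read_celsius())            # 20.0
```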

24
Q

Proxies/stubs

A

Proxies:
When a client binds to a distributed object, an implementation of the object’s interface, called a proxy, is loaded into the client’s address space.
A proxy is analogous to a client stub in RPC (Remote Procedure Call) systems. Its primary function is to marshal method invocations into messages and unmarshal reply messages to return the result of the method invocation to the client.
The actual object resides on a server machine, where it offers the same interface as it does on the client machine. [Pages: 76,77]
Stubs:
Incoming invocation requests to the server are first passed to a server stub, which unmarshals them to make method invocations at the object’s interface on the server side.
The server stub is also responsible for marshaling replies and forwarding reply messages to the client-side proxy.
The server-side stub is often referred to as a skeleton as it provides the bare means for letting the server middleware access the user-defined objects. [Pages: 76,77]
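The marshaling path through proxy and skeleton can be sketched without a real network (a toy example; JSON stands in for the wire format, and a direct call stands in for message transport):

```python
import json

class Calculator:
    """The actual remote object, living on the server."""
    def add(self, a, b):
        return a + b

class Skeleton:
    """Server stub: unmarshals requests, invokes the object, marshals replies."""
    def __init__(self, obj):
        self._obj = obj

    def handle(self, message):
        req = json.loads(message)                     # unmarshal the request
        result = getattr(self._obj, req["method"])(*req["args"])
        return json.dumps({"result": result})         # marshal the reply

class Proxy:
    """Client stub: marshals invocations into messages, unmarshals replies."""
    def __init__(self, skeleton):
        self._skeleton = skeleton       # stands in for the network connection

    def add(self, a, b):
        msg = json.dumps({"method": "add", "args": [a, b]})   # marshal
        return json.loads(self._skeleton.handle(msg))["result"]

calc = Proxy(Skeleton(Calculator()))
print(calc.add(2, 3))  # 5: the client never sees the messages
```

To the client, `calc` looks like a local object; only the proxy and skeleton deal with messages.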