2 - Distributed Information Systems Architecture Flashcards
Give an overview of distributed transactions
Keyword: COORDINATION
- Distributed means: subsystems working together in a COORDINATED fashion, possibly with multiple resource managers (RMs).
- Idea of distributed transactions: centralized COORDINATOR interacting with multiple agents (RM/DB).
- Commit protocol: ensures atomicity (the A in ACID) of global transactions; C, I and D require additional protocols. Must provide:
- minimal overhead
- parallelism (independence) of agents
- fault tolerance (logging state changes and interactions).
What are the goals of the Two-phase commit protocol and how does it work?
Perform commit on all agents in two phases:
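1: the coordinator receives EOT (end of transaction) from the application and sends a “prepare” message to all agents, then waits for a “failed” or “ready” vote from each.
- An agent votes “ready” only after logging its changes, which guarantees that they are recoverable.
2: if all agents replied “ready”, the coordinator sends a “commit” message to all of them; otherwise it sends an “abort” message. Each agent then sends an ACK back to the coordinator.
- Agents perform (or undo) their changes and release resources.
- Every state change is logged on both the coordinator and the agents.
- Hierarchical 2PC extends the protocol to trees of requests: intermediate nodes act as coordinators for their subtrees.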
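The message flow of 2PC can be sketched in Python as follows. This is a minimal, single-process sketch: method calls stand in for network messages, Python lists stand in for the stable logs, and the names `Agent`, `two_phase_commit` and `can_commit` are hypothetical. Timeouts, crashes and recovery are not modeled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    """Hypothetical agent (RM/DB) taking part in a distributed transaction."""
    name: str
    can_commit: bool = True          # simulates whether the local work succeeded
    log: List[str] = field(default_factory=list)
    state: str = "active"

    def prepare(self) -> str:
        # Phase 1: log the changes (making them recoverable) before voting "ready".
        if self.can_commit:
            self.log.append("ready")
            self.state = "ready"
            return "ready"
        self.log.append("failed")
        self.state = "aborted"
        return "failed"

    def decide(self, decision: str) -> str:
        # Phase 2: log and apply the global decision, release resources, send ACK.
        self.log.append(decision)
        self.state = "committed" if decision == "commit" else "aborted"
        return "ack"


def two_phase_commit(agents: List[Agent], coord_log: List[str]) -> str:
    """Coordinator side, invoked when the application signals EOT."""
    coord_log.append("begin")
    # Phase 1: send "prepare" to every agent and collect the votes.
    votes = [agent.prepare() for agent in agents]
    decision = "commit" if all(v == "ready" for v in votes) else "abort"
    # The decision is logged before it is sent out: this is the commit point.
    coord_log.append(decision)
    # Phase 2: send the decision to every agent and collect the ACKs.
    acks = [agent.decide(decision) for agent in agents]
    if all(a == "ack" for a in acks):
        coord_log.append("end")
    return decision


coord_log: List[str] = []
print(two_phase_commit([Agent("db1"), Agent("db2")], coord_log))  # commit
print(two_phase_commit([Agent("db1"), Agent("db2", can_commit=False)], []))  # abort
print(coord_log)  # ['begin', 'commit', 'end']
```

A single "failed" vote is enough to abort the whole global transaction, which is what gives 2PC its all-or-nothing behavior.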
What are the conceptual layers of an information system?
Presentation: accepts requests (input) and presents information (output) (UI, reports, etc.)
Application logic: implementation of the services supported by the system (often retrieving/modifying data)
Resource management: source of data, e.g. DBs, files, “external” providers (encapsulated).
- Transactions span all layers.
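A minimal sketch of the three layers as Python classes, assuming an in-memory dict as the resource manager and a made-up text request format; all class and method names are hypothetical. Each layer only talks to the layer directly below it through a small interface.

```python
class ResourceManager:
    """Resource management layer: encapsulates the data source (an in-memory dict here)."""

    def __init__(self) -> None:
        self._accounts = {"alice": 100, "bob": 50}

    def read(self, key: str) -> int:
        return self._accounts[key]

    def write(self, key: str, value: int) -> None:
        self._accounts[key] = value


class ApplicationLogic:
    """Application logic layer: implements a service by retrieving/modifying data via the RM."""

    def __init__(self, rm: ResourceManager) -> None:
        self.rm = rm

    def transfer(self, src: str, dst: str, amount: int) -> tuple:
        if self.rm.read(src) < amount:
            raise ValueError("insufficient funds")
        self.rm.write(src, self.rm.read(src) - amount)
        self.rm.write(dst, self.rm.read(dst) + amount)
        return self.rm.read(src), self.rm.read(dst)


class Presentation:
    """Presentation layer: accepts requests (input) and presents results (output)."""

    def __init__(self, logic: ApplicationLogic) -> None:
        self.logic = logic

    def handle(self, request: str) -> str:
        # Assumed request format for this sketch: "transfer <src> <dst> <amount>"
        _, src, dst, amount = request.split()
        try:
            new_src, new_dst = self.logic.transfer(src, dst, int(amount))
        except ValueError as err:
            return f"Error: {err}"
        return f"{src}: {new_src}, {dst}: {new_dst}"


ui = Presentation(ApplicationLogic(ResourceManager()))
print(ui.handle("transfer alice bob 30"))   # alice: 70, bob: 80
print(ui.handle("transfer bob alice 500"))  # Error: insufficient funds
```

The transfer touches all three layers, which is why a transaction has to span all of them.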
Compare the top-down and bottom-up approaches to the design of information systems.
TOP-DOWN
Ideal scenario, development from scratch
Steps of design/specification go from presentation down to necessary application logic (functionality) and data modelling.
Usually results in homogeneous, tightly coupled components.
High-level goals, addressing functional and non-functional requirements.
BOTTOM-UP
Necessary when reusing existing systems (RMs): they are wrapped and integrated in the application logic, and the presentation layer is adapted last.
Functionality of components is predefined: high-level goals are achieved by adapting and combining them.
Results in loosely-coupled components, which remain autonomous and can be used stand-alone.
How can the conceptual layers be mapped into tiers?
Tiers = implementation architecture
- A layer can be mapped to one or more tiers, and several layers can be mapped into one tier (tier != node)
- Modularization of the system is achieved
Each tier provides well-defined interfaces to access its functionality.
Adding tiers improves flexibility, functionality, distribution, and scalability, but at the cost of performance and complexity.
What are the different tier architectures and their pros and cons? 1-Tier
1-tier: monolith system
- Motivated by mainframes with scarce resources and dumb terminals: no other entry points (i.e., no API)
- Legacy systems predating middleware and DBMSs
- Pros: optimized performance (tight coupling), closed environment
- Cons: difficult and expensive to maintain/reuse
What are the goals and benefits of a horizontally distributed information system?
- Reduced costs (vs. mainframe)
- Organizational separation, integration of existing components
- Increased performance, scalability, reliability
- Issues: distribution transparency to client, multiple realization alternatives.
What are the communication alternatives in the distributed information system scenario?
Blocking (synchronous) versus non-blocking (asynchronous) interactions
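- Synch: caller must wait for the reply before continuing processing (easier for the programmer). Tighter coupling: receiver must be active.
- Asynch: caller continues processing without waiting for the reply, e.g. via message queuing. Looser coupling: receiver need not be active when the message is sent.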
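A minimal sketch contrasting the two styles, assuming Python's standard `queue.Queue` stands in for a message queuing system and a thread stands in for the receiver; the function names and the `None` stop signal are illustrative choices. The asynchronous messages are sent before the receiver is even started, which a blocking call could not tolerate.

```python
import queue
import threading
import time
from typing import List, Optional


def server_process(request: str) -> str:
    """The receiver's work (simulated)."""
    time.sleep(0.01)
    return request.upper()


def synchronous_call(request: str) -> str:
    # Blocking: the caller cannot continue until the reply is returned.
    return server_process(request)


def asynchronous_send(q: "queue.Queue[Optional[str]]", request: str) -> None:
    # Non-blocking: the message is enqueued and the caller continues immediately.
    q.put(request)


def receiver(q: "queue.Queue[Optional[str]]", replies: List[str]) -> None:
    # Consumes queued messages in order; None is used as a stop signal.
    while True:
        msg = q.get()
        if msg is None:
            break
        replies.append(server_process(msg))


print(synchronous_call("hello"))  # HELLO

q: "queue.Queue[Optional[str]]" = queue.Queue()
replies: List[str] = []
asynchronous_send(q, "order-1")  # the receiver is not running yet
asynchronous_send(q, "order-2")
worker = threading.Thread(target=receiver, args=(q, replies))
worker.start()
q.put(None)
worker.join()
print(replies)  # ['ORDER-1', 'ORDER-2']
```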
What are the different tier architectures and their pros and cons? 2-tier
2-tier: client-server system
- motivated by powerful PCs/workstations coming up as clients
- Different approaches to where the application logic is placed: on the client (fat client, thin server) or on the server (thin client, fat server)
- Pros: emphasis on “service”: API (portability, stability by standardizing interface of server), supports multiple clients
- Cons: limited scalability (single central server), high maintenance for fat clients (which also become integration engines when accessing multiple servers)
What are the different tier architectures and their pros and cons? 3-tier
3-tier: one tier for each IS layer (middleware is introduced in the application logic tier)
- Pros: good scalability (application logic distributed across nodes), portability of application logic, supports multiple resource managers
- Cons: increased communication
What are the different tier architectures and their pros and cons? N-tier
N-tier: generalization of 3-tier (a single layer is divided into multiple tiers)
- Resource layer may include other tiered systems
- Presentation layer divided into client-side presentation and server-side presentation (4-tier architecture, presentation = browser + web server)
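The layer-to-tier mappings of the tier architectures above can be written down as plain data. In this sketch the architecture labels are shorthand chosen for illustration: 1-tier puts all three layers into one tier (several layers, one tier), while 4-tier maps the presentation layer onto two tiers (one layer, several tiers). Real N-tier systems may also split the application logic or resource layer.

```python
from typing import Dict, List

LAYERS = ("presentation", "application logic", "resource management")

# Each architecture is a list of tiers; each tier lists the layers it hosts.
ARCHITECTURES: Dict[str, List[List[str]]] = {
    "1-tier": [["presentation", "application logic", "resource management"]],
    "2-tier, thin client": [["presentation"], ["application logic", "resource management"]],
    "2-tier, fat client": [["presentation", "application logic"], ["resource management"]],
    "3-tier": [["presentation"], ["application logic"], ["resource management"]],
    # Presentation split into client side (browser) and server side (web server).
    "4-tier": [["presentation"], ["presentation"], ["application logic"], ["resource management"]],
}


def tiers_per_layer(tiers: List[List[str]]) -> Dict[str, int]:
    """Count how many tiers each layer is mapped to."""
    return {layer: sum(layer in tier for tier in tiers) for layer in LAYERS}


for name, tiers in ARCHITECTURES.items():
    print(f"{name}: {len(tiers)} tier(s), {tiers_per_layer(tiers)}")
```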
Explain and compare the different alternatives for distribution (for client transparency). ALTERNATIVE 1: unit of distribution = transaction
ALTERNATIVE 1: unit of distribution = transaction
- Each transaction is routed to exactly one node (XOR) and runs there entirely; every node hosts all tiers (a kind of load balancing)
- Pros: simple solution, easy to implement, heterogeneous environment (servers) supported
- Cons: inflexible, only local transactions (no distributed trans.)
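A minimal sketch of transaction routing, assuming nodes advertise which transaction types they can run and the router picks the least-loaded capable node; `Node`, `TransactionRouter` and this routing policy are hypothetical. The `load` counter only counts routed transactions; completion is not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    """A complete system (all tiers) that executes whole transactions locally."""
    name: str
    supported: Set[str]              # transaction types this node can run
    load: int = 0                    # number of transactions routed here so far
    executed: List[str] = field(default_factory=list)


class TransactionRouter:
    """Routes each transaction to exactly one node (XOR); no distributed transactions."""

    def __init__(self, nodes: List[Node]) -> None:
        self.nodes = nodes

    def route(self, tx_type: str) -> str:
        candidates = [n for n in self.nodes if tx_type in n.supported]
        if not candidates:
            raise LookupError(f"no node can run {tx_type!r}")
        node = min(candidates, key=lambda n: n.load)  # simple load balancing
        node.load += 1
        node.executed.append(tx_type)  # the whole transaction runs on this node
        return node.name


router = TransactionRouter([
    Node("A", {"order", "payment"}),
    Node("B", {"order"}),
    Node("C", {"report"}),
])
print([router.route(t) for t in ["order", "order", "payment", "report"]])
# ['A', 'B', 'A', 'C']
```

Because a transaction never leaves its node, no commit protocol is needed, which is both why this alternative is simple and why it cannot support distributed transactions.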
Explain and compare the different alternatives for distribution (for client transparency). ALTERNATIVE 2: unit of distribution = application component
ALTERNATIVE 2: unit of distribution = application component (program, class or library)
- Application components communicate with each other (RPC/RMI), each one with its own local DB
- Allows distributed transactions
- Pros: locality of processing, reuse and heterogeneity
- Cons: inflexible data access (no DB operations across nodes), somewhat more complex programming model (the programmer must deal with distribution)
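A minimal sketch of two application components on different nodes, assuming a plain method call stands in for the RPC/RMI stub and a dict/list stands in for each node's local DB; the component names are hypothetical. In a real system both updates would belong to one distributed transaction committed with 2PC, which this sketch omits.

```python
from typing import Dict, List, Tuple


class InventoryComponent:
    """Application component on node B; owns its local DB (a dict in this sketch)."""

    def __init__(self) -> None:
        self._db: Dict[str, int] = {"widget": 5}

    def reserve(self, item: str, qty: int) -> bool:
        # Exported operation: the only way other components can touch this data.
        if self._db.get(item, 0) < qty:
            return False
        self._db[item] -= qty
        return True


class OrderComponent:
    """Application component on node A with its own local DB."""

    def __init__(self, inventory: InventoryComponent) -> None:
        self._db: List[Tuple[str, int]] = []
        # In a real deployment this would be an RPC/RMI stub for the remote component.
        self.inventory = inventory

    def place_order(self, item: str, qty: int) -> str:
        # Remote call instead of a query on node B's database.
        if not self.inventory.reserve(item, qty):
            return "rejected"
        self._db.append((item, qty))  # local DB operation
        return "accepted"


orders = OrderComponent(InventoryComponent())
print(orders.place_order("widget", 3))  # accepted
print(orders.place_order("widget", 3))  # rejected (only 2 left)
```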
Explain and compare the different alternatives for distribution (for client transparency). ALTERNATIVE 3: unit of distribution = DB operation
ALTERNATIVE 3: unit of distribution = DB operation
- Applications access local AND REMOTE DBs (multiple schemas/DBs, distributed transactions)
- Pros: high flexibility, operations across DBs (but not within a single query; see ALT 4)
- Cons: complex prog. model (details of remote schemas needed), exposed heterogeneity, increased communication (for each query): lower performance
- This ALT performs worse than ALT 2 because of the communication overhead
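A minimal sketch using Python's standard `sqlite3` module, assuming a second in-memory database stands in for the remote DB; the `customer`/`orders` tables and their data are hypothetical. The application must know both schemas, sends one query to each DB, and combines the results itself, since no single query can span both.

```python
import sqlite3
from typing import Dict

# Two separate databases: in ALT 3 the second one would live on a remote node.
local_db = sqlite3.connect(":memory:")
remote_db = sqlite3.connect(":memory:")

local_db.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
local_db.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
remote_db.execute("CREATE TABLE orders (cust_id INTEGER, amount REAL)")
remote_db.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 20.0), (1, 5.0), (2, 7.5)])


def order_totals() -> Dict[str, float]:
    # The application knows both schemas and issues one query per DB;
    # combining the results (the "join") is the application's job.
    names = dict(local_db.execute("SELECT id, name FROM customer"))
    totals: Dict[str, float] = {}
    for cust_id, amount in remote_db.execute(
        "SELECT cust_id, amount FROM orders ORDER BY cust_id"
    ):
        name = names[cust_id]
        totals[name] = totals.get(name, 0.0) + amount
    return totals


print(order_totals())  # {'Alice': 25.0, 'Bob': 7.5}
```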
Explain and compare the different alternatives for distribution (for client transparency). ALTERNATIVE 4: distribution handled entirely by DB/RM
ALTERNATIVE 4: distribution handled entirely by DB/RM
- Single schema view for the application (distributed/federated DBMS), distributed transactions
- Pros: queries can join data from multiple sources, high flexibility for data access, simple and powerful programming model
- Cons: increased communication overhead, schema integration required.
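For contrast with ALT 3, a minimal sketch of the single-query view. SQLite is not a distributed DBMS; its `ATTACH DATABASE` statement is used here only to mimic the single schema view a distributed/federated DBMS would offer across nodes, and the tables and data are hypothetical. The join now happens in one declarative query, executed by the DBMS instead of the application.

```python
import sqlite3
from typing import Dict

db = sqlite3.connect(":memory:")
# ATTACH adds a second database under the schema name "remote"; it stands in for
# a source on another node that a distributed/federated DBMS would integrate.
db.execute("ATTACH DATABASE ':memory:' AS remote")
db.execute("CREATE TABLE main.customer (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("CREATE TABLE remote.orders (cust_id INTEGER, amount REAL)")
db.executemany("INSERT INTO main.customer VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
db.executemany("INSERT INTO remote.orders VALUES (?, ?)", [(1, 20.0), (1, 5.0), (2, 7.5)])


def order_totals() -> Dict[str, float]:
    # One query joins data from both sources; the DBMS does the combining.
    rows = db.execute(
        """
        SELECT c.name, SUM(o.amount)
        FROM main.customer AS c JOIN remote.orders AS o ON o.cust_id = c.id
        GROUP BY c.name
        ORDER BY c.name
        """
    )
    return dict(rows)


print(order_totals())  # {'Alice': 25.0, 'Bob': 7.5}
```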