Chapter 1: Introduction, Motivation & Overview Flashcards
Distributed systems: Working definition for this course
A distributed system is a system composed of several physically disjoint compute resources interconnected by a network.
Leslie Lamport’s anecdotal remark
- “A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.”
Why build a distributed system?
- A centralized system is simpler in many respects, but:
- Scalability limitations
- Single point of failure
- Availability and redundancy
- Many resources are inherently distributed
- Many resources are used in a shared fashion
Client‐server model: Examples
- Clients and servers are often separated by network, but may also be running on the same machine
- Clients initiate request, server awaits client requests
- Example servers: Web server, database server, ftp server, name server, print server, mail server, file server, compute server (software servers vs. physical server nodes)
- Example clients: Web browser, email clients, chat clients
Client‐server model: Implementation challenges
- Software server architecture
- Authentication, access control, encryption, …
- Concurrent processing of client requests
- Concurrent access to shared resources
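The last two challenges can be sketched with a minimal threaded echo server: clients initiate requests, the server awaits them and processes each client concurrently. The handler, port choice, and echo "protocol" are illustrative, not a prescribed design.

```python
import socket
import threading

def handle_client(conn: socket.socket) -> None:
    """Serve one client: echo each request back (a stand-in for real work)."""
    with conn:
        while data := conn.recv(1024):
            conn.sendall(data)

def serve(host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    """Start a server that processes each client in its own thread."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, port))  # port 0: let the OS pick a free port
    srv.listen()

    def accept_loop():
        while True:
            conn, _addr = srv.accept()
            # One thread per client: concurrent processing of client requests.
            threading.Thread(target=handle_client, args=(conn,), daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return srv

# Client side: initiate a request, await the reply.
server = serve()
port = server.getsockname()[1]
with socket.create_connection(("127.0.0.1", port)) as c:
    c.sendall(b"ping")
    print(c.recv(1024))  # b'ping'
```

A thread per client is the simplest concurrency scheme; real servers typically use thread pools or event loops to bound resource use.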
DS Challenges
- How to keep server replicas consistent?
- How to detect failures?
- How many failures can a given design tolerate?
- What kind of failures can it tolerate?
- How to recover from failures?
Tiered architecture
- Client - Web Server - DB
- Web server, application server, database server
Multi‐tiered architecture
- Client <–> Presentation tier <–> Business logic tier <–> Data persistence tier
N‐tiered architecture
- Client‐server architecture style
- Requests pass through tiers in a linear fashion
- Logical and physical separation of functions
- Typical function of each tier
- Presentation (user interface)
- Application processing (business logic)
- Data management (data persistence)
- The 3‐tiered architecture is the predominant variant
- Fosters flexibility, re‐usability, modularity, separation of concerns
- Tiers can be more independently modified, upgraded and replaced
- Layers vs. tiers
- Logical structuring of software vs. physical structuring of infrastructure
DNS: The Domain name system
- Cornerstone of the Internet (like a phone book)
- Maps domain names to IP addresses
- www.example.com to IP address of host serving this domain
- A distributed database of name servers
- E.g., used by clients (Web/email) to resolve names
- Developed to replace a centralized resolution scheme
- Early example of a distributed system
DNS name resolution
- What is the IP address of some‐webserver.com? Please reply to my IP address.
- Q (to root server): Where can I find the IP address of some‐webserver.com?
- A: I don’t know, but the .com namespace should have the answer.
- Q (to .com name server): What is the IP address of some‐webserver.com?
- A: The primary DNS server of some‐webserver.com knows it.
- Q (to primary DNS server): What is the IP address of some‐webserver.com?
- A: Here is the IP address of some‐webserver.com.
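The dialogue above can be sketched as an iterative lookup over a chain of toy name servers. The server names and the address 192.0.2.80 are made up for illustration (192.0.2.0/24 is a documentation range); real resolvers also cache answers.

```python
# Each "server" maps a query either to a referral (the next server to ask)
# or to a final answer. All names and addresses here are illustrative.
SERVERS = {
    "root":       {"some-webserver.com": ("referral", "com-tld")},
    "com-tld":    {"some-webserver.com": ("referral", "primary-ns")},
    "primary-ns": {"some-webserver.com": ("answer", "192.0.2.80")},
}

def resolve(name: str, server: str = "root") -> str:
    """Iteratively follow referrals until some server returns the address."""
    while True:
        kind, value = SERVERS[server][name]
        if kind == "answer":
            return value   # authoritative answer: the IP address
        server = value     # referral: ask the next server down the chain

print(resolve("some-webserver.com"))  # 192.0.2.80
```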
Web data platform
- A.k.a. key‐value store (NoSQL)
- Emerged around 2004 with Google’s BigTable, Facebook’s Cassandra, Yahoo!’s PNUTS, etc.
- Data model based on keys associated with values
- K/V stores are not new, but the scale of deployment and use was unprecedented
- Backs Web properties of major Internet companies
- Meant to manage petabytes of data
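A minimal sketch of the key-value data model, with keys hash-partitioned across nodes to hint at how such stores spread data. The class name and partitioning scheme are illustrative; BigTable, Cassandra, and PNUTS each use far more sophisticated schemes (plus replication, persistence, and failure handling).

```python
import hashlib

class KVStore:
    """Toy key-value store: opaque keys mapped to values,
    partitioned across nodes by key hash."""

    def __init__(self, n_nodes: int = 3):
        self.nodes = [{} for _ in range(n_nodes)]  # each dict stands in for a node

    def _node(self, key: str) -> dict:
        # Hash the key to pick the node that owns it.
        h = int(hashlib.sha256(key.encode()).hexdigest(), 16)
        return self.nodes[h % len(self.nodes)]

    def put(self, key: str, value: bytes) -> None:
        self._node(key)[key] = value

    def get(self, key: str) -> bytes:
        return self._node(key)[key]

store = KVStore()
store.put("user:42", b"alice")
print(store.get("user:42"))  # b'alice'
```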
Summary: Distributed systems examples
- Client‐server model
- Multi‐tiered enterprise architectures
- Cyber‐physical systems
- Power grid and smart power grid
- Cellular networks
- ATM and banking networks
- Large‐scale distributed systems
- Distributed application
Characteristics of distributed systems
- Reliable
- Fault‐tolerant
- Highly available
- Recoverable
- Consistent
- Scalable
- Predictable performance
- Secure
- Heterogeneous
- Open
Reliability
- The probability that a system performs its required functions under stated conditions for a specified period of time
- The ability to run continuously without failure
- Expressed as MTBF (mean time between failures) or failure rate
Availability & high‐availability
- Proportion of time a system is in a functioning state, i.e., can be used; also expressed as 1 – unavailability
- Ratio of time usable over entire time
- Informally, uptime / (uptime + downtime)
- A system that can be used 100 hrs out of 168 hrs has an availability of 100/168 (≈ 59.5%)
- System could be up, but not usable (outage!)
- Specified as decimal or percentage
- Five nines is 0.99999 or 99.999% available
- Downtime per year: one nine (90%) ≈ 36.5 days, four nines ≈ 52.56 min, six nines ≈ 31.5 s
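These downtime figures follow directly from the definition of availability; a small sketch to reproduce them (the helper name is ours):

```python
# Downtime per year implied by an availability figure.
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s

def downtime_per_year(availability: float) -> float:
    """Seconds of downtime per year at the given availability."""
    return (1 - availability) * SECONDS_PER_YEAR

print(downtime_per_year(0.9) / 86400)  # ≈ 36.5 days at one nine
print(downtime_per_year(0.9999) / 60)  # ≈ 52.56 minutes at four nines
print(downtime_per_year(0.999999))     # ≈ 31.5 seconds at six nines
```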
Availability is not equal to reliability
- System going down 1 ms every 1 hr has an availability of more than 99.9999%
- Highly available, but also highly unreliable
- A system that never crashes, but is taken down for two weeks each year
- Highly reliable, but only about 96% available
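Both contrasting examples can be checked against the uptime / (uptime + downtime) formula:

```python
def availability(uptime: float, downtime: float) -> float:
    """Availability = uptime / (uptime + downtime); any time unit works."""
    return uptime / (uptime + downtime)

# Down 1 ms in every hour: highly available, but a failure every hour.
a1 = availability(3600_000 - 1, 1)  # times in milliseconds
print(f"{a1:.7%}")                  # > 99.9999%

# Never crashes, but down two weeks a year: reliable, only ~96% available.
a2 = availability(50, 2)            # times in weeks
print(f"{a2:.1%}")                  # ≈ 96.2%
```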
Middleware
Middleware comprises services and abstractions that facilitate the design, development, and deployment of distributed applications in heterogeneous, networked environments.
Example abstractions: Remote invocation, messaging, publish/subscribe, TP monitor, locking service, etc.
Examples: DCE, CORBA, RMI, JMS, Web services, etc.
- Constitutes building blocks
- Captures common functionalities
- Message passing, remote invocation
- Message queuing, publish/subscribe
- Transaction processing
- Naming, directory, security provisions
- Fault‐tolerance, consistent views
- Replication, availability
- Deals with interoperability
- Deals with system integration
- Not directly covered in this course
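For a flavor of one abstraction from the list, here is a minimal in-process publish/subscribe sketch. It is illustrative only: real pub/sub middleware adds networking, persistence, and delivery guarantees that this toy omits.

```python
from collections import defaultdict
from typing import Callable

class Broker:
    """Toy publish/subscribe broker: routes messages by topic."""

    def __init__(self):
        self.subscribers: dict[str, list[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        self.subscribers[topic].append(callback)

    def publish(self, topic: str, message: str) -> None:
        # Publishers and subscribers are decoupled: neither knows the other.
        for callback in self.subscribers[topic]:
            callback(message)

broker = Broker()
received = []
broker.subscribe("orders", received.append)
broker.publish("orders", "order #1 placed")
print(received)  # ['order #1 placed']
```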
Middleware stack
Application layer
Middleware layer
Networking layers (Transport, etc.)
Fallacies of distributed systems design
- Assumptions (novice) designers of distributed systems often make that turn out to be false
- Formulated in 1994 by Peter Deutsch, Fellow at Sun Microsystems
- Also see “A Note on Distributed Computing”
The 8 fallacies
- The network is reliable.
- Latency is zero.
- Bandwidth is infinite.
- The network is secure.
- Topology doesn’t change.
- There is one administrator.
- Transport cost is zero.
- The network is homogeneous.
The network is reliable
- Switches & routers rarely fail
- Mean time between failures is very high (years!)
- Why then is this a fallacy?
- Power supply, hard disk, node failures
- Incorrect configurations
- Bugs, dependency on external services
- Effect is that application hangs or crashes
Selected fallacies dissected
- Adapted from “Fallacies of Distributed Computing Explained” by Arnon Rotem‐Gal‐Oz
- Let us look at some fallacies
- Assumptions
- Effects
- Countermeasures
The network is reliable: Implications for design
Redundancy
- Infrastructure & hardware
- Software systems, middleware & application
- Catch exceptions, check error codes, react accordingly
- Prepare to retry connecting upon timeouts
- Acknowledge, reply or use negative acknowledgements
- Identify & ignore duplicates
- Use idempotent operations
- Verify message integrity
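Several of the countermeasures above (catch failures, retry after a timeout, deduplicate by request ID so a retried operation applies only once) can be sketched against a simulated flaky service. All names and the backoff constants are illustrative.

```python
import time

class FlakyService:
    """Simulated unreliable server: fails the first attempt, then succeeds.
    It tracks request IDs so redelivered duplicates are ignored."""

    def __init__(self):
        self.attempts = 0
        self.seen_ids: set[str] = set()
        self.applied = 0

    def handle(self, request_id: str) -> str:
        self.attempts += 1
        if self.attempts == 1:
            raise TimeoutError("simulated network failure")
        if request_id in self.seen_ids:  # identify & ignore duplicates
            return "ok (duplicate)"
        self.seen_ids.add(request_id)
        self.applied += 1                # the operation takes effect exactly once
        return "ok"

def call_with_retry(service: FlakyService, request_id: str, retries: int = 3) -> str:
    """Catch failures and retry; safe because the server deduplicates."""
    for attempt in range(retries):
        try:
            return service.handle(request_id)
        except TimeoutError:
            time.sleep(0.01 * 2 ** attempt)  # back off before retrying
    raise RuntimeError("all retries failed")

svc = FlakyService()
print(call_with_retry(svc, "req-1"))  # 'ok'
print(svc.applied)                    # 1: retried, but applied only once
```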
Latency is zero
- “Making a call over the wire is like making a local call.”
- Informally speaking
- Latency is the time it takes for data to move from one place to another
- Bandwidth is how much data can be transferred during that time (bits per second)
- Minimum latency is dictated by the maximum speed of information transmission, i.e., the speed of light
- At ~300,000 km per second, the round‐trip time between the US and Europe (~8,000 km) is ~60 ms
- Bandwidth (& its use) keeps on growing
Latency is zero vs. cost of a method call
- Local call is essentially a Push and a Jump to Subroutine
- System call is taken care of by OS (100s of assembly instructions)
- Call across a LAN involves system calls on caller and callee and network latency
- Call across a WAN … transmission delays etc.
- Strive to make as few calls as possible, moving as much data as possible
- Trading off computing for data transmitted (cf. “bandwidth is not infinite” & I/O is expensive).
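The "few large calls beat many small calls" advice can be illustrated with a crude cost model in which every remote call pays a fixed round trip, while per-byte transmission cost is comparatively tiny. The constants are illustrative, not measurements.

```python
# Cost model: each remote call pays a fixed round-trip latency,
# plus a small per-byte transmission cost.
RTT_MS = 60.0          # round-trip latency per call, e.g. US-Europe
PER_BYTE_MS = 0.0001   # transmission cost per byte, tiny by comparison

def cost_ms(n_calls: int, bytes_per_call: int) -> float:
    """Total time spent on the wire for n_calls of the given size."""
    return n_calls * (RTT_MS + bytes_per_call * PER_BYTE_MS)

# 1000 items of 100 bytes, fetched one by one vs. in a single batched call:
print(cost_ms(1000, 100))      # ≈ 60,010 ms: latency dominates
print(cost_ms(1, 100 * 1000))  # ≈ 70 ms: one round trip, more data per call
```

Under this model the batched call is roughly 850× cheaper, which is the point of trading computation and payload size for fewer round trips.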
Summary: The 8 fallacies
- The network is reliable.
- Latency is zero.
- Bandwidth is infinite.
- The network is secure.
- Topology doesn’t change.
- There is one administrator.
- Transport cost is zero.
- The network is homogeneous.