Fundamentals Flashcards
What is a distributed system?
What are the parts of a distributed system (or “distributed application”) called that may be arranged and may cooperate in different ways?
Processes/Threads
In client/server distributed computing, what does the server provide to clients?
service(s)
When we say “client” or “server” we are often referring to either/both of:
The machine providing/using the service and, more accurately, the thread(s)/process(es) running on those machines to provide/use the service
In a P2P system, what happens with “servers” and “clients”?
In such systems, no machines are identified as “servers” or “clients”; all are supposedly equivalent “peers”
With P2P, each system typically performs both roles (server and client) at the same or, possibly, at different times
– E.g. a P2P file sharing user might be providing MPGs to another user at the same time s/he is downloading other MPGs
• When providing files, the user’s machine is acting as a server but when it is downloading files, it is clearly acting as a client
What are some possible benefits offered by distributed systems?
Improved/broader access to the system
- Data access, and other services may be made conveniently available across multiple machines
- E.g. a database/e-mail/file directory/… that is physically maintained at an organization’s head office may be accessible from its many branch offices around the world
Enhanced sharing
- Resources can be easily shared by many users
- E.g. high-end printers & backup devices are commonly shared
Cost-effectiveness
- Sharing devices is commonly more cost-effective than supplying each user with their own
- E.g. share a single high-speed, duplex, colour printer between several users rather than buying many
Less Systems Administration Effort
- Accounting information can be shared across many machines in a distributed system
- E.g. the system administrator only has to create and maintain a single userid for many machines
Enhanced Availability
- Having multiple copies of data can offer improved availability
- E.g. “replicated” web servers can continue to provide service even when one or more have failed
Better Performance
- Multiple service providers can offer better service
- E.g. the web requests sent to a large web server are commonly “distributed” across a number of web servers to provide faster response to each request
- E.g. content distribution across geographical regions (à la Akamai)
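A minimal sketch of how requests might be “distributed” across several web servers, assuming the simplest possible policy (round-robin). The backend names and the `make_dispatcher` helper are hypothetical and only for illustration; real load balancers usually also consider server health and load.

```python
from itertools import cycle

# Hypothetical backend names, used only for illustration.
BACKENDS = ["web-1", "web-2", "web-3"]

def make_dispatcher(backends):
    """Return a function that hands each request to the next backend in turn."""
    rotation = cycle(backends)

    def dispatch(request):
        return next(rotation), request

    return dispatch

dispatch = make_dispatcher(BACKENDS)
assignments = [dispatch(f"GET /page{i}")[0] for i in range(6)]
print(assignments)
```

Each backend ends up handling every third request, so no single server has to answer them all.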
What are some possible liabilities of using distributed systems?
Complexity
- Building and maintaining distributed systems is harder than building and maintaining centralized ones
- E.g. how do we know which parts of the system should be on which machines? What happens when one part fails?
Higher operational costs
- Looking after distributed systems can be expensive
- E.g. OS upgrades and patches must be done on N smaller machines instead of just one big one, etc.
Security and Trust Issues
- Whenever sensitive data is moved across a network, security becomes a concern
- E.g. typing your credit card number and expiration date into a web site – who’s really at the other end, listening on the line?
Decreased Availability
- Wait a minute!!! Didn’t we list improved availability as a benefit of distributed systems???
- Yes, but you have to work for it!
- What if one part of a distributed system goes down and you don’t have replication of that part? – “Extended failure modes”
- “A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable” (Leslie Lamport - paraphrased)
For processes and threads, which is lightweight and which is heavyweight?
Threads = Lightweight (threads in the same process share its address space)
Processes = Heavyweight (each process has its own address space)
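A rough sketch of the practical difference, assuming CPython and its standard `threading` and `subprocess` modules: threads inside one process all update the same `counter` object, while a separately started process has its own PID and its own memory and cannot see `counter` at all. The names `counter` and `bump` are made up for the example.

```python
import os
import subprocess
import sys
import threading

counter = {"value": 0}      # lives in this process's memory
lock = threading.Lock()     # protects the shared counter

def bump(times):
    """Increment the shared counter; every thread sees the same object."""
    for _ in range(times):
        with lock:
            counter["value"] += 1

threads = [threading.Thread(target=bump, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# A separate process gets its own PID and address space; it has no `counter`.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.getpid())"],
    capture_output=True, text=True, check=True,
)
child_pid = int(child.stdout.strip())

print("shared counter:", counter["value"])
print("parent pid:", os.getpid(), "child pid:", child_pid)
```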
What do all distributed systems share in common?
- Network communication
- How to find things (servers/services, peers, …)
- The need to deal with a variety of failures
- How to build systems that can grow (scalability)
What is the “network”?
It provides the “mechanism” by which the parts of a distributed application communicate
What happens inside the network has an impact on how we build distributed systems and what they can be expected to do
The “cloud” view: like driving a car, we don’t need to understand how the engine works; we just need to understand how to drive it (the interface)
“Useful distributed applications”
What does the term “useful” mean in this context?
Useful = How fast, how available/reliable
With regard to network communication, what are some key questions one should probably ask?
- What does the communicating?
- How do we identify communicating parties?
- How do they communicate? (What is the API?)
- Is communication always reliable?
- The internet is a “best effort” service unless you manually account for its unreliability
- Is communication performance predictable?
- What happens as networks get big (or small)?
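One way to “manually account for” an unreliable, best-effort network is to time out and resend. The sketch below assumes UDP on the local machine and simulates loss by having a made-up `flaky_server` ignore the first datagram on purpose; the timeout and retry limit are arbitrary. In practice, most applications get this behaviour from TCP rather than writing it by hand.

```python
import socket
import threading

def flaky_server(sock):
    """Ignore the first datagram (simulated loss), then answer the next one."""
    sock.recvfrom(1024)                  # read and drop: pretend it was lost
    data, addr = sock.recvfrom(1024)
    sock.sendto(b"ack:" + data, addr)

server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))            # port 0: let the OS pick a free port
server_addr = server.getsockname()
worker = threading.Thread(target=flaky_server, args=(server,))
worker.start()

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(0.5)                   # how long to wait before resending
attempts = 0
reply = None
while reply is None and attempts < 5:
    attempts += 1
    client.sendto(b"hello", server_addr)
    try:
        reply, _ = client.recvfrom(1024)
    except socket.timeout:
        pass                             # no answer in time: assume loss, retry

worker.join()
client.close()
server.close()
print("attempts:", attempts, "reply:", reply)
```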
What communicates in a distributed system?
The users of distributed systems are NOT the communicating entities
- Though they sometimes do communicate using distributed systems (skype, email, facebook)
The communicating entities are actually the “running programs”
- “processes” or “threads”
- For now we assume that processes communicate over the network
What is the network communication diagram with regard to client machines and server machines?
How are communicating parties identified?
We need at least both a unique machine id and a unique process id on that machine
- For the internet, an Internet Protocol (IP) address uniquely identifies each machine (more or less)
But even this is still not enough: PIDs get re-used over time on a single machine.
Process IDs will need to be reused at some point. A PID is unique only within the set of processes that currently exist. If we have services that run for a long time, we could possibly run out of PIDs.
- A port number identifies a service provided by a process running on a given machine
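A small sketch of the (IP address, port) idea using Python's standard `socket` module, assuming everything runs on the local machine (127.0.0.1). Binding to port 0 asks the operating system to pick any free port, so the actual numbers will differ from run to run.

```python
import socket

# The server side: bind to an (IP, port) pair and listen there.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))            # port 0: let the OS pick a free port
server.listen(1)
server_addr = server.getsockname()       # (IP address, port number)

# The client side: connect to the server's (IP, port) pair.
client = socket.create_connection(server_addr)
conn, peer_addr = server.accept()
client_addr = client.getsockname()

# Each end of the conversation is named by its own (IP, port) pair,
# not by a process id.
print("server end point:", server_addr)
print("client end point:", client_addr)

same_pair = (peer_addr == client_addr)   # the server sees the client's pair
conn.close()
client.close()
server.close()
```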
What does it mean for a process to communicate?
Model of “message passing”
Other abstractions can be built on top
This works well because networks provide for the delivery of data in the form of “packets”
Building on packets, “messages” may be sent between two “end points” (processes on specific machines identified using port #’s)
Prof notes: How data is represented can differ between process X and process Y
Problems arise where the client and server parts don’t correctly agree on what’s being transferred over the network
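A tiny illustration of such a disagreement, assuming the two sides differ only in byte order. The sender packs the integer 1 as four little-endian bytes; a receiver that wrongly assumes big-endian reads a completely different number from the same bytes.

```python
import struct

value = 1
wire_bytes = struct.pack("<I", value)           # sender: 4-byte unsigned, little-endian

correct = struct.unpack("<I", wire_bytes)[0]    # receiver that agrees on the format
mismatch = struct.unpack(">I", wire_bytes)[0]   # receiver that assumes big-endian

print(correct, mismatch)   # 1 16777216
```

This is why protocols fix a single on-the-wire format (for example, “network byte order”) that both ends convert to and from.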
What does the intuitive message example look like?
“please give file ‘f’” to port 27983 on machine 130.179.28.1
and might be followed by something like:
receive file ‘f’ from port 27983 on machine 130.179.28.1
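The exchange above might be sketched as follows, assuming TCP on the local machine rather than the address in the example. The request text, the newline framing, the in-memory `FILES` table and the helper names are all made up for illustration; this is not a real file-transfer protocol.

```python
import socket
import threading

FILES = {"f": b"contents of f\n"}        # hypothetical in-memory "files"

def read_line(conn):
    """Read bytes until a newline arrives (our made-up message framing)."""
    buf = b""
    while not buf.endswith(b"\n"):
        chunk = conn.recv(1024)
        if not chunk:
            break
        buf += chunk
    return buf.decode().strip()

def serve_one(listener):
    """Accept one client, read its request, and send back the named file."""
    conn, _ = listener.accept()
    with conn:
        request = read_line(conn)        # e.g. "please give file f"
        name = request.rsplit(" ", 1)[-1]
        conn.sendall(FILES.get(name, b""))

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]
worker = threading.Thread(target=serve_one, args=(listener,))
worker.start()

# Client: send the request message, then receive until the server closes.
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"please give file f\n")
    chunks = []
    while True:
        data = client.recv(4096)
        if not data:
            break
        chunks.append(data)
received = b"".join(chunks)

worker.join()
listener.close()
print(received)
```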