Qs from Prof Flashcards
- Provide a concise definition of a distributed system.
“A distributed system is a program that consists of multiple parts running on more than one computer interconnected via a network”
- Describe three benefits that may be offered by using a distributed system
rather than a centralized one.
- Improved/broader access to the system
- Enhanced sharing
* resources can be easily shared by many users
- Cost-effectiveness (because of sharing)
- Less systems admin effort
- Enhanced availability (multiple copies of data, replicated servers)
- Better performance
* content can be closer geographically
- Describe three possible problems that may arise when using a distributed
system rather than a centralized one.
- More complex (harder to build and maintain)
- Higher operational costs (OS upgrades/patches)
- Security and trust issues
- Decreased availability (what if part of the system goes down)
- Briefly explain the difference between connection-oriented communication
and connectionless communication.
Connection-oriented communication (like TCP) first creates a connection, like a phone call; it stays active until one side hangs up.
Connectionless communication (like UDP) is like sending a letter: each message is treated as a single letter, addressed and sent to the other party on its own.
Every message must carry the communication partner's address (IP and port): we specify it on each send, and learn it from each received message.
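A minimal Java sketch of the two styles, running both ends on localhost (class and method names are illustrative, not from the course code): the TCP client names the server once, when it connects, and then just writes to the stream, while every UDP datagram carries the destination IP and port.

```java
import java.io.*;
import java.net.*;
import java.nio.charset.StandardCharsets;

// Sketch: the same one-message exchange done connection-oriented (TCP) and connectionless (UDP).
public class ConnVsConnless {
    static final InetAddress LOOPBACK = InetAddress.getLoopbackAddress();

    // TCP: the partner's address is given once, when the connection is set up;
    // after that we just read/write the stream until one side closes ("hangs up").
    static String tcpExchange(String msg) {
        try (ServerSocket server = new ServerSocket(0, 1, LOOPBACK)) {
            Thread echo = new Thread(() -> {
                try (Socket s = server.accept();
                     BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
                     PrintWriter out = new PrintWriter(new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true)) {
                    out.println("echo:" + in.readLine());
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            echo.start();
            try (Socket client = new Socket(LOOPBACK, server.getLocalPort());   // connection set up here
                 BufferedReader in = new BufferedReader(new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8));
                 PrintWriter out = new PrintWriter(new OutputStreamWriter(client.getOutputStream(), StandardCharsets.UTF_8), true)) {
                out.println(msg);                 // no address needed per message
                return in.readLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // UDP: no connection; every datagram is individually addressed (IP + port).
    static String udpExchange(String msg) {
        try (DatagramSocket server = new DatagramSocket(0, LOOPBACK);
             DatagramSocket client = new DatagramSocket(0, LOOPBACK)) {
            server.setSoTimeout(2000);            // don't block forever if the datagram is lost
            byte[] data = msg.getBytes(StandardCharsets.UTF_8);
            client.send(new DatagramPacket(data, data.length, LOOPBACK, server.getLocalPort()));
            byte[] buf = new byte[512];
            DatagramPacket packet = new DatagramPacket(buf, buf.length);
            server.receive(packet);               // the packet also tells us who sent it
            return new String(packet.getData(), 0, packet.getLength(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(tcpExchange("hi"));   // echo:hi
        System.out.println(udpExchange("hi"));   // hi
    }
}
```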
- Explain why a connection-oriented protocol (like TCP/stream sockets) tends
to have higher overhead than a connectionless protocol (like UDP/datagram
sockets).
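UDP is faster to set up and send: there is no connection setup (TCP's handshake) and no pipe held open for the entire conversation, and UDP skips TCP's acknowledgements, retransmissions, ordering, and flow control, so each message costs less.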
- What is a network port number and why is it necessary?
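A port number, together with the machine's IP address, lets us uniquely identify a process/service across multiple machines.
We need a unique machine ID (the IP address) and a unique process ID on that machine (the port); ordinary PIDs are only unique within one machine.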
“A port number identifies a service provided by a process running on a given machine”
- What does DNS stand for? What function does DNS perform? In general terms,
what type of distributed application is DNS?
Domain Name System
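Maps human-readable domain names (easier to remember than an IP address) to IP addresses.
DNS helps to locate a resource.
A distributed, hierarchical database (a naming/directory service) spread across many name servers.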
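Programs usually reach DNS through a resolver library rather than by speaking the protocol themselves. In Java, `InetAddress.getAllByName` asks the system resolver, which may consult DNS and local sources such as a hosts file; the `NameLookup` wrapper is just an illustrative name.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;

// Sketch: resolve a host name to its IP address(es) via the system resolver.
public class NameLookup {
    static InetAddress[] lookup(String name) {
        try {
            return InetAddress.getAllByName(name);   // a name may map to several addresses
        } catch (UnknownHostException e) {
            return new InetAddress[0];               // the name could not be resolved
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "localhost";
        System.out.println(name + " -> " + Arrays.toString(lookup(name)));
    }
}
```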
- Briefly explain when *broadcasting* can be useful in locating some service.
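Useful on a LAN when we don't know in advance which machine provides a service: the client broadcasts a request to every machine on the network asking where the resource is, and the machine offering it replies.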
- What are extended failure modes? Give two examples of such failure modes.
“ways of failing that don’t occur in centralized systems”
- Concurrency: Having more than one thing happening at the same time
- Communication
Examples:
- One process fails but another still runs
- Communication fails between communicating processes
- Communication is garbled between communicating processes

- How are extended failure modes and the choice between connectionless and
connection-oriented communications related?
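Both relate to communication over a network, which can fail in ways a single machine cannot.
When choosing a protocol, message ordering and reliability are the key factors: connection-oriented protocols (like TCP) detect and recover from lost, duplicated, or reordered messages, while with connectionless protocols (like UDP) the application must handle these failures itself.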
What is a timeout? Give an example of where a timeout might be useful in
a distributed system.
A timeout is a specific amount of time to wait for a response back when a communication is sent. It helps us to detect a failure.
This is useful because we may be waiting on another machine to respond, and without a timeout we could wait forever.
It is possible for the other side to be down (or for the message to have been lost).
A timeout allows for a retry with the same or a different server
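A sketch of a timeout with retries using a UDP socket's receive timeout (`setSoTimeout`). The `requestWithRetry` helper, the timeout value, and the retry count are illustrative assumptions; returning an empty result lets the caller move on to a different server.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.*;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Sketch: send a UDP request and wait at most timeoutMs for a reply,
// retrying up to maxTries times before giving up.
public class TimeoutRetry {
    static Optional<String> requestWithRetry(InetAddress host, int port, String request,
                                             int timeoutMs, int maxTries) {
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.setSoTimeout(timeoutMs);      // receive() now blocks at most timeoutMs
            byte[] data = request.getBytes(StandardCharsets.UTF_8);
            byte[] buf = new byte[1024];
            for (int attempt = 1; attempt <= maxTries; attempt++) {
                socket.send(new DatagramPacket(data, data.length, host, port));
                DatagramPacket reply = new DatagramPacket(buf, buf.length);
                try {
                    socket.receive(reply);
                    return Optional.of(new String(reply.getData(), 0, reply.getLength(), StandardCharsets.UTF_8));
                } catch (SocketTimeoutException e) {
                    // No answer in time: the server may be down or a message was lost; try again.
                }
            }
            return Optional.empty();             // caller may now try a different server
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // A socket that never replies stands in for a crashed server.
        try (DatagramSocket silent = new DatagramSocket(0, InetAddress.getLoopbackAddress())) {
            Optional<String> r = requestWithRetry(InetAddress.getLoopbackAddress(), silent.getLocalPort(), "ping", 200, 3);
            System.out.println(r.orElse("no reply after 3 tries"));
        }
    }
}
```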
- What is meant by the term scalability?
Scalability is simply defined as the ability of a system to grow (in scale) to larger sizes without making changes to the system design.
- We don’t have to change the techniques used as the system becomes 10/100/1000/… times bigger
- What is meant by the term consistency?
How does this relate to using replicated servers to enhance reliability?
Keeping the contents of replica servers identical.
If we replicate servers to increase reliability, we need to ensure all content is identical and don’t hand out incorrect/stale information to different web clients.
- Give an example of how design choices related to resource naming can affect
distributed system scalability.
When designing we need to consider:
- Information required to do discovery
- Cost of discovery
- Reliability of discovery
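- The scale of the discovery mechanism
Example: a single central table listing every name becomes a bottleneck (and a single point of failure) as the system grows, whereas a hierarchical scheme like DNS splits the name space so each part is managed and looked up separately.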
- What is the primary difference between a process and a thread?
A process can be thought of as a program in execution; it includes:
- code that is executing
- data being operated on
- Execution state
- register contents, PC, call stack
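Each process has its own separate memory (address space) and runs on a single machine.
A thread is a cheaper (light-weight) alternative to a process:
- It is a unit of activity alone (its own execution state: PC, registers, call stack)
- A process can have more than one thread, all sharing the same memory space
Primary difference: separate processes do not share memory, while threads within the same process do.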
- Concisely describe what a *nix fork() call does.
A clone of the current running process is made.
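It returns (once in each of the two processes):
- the process id (pid) of the created child process, in the parent
- or 0, in the child (to indicate that we are the child process)
The return value can then be used to decide whether to continue on with the original code (parent) or have the clone exec(…) another program (child).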

- Why must each thread have its own stack?
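Each thread runs its own sequence of procedure calls, and the stack holds those calls' local variables and return addresses. To run concurrently and independently, each thread needs its own stack; with a shared stack, one thread's calls would overwrite another's.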
- What does the start() method on a Java Thread object do?
After a call to start( ), the original thread (the one that called start) and a new thread will both be running
The new thread, once started, runs its run( ) method. Every newly created thread starts by running this code.
So the Thread object is created first, but it doesn’t actually start running until start( ) is called. (Calling run( ) directly would just execute it on the current thread, like any other method.)
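A small Java sketch (class name illustrative) showing that start( ) makes run( ) execute on the new thread, whereas calling run( ) directly is only an ordinary method call on the current thread:

```java
// Sketch: record which thread actually executes the Runnable's code.
public class StartVsRun {
    static String threadThatRan(boolean useStart) throws InterruptedException {
        final String[] ranOn = new String[1];
        Thread t = new Thread(() -> ranOn[0] = Thread.currentThread().getName(), "worker");
        if (useStart) {
            t.start();   // a new thread begins executing run() concurrently with the caller
            t.join();    // wait for it so we can read the result safely
        } else {
            t.run();     // just an ordinary method call on the *current* thread
        }
        return ranOn[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("start(): " + threadThatRan(true));   // worker
        System.out.println("run():   " + threadThatRan(false));  // main
    }
}
```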
- How do we normally define a correct *concurrent* execution?
Correctness is defined to be “The same as any sequential execution of the concurrent programs”
Correctness concerns threads accessing and modifying shared data at the same time: the result must be as if they accessed it one at a time.
It doesn’t matter who goes first, as long as one finishes before the other begins.
- What is a critical section?
A section of code that accesses shared data. To identify them, ask:
- “What are the shared variables?”
- Which critical sections are related
* Which access the same shared variables
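As a sketch, here is a hypothetical `Account` class in Java: the shared variable is `balance`, and `deposit()` and `getBalance()` are related critical sections because both touch it. Declaring both `synchronized` makes them lock the same object, so the outcome matches some sequential order of the deposits.

```java
// Sketch: identifying and protecting critical sections on one shared variable.
public class CriticalSection {
    static class Account {                         // hypothetical example class
        private long balance = 0;                  // the shared variable

        synchronized void deposit(long amount) {   // critical section
            balance = balance + amount;            // read-modify-write must not interleave
        }

        synchronized long getBalance() {           // related critical section (same variable)
            return balance;
        }
    }

    static long runDeposits(int threads, int depositsPerThread) throws InterruptedException {
        Account acct = new Account();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < depositsPerThread; j++) acct.deposit(1);
            });
            workers[i].start();
        }
        for (Thread w : workers) w.join();
        return acct.getBalance();
    }

    public static void main(String[] args) throws InterruptedException {
        // Always 400000; without `synchronized`, concurrent updates could be lost.
        System.out.println(runDeposits(4, 100_000));
    }
}
```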

What is RPC? Why does the lack of shared memory between the caller and the callee make RPC harder to implement than LPC?
RPC was originally designed to eliminate explicit message sending (which was unfamiliar to programmers). It abstracts/hides the messaging.
An RPC system translates procedure calls into appropriate message passing for the programmer.
Instead of running a procedure call on a local machine, we are making a procedure call on a different machine than the caller.
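In short: with no shared memory, we can't pass pointers/references (an address means nothing on the other machine), so all parameter data, of whatever type, must be copied into messages and rebuilt on the other side, possibly on a machine that represents data differently. Long answer, see the image.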

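- What is marshalling?
Marshalling is packing a call's parameters (or results) into a message, in the agreed wire format, for transmission; unmarshalling is unpacking them on the receiving side.
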
- What is meant by the term “wire/network format”? How does this relate to RPC?
Wire/network format
Everything is translated to a pre-agreed, machine-architecture-neutral representation for transmission, and translated back from that format upon reception.
In RPC, this handles the problem of passing parameters of many data types between systems with different architectures (e.g. different byte orders).
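A sketch of marshalling a hypothetical `Deposit` request into a wire format with `DataOutputStream`, which writes multi-byte numbers in big-endian (network) byte order and strings as length-prefixed modified UTF-8, regardless of the local CPU. Real RPC systems define their own formats; this only illustrates the idea.

```java
import java.io.*;

// Sketch: marshal/unmarshal hypothetical RPC parameters to and from bytes.
public class WireFormat {
    record Deposit(int account, String owner, long amount) {}   // hypothetical parameters

    static byte[] marshal(Deposit d) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(d.account());    // 4 bytes, big-endian
            out.writeUTF(d.owner());      // 2-byte length + modified UTF-8 bytes
            out.writeLong(d.amount());    // 8 bytes, big-endian
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Deposit unmarshal(byte[] wire) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire))) {
            // Fields must be read back in exactly the order they were written.
            return new Deposit(in.readInt(), in.readUTF(), in.readLong());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] wire = marshal(new Deposit(1, "Ann", 250L));
        System.out.println(wire.length + " bytes on the wire");   // 17 bytes on the wire
        System.out.println(unmarshal(wire));
    }
}
```

Both sides only need to agree on the field order and encodings, not on each other's internal memory layout.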

- What is RMI? Why is an RMIregistry required in Java?
RMI is a more modern (object oriented) version of RPC.
RMI = Remote Method Invocation
Complex (non-remote) arguments to, and results from, a remote method invocation are passed by deep copy (serialization) rather than by reference; remote objects themselves are passed as remote references (stubs).
The RMIregistry is required to get a reference to the remote object you want to invoke a method on.
This is an RMI lookup server, so we can get that reference by name.
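Java RMI passes non-remote arguments and results by serializing them. A sketch that performs the same serialize/deserialize round trip locally (no network and no registry; the `Order` class is hypothetical) shows that the receiving side gets an independent deep copy:

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

// Sketch: serialization produces a deep copy, as RMI does for non-remote arguments.
public class DeepCopyDemo {
    static class Order implements Serializable {           // hypothetical argument type
        private static final long serialVersionUID = 1L;
        final List<String> items = new ArrayList<>();       // nested object is copied too
    }

    @SuppressWarnings("unchecked")
    static <T extends Serializable> T copyViaSerialization(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);                       // "send" side: object graph -> bytes
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();                 // "receive" side: bytes -> new objects
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Order original = new Order();
        original.items.add("book");
        Order copy = copyViaSerialization(original);
        copy.items.add("pen");                              // a change made on the "remote" side
        System.out.println(original.items);                 // [book]
        System.out.println(copy.items);                     // [book, pen]
    }
}
```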

What is meant by synchronization? When must threads synchronize their activities?
Synchronization is used to control access to shared variables that could negatively affect the correctness.
In most cases, synchronization only involves ensuring that concurrent threads do not concurrently access shared data
Threads must synchronize when they access shared variables that at least one of them modifies.
What is a mutex? How is it used to guard access to shared data and thereby synchronize the threads accessing it?
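Mutex stands for mutual exclusion
A mutex is a common approach to controlling access to shared data: a lock (mutex) is associated with the variables for that data, and a thread must acquire it before accessing them and release it afterwards, so only one thread at a time can access the data. Its two operations: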
Lock (sometimes called acquire)
Unlock (sometimes called free)
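The lock/unlock pair maps directly onto `java.util.concurrent.locks.ReentrantLock`. A sketch of a counter guarded by an explicit mutex (class name illustrative):

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch: an explicit mutex guarding shared data.
// lock() = "acquire", unlock() = "free"; only one thread can hold the lock at a time.
public class MutexCounter {
    private final ReentrantLock mutex = new ReentrantLock();
    private int count = 0;                 // shared data guarded by `mutex`

    void increment() {
        mutex.lock();                      // blocks until no other thread holds the mutex
        try {
            count++;                       // critical section
        } finally {
            mutex.unlock();                // always release, even if an exception is thrown
        }
    }

    int get() {
        mutex.lock();
        try {
            return count;
        } finally {
            mutex.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        MutexCounter c = new MutexCounter();
        Runnable work = () -> { for (int i = 0; i < 100_000; i++) c.increment(); };
        Thread a = new Thread(work), b = new Thread(work);
        a.start(); b.start();
        a.join(); b.join();
        System.out.println(c.get());       // 200000
    }
}
```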
- What, generally speaking, is a socket?
A socket is an abstraction that mimics a physical wall socket.
Sockets are a message-passing API
They allow for message-passing communication
It is an endpoint for communication

- What is meant by the term “well-known address”?
Well known address refers to the address of a server that is known/hardcoded in advance.
We already know where and what server we are connecting to specifically.
A pre-agreed upon server name and port
Briefly explain what an accept() call on a Java serverSocket object does.
accept() waits (blocks) until a client requests a connection, then accepts it.
It creates the connection and returns a normal Socket, which the server then uses for messaging with that client; the ServerSocket itself keeps listening for new connections.
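A sketch of `accept()` on a `java.net.ServerSocket`, with a tiny client in the same program for testing (method names are illustrative). Note that the `Socket` returned by `accept()` is separate from the listening `ServerSocket`:

```java
import java.io.*;
import java.net.*;
import java.nio.charset.StandardCharsets;

// Sketch: the ServerSocket only listens; accept() returns a new Socket per client.
public class AcceptDemo {

    // Server side: handle exactly one client, reply, and return what it sent.
    static String handleOneClient(ServerSocket listener) throws IOException {
        try (Socket conn = listener.accept();   // blocks until a client connects
             BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(new OutputStreamWriter(conn.getOutputStream(), StandardCharsets.UTF_8), true)) {
            // conn is connected to this one client; the listener is unchanged.
            System.out.println("server: connection from " + conn.getRemoteSocketAddress());
            String line = in.readLine();
            out.println("ack " + line);
            return line;
        }
    }

    // Client side: connect to the listener's port, send one line, return the reply.
    static String sendLine(int port, String msg) {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port);
             BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true)) {
            out.println(msg);
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        try (ServerSocket listener = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            Thread client = new Thread(() -> System.out.println("client: " + sendLine(listener.getLocalPort(), "hello")));
            client.start();
            System.out.println("server: got " + handleOneClient(listener));
            client.join();
        }
    }
}
```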
Why are servers normally concurrent? What advantage does this provide?
Servers must serve multiple requests, often from many clients at once.
Separate threads then handle the processing for each individual client concurrently.
There are usually delays in processing information on either side, or in messaging; a sequential server would make clients queue up behind each other, slowing every other client down.
Briefly sketch the *high-level* operation of a *multi-threaded* server based on stream sockets.
draw it
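In place of a drawing, the high-level flow can be sketched in Java (thread per connection). This is a hedged illustration rather than the course's reference design; the class name and the upper-casing "service" are placeholders.

```java
import java.io.*;
import java.net.*;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Sketch: multi-threaded stream-socket server, one thread per connection.
public class ThreadedServer {
    private final ServerSocket listener;

    ThreadedServer(ServerSocket listener) {
        this.listener = listener;
    }

    // Main server thread: accept a connection, hand it to a new thread, repeat.
    void serveForever() {
        while (!listener.isClosed()) {
            try {
                Socket client = listener.accept();          // 1. wait for the next client
                new Thread(() -> handle(client)).start();   // 2. dedicated thread for this client
            } catch (IOException e) {
                // accept() throws once the listener is closed; the loop condition then ends it
            }
        }
    }

    // 3. Per-client thread: answer each line (here: upper-cased) until the client disconnects.
    private static void handle(Socket client) {
        try (client;
             BufferedReader in = new BufferedReader(new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(new OutputStreamWriter(client.getOutputStream(), StandardCharsets.UTF_8), true)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.println(line.toUpperCase(Locale.ROOT));
            }
        } catch (IOException e) {
            // a problem with this client only; other clients are unaffected
        }
    }

    // A simple client: connect, send one line, return the reply.
    static String ask(int port, String msg) {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port);
             BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true)) {
            out.println(msg);
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket listener = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            ThreadedServer server = new ThreadedServer(listener);
            Thread acceptor = new Thread(server::serveForever);
            acceptor.setDaemon(true);
            acceptor.start();
            System.out.println(ask(listener.getLocalPort(), "hello"));   // HELLO
            System.out.println(ask(listener.getLocalPort(), "world"));   // WORLD
        }
    }
}
```

Because each client gets its own thread, a slow or idle client does not block the others; the cost is one thread per open connection.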
-Scripting languages are good for implementing “glue logic”. Briefly explain why this makes them useful in developing distributed applications.
Scripting languages have special facilities for things such as sequencing commands, pattern matching etc.
Interpreted - easy to use and distribute
Weakly typed - can work on any data
Powerful - from special features
Flexible and composable.
Helps us to easily handle connections or processing of data
Give three examples of scripting language features that make them useful in creating distributed systems.

What is JDBC? Why is it relevant to distributed systems development?
Sketch how this works generally.
JDBC allows for remote access to database systems
A standard providing a means of accessing “any” database safely and remotely.
JDBC (Java Database Connectivity) provides this database access ability specifically for Java

Give an example of some processing that naturally belongs on a client.
Display of results from server to the client.
As an example, a web-client.
Anything that would be hampered by a delay with the server.
Give an example of some processing that naturally belongs on a server.
NFS file server and sending/receiving files
Give an example of some processing that might belong on either a client or the server.
Some sort of processing that needs to be done and there are resources avail on either side.
- What was meant by the “Thin Client vs. Thick Client” argument?
Thin means not much is processed on the client side logic wise.
Thick means that there is a lot of processing happening on the client side.
What three factors should you consider when deciding how to distribute application functionality between the server and the group of clients?
- Location of the resources being accessed
- Communication costs
- Workload balance/distribution
You should always structure it in a way that minimizes communication
All things being equal, choose the distribution that best balances the workload across all available machines.
- Consider the capabilities of each machine
Briefly explain why a stateless server provides better fault tolerance.
A stateful server is one that maintains information about its clients:
- Which clients are connected
- What are they doing?
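What happens if the server fails? How is the state information reconstructed after it recovers?
Reconstructing this information is the difficulty for fault tolerance. A stateless server keeps no such information, so after a crash it can simply restart and carry on: each request carries everything needed to serve it.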
How does a *datagram* server know which client it is communicating with so that it can return a result to that client?
It gets the client's address and port from the received datagram (packet) and sends the response back to that address and port.
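// Constructor of a per-client thread in a datagram bank server. The client's address and port
// (presumably taken from the received packet) are stored so the thread can send replies back.
// The Hashtable value type is an assumption.
bankThread(Hashtable<Integer, Integer> accounts, DatagramSocket clientSocket, InetAddress address, String firstMessage, int port) {
    this.accounts = accounts;
    this.mySocket = clientSocket;
    buf = new byte[1000];
    this.address = address;            // client's IP address
    this.firstMessage = firstMessage;
    PORT = port;                       // client's port
}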
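A sketch of a datagram server that answers whoever sent the request by reading the address and port out of the received `DatagramPacket`. The class, method names, and reply text are illustrative; presumably this is also where values like bankThread's address and port come from.

```java
import java.io.*;
import java.net.*;
import java.nio.charset.StandardCharsets;

// Sketch: no connection, so the reply address comes from each request packet.
public class DatagramReply {
    static void serveOne(DatagramSocket socket) throws IOException {
        byte[] buf = new byte[1000];
        DatagramPacket request = new DatagramPacket(buf, buf.length);
        socket.receive(request);                          // blocks for one datagram
        InetAddress clientAddr = request.getAddress();    // sender's IP ...
        int clientPort = request.getPort();               // ... and port, from the packet
        String text = new String(request.getData(), 0, request.getLength(), StandardCharsets.UTF_8);
        byte[] reply = ("reply to " + text).getBytes(StandardCharsets.UTF_8);
        socket.send(new DatagramPacket(reply, reply.length, clientAddr, clientPort));
    }

    // Client side: send one datagram to the server and wait (bounded) for the answer.
    static String askServer(int serverPort, String text) {
        try (DatagramSocket client = new DatagramSocket(0, InetAddress.getLoopbackAddress())) {
            client.setSoTimeout(2000);                    // don't wait forever for the reply
            byte[] msg = text.getBytes(StandardCharsets.UTF_8);
            client.send(new DatagramPacket(msg, msg.length, InetAddress.getLoopbackAddress(), serverPort));
            byte[] buf = new byte[1000];
            DatagramPacket resp = new DatagramPacket(buf, buf.length);
            client.receive(resp);
            return new String(resp.getData(), 0, resp.getLength(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        try (DatagramSocket server = new DatagramSocket(0, InetAddress.getLoopbackAddress())) {
            Thread t = new Thread(() -> {
                try {
                    serveOne(server);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            t.start();
            System.out.println(askServer(server.getLocalPort(), "balance?"));   // reply to balance?
            t.join();
        }
    }
}
```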
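Why is a stateless server commonly preferred over a stateful server?
Mainly for better fault tolerance: a stateless server has no client state to lose or reconstruct, so after a crash it can simply restart, and any replica can handle any request.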
If a server is to be stateless, where must the state be stored?
The state should be stored in the client (which sends whatever state is needed along with each request).
If stateful, then the state needs to be stored in the main server (not the individual threads)
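To make "state stored in the client" concrete, here is a sketch with two hypothetical in-memory servers that hand out a file in chunks (no networking). The stateful one keeps each client's offset in one table shared by all its threads, which a crash or restart would wipe out; the stateless one requires the client to send its offset with every request, as NFS-style reads do.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch (hypothetical classes): stateful vs. stateless chunked file reads.
public class StatelessVsStateful {
    static final String FILE = "abcdefghij";   // stands in for a file held by the server

    // Stateful: the server remembers where each client is up to.
    static class StatefulServer {
        private final Map<String, Integer> offsets = new HashMap<>();   // lost if the server crashes

        // synchronized: all of the server's threads share this one table
        synchronized String readNext(String clientId, int count) {
            int start = offsets.getOrDefault(clientId, 0);
            int end = Math.min(start + count, FILE.length());
            offsets.put(clientId, end);
            return FILE.substring(start, end);
        }
    }

    // Stateless: each request says where to read, so nothing is kept between requests.
    static class StatelessServer {
        String read(int offset, int count) {
            int start = Math.min(offset, FILE.length());
            int end = Math.min(start + count, FILE.length());
            return FILE.substring(start, end);
        }
    }

    public static void main(String[] args) {
        StatefulServer stateful = new StatefulServer();
        System.out.println(stateful.readNext("c1", 4) + stateful.readNext("c1", 4));   // abcdefgh

        // With the stateless server, the client keeps its own offset.
        int offset = 0;
        String first = new StatelessServer().read(offset, 4);
        offset += first.length();
        String second = new StatelessServer().read(offset, 4);   // a "restarted" server works fine
        System.out.println(first + second);                       // abcdefgh
    }
}
```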
What is a server Process Pool? What advantage does it offer?
The cost of creating and maintaining multiple processes is high
- This limits the performance improvements that can be achieved.
Rather than create a new process for each client as it connects, servers pre-create a “pool” of processes to which they can assign client requests as they arrive.
- Cheaper during service provision, since the cost of process creation is paid up front, before any client connects.
This improves performance
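In Java, servers usually pool threads rather than processes, but the idea is the same. A sketch using `ThreadPoolExecutor`, where `prestartAllCoreThreads()` creates every worker before any request arrives (the request strings and class name are placeholders):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

// Sketch: workers created up front; each request is queued for an existing worker.
public class WorkerPoolDemo {
    static List<String> handleRequests(List<String> requests, int poolSize)
            throws InterruptedException, ExecutionException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                poolSize, poolSize, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        pool.prestartAllCoreThreads();   // pay the creation cost now, before any request arrives
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String req : requests) {
                // No new thread per request: the task waits for a free pre-created worker.
                pending.add(pool.submit(() -> req + " handled by " + Thread.currentThread().getName()));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : pending) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();             // let the workers exit once the queue is drained
        }
    }

    public static void main(String[] args) throws Exception {
        handleRequests(List.of("req1", "req2", "req3", "req4", "req5"), 2).forEach(System.out::println);
    }
}
```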
Give an example of an application where having the threads at a server able to share data would be a significant advantage.
Think about a multi-player game: every client's thread needs to read and update the same shared game state.
What is the difference between thread per request and thread per connection?
Thread per request:
Incoming client requests come into a central location and threads handle those individual requests
Thread per connection:
Once a connection is accepted it receives a dedicated thread, which handles all interactions with that client.

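What is a server farm? What two primary benefits are offered by a server farm?
A group of replicated server machines that together provide a single service. Benefits: better performance/scalability (load is spread across the machines) and better availability (if one server fails, the others keep serving).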
What is consistency maintenance? Why is it a challenge with replicated server?
With server farms, each individual server machine must maintain a copy of the “resources” to be served
- When we want to change the data, we have to change it in all the copies
We want to avoid providing clients stale data

What is Round Robin (RR) DNS? How is it useful with server farms? When might its benefit be limited? (Hint: What does DNS do to ensure lookup efficiency and
how does this impact RR-DNS?)
DNS provides a feature known as “round robin” DNS, where a single domain name is mapped to multiple IP addresses (e.g. aviary.cs.umanitoba.ca)
- The address is selected in turn - round robin fashion
This is useful for server farms because successive lookups of the farm's name return different servers' addresses, spreading clients across the farm without any extra front-end machine.
Better handling of failed servers and load balancing
Its benefit is limited by caching: to make lookups efficient, DNS results are cached (by local resolvers and by clients), so they don't need to be looked up again.
Once RR-DNS has handed out a specific server's address, everyone using that cached result connects directly to that same server, bypassing the rotation until the cache entry expires.
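A toy Java model of round robin DNS and of why caching limits it. This is not a real DNS implementation; the name `farm.example` and the `192.0.2.x` addresses are placeholders.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model: a round-robin name server and a caching resolver in front of it.
public class RoundRobinDns {
    static class NameServer {
        private final Map<String, List<String>> records = new HashMap<>();
        private final Map<String, Integer> next = new HashMap<>();

        void add(String name, List<String> addresses) {
            records.put(name, addresses);
        }

        // Each lookup returns the next address in turn.
        synchronized String lookup(String name) {
            List<String> addrs = records.get(name);
            int i = next.getOrDefault(name, 0);
            next.put(name, (i + 1) % addrs.size());
            return addrs.get(i);
        }
    }

    // Caches answers (real resolvers keep them for the record's TTL), so every
    // client behind this resolver is sent to the same server.
    static class CachingResolver {
        private final NameServer upstream;
        private final Map<String, String> cache = new HashMap<>();

        CachingResolver(NameServer upstream) {
            this.upstream = upstream;
        }

        String resolve(String name) {
            return cache.computeIfAbsent(name, upstream::lookup);
        }
    }

    public static void main(String[] args) {
        NameServer ns = new NameServer();
        ns.add("farm.example", List.of("192.0.2.1", "192.0.2.2", "192.0.2.3"));
        // Uncached lookups rotate through the farm: 192.0.2.1 192.0.2.2 192.0.2.3
        System.out.println(ns.lookup("farm.example") + " " + ns.lookup("farm.example") + " " + ns.lookup("farm.example"));

        CachingResolver resolver = new CachingResolver(ns);
        // Cached: 192.0.2.1 192.0.2.1 (the rotation is bypassed)
        System.out.println(resolver.resolve("farm.example") + " " + resolver.resolve("farm.example"));
    }
}
```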
What is a redirect host? What potential benefits does it offer over RR-DNS?
A redirect host is one central server that redirects each client to a faster-to-access server, e.g. one closer to the client geographically.
Because clients always go to this one central host first, DNS caching matters less: the cache only holds the redirect host's address, and the choice of actual server is made fresh for each request.
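A sketch of a redirect host using the JDK's built-in `com.sun.net.httpserver.HttpServer`: every request gets an HTTP 302 whose `Location` header names a replica, chosen here by simple rotation. The replica URLs are hypothetical, and a real redirect host would more likely pick by client location or load.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: a front server that answers every request with a redirect to a chosen replica.
public class RedirectHost {
    // Hypothetical replica URLs.
    static final List<String> REPLICAS = List.of("http://replica1.example/", "http://replica2.example/");

    static HttpServer start() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        AtomicInteger next = new AtomicInteger();
        server.createContext("/", exchange -> {
            // The decision is made per request, so client-side caching cannot pin a replica.
            String target = REPLICAS.get(Math.floorMod(next.getAndIncrement(), REPLICAS.size()));
            exchange.getResponseHeaders().set("Location", target);
            exchange.sendResponseHeaders(302, -1);   // -1: no response body
            exchange.close();
        });
        server.start();
        return server;
    }

    // Client side: ask the redirect host where to go, without following the redirect.
    static String whereTo(int port) {
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .proxy(HttpClient.Builder.NO_PROXY)
                .followRedirects(HttpClient.Redirect.NEVER)   // the default; stated for clarity
                .build();
        HttpRequest req = HttpRequest.newBuilder(URI.create("http://127.0.0.1:" + port + "/page")).build();
        try {
            HttpResponse<Void> resp = client.send(req, HttpResponse.BodyHandlers.discarding());
            return resp.statusCode() + " " + resp.headers().firstValue("Location").orElse("?");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = start();
        int port = server.getAddress().getPort();
        System.out.println(whereTo(port));   // 302 http://replica1.example/
        System.out.println(whereTo(port));   // 302 http://replica2.example/
        server.stop(0);
    }
}
```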
What does CDN stand for? What service does a CDN provide?
Content Distribution/Delivery Network
provides for efficient delivery of content across a large geographic area via wide-area server replication.
Akamai is an example of this.
It allows for synchronization of content across multiple edge servers.

How, in general, can a CDN client’s web site be transparently redirected to an appropriate CDN edge server?
The domain name from the client is mapped to IP address of appropriate edge server by “mapping” function (& DNS)
Using a collection of simple diagrams, differentiate between the organizational structure of centralized, single-tier, two-tier, and 3-tier Client/Server designs

What is middleware? What three common advantages are offered by the use of middleware in designing distributed systems?
“Middleware is any software that is used solely to connect application components together”
Allows for components to communicate.
Advantages:
- Complexity management
* hide difficult code behind a simpler middleware
- Speed of development
* re-use code so you don’t have to rewrite it
- Enhanced reliability