622 Flashcards

Question 1

Q

Eight pleasant thoughts

Answer

A

-The network is reliable
-The network is secure
-The network is homogeneous
-The topology does not change
-Latency is zero
-Bandwidth is infinite
-Transport cost is zero
-There is one administrator

Question 2

Q

What is a distributed system?[Tanenbaum]

Answer

A

A collection of independent computers that appears to its users as a single coherent system

Question 3

Q

Three key characteristics [Tanenbaum]

Answer

A

Multiple machines are autonomous
Software lets users see a single system
System easy to expand without user noticing

Question 4

Q

What is a distributed system?[Webopedia]

Answer

A

A type of computing in which different components and objects comprising an application can be located on different computers connected to a network.
Key requirement: a set of standards that specify how objects communicate with one another (e.g., CORBA, DCOM, REST, …).

Question 5

Q

[Wikipedia] distributed computing:

Answer

A

decentralized and parallel computing, using two or more computers communicating over a network to accomplish a common objective or task.

Note: The types of hardware, programming languages, operating systems and other resources may vary drastically. It is similar to computer clustering with the main difference being a wide geographic dispersion of the resources.

Question 6

Q

Challenges of DS

Answer

A

-latency of communication
-coordination
-shared resources and mutual exclusion
-ordering, deadlock and live-lock
-timing
-adaptation to change
-failures, soft faults, and optimization
-service discovery and configuration
-heterogeneity and third-party software
-scalability and evolution
-security and privacy
-trust on machines, software, communications & other users

Question 7

Q

Advantages of DS

Answer

A

processing capacity
fault tolerant, evolving, scalable
-explicit control, preferences

Question 8

Q

Replicas

Answer

A

Often useful to have same task performed by multiple components so all have to fail for task to fail
What if data must be shared between components?
Often one component is “master” (aka “original” or “authoritative version”)
The other components are copies from the master
This may be apportioned, e.g., a component may be a master for just some portion like “names A-K”
Confusion in counting the number of “replicas”:
Some might not include the “master” in the count of replicas
Some might include the “master” (“replicas of each other”)
Make sure you know if the “master” is included

Question 9

Q

how to solve for number of replicas

Answer

A

If we assume independent failures and:
F = probability that one replica fails in the time period, 0 < F < 1
n = (natural) number of components (e.g., replicas including master). Thus F^n is the probability that all n fail in the same period
G = goal, the permitted probability of total system failure, where all n replicas fail (including the original)

F^n ≤ G, or equivalently
n ≥ (log G)/(log F)
(dividing by log F flips the inequality because log F < 0 when F < 1)
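A minimal sketch of the calculation above, assuming independent failures. The class and method names are made up for illustration.

```java
// Hypothetical helper: smallest n such that F^n <= G, assuming independent failures.
class ReplicaCount {
    static int minReplicas(double f, double g) {
        if (!(f > 0 && f < 1) || !(g > 0 && g < 1)) {
            throw new IllegalArgumentException("need 0 < F < 1 and 0 < G < 1");
        }
        // n >= log(G) / log(F); both logs are negative, so the ratio is positive.
        return (int) Math.ceil(Math.log(g) / Math.log(f));
    }

    public static void main(String[] args) {
        System.out.println(minReplicas(0.05, 1e-6)); // 5, since 0.05^5 = 3.125e-7
        System.out.println(minReplicas(0.5, 0.01));  // 7, since 0.5^7 = 0.0078125
    }
}
```

When the ratio is an exact integer (e.g., F = 0.1, G = 1e-6), floating-point rounding may push the result up by one, which errs on the safe side.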

Question 10

Q

Independence assumption

Answer

A

These calculations assume independent failures
Reasonable model for many hardware failures
Software failures often not independent
Knight & Leveson [1986] found via experiment that software faults are not independent
Thus “N-version programming” doesn’t lead to the reliability increase you might predict
It can be helpful, but less than you’d think

Question 11

Q

Caching: Special case of replication

Answer

A

Make one or more copies of a resource (data)
Often happens on demand
Other replication approaches often planned & executed in advance

Question 12

Q

Challenges of DS example: replication has downsides

Answer

A

buy more hardware
administration costs
software upgrades
load balancing
performance overhead
more complex software
consistency problems
sometimes tolerable

Question 13

Q

hiding access

Answer

A

hide differences in data representation and how a resource is accessed

(conversion of complex formats; latency vs. fidelity of access)

Question 14

Q

hiding location

Answer

A

hide where a resource is located (trusted hosts, different performance, different capabilities and network access)

Question 15

Q

migration

Answer

A

hide that a resource may move to another location (trusted hosts, different performance, different capabilities and network access)

Question 16

Q

relocation

Answer

A

hide that a resource may be moved to another location while in use (trusted hosts, different performance, different capabilities and network access)

Question 17

Q

replication

Answer

A

hide the fact that several copies of a resource exist (select server based on QoS)

Question 18

Q

concurrency

Answer

A

hide that a resource may be shared by several competitive users (sharing cannot be fully hidden: resources are consumed, and data is modified by others)

Question 19

Q

failure

Answer

A

hide the failure and recovery of a resource (unexplained behavior)

Question 20

Q

persistence

Answer

A

hide whether a (software) resource is in memory or on disk (someone needs to decide whether an object is persistent and commit it to disk)

Question 21

Q

awareness and adaptation

Answer

A

separate decisions from (controllable) mechanisms

Question 22

Q

Some measures (per Neuman) for system scalability

Answer

A

Size: users & resources
Geographical: users & resources may lie far apart
Administrative: may span many independent administrative organizations

Question 23

Q

Decentralized algorithms

Answer

A

No machine has complete information about the system state.
Machines make decisions based only on local information.
Failure of one machine does not ruin the algorithm.
There is no implicit assumption that a global clock exists

Question 24

Q

Asynchronous communication

Answer

A

Hiding communication latencies is important for geographical scalability
Max speed is the speed of light in vacuum (~3.00×10^8 m/s)
Information transfer through material normally less
Physical components have other performance latencies
Software takes time to execute once it receives data
Sending information and waiting for reply is synchronous communication
Alternative: Asynchronous – send information, don’t wait for reply
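A small sketch contrasting the two styles above. The "remote call" here is only a simulated delay, and all names are hypothetical; it assumes the standard `java.util.concurrent.CompletableFuture` class.

```java
import java.util.concurrent.CompletableFuture;

class AsyncRequestSketch {
    // Stand-in for a remote call: sleeps to simulate network latency.
    static String slowRemoteCall(String request) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "reply to " + request;
    }

    // Asynchronous style: returns at once; the reply arrives later via the future.
    static CompletableFuture<String> sendAsync(String request) {
        return CompletableFuture.supplyAsync(() -> slowRemoteCall(request));
    }

    public static void main(String[] args) {
        // Synchronous style: the caller blocks until the reply is back.
        System.out.println(slowRemoteCall("ping"));

        // Asynchronous style: the caller can do other work while the request is in flight.
        CompletableFuture<String> pending = sendAsync("ping");
        System.out.println("caller keeps working");
        System.out.println(pending.join()); // collect the reply when it is needed
    }
}
```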

Question 25

Q

Moving location of execution

Answer

A

E.g., when checking form inputs, consider two options:
Send each input to server & wait for reply – maybe long delay
Could move checking code to client
Now check response can be immediate (once code is there)
Can be special-case (e.g., HTML5 form validators)
Can be general (e.g., a general execution engine like JavaScript)
Beware of security ramifications
Often two sides (e.g., client & server) cross trust boundary
Security checks must often be redone on the server
Server can’t trust client
Many checks are done on both client (for speed) and server (for security)
Client must often check the data it’s asked to execute (especially if it’s a full language like JavaScript)
Client can’t trust server

Question 26

Q

Cloud computing

Answer

A

Clouds widely used, often misunderstood
Clouds often cheaper (where appropriate), many variations
Decisions to use cloud (and how) impact security
NIST Definition of Cloud Computing (NIST SP 800-145):
Cloud computing is “a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.”
Five essential characteristics: On-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service
Virtualization is common, but not required, to be a cloud
Book makes this (common) mistake

Question 27

Q

Cloud service models

Answer

A

Infrastructure as a Service (IaaS): “consumer [can] deploy and run arbitrary software [including] operating systems and applications….”
Platform as a Service (PaaS): “consumer [can deploy] consumer-created or acquired applications…” [on top of provided platform]
Software as a Service (SaaS): “consumer [can] use the provider’s applications running on a cloud infrastructure…”

Question 28

Q

OSI 7 layer model (Open Systems Interconnection)

Answer

A

physical, data link, network, transport, session, presentation, application

Question 29

Q

physical

Answer

A

specifies: pin layout, voltages, modulation
does: establish & terminate access to medium, flow control, contention resolution
at this level: hubs, repeaters, network adapters

Question 30

Q

data link

Answer

A

specifies: how to transfer data in a LAN
does: detect and correct errors
at this level: MAC addresses (flat, HW-based)

Question 31

Q

network

Answer

A

specifies: how to transfer data sequences across LANs (e.g., IP)
does: routing
at this level: hierarchical address scheme, routers (bridges and most switches operate at the data link layer)

Question 32

Q

IP Service Model

Answer

A

Connectionless (datagram/packet-based)
Best-effort delivery (unreliable service)
packets may be lost
packets may be delivered out of order
duplicate copies of a packet may be delivered
packets can be delayed for a long time
Datagram format

Question 33

Q

Datagram forwarding

Answer

A

Strategy
every datagram contains destination’s address
if directly connected to destination network, then forward to host
if not directly connected to destination network, then forward to some router
forwarding table maps network number into next hop
each host has a default router
each router maintains a forwarding table

Question 34

Q

Forwarding Tables

Answer

A

Suppose there are n possible destinations. How many bits are needed to represent an address in a routing table?
log2(n), rounded up
So, do we need to store and search n * log2(n) bits in routing tables?
We’re smarter than that!

Question 35

Q

Global Addresses

Answer

A

Globally unique
hierarchical: network+host
Dot Notation
10.3.2.4
128.96.33.81
192.12.69.77
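Dot notation is just a readable form of a 32-bit number, one decimal value per byte. A minimal conversion sketch (hypothetical class name; IPv4 only):

```java
class DottedQuad {
    // Parse "a.b.c.d" into its 32-bit value (held in a long to avoid sign issues).
    static long toLong(String dotted) {
        String[] parts = dotted.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("expected four octets: " + dotted);
        }
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("octet out of range: " + part);
            }
            value = (value << 8) | octet; // shift in one byte at a time
        }
        return value;
    }

    // Format a 32-bit value back into dot notation.
    static String toDotted(long value) {
        return ((value >> 24) & 0xFF) + "." + ((value >> 16) & 0xFF) + "."
                + ((value >> 8) & 0xFF) + "." + (value & 0xFF);
    }

    public static void main(String[] args) {
        System.out.println(toLong("10.3.2.4"));
        System.out.println(toDotted(toLong("128.96.33.81")));
    }
}
```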

Question 36

Q

Transport

Answer

A

specifies: end-to-end transfer of data between hosts (e.g., TCP, which is reliable, and UDP, which is not)
does: flow control, segmentation, error control, retransmission

Question 37

Q

UDP

Answer

A

(User Datagram Protocol)
connectionless - sends independent packets of data, called datagrams, from one computer to another with no guarantees about arrival
each time a datagram is sent, the local and receiving socket address need to be sent as well

Question 38

Q

TCP

Answer

A

(Transmission Control Protocol)
connection-oriented - provides a reliable flow of data between two computers: data sent from one end of the connection gets to the other end in the same order
in order to communicate using TCP protocol, a connection must first be established between the pair of sockets
once two sockets have been connected, they can be used to transmit data in both (or either one of the) directions

Question 39

Q

Overhead

Answer

A

UDP - every time a datagram is sent, the local and receiving socket address need to be sent along with it
TCP - a connection must be established before communications between the pair of sockets start (i.e. there is a connection setup time in TCP)

Question 40

Q

Packet size

Answer

A

UDP - there is a size limit of 64 kilobytes per datagram
TCP - there is no limit; the pair of sockets behaves like streams

Question 41

Q

reliability

Answer

A

UDP - there is no guarantee that the sent datagrams will be received at all, or in the same order, by the receiving socket
TCP - it is guaranteed that the sent packets will be received in the order in which they were sent

Question 42

Q

which protocol to use?

Answer

A

TCP - useful when indefinite amount of data need to be transferred ‘in order’ and reliably

UDP - useful when data transfer should not be slowed down by the extra overhead of the reliable connection

Question 43

Q

session

Answer

A

specifies: establishing long lived connections
does: checkpointing, adjournment, restart

Question 44

Q

presentation

Answer

A

specifies: data formats and transformation(e.g., MIME)
does: serialization, compression, encryption, encoding transformation (EBCDIC/ASCII)

Question 45

Q

application

Answer

A

specifies: application-specific protocols(e.g., http, smtp, ftp, telnet)
does: support app-specific functionality

Question 46

Q

goal of the OSI

Answer

A

separation of concerns enables good implementation at each level
each layer is independent of the ones on top
layer n depends on the spec of n-1, but not on its implementation/manufacturer

Question 47

Q

port

Answer

A

Generally, a computer has a single physical connection to the network
this connection is identified by the computer’s 32-bit IP (IPv4) address
all data destined for a particular computer arrives through this connection
TCP and UDP use ports to identify a particular process/application
port = abstract destination point at a particular host
each port is identified by an unsigned 16-bit number, in the range 0 - 65,535
port numbers 0 - 1023 are reserved for well-known services (HTTP - 80, telnet - 23)

Question 48

Q

socket

Answer

A

basic abstraction for network communication

“end-point of communication” uniquely identified with IP address and port
example: Socket MyClient = new Socket("Machine name", PortNumber);
gives a file-system like abstraction to the capabilities of the network
two end-points communicate by “writing” into and “reading” out of socket
there are two types of transport via sockets:
reliable, byte-stream oriented (TCP)
unreliable datagram (UDP)

Question 49

Q

socket programming with TCP

Answer

A

Server Side:
server runs on a specific computer and has a socket bound to a specific port number
server listens to the socket for a client to make a connection request
Client Side:
client tries to rendezvous with the server on the server’s machine and port

Server Side:
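the server accepts the connection; accept() returns a new socket dedicated to that client (it uses the same local port), so the original socket can keep listening for other connection requests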
Client Side:
if the connection is accepted, the client uses the new socket to communicate with the server

Question 50

Q

Socket programming with UDP

Answer

A

All clients use the same socket to communicate with the server
Packets of data (datagrams) are exchanged
No new sockets need to be created
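A loopback sketch of the datagram exchange, with hypothetical names. No connection is set up: the sender addresses each datagram explicitly, and the receiver uses one socket for whoever sends to it. The timeout is an added precaution because UDP gives no delivery guarantee.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

class UdpSketch {
    // Sends one datagram to a receiver socket on the same machine and returns its payload.
    static String sendAndReceive(String message) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket receiver = new DatagramSocket(0, loopback); // any free port
             DatagramSocket sender = new DatagramSocket()) {
            receiver.setSoTimeout(2000); // do not wait forever for a lost datagram

            byte[] payload = message.getBytes(StandardCharsets.UTF_8);
            // Every datagram carries its destination address and port.
            sender.send(new DatagramPacket(payload, payload.length,
                    loopback, receiver.getLocalPort()));

            byte[] buffer = new byte[1024];
            DatagramPacket incoming = new DatagramPacket(buffer, buffer.length);
            receiver.receive(incoming); // blocks until a datagram arrives or the timeout hits
            return new String(incoming.getData(), 0, incoming.getLength(),
                    StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sendAndReceive("hello over UDP"));
    }
}
```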

Question 51

Q

C- vs. Java- socket programming

Answer

A

Java keeps all the socket complexity “under the cover”
It does not expose the full range of socket possibilities
But, it enables sockets to be opened/used as easily as a file would be opened/used

By using the java.net.Socket class instead of relying on native code, Java programs can communicate over the network in a platform-independent fashion

Question 52

Q

Java socket programming

Answer

A

all classes related to sockets are in java.net package
Socket class - implements client sockets (also called just “sockets”)
ServerSocket class - implements server sockets
A server socket waits for requests to come in over the network. It performs some operation based on that request, and then possibly returns a result to the requester.
DatagramSocket class - socket for sending and receiving datagram packets
DatagramPacket class - represents a datagram packet
Datagram packets are used to implement a connectionless packet delivery service. Multiple packets sent from one machine to another might be routed differently, and might arrive in any order.
InetAddress class - represents an Internet Protocol (IP) address
MulticastSocket class - useful for sending and receiving IP multicast packets.
A MulticastSocket is a (UDP) DatagramSocket, with additional capabilities for joining “groups” of other multicast hosts on the internet. A multicast group is specified by a class D IP address.

Question 53

Q

what does middleware offer?

Answer

A

conceptual model for communication

Question 54

Q

different styles have different data sharing assumptions

Answer

A

RPC
(address space or memory)

RMI
object refs (middleware)

messages

data store -> files/objects persistent store

data stream -> and <- data store/source

Question 55

Q

read slides 49-60 in lecture 2

Question 56

Q

RPC is implemented by

Answer

A

sending messages

Question 57

Q

where to send RPC messages?

Answer

A

hardwired for fixed deployment
some RPC environments support dynamic binding (more to come during the lecture on Service Discovery)

Question 58

Q

solution to object refs

Answer

A

increase granularity from bytes to objects

both local objects and references to remote objects are passed by value (serialization)
the result of the called method is also serialized and passed back to the caller

Question 59

Q

difference between RMI and RPC?

Answer

A

RMI:
doesn’t try to hide distribution in the language: remote objects are declared “remote”
marshalling is simplified
by passing by value only (object references can be used in nested RMIs)
(in Java) by having JVMs hide platform dependencies in data representation
serialization could be much heavier by having to pass the code for the objects with every call, but that can be avoided by passing URLs for downloading the code, rather than the code itself

Question 60

Q

reasons to escape the call return style

Answer

A

no result needs to be returned
a server may not be available at the time of the request
make the client more responsive to other events/user
allow any component to initiate communication

Question 61

Q

some middleware push the envelope

Answer

A

dealing with errors: idempotent, at-least-once, at-most-once…
the promised simplicity of procedure calling sometimes hinders more sophisticated solutions

Question 62

Q

when to use call return style

Answer

A

the server is ready to process each request
components and network are mostly reliable
not many concurrent events in the caller: it is fine to block the caller
one component (client) has the initiative, others (servers) wait for requests

Question 63

Q

what is needed for RMI

Answer

A

Java makes RMI (Remote Method Invocation) fairly easy, but there are some extra steps
To send a message to a remote “server object,”
The “client object” has to find the object
Do this by looking it up in a registry
The client object then has to marshal the parameters (prepare them for transmission)
Java requires Serializable parameters
The server object has to unmarshal its parameters, do its computation, and marshal its response
The client object has to unmarshal the response

Question 64

Q

remote object

Answer

A

an object on another computer

Answer 64

A

object making the request

Answer 65

A

object receiving the request (can easily trade roles with the client object)

Answer 66

A

special server that looks up objects by name

Answer 67

A

special compiler for creating stub (client) and skeleton (server) classes

Answer 68

A

The Client
The Server
The Object Registry, rmiregistry, which is like a DNS service for objects
You also need TCP/IP

Answer 69

A

Interfaces define behavior
Classes define implementation
Therefore,
In order to use a remote object, the client must know its behavior (interface), but does not need to know its implementation (class)
In order to provide an object, the server must know both its interface (behavior) and its class (implementation)
In short,
The interface must be available to both client and server
The class should only be on the server

Answer 70

A

one whose instances can be accessed remotely
On the computer where it is defined, instances of this class can be accessed just like any other object
On other computers, the remote object can be accessed via object handles

Answer 71

A

one whose instances can be marshaled (turned into a linear sequence of bits)
Serializable objects can be transmitted from one computer to another

Answer 72

A

If an object is to be serialized:
The class must be declared as public
The class must implement Serializable
All fields of the class must be serializable: either primitive types or serializable objects
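A sketch of marshalling a class that follows these rules, using the standard `ObjectOutputStream` and `ObjectInputStream`. The `Point` class and method names are invented for illustration, and the bytes stay in memory rather than crossing a network.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class MarshalSketch {
    // Hypothetical example class: public, implements Serializable,
    // and every field is a primitive or a serializable object.
    public static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final String label;

        public Point(int x, String label) {
            this.x = x;
            this.label = label;
        }
    }

    // Marshal: turn the object into a linear sequence of bytes.
    static byte[] marshal(Serializable object) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    // Unmarshal: rebuild a copy of the object from the bytes.
    static Object unmarshal(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    static Point roundTrip(Point p) {
        return (Point) unmarshal(marshal(p));
    }

    public static void main(String[] args) {
        Point copy = roundTrip(new Point(3, "origin-ish"));
        System.out.println(copy.x + " " + copy.label);
    }
}
```

The receiver gets a copy, not the original object, which is why the receiving side needs the class definition.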

Answer 73

A

The interface (used by both client and server):
Must be public
Must extend the interface java.rmi.Remote
Every method in the interface must declare that it throws java.rmi.RemoteException (other exceptions may also be thrown)

The class itself (used only by the server):
Must implement a Remote interface
Should extend java.rmi.server.UnicastRemoteObject
May have locally accessible methods that are not in its Remote interface
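A sketch of the shape these rules describe, with hypothetical names (`Greeter`, `GreeterImpl`). Two simplifications are assumptions of this sketch: the interface is not marked `public` only so that everything fits in one file, and the implementation is not exported (no `UnicastRemoteObject`, no registry), so it runs without opening network ports.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

class RmiShapeSketch {
    public static void main(String[] args) throws RemoteException {
        // Callers hold the interface type, so they must handle RemoteException.
        Greeter greeter = new GreeterImpl();
        System.out.println(greeter.greet("Ada"));
    }
}

// Remote interface: extends java.rmi.Remote, and every method declares RemoteException.
// In real use it would be public, in its own file, and available to client and server.
interface Greeter extends Remote {
    String greet(String name) throws RemoteException;
}

// Implementation class: lives on the server only.
// A real server would also extend UnicastRemoteObject or call exportObject().
class GreeterImpl implements Greeter {
    @Override
    public String greet(String name) {
        return "Hello, " + name;
    }

    // Locally accessible method that is not part of the remote interface.
    public int greetingLength(String name) {
        return greet(name).length();
    }
}
```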

Answer 74

A

lives on another computer (like server)

You can send messages to a Remote object and get responses back from the object
All you need to know about the Remote object is its interface
Remote objects don’t pose much of a security issue

Answer 75

A

You can transmit a copy of a Serializable object between computers
The receiving object needs to know how the object is implemented; it needs the class as well as the interface
There is a way to transmit the class definition
Accepting classes does pose a security issue

Answer 76

A

The class that defines the server object should extend UnicastRemoteObject
This makes a connection with exactly one other computer
If you must extend some other class, you can use exportObject() instead
Sun does not provide a MulticastRemoteObject class
The server class needs to register its server object:
String url = "rmi://" + host + ":" + port + "/" + objectName;
The default port is 1099
Naming.rebind(url, object);
Every remotely available method must throw a RemoteException. Why?

Answer 77

A

The class that implements the remote object should be compiled as usual
Then, it should be compiled with rmic:
rmic Hello
This will generate files Hello_Stub.class and Hello_Skel.class
These classes do the actual communication
The “Stub” class must be copied to the client area
The “Skel” was needed in SDK 1.1 but is no longer necessary

Answer 78

A

network/transport support for multicast
hasn’t worked well in practice
overlay networks at application layer!
result is a logical network built on top of a (probably different) network
new network optimized for application
but may have performance issues due to underlying network

Answer 79

A

tree network approach
function to map topics to nodes
identifies unique root node for a given topic
follow() request follows tree
if node has not seen request
become a “forwarder”
make sender your “child” for that request
forward request to next node in tree
if node has seen request
make sender your “child” for that request
send() request follows topic tree

Answer 80

A

node states: infected, susceptible, removed
goal is to become infected!
P picks a random Q
push: P updates pushed to Q // not so good…
pull: Q updates pulled to P
push-pull: both
gossip adaptation
less interest if receiver already has update

Answer 81

A

suppose an update is deleted
what happens when old copy is found?
distributed systems are different!
notion of a death certificate
“that update doesn’t matter anymore”
have to keep that update
how long?

Answer 82

A

once sent, messages endure in the system, regardless of the sender remaining active and the recipient being available

Answer 83

A

the sender continues after sending, or blocks, waiting for the message
to be delivered (buffered)
to be received (read)
to be processed

Answer 84

A

socket: create a new communication endpoint
bind: attach a local address to a socket
listen: announce willingness to accept connections
accept: block caller until a connection request arrives
connect: actively attempt to establish a connection
send: send some data over the connection
receive: receive some data over the connection
close: release the connection

Answer 85

A

component interactions don’t follow a strict call-return pattern
make components more responsive to other events/user (no blocking)
allow any component to initiate communication
components may not be available to receive/process messages (persistency)

Answer 86

A

(persistent) data plays a central role in the system
components don’t need to synchronize control flow other than on data availability or values
E.g., modern database management systems, distributed file systems

Answer 87

A

interaction partners do not need to know each other

Answer 88

A

data is stored/generated at one place and consumed by one or more clients
timeliness of data delivery is crucial
asynchronous (unbound, e.g., caching)
synchronous (upper bound, e.g., sensors)
isochronous (bounded jitter, e.g., media)
complex data may be transmitted as separate streams which need to be synchronized: e.g. video + stereo sound
streaming is supported by middleware (e.g. RSVP) on top of the data link network layer

Answer 89

A

the interaction partners do not need to be actively participating in the interaction at the same time

Answer 90

A

publishers aren’t blocked while producing events, subscribers can get asynchronously notified of the occurrence of an event

Answer 91

A

An event filter selects event notifications by specifying a set of attributes and constraints on the values of those attributes

A pattern is composed of several filters
A subscription can be expressed as a filter or a pattern

Answer 92

A

A subscription matches an event notification when the notification satisfies all the constraints specified in the subscription
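The matching rule can be sketched as "every constraint in the filter holds for the notification". The attribute names (`type`, `price`) and all class names below are hypothetical; `List.of` and `Map.of` assume Java 9 or later.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class FilterSketch {
    // One constraint on the value of one attribute.
    static final class Constraint {
        final String attribute;
        final Predicate<Object> test;

        Constraint(String attribute, Predicate<Object> test) {
            this.attribute = attribute;
            this.test = test;
        }
    }

    // A notification matches a filter when it satisfies all of the filter's constraints.
    static boolean matches(List<Constraint> filter, Map<String, Object> notification) {
        for (Constraint c : filter) {
            Object value = notification.get(c.attribute);
            if (value == null || !c.test.test(value)) {
                return false; // missing attribute or failed constraint
            }
        }
        return true;
    }

    // Made-up subscription: stock events with a price above 100.
    static List<Constraint> exampleFilter() {
        return List.of(
                new Constraint("type", v -> v.equals("stock")),
                new Constraint("price", v -> v instanceof Integer && (Integer) v > 100));
    }

    static boolean matchesExample(String type, int price) {
        Map<String, Object> notification = Map.of("type", type, "price", price);
        return matches(exampleFilter(), notification);
    }

    public static void main(String[] args) {
        System.out.println(matchesExample("stock", 120));
        System.out.println(matchesExample("stock", 90));
    }
}
```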

Answer 93

A

Nodes publish event notifications to access points

Nodes subscribe to access points in order to receive event notifications
Specified by filters and/or patterns

Advertisements
defines the event notifications a node may possibly generate using the same semantics as filters

Answer 94

A

The goal of filtering
only deliver messages “of interest” to nodes, reducing the overall traffic across the network
The system is aided by use of advertisements

Answer 95

A

Server must establish appropriate routing paths to ensure that notifications published by objects of interest are correctly delivered to all the interested parties that subscribed to them

Simplest strategy is to maintain the subscriptions at their access point and broadcast the notification throughout the network
Least efficient
Consumes lots of bandwidth

Answer 96

A

Central idea is to send the notification towards the event servers that have clients that are interested in that notification (possibly using shortest path)
Downstream replication
Notification should be replicated only downstream and as close as possible to parties interested in it

Upstream evaluation
Filters are applied and patterns are assembled upstream – as close as possible to the source of notification

Subscription forwarding
Routing paths for notification are set by subscriptions which are propagated throughout the network so as to form a tree that connects subscribers to all the servers in network

Answer 97

A

a mapping between:

identifier of an entity
(app/process, file)

address of an access point (network address, stub/proxy, piece of hardware plugged to network)

entities connect to the network via one or more access points, which can be reached at an address

Answer 98

A

refers to at most one entity
each entity is referred to by at most one identifier
an identifier always refers to the same entity (i.e., not reused)

Answer 99

A

names are normally organized into name spaces ex. file names

usually organized as a directed, acyclic graph
name represented as a path through nodes in the graph
absolute path starts from root
relative path starts from an arbitrary point
paths represented as <link-1, …, link-n> or /link-1/…/link-n

Answer 100

A

could lead to shorter response time, but
increases resource requirements in name servers
effectiveness of caching depends heavily on stability of addresses (mobility is an issue)

many resolution servers support only iterative resolution

Answer 101

A

an application may need to find a component with certain capabilities, e.g., a spell checker, or a nearby printer
“resolution” should be guided by the capabilities, not the identity (name) of such components

for other purposes, the identity (name) is still important (e.g. web servers and email servers)
since these components are not typically mobile, conventional name resolution can still be applied

Answer 102

A

type name, e.g., printer, speech recognition
Note: service is ambiguously used to designate (a) service instance (b) service type (c) service supplier
version
interface signature
how to request a service from the supplier, e.g., Java interface
ontology
relations among types

difficulty: relations are not always hierarchical
example frameworks: DAML/OWL, UDDI, WSDL

Answer 103

A

static attributes
intrinsic to the supplier, not dependent on circumstances or resources
e.g., printer supports color and duplex
dynamic attributes
dependent on usage history and resources
e.g., latency (size of Q, available CPU, bandwidth…), accuracy (used algorithms, iterations, quality of data)
subject to tradeoffs
database query: fast vs. complete
language translation: speed/cost vs. accuracy
printing: high quality printing with long Q vs. low quality with short Q

Answer 104

A

physical characterization of where, when and how the service will be provided
static attributes
e.g., location of a wall-mounted display
dynamic attributes
e.g., location of a PDA, printer queue size
implications to privacy and security
e.g., is the wall-mounted display in a private room or at a lounge with public access
context is more of an issue for some kinds of services than others (non-interactive)
distinguish computation from presentation of results

Answer 105

A

bare bones
name of service type, address/stub to reach supplier
E.g., RMI
some include spec of API signatures
e.g., Jini/JNDI (Java interface)
a few include QoS
Web Services describes generic attributes such as price and reliability, but not service-specific attributes
E.g., web services (covered later in this lecture)
some research middleware include service-specific QoS, context, privacy and security

Answer 106

A

depends on the level of service description

bare bones
no way to distinguish, just pick one
some include spec of API signatures
pick one that is compatible
trust, QoS & context
use a quantitative framework (e.g. utility functions) to evaluate which one is best
requires richer description of the service requirements, e.g. find a duplex printer < 100 ft away and with < 2 minute wait
p1 is 102 ft away, 2s wait
p2 is 95 ft away, 1 minute wait
p3 is 10 ft away, 3 minute wait
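One way to evaluate the three printers above is a utility function that rejects candidates violating a hard requirement and scores the rest. The scoring formula is an assumption made for illustration (closer and shorter wait is better, equally weighted), and all three printers are assumed to support duplex.

```java
import java.util.List;

class PrinterSelectionSketch {
    static final class Printer {
        final String name;
        final double distanceFt;
        final double waitSeconds;

        Printer(String name, double distanceFt, double waitSeconds) {
            this.name = name;
            this.distanceFt = distanceFt;
            this.waitSeconds = waitSeconds;
        }
    }

    // Hypothetical utility: negative infinity if a hard requirement is violated
    // (< 100 ft, < 2 minute wait), otherwise higher is better.
    static double utility(Printer p) {
        if (p.distanceFt >= 100 || p.waitSeconds >= 120) {
            return Double.NEGATIVE_INFINITY;
        }
        return -(p.distanceFt / 100.0) - (p.waitSeconds / 120.0);
    }

    // Returns the name of the best acceptable printer, or "none" if all are rejected.
    static String best(List<Printer> candidates) {
        String bestName = "none";
        double bestUtility = Double.NEGATIVE_INFINITY;
        for (Printer p : candidates) {
            double u = utility(p);
            if (u > bestUtility) {
                bestUtility = u;
                bestName = p.name;
            }
        }
        return bestName;
    }

    // The three candidates from the card.
    static String bestOfExample() {
        return best(List.of(
                new Printer("p1", 102, 2),    // too far away
                new Printer("p2", 95, 60),    // meets both requirements
                new Printer("p3", 10, 180))); // wait too long
    }

    public static void main(String[] args) {
        System.out.println(bestOfExample());
    }
}
```

Only p2 meets both requirements, so it is selected; with softer requirements the utility weights would decide between candidates.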

Answer 107

A

service discovery: description of capabilities -> address of an access point

name resolution: identifier of an entity -> address of an access point

Answer 108

A

mechanisms to make it work: description of capabilities -> address of an access point

Answer 109

A

mechanisms to make it work: description of capabilities -> address of an access point

Answer 110

A

clients are configured with a list of addresses to go ask for services

Answer 111

A

clients broadcast service requests on demand

Answer 112

A

suppliers broadcast their capabilities periodically

Answer 113

A

suppliers post their capabilities on a directory
clients query the directory

Answer 114

A

broadcasting-based discovery is bounded by the network policy for broadcast (usually LAN)
directed and directory-based offer more control of scalability
hard question: how do directories coordinate?
how far, which directories, to direct a query?

Answer 115

A

the act of performing helpful or useful labor that does not produce a tangible commodity

Answer 116

A

separation of concerns

Answer 117

A

the act of performing helpful or useful labor, where the service supplier is developed separately from consumers and may serve many consumers

Answer 118

A

suppliers register their capabilities, consumers look for services, not specific components

Answer 119

A

factory & pool

Answer 120

A

stateful keeps state of conversation while stateless doesn’t

Answer 121

A

introduces dependency on middleware

Answer 122

A

too many “standards”, one born every few months: code evolution nightmare
integration with legacy systems: millions of LOC and billions of $ already invested
strategy: wrappers around old code
with wrappers latency becomes an issue

Answer 123

A

focus on bridging existing technologies
key characteristic: middleware for middleware
it’s about how to access an application, it is not an implementation technology
looser coupling than RPC-based middleware
avoid proprietary APIs
Simple Object Access Protocol (SOAP): based on sending XML messages over HTTP, no SOAP API or ORB
WSDL & SOAP are not widely used today, but it’s important to understand why – complexity is a killer

Answer 124

A

directory (UDDI: universal description, discovery and integration)

service description (WSDL: web services description language)

messages (SOAP: simple object access protocol)

which work on top of:
data types (XML Schema)
data (XML)

Answer 125

A

an approach for service composition and coordination. A central service coordinates the invocation of other services to achieve the system’s functionality

Answer 126

A

another approach for service composition. a decentralized approach to coordination of services, where each service knows how it needs to behave to achieve the system’s functionalities

Answer 127

A

Business process execution language. often used for modeling the coordination among services. BPEL engine executes the model and exposes it as a service

Answer 128

A

receives a message but will not respond

Answer 129

A

receives a message and may issue a fault message

Answer 130

A

receives a message and may issue a reply or fault message

Answer 131

A

call-return, only valid for in-only and in-out patterns

Answer 132

A

IRI (Internationalized Resource Identifier): message can be serialized as an IRI
multipart: …
The word “style” here is not the same as what we called “communication style” in this class

Answer 133

A

defines the communication protocol, typically SOAP

Answer 134

A

associates the interfaces with a URL and protocol (binding)

Answer 135

A

SW companies, standards bodies, and programmers populate the registry with descriptions of different types of services
Businesses populate the registry with descriptions of the services they support
UBR assigns a programmatically unique identifier to each service and business registration
Marketplaces, search engines, and business apps query the registry to discover services at other companies
Businesses use this data to facilitate easier integration with each other over the Web

Answer 136

A

supports business registrations.

XML document
created by supplier company (or on its behalf)
may have multiple service listings

Answer 137

A

In spite of the name, SOAP was not simple
Pursuit of generality made WSDL & SOAP too complicated for use in “normal cases”
Simple things weren’t simple
Inadequate security story
Today, other approaches far more common, e.g. REST

Answer 138

A

Representational State Transfer (REST)
“a software architecture style consisting of guidelines and best practices for creating scalable web services” [Wikipedia, “Representational state transfer”]

Answer 139

A

An API following REST style

Intended to be simpler alternative to SOAP and WSDL-based Web services
In practice, RESTful systems typically communicate over the HTTP protocol with the same HTTP verbs (GET, POST, PUT, DELETE, etc.)

Answer 140

A

Client–server (storage on server)
Stateless
Cacheable
Layered system
Client may connect to intermediary
Code on demand (optional)
Uniform interface
Identification of resources (e.g., URIs)
Manipulation of resources through these representations
Self-descriptive messages (e.g., MIME type)
Hypermedia as the engine of application state (HATEOAS) – THIS ONE IS CONTROVERSIAL

Answer 141

A

GET: list the URIs and perhaps other details of the collection’s members
PUT: replace the entire collection with another collection
POST: create a new entry in the collection. the new entry’s URI is assigned automatically and is usually returned by the operation
DELETE: delete the addressed member of the collection

Answer 142

A

GET: retrieve a representation of the addressed member of the collection, expressed in an appropriate Internet media type.
PUT: replace the addressed member of the collection, or if it doesn’t exist, create it.
POST: not generally used. treat the addressed member as a collection in its own right and create a new entry in it.
DELETE: delete the addressed member of the collection

Answer 143

A

Original REST definition requires “Hypermedia as the engine of application state” (HATEOAS)
On accessing the initial REST URI, client must be able to follow server-provided links to (eventually) discover all the resources
Human analogy: from start page can only click
In theory, HATEOAS = no need for client to hard code information about application structure or dynamics [https://restfulapi.net/hateoas/]
Some purists insist that REST requires HATEOAS
Original definition requires it!
However, I don’t accept this claim…

Answer 144

A

In practice, almost all RESTful APIs do not implement HATEOAS (this variant sometimes called “Practical REST”)
Problems with HATEOAS:
Increased implementation complexity
Bloats every response with many almost-always-unused links
In the rare cases that HATEOAS links are used, encourages “chatty” (slow & resource-intensive) integration as clients must navigate instead of directly requesting what they need
Clients rarely use it – typically clients make a direct request
No de facto standard, so clients can’t easily use HATEOAS info
HATEOAS only communicates connection, not meaning, so HATEOAS info often doesn’t provide enough info to be useful

Answer 145

A

REST builds on HTTP – same general rules
For wire confidentiality, use TLS (SSL)
To authenticate must use agreed-on authentication method
E.g., OAuth 2.0 (token or key/secret) or basic authentication
Typically on login uses cookies to store session key or other session info
Requestee determines authorization

Answer 146

A

OpenAPI (originally “Swagger spec”)
machine-readable interface files for describing, producing, consuming, and visualizing RESTful Web services
Development overseen by OpenAPI Initiative (of the Linux Foundation)
Language-agnostic
Swagger = common implementation

Answer 147

A

OpenAPI document is a JSON object
may be represented in JSON or YAML
All field names are case sensitive
Primitive data types based on JSON
integer is a type
optional modifier “format”, e.g., a dateTime represented as type=string format=date-time
Defines supported paths, operations (incl. summary & parameters), responses
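
A minimal sketch of what such a document can look like, built as a Python dict and serialized to JSON. The API title, the path, and the parameter and property names are invented for illustration; only the overall layout (paths, operations, parameters, responses, type plus format) follows the points above.

```python
import json

# Hypothetical API: every title, path, and field name below is made up.
spec = {
    "openapi": "3.0.3",
    "info": {"title": "Example Events API", "version": "1.0.0"},
    "paths": {
        "/events/{eventId}": {
            "get": {
                "summary": "Fetch one event",
                "parameters": [{
                    "name": "eventId",
                    "in": "path",
                    "required": True,
                    "schema": {"type": "integer"},   # integer is a primitive type
                }],
                "responses": {
                    "200": {
                        "description": "The requested event",
                        "content": {"application/json": {"schema": {
                            "type": "object",
                            "properties": {
                                # a dateTime is a string with a format modifier
                                "startsAt": {"type": "string", "format": "date-time"},
                            },
                        }}},
                    }
                },
            }
        }
    },
}

# The same object could equally be written out as YAML.
document = json.dumps(spec, indent=2)
```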

Answer 148

A

strict timing constraints for messages
compare timestamps on distributed data

Answer 149

A

Fundamental unit: second
Historically, 1/86 400 of a mean solar day
But there are irregularities in the rotation of the Earth
Also, Earth’s rotation is slowing down
Since 1967, a second is based on atomic measures
A second is the duration of 9 192 631 770 periods of the radiation corresponding to the transition between the two hyperfine levels of the ground state of the cesium 133 atom (in its ground state at a temperature of 0 K)

Answer 150

A

Weighted average of the time kept by over 400 atomic clocks in over 50 national laboratories worldwide
Continuously increasing, accurate, not coordinated with Earth’s rotation

Answer 151

A

Each GPS satellite broadcasts its position & local time
Receivers determine location & time using transmission delay
GPS time was zero at 0h 6-Jan-1980
TAI is always ahead of GPS by 19 seconds

Answer 152

A

Conceptually mean solar time at 0° longitude (Greenwich), but actually uses distant quasars, etc.
Measures Earth’s rotation (thus coordinated with it), but Earth wobbles, so its length of second varies (!)

Answer 153

A

Primary time standard by which the world regulates clocks and time
Tanenbaum uses the nonstandard term “Universal Coordinated Time”
Based on TAI, but seconds added/removed to keep within 1 second of UT1 (mean solar time at 0° longitude)
Leap seconds occasionally inserted: 58, 59, 60, 0, 1, …
Insertion preference at the end of December and June
Leap second inserted on 2015-07-01; TAI-UTC=36s
Some want to stop adding leap seconds – this would redefine “day” to be unrelated to the sun and Earth’s rotation
I’m pro-leap-seconds; if you want continuous, use TAI or GPS time

Answer 154

A

Add daylight saving time & timezone to UTC
Eastern Standard Time (EST - US) UTC -0500
Eastern Daylight Time (EDT; summer) UTC -0400
India standard time UTC +0530
Nepal standard time UTC +0545

Answer 155

A

2ρ.Δt (read slide 12 of 05-synch)

Answer 156

A

See third edition section 6.1. Given a maximum clock drift rate p (the difference per unit time from a perfect reference clock) and a required precision in seconds (the maximum amount two clocks are allowed to differ, even if they drift in opposite directions in the worst case), the clocks must be resynchronized at least every precision / (2p) seconds.

Given p=10^-6 (a typical rate for a hardware quartz clock) and a precision of 10^-5 seconds, the maximum resynchronization interval is (precision)/(2p) = (10^-5)/(2*(10^-6)) = 5 seconds, so the clocks must be resynchronized at least every 5 seconds.
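
The same calculation as a small sketch; the function and parameter names are illustrative, not from the textbook.

```python
def resync_interval(precision_s: float, max_drift_rate: float) -> float:
    """Longest time two clocks may run before they can differ by precision_s.

    Each clock drifts by at most max_drift_rate per second, so in the worst
    case (opposite directions) they separate at 2 * max_drift_rate per second.
    """
    return precision_s / (2 * max_drift_rate)

# Values from the worked example: p = 10^-6, precision = 10^-5 seconds.
interval = resync_interval(precision_s=1e-5, max_drift_rate=1e-6)
```

Up to floating-point rounding, `interval` comes out at 5 seconds.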

Answer 157

A

The client contacts its local name resolver to implement the name resolution process on www.cs.gmu.edu.
The name resolver hands the complete name www.cs.gmu.edu to the root name server “.”.

The DNS root server resolves www.cs.gmu.edu as far as it can, and since it can only resolve to edu, it will return the address of the name server for “edu.”.

The client name resolver contacts the name server for “edu.” and requests it to resolve www.cs.gmu (.edu).

The name server for “edu.” resolves www.cs.gmu (.edu) as far as it can, and since it can only resolve to gmu, it returns the address of the name server for “gmu.edu.”.

The client name resolver contacts the name server for “gmu.edu.” and requests it to resolve www.cs (.gmu.edu).

The name server for “gmu.edu.” resolves www.cs (.gmu.edu) as far as it can, and it returns the address of the name server for “cs.gmu.edu.”.

The client name resolver contacts the name server for “cs.gmu.edu.” and requests it to provide the IP address of www (.cs.gmu.edu).

The name server for “cs.gmu.edu.” provides the IP address of www.cs.gmu.edu.

The client’s local name resolver returns the IP address of www.cs.gmu.edu. This IP address can then be used to initiate the HTTPS protocol to perform a GET of “/about/contact-info”.
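
The iterative steps above can be sketched as a toy simulation. The delegation table and the IP address (taken from the 192.0.2.0/24 documentation range) are placeholders, not real DNS data, and a real resolver would also cache answers.

```python
# Hypothetical zone data: which child zone each name server can refer to,
# and which addresses the final server knows.
DELEGATIONS = {".": "edu.", "edu.": "gmu.edu.", "gmu.edu.": "cs.gmu.edu."}
ADDRESSES = {"cs.gmu.edu.": {"www.cs.gmu.edu.": "192.0.2.10"}}

def query(zone: str, name: str) -> tuple[str, str]:
    """Ask the name server for `zone` to resolve `name` as far as it can."""
    if name in ADDRESSES.get(zone, {}):
        return "answer", ADDRESSES[zone][name]
    child = DELEGATIONS.get(zone)
    if child is not None and name.endswith("." + child):
        return "referral", child          # address of the next name server
    raise LookupError(f"{zone} cannot resolve {name}")

def resolve(name: str) -> tuple[str, list[str]]:
    """Local resolver: start at the root and follow referrals."""
    zone, contacted = ".", []
    while True:
        contacted.append(zone)
        kind, value = query(zone, name)
        if kind == "answer":
            return value, contacted
        zone = value

ip, contacted = resolve("www.cs.gmu.edu.")
```

Here `contacted` records the servers in the order the resolver asked them: root, edu., gmu.edu., cs.gmu.edu.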

Answer 158

A

This is just the computed time offset (θ). This is different from the round-trip delay (δ), though that’s also calculated in the algorithm because results with lower round-trip delay are preferred. However, since the question only asked for the computed time offset, that’s what you should have provided.

This computed time offset is computed using ((t1-t0)+(t2-t3))/2.

answer is 585.5
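
A small sketch of the offset formula, together with the round-trip delay for comparison. The timestamps below are made up and are not the ones from the question. It assumes the usual convention: t0 is when the client sends, t1 when the server receives, t2 when the server replies, and t3 when the client receives the reply.

```python
def ntp_offset_and_delay(t0: float, t1: float, t2: float, t3: float) -> tuple[float, float]:
    """Return (theta, delta): computed time offset and round-trip delay."""
    theta = ((t1 - t0) + (t2 - t3)) / 2
    delta = (t3 - t0) - (t2 - t1)
    return theta, delta

# Hypothetical exchange: server clock 55 units ahead, 5 units each way.
theta, delta = ntp_offset_and_delay(t0=100.0, t1=160.0, t2=165.0, t3=115.0)
```

With these made-up values the offset is 55 and the round-trip delay is 10, which matches the scenario the timestamps were built from.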

Answer 159

A

radio broadcast
±10ms due to atmospheric fluctuations
satellite
±500μs knowing the distance to a geostationary satellite
local network
tens or hundreds of ms, due to network stack, load on processors
internet
seconds range, due to routers, queues…

Answer 160

A

Network Time Protocol (NTP) is a networking protocol for clock synchronization to “real” time
Designed for variable-latency data networks
Servers provide time values, clients request time info (it does support peer-to-peer)
Hierarchical: “stratum” counts layers from reference clock (prevents cycles). Stratum 0 = reference clock
Clients & servers include local clock timestamps in messages
More complex algorithm, but gets real time distributed
Clients
Regularly polls three or more NTP servers on diverse networks
Gathers data & determines how to adjust its clock

Answer 161

A

Client regularly polls servers, for each computes time offset (θ) and round-trip delay (δ)
The values for θ and δ are passed through filters and subjected to statistical analysis (book: θ of smallest δ)
Outliers are discarded and an estimate of time offset is derived from the best three remaining candidates
Presumes symmetrical nominal delay
Many details omitted here!

Answer 162

A

definition: if a and b are events, a → b denotes that a occurs before b
transitivity: if a → b and b → c then a → c
if a and b occur in the same process, and a occurs before b, then a → b holds
if a is the event of a message being sent by one process, and b is the event of that message being received by another process, then a → b holds
definition (interleaving): if a and b happen in two processes that do not coordinate, then neither a → b nor b → a holds

Answer 163

A

definition: C(a) is the clock value when event a occurs
if a and b occur in the same process, and a occurs before b, then C(a) < C(b)
if a is the event of a message being sent by one process, and b is the event of that message being received by another process, then make sure C(a) < C(b)
interleaving: if a and b happen in two processes that do not coordinate, then we don’t know the relation between C(a) and C(b)
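
A minimal sketch of a logical clock that enforces these rules. The class and method names are illustrative, and each process is assumed to own one clock and to attach the send timestamp to every message.

```python
class LamportClock:
    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Local event or message send: advance the clock, return C(event)."""
        self.time += 1
        return self.time

    def receive(self, sent_timestamp: int) -> int:
        """Message receipt: move past the sender's timestamp if it is ahead."""
        self.time = max(self.time, sent_timestamp) + 1
        return self.time

p, q = LamportClock(), LamportClock()
for _ in range(5):
    p.tick()                      # p has had five local events
q.tick()                          # q has had only one

send_ts = p.tick()                # a: p sends a message
recv_ts = q.receive(send_ts)      # b: q receives it, so C(a) < C(b)
```

Without the adjustment in `receive`, q would stamp the receipt with 2, which is smaller than the send timestamp 6.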

Answer 164

A

Lamport’s algorithm can be used to ensure that all nodes agree on the ordering of events if
(1) Each time stamped message is sent to everyone in the group,
(2) Messages sent from the same sender are received in the same order
(3) No messages are lost

Answer 165

A

Simultaneous execution - execution of process or computation simultaneously
Need >1 CPU core (but today that’s the normal case, and it’s always true in a distributed system)

Answer 166

A

“concurrency is the property of program, algorithm, or problem decomposability into order-independent or partially-ordered components or units.” [Lamport1978]
Several computations are executing during overlapping time periods—concurrently—instead of sequentially (one completing before the next starts)
Concurrency doesn’t require parallelism – it can be implemented on a single processor (through interleaving)

Answer 167

A

but often same solutions apply to both

Answer 168

A

Process
Each process has a separate memory area from other processes
Thread
Executes code, no attempt to isolate memory of a thread from other threads in the same process
Thread implementation generally maintains minimum information (e.g., CPU context)
Using multiple threads can have higher performance than multiple processes, but using them correctly requires more intellectual effort
Easier to get things wrong, and can be difficult to debug because defects often aren’t reproducible
Using threads or processes can improve scalability
Take advantage of those multiple processors you have

Answer 169

A

some advantages of using threads
separation of concerns: different activities in different threads
one thread remains responsive (e.g. user input) even if others are busy or blocked (e.g. waiting for messages) – decreases overall latency
support requests of multiple clients
using threads for replicated computation
pool: assign a thread when a request comes in
more efficient, harder to manage
factory: create a thread when a request comes in
easier to manage, less efficient
threads are supported by a library/VM, the OS, or both
making a process-blocking OS call blocks all threads in some library/VM implementations
calling exit() in one thread terminates the process
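
A rough sketch contrasting the factory and pool models for threads, using Python's standard library. The `handle` function and the request list are stand-ins. Note that `ThreadPoolExecutor` starts its workers lazily rather than all up front, so it only approximates a pre-created pool.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def handle(request: int) -> int:
    """Stand-in for real request handling."""
    return request * request

requests = [1, 2, 3, 4]

# Factory: create a fresh thread when each request comes in.
factory_results: dict[int, int] = {}

def worker(req: int) -> None:
    factory_results[req] = handle(req)

threads = [threading.Thread(target=worker, args=(r,)) for r in requests]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Pool: a bounded set of reusable worker threads picks up the requests.
with ThreadPoolExecutor(max_workers=2) as pool:
    pool_results = list(pool.map(handle, requests))
```

The factory version pays thread start-up cost per request; the pool version caps the thread count at two and reuses the workers.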

Answer 170

A

Threads provide convenient way to allow blocking system calls without blocking entire process
Good for distributed systems – easier to express communication with multiple logical connections at same time
Distributed systems can impose significant delays in communication between components – don’t want to block everything

Answer 171

A

shared
memory (address space)
distributed shared memory
objects
distributed object stores
files
distributed file systems

Answer 172

A

P and Q running concurrently.
For the moment we’ll assume + and * are atomic (normally they are not) (see slides 11 and 12 of lecture 6)

x = x + 3 often implemented like this:
Load value of x
Load constant 3
Add them
Store result in x

… so even “simple” operations like addition are often not atomic (they can be broken down)
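
A deterministic sketch of why this matters. It assumes two processes P and Q that each run x = x + 3 (the actual statements on the slide may differ) and replays a chosen interleaving of their load, add, and store steps; loading the constant is folded into the add step for brevity.

```python
def run(schedule: str) -> int:
    """Replay P and Q, each doing x = x + 3 as load / add / store.

    `schedule` names which process takes its next step, e.g. "PPPQQQ".
    """
    x = 0
    register = {"P": 0, "Q": 0}     # each process has a private register
    step = {"P": 0, "Q": 0}
    for proc in schedule:
        if step[proc] == 0:         # load value of x
            register[proc] = x
        elif step[proc] == 1:       # add the constant 3
            register[proc] += 3
        else:                       # store result in x
            x = register[proc]
        step[proc] += 1
    return x

serial = run("PPPQQQ")              # P finishes before Q starts
interleaved = run("PQPQPQ")         # both load x before either stores
```

The serial schedule ends with x = 6, while the interleaved one ends with x = 3 because Q overwrites P's update.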

Answer 173

A

single data repository (blackboard)
all components read/write on blackboard

Answer 174

A

multiple clients may access same DB records

Answer 175

A

distribution
network propagation delays
different clocks
causes inconsistency
given one trace of events, different components may see those events in a different order

Answer 176

A

concurrency
causes unpredictability
cannot tell which trace will occur

distribution
network propagation delays
different clocks
causes inconsistency
given one trace of events, different components may see those events in a different order

Answer 177

A

definition
ordering (aka data-centric) consistency:
all components observe operations on shared data
at the same time
strict consistency – only possible with
shared clock and
insignificant propagation delays
in the same order
linear consistency
in an order that “makes sense”
sequential, causal, FIFO consistency

Answer 178

A

address both predictability and ordering consistency.

programmers need to use explicit synchronization techniques anyway, because of unpredictability (due to concurrency)

explicit synchronization relieves the middleware/OS from having to assure ordering consistency

Answer 179

A

used for explicit synchronization

Monitor provides a queue with certain entry condition that is used to guarantee only one process operates on the critical section (data) at a time
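
A minimal monitor sketch built on Python's `threading.Condition`. The class, its capacity rule, and the `peak` field are hypothetical; the point is that the shared counter is only touched inside methods that hold the monitor's lock, and that callers wait while the entry condition is false.

```python
import threading

class BoundedCounter:
    """Monitor guarding a counter that may never exceed `capacity`."""

    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self._count = 0
        self.peak = 0                              # highest count observed
        self._cond = threading.Condition()

    def enter(self) -> None:
        with self._cond:                           # one thread at a time
            while self._count >= self._capacity:   # entry condition
                self._cond.wait()
            self._count += 1
            self.peak = max(self.peak, self._count)

    def leave(self) -> None:
        with self._cond:
            self._count -= 1
            self._cond.notify()                    # wake one waiting thread

    def count(self) -> int:
        with self._cond:
            return self._count

monitor = BoundedCounter(capacity=2)

def visit() -> None:
    monitor.enter()
    monitor.leave()

threads = [threading.Thread(target=visit) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
final = monitor.count()
```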

Answer 180

A

events or actions of interest?
arrival and departure
identify processes
arrivals, departures and CarPark control
define structure and interactions

Answer 181

A

Reasons for replication:
Improve reliability – can continue working even if some components fail (see fault tolerance)
Performance – can distribute work & put data near where it’s needed
Problem: Can lead to consistency problems
Challenge to keep replicas consistent

Answer 182

A

Often focus on shared data aka “data store”
May be (distributed) shared database, shared filesystem, shared memory, etc.
Consistency model = a contract between processes & data store (if processes do X, data store promises Y)
Without a global clock not easy to define “last write”
Often can accept some inconsistencies… but need to bound them, & thus need to categorize them

Answer 183

A

Deviation in values between replicas
Absolute numerical deviation (“no more than $0.02”)
Relative numerical deviations (“no more than 0.5%”)
Deviation in staleness (how old)
“Data no older than X seconds”
Deviation in ordering of update operations
This is more complex! Sequential consistency, causal consistency, eventual consistency, …

Answer 184

A

“The result of any execution is the same as if the read & write operations by all processes on the data store were executed in some sequential order and the operations of each individual process appear in this sequence in the order specified by its program.”
Any valid interleaving of read & write operations is acceptable, but all processes see the same interleaving
Expensive to implement in distributed system

Answer 185

A

Weakens sequential consistency – distinguishes what is potentially causally related
“Writes that are potentially causally related must be seen by all processes in the same order. Concurrent writes may be seen in a different order on different machines.”

Answer 186

A

“If no updates take place for a long time, all replicas will gradually become consistent [have exactly the same data]”
Updates must eventually propagate to all replicas
Problem: write-write conflicts (same data item written with different values)
Often the solution is an algorithm that declares one as the “winner” (cancelling the effects of any previous conflict)
Often cheap to implement, at a cost of inconsistency for a period of time
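
One possible “winner” rule is last-writer-wins, sketched below. This is an illustrative choice rather than the only option, and the timestamp is assumed to be something like a Lamport timestamp, with the replica id breaking ties so every replica picks the same winner.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: int      # assumed logical timestamp of the write
    replica_id: str     # tie-breaker for equal timestamps

def merge(a: Versioned, b: Versioned) -> Versioned:
    """Resolve a write-write conflict: the later write wins."""
    return max(a, b, key=lambda v: (v.timestamp, v.replica_id))

at_replica_a = Versioned("blue", timestamp=7, replica_id="A")
at_replica_b = Versioned("green", timestamp=9, replica_id="B")

# Both replicas reach the same value regardless of merge order.
winner_ab = merge(at_replica_a, at_replica_b)
winner_ba = merge(at_replica_b, at_replica_a)
```

The losing write ("blue") is silently discarded, which is the price of this simple rule.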

Answer 187

A

Give up the idea of a “central data store”
Provide consistency guarantees from POV of 1 client,
No guarantees about different clients
Various consistency models:
Monotonic reads: If a process reads the value of a data item x, any successive read of x by that process will always return that same or more recent value (“never read older version”)
Monotonic writes: A write by a process on data item x is completed before any later write on x by the same process
Read-your-writes: The effect of a write operation by a process on data item x will always be seen by a successive read operation on x by the same process
Writes follow reads: A write operation by a process on data item x following a previous read operation on x by the same process is guaranteed to take place on the same or more recent value of x that was read (“See a posting about an article only if saw original article”)

Answer 188

A

none of them are true:

The network topology does not ever change.

Network bandwidth is infinite.

The network is reliable.

Network latency is zero.

Answer 189

A

In RMI, unlike in RPC, applications send messages to logical contact points.

Answer 190

A

3

System down = all servers are down.

Probability the system is down = (probability of each server down)^N

Goal ≥ Probability the system is down

Probability each server is down = 20% = 20/100 = 0.2

Goal = 0.7% = 0.7/100 = 0.007

Probability the system is down = (0.2)^N

N = 2 ==> Probability the system is down = (0.2)(0.2) = 0.04

N = 3 ==> Probability the system is down = (0.2)(0.2)(0.2) = 0.008

N = 4 ==> Probability the system is down = (0.2)(0.2)(0.2)(0.2) = 0.0016

Goal = 0.007 ≥ Probability the system is down = 0.0016

So, the minimum number of servers needed to reduce the down time to 0.7%, including the master, is 4. That means only 3 additional replicas are needed (since the question expressly asked you to not count the master server in the count of replicas).
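
The same search written as a short loop; the function name is illustrative, and server failures are assumed independent, as in the calculation above.

```python
def servers_needed(p_server_down: float, goal: float) -> int:
    """Smallest N with p_server_down ** N <= goal (independent failures)."""
    n = 1
    while p_server_down ** n > goal:
        n += 1
    return n

total_servers = servers_needed(p_server_down=0.2, goal=0.007)
additional_replicas = total_servers - 1      # the master is not counted
```

This reproduces the result: four servers in total, so three additional replicas.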

Answer 191

A

Once a transaction commits, the changes are permanent.

Answer 192

A

install a handler to be called when a message is put into the specified queue

Answer 193

A

Check a specified queue for messages, and remove the first. Never block

Answer 194

A

8

We need “F” (Fail) probability, and all we’re given is the probability of success for each component. This is easily resolved:

F = 1 - Success

n >= (log Goal)/(log F)

so find the smallest integer n that meets this inequality. Since n has to be an integer (you can’t use part of a component), round the result of the calculation up to the smallest integer greater than or equal to it (the “ceiling”). Finally: do we include the “master” in this case? As discussed in class, people aren’t consistent in their terminology, so you have to check each question. Here the question clearly states that the “master” system is included, so you don’t subtract one.
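
A sketch of the closed-form version. The success probability and goal below are made-up values, not the ones from the question. Both logarithms are negative, which is why dividing by log F turns the inequality into n >= ...

```python
import math

def components_needed(p_success: float, goal: float) -> int:
    """Smallest integer n with F ** n <= goal, where F = 1 - p_success."""
    fail = 1 - p_success
    return math.ceil(math.log(goal) / math.log(fail))

# Hypothetical inputs: each component succeeds 75% of the time,
# and the whole system may fail at most 0.1% of the time.
n = components_needed(p_success=0.75, goal=0.001)
```

Here log(0.001) / log(0.25) is just under 5, so the ceiling gives n = 5. When the ratio lands exactly on an integer, floating-point rounding can push it slightly over, so checking F ** n against the goal directly is a sensible safeguard.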

Answer 195

A

The client procedure calls the client stub in the normal way.

The client stub builds the message (e.g., “marshalling” the parameters) and calls the local operating system asking the message to be sent via the network.

The client’s operating system sends the message to the remote (server) operating system.

The remote operating system gives the message to the server stub.

The server stub unpacks the parameter(s) (aka “unmarshalls” the parameters) and calls the server.

The server does the work and returns the result to the stub.

The server stub packs the result into a message (i.e., “marshalls” the result) and calls its local operating system.

The server’s operating system sends the message to the client’s operating system.

The client’s operating system gives the message to the client stub.

The client stub unpacks the result and returns it to the caller within the client.
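
The steps above can be sketched in a single process. JSON stands in for the marshalling format, and the `transport` function stands in for both operating systems and the network. These choices, and all names below, are assumptions for illustration rather than how any particular RPC system works.

```python
import json

def add(a: int, b: int) -> int:
    """The server procedure that does the actual work."""
    return a + b

PROCEDURES = {"add": add}

def server_stub(message: bytes) -> bytes:
    request = json.loads(message)                            # unmarshal parameters
    result = PROCEDURES[request["proc"]](*request["args"])   # call the server
    return json.dumps({"result": result}).encode()           # marshal the result

def transport(message: bytes) -> bytes:
    """Stand-in for the client OS, the network, and the server OS."""
    return server_stub(message)

def client_stub_add(a: int, b: int) -> int:
    message = json.dumps({"proc": "add", "args": [a, b]}).encode()  # marshal
    reply = transport(message)                               # send, wait for reply
    return json.loads(reply)["result"]                       # unmarshal the result

# The client calls the stub like a normal local procedure.
total = client_stub_add(2, 3)
```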

Answer 196

A

Persistence refers to whether the messages endure in the system (regardless of the sender being active and the recipient being available) or not.

Answer 197

A

publish-subscribe

Answer 198

A

false

Read carefully!

It’s true that “There are two models for activating service supply: factory & pool.”

The later text is also true, because it’s true that “An advantage of pre-creating instances is that requests can often be handled more quickly, since the instance is already ready to go (this effect is more important if instantiation takes a long time). A disadvantage of pre-creating instances is that the instance typically requires resources (at least storage) even when it is not being used.”

However, the definitions of factory and pool are completely reversed.

In the pool model, “the system instantiates all the instances that might be needed ahead-of-time, and then when a request shows up, one of those pre-created instances is allocated to the request”.

In the factory model, an instance is created at the time of the request.