622 Flashcards

1
Q

Eight pleasant thoughts

A

-The network is reliable
-The network is secure
-The network is homogenous
-The topology does not change
-Latency is zero
-Bandwidth is infinite
-Transport cost is zero
-There is one administrator

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is a distributed system?[Tanenbaum]

A

A collection of independent computers that appears to its users as a single coherent system

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Three key characteristics [Tanenbaum]

A

Multiple machines are autonomous
Software lets users see a single system
System easy to expand without user noticing

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What is a distributed system?[Webopedia]

A

A type of computing in which different components and objects comprising an application can be located on different computers connected to a network.
Key requirement:set of standards that specify how objects communicate with one another(e.g. CORBA, DCOM, REST, …).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

[Wikipedia] distributed computing:

A

decentralized and parallel computing, using two or more computers communicating over a network to accomplish a common objective or task.

Note:The types of hardware, programming languages, operating systems and other resources may vary drastically. It is similar to computer clustering with the main difference being a wide geographic dispersion of the resources.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Challenges of DS

A

-latency of communication
-coordination
-shared resources and mutual exclusion
-ordering, deadlock and live-lock
-timing
-adaptation to change
-failures, soft faults, and optimization
-service discovery and configuration
-heterogeneity and third-party software
-scalability and evolution
-security and privacy
-trust on machines, software, communications & other users

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Advantages of DS

A
  • processing capacity
  • fault tolerant, evolving, scalable
    -explicit control, preferences
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Replicas

A

Often useful to have same task performed by multiple components so all have to fail for task to fail
What if data must be shared between components?
Often one component is “master” (aka “original” or “authoritative version”)
The other components are copies from the master
This may be apportioned, e.g., a component may be a master for just some portion like “names A-K”
Confusion in counting the number of “replicas”:
Some might not include the “master” in the count of replicas
Some might include the “master” (“replicas of each other”)
Make sure you know if the “master” is included

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

how to solve for number of replicas

A

If we assume independence and:
F = Probability that one replica fails in time period, F≠1
n = (natural) number of components (e.g., replicas including master). Thus F^n is the probability all n will fail simultaneously
G = Goal, permitted probability of total system failure where all n replicas fail (including original)

F^n ≤ G or
n ≥ (log G)/(log F)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Independence assumption

A

These calculations assume independent failures
Reasonable model for many hardware failures
Software failures often not independent
Knight & Leveson [1986] found via experiment that software faults are not independent
Thus “N-version programming” doesn’t lead to the reliability increase you might predict
It can be helpful, but less than you’d think

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Caching: Special case of replication

A

Make cop(ies) of a resource (data)
Often happens on demand
Other replication approaches often planned & executed in advance

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Challenges of DSexample: replication has downsides

A

buy more hardware
administration costs
software upgrades
load balancing
performance overhead
more complex software
consistency problems
sometimes tolerable

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

hiding access

A

hide differences in data representation and how a resource is accessed

(conversion of complex fortmats. latency vs fidelity of access)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

hiding location

A

hide where a resource is located (trusted hosts, different performance, difference capabilities and network access)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

migration

A

hide that a resource may move to another location (trusted hosts, different performance, difference capabilities and network access)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

relocation

A

hide that a resource may be moved to another location while in use (trusted hosts, different performance, difference capabilities and network access)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

replication

A

hide the fact that several copies of a resource exist (select server based on QoS)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

concurrency

A

hide that a resource may be shared by several competitive users(cannot hide sharing of resources: they’re consumed , data is modified by others)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

failure

A

hide the failure and recovery of a resource(unexplained behavior)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

persistence

A

hide whether a (software) resource is in memory or on disk(someone needs to decide whether an object is persistent and commit it to disk)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

awareness and adaptation

A

separate decisions from (controllable) mechanisms

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

Some measures (per Neumann) for system scaleability

A

Size
Users & resources
Geographical
May lie far apart
Administrative
May span many
independent
administrative
organizations

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

Decentralized algorithms

A

No machine has complete information about the system state.
Machines make decisions based only on local information.
Failure of one machine does not ruin the algorithm.
There is no implicit assumption that a global clock exists

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

Asynchronous communication

A

Hiding communication latencies important for geographical scaleability
Max speed is speed of light in vacuum (~3.00×108 m/s)
Information transfer through material normally less
Physical components have other performance latencies
Software takes time to execute once it receives data
Sending information and waiting for reply is synchronous communication
Alternative: Asynchronous – send information, don’t wait for reply

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
Q

Moving location of execution

A

E.G., when checking form inputs, consider two options:
Send each input to server & wait for reply – maybe long delay
Could move checking code to client
Now check response can be immediate (once code is there)
Can be special-case (e.g., HTML5 form validators)
Can be general (e.g., a general execution engine like Javascript)
Beware of security ramifications
Often two sides (e.g., client & server) cross trust boundary
Security checks must often be redone on server in many cases
Server can’t trust client
Many checks are done on both client (for speed) and server (for security)
Client must often check the data it’s asked to execute (especially if it’s a full language like Javascript)
Client can’t trust server

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
26
Q

Cloud computing

A

Clouds widely used, often misunderstood
Clouds often cheaper (where appropriate), many variations
Decisions to use cloud (and how) impact security
NIST Definition of Cloud Computing (NIST SP 800-145):
Cloud computing is “a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.”
Five essential characteristics: On-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service
Virtualization is common, but not required, to be a cloud
Book makes this (common) mistake

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
27
Q

Cloud service models

A

Infrastructure as a Service (IaaS): “consumer [can] deploy and run arbitrary software [including] operating systems and applications….”
Platform as a Service (PaaS): “consumer [can deploy] consumer-created or acquired applications…” [on top of provided platform]
Software as a Service (SaaS): “consumer [can] use the provider’s applications running on a cloud infrastructure…”

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
28
Q

OSI 7 layer model (Open Systems Interconnection)

A

physical, data link, network, transport, session, presentation, application

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
29
Q

physical

A

specifies: pin layout, voltages, modulation
does: establish & terminate access to medium,flow control, contention resolution
at this level: hubs, repeaters,network adapters

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
30
Q

data link

A

specifies: how to transfer data in a LAN
does: detect and correct errors
at this level: MAC addresses (flat, HW-based)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
31
Q

network

A

specifies: how to transfer data sequences across LANs (e.g., IP)
does: routing
at this level: hierarchical address scheme,routers, bridges & switches

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
32
Q

IP Service Model

A

Connectionless (datagram/packet-based)
Best-effort delivery (unreliable service)
packets are lost
packets are delivered out of order
duplicate copies of a packet are delivered
packets can be delayed for a long time
Datagram format

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
33
Q

Datagram forwarding

A

Strategy
every datagram contains destination’s address
if directly connected to destination network, then forward to host
if not directly connected to destination network, then forward to some router
forwarding table maps network number into next hop
each host has a default router
each router maintains a forwarding table

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
34
Q

Forwarding Tables

A

Suppose there are n possible destinations, how many bits are needed to represent addresses in a routing table?
log2n
So, we need to store and search n * log2n bits in routing tables?
We’re smarter than that!

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
35
Q

Global Addresses

A

Globally unique,
hierarchical: network+host
Dot Notation
10.3.2.4
128.96.33.81
192.12.69.77

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
36
Q

Transport

A

specifies: reliable transference of data(e.g., TCP, UDP)
does: flow control, segmentation, error control,retransmission

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
37
Q

UDP

A

(User Datagram Protocol)
connectionless - sends independent packets of data, called datagrams, from one computer to another with no guarantees about arrival
each time a datagram is sent, the local and receiving socket address need to be sent as well

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
38
Q

TCP

A

(Transmission Control Protocol)
connection-oriented - provides a reliable flow of data between two computers: data sent from one end of the connection gets to the other end in the same order
in order to communicate using TCP protocol, a connection must first be established between the pair of sockets
once two sockets have been connected, they can be used to transmit data in both (or either one of the) directions

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
39
Q

Overhead

A

UDP - every time a datagram is sent, the local and receiving socket address need to be sent along with it
TCP - a connection must be established before communications between the pair of sockets start (i.e. there is a connection setup time in TCP)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
40
Q

Packet size

A

UDP - there is a size limit of 64 kilobytes per datagram
TCP - there is no limit; the pair of sockets behaves like streams

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
41
Q

reliability

A

UDP - there is no guarantee that the sent datagrams will be received in the same order by the receiving socket
TCP - it is guaranteed that the sent packets will be received in the order in which they were sent

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
42
Q

which protocol to use?`

A

TCP - useful when indefinite amount of data need to be transferred ‘in order’ and reliably

UDP - useful when data transfer should not be slowed down by the extra overhead of the reliable connection

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
43
Q

session

A

specifies: establishing long lived connections
does: checkpointing, adjournment, restart

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
44
Q

presentation

A

specifies: data formats and transformation(e.g., MIME)
does: serialization, compression, encryption, encoding transformation (EBCDIC/ASCII)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
45
Q

application

A

specifies: application-specific protocols(e.g., http, smtp, ftp, telnet)
does: support app-specific functionality

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
46
Q

goal of the OSI

A

separation of concerns enables good implementationat each level
each layer is independentof the ones on top
layer n depends on the spec of n-1, but not on its implementation/manufacturer

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
47
Q

port

A

Generally, a computer has a single physical connection to the network
this connection is identified by the computer’s 32-bit IP address
all data destined for a particular computer arrives through this connection
TCP and UDP use ports to identify a particular process/application
port = abstract destination point at a particular host
each port is identified by a positive 16-bit number, in the range 0 - 65,535
port numbers 0 - 1023 are reserved for well-known services (HTTP - 80, telnet – 23)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
48
Q

socket

A

basic abstraction for network communication

“end-point of communication” uniquely identified with IP address and port
example: Socket MyClient = new Socket(“Machine name”, PortNumber);
gives a file-system like abstraction to the capabilities of the network
two end-points communicate by “writing” into and “reading” out of socket
there are two types of transport via sockets
reliable, byte-stream oriented unreliable datagram

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
49
Q

socket programming with TCP

A

Server Side:
server runs on a specific computer and has a socket bound to a specific port number
server listens to the socket for a client to make a connection request
Client Side:
client tries to rendezvous with the server on the server’s machine and port

Server Side:
the server accepts the connection by creating a new socket bound to a different port
Client Side:
if the connection is accepted, the client uses the new socket to communicate with the server

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
50
Q

Socket programming with UDP

A

All clients use the same socket to communicate with the server
Packets of data (datagrams) are exchanged
No new sockets need to be created

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
51
Q

C- vs. Java- socket programming

A

Java keeps all the socket complexity “under the cover”
It does not expose the full range of socket possibilities
But, it enables sockets to be opened/used as easily as a file would be opened/used

By using the java.net.Socket class instead of relying on native code, Java programs can communicate over the network in a platform-independent fashion

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
52
Q

Java socket programming

A

all classes related to sockets are in java.net package
Socket class - implements client sockets (also called just “sockets”)
ServerSocket class - implements server sockets
A server socket waits for requests to come in over the network. It performs some operation based on that request, and then possibly returns a result to the requester.
DatagramSocket class - socket for sending and receiving datagram packets
DatagramPacket class - represents a datagram packet
Datagram packets are used to implement a connectionless packet delivery service. Multiple packets sent from one machine to another might be routed differently, and might arrive in any order.
InetAddress class - represents an Internet Protocol (IP) address
MulticastSocket class - useful for sending and receiving IP multicast packets.
A MulticastSocket is a (UDP) DatagramSocket, with additional capabilities for joining “groups” of other multicast hosts on the internet. A multicast group is specified by a class D IP address.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
53
Q

what does middleware offer?

A

conceptual model for communication

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
54
Q

different styles have different data sharing assumptions

A

RPC
(address space or memory)

RMI
object refs (middleware)

messages

data store -> files/objects persistent store

data stream -> and <- data store/source

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
55
Q

read slide 49-60 in lecture 2

A

ok

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
56
Q

RPC is implemented by

A

sending messages

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
57
Q

where to send RPC messages?

A

hardwired for fixed deployment
some RPC environments support dynamic binding (more to come during the lecture on Service Discovery)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
58
Q

solution to object refs

A

increase granularity from bytes to objects

both local objects and references to remote objects are passed by value (serialization)
the result of the called method is also serialized and passed back to the caller

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
59
Q

difference between RMI and RPC?

A

RMI:
doesn’t try to hide distribution in the language:remote objects are declared “remote”
marshalling is simplified
by passing by value only(object references can be used in nested RMIs)
(in Java) by having JVMs hide platform dependencies in data representation
serialization could be much heavier by having to pass the code for the objects with every call, but that can be avoided by passing URLs for downloading the code, rather than the code itself

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
60
Q

reasons to escape the call return style

A

no result needs to be returned
a server may not be availableat the time of the request
make the client more responsiveto other events/user
allow any component to initiate communication

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
61
Q

some middleware push the envelope

A

dealing with errors:idempotent, at-least-once, at-most-once…
the promised simplicity of procedure calling sometimes hinders more sophisticated solutions

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
62
Q

when to use call return style

A

the server is ready to process each request
components and network are mostly reliable
not many concurrent events in the caller:it is fine to block the caller
one component (client) has the initiative,others (servers) wait for requests

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
63
Q

what is needed for RMI

A

Java makes RMI (Remote Method Invocation) fairly easy, but there are some extra steps
To send a message to a remote “server object,”
The “client object” has to find the object
Do this by looking it up in a registry
The client object then has to marshal the parameters (prepare them for transmission)
Java requires Serializable parameters
The server object has to unmarshal its parameters, do its computation, and marshal its response
The client object has to unmarshal the response

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
64
Q

remote object

A

an object on another computer

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
65
Q

client object

A

object making the request

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
66
Q

server object

A

object receiving the request((can easily trade roles with the client object)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
67
Q

rmiregistry

A

special server that looks up objects by name

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
68
Q

rmic

A

special compiler for creating stub (client) and skeleton (server) classes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
69
Q

processes for RMI

A

The Client
The Server
The Object Registry, rmiregistry, which is like a DNS service for objects
You also need TCP/IP

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
70
Q

interfaces

A

Interfaces define behavior
Classes define implementation
Therefore,
In order to use a remote object, the client must know its behavior (interface), but does not need to know its implementation (class)
In order to provide an object, the server must know both its interface (behavior) and its class (implementation)
In short,
The interface must be available to both client and server
The class should only be on the server

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
71
Q

remote class

A

one whose instances can be accessed remotely On the computer where it is defined, instances of this class can be accessed just like any other object
On other computers, the remote object can be accessed via object handles

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
72
Q

serializable class

A

one whose instances can be marshaled (turned into a linear sequence of bits)
Serializable objects can be transmitted from one computer to another

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
73
Q

conditions for serializability

A

If an object is to be serialized:
The class must be declared as public
The class must implement Serializable
All fields of the class must be serializable: either primitive types or serializable objects

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
74
Q

parts of remote class

A

The interface (used by both client and server):
Must be public
Must extend the interface java.rmi.Remote
Every method in the interface must declare that it throws java.rmi.RemoteException (other exceptions may also be thrown)

The class itself (used only by the server):
Must implement a Remote interface
Should extend java.rmi.server.UnicastRemoteObject
May have locally accessible methods that are not in its Remote interface

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
75
Q

remote object

A

lives on another computer (like server)

You can send messages to a Remote object and get responses back from the object
All you need to know about the Remote object is its interface
Remote objects don’t pose much of a security issue

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
76
Q

serializable object

A

You can transmit a copy of a Serializable object between computers
The receiving object needs to know how the object is implemented; it needs the class as well as the interface
There is a way to transmit the class definition
Accepting classes does pose a security issue

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
77
Q

server class

A

The class that defines the server object should extend UnicastRemoteObject
This makes a connection with exactly one other computer
If you must extend some other class, you can use exportObject() instead
Sun does not provide a MulticastRemoteObject class
The server class needs to register its server object:
String url = “rmi://” + host + “:” + port + “/” + objectName;
The default port is 1099
Naming.rebind(url, object);
Every remotely available method must throw a RemoteException. Why?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
78
Q

rmic

A

The class that implements the remote object should be compiled as usual
Then, it should be compiled with rmic:
rmic Hello
This will generate files Hello_Stub.class and Hello_Skel.class
These classes do the actual communication
The “Stub” class must be copied to the client area
The “Skel” was needed in SDK 1.1 but is no longer necessary

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
79
Q

overlay networks

A

network/transport support for multicast
hasn’t worked well in practice
overlay networks at application layer!
result is a logical network built on top of a (probabaly different) network
new network optimized for application
but may have performance issues due to underlying network

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
80
Q

distributed solution

A

tree network approach
function to map topics to nodes
identifies unique root node for a given topic
follow() request follows tree
if node has not seen request
become a “forwarder”
make sender your “child” for that request
forward request to next node in tree
if node has seen request
make sender your “child” for that request
send() request follows topic tree

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
81
Q

mesh network “epidemic” approach

A

node states: infected, susceptible, removed
goal is to become infected!
P picks a random Q
push: P updates pushed to Q // not so good…
pull: Q updates pulled to P
push-pull: both
gossip adaptation
less interest if receiver already has update

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
82
Q

removing data

A

suppose an update is deleted
what happens when old copy is found?
distributed systems are different!
notion of a death certificate
“that update doesn’t matter anymore”
have to keep that update
how long?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
83
Q

persistence

A

once sent, messages endure in the system, regardless of the sender remaining activeand the recipient being available

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
84
Q

synchronicity

A

the sender continues after sending,or blocks, waiting for the message
to be delivered (buffered)
to be received (read)
to be processed

85
Q

primitives and their meaning

A

socket: create a new communication endpoint
bind: attach a local address to a socket
listen: announce willingness to accept connections
accept: block caller until a connection request arrives
connect: actively attempt to establish a connection
send: send some data over the connection
receive: receive some data over the connection
close: release the connection

86
Q

when to use messaging style

A

component interactions don’tfollow a strict call-return pattern
make components more responsiveto other events/user (no blocking)
allow any component to initiate communication
components may not be available to receive/process messages (persistency)

87
Q

when to use data sharing style

A

(persistent) data plays a central role in the system
components don’t need to synchronize control flow other than on data availability or values
E.g., modern database management systems, distributed file systems

87
Q

space decoupling

A

interaction partners do not need to know each other

87
Q

data streaming style is a variant where

A

data is stored/generated at one place and consumed by one or more clients
timeliness of data delivery is crucial
asynchronous (unbound, e.g., caching)
synchronous (upper bound, e.g., sensors)
isochronous (bounded jitter, e.g., media)
complex data may be transmitted as separate streams which need to be synchronized: e.g. video + stereo sound
streaming is supported by middleware (e.g. RSVP) on top of the data link network layer

87
Q

time decoupling

A

the interaction partners do not need to be actively participating in the interaction at the same time

88
Q

synchronization decoupling

A

publishers aren’t blocked while producing events, subscribers can get asynchronously notified of the occurrence of an event

89
Q

event filters and patterns

A

An event filter selects event notifications by specifying a set of attributes and constraints on the values of those attributes

A pattern is composed of several filters
A subscription can be expressed as a filter or a pattern

90
Q

event matching

A

A subscription matches an event notification when the notification satisfies all the constraints specified in the subscription

91
Q

publish subscribe and advertise

A

Nodes publish event notifications to access points

Nodes subscribe to access points in order to receive event notifications
Specified by filters and/or patterns

Advertisements
defines the event notifications a node may possibly generate using the same semantics as filters

92
Q

filtering

A

The goal of filtering
only deliver messages “of interest” to nodes, reducing the overall traffic across the network
The system is aided by use of advertisements

93
Q

routing algorithms

A

Server must establish appropriate routing path to ensures that notification published by objects of interest are correctly delivered to all the interested parties that subscribed to them

Simplest strategy is to maintain the subscriptions at their access point and broadcast the notification throughout the network
Least efficient
Consumes lots of bandwidth

94
Q

SIENA

A

Central idea is to send the notification towards the event servers that have clients that are interested in that notification (possibly using shortest path)
Downstream replication
Notification should be replicated only downstream and as close as possible to parties interested in it

Upstream evaluation
Filters are applied and patterns are assembled upstream – as close as possible to the source of notification

Subscription forwarding
Routing paths for notification are set by subscriptions which are propagated throughout the network so as to form a tree that connects subscribers to all the servers in network

95
Q

review siena routing from reading assignment

A

ok

96
Q

name resolution

A

a mapping between:

identifier of an entity
(app/process, file)

address of an access point (network address, stub/proxy, piece of hardware plugged to network)

entities connect to the network via one or more access points, which can be reached at an address

97
Q

identifier

A

refers to at most one entity
each entity is referred to by at most one identifier
an identifier always refers to the same entity (i.e., not reused)

98
Q

name spaces

A

names are normally organized into name spaces ex. file names

usually organized as a directed, acyclic graph
name represented as a path through nodes in the graph
absolute path starts from root
relative path starts from an arbitrary point
paths represented as <link-1, …, link-n> or /link-1/…/link-n

99
Q

look at slides 10-15 on lecture 4

A

ok

100
Q

recursive resolutions

A

could lead to shorter response time, but
increases resource requirements in name servers
effectiveness of caching depends heavilyon stability of addresses (mobility is an issue)

many resolution servers support only iterative resolution

101
Q

insight: for many applicationsentity names are irrelevant

A

an application may need to find a component with certain capabilities, e.g., a spell checker, or a nearby printer
“resolution” should be guided by the capabilities,not the identity (name) of such components

for other purposes, the identity (name) is still important (e.g. web servers and email servers)
since these components are not typically mobile, conventional name resolution can still be applied

102
Q

service type

A

type name, e.g., printer, speech recognition
Note: service is ambiguously used to designate(a) service instance (b) service type (c) service supplier
version
interface signature
how to request a service from the supplier, e.g., Java interface
ontology
relations among types

difficulty: relations are not always hierarchical
example frameworks: DAML/OWL, UDDI, WSDL

103
Q

quality of service

A

static attributes
intrinsic to the suppliernot dependent on circumstances or resources
e.g., printer supports color and duplex
dynamic attributes
dependent on usage history and resources
e.g., latency (size of Q, available CPU, bandwidth…),accuracy (used algorithms, iterations, quality of data)
subject to tradeoffs
database query: fast vs. complete
language translation: speed/cost vs. accuracy
printing: high quality printing with long Qvs. low quality with short Q

104
Q

context

A

physical characterizationof where, when and how the service will be provided
static attributes
e.g., location of a wall-mounted display
dynamic attributes
e.g., location of a PDA, printer queue size
implications to privacy and security
e.g., is the wall-mounted display in a private roomor at a lounge with public access
context is more of an issue for some kinds ofservices than others (non-interactive)
distinguish computation from presentation of results

105
Q

existing discovery middleware supports different levels of service description

A

bare bones
name of service type, address/stub to reach supplier
E.g., RMI
some include spec of API signatures
e.g., Jini/JNDI (Java interface)
a few include QoS
Web Services describes generic attributes such as price and reliability, but not service-specific attributes
E.g., web services (covered later in this lecture)
some research middleware includeservice-specific QoS, context, privacy and security

106
Q

what if more than one supplier matches the description?

A

depends on the level of service description

bare bones
no way to distinguish, just pick one
some include spec of API signatures
pick one that is compatible
trust, QoS & context
use a quantitative framework (e.g. utility functions)to evaluate which one is best
requires richer description of the service requirementse.g. find a duplex printer < 100 ft away and with < 2 minute wait
p1 is 102 ft away, 2s wait
p2 is 95 ft away, 1 minute wait
p3 is 10 ft away, 3 minute wait

107
Q

which component to talk to?

A

service discovery: description of capabilities -> address of an access point

name resolution: identifier of an entity -> address of an access point

108
Q

how to describe servicesand on the mechanisms to make discovery work

A

mechanisms to make it work: description of capabilities -> address of an access point

109
Q

how about the discovery mechanisms?

A

mechanisms to make it work: description of capabilities -> address of an access point

110
Q

directed discovery

A

clients are configured with a list of addressto go ask for services

111
Q

client-initiated broadcast

A

clients broadcast service requests on demand

112
Q

supplier-initiated broadcast

A

suppliers broadcast their capabilities periodically

113
Q

directory-based discovery

A

suppliers post their capabilities on a directory
clients query the directory

114
Q

should the mechanisms enforce boundaries to discovery?

A

broadcasting-based discovery is boundedby the network policy for broadcast (usually LAN)
directed and directory-basedoffer more control of scalability
hard question: how do directories coordinate?
how far, which directories, to direct a query?

115
Q

what is a service?

A

the act of performing helpful or useful laborthat does not produce a tangible commodity

116
Q

the notion of service implies_______between supplier and consumer

A

separation of concerns

117
Q

computing definition of service

A

the act of performing helpful or useful labor,where the service supplier is developed separately from consumers and may serve many consumers

118
Q

services become first class

A

suppliers register their capabilities,consumers look for services not specific components

119
Q

two models for activating service supply:

A

factory & pool

120
Q

stateful suppliers vs stateless

A

stateful keeps state of conversation while stateless doesn’t

121
Q

middleware may alleviate dependencyon programming language and OS but

A

introduces dependency on middleware

122
Q

Middleware solutions have significant challenges

A

too many “standards”one born every few months: code evolution nightmare
integration with legacy systemsmillions of LOC and billions of $ already invested
strategy: wrappers around old code
with wrappers latency becomes an issue

123
Q

web servicescome in as an integration technology

A

focus on bridging existing technologies
key characteristic: middleware for middleware
it’s about how to access an application,it is not an implementation technology
looser coupling than RPC-based middleware
avoid proprietary APIs
Simple Object Access Protocol (SOAP)based on sending XML messages over http,no SOAP API or ORB
WSDL & SOAP are not widely used today, but it’s important to understand why – complexity is a killer

124
Q

web servicesintroduces its own set of standards

A

directory (UDDI: universal description, discovery and integration)

service description(WSDL: web services description language)

messages(SOAP: simple object access protocol)

which work on top of:
data types(XML Schema)
data(XML)

125
Q

Orchestration

A

an approach for service composition and coordination. A central service coordinates the invocation of other services to achieve the system’s functionality

126
Q

Choreography

A

another approach for service composition. a decentralized approach to coordination of services, where each service knows how it needs to behave to achieve the system’s functionalities

127
Q

BPEL

A

Business process execution language. often used for modeling the coordination among services. BPEL engine executes the model and exposes it as a service

128
Q

in-only communication pattern

A

receives a message but will not respond

129
Q

robust in-only communication pattern

A

receives a message and may issue a fault message

130
Q

in-out communication pattern

A

receives a message and may issue a reply or fault message

131
Q

rpc

A

call-return, only valid for in-only and in-out patterns

132
Q

iri

A

(International Resource Identifier)message can be serialized as an IRI
multipart: …
The word “style” here is not the same as what we called “communication style” in this class

133
Q

binding

A

defines the communication protocoltypically SOAP

134
Q

service (java)

A

associates theinterfaces with a URLand protocol (binding)

135
Q

WS descriptionsmay be posted on UBRs

A
  1. SW companies, standards bodies, and programmers populate the registry with
    descriptions of different types of services
  2. Businesses populate the registry with
    descriptions of the services they support
  3. UBR assigns a programmatically unique identifier to each service and business registration
  4. Marketplaces, search engines, and business apps query the registry to discover services at other companies
  5. Business uses this data to facilitate easier integration with each other over the Web
136
Q

UDDI business registry

A

supports business registrations.

XML document
created by supplier company (or on its behalf)
may have multiple service listings

137
Q

Wrap-up of WSDL & SOAP

A

In spite of the name, SOAP was not simple
Pursuit of generality made WSDL & SOAP too complicated for use in “normal cases”
Simple things weren’t simple
Inadequate security story
Today, other approaches far more common, e.g. REST

138
Q

what is REST?

A

Representational State Transfer (REST)
“a software architecture style
consisting of guidelines and best practices for creating scalable web services” [Wikipedia, “Representational state transfer”]

139
Q

RESTful API

A

An API following REST style

Intended to be simpler alternative to SOAP and WSDL-based Web services
In practice, RESTful systems typically communicate over the HTTP protocol with the same HTTP verbs (GET, POST, PUT, DELETE, etc.)

140
Q

Original Formal REST Design Constraints

A

Client–server (storage on server)
Stateless
Cacheable
Layered system
Client may connect to intermediary
Code on demand (optional)
Uniform interface
Identification of resources (e.g., URIs)
Manipulation of resources through these representations
Self-descriptive messages (e.g., MIME type)
Hypermedia as the engine of application state (HATEOS) –THIS ONE IS CONTROVERSIAL

141
Q

Applying RESTful API (Collection URI)

A

GET: list the URIS and perhaps other details of the collection’s members
PUT: replace the entire collection with another collection
POST: create a new entry in the collection. the new entry’s URI is assigned automatically and is usually returned by the operation
DELETE: delete the addressed member of the collection

142
Q

RESTful API (element URI)

A

GET: retrieve a representation of the addressed member of the collection, expressed in an appropriate Internet media type.
PUT: replace the addressed member of the collection, or if it doesn’t exist, create it.
POST: not generally used. treat the addressed member as a collection in its own right and create a new entry in it.
DELETE: delete the addressed member of the collection

143
Q

Hypermedia as the engine of application state (HATEOS) (1)

A

Original REST definition requires “Hypermedia as the engine of application state” (HATEOS)
One accessing the initial REST URI, client must be able to follow server-provided links to (eventually) discover all the resources
Human analogy: from start page can only click
In theory, HATEOS = no need for client to hard code information about application structure or dynamics [https://restfulapi.net/hateoas/]
Some purists insist that REST requires HATEOS
Original definition requires it!
However, I don’t accept this claim…

144
Q

Hypermedia as the engine of application state (HATEOS) (2)

A

In practice, almost all RESTful APIs do not implement HATEOS (this variant sometimes called “Practical REST”)
Problems with HATEOS:
Increased implementation complexity
Bloats every response with many almost-always-unused links
In the rare cases that HATEOS links are used, encourages “chatty” (slow & resource-intensive) integration as clients must navigate instead of directly requesting what they need
Clients rarely use it – typically clients make a direct request
No de facto standard, so clients can’t easily use HATOS info
HATEOAS only communicates connection, not meaning, so HATEOS info often doesn’t provide enough info to be useful

145
Q

REST Security

A

REST builds on HTTP – same general rules
For wire confidentiality, use TLS (SSL)
To authenticate must use agreed-on authentication method
E.G., OAUTH2 (token or key/secret) or basic authentication
Typically on login uses cookies to store session key or other session info
Requestee determines authorization

146
Q

OpenAPI / Swagger

A

OpenAPI (originally “Swagger spec”)
machine-readable interface files for describing, producing, consuming, and visualizing RESTful Web services
Development overseen by Open API Initiative (of the Linux Foundation)
Language-agnostic
Swagger = common implementation

147
Q

OpenAPI basics

A

OpenAPI document is a JSON object
may be represented in JSON or YAML
All field names are case sensitive
Primitive data types based on JSON
integer is a type
optional modifier “format”, e.g., a dateTime represented as type=string format=date-time
Defines supported paths, operations (incl. summary & parameters), responses

148
Q

synchronized clocks

A

strict timing constraints for messages
compare timestamps on distributed data

149
Q

Time

A

Fundamental unit: second
Historically, 1/86 400 of a mean solar day
But there are irregularities in the rotation of the Earth
Also, Earth’s rotation is slowing down
Since 1967, a second is based on atomic measures
A second is the duration of 9 192 631 770 periods of the radiation corresponding to the transition between the two hyperfine levels of the ground state of the cesium 133 atom (in its ground state at a temperature of 0 K)

150
Q

International Atomic Time (TAI)

A

Weighted average of the time kept by over 400 atomic clocks in over 50 national laboratories worldwide
Continuously increasing, accurate, not coordinated with Earth’s rotation

151
Q

Global Positioning System (GPS) time

A

Each GPS satellite broadcasts its position & local time
Receivers determine location & time using transmission delay
GPS time was zero at 0h 6-Jan-1980
TAI is always ahead of GPS by 19 seconds

152
Q

Universal Time 1 (UT1)

A

Conceptually mean solar time at 0°longitude (Greenwich), but actually uses distant quasars, etc.
Measures Earth’s rotation (thus coordinated with it), but Earth wobbles, so its length of second varies (!)

153
Q

Coordinated Universal Time (UTC)

A

Primary time standard by which the world regulates clocks and time
Tanenbaum uses term nonstandard“Universal Coordinated Time”
Based on TAI, but seconds added/removed to keep within 1 second of UT1 (mean solar time at 0°longitude)
Leap seconds occasionally inserted: 58, 59, 60, 0, 1, …
Insertion preference at the end of December and June
Leap second inserted on 2015-07-01; TAI-UTC=36s
Some want to stop adding leap seconds – this would redefine “day” to be unrelated to the sun and Earth’s rotation
I’m pro-leap-seconds; if you want continuous, use TAI or GPS time

154
Q

Local time

A

Add daylight saving time & timezone to UTC
Eastern Standard Time (EST - US) UTC -0500
Eastern Daylight Time (EDT; summer) UTC -0400
India standard time UTC +0530
Nepal standard time UTC +0545

155
Q

clocks drift from UTCby the very nature of UTC and device characteristics

A

2ρ.Δt (read slide 12 of 05-synch)

156
Q

What is the minimum frequency (in seconds) that two clocks must be resynchronized if you want to guarantee that they are no more than 10^-5 seconds (10 microseconds) apart, where each has a maximum (linear) drift rate of p=10^-6 (1 microsecond/second, a typical rate for a hardware quartz clock)?

A

See third edition section 6.1. Given maximum clock drift rate p (the difference per unit time from a perfect reference clock), and precision “precision” in seconds (the maximum 2 clocks are allowed to be, even if they drift in worse case in opposite directions), then the clocks must be resynchronized at least every precision / (2p) seconds.

Given p=10^-6 (a typical rate for hardware quartz clock), and precision of 10^-5, we have a minimum resynchronization frequency = (precision)/(2p) = (10^-5)/(2*(10^-6)) = every 5 seconds.

157
Q

Order the following process when attempting to get https://www.cs.gmu.edu/about/contact-info. Presume that DNS iterative resolution is used, that the DNS resolver begins with no cached data other than the DNS root, and that the names “www.cs.gmu.edu” and “cs.gmu.edu” are managed by the same DNS server (i.e., are within the same DNS zone)

A

The client contacts its local name resolver to implement the name resolution process on www.cs.gmu.edu.
The name resolver hands the complete name www.cs.gmu.edu to the root name server “.”.

The DNS root server resolves www.cs.gmu.edu as far as it can, and since it can only resolve to edu, it will return the address of the name server for “edu.”.

The client name resolver contacts the name server for “edu.” and requests it to resolve www.cs.gmu (.edu).

The name server for “edu.” resolves www.cs.gmu (.edu) as far as it can, and since it can only resolve to gmu, it returns the address of the name server for “gmu.edu.”.

The client name resolver contacts the name server for “gmu.edu.” and requests it to resolve www.cs (.gmu.edu).

The name server for “gmu.edu.” resolves www.cs (.gmu.edu) as far as it can, and it returns the address of the name server for “cs.gmu.edu.”.

The client name resolver contacts the name server for “cs.gmu.edu.” and requests it to provide the IP address of www (.cs.gmu.edu).

The name server for “cs.gmu.edu.” provides the IP address of www.cs.gmu.edu.

The client’s local name resolver returns the IP address of www.cs.gmu.edu. This IP address can then be used to initiate the HTTPS protocol to perform a GET of “/about/contact-info”.

158
Q

An NTP client is polling its NTP servers. It sends one request at client time 345 ms, which is received by the server at server time 1032 ms. The server responds at server time 1184 ms, and the client receives the message at client time 700 ms. What is the computed time offset for this particular request? (If the answer is not an integer use a decimal representation, e.g., “1.5” or “-80.5”).

A

This is just the computed time offset (θ). This is different from the round-trip delay (δ), though that’s also calculated in the algorithm because results with lower round-trip delay are preferred. However, since the question only asked for the computed time offset, that’s what you should have provided.

This computed time offset is computed using ((t1-t0)+(t2-t3))/2.

answer is 585.5

159
Q

how about the network propagation time?

A

radio broadcast
±10ms due to atmospheric fluctuations
satellite
±500μs knowing the distance to a geostationary satellite
local network
tens or hundreds of ms, due to network stack, load on processors
internet
seconds range, due to routers, queues…

160
Q

Network Time Protocol (NTP)

A

Network Time Protocol (NTP) is a networking protocol for clock synchronization to “real” time
Designed for variable-latency data networks
Servers provide time values, clients request time info (it does support peer-to-peer)
Hierarchical: “stratum” counts layers from reference clock (prevents cycles). Stratum 0 = reference clock
Clients & servers include local clock timestamps in messages
More complex algorithm, but gets real time distributed
Clients
Regularly polls three or more NTP servers on diverse networks
Gathers data & determines how to adjust its clock

161
Q

NTP client data processing

A

Client regularly polls servers, for each computes time offset (θ) and round-trip delay (δ)
The values for θ and δ are passed through filters and subjected to statistical analysis (book: θ of smallest δ)
Outliers are discarded and an estimate of time offset is derived from the best three remaining candidates
Presumes symmetrical nominal delay
Many details omitted here!

162
Q

Lamport had a simple ideapreserving before-after relationships

A

definition:if a and b are events, a b denotes that a occurs before b
transitivity:if a b and b  c then a c
if a and b occur in the same process, and a occurs before b, then a  b holds
if a is the event of a message being sent by one process, and b is the event of that message being received by another process then a b holds
definition (interleaving):if a and b happen in two processes that do not coordinate, then neither a b and b a holds

163
Q

Lamport’s ideapreserving before-after relationshipsof message sending and receiving events

A

definition:C(a) is the clock value when event a occurs
if a and b occur in the same process, and a occurs before b, then C(a) < C(b)
if a is the event of a message being sent by one process, and b is the event of that message being received by another process then make sure C(a) < C(b)
interleaving:if a and b happen in two processes that do not coordinate, then we don’t know the relation between C(a) and C(b)

164
Q

lamport clock adjustment

A

Lamport’s algorithm can be used to ensure that all nodes agree on the ordering of events if
(1) Each time stamped message is sent to everyone in the group,
(2) Messages sent from the same sender are received in the same order
(3) No messages are lost

165
Q

parallelism

A

Simultaneous execution - execution of process or computation simultaneously
Need >1 CPU core (but today that’s the normal case, and it’s always true in a distributed system)

166
Q

concurrency

A

“concurrency is the property of program, algorithm, or problem decomposability into order-independent or partially-ordered components or units.” [Lamport1978]
Several computations are executing during overlapping time periods—concurrently—instead of sequentially (one completing before the next starts)
Concurrency doesn’t require parallelism – it can be implemented on a single processor (through interleaving)

167
Q

parallelism != concurrency

A

but often same solutions apply to both

168
Q

threads

A

Process
Each process has a separate memory area from other processes
Thread
Executes code, no attempt to isolate memory of a thread from other threads in the same process
Thread implementation generally maintains minimum information (e.g., CPU context)
Using multiple threads can have higher performance than multiple processes, but using them correctly requires more intellectual effort
Easier to get things wrong, and can be difficult to debug because defects often aren’t reproduceable
Using threads or processes can improve scaleability
Take advantage of those multiple processors you have

169
Q

understanding threads

A

some advantages of using threads
separation of concerns:different activities in different threads
one thread remains responsive (e.g. user input)even if others are busy or blocked (e.g. waiting for messages) – decreases overall latency
support requests of multiple clients
using threads for replicated computation
pool: assign a thread when a request comes in
more efficient, harder to manage
factory: create a thread when a request comes in
easier to manage, less efficient
threads are supported by a library/VM, the OS, or both
making a process-blocking OS call blocks all threads in some library/VM implementations
calling exit() in one thread terminates the process

170
Q

threads in distributed systems

A

Threads provide convenient way to allow blocking system calls without blocking entire process
Good for distributed systems – easier to express communication with multiple logical connections at same time
Distributed systems can impose significant delays in communication between components – don’t want to block everything

171
Q

data is shared at different granularities

A

shared
memory (address space)
distributed shared memory
objects
distributed object stores
files
distributed file systems

172
Q

concurrency challenges thepredictability of effects

A

P and Q running concurrently.
For the moment we’ll assume
+ and * are atomic (normally
they are not) (see slide 11and 12 lecture 6)

x = x + 3 often implemented like this:
Load value of x
Load constant 3
Add them
Store result in x

… so even “simple” operations like addition are often not atomic (they can be broken down)

173
Q

blackboard architectures

A

single data repository (blackboard)
all components read/write on blackboard

174
Q

client-server with central database

A

multiple clients may access same DB records

175
Q

distribution challenges theconsistency of observed values

A

distribution
network propagation delays
different clocks
causes inconsistency
given one trace of events, different components may see those events in a different order

176
Q

predictability ≠ consistency

A

concurrency
causes unpredictability
cannot tell which trace will occur

distribution
network propagation delays
different clocks
causes inconsistency
given one trace of events, different components may see those events in a different order

177
Q

ordering consistency policies share the premiseof hiding synchronization from programmers

A

definition
ordering (aka data-centric) consistency:
all components observe operations on shared data
at the same time
strict consistency – only possible with
shared clock and
insignificant propagation delays
in the same order
linear consistency
in an order that “makes sense”
sequential, causal, FIFO consistency

178
Q

synchronization

A

address both predictability and ordering consistency.

programmers need to useexplicit synchronization techniques anyway,because of unpredictability (due to concurrency)

explicit synchronizationrelieves the middleware/OS fromhaving to assure ordering consistency

179
Q

monitors

A

used for explicit synchronization

Monitor provides a queue with certain entry condition that is used to guarantee only one process operates on the critical section (data) at a time

180
Q

the parking lotstart by identifying structure and events

A

events or actions of interest?
arrival and departure
identify processes
arrivals, departures and CarPark control
define structure and interactions

181
Q

Replication

A

Reasons for replication:
Improve reliability – can continue working even if some component fail (see fault tolerance)
Performance – can distribute work & put data near where it’s needed
Problem: Can lead to consistency problems
Challenge to keep replicants consistent

182
Q

Data-centric consistency models

A

Often focus on shared data aka “data store”
May be (distributed) shared database, shared filesystem, shared memory, etc.
Consistency model = a contract between processes & data store (if processes do X, data store promises Y)
Without a global clock not easy to define “last write”
Often can accept some inconsistencies… but need to bound it, & thus need to categorize them

183
Q

Kinds of inconsistencies

A

Deviation in values between replicas
Absolute numerical deviation (“no more than $0.02”)
Relative numerical deviations (“no more than 0.5%”)
Deviation in staleness (how old)
“Data no older than X seconds”
Deviation in ordering of update operations
This is more complex! Sequential consistency, causal consistency, eventual consistency, …

184
Q

Sequential consistency

A

“The result of any execution is the same as if the read & write operations by all processes on the data store were executed in some sequential order and the operations of each individual process appear in this sequence in the order specified by its program.”
Any valid interleaving of read & write operations is acceptable, but all processes see the same interleaving
Expensive to implement in distributed system

185
Q

Causal consistency

A

Weakens causal consistency – distinguishes what is potentially causally related
“Writes that are potentially causally related must be seen by all processes in the same order. Concurrent writes may be seen in a different order on different machines.”

186
Q

Eventual consistency

A

“If no updates take place for a long time, all replicas will gradually become consistent [have exactly the same data]”
Updates must eventually propagate to all replicas
Problem: write-write conflicts (same data item written with different values)
Often the solution is an algorithm that declares one as the “winner” (cancelling the effects of any previous conflict)
Often cheap to implement, at a cost of inconsistency for a period of time

187
Q

Client-centric consistency

A

Give up the idea of a “central data store”
Provide consistency guarantees from POV of 1 client,
No guarantees about different clients
Various consistency models:
Monotonic reads: If a process reads the value of a data item x, any successive read of x by that process will always return that same or more recent value (“never read older version”)
Monotonic writes: A write by a process on data item x is completed before any later write on x by the same process
Read-your-writes: The effect of a write operation by a process on data item x will always be seen by a successive read operation on x by the same process
Writes follow reads: A write operation by a process on data item x following a previous read operation on x by the same process is guaranteed to take place on the same or more recent value of x that was read (“See a posting about an article only if saw original article”)

188
Q

Some systems support so-called “ACID” transactions. What does the “A” in “ACID” stand for?

A

Atomic

189
Q

in a publish/subscribe system, applications can indicate their interest in specific types of message, and the communication middleware will ensure that those types of messages are delivered to the applications interested in that type of message.

A

true

190
Q

magine an advanced calculator running locally on a computer system. It does not communicate outside that computer system.

True or False: This system would typically be considered a distributed system.

A

false

191
Q

True or false: Caching and replication can lead to consistency problems.

A

true

192
Q

Which of the following is a valid assumptions to make, in general, when developing a distributed system?

A

none of them are true:

The network topology does not ever change.

Network bandwidth is infinite.

The network is reliable.

Network latency is zero.

193
Q

which of the following is false

A

In RMI, unlike in RPC, applications send messages to logical contact points.

194
Q

The world-wide web (WWW) consists users who use web browsers (web clients) to talk to a variety of web servers. Users type in URLs to retrieve information from the web.

A

true

195
Q

An experimental file server is down 20% of the time, due to various problems. If you are to use replication to alleviate this problem, how many replicas would you need to reduce the down time to 0.7%? Assume independent failures. Do NOT include the “master” system in your count of the replicas.

A

3

System down = all servers are down.

Probability the system is down= (probability of each server down)^N

Goal ≥ Probability the system is down

Probability each server is down = 20% = 20/100 = 0.2

Goal = 0.7% = 0.7/100 = 0.007

Probability the system is down = (0.2)^N

N = 2 ==> Probability the system is down = (0.2)*(0.2) = 0.04

N = 3 ==> Probability the system is down = (0.2)(0.2)(0.2) = 0.008

N = 4 ==> Probability the system is down = (0.2)(0.2)(0.2)*(0.2) = 0.0016

Goal = 0.007 ≥ Probability the system is down = 0.0016

So, the minimum number of servers needed to reduce the down time to 0.7%, including the master, is 4. That means only 3 additional replicas are needed (since the question expressly asked you to not count the master server in the count of replicas).

196
Q

Imagine an e-commerce system consisting of many software components, including client, server, and database objects, that are deployed on different devices and communicate and coordinate with each other to provide the ecommerce functionality.

True or False: This system would typically be considered a distributed system.

A

TRUE

197
Q

What does the term “durable” in ACID mean for transactions?

A

Once a transaction commits, the changes are permanent.

198
Q

notify

A

install a handler to be called when a message is put into the specified queue

199
Q

poll

A

Check a specified queue for messages, and remote the first. Never block

200
Q

In some period of time, the probability of a component successfully working the entire time (without failure) is 70%. If you want the system as a whole to fail with probability less than or equal to 0.02%, what is the minimum number of replicas (INCLUDING a “master”) that will be required? Presume component failure is independent.

A

8

We need “F” (Fail) probability, and all we’re given is the probability of success for each component. This is easily resolved:

F = 1 - Success

n >= (log Goal)/(log F)

so find the smallest n (the “ceiling”) that meets this inequality. Since “n” has to be an integer (you can’t use a part of a component), you’ll typically have to go up to the smallest integer greater than or equal to this calculation (this is also called the “ceiling”). Finally: Do we include the “master” in this case? As discussed in class, people aren’t consistent in their terminology so you have to look at it for each question. In this case, the question clearly states that we do include the “master” system in this case, so you don’t subtract by one.

201
Q

True or False: MOM achieves fault-tolerance by implementing message queues that store messages temporarily on persistent storage. The sender writes the message into the message queue and if the receiver is unavailable due to a failure, the message queue retains the message until the receiver is available again.

A

true

202
Q

A remote procedure call (RPC) is implemented by a number of steps. Order the steps below (presuming that the RPC must go over a network).

A

The client procedure calls the client stub in the normal way.

The client stub builds the message (e.g., “marshalling” the parameters) and calls the local operating system asking the message to be sent via the network.

The client’s operating system sends the message to the remote (server) operating system.

The remote operating system gives the message to the server stub.

The server stub unpacks the parameter(s) (aka “unmarshalls” the parameters) and calls the server.

The server does the work and returns the result to the stub.

The server stub packs the result into a message (including “marshalling” the parameters) and calls its local operating system.

The server’s operating system sends the message to the client’s operating system.

The client’s operating system gives the message to the client stub.

The client stub unpacks the result and returns it to the caller within the client.

202
Q

Which of the following is the most accurate description of “persistence” as an attribute of a messaging system?

A

Persistence refers to whether the messages endure in the system (regardless of the sender being active and the recipient being available) or not.

202
Q

Which of the following communication approaches would be most useful if the communicating components are not aware of each other’s locations, and the communicating components may not be available at the same time?

A

publish-subscribe

202
Q

There are two models for activating service supply: factory & pool.

In a factory model, the system instantiates all the instances that might be needed ahead-of-time, and then when a request shows up, one of those pre-created instances is allocated to the request. In a pool model, an instance is created at the time of the request.

An advantage of pre-creating instances is that requests can often be handled more quickly, since the instance is already ready to go (this effect is more important if instantiation takes a long time). A disadvantage of pre-creating instances is that the instance typically requires resources (at least storage) even when it is not being used.

Is all of the above generally true, or is it false?

A

false

Read carefully!

It’s true that “There are two models for activating service supply: factory & pool.”

The later text is also true, because it’s true that “An advantage of pre-creating instances is that requests can often be handled more quickly, since the instance is already ready to go (this effect is more important if instantiation takes a long time). A disadvantage of pre-creating instances is that the instance typically requires resources (at least storage) even when it is not being used.”

However, the definitions of factory and pool are completely reversed.

In the pool model, “the system instantiates all the instances that might be needed ahead-of-time, and then when a request shows up, one of those pre-created instances is allocated to the request”.

In the factory model, an instance is created at the time of the request.