Tutorial Flashcards

1
Q

A distributed system

A

A system in which hardware or software components located at networked computers communicate and coordinate their actions only by passing messages.

2
Q

A computer network

A

A collection of spatially separated, interconnected computers that exchange messages based on specific protocols. Computers are addressed by IP address.

3
Q

Reasons for using a distributed system

A
  1. Economy (cost-effective)
  2. Reliability (fault tolerance)
  3. Availability (high uptime)
  4. Scalability (extendible)
  5. Functional separation (modularity)
  • Resource sharing - hardware resources (disks, printers, scanners, etc.), software resources (files, databases, etc.), and other resources (processing power, memory, bandwidth)
4
Q

Consequences of using distributed systems

A
  1. Concurrency
  2. Heterogeneity
  3. No global clock
  4. Independent failures
5
Q

Sockets

A
  1. Bound to a local port (socket address = IP + port number)
  2. Sockets are used for communication between two networked processes (sending and receiving data)
  3. Each socket is associated with a protocol (UDP or TCP)
  • Acts as a programming interface between application code and the transport layer
  • A socket handle is used in much the same way as a file handle
6
Q

Possible failures when using UDP

A
  1. Data corruption
  2. Omission failures (no guaranteed delivery)
  3. Ordering (messages can be delivered out of sender order)
7
Q

Issues handled by TCP but not by UDP

A
  1. Connection-oriented - The communicating processes establish a connection before communicating. Establishing it involves a connect request from the client to the server, followed by an accept request from the server to the client.
  2. Message sizes - The application decides how much data to write to or read from the stream; TCP places no limit on the size of the data.
  3. Lost messages - TCP uses an acknowledgement scheme, unlike UDP. If an acknowledgement is not received within a timeout, the message is retransmitted.
  4. Flow control - TCP attempts to match the speeds of the processes that read from and write to the stream.
  5. Message duplication and ordering - Message identifiers are associated with each IP packet, which enables the recipient to detect and reject duplicates and to reorder messages that arrive out of order.
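
The connect/accept handshake described in point 1 can be sketched with Java's standard socket classes. This is a minimal illustration rather than a reference implementation: the class name `TcpEchoSketch`, the method `echoOnce` and the "echo: " reply format are hypothetical, and both ends run in one JVM over the loopback interface.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: one request/reply exchange over a TCP stream.
class TcpEchoSketch {

    // Starts a one-shot server on a free port, connects a client to it,
    // sends one line and returns the line the server sends back.
    static String echoOnce(String line) {
        // Port 0 asks the operating system for any free port.
        try (ServerSocket listener = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread server = new Thread(() -> serveOne(listener));
            server.start();

            // The client's connect request is matched by the server's accept().
            try (Socket client = new Socket(listener.getInetAddress(), listener.getLocalPort());
                 PrintWriter out = new PrintWriter(
                         new OutputStreamWriter(client.getOutputStream(), StandardCharsets.UTF_8), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8))) {
                out.println(line);            // write to the stream
                String reply = in.readLine(); // blocks until the reply arrives
                server.join();
                return reply;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Server side: accept one connection, read one line, reply, close.
    private static void serveOne(ServerSocket listener) {
        try (Socket connection = listener.accept();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(connection.getOutputStream(), StandardCharsets.UTF_8), true)) {
            out.println("echo: " + in.readLine());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `TcpEchoSketch.echoOnce("hello")` exercises the whole sequence: listen, connect, accept, write, read, close. Acknowledgement, retransmission and reordering happen inside the TCP implementation and are not visible in application code.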
8
Q

TCP stream socket connection steps

A
9
Q

UDP vs TCP

A

UDP (User Datagram Protocol)

  • Provides a message-passing abstraction
  • The simplest form of connectionless interprocess communication (IPC)
  • Transmits a single message (called a datagram) to the receiving process
  • Typical uses: video streaming (real-time applications), DNS, NTP (query/response)

TCP (Transmission Control Protocol)

  • Provides the abstraction of a two-way stream of bytes (carried in TCP segments)
  • Streams do not have message boundaries
  • A stream provides the basis for producer/consumer communication
  • Data sent by the producer is queued until the consumer is ready to receive it
  • The consumer must wait when no data is available
  • Typical uses: HTTP, FTP
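
For contrast with a TCP stream, sending a single UDP datagram needs no connection at all. The following is a minimal sketch using `java.net.DatagramSocket`; the class name `UdpDatagramSketch`, the 1024-byte buffer and the 2-second timeout are assumptions made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: one datagram sent and received on the loopback interface.
class UdpDatagramSketch {

    static String sendAndReceive(String message) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket receiver = new DatagramSocket(new InetSocketAddress(loopback, 0));
             DatagramSocket sender = new DatagramSocket()) {
            // UDP gives no delivery guarantee, so never block forever on receive().
            receiver.setSoTimeout(2000);

            // No connect/accept: the destination address travels with each datagram.
            byte[] payload = message.getBytes(StandardCharsets.UTF_8);
            sender.send(new DatagramPacket(payload, payload.length, loopback, receiver.getLocalPort()));

            // A datagram arrives as one whole message (truncated if the buffer is too small).
            byte[] buffer = new byte[1024];
            DatagramPacket received = new DatagramPacket(buffer, buffer.length);
            receiver.receive(received);
            return new String(received.getData(), received.getOffset(), received.getLength(),
                    StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

On a real network the datagram could be lost, in which case `receive` would time out with a `SocketTimeoutException`; any retry logic is left to the application.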
10
Q

Thread

A
  • A thread is a piece of code that runs concurrently with other threads.
  • Each thread is a statically ordered sequence of instructions.
  • Threads are used to express concurrency on both single-processor and multiprocessor machines.
11
Q

Thread vs Process

A
  • Advantages of thread-based parallelism
    • Threads share the same address space
    • Context-switching between threads is normally inexpensive
    • Communication between threads is normally inexpensive
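
A small sketch of what "sharing the same address space" means in practice: two threads fill different halves of one array, with no copying or message passing. The class and method names are hypothetical.

```java
// Hypothetical demo class: two threads write into the same array.
class SharedAddressSpaceSketch {

    static int[] fillInParallel(int size) {
        int[] shared = new int[size]; // visible to both threads; nothing is copied

        Thread evens = new Thread(() -> {
            for (int i = 0; i < size; i += 2) shared[i] = i * i;
        });
        Thread odds = new Thread(() -> {
            for (int i = 1; i < size; i += 2) shared[i] = i * i;
        });
        evens.start();
        odds.start();

        try {
            // join() waits for both threads and makes their writes visible here.
            evens.join();
            odds.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared;
    }
}
```

The two threads never touch the same index, so no locking is needed here; separate processes would need an explicit IPC mechanism to achieve the same result.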
12
Q

Thread Lifecycle

A
13
Q

Synchronized access to shared resources

A
  • If one thread reads data while another thread updates the same data, the result can be an inconsistent state.
  • This can be prevented by synchronising access to the data.
  • In Java, apply the “synchronized” keyword to methods or to blocks that lock on an object
    • public synchronized void update() { }
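
To make the effect of `synchronized` concrete, here is a minimal sketch of a shared counter updated from several threads. The class `SynchronizedCounterSketch` and the helper `runThreads` are hypothetical names; only the `update()` signature comes from the card.

```java
// Hypothetical demo class: a counter whose updates are serialised by its monitor lock.
class SynchronizedCounterSketch {
    private int value = 0;

    // Only one thread at a time can run a synchronized method on the same object.
    public synchronized void update() {
        value++; // read-modify-write, unsafe without the lock
    }

    public synchronized int get() {
        return value;
    }

    // Starts the given number of threads, each calling update() repeatedly,
    // and returns the final count.
    static int runThreads(int threads, int updatesPerThread) {
        SynchronizedCounterSketch counter = new SynchronizedCounterSketch();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < updatesPerThread; i++) counter.update();
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }
}
```

With the keyword in place, `runThreads(4, 10000)` always yields 40000. Without it, some increments could be lost because `value++` is not atomic.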
14
Q

Worker pool architecture - thread architecture

A

The server creates a fixed number of threads called a worker pool. As requests arrive at the server, they are put into a queue by the I/O thread and from there assigned to the next available worker thread.

The server creates a worker pool -> request comes in, put into a queue -> assigned to an available worker thread

Useful in a highly concurrent system
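
One way to sketch a worker pool in Java is with a fixed-size `ExecutorService`, whose internal queue plays the role of the request queue described above. The class name `WorkerPoolSketch` and the "handled ..." reply format are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class: requests are queued and served by a fixed set of threads.
class WorkerPoolSketch {

    static List<String> handleAll(List<String> requests, int workers) {
        // The pool creates at most `workers` threads and reuses them.
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // submit() enqueues the request; the next free worker picks it up.
            List<Future<String>> pending = new ArrayList<>();
            for (String request : requests) {
                pending.add(pool.submit(() -> "handled " + request));
            }
            // Collect replies in submission order.
            List<String> replies = new ArrayList<>();
            for (Future<String> reply : pending) {
                replies.add(reply.get());
            }
            return replies;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the number of threads is fixed, a burst of requests lengthens the queue rather than creating more threads.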

15
Q

Thread-per-request - thread architecture

A

A thread is created for each request; when the request has been handled, the thread is destroyed.
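
A minimal sketch of the same idea in Java, assuming each request is just a string and the reply is a hypothetical "handled ..." message:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class: one short-lived thread per request.
class ThreadPerRequestSketch {

    static List<String> handleAll(List<String> requests) {
        String[] replies = new String[requests.size()];
        List<Thread> threads = new ArrayList<>();

        for (int i = 0; i < requests.size(); i++) {
            final int slot = i;
            // A new thread is created for this request only.
            Thread handler = new Thread(() -> replies[slot] = "handled " + requests.get(slot));
            threads.add(handler);
            handler.start();
        }
        try {
            // Each thread ends as soon as its request has been handled.
            for (Thread handler : threads) handler.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return Arrays.asList(replies);
    }
}
```

No request waits in a queue, but the server pays the cost of creating and destroying a thread for every request.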

16
Q

External Data Representation and Marshalling

A
  • Data structures in programs are flattened to a sequence of bytes before transmission
  • Different computers have different data representations
    • e.g., the number of bytes used for an integer, the floating-point representation, ASCII vs Unicode.
  • Two ways to enable computers to interpret data in different formats
    • Data is converted to an agreed external format before transmission and converted to the local form on receipt
    • Values transmitted in the sender’s format, with an indication of the format used
  • Marshalling - Process of converting the data to the form suitable for transmission
  • Unmarshalling - Process of disassembling the data at the receiver
  • External data representation - Agreed standard for representing data structures and primitive data
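
As an illustration of marshalling to an agreed external format, the sketch below flattens a small record with `java.io.DataOutputStream`, which always writes integers in big-endian order regardless of the host machine. The record type `PersonRecordSketch` and its fields are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical record used only for this example.
class PersonRecordSketch {
    final String name;
    final int age;

    PersonRecordSketch(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

class MarshallingSketch {

    // Marshalling: flatten the structure into a byte sequence.
    static byte[] marshal(PersonRecordSketch person) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(person.name); // 2-byte length, then modified UTF-8 bytes
            out.writeInt(person.age);  // 4 bytes, big-endian
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Unmarshalling: rebuild the structure, reading fields in the same order.
    static PersonRecordSketch unmarshal(byte[] wire) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire));
            String name = in.readUTF();
            int age = in.readInt();
            return new PersonRecordSketch(name, age);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Sender and receiver must agree on the field order and types in advance; the byte sequence itself carries no type information.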
17
Q

Extensible Markup Language (XML)

A

A markup language is a textual encoding representing data and the details of the structure (or appearance)

XML is:

  • a markup language defined by the World Wide Web Consortium (W3C)
  • tags describe the logical structure of the data, unlike HTML, where tags give display instructions
  • extensible - users can define their own tags
  • self-describing - tags describe the data
  • tags together with namespaces allow the tags to be meaningful
  • since data is textual, it is human-readable and platform-independent
  • since data is textual, messages are large, which increases processing time, transmission time and storage space
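
To show what "self-describing" looks like, the sketch below parses a small XML document with the DOM parser that ships with the JDK and looks a value up by its tag name. The `person` document and the class name `XmlSketch` are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical demo class: reads one element's text from an XML string.
class XmlSketch {

    // User-defined tags (person, name, place, year) describe the data they enclose.
    static final String PERSON_XML =
            "<person id=\"123456789\">"
            + "<name>Smith</name>"
            + "<place>London</place>"
            + "<year>1984</year>"
            + "</person>";

    // Returns the text content of the first element with the given tag name.
    // Assumes such an element exists; otherwise item(0) is null.
    static String textOf(String xml, String tag) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return document.getElementsByTagName(tag).item(0).getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Note that every value comes back as text: `textOf(PERSON_XML, "year")` is the string "1984", not an integer, unless a schema or the application adds type information.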
18
Q

JavaScript Object Notation (JSON)

A
  • JSON is a lightweight data-interchange format
  • JSON is a syntax for storing and exchanging data
  • JSON is an easier-to-use alternative to XML
  • It is based on a subset of the JavaScript programming language
  • It is text-based and completely language independent
19
Q

IPC data formats - JSON vs XML

A
  • JSON is lightweight and therefore simple to read and write. XML is more verbose and less simple than JSON.
  • JSON supports arrays natively. XML has no array type; repeated elements are used instead.
  • JSON files are more human-readable. XML is a markup language and so can describe how data is displayed, whereas JSON has no display capabilities.
  • JSON provides scalar data types and the ability to express structured data through arrays and objects. XML does not, and relies on XML Schema to add type information.
  • JSON has native object support: JSON objects resemble the objects of common programming languages. In XML, objects have to be expressed by convention, often through a mix of attributes and elements.
20
Q

IPC (Inter-process communication)

A

Inter-process communication or interprocess communication (IPC) refers specifically to the mechanisms an operating system provides to allow the processes to manage shared data. Typically, applications can use IPC, categorized as clients and servers, where the client requests data and the server responds to client requests.[1] Many applications are both clients and servers, as commonly seen in distributed computing.

21
Q

Client-server architecture

A
  • A client requests some processing or information from a server that it needs.
  • It waits in a blocking fashion for the reply containing the result, then it can proceed with its execution.
  • There can be many variants of client-server models.
22
Q

Peer-to-Peer architecture

A
  • Peer model suits ad-hoc groupings of participants
  • No central point of failure (reliable)
  • No central point of control (difficult to deny service for adversaries)
  • Some peers will typically contribute more than others (e.g., seeds or super-peers)
  • Napster, BitTorrent
23
Q

A service provided by multiple servers

(distributed system architecture variations)

A

Service is provided by several server processes interacting with each other. Objects may be partitioned (e.g., web servers) or replicated across servers (e.g., Sun Network Information Service (NIS)).

24
Q

Mobile Code and Agents

(distributed system architecture variations)

A
  • Mobile code is downloaded to the client and executed on the client (e.g., an applet).
  • Mobile agents are running programs, including both code and data, that travel from one computer to another.
25
Q

Proxy servers and caches

(distributed system architecture variations)

A
  • A cache is a store of recently used objects that is closer to the client
  • New objects are added to the cache replacing existing objects
  • When an object is requested, the caching service is checked to see if an up-to-date copy is available (the object is fetched if it is not available)
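
The lookup-then-fetch behaviour can be sketched with a small least-recently-used cache built on `java.util.LinkedHashMap`. This is an assumption-laden illustration: the card does not name a replacement policy (LRU is chosen here), the freshness check is omitted, and `LruCacheSketch`, `origin` and `originFetches` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical demo class: a bounded cache in front of a slower origin server.
class LruCacheSketch<K, V> {
    private final Map<K, V> entries;
    private final Function<K, V> origin; // stands in for a request to the real server
    int originFetches = 0;               // counts how often the origin was contacted

    LruCacheSketch(int capacity, Function<K, V> origin) {
        this.origin = origin;
        // accessOrder = true keeps the least recently used entry first.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // A new object replaces an existing one once the cache is full.
                return size() > capacity;
            }
        };
    }

    V get(K key) {
        V cached = entries.get(key);
        if (cached != null) {
            return cached; // cache hit: no request to the origin
        }
        originFetches++;   // cache miss: fetch and remember the object
        V fresh = origin.apply(key);
        entries.put(key, fresh);
        return fresh;
    }
}
```

A real proxy cache would also check whether a cached copy is still up to date before returning it, for example with an expiry time or a validation request to the origin.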
26
Q

Network Computers and Thin Clients

(distributed system architecture variations)

A
  • Network Computers
    • Download their operating system and application software from a remote file system. Applications are run locally.
  • Thin Clients
    • Application software is not downloaded but runs on a compute server (e.g., a UNIX server).
27
Q

Tiered architecture

(distributed system architecture variations)

A

Tiered architectures are complementary to layering.

Layering deals with the vertical organisation of services.

28
Q

A network operating system vs a distributed operating system

A
  • A network operating system provides support for networking operations. The users are generally expected to make intelligent use of the network commands and operations that are provided. Each host remains autonomous in the sense that it can continue to operate when disconnected from the networking environment.
  • A distributed operating system tries to abstract the network from the user and thereby remove the need for the user to specify how the networking commands and operations should be undertaken. This is sometimes referred to as providing a single system image. Each host may not have everything it would need to operate on its own when disconnected from the network.
29
Q

Network operating system

A
  • users retain autonomy in managing their own processing resources
  • It does not manage processes across the nodes
  • Provides support for networking operations
30
Q

Distributed operating system

A
  • users are never concerned with where their programs run, or the location of any resources
  • has control over all the nodes in the system, and it transparently locates new processes at whatever node suits its scheduling policies
  • Each host may not have everything that would be required to operate on its own
  • Single system image
31
Q

Core OS components

A
  • Process manager
    • Handles the creation of processes. A process is a unit of resource management, encapsulating the basic resources of memory (address space) and processor time (threads).
  • Thread manager
    • Handles the creation, synchronization and scheduling of one or more threads for each process. Threads can be scheduled to receive processor time.
  • Communication manager
    • Handles interprocess communication, i.e., between threads from different processes. In some cases, this can be across different hosts.
  • Memory manager
    • Handles the allocation and access to physical and virtual memory. Provides translation from virtual to physical memory and handles paging of memory.
  • Supervisor
    • Handles privileged operations, i.e., those that directly affect shared resources on the host, e.g., transfers to and from an I/O device. The supervisor is responsible for ensuring that the host continues to provide proper service to each client.
32
Q

Kernel

A
  • Part of the Operating System
  • Has full access to the host’s resources
  • Kernel begins execution after the host is powered up and continues to execute while the host is operational
  • The kernel has access to all resources and shares that access with all other processes that are executing on the host
33
Q

Disadvantages of Monolithic OS

A
  • It is massive
    • codebase
  • It is undifferentiated
    • non-modular (traditionally), although modern ones are much more layered.
  • It is intractable
    • Altering any individual software component to adapt to new requirements is difficult
34
Q

Supervisor Mode vs User Mode

A
  • Operating modes supported by the hardware at the machine instruction level.
  • Supervisor / Kernel mode
    • Instructions that execute while the processor is in supervisor (or privileged) mode are capable of accessing and controlling every resource on the host
  • User mode
    • Instructions that execute while the processor is in user (or unprivileged) mode are restricted, by the processor, to only those accesses defined or granted by the kernel.
35
Q

Remote Invocation

A

A set of information exchange protocols at the middleware layer.

36
Q

Three Types of Protocols - Remote Invocation

A

Three different types of protocols typically used to address the design issues of Remote Invocation

  1. Request - The client sends a message
  2. Request-Reply - The client sends a message, the server sends a reply
  3. Request-Reply-Acknowledgement - Same as above, but once the client receives the reply, the client acknowledges it
37
Q

Possible Issues in Request-Reply

A
  • Request timeouts
    • Retry request message after a timeout
    • Do nothing
  • Reply timeouts
    • Retransmit results after a timeout
    • Do nothing
  • Receiving duplicate requests/replies
    • Discard duplicate messages
    • Perform the same action again if the logic is idempotent
38
Q

Different invocation semantics

A

Different issue-handling strategies lead to different invocation semantics

  • Invocation semantics
    • Define what the client can assume about the execution of the remote procedure
    • Offer different reliability guarantees in terms of the number of times that the remote procedure is executed
  1. Maybe - The remote procedure call may be executed once or not at all. Unless the caller receives a result, it is unknown whether the remote procedure was called.
  2. At least once - Either the remote procedure was executed at least once, and the caller received a response, or the caller received an exception to indicate the remote procedure was not executed at all. Suitable for idempotent operations.
  3. At most once - The remote procedure call was either executed exactly once, in which case the caller received a response, or it was not executed at all and the caller receives an exception. Suitable for non-idempotent operations.
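
One common way to obtain at-most-once behaviour is for the server to remember the reply for each request identifier and to return the saved reply when a duplicate arrives, instead of executing the procedure again. The sketch below assumes that approach; `AtMostOnceServerSketch`, `deposit` and `replyHistory` are hypothetical names, and the history is never pruned.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class: duplicate requests are filtered by request id.
class AtMostOnceServerSketch {
    private final Map<Long, Integer> replyHistory = new HashMap<>();
    private int balance = 0;
    int executions = 0; // how many times the procedure body actually ran

    // deposit is not idempotent: running it twice would add the amount twice.
    int deposit(long requestId, int amount) {
        Integer savedReply = replyHistory.get(requestId);
        if (savedReply != null) {
            return savedReply; // duplicate request: resend the reply, do not re-execute
        }
        executions++;
        balance += amount;
        replyHistory.put(requestId, balance);
        return balance;
    }
}
```

If a client retries request 1 after a reply timeout, the second call returns the same balance as the first and `executions` stays at 1. Under at-least-once semantics the retry would be executed again, which is only safe for idempotent operations.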
39
Q

Remote Procedure Call (RPC)

A
  • RPCs enable clients to execute procedures in server (remote) processes based on a defined service interface.
  • Used in procedural languages such as Fortran, C, and Go.
  • Key components of RPC
    • Communication Module
      • Implements the desired design choices in terms of retransmission of requests, dealing with duplicates and retransmission of results
    • Client Stub Procedure
      • Behaves like a local procedure to the client. Marshals the procedure identifier and arguments into a request message, which is handed to the communication module
      • Unmarshals the results in the reply
    • Dispatcher
      • Selects the server stub based on the procedure identifier and forwards the request to the server stub
    • Server Stub Procedure
      • Unmarshals the arguments in the request message and forwards them to the service procedure. Marshals the return values into the reply message, which is returned to the client
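
The division of labour between these components can be sketched in a single process. In the illustration below the communication module is reduced to a function from request bytes to reply bytes, so no network, retransmission or duplicate handling is involved; all names (`RpcSketch`, `PROC_ADD`, `addClientStub` and so on) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.UnaryOperator;

// Hypothetical demo class: client stub, dispatcher, server stub and service procedure.
class RpcSketch {
    static final int PROC_ADD = 1; // procedure identifier agreed in the service interface

    // Service procedure: the actual work, unaware of any messaging.
    static int add(int a, int b) {
        return a + b;
    }

    // Server stub: unmarshals the arguments, calls the service procedure, marshals the result.
    static byte[] addServerStub(DataInputStream request) throws IOException {
        int a = request.readInt();
        int b = request.readInt();
        ByteArrayOutputStream reply = new ByteArrayOutputStream();
        new DataOutputStream(reply).writeInt(add(a, b));
        return reply.toByteArray();
    }

    // Dispatcher: selects the server stub from the procedure identifier.
    static byte[] dispatch(byte[] requestMessage) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(requestMessage));
            int procedureId = in.readInt();
            switch (procedureId) {
                case PROC_ADD:
                    return addServerStub(in);
                default:
                    throw new IllegalArgumentException("unknown procedure " + procedureId);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Client stub: looks like a local call, but marshals a request and unmarshals the reply.
    static int addClientStub(UnaryOperator<byte[]> communicationModule, int a, int b) {
        try {
            ByteArrayOutputStream request = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(request);
            out.writeInt(PROC_ADD);
            out.writeInt(a);
            out.writeInt(b);
            byte[] reply = communicationModule.apply(request.toByteArray());
            return new DataInputStream(new ByteArrayInputStream(reply)).readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `RpcSketch.addClientStub(RpcSketch::dispatch, 2, 3)` passes the request straight to the dispatcher. In a real system the communication module would send the bytes over a socket and apply the chosen retransmission policy.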
40
Q
A
41
Q

Java RMI

A
  • Defining the interface for remote objects
    • The interface is defined using the interface definition mechanism supported by the particular RMI software.
  • Compiling the interface
    • Compiling the interface generates proxy, dispatcher and skeleton classes.
  • Writing the server program
    • The remote object classes are implemented and compiled with the classes for the dispatchers and skeletons. The server is also responsible for creating and initializing the objects and registering them with the binder.
  • Writing client programs
    • Client programs implement invoking code and contain proxies for all remote classes. Uses a binder to lookup remote objects.
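
The first and third steps can be sketched with the `java.rmi` package. The example below covers only the interface definition and an implementing class; `GreeterSketch` and `GreeterSketchImpl` are hypothetical names, and exporting the object and registering it with the binder (in Java RMI, for instance, via `UnicastRemoteObject.exportObject` and a registry obtained from `LocateRegistry`) are deliberately left out.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

// Step 1: the remote interface. It extends Remote, and every method
// declares RemoteException because any remote call can fail in transit.
interface GreeterSketch extends Remote {
    String greet(String name) throws RemoteException;
}

// Step 3 (part): the server-side class that implements the remote interface.
// Until it is exported, it behaves like an ordinary local object.
class GreeterSketchImpl implements GreeterSketch {
    @Override
    public String greet(String name) {
        return "Hello, " + name;
    }
}
```

A client would obtain a proxy that implements `GreeterSketch` from the binder and call `greet` on it as if it were local. In current Java versions the proxy classes are generated at run time, so a separate stub compilation step is normally not needed.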
42
Q

Security Threats

A
  1. Leakage - Acquisition of information by unauthorised recipients
  2. Tampering - Unauthorised alteration of information
  3. Vandalism - Interference with the proper operation of systems
43
Q

Method of Attacks

A
  • Eavesdropping
    • A form of leakage: obtaining private or secret information, or copies of messages, without authority.
  • Masquerading
    • A form of impersonation: assuming the identity of another user/principal, i.e., sending or receiving messages using the identity of another principal without their authority.
  • Message tampering
    • Altering the content of messages in transit. A man-in-the-middle attack is a form of message tampering that attacks the secure channel mechanism.
  • Replaying
    • Storing secure messages and sending them at a later date.
  • Denial of service
    • A form of vandalism: flooding a channel or other resource, denying access to others.
44
Q

Worst-case assumptions designing a secure system

A
  • Networks are insecure
    • Messages can be looked at, copied, modified and transmitted.
    • Attackers can obtain information that they shouldn’t and can pretend to be a legitimate party.
  • The source code is known to the attacker
    • Knowing the source code can help the attacker discover vulnerabilities.
  • Interfaces are exposed
    • Communication interfaces are necessarily open to allow clients to access them.
    • Attackers can send messages to any interface.
  • The attacker has unlimited computing resources
    • Assume that attackers will have access to the largest and most powerful computers projected in the lifetime of a system.
45
Q

Encryption and two main Keys

A
  • Encryption
    • Process of encoding a message in such a way as to hide its contents.
  1. Shared secret keys (symmetric)
    • Sender and recipient share knowledge of the key and it must not be revealed to anyone else.
  2. Public/private key pairs (asymmetric)
    • The sender uses a public key to encrypt the message.
    • The recipient uses a corresponding private key to decrypt the message.
    • Only the recipient can decrypt the message because they have the private key.
    • Public-key algorithms typically require 100 to 1000 times as much processing power as secret-key algorithms.
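
Both kinds of key can be tried out with the cryptography classes in the JDK. The sketch below is illustrative only: the choice of AES-GCM and RSA-OAEP, the key sizes and the class name `EncryptionSketch` are assumptions, and key distribution is ignored by generating the keys on the spot.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical demo class: encrypt and decrypt with each kind of key.
class EncryptionSketch {

    // Symmetric: the same secret key encrypts and decrypts.
    static String symmetricRoundTrip(String plaintext) {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            SecretKey sharedKey = generator.generateKey();
            byte[] nonce = new byte[12]; // must be unique per message for GCM
            new SecureRandom().nextBytes(nonce);

            Cipher encryptor = Cipher.getInstance("AES/GCM/NoPadding");
            encryptor.init(Cipher.ENCRYPT_MODE, sharedKey, new GCMParameterSpec(128, nonce));
            byte[] ciphertext = encryptor.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));

            Cipher decryptor = Cipher.getInstance("AES/GCM/NoPadding");
            decryptor.init(Cipher.DECRYPT_MODE, sharedKey, new GCMParameterSpec(128, nonce));
            return new String(decryptor.doFinal(ciphertext), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Asymmetric: the public key encrypts, only the private key decrypts.
    // With this padding and key size the plaintext must stay under 190 bytes.
    static String asymmetricRoundTrip(String plaintext) {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            KeyPair recipientKeys = generator.generateKeyPair();

            Cipher encryptor = Cipher.getInstance("RSA/ECB/OAEPWithSHA-256AndMGF1Padding");
            encryptor.init(Cipher.ENCRYPT_MODE, recipientKeys.getPublic());
            byte[] ciphertext = encryptor.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));

            Cipher decryptor = Cipher.getInstance("RSA/ECB/OAEPWithSHA-256AndMGF1Padding");
            decryptor.init(Cipher.DECRYPT_MODE, recipientKeys.getPrivate());
            return new String(decryptor.doFinal(ciphertext), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because of the cost difference noted above, practical systems often use the public key only to protect a freshly generated secret key, and then encrypt the bulk data with that secret key.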
46
Q

A digital certificate

A
  • A digital certificate is a digital form of identification, like a passport.
  • A digital certificate provides information about the identity of an entity.
  • A digital certificate is issued by a Certification Authority (CA).
    • Examples of trusted CA across the world are Verisign, Entrust, etc.
    • The CA guarantees the validity of the information in the certificate
  • Distributing public keys is a significant problem, because they must be distributed in a way that is both scalable and trustworthy
47
Q

Public Key Infrastructure (PKI)

A
  • Public Key Infrastructure (PKI) consists of protocols, standards and services that allow users to authenticate each other using digital certificates issued by a CA. For a digital certificate to be useful, it has to be structured in a standard way, so that the information within the certificate can be retrieved and understood regardless of who issued it. X.509, PKI X.509 (PKIX) and the Public Key Cryptography Standards (PKCS) are the building blocks of a PKI system; they define the standard formats for certificates and their use.
48
Q

The process to obtain a digital certificate

A
  1. Generate Key-pair
    • User-A generates a Public and Private key-pair or is assigned a key-pair by some authority in their organisation.
  2. Request CA Certificate
    • User-A first requests the certificate of the CA Server.
  3. CA Certificate Issued
    • The CA responds with its Certificate. This includes its Public Key and its Digital Signature signed using its Private Key.
  4. Gather Information
    • User-A gathers all the information required by the CA Server to obtain her certificate. This could include User-A’s email address, fingerprints, etc., which the CA needs in order to be certain that User-A is who she claims to be.
  5. Send Certificate Request
    • User-A sends a certificate request to the CA consisting of her Public Key and additional information. The certificate request is encrypted with the CA’s Public Key.
  6. CA verifies User-A
    • The CA gets the certificate request, verifies User-A’s identity and generates a certificate for User-A, binding her identity and her Public Key. The signature of CA verifies the authenticity of the Certificate.
  7. CA issues the Certificate
    • The CA issues the certificate to User-A.
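
The core of steps 6 and 7, the CA binding an identity to a public key with its signature, can be sketched with `java.security.Signature`. This is a toy model and not X.509: the "certificate" is just the identity string plus the encoded public key, and `CertificateSketch`, `issue` and `verify` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

// Hypothetical demo class: a CA signs (identity, public key) and anyone can verify it.
class CertificateSketch {

    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // The bytes the CA vouches for: the identity followed by the subject's public key.
    private static byte[] toBeSigned(String identity, PublicKey subjectKey) {
        byte[] id = identity.getBytes(StandardCharsets.UTF_8);
        byte[] key = subjectKey.getEncoded();
        byte[] all = new byte[id.length + key.length];
        System.arraycopy(id, 0, all, 0, id.length);
        System.arraycopy(key, 0, all, id.length, key.length);
        return all;
    }

    // The CA signs with its private key after checking the requester's identity.
    static byte[] issue(PrivateKey caPrivateKey, String identity, PublicKey subjectKey) {
        try {
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(caPrivateKey);
            signer.update(toBeSigned(identity, subjectKey));
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Anyone holding the CA's public key can check the binding.
    static boolean verify(PublicKey caPublicKey, String identity, PublicKey subjectKey,
                          byte[] caSignature) {
        try {
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(caPublicKey);
            verifier.update(toBeSigned(identity, subjectKey));
            return verifier.verify(caSignature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Verification succeeds only for the exact identity and key that were signed, which is why a certificate issued to User-A cannot simply be relabelled for another user.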
49
Q

Transparencies that should be addressed by distributed file systems (DFS)

A
  • Access transparency
    • Client programs don’t know if the file is local or remote
  • Location transparency
    • Client programs don’t know where the file is stored
    • Files can be relocated without changing their pathname
  • Mobility transparency
    • Neither client programs nor system administration tables in client nodes need to be changed when files are moved
  • Scaling transparency
    • Service can be expanded without loss of performance
  • Performance transparency
    • Client programs continue to perform acceptably while the load on the service varies within a specified range
50
Q

Network File System (NFS)

A

Network File System (NFS) is a distributed file system protocol originally developed by Sun Microsystems (Sun) in 1984, allowing a user on a client computer to access files over a computer network much like local storage is accessed. NFS, like many other protocols, builds on the Open Network Computing Remote Procedure Call (ONC RPC) system.

51
Q
A