Tutorial Flashcards
A distributed system
A system in which hardware or software components located at networked computers communicate and coordinate their actions only by passing messages.
A computer network
A collection of spatially separated, interconnected computers that exchange messages based on specific protocols. Computers are addressed by IP address.
Reasons for using a distributed system
- Economy (cost-effective)
- Reliability (fault tolerance)
- Availability (high uptime)
- Scalability (extendible)
- Functional separation (modularity)
- Resource sharing - hardware resources (disks, printers, scanners, etc.), software resources (files, databases, etc.), and other resources (processing power, memory, bandwidth)
Consequences of using distributed systems
- Concurrency
- Heterogeneity
- No global clock
- Independent failures
Sockets
- Bound to a local port (socket address = IP + port number)
- Sockets are used for communication between two networked processes (sending and receiving data)
- Each socket is associated with a protocol (UDP or TCP)
- Acts as a programming interface between application code and the transport layer
- A socket handle is treated much like a file handle
Possible failures when using UDP
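The socket address idea above (IP address plus port number) can be sketched in Java as follows. This is only an illustration: the class name SocketAddressDemo and the method bindToFreePort are made up for this card, and port 0 is used so the operating system picks any free local port.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical helper class for this card, not part of any library.
class SocketAddressDemo {
    // Binds a TCP server socket to the loopback address. Port 0 asks the
    // operating system for any free port. Returns the resulting socket
    // address (IP + port), then closes the socket.
    static InetSocketAddress bindToFreePort() {
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            return (InetSocketAddress) server.getLocalSocketAddress();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InetSocketAddress address = bindToFreePort();
        System.out.println("IP:   " + address.getAddress().getHostAddress());
        System.out.println("Port: " + address.getPort());
    }
}
```

The port printed will differ between runs because the operating system chooses it.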
- Data corruption
- Omission failures (no guaranteed delivery)
- Ordering (messages can be delivered out of sender order)
Issues handled by TCP but not by UDP
- Connection-oriented - The communicating processes establish a connection before communicating. The connection involves a connect request from the client to the server, followed by an accept request from the server to the client.
- Message sizes - The application can choose how much data it writes to or reads from the stream; there is no limit on the data size applications can use.
- Lost messages - TCP uses an acknowledgment scheme, unlike UDP. If acknowledgments are not received, the messages are retransmitted.
- Flow control - TCP attempts to match the speeds of the processes that read from and write to the stream.
- Message duplication or ordering - Message identifiers are associated with IP packets so that the recipient can detect and reject duplicates and reorder messages that arrive out of order.
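The connect and accept exchange described above might look like this in Java. It is a minimal sketch that runs both ends in one thread on the loopback interface; in a real system the client and server would be separate processes. The names TcpConnectDemo and roundTrip are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for this card, not part of any library.
class TcpConnectDemo {
    // Sends one line (without embedded newlines) from the client end to the
    // server end of a TCP connection and returns what the server end read.
    static String roundTrip(String message) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 50, loopback);              // server listens
             Socket client = new Socket(loopback, listener.getLocalPort());          // client: connect request
             Socket serverSide = listener.accept()) {                                // server: accept
            PrintWriter out = new PrintWriter(
                    new OutputStreamWriter(client.getOutputStream(), StandardCharsets.UTF_8), true);
            out.println(message); // auto-flush is on, so the line is sent immediately

            BufferedReader in = new BufferedReader(
                    new InputStreamReader(serverSide.getInputStream(), StandardCharsets.UTF_8));
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Server received: " + roundTrip("hello"));
    }
}
```

No data is exchanged until both the connect and the accept have completed, which is what "connection-oriented" means here.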
TCP stream socket connection steps
UDP vs TCP
UDP (User Datagram Protocol)
- Provides a message passing abstraction
- The simplest form of connectionless interprocess communication (IPC)
- Transmits a single message (called a datagram) to the receiving process
- Used for video streaming (real-time applications), DNS, and NTP (query-response)
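Sending a single datagram without first setting up a connection can be sketched in Java as below. The class name UdpDatagramDemo, the method sendAndReceive, the 1024-byte buffer and the 2-second timeout are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for this card, not part of any library.
class UdpDatagramDemo {
    // Sends one datagram over the loopback interface and returns its payload
    // as seen by the receiver. There is no connect or accept step.
    static String sendAndReceive(String message) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket receiver = new DatagramSocket(0, loopback);
             DatagramSocket sender = new DatagramSocket()) {
            // UDP gives no delivery guarantee, so do not wait forever.
            receiver.setSoTimeout(2000);

            byte[] payload = message.getBytes(StandardCharsets.UTF_8);
            sender.send(new DatagramPacket(payload, payload.length, loopback, receiver.getLocalPort()));

            byte[] buffer = new byte[1024];
            DatagramPacket incoming = new DatagramPacket(buffer, buffer.length);
            receiver.receive(incoming); // each receive returns exactly one datagram
            return new String(incoming.getData(), 0, incoming.getLength(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Received: " + sendAndReceive("ping"));
    }
}
```

If the datagram were lost, receive would time out rather than retry; any retransmission would have to be added by the application.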
TCP (Transmission Control Protocol)
- Provides the abstraction of a two-way stream of bytes
- Streams do not have message boundaries
- Stream provides the basis for producer/consumer communication
- Data sent by the producer are queued until the consumer is ready to receive them
- The consumer must wait when no data is available
- Used for HTTP and FTP
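The lack of message boundaries in a stream can be illustrated with a short Java sketch: the producer makes two separate writes, but the consumer only sees one continuous run of bytes. TcpStreamDemo and writeTwiceReadAll are hypothetical names, and both ends run in one thread on the loopback interface for simplicity.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for this card, not part of any library.
class TcpStreamDemo {
    // Writes two short messages separately, then reads the stream to its end.
    static String writeTwiceReadAll(String first, String second) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 50, loopback);
             Socket producer = new Socket(loopback, listener.getLocalPort());
             Socket consumer = listener.accept()) {
            OutputStream out = producer.getOutputStream();
            out.write(first.getBytes(StandardCharsets.UTF_8));  // first write
            out.write(second.getBytes(StandardCharsets.UTF_8)); // second write
            producer.shutdownOutput(); // signals end of stream to the consumer

            InputStream in = consumer.getInputStream();
            ByteArrayOutputStream received = new ByteArrayOutputStream();
            byte[] chunk = new byte[256];
            int n;
            while ((n = in.read(chunk)) != -1) { // read blocks while no data is available
                received.write(chunk, 0, n);
            }
            return new String(received.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeTwiceReadAll("first|", "second"));
    }
}
```

The consumer cannot tell where one write ended and the next began, so applications that need messages must add their own framing, such as a length prefix or a delimiter.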
Thread
- A thread is a piece of code that runs concurrently with other threads.
- Each thread is a statically ordered sequence of instructions.
- Threads are used to express concurrency on both single-processor and multiprocessor machines.
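A minimal Java sketch of two threads running the same piece of code concurrently is shown below. The class name ThreadDemo, the method runTwoThreads and the thread names are made up for this card.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper class for this card, not part of any library.
class ThreadDemo {
    // Starts two threads that run the same task, waits for both to finish,
    // and returns how many of them completed.
    static int runTwoThreads() {
        AtomicInteger finished = new AtomicInteger();
        Runnable task = () -> {
            System.out.println(Thread.currentThread().getName() + " is running");
            finished.incrementAndGet();
        };

        Thread first = new Thread(task, "thread-1");
        Thread second = new Thread(task, "thread-2");
        first.start();  // start() schedules the thread; run() executes concurrently
        second.start();
        try {
            first.join();  // wait for each thread to terminate
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return finished.get();
    }

    public static void main(String[] args) {
        System.out.println("Threads finished: " + runTwoThreads());
    }
}
```

The order of the two "is running" lines is not fixed, because the scheduler decides which thread runs first.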
Thread vs Process
- Advantages of thread-based parallelism
- Threads share the same address space
- Context-switching between threads is normally inexpensive
- Communication between threads is normally inexpensive
Thread Lifecycle
Synchronised access to shared resources
- If one thread tries to read the data while another thread tries to update the same data, the data can end up in an inconsistent state.
- This can be prevented by synchronising access to the data.
- Apply the “synchronized” keyword to methods or to blocks that lock on an object
- public synchronized void update() { }
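A fuller sketch of the synchronized method above, assuming a simple shared counter: several threads call update() at the same time, and the keyword ensures that no increments are lost. The class name SynchronizedCounter and the helper countWithThreads are hypothetical.

```java
// Hypothetical helper class for this card, not part of any library.
class SynchronizedCounter {
    private int value;

    // Only one thread at a time can execute a synchronized method on the
    // same object, so the read-modify-write of value++ is never interleaved.
    public synchronized void update() {
        value++;
    }

    public synchronized int get() {
        return value;
    }

    // Runs the given number of threads, each calling update() repeatedly,
    // and returns the final counter value.
    static int countWithThreads(int threads, int incrementsPerThread) {
        SynchronizedCounter counter = new SynchronizedCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.update();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println("Final value: " + countWithThreads(4, 10000));
    }
}
```

Without synchronized on update(), two threads could read the same old value and both write back the same result, so the final count could be lower than expected.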
Worker pool architecture - thread architecture
The server creates a fixed number of threads called a worker pool. As requests arrive at the server, they are put into a queue by the I/O thread and from there assigned to the next available worker thread.
The server creates a worker pool -> request comes in, put into a queue -> assigned to an available worker thread
Useful in a highly concurrent system
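One way to sketch a worker pool in Java is with a fixed-size thread pool from java.util.concurrent, which keeps its own internal request queue. Here the calling thread plays the role of the I/O thread. The class name WorkerPoolDemo, the method handleAll and the "handled ..." reply format are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper class for this card, not part of any library.
class WorkerPoolDemo {
    // Queues every request on a fixed pool of worker threads and returns the
    // replies in the same order as the requests.
    static List<String> handleAll(List<String> requests, int poolSize) {
        // The pool creates at most poolSize threads; extra requests wait in its queue.
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String request : requests) {
                // submit() enqueues the request; the next free worker picks it up.
                pending.add(pool.submit(() -> "handled " + request));
            }
            List<String> replies = new ArrayList<>();
            for (Future<String> future : pending) {
                replies.add(future.get()); // blocks until that request is done
            }
            return replies;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown(); // let the worker threads exit
        }
    }

    public static void main(String[] args) {
        System.out.println(handleAll(List.of("a", "b", "c"), 2));
    }
}
```

The number of threads stays fixed no matter how many requests arrive, which bounds the cost of thread creation under heavy load.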
Thread-per-request - thread architecture
A thread is created for each request; when the request is finished, the thread is deallocated.
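The thread-per-request idea can be sketched in Java by starting a new Thread for every incoming request and letting it terminate once the request is handled. The class name ThreadPerRequestDemo, the method handleAll and the "handled ..." reply format are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class for this card, not part of any library.
class ThreadPerRequestDemo {
    // Creates one new thread per request, waits for all of them, and returns
    // the replies in request order.
    static List<String> handleAll(List<String> requests) {
        String[] replies = new String[requests.size()];
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < requests.size(); i++) {
            final int index = i;
            // A fresh thread exists only for the lifetime of this request.
            Thread handler = new Thread(() -> replies[index] = "handled " + requests.get(index));
            handler.start();
            threads.add(handler);
        }
        try {
            for (Thread handler : threads) {
                handler.join(); // after join, the thread has finished and can be reclaimed
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return Arrays.asList(replies);
    }

    public static void main(String[] args) {
        System.out.println(handleAll(Arrays.asList("a", "b", "c")));
    }
}
```

There is no queueing delay here, but the server pays the cost of creating and destroying a thread for every request.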
External Data Representation and Marshalling
- Data structures in programs are flattened to a sequence of bytes before transmission
- Different computers have different data representations
- e.g., the number of bytes used for an integer, floating-point representation, ASCII vs Unicode.
- Two ways to enable computers to interpret data in different formats
- Data is converted to an agreed external format before transmission and converted to the local form on receipt
- Values transmitted in the sender’s format, with an indication of the format used
- Marshalling - Process of converting the data to the form suitable for transmission
- Unmarshalling - Process of disassembling the data at the receiver
- External data representation - Agreed standard for representing data structures and primitive data
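As a rough illustration of marshalling to an agreed external format, the Java sketch below flattens a small record to bytes with DataOutputStream (which always writes integers in big-endian order) and rebuilds it with DataInputStream. The record fields id and name, the class name MarshalDemo and the "id:name" result format are assumptions made for this card.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper class for this card, not part of any library.
class MarshalDemo {
    // Marshalling: flatten (id, name) into a byte sequence.
    static byte[] marshal(int id, String name) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(id);   // 4 bytes, big-endian
            out.writeUTF(name); // 2-byte length followed by modified UTF-8 bytes
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Unmarshalling: rebuild the values from the bytes, in the same order.
    static String unmarshal(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int id = in.readInt();
            String name = in.readUTF();
            return id + ":" + name;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] wire = marshal(7, "Ann");
        System.out.println(wire.length + " bytes on the wire");
        System.out.println(unmarshal(wire));
    }
}
```

Both sides must agree on the field order and encoding; that agreement is the external data representation.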
Extensible Markup Language (XML)
A markup language is a textual encoding that represents both data and the details of its structure (or appearance)
XML is:
- a markup language defined by the World Wide Web Consortium (W3C)
- tags describe the logical structure of the data, unlike HTML, where tags give display instructions
- is extensible - users can define their own tags
- self-describing - tags describe the data
- tags together with namespaces allow the tags to be meaningful
- since data is textual, it is human-readable and platform-independent
- since data is textual, the messages are large, causing longer processing and transmission times and requiring more storage space
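The self-describing nature of XML can be illustrated with a short Java sketch that parses a small document using the JDK's built-in DOM parser and looks up a value by its tag name. The sample person document, the class name XmlDemo and the method readName are made up for this card.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class for this card, not part of any library.
class XmlDemo {
    // Sample document: the tags name the data they contain.
    static final String PERSON_XML =
            "<person id=\"7\"><name>Ann</name><place>London</place></person>";

    // Parses the XML text and returns the content of the first <name> element.
    static String readName(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return document.getElementsByTagName("name").item(0).getTextContent();
        } catch (Exception e) {
            // Parser configuration, SAX and I/O errors are all wrapped here for brevity.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("name = " + readName(PERSON_XML));
    }
}
```

The receiver finds the value by tag name rather than by byte position, at the cost of a longer message than a binary encoding would need.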
JavaScript Object Notation (JSON)
- JSON is a lightweight data-interchange format
- JSON is a syntax for storing and exchanging data
- JSON is an easier-to-use alternative for XML
- It is based on a subset of the JavaScript programming language
- It is text-based and completely language independent
IPC data formats - JSON vs XML
- JSON is lightweight and therefore simple to read and write. XML is less simple than JSON.
- JSON supports arrays as a data structure. XML does not.
- JSON files are more human-readable. XML, being a markup language, can also be used to display data, whereas JSON has no display capabilities.
- JSON provides scalar data types and the ability to express structured data through arrays and objects. XML does not, and one must rely on XML Schema to add type information.
- JSON has native object support. In XML, objects have to be expressed by conventions, often through a mixed use of attributes and elements.
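To make the comparison concrete, the Java sketch below holds the same small record as JSON text and as XML text and prints both with their lengths. The record itself (a name and a list of languages), the element names and the class name JsonVsXmlDemo are assumptions for illustration; the strings are written by hand and no JSON or XML library is involved.

```java
// Hypothetical helper class for this card, not part of any library.
class JsonVsXmlDemo {
    // JSON: the list is a native array, the record a native object.
    static final String JSON =
            "{\"name\":\"Ann\",\"languages\":[\"Java\",\"Go\"]}";

    // XML: the list is expressed by convention as repeated child elements.
    static final String XML =
            "<person><name>Ann</name><languages>"
            + "<language>Java</language><language>Go</language>"
            + "</languages></person>";

    public static void main(String[] args) {
        System.out.println(JSON);
        System.out.println("JSON length: " + JSON.length());
        System.out.println(XML);
        System.out.println("XML length:  " + XML.length());
    }
}
```

The XML form repeats every tag name in its closing tag, which is one reason the same data tends to take more characters in XML than in JSON.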
IPC (Inter-process communication)
Inter-process communication or interprocess communication (IPC) refers specifically to the mechanisms an operating system provides to allow the processes to manage shared data. Typically, applications can use IPC, categorized as clients and servers, where the client requests data and the server responds to client requests.[1] Many applications are both clients and servers, as commonly seen in distributed computing.