Distributed Systems Flashcards

Question

Availability

Answer 1

A system is available when it can perform its function over a given period of time. If the system is under maintenance, it is not available to users, which reduces availability. Even when the system is up, it must reliably provide its functionality; an unreliable system also reduces availability.
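
As a rough illustration of "available over a period of time", availability is commonly expressed as uptime divided by total time. The sketch below assumes that definition; the function names and the example numbers are hypothetical, not taken from the card.

```python
def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Fraction of the measurement period during which the system was usable."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("the measurement period must be positive")
    return uptime_hours / total


def allowed_downtime_hours(target: float, period_hours: float) -> float:
    """Downtime budget implied by an availability target over a period."""
    return (1.0 - target) * period_hours


# Hypothetical example: 2 hours of maintenance in a 30-day (720-hour) month.
monthly = availability(uptime_hours=718, downtime_hours=2)

# Downtime allowed per 720-hour month for a 99.9% target.
budget = allowed_downtime_hours(target=0.999, period_hours=720)
```

Planned maintenance counts as downtime here, which matches the card's point that maintenance reduces availability.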

Answer 2

How easy is it to maintain the system? For example: how easy is it to replace a failed node? How easy is it to upgrade the system without downtime? It also covers diagnosing problems and taking action.

Answer 3

Latency/Response Time | Throughput

Answer 4

Load balancing distributes requests across servers, so users are not affected even if one server fails. Load balancers are also used for health checks.

Load Balancing Algorithms:
- Least Connection Method
- Least Response Time
- Least Bandwidth - picks the server currently serving the least traffic (bandwidth/sec)
- Round Robin
- Weighted Round Robin
- IP Hash

```
Hardware load balancers: F5, Cisco
Software load balancers: HAProxy, Amazon ELB, Envoy
```
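
A minimal sketch of four of the algorithms listed above, assuming servers are identified by plain strings. The class and function names are hypothetical, and real load balancers add health checks, concurrency control, and smoother weight interleaving.

```python
import hashlib
import itertools


class RoundRobin:
    """Hand out servers in a fixed rotating order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class WeightedRoundRobin:
    """Round robin where a server with weight w appears w times per rotation."""

    def __init__(self, weights):
        expanded = [s for s, w in weights.items() for _ in range(w)]
        self._cycle = itertools.cycle(expanded)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Pick the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when a connection to the server closes.
        self.active[server] -= 1


def ip_hash(client_ip, servers):
    """Map a client IP to the same server as long as the server list is unchanged."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]
```

IP hash keeps a client on one server, which helps with session affinity, but adding or removing a server remaps most clients.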

Answer 5

Caching serves frequently used data without going to disk, which is comparatively slow.
- An application can keep data in its local memory, avoiding a disk seek or network call.
- A CDN provides a set of servers that store the data. If the same data is requested again, the CDN can deliver it.
- It is also important to invalidate the cache when the data is modified in the database.

How to keep the cache consistent with the database:
- Write-through cache - data is written to both the cache and the database before the response is returned to the client. This keeps the data consistent, but adds latency to writes.
- Write-around cache - data is written to the database only. The cache is populated on cache misses, which lead to higher latency and more pressure on the database.
- Write-back cache - the write goes only to the cache, and the data is written to the database in the background. This gives higher throughput and lower response times, but risks data loss if the cache system fails.

```
Caching Eviction Algorithms:
FIFO
LIFO
Least Recently Used
Most Recently Used
Least Frequently Used
Most Frequently Used
Random Replacement
```
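
The three write policies above can be sketched with two dictionaries standing in for the cache and the database. This is an illustrative model only: `CachedStore` and its method names are hypothetical, and the background write of a write-back cache is reduced to an explicit `flush()` call.

```python
class CachedStore:
    def __init__(self):
        self.db = {}        # stand-in for the database
        self.cache = {}     # stand-in for the cache
        self.dirty = set()  # keys written to the cache but not yet to the db

    def write_through(self, key, value):
        # Write to cache and database before returning to the caller.
        self.cache[key] = value
        self.db[key] = value

    def write_around(self, key, value):
        # Write to the database only; drop any stale cached copy.
        self.db[key] = value
        self.cache.pop(key, None)

    def write_back(self, key, value):
        # Write to the cache only; the database is updated later by flush().
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        # Stand-in for the background job of a write-back cache.
        for key in self.dirty:
            self.db[key] = self.cache[key]
        self.dirty.clear()

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.db[key]     # cache miss: go to the database
        self.cache[key] = value  # populate the cache for the next read
        return value
```

If the process dies between `write_back` and `flush`, the dirty keys are lost, which is the data-loss risk the card mentions.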

Answer 6

- HTTP request
- XMLHttpRequest (Ajax)
- WebSocket (full-duplex communication) - e.g. chat
- SSE (Server-Sent Events) - the server can push data to the browser
- HTTP/2 - Server Push

Answer 7

Read Committed:
- No dirty reads - updates are not visible to other transactions until the writing transaction has committed.
- No dirty writes - uncommitted data is not overwritten by another transaction.
- Dirty writes are prevented with row-level locks: only one transaction at a time can hold the lock on a given row, so no two transactions write the same row concurrently.
- Dirty reads are avoided by keeping both the old committed value and the new uncommitted value. Read queries get the old value until the writing transaction commits.
- Default in Oracle, PostgreSQL, SQL Server, MemSQL.

Snapshot Isolation and Repeatable Read:
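
Returning to the read committed rules above, here is a toy single-row model: readers always see the last committed value, and a row-level lock admits one writer at a time. The `Row` class is hypothetical and heavily simplified. In particular, a real database would normally make a second writer wait rather than raise an error, and would let a transaction read its own uncommitted writes.

```python
class Row:
    def __init__(self, value):
        self.committed = value   # value visible to all readers
        self.pending = None      # uncommitted value of the lock holder
        self.lock_owner = None   # id of the transaction holding the row lock

    def read(self):
        # No dirty reads: uncommitted data is never returned.
        return self.committed

    def write(self, txn_id, value):
        # No dirty writes: only the lock holder may write the row.
        if self.lock_owner not in (None, txn_id):
            raise RuntimeError("row is locked by another transaction")
        self.lock_owner = txn_id
        self.pending = value

    def commit(self, txn_id):
        if self.lock_owner != txn_id:
            raise RuntimeError("transaction does not hold the row lock")
        self.committed = self.pending
        self.pending = None
        self.lock_owner = None

    def abort(self, txn_id):
        if self.lock_owner == txn_id:
            self.pending = None
            self.lock_owner = None
```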

Answer 8

Why?
- A single request can fan out into multiple levels of calls to different services, so there should be a way to monitor requests and find the time taken at each level.

Design goals:
- Low overhead - tracing should add negligible overhead to request processing.
- Application-level transparency - applications should not have to write their own tracing format. Tracing should be transparent and common to all applications, which avoids bugs.
- Scalability - the tracing system should scale with the services it traces.

Adaptive sampling - instead of sending every request to the tracing system, sample requests. This helps latency-sensitive applications.

Instrumentation libraries should exist for HTTP requests, RPC, and SQL queries, so that tracing is uniform.

Trees, spans, annotations:
- A trace tree consists of spans. The edges between spans denote the relationship of a span to its parent span.
- Span - a timestamped log record that marks the start and end of a unit of work, with application-specific annotations. It carries a spanId and a parentId.
- Root span - a span without a parent.
- Span fields - span name, id, traceId, parentId, application annotations, client and server send/recv timestamps.
- Trace context - stores the span attributes. It is usually kept in thread-local storage.
- Trace data should be language-independent.

Trace collection can be:
- Spans are written to log files; daemons pull the log files and store them in databases.
- Daemons can be part of the image itself.

Overheads to consider:
- Trace generation overhead
- Trace collection overhead

Tune the sampling frequency to reduce tracing overhead.

Access patterns:
- trace Id
- service name

Components:
- Reporter
- Collector
- Storage
- UI
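
The span fields and the sampling idea above can be sketched as follows. This is an assumption-laden toy, not the API of any real tracing library: `Span`, `Tracer`, and their methods are hypothetical, ids come from `uuid4`, and a fixed sampling rate stands in for adaptive sampling.

```python
import random
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    name: str
    trace_id: str
    span_id: str
    parent_id: Optional[str] = None  # None marks the root span
    start: float = 0.0
    end: float = 0.0
    annotations: list = field(default_factory=list)


class Tracer:
    def __init__(self, sample_rate=1.0, rng=None):
        self.sample_rate = sample_rate
        self.rng = rng or random.Random()
        self.collected = []  # stand-in for the reporter/collector pipeline

    def start_trace(self, name):
        # The sampling decision is made once, at the root of the request.
        if self.rng.random() >= self.sample_rate:
            return None
        return self._new_span(name, uuid.uuid4().hex, None)

    def start_child(self, parent, name):
        # Unsampled requests propagate None, so children cost almost nothing.
        if parent is None:
            return None
        return self._new_span(name, parent.trace_id, parent.span_id)

    def _new_span(self, name, trace_id, parent_id):
        return Span(name, trace_id, uuid.uuid4().hex, parent_id, start=time.time())

    def finish(self, span):
        if span is None:
            return
        span.end = time.time()
        self.collected.append(span)
```

Grouping `collected` by `trace_id` and linking each span to its `parent_id` rebuilds the trace tree; looking spans up by trace id is one of the access patterns listed above.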

Answer 9

Application dual writes:
- The application that writes to the database is also responsible for writing the change to external systems.
- Although this is easy to implement, it poses consistency risks. The database and the external systems would need to coordinate (for example through two-phase commit or a consensus protocol such as Paxos) to make sure writes are applied in the same order as in the database.

Database log:
- Use the database log as the single source of truth, and use that log to send changes to external systems.
- Examples: Oracle GoldenGate, MySQL commit log (binlog).
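
A small sketch of why the log-based approach avoids the ordering problem of dual writes: every external system replays the same ordered log from its own offset, so all of them converge to the same state. `ChangeLog` and `Follower` are hypothetical names, and an in-memory list stands in for the database log.

```python
class ChangeLog:
    """Append-only, totally ordered list of changes (the source of truth)."""

    def __init__(self):
        self.entries = []

    def append(self, key, value):
        self.entries.append((key, value))
        return len(self.entries) - 1  # offset of the new entry


class Follower:
    """An external system (cache, search index, ...) fed from the log."""

    def __init__(self):
        self.state = {}
        self.next_offset = 0  # first log entry not yet applied

    def catch_up(self, log):
        # Apply unseen entries strictly in log order.
        for key, value in log.entries[self.next_offset:]:
            self.state[key] = value
        self.next_offset = len(log.entries)


log = ChangeLog()
log.append("x", 1)
log.append("x", 2)

cache, index = Follower(), Follower()
cache.catch_up(log)   # cache has seen x=2
log.append("x", 3)
cache.catch_up(log)   # cache resumes from its own offset
index.catch_up(log)   # index replays everything from the start
```

Because each follower tracks its own offset, a slow or restarted consumer can resume where it left off without the application coordinating the writes.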

(33 cards)