Distributed Systems Flashcards
What is a distributed system?
A collection of independent computers (nodes) that communicate over a network and coordinate so that they appear to users as one cohesive unit. It is typically designed to be fault tolerant and horizontally scalable.
What are the main advantages of distributed systems?
1. Horizontal scalability, 2. High efficiency for a given infrastructure cost, 3. High availability
What are the main disadvantages of distributed systems?
1. Increased complexity, 2. Requires expertise from multiple domains, 3. Data duplication, 4. Difficult data migrations, 5. Increased networking costs, 6. More difficult to secure, 7. Challenging deployments and troubleshooting
What is reliability in distributed systems?
The ability of a system to perform its required functions under stated conditions for a specified period of time. It is a measure of the continuity of correct service.
What is availability in distributed systems?
The proportion of time for which a system can perform its function, as seen from a client’s perspective. It is usually expressed as a percentage of total time, for example 99.9% uptime.
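As a rough illustration of the arithmetic behind an availability figure, the sketch below converts between uptime and a percentage, and from a target percentage to the downtime it allows. The function names and the 365-day year are assumptions made for this example, not part of any standard.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a non-leap year


def availability(uptime_seconds: float, total_seconds: float) -> float:
    """Return availability as a percentage of the observation window."""
    return 100.0 * uptime_seconds / total_seconds


def allowed_downtime_seconds(availability_percent: float,
                             window_seconds: float = SECONDS_PER_YEAR) -> float:
    """Return how much downtime a given availability target permits."""
    return window_seconds * (1 - availability_percent / 100.0)


if __name__ == "__main__":
    # 99.9% over one year allows about 31,536 seconds, i.e. about 8.76 hours.
    print(allowed_downtime_seconds(99.9) / 3600)
```

Each extra "nine" cuts the permitted downtime by a factor of ten, which is why availability targets are usually quoted that way.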
What is scalability in distributed systems?
The ability of a system to meet increased load by adding a proportional amount of resources, without degrading performance.
What is fault tolerance in distributed systems?
The ability of a system to keep operating correctly when a component fails, typically by detecting the fault and switching quickly to a redundant copy of that component with negligible downtime.
What is consistency in distributed systems?
The guarantee that every read reflects the most recent write, so the system behaves as if it held a single, up-to-date copy of the data, irrespective of how widely that data is distributed.
What does the CAP theorem state?
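Of consistency, availability, and partition tolerance, a distributed system can guarantee at most two at the same time. Because network partitions cannot be ruled out, in practice the system must choose between consistency and availability while a partition lasts.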
What is the PACELC theorem?
If the network is partitioned (P), choose between availability (A) and consistency (C); else (E), during normal operation, choose between latency (L) and consistency (C).
What are the ACID properties?
Atomicity, Consistency, Isolation, Durability
What is atomicity in ACID properties?
A transaction must be treated as an atomic unit; either all of its operations are executed or none.
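The all-or-nothing behaviour can be sketched with Python's built-in `sqlite3` module. The `accounts` table, the names `alice` and `bob`, and the helper functions are hypothetical, chosen only for this illustration; the sketch relies on a `sqlite3` connection used as a context manager, which commits on success and rolls back if an exception is raised.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                     [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Debit src and credit dst as one unit; return False if rolled back."""
    try:
        with conn:  # commits on normal exit, rolls back on exception
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
            if balance < 0:
                # Both updates above are undone together by the rollback.
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def balances(conn: sqlite3.Connection) -> dict:
    """Return the current balance of every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

A failed transfer leaves both balances exactly as they were: the debit and the credit are never visible separately.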
What is consistency in ACID properties?
Every transaction must take the database from one valid state to another, preserving all defined constraints and invariants.
What is isolation in ACID properties?
Each transaction executes as if it were the only transaction in the system, so the intermediate state of one transaction is not visible to others.
What is durability in ACID properties?
Once a transaction commits, its changes must persist even if the system fails or restarts.
What is a dirty read in concurrency control?
When one activity reads an uncommitted change made by another activity that is later rolled back.
What is a non-repeatable read in concurrency control?
When one activity reads data, and another activity modifies or deletes that data and commits before the first activity is done, so that re-reading the same data returns a different result.
What is a phantom read in concurrency control?
When one activity retrieves a set of data, and another activity inserts new data that would have met the first activity’s search criteria, so that repeating the query returns additional rows.
What is pessimistic locking?
An approach where an entity is locked in the database for the entire time that it is in application memory, so that other activities cannot modify it until the lock is released.
What is optimistic locking?
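An approach that lets activities proceed without holding locks, detects collisions when they occur (for example by checking a version number or timestamp at write time), and then resolves them, rather than trying to prevent them.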
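One common way to detect such collisions is a version column that every write must match and then increment. The sketch below uses Python's built-in `sqlite3` module; the `items` table, its columns, and the helper names are assumptions made for this illustration, and how a detected collision is resolved (retry, merge, or report to the user) is left to the caller.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with one hypothetical versioned row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, qty INTEGER, version INTEGER)")
    conn.execute("INSERT INTO items VALUES (1, 10, 1)")
    conn.commit()
    return conn


def read_item(conn: sqlite3.Connection, item_id: int):
    """Return (qty, version) as seen by the reader; no lock is taken."""
    return conn.execute(
        "SELECT qty, version FROM items WHERE id = ?", (item_id,)).fetchone()


def update_qty(conn: sqlite3.Connection, item_id: int,
               new_qty: int, expected_version: int) -> bool:
    """Write only if the row still has the version the caller read."""
    with conn:
        cur = conn.execute(
            "UPDATE items SET qty = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_qty, item_id, expected_version))
    # Zero rows updated means another writer got there first: a collision.
    return cur.rowcount == 1
```

If two writers both read version 1, the first update succeeds and bumps the version to 2, and the second update matches no row and returns `False`.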
What are the two main types of database storage engines?
1. B-Tree Based Engine, 2. Log Structured Merge (LSM) Tree Based Engine
What are the main data replication strategies?
1. Log-Based Data Replication, 2. Full Table Data Replication, 3. Key-Based Incremental Data Replication
What is algorithmic sharding?
Computing a hash of a record’s key and taking that hash modulo n, where n is the number of nodes; the result identifies the node that stores the record.
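A minimal sketch of hash-modulo-n placement follows. The choice of SHA-256 (truncated to 64 bits) and the function names are assumptions for this example; any hash that is stable across processes would do. The `moved_fraction` helper is added only to show a known drawback of this scheme: changing n reassigns most keys.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Hash a key to a 64-bit integer that is stable across runs."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def shard_for(key: str, num_nodes: int) -> int:
    """Return the index of the node that stores this key."""
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    return stable_hash(key) % num_nodes


def moved_fraction(keys, old_n: int, new_n: int) -> float:
    """Fraction of keys whose node changes when n goes from old_n to new_n."""
    moved = sum(1 for k in keys if shard_for(k, old_n) != shard_for(k, new_n))
    return moved / len(keys)
```

For uniformly hashed keys, going from 4 to 5 nodes is expected to move roughly 80% of them, since a key stays put only when its hash gives the same remainder modulo 4 and modulo 5.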
What is consistent hash sharding?
Uses consistent hashing to distribute data across nodes, with many more shards than physical nodes, so that adding or removing a node moves only a small fraction of the data.
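The idea can be sketched as a hash ring in which each node owns many points. This is one simple way to realise consistent hashing, not the scheme of any particular database; the class name `HashRing`, the `vnodes` parameter, and the use of truncated SHA-256 are assumptions for this example. The many points per node stand in for having many more shards than nodes.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Hash a string to a stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """A hash ring where each node is placed at `vnodes` positions."""

    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        """Place the node at vnodes positions derived from its name."""
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node: str) -> None:
        """Drop every position owned by the node."""
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key: str) -> str:
        """Return the owner of the first position clockwise from the key."""
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

When a fifth node joins a four-node ring, only the keys that now fall just before one of the new node's positions change owner, and all of them move to the new node; the rest stay where they were.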