L9/12 - OLAP & OLTP in the cloud (DBaaS) Flashcards
What is the dominant architecture for high-performance data warehousing?
Shared-nothing architectures, where each node has its own storage, memory, and processing power.
What are the three main architectural dimensions of a data warehouse?
Storage – Data layout, format, partitioning, and distribution.
Query Engine – Optimization, execution models, parallelism, and multi-tenancy.
Cluster – Metadata sharing, resource management, and allocation.
What are the advantages of a column-store over a row-store?
Only reads relevant columns → better CPU cache usage.
More efficient compression (e.g., RLE, gzip).
Faster analytical queries, as fewer disk reads are required.
Examples: Apache Parquet, ORC.
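A minimal sketch of run-length encoding (RLE), assuming a plain Python list stands in for the column; real formats such as Parquet combine RLE with dictionary encoding and bit-packing:

```python
def rle_encode(column):
    """Collapse consecutive repeated values into [value, run_length] pairs."""
    runs = []
    for value in column:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([value, 1])   # start a new run
    return runs

# Sorted, low-cardinality columns compress especially well:
print(rle_encode(["DE", "DE", "DE", "FR", "FR", "US"]))
# [['DE', 3], ['FR', 2], ['US', 1]]
```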
What is pruning, and how does it help query performance?
Skipping irrelevant data blocks by using MinMax indexes (zone maps).
Allows horizontal filtering of rows, reducing I/O.
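A toy sketch of MinMax pruning, assuming each block keeps per-column min/max statistics (block contents elided):

```python
# Each block stores (min, max) for a column, so a filter like
# `col BETWEEN 50 AND 60` can skip non-overlapping blocks entirely.
blocks = [
    {"min": 0,  "max": 40},    # skipped
    {"min": 35, "max": 70},    # scanned (overlaps [50, 60])
    {"min": 80, "max": 120},   # skipped
]

def prune(blocks, lo, hi):
    """Keep only blocks whose [min, max] range overlaps [lo, hi]."""
    return [b for b in blocks if b["max"] >= lo and b["min"] <= hi]

print(len(prune(blocks, 50, 60)), "of", len(blocks), "blocks scanned")  # 1 of 3
```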
How is data distributed across a data warehouse?
Step 1 – System-driven distribution → ensures parallelism (each node processes a portion of the data).
Step 2 – User-specified partitioning (e.g., by range) → improves query performance by enabling partition pruning.
What is the difference between partitioning and distribution?
Partitioning is user-specified, used for data lifecycle management & pruning.
Distribution is system-driven to optimize parallel execution.
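A toy illustration of the two mechanisms, assuming hash distribution on the key and monthly range partitioning on a date column (all names are illustrative):

```python
import hashlib

NODES = 4

def node_for(key):
    """System-driven: hash the distribution key to spread rows across nodes."""
    return int(hashlib.md5(str(key).encode()).hexdigest(), 16) % NODES

def partition_for(order_date):
    """User-specified: range-partition by month, enabling partition pruning."""
    return order_date[:7]   # e.g. '2024-03'

row = {"order_id": 42, "order_date": "2024-03-15"}
print("node:", node_for(row["order_id"]),
      "partition:", partition_for(row["order_date"]))
```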
How does vectorized execution improve performance?
Processes data in batches (thousands of rows at once).
Reduces I/O overhead and improves cache efficiency.
Examples: Actian Vortex, Hive, Drill, Snowflake, MySQL HeatWave.
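A rough sketch of batch-at-a-time processing, assuming NumPy stands in for a vectorized engine's columnar batches:

```python
import numpy as np

BATCH = 4096                                # thousands of rows per call
prices = np.random.rand(1_000_000) * 100

total = 0.0
for start in range(0, len(prices), BATCH):
    batch = prices[start:start + BATCH]
    total += batch[batch > 50].sum()        # one predicate over a whole batch,
                                            # not one function call per row
print(round(total, 2))
```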
What is Just-in-Time (JIT) compilation in query execution?
The query engine compiles optimized code specific to the query before execution.
Improves performance by avoiding interpretation overhead.
Used in AWS Redshift, HyPer (Tableau), SingleStore.
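A toy JIT in Python: generate source code for the exact predicate in the query, compile it once, then run the compiled function; real engines emit LLVM IR or C++ rather than Python:

```python
def jit_compile_filter(column, op, constant):
    """Build and compile a predicate specialized to one query's filter."""
    src = f"def pred(row):\n    return row[{column!r}] {op} {constant!r}\n"
    namespace = {}
    exec(compile(src, "<jit>", "exec"), namespace)
    return namespace["pred"]

pred = jit_compile_filter("price", ">", 50)   # compiled once per query
rows = [{"price": 30}, {"price": 70}]
print([r for r in rows if pred(r)])           # [{'price': 70}]
```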
What are key challenges when moving OLAP to the cloud?
Storage abstraction → Handling different formats efficiently.
Data distribution & partitioning → Becomes even more critical at scale.
Query optimization → Must be global and resource-aware.
Elasticity → Scaling compute and storage independently.
What are the different service models for cloud OLAP systems?
Reserved-capacity services → Fixed resources, predictable pricing.
Serverless instances → Pay only for executed queries.
Auto-scaling → Adjusts compute resources dynamically.
How does Shared-Nothing Architecture work?
Each query processor node has its own local storage.
Data is horizontally partitioned across nodes.
Minimal communication between nodes.
What are the drawbacks of Shared-Nothing Architecture in the cloud?
Compute & storage tightly coupled → Limits flexibility.
Heterogeneous workloads → Different workloads require different hardware setups.
Membership changes → Rebalancing data when nodes are added/removed is costly.
Online upgrades are difficult due to system coupling.
What is the benefit of separating compute and storage in modern cloud OLAP systems?
Elasticity → Compute and storage scale independently.
Cost-efficiency → Only pay for compute when needed.
Better failure handling → Storage remains available even if compute nodes fail.
Examples: Snowflake, AWS Redshift (new version), BigQuery.
How did the older version of AWS Redshift work?
Shared-nothing architecture.
Leader node parses queries and distributes workloads to compute nodes.
Uses JIT compilation (C++ MPP engine) for execution.
Coupled compute and storage → scaling was difficult.
How does Snowflake separate compute and storage?
Storage: Uses Amazon S3 / Azure Blob Storage / Google Cloud Storage.
Compute: Uses a shared-nothing execution engine (independent from storage).
Metadata & transactions handled separately.
How does Snowflake optimize table storage?
Stores data in compressed, immutable blocks.
Uses MinMax indexes for faster pruning.
Dynamically caches hot data in compute nodes.
How does Google BigQuery optimize large queries?
Stores data on the Colossus DFS and uses a disaggregated in-memory shuffle tier for fast joins & aggregation.
Dremel Query Engine → Optimized for SQL analytics.
Reduces shuffle latency ~10x compared to traditional disk-based shuffles.
How do modern OLAP systems handle node failures?
If a node fails, its workload is reassigned to another node.
Storage is usually replicated (S3, Azure Blob Storage, etc.) to avoid data loss.
Compute nodes can be reinstantiated dynamically.
How does serverless OLAP work?
Queries run on-demand, without fixed resource allocation.
Examples: AWS Athena, BigQuery, Azure SQL Serverless.
What is Disaggregated Compute-Storage Architecture?
Compute and storage are fully independent.
Enables multi-tenancy, elasticity, and better fault tolerance.
Examples: Snowflake, modern AWS Redshift.
What is Disaggregated Compute-Memory-Storage Architecture?
Adds a shared-memory layer to reduce shuffle cost.
Used for complex joins & aggregations.
Example: Google BigQuery (in-memory shuffle tier, with data on the Colossus DFS).
What is a Stateless Shared-Storage Architecture?
Compute nodes do not hold state (metadata & logs stored externally).
Enables dynamic scaling & high availability.
Examples: Azure Synapse (POLARIS engine), BigQuery.
What is Function-as-a-Service (FaaS)?
A serverless compute model where functions are triggered on demand.
Functions execute stateless, event-driven tasks.
Examples: AWS Lambda, Google Cloud Functions.
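A minimal AWS Lambda-style handler sketch; the function is stateless, so anything it must remember has to live in external storage:

```python
import json

def handler(event, context):
    """Invoked on demand by the platform; no local state survives the call."""
    query = event.get("query", "")
    # ...run the query against external storage (S3, a database, etc.)...
    return {"statusCode": 200, "body": json.dumps({"query": query})}

# Local invocation for illustration; in production the platform calls this:
print(handler({"query": "SELECT 1"}, None))
```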
What are the challenges of serverless OLAP?
Cold starts – Functions take time to initialize.
State management – Functions are stateless and must store data externally.
Stragglers – Some function instances take longer, affecting query latency.
What is OLTP, and how has the traditional database stack evolved over time?
OLTP (Online Transaction Processing) deals with high-throughput, low-latency transactions.
The monolithic database stack (SQL, transactions, caching, logging, storage) has remained largely unchanged for 30-40 years.
What are the three conventional approaches for scaling databases?
Sharding – Splitting data at the application layer.
Shared-Nothing – Each node has its own storage and transactions.
Shared-Disk – Nodes share storage but handle transactions separately.
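A sketch of application-layer sharding, with hypothetical connection strings; the application, not the database, routes each key to an independent instance:

```python
import zlib

SHARDS = [
    "postgres://db-shard-0/orders",
    "postgres://db-shard-1/orders",
    "postgres://db-shard-2/orders",
]

def shard_for(user_id):
    """Stable hash so a given user always lands on the same shard."""
    return SHARDS[zlib.crc32(user_id.encode()) % len(SHARDS)]

print(shard_for("alice"))   # every query for 'alice' hits one shard
```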
What makes OLTP challenging in cloud environments?
Failures happen frequently (permanent vs. transient failures).
Geo-distributed data centers introduce latency.
Storage types vary (fast/expensive vs. slow/cheap).
Ephemeral vs. persistent storage requires different strategies.
What are the key expectations for cloud-based OLTP (DBaaS)?
✅ Durability – Ensuring no data loss.
✅ High scalability – Handling up to 100TB+ databases.
✅ Fault tolerance – Fast recovery from failures.
✅ High availability – 99.999% uptime.
✅ Low tail latency – Minimize slow queries.
✅ Elasticity – Scaling resources dynamically.
✅ Cost efficiency – Balance performance and cost.
✅ Data locality guarantees – Compliance with laws like GDPR.
What are the four main cloud OLTP architectures?
Non-partitioned shared-nothing log-replicated state machine – Microsoft SQL Server HADR, PostgreSQL on AWS.
Non-partitioned shared-data – AWS Aurora, Google AlloyDB, Azure Socrates.
Partitioned shared-nothing log-replicated state machine – Google Spanner, CockroachDB, MongoDB.
Disaggregated architectures (separate compute, storage, and logging layers) – AWS Athena, MongoDB on Atlas.
How does Microsoft SQL Server’s HADR system work?
Primary node processes all updates.
Follower replicas receive update logs but handle only read queries.
Backups to Azure Storage every 5 minutes.
Requires 4 nodes (1 primary, 3 secondary) for durability and high availability.
What are the advantages of HADR?
✅ Mature and widely used in Azure.
✅ High performance since each compute node has a full local copy of the database.
What are the limitations of HADR?
❌ Database size is limited to a single node’s storage (~4TB).
❌ Replication is tightly coupled – scaling is expensive.
❌ Long-running transactions can block log truncation, causing unbounded log growth and slow recovery.
How does AWS Aurora decouple compute and storage?
Compute nodes only write redo logs to the distributed storage layer.
Storage nodes reconstruct pages on demand from the logs.
Materialized pages act as a cache of the log – pages are built only when needed.
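"The log is the database": a toy storage node that materializes a page on demand by replaying redo records instead of receiving full pages (the record layout is invented for illustration):

```python
redo_log = [
    {"page": 7, "lsn": 1, "change": ("balance", 100)},
    {"page": 7, "lsn": 2, "change": ("balance", 80)},
]

def materialize(page_id, up_to_lsn):
    """Rebuild a page by applying its redo records up to a given LSN."""
    page = {}
    for rec in redo_log:
        if rec["page"] == page_id and rec["lsn"] <= up_to_lsn:
            key, value = rec["change"]
            page[key] = value
    return page

print(materialize(7, up_to_lsn=2))   # {'balance': 80}
```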
What are the advantages of AWS Aurora design?
✅ Lower network I/O – Only log records are sent over the network.
✅ Faster recovery – No need to replay large amounts of redo logs.
✅ High availability – Aurora replicates data 6 times across 3 availability zones (AZs).
✅ Improved durability – Can survive entire data center failures.
How does Aurora handle fast repairs and catch-up?
Shards databases into 10GB chunks.
Each chunk is replicated 6 times across a distributed storage fleet.
Quorum-based replication (4/6 write quorum, 3/6 read quorum) allows quick failover.
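The quorum arithmetic behind this, sketched for Aurora's six-copy configuration:

```python
V, VW, VR = 6, 4, 3          # copies, write quorum, read quorum

assert VW + VR > V           # reads overlap the latest acknowledged write
assert 2 * VW > V            # two concurrent writes overlap (no split brain)

def write_ok(acks):    return acks >= VW   # tolerates an AZ outage (2 copies lost)
def read_ok(replies):  return replies >= VR

print(write_ok(4), read_ok(3))   # True True
```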
What is the key insight behind Socrates’ architecture?
Separates durability and availability:
Durability → Handled by a high-speed log service.
Availability → Handled by storage tier with replicas.
Why does Socrates treat the log separately?
Log is the bottleneck – every update must be logged before committing.
Solution: Store logs in fast storage and replicate them for fault tolerance.
How does Socrates ensure low commit latency?
Logs are written to a low-latency, durable landing zone (LZ).
XLOG process asynchronously disseminates logs to replicas.
Page servers store checkpoints and backups in Azure Storage.
What are the advantages of this Socrates architecture?
✅ Low write latency – Log service optimizes transaction commits.
✅ Better elasticity – Compute, log, and storage can scale independently.
What are the Socrates limitations?
❌ Higher read latency – If cache misses occur, logs must be replayed.
❌ More complex recovery mechanisms – Logs must be carefully managed.
What is the main motivation for PolarDB Serverless?
Reduce read latency using shared remote memory.
Reduce duplicate data loading across compute nodes.
Elasticity – Allocate memory on demand.
How does the data read/write path work in PolarDB Serverless?
Writes: Compute node updates cache → generates redo log → logs stored durably → updates propagate.
Reads: Compute node checks cache → If missing, fetches from shared buffer → If missing, fetches from storage.
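A tiered read-path sketch, with plain dicts standing in for the three tiers; in PolarDB Serverless the middle tier is a disaggregated memory pool reached over the network:

```python
local_cache = {}
shared_buffer = {"page:7": "hot page"}
storage = {"page:7": "hot page", "page:9": "cold page"}

def read_page(pid):
    if pid in local_cache:                 # fastest: node-local memory
        return local_cache[pid]
    if pid in shared_buffer:               # next: shared remote memory
        local_cache[pid] = shared_buffer[pid]
    else:                                  # slowest: disaggregated storage
        shared_buffer[pid] = local_cache[pid] = storage[pid]
    return local_cache[pid]

print(read_page("page:9"))   # cold read: storage -> shared buffer -> local
```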
What are the benefits of read/write in PolarDB Serverless?
✅ Low read latency – Reads from remote memory instead of disk.
✅ High throughput – Different compute nodes share the same buffer.
✅ Elastic memory allocation – Independent of compute/storage scaling.
What are the PolarDB downsides?
❌ Network bottleneck – Remote memory access requires high-speed networking (e.g., RDMA).
What is the Two-Phase Commit (2PC) Protocol?
Prepare phase – Nodes agree to commit or abort.
Commit phase – Coordinator sends final decision.
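A minimal 2PC sketch; the coordinator commits only if every participant votes yes in the prepare phase:

```python
class Participant:
    def __init__(self, ok): self.ok = ok
    def prepare(self):      return self.ok   # vote yes/no
    def finish(self, d):    self.state = d   # apply the final decision

def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]    # phase 1: prepare
    decision = "commit" if all(votes) else "abort"
    for p in participants:                         # phase 2: broadcast
        p.finish(decision)
    return decision

print(two_phase_commit([Participant(True), Participant(True)]))    # commit
print(two_phase_commit([Participant(True), Participant(False)]))   # abort
```

If the coordinator crashes between the two phases, prepared participants block while holding locks – the problem the next card addresses.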
Why is 2PC problematic in distributed OLTP?
❌ Coordinator crashes cause blocking issues.
❌ Slow commit times if multiple nodes are involved.
✅ Solution: Use Paxos Commit to tolerate failures.
What makes Google Spanner unique?
Global-scale distributed transactions with serializability.
Uses TrueTime – timestamps with bounded uncertainty, synchronized via atomic clocks + GPS.
Partitions data into “splits” – Replicated using Paxos for consistency.
How does Spanner ensure consistent snapshots?
Uses Multi-Version Concurrency Control (MVCC).
Read-only transactions do not need locks.
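A snapshot-read sketch: each key keeps multiple timestamped versions, and a read-only transaction at timestamp T sees the newest version committed at or before T, without taking locks:

```python
versions = {
    "x": [(10, "a"), (20, "b"), (30, "c")],   # (commit_ts, value) pairs
}

def snapshot_read(key, read_ts):
    """Return the newest version visible at read_ts, never blocking writers."""
    visible = [(ts, v) for ts, v in versions[key] if ts <= read_ts]
    return max(visible)[1] if visible else None

print(snapshot_read("x", 25))   # 'b' – committed at ts=20, visible at ts=25
```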