Flashcards
code-deployment Build System – General Overview
We can design our build system as a queue of jobs. Each job has a commit identifier (the commit SHA) for the version of the code it should build and the name of the resulting binary that will be created.
A pool of servers (workers) handles these jobs. Each worker takes jobs off the queue and writes the resulting binaries to blob storage.
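As a rough sketch, a job record might look something like this (field names are illustrative, not part of the original design):

```python
from dataclasses import dataclass

# Hypothetical job record for the build queue; field names are assumptions.
@dataclass
class BuildJob:
    commit_sha: str           # version of the code to build
    binary_name: str          # name of the resulting binary
    status: str = "QUEUED"    # QUEUED -> RUNNING -> SUCCEEDED / FAILED
```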
code-deployment Build System – how should we NOT design our job queue?
A naive design would implement the job queue in memory, but if the servers holding this queue fail, we lose the entire state of our jobs: both queued jobs and past jobs.
We'd be unnecessarily complicating matters by trying to optimize around this in-memory storage; we're better off implementing the queue with a SQL database.
code-deployment Build System – how should we design our job queue?
We can store jobs in a table in a SQL database. We can implement the dequeuing mechanism by selecting the row with the oldest created_at timestamp and a QUEUED status, which means we want to index our table on created_at AND status.
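A minimal sketch of that jobs table, assuming Postgres and psycopg2; the column names and types are placeholders, not an exact schema:

```python
import psycopg2  # assuming a Postgres-backed jobs table

conn = psycopg2.connect("dbname=builds")
with conn, conn.cursor() as cur:
    # Illustrative schema; column names and types are assumptions.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS jobs (
            id             BIGSERIAL PRIMARY KEY,
            commit_sha     TEXT NOT NULL,
            binary_name    TEXT NOT NULL,
            status         TEXT NOT NULL DEFAULT 'QUEUED',
            created_at     TIMESTAMPTZ NOT NULL DEFAULT now(),
            last_heartbeat TIMESTAMPTZ
        );
    """)
    # Index that supports "oldest QUEUED job first" lookups.
    cur.execute(
        "CREATE INDEX IF NOT EXISTS idx_jobs_status_created ON jobs (status, created_at);"
    )
```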
code-deployment concurrency
ACID transactions will make it safe for potentially hundreds of workers to grab jobs off the queue without unintentionally running the same job twice.
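One way to make that concrete, assuming Postgres row locking (FOR UPDATE SKIP LOCKED); the exact syntax varies by database:

```python
import psycopg2  # sketch only; assumes the jobs table above

def claim_next_job(conn):
    """Atomically claim the oldest QUEUED job so no two workers run it."""
    with conn, conn.cursor() as cur:  # the with-block commits or rolls back the transaction
        cur.execute("""
            SELECT id FROM jobs
            WHERE status = 'QUEUED'
            ORDER BY created_at
            LIMIT 1
            FOR UPDATE SKIP LOCKED;   -- other workers skip the row we've locked
        """)
        row = cur.fetchone()
        if row is None:
            return None  # queue is empty
        cur.execute(
            "UPDATE jobs SET status = 'RUNNING', last_heartbeat = now() WHERE id = %s;",
            (row[0],),
        )
        return row[0]
```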
code-deployment Build System – how do we handle lost jobs?
Since builds last around 15 minutes, it's likely that a worker occasionally dies mid-build, leaving its job stuck in RUNNING forever.
We could add an extra column to our jobs table called last_heartbeat. The worker running a job updates the relevant row every 3-5 minutes just to let us know that it's still running the job.
We can then have a separate service that polls the table every so often (say, every 5 minutes) and checks all of the RUNNING jobs; if a job's last_heartbeat was last modified more than two heartbeat intervals ago (we need a margin of error), something is likely wrong, and this service can reset the job's status to QUEUED, which effectively puts it back at the front of the queue.
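A sketch of that polling service, assuming the same Postgres jobs table; the interval values are illustrative:

```python
import time
import psycopg2  # sketch only; assumes the jobs table above

HEARTBEAT_INTERVAL_MINUTES = 5   # workers heartbeat every 3-5 minutes
POLL_SECONDS = 300               # this service wakes up every 5 minutes

def requeue_dead_jobs(conn):
    """Reset RUNNING jobs whose heartbeat is older than two intervals back to QUEUED."""
    cutoff_minutes = 2 * HEARTBEAT_INTERVAL_MINUTES
    with conn, conn.cursor() as cur:
        cur.execute(
            "UPDATE jobs SET status = 'QUEUED' "
            "WHERE status = 'RUNNING' "
            "AND last_heartbeat < now() - interval '%s minutes';" % cutoff_minutes
        )

if __name__ == "__main__":
    conn = psycopg2.connect("dbname=builds")
    while True:
        requeue_dead_jobs(conn)
        time.sleep(POLL_SECONDS)
```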
code-deployment Build System – how do we handle storage?
When a worker completes a build, it can store the binary in GCS before updating the jobs table. This ensures that a binary has been stored before it’s marked as SUCCEEDED.
Since we’ll deploy our binaries to machines spread across the world, we should have regional storage rather than just a single global blob store.
We can design our system based on clusters around the world (in 5-10 global regions). Each region has a blob store (a regional GCS bucket). Once a worker successfully stores a binary in our main blob store, the worker is released and can run another job, while the main blob store performs some asynchronous replication to store the binary in all of the regional GCS buckets.
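A sketch of the worker's completion step, assuming the google-cloud-storage client and a hypothetical main bucket name; the key point is that the upload happens before the status update:

```python
import psycopg2
from google.cloud import storage  # assuming google-cloud-storage

MAIN_BUCKET = "builds-main"  # hypothetical main blob store; regional buckets replicate from it

def finish_build(conn, job_id, binary_name, local_path):
    """Upload the binary first, then mark the job SUCCEEDED, so SUCCEEDED implies stored."""
    client = storage.Client()
    blob = client.bucket(MAIN_BUCKET).blob(binary_name)
    blob.upload_from_filename(local_path)  # store the binary before touching the jobs table

    with conn, conn.cursor() as cur:
        cur.execute("UPDATE jobs SET status = 'SUCCEEDED' WHERE id = %s;", (job_id,))
```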
code-deployment Deployment System – General Overview
Our deployment system must allow for the very fast distribution of 10GB binaries to hundreds of thousands of machines across the world. We want:
- a service that tells us when a binary has been replicated in all regions
- a service that serves as the source of truth for what binary should currently be run on all machines
- a peer-to-peer-network design for our machines across the world
Deployment System – Replication-Status Service
We can have a global service that continuously checks that a given binary in the main blob store has been replicated across all regional GCS buckets. Once a binary has been replicated, this service updates a separate SQL db with rows containing the name of a binary and a replication_status. Once a binary has a “complete” replication_status, it’s officially deployable.
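A sketch of that checker, assuming the google-cloud-storage client; the regional bucket names and the binaries table are placeholders:

```python
import psycopg2
from google.cloud import storage  # assuming google-cloud-storage

REGIONAL_BUCKETS = ["builds-us-east", "builds-europe-west", "builds-asia-east"]  # hypothetical

def update_replication_status(conn, binary_name):
    """Mark a binary 'complete' once it exists in every regional bucket."""
    client = storage.Client()
    replicated = all(
        client.bucket(bucket_name).blob(binary_name).exists()
        for bucket_name in REGIONAL_BUCKETS
    )
    if replicated:
        with conn, conn.cursor() as cur:
            cur.execute(
                "UPDATE binaries SET replication_status = 'complete' WHERE name = %s;",
                (binary_name,),
            )
```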
Deployment System – how do we distribute binaries to machines across the world quickly?
Having hundreds of thousands of machines each download a 10GB file directly from a regional blob store is extremely slow. A peer-to-peer-network approach will be much faster and will allow us to hit our 30-minute time frame for deployments.
What happens when an engineer hits a button that says “Deploy build/binary B1 to every machine globally”?
We’ll have a K-V store like Etcd or ZooKeeper storing our goal-state, which is the desired build version at that point in time and will look something like: “current_build: B1”. We’ll have a global K-V store and regional K-V stores.
Regional K-V stores will poll the global K-V store (say, every 10s) for updates and will update themselves accordingly.
Machines in each global region will poll the regional K-V store, and when the build_version changes, they’ll try to fetch that build from the P2P network and run the binary.
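A sketch of the polling loop each machine could run; kv_get and fetch_from_p2p_network are placeholders for the real K-V client and P2P fetch, not actual APIs:

```python
import time

POLL_SECONDS = 10      # machines poll the regional K-V store every ~10 seconds
current_build = None   # build this machine is currently running

def kv_get(key):
    """Placeholder for a read from the regional K-V store (e.g. an etcd/ZooKeeper client)."""
    raise NotImplementedError

def fetch_from_p2p_network(build):
    """Placeholder for fetching the binary from peers in the same region and running it."""
    raise NotImplementedError

while True:
    desired_build = kv_get("current_build")    # e.g. "B1"
    if desired_build != current_build:
        fetch_from_p2p_network(desired_build)  # download from peers, then run the binary
        current_build = desired_build
    time.sleep(POLL_SECONDS)
```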
AlgoExpert – access control
Users who haven’t purchased AlgoExpert can’t access individual questions. We can implement this fairly easily by making an internal API call whenever a user requests our static API content for questions, checking whether the user owns the product before returning the full content.
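A minimal sketch of that check; the function names and content shape are assumptions, not AlgoExpert's actual API:

```python
def user_owns_product(user_id: str) -> bool:
    """Placeholder for the internal API call that checks whether the user purchased AlgoExpert."""
    raise NotImplementedError

def get_question_content(user_id: str, question: dict) -> dict:
    # Return only a stub unless the user owns the product.
    if not user_owns_product(user_id):
        return {"name": question["name"]}  # hide the prompt, solutions, etc.
    return question  # full content
```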
AlgoExpert – replication
We do need to keep our databases up to date with each other, since users might travel around the world and hit a different database server than their typical one.
For this, we can have some async replication between our database servers.
AlgoExpert – code execution
- rate limiting using a K-V store like Redis to easily prevent DoS attacks (see the sketch after this card)
- Since we want 1-3s of latency for running code, we need to keep a set of special servers (our “workers”) ready to run code at all times. They each clean up after running code. Our backend servers can contact a worker and get a response from that worker when it’s done running the code (or when the code times out), and our servers can return that to the UI in the same request.
- logging and monitoring for run-code events to help us scale
This design scales horizontally with our number of users, and it can scale vertically to make running code even faster (more CPU == faster runs).
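A sketch of the Redis-backed rate limit from the first bullet, using a simple fixed one-minute window per user (the limit itself is a placeholder):

```python
import redis  # sketch assuming the redis-py client

r = redis.Redis()
MAX_RUNS_PER_MINUTE = 10  # hypothetical per-user limit

def allow_code_run(user_id: str) -> bool:
    """Fixed-window rate limit keyed on the user ID."""
    key = f"run-code:{user_id}"
    count = r.incr(key)       # atomically bump the counter for this window
    if count == 1:
        r.expire(key, 60)     # start a fresh one-minute window
    return count <= MAX_RUNS_PER_MINUTE
```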
Stockbroker – what happens after a PlaceTrade call is placed?
Once API servers receive a PlaceTrade call, they’ll store the trade in a SQL table. This table needs to be in the same SQL database as the one that the balances table is in, because we’ll need to use ACID transactions to alter both tables in an atomic way.
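A sketch of that atomic write, assuming Postgres; the trades and balances columns are placeholders:

```python
import psycopg2  # sketch only; table and column names are assumptions

def place_trade(conn, customer_id, symbol, quantity, cost):
    """Insert the trade and reserve the cash in one atomic transaction."""
    with conn, conn.cursor() as cur:  # commits on success, rolls back on any exception
        cur.execute(
            "INSERT INTO trades (customer_id, symbol, quantity, status) "
            "VALUES (%s, %s, %s, 'PENDING') RETURNING id;",
            (customer_id, symbol, quantity),
        )
        trade_id = cur.fetchone()[0]
        cur.execute(
            "UPDATE balances SET amount = amount - %s "
            "WHERE customer_id = %s AND amount >= %s;",
            (cost, customer_id, cost),
        )
        if cur.rowcount == 0:  # insufficient funds: abort the whole transaction
            raise ValueError("insufficient balance")
        return trade_id
```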
Stockbroker – Trade-Execution Queue
- we use a message queue like Apache Kafka or Google Cloud Pub/Sub
- there is a set of topics that customer IDs map to. When a user makes a trade, the API server writes a row to the database and also creates a message that gets routed to that user’s topic (see the sketch after this card). This guarantees: 1. at-least-once delivery semantics (trades are never missed) and 2. only one worker ever tries to execute a given trade
- each topic’s subscriber is a cluster of worker servers. We use leader election for high availability. The leader executes the trade by contacting the exchange.
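A sketch of the topic routing from the second bullet, assuming kafka-python; the topic-naming scheme and topic count are placeholders:

```python
import json
from kafka import KafkaProducer  # sketch assuming the kafka-python client

NUM_TOPICS = 1000  # hypothetical number of trade topics

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode(),
    acks="all",  # don't treat the trade as enqueued until the broker has persisted it
)

def publish_trade(customer_id: int, trade: dict):
    # All of a customer's trades land on the same topic, so a single worker cluster
    # (and its elected leader) is responsible for executing them.
    topic = f"trades-{customer_id % NUM_TOPICS}"
    producer.send(topic, trade)
    producer.flush()
```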