high level overviews, storage, tables Flashcards
Code-Deployment System high level overview
Our system divides cleanly into 2 subsystems:
- Build System that builds code into binaries
- Deployment System that deploys binaries to our machines across the world
Code-Deployment System storage
We’ll use blob storage (Google Cloud Storage or S3) to store our code binaries. Blob storage makes sense here, because binaries are literally blobs of data.
Code-Deployment System table
- jobs table
- id: pk, auto-inc integer
- created_at: timestamp
- commit_sha: string
- name: string, the pointer to the job’s eventual binary in blob storage
- status: string, QUEUED, RUNNING, SUCCEEDED, FAILED
- table for replication status of blobs
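As a minimal sketch, the jobs table above could look like this (SQLite syntax via Python's sqlite3; the column types, the AUTOINCREMENT id, and the dequeue query are assumptions, not a prescribed implementation):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE jobs (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        commit_sha TEXT NOT NULL,
        name       TEXT NOT NULL,  -- pointer to the job's eventual binary in blob storage
        status     TEXT NOT NULL DEFAULT 'QUEUED'
                   CHECK (status IN ('QUEUED', 'RUNNING', 'SUCCEEDED', 'FAILED'))
    )
""")

# A build worker dequeues the oldest QUEUED job and marks it RUNNING.
conn.execute("INSERT INTO jobs (commit_sha, name) VALUES ('abc123', 'builds/abc123')")
row = conn.execute(
    "SELECT id FROM jobs WHERE status = 'QUEUED' ORDER BY created_at LIMIT 1"
).fetchone()
conn.execute("UPDATE jobs SET status = 'RUNNING' WHERE id = ?", (row[0],))
```

Treating the table itself as the job queue like this is one simple option; the status column doubles as the queue state.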
AlgoExpert high level overview
We can divide our system into 3 core components:
- Static UI content
- Accessing and interacting with questions (question completion status, saving solutions, etc.)
- Ability to run code
AlgoExpert storage
For the UI static content, we can put public assets like images and JS bundles in a blob store: S3 or GCS. Since we’re catering to a global audience and we care about having a responsive website, we want to use a CDN to serve that content. This is especially important for mobile, because phones tend to be on slower connections.
Static API content, like the list of questions and all solutions, also goes in a blob store for simplicity.
AlgoExpert table
Since this data will have to be queried a lot, a SQL db like Postgres or MySQL seems like a good choice.
Table 1. question_completion_status
- id: pk, auto-inc integer
- user_id
- question_id
- completion_status (enum)
Table 2. user_solutions
- id: pk, auto-inc integer
- user_id
- question_id
- language
- solution
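The two tables above could be sketched like this (SQLite syntax; the enum values and the unique keys on (user_id, question_id) are assumptions added so that marking a question complete is a simple upsert):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE question_completion_status (
        id                INTEGER PRIMARY KEY AUTOINCREMENT,
        user_id           TEXT NOT NULL,
        question_id       TEXT NOT NULL,
        completion_status TEXT NOT NULL
            CHECK (completion_status IN ('NOT_STARTED', 'IN_PROGRESS', 'COMPLETED')),
        UNIQUE (user_id, question_id)
    );
    CREATE TABLE user_solutions (
        id          INTEGER PRIMARY KEY AUTOINCREMENT,
        user_id     TEXT NOT NULL,
        question_id TEXT NOT NULL,
        language    TEXT NOT NULL,
        solution    TEXT NOT NULL,
        UNIQUE (user_id, question_id, language)
    );
""")

# Marking a question complete is an upsert keyed on (user_id, question_id).
conn.execute("""
    INSERT INTO question_completion_status (user_id, question_id, completion_status)
    VALUES (?, ?, ?)
    ON CONFLICT (user_id, question_id)
    DO UPDATE SET completion_status = excluded.completion_status
""", ("u1", "two-sum", "COMPLETED"))
```

The unique keys also give us the indexes we'd want, since these tables are queried per user and per question.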
Stockbroker high level overview
- the PlaceTrade API call that clients will make
- the API server(s) handling client API calls
- the system in charge of executing orders for each customer
Stockbroker table
- for trades
- id
- customer_id
- stock_ticker
- type: string, either BUY or SELL
- quantity: integer (no fractional shares)
- status: string, the status of the trade; starts as PLACED
- reason: string, the human readable justification of the trade’s status
- created_at: timestamp, the time when the trade was created
- for balances
- id, customer_id, amount, last_modified
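A sketch of the two tables and of settling a trade (SQLite syntax; the FILLED status, the fill price, and the settlement logic are hypothetical additions, since the card only specifies that trades start as PLACED):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trades (
        id           INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_id  TEXT NOT NULL,
        stock_ticker TEXT NOT NULL,
        type         TEXT NOT NULL CHECK (type IN ('BUY', 'SELL')),
        quantity     INTEGER NOT NULL,  -- no fractional shares
        status       TEXT NOT NULL DEFAULT 'PLACED',
        reason       TEXT,
        created_at   TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    );
    CREATE TABLE balances (
        id            INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_id   TEXT NOT NULL UNIQUE,
        amount        REAL NOT NULL,
        last_modified TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    );
""")

conn.execute("INSERT INTO balances (customer_id, amount) VALUES ('c1', 1000.0)")
conn.execute(
    "INSERT INTO trades (customer_id, stock_ticker, type, quantity) "
    "VALUES ('c1', 'AAPL', 'BUY', 2)"
)

# Settling a BUY: debit the balance and update the trade status in one
# transaction, so the two tables can never disagree.
price = 100.0  # hypothetical fill price
with conn:
    conn.execute(
        "UPDATE balances SET amount = amount - ? WHERE customer_id = 'c1'",
        (2 * price,),
    )
    conn.execute(
        "UPDATE trades SET status = 'FILLED', reason = 'filled at market' WHERE id = 1"
    )
```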
Amazon high level overview
There’s a USER side and a WAREHOUSE side.
Within a region, user and warehouse requests will get round-robin-load-balanced to respective sets of API servers, and data will be written to and read from a SQL database for that region.
We’ll go with a SQL db because all of the data is, by nature, structured and lends itself well to a relational model.
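The round-robin load balancing above can be sketched in a few lines (the server names are hypothetical, and a real balancer would also handle health checks and server churn):

```python
from itertools import cycle

# Per-region pool of API servers; the load balancer hands requests
# out in round-robin order.
user_api_servers = cycle(["api-1", "api-2", "api-3"])

def route(request):
    return next(user_api_servers)

targets = [route(f"req-{i}") for i in range(4)]
# wraps back to the first server after the third
```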
Amazon table
6 SQL tables
1. items (name, description, price, etc)
2. carts
3. orders
4. aggregated stock (all of the item stocks on Amazon that are relevant to users)
5. warehouse orders
6. warehouse stock (must have physicalStock and availableStock)
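One plausible reading of the physicalStock / availableStock split in table 6: an order reserves availableStock immediately, while physicalStock only drops when the item ships. A minimal sketch under that assumption (SQLite syntax; the guarded UPDATE prevents overselling):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE warehouse_stock (
        warehouse_id    TEXT NOT NULL,
        item_id         TEXT NOT NULL,
        physical_stock  INTEGER NOT NULL,  -- units actually on the shelf
        available_stock INTEGER NOT NULL CHECK (available_stock >= 0),
        PRIMARY KEY (warehouse_id, item_id)
    )
""")
conn.execute("INSERT INTO warehouse_stock VALUES ('w1', 'item42', 10, 10)")

# Placing an order reserves available_stock; the WHERE clause makes the
# reservation atomic, so concurrent orders can't oversell the item.
cur = conn.execute(
    "UPDATE warehouse_stock SET available_stock = available_stock - ? "
    "WHERE warehouse_id = 'w1' AND item_id = 'item42' AND available_stock >= ?",
    (3, 3),
)
order_placed = cur.rowcount == 1
```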
FB news feed high level overview
- 2 API calls, CreatePost and GetNewsFeed
- feed creation and storage strategy, then tie everything together
FB news feed storage
We can have one main relational database to store most of our system’s data, including posts and users. This database will have very large tables.
Google Drive high level overview
- we’ll need to support the following operations:
- files: upload, download, delete, rename, move
- folders: create, get, rename, delete, move
- design a storage solution for:
- entity (files and folders) metadata
- file content
Google Drive storage
To store entity info, we use K-V stores. Since we need high availability and data replication, we need to use something like Etcd, Zookeeper, or Google Cloud Spanner (as a K-V store) that gives us both of those guarantees as well as consistency (as opposed to DynamoDB, for instance, which would give us only eventual consistency).
To store file chunks, GCS.
To store blob reference counts, SQL table.
Google Drive table
For files and folders.
Both have: id, is_folder (boolean), name, owner_id, parent_id.
Difference: files have blobs (an array of blob hashes); folders have children (an array of child IDs).
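The entity records above could be sketched like this, with a plain dict standing in for the K-V store (the `create_entity` helper and UUID keys are illustrative assumptions, not part of the original design):

```python
import uuid

# In-memory stand-in for the K-V store; keys are entity IDs.
kv = {}

def create_entity(name, owner_id, parent_id, is_folder):
    entity_id = str(uuid.uuid4())
    entity = {
        "id": entity_id,
        "is_folder": is_folder,
        "name": name,
        "owner_id": owner_id,
        "parent_id": parent_id,
    }
    # The one structural difference: files point at content blobs,
    # folders point at child entities.
    if is_folder:
        entity["children"] = []
    else:
        entity["blobs"] = []  # array of content-addressed blob hashes
    kv[entity_id] = entity
    if parent_id is not None:
        kv[parent_id]["children"].append(entity_id)
    return entity_id

root = create_entity("My Drive", "u1", None, is_folder=True)
f = create_entity("notes.txt", "u1", root, is_folder=False)
```

Note that a rename or move only touches metadata (name or parent_id); the file's blobs are untouched, which is what makes those operations cheap.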
Netflix high level overview
- Storage (Video Content, Static Content, and User Metadata)
- General Client-Server Interaction (i.e., the life of a query)
- Video Content Delivery
- User-Activity Data Processing
Netflix storage
- Since we’re only dealing with a few hundred terabytes of video content, we can use a simple blob storage solution like S3 or GCS.
- Static content (titles, cast lists, descriptions) goes in a relational db or even a document store, and we can cache most of it in our API servers.
- User metadata in a classic relational db like Postgres.
- User activity logs in HDFS
Tinder high level overview
- Overview
- Profile Creation
- Deck Generation
- Swiping
- maybe super-liking and undoing