High-level overviews, storage, tables Flashcards

1
Q

Code-Deployment System high-level overview

A

Our system can be divided very simply into 2 clear subsystems:
- Build System that builds code into binaries
- Deployment System that deploys binaries to our machines across the world

2
Q

Code-Deployment System storage

A

We’ll use blob storage (Google Cloud Storage or S3) to store our code binaries. Blob storage makes sense here, because binaries are literally blobs of data.

3
Q

Code-Deployment System table

A
  1. jobs table
  • id: pk, auto-inc integer
  • created_at: timestamp
  • commit_sha: string
  • name: string, the pointer to the job’s eventual binary in blob storage
  • status: string, QUEUED, RUNNING, SUCCEEDED, FAILED
  2. table for replication status of blobs
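A rough sketch of the jobs table as DDL, run here against an in-memory SQLite stand-in (column names follow the card; the exact types and constraints are assumptions, and a real system would use Postgres/MySQL):

```python
import sqlite3

# Illustrative DDL for the jobs table described above; names come from the card,
# the types and the CHECK constraint are assumptions.
JOBS_DDL = """
CREATE TABLE jobs (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    created_at TIMESTAMP,
    commit_sha TEXT,
    name       TEXT,  -- pointer to the job's eventual binary in blob storage
    status     TEXT CHECK (status IN ('QUEUED', 'RUNNING', 'SUCCEEDED', 'FAILED'))
);
"""

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the real SQL database
conn.executescript(JOBS_DDL)
```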
4
Q

AlgoExpert high-level overview

A

We can divide our system into 3 core components:

  • Static UI content
  • Accessing and interacting with questions (question completion status, saving solutions, etc.)
  • Ability to run code
5
Q

AlgoExpert storage

A

For the UI static content, we can put public assets like images and JS bundles in a blob store: S3 or GCS. Since we’re catering to a global audience and we care about having a responsive website, we want to use a CDN to serve that content. This is especially important for mobile because of the slow connections that phones use.

Static API content, like the list of questions and all solutions, also goes in a blob store for simplicity.

6
Q

AlgoExpert table

A

Since this data will have to be queried a lot, a SQL db like Postgres or MySQL seems like a good choice.

Table 1. question_completion_status
- id: pk, auto-inc integer
- user_id
- question_id
- completion_status (enum)

Table 2. user_solutions
- id: pk, auto-inc integer
- user_id
- question_id
- language
- solution
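A rough sketch of both tables as DDL, using an in-memory SQLite stand-in (column names follow the card; the types, the UNIQUE constraints, and the example enum values are assumptions):

```python
import sqlite3

DDL = """
CREATE TABLE question_completion_status (
    id                INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id           TEXT,
    question_id       TEXT,
    completion_status TEXT,  -- enum-like, e.g. 'NOT_COMPLETED' / 'COMPLETED' (assumed values)
    UNIQUE (user_id, question_id)
);

CREATE TABLE user_solutions (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id     TEXT,
    question_id TEXT,
    language    TEXT,
    solution    TEXT,
    UNIQUE (user_id, question_id, language)
);
"""

sqlite3.connect(":memory:").executescript(DDL)
```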

7
Q

Stockbroker high-level overview

A
  • the PlaceTrade API call that clients will make
  • the API server(s) handling client API calls
  • the system in charge of executing orders for each customer
8
Q

Stockbroker table

A
  1. for trades
    - id
    - customer_id
    - stockTicker
    - type: string, either BUY or SELL
    - quantity: integer (no fractional shares)
    - status: string, the status of the trade; starts as PLACED
    - reason: string, the human readable justification of the trade’s status
    - created_at: timestamp, the time when the trade was created
  2. for balances
    - id, customer_id, amount, last_modified
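A rough sketch of the two tables as DDL, SQLite stand-in (names follow the card; types and constraints are assumptions):

```python
import sqlite3

DDL = """
CREATE TABLE trades (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_id  TEXT,
    stock_ticker TEXT,
    type         TEXT CHECK (type IN ('BUY', 'SELL')),
    quantity     INTEGER,            -- no fractional shares
    status       TEXT DEFAULT 'PLACED',
    reason       TEXT,               -- human-readable justification of the status
    created_at   TIMESTAMP
);

CREATE TABLE balances (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_id   TEXT,
    amount        REAL,
    last_modified TIMESTAMP
);
"""

sqlite3.connect(":memory:").executescript(DDL)
```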
9
Q

Amazon high-level overview

A

There’s a USER side and a WAREHOUSE side.

Within a region, user and warehouse requests will get round-robin-load-balanced to respective sets of API servers, and data will be written to and read from a SQL database for that region.

We’ll go with a SQL db because all of the data is, by nature, structured and lends itself well to a relational model.

10
Q

Amazon table

A

6 SQL tables
1. items (name, description, price, etc)
2. carts
3. orders
4. aggregated stock (all of the item stocks on Amazon that are relevant to users)
5. warehouse orders
6. warehouse stock (must have physicalStock and availableStock)
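A sketch of just the warehouse stock table, since the physicalStock/availableStock split is the interesting part (all columns other than those two are assumptions):

```python
import sqlite3

DDL = """
CREATE TABLE warehouse_stock (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    warehouse_id    TEXT,     -- assumed column
    item_id         TEXT,     -- assumed column
    physical_stock  INTEGER,  -- what is physically sitting in the warehouse
    available_stock INTEGER   -- presumably excludes stock already assigned to pending orders
);
"""

sqlite3.connect(":memory:").executescript(DDL)
```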

11
Q

FB news feed high-level overview

A
  1. 2 API calls, CreatePost and GetNewsFeed
  2. feed creation and storage strategy, then tying everything together
12
Q

FB news feed storage

A

We can have one main relational database to store most of our system’s data, including posts and users. This database will have very large tables.

13
Q

Google Drive high-level overview

A
  1. we’ll need to support the following operations:
    - files: upload, download, delete, rename, move
    - folders: create, get, rename, delete, move
  2. design storage solution for
    - entity (files and folders) metadata
    - file content
14
Q

Google Drive storage

A

To store entity info, we use K-V stores. Since we need high availability and data replication, we need to use something like Etcd, Zookeeper, or Google Cloud Spanner (as a K-V store) that gives us both of those guarantees as well as consistency (as opposed to DynamoDB, for instance, which would give us only eventual consistency).

To store file chunks, GCS.

To store blob reference counts, SQL table.

15
Q

Google Drive table

A

For files and folders.
Both have:
id, is_folder (true/false), name, owner_id, parent_id

Difference: files have blobs (an array of blob hashes); folders have children (an array of child IDs)
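A minimal sketch of what a file entity and a folder entity might look like as records in the K-V store (field names follow the card; the IDs and hashes are made up):

```python
# Hypothetical entity records; every ID and hash below is a made-up example.
file_entity = {
    "id": "entity-123",
    "is_folder": False,
    "name": "resume.pdf",
    "owner_id": "user-42",
    "parent_id": "entity-999",          # the containing folder
    "blobs": ["hash-aaa", "hash-bbb"],  # hashes of the file's content chunks in blob storage
}

folder_entity = {
    "id": "entity-999",
    "is_folder": True,
    "name": "Documents",
    "owner_id": "user-42",
    "parent_id": "root",
    "children": ["entity-123"],         # IDs of child entities
}
```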

16
Q

Netflix high-level overview

A
  • Storage (Video Content, Static Content, and User Metadata)
  • General Client-Server Interaction (i.e., the life of a query)
  • Video Content Delivery
  • User-Activity Data Processing
17
Q

Netflix storage

A
  • Since we’re only dealing with a few hundred terabytes of video content, we can use a simple blob storage solution like S3 or GCS.
  • Static content (titles, cast lists, descriptions) in a relational db or even in a document store, and we can cache most of it in our API servers.
  • User metadata in a classic relational db like Postgres.
  • User activity logs in HDFS
18
Q

Tinder high-level overview

A
  • Overview
  • Profile Creation
  • Deck Generation
  • Swiping
    maybe super-liking and undoing
19
Q

Tinder storage

A

Most of the data that we expect to store (profiles, decks, swipes, matches) is structured, so we’ll use SQL. All of this data will be stored in regional dbs, located based on user hot spots. We’ll have asynchronous replication between the regional dbs.

The only exception is users’ profile pics, which go in a global blob store and will be served via CDN.

20
Q

Tinder table

A

All SQL.
1. profiles (each row is a profile)
2. users’ decks (each row is one user’s deck)
3. swipes
4. matches

21
Q

Slack high-level overview

A

2 main sections:
- Handling what happens when a Slack app loads.
- Handling real-time messaging as well as cross-device synchronization.

22
Q

Slack storage

A

Since our tables will be very large, especially the messages table, we must shard. We can use a “smart” sharding solution; this service can be a strongly consistent key-value store like Etcd or ZooKeeper, mapping orgIds to shards.

23
Q

Slack table

A

SQL db since we’re dealing with structured data that’s queried frequently. Tables for:
1. channels
2. channel members (each row is a channel-member pair)
3. historical messages
4. latest channel timestamps
5. channel read receipts (lastSeen is updated whenever a user opens a channel)
6. number of unread user mentions
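A rough sketch of two of these tables, channel members and channel read receipts, as DDL (the card only names the tables, so most columns here are assumptions):

```python
import sqlite3

DDL = """
CREATE TABLE channel_members (
    channel_id TEXT,
    user_id    TEXT,
    PRIMARY KEY (channel_id, user_id)  -- each row is a channel-member pair
);

CREATE TABLE channel_read_receipts (
    channel_id TEXT,
    user_id    TEXT,
    last_seen  TIMESTAMP,              -- updated whenever the user opens the channel
    PRIMARY KEY (channel_id, user_id)
);
"""

sqlite3.connect(":memory:").executescript(DDL)
```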

24
Q

AirBNB high-level overview

A

HOST side and RENTER side. We further divide the renter side:
- Browsing listings.
- Getting a single listing.
- Reserving a listing.

25
Q

AirBNB storage

A

Listings and reservations in SQL tables (source of truth).

Since we care about the latency of browsing listings on Airbnb, and since this browsing requires querying listings based on their location, we can store our listings in a region quadtree.
Since we’ll be storing our quadtree in memory, we must ensure that a single machine failure doesn’t bring down the entire browsing functionality. So we can set up a cluster of machines, each holding an instance of our quadtree in memory, and these machines can use leader election.

26
Q

AirBNB table

A

SQL tables for
- listings
- reservations

27
Q

AlgoExpert load balancing

A

We can have 2 primary clusters of backend servers in the 2 important regions: the U.S. and India.

We can have some DNS load balancing to route API requests to the cluster closest to the user, and within a region, we can have path-based load balancing to separate our services (payments, authentication, code execution, etc.), especially since the code execution platform will probably need to run on different kinds of servers compared to those of the rest of the API.

28
Q

AlgoExpert caching

A

We can implement 2 layers of caching for static API content: client-side (so users only need to load questions once per session, and the load on our backend servers is reduced) and server-side.

29
Q

Stockbroker API call

A

PlaceTrade(customerId, stockTicker, type (BUY/SELL), quantity)
=>
(tradeId, stockTicker, type: string, quantity, createdAt, status (always PLACED))

We can imagine that a GetTrade API call could return the statuses IP, FILLED, and REJECTED, along with a reason.
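A sketch of the call in code; the shapes mirror the card, but place_trade and the Trade dataclass are illustrative, not a real API:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from uuid import uuid4

@dataclass
class Trade:
    trade_id: str
    stock_ticker: str
    type: str        # "BUY" or "SELL"
    quantity: int
    created_at: datetime
    status: str      # always "PLACED" in the PlaceTrade response

def place_trade(customer_id: str, stock_ticker: str, type: str, quantity: int) -> Trade:
    # A real implementation would write a row to the trades table and hand the
    # trade off for execution; this stub only returns the response shape.
    return Trade(
        trade_id=str(uuid4()),
        stock_ticker=stock_ticker,
        type=type,
        quantity=quantity,
        created_at=datetime.now(timezone.utc),
        status="PLACED",
    )
```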

30
Q

Stockbroker load balancing

A

We’ll need multiple API servers to handle all of the incoming requests. Since we don’t need any caching when making trades (don’t care about server stickiness), we can just use some round-robin load balancing to distribute incoming requests among our API servers.

31
Q

atomicity

A

Atomicity is a feature of database systems dictating that a transaction must be all-or-nothing.

32
Q

consistency

A

Only valid data will be written; data cannot be written that would violate the database’s own rules for valid data.

33
Q

Amazon caching

A

Users call the GetItemCatalog(search) endpoint when they’re searching for items. The request is routed by API servers to the smart search-results service, which interacts directly with the items table, caches popular item searches, and returns the results.

34
Q

FB news feed API call

A

CreatePost(user_id, post)

GetNewsFeed(user_id, pageSize, nextPageToken) =>
(posts: [], nextPageToken)

supports pagination
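A small sketch of how a client might page through the feed; get_news_feed here is a hypothetical stand-in for the GetNewsFeed endpoint, backed by a fake in-memory feed:

```python
from typing import Optional

FAKE_FEED = [f"post-{i}" for i in range(125)]  # made-up data

def get_news_feed(user_id: str, page_size: int, page_token: Optional[int]):
    # Stand-in for the real endpoint (user_id is ignored by this fake):
    # returns (posts, next_page_token), where the token is None once the feed is exhausted.
    start = page_token or 0
    next_token = start + page_size if start + page_size < len(FAKE_FEED) else None
    return FAKE_FEED[start:start + page_size], next_token

def fetch_entire_feed(user_id: str, page_size: int = 50) -> list:
    posts, token = [], None
    while True:
        page, token = get_news_feed(user_id, page_size, token)
        posts.extend(page)
        if token is None:
            return posts

assert len(fetch_entire_feed("user-1")) == 125
```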

35
Q

FB news feed sharding

A

Shard based on the user id, NOT the post id, because otherwise news feed generation would require cross-shard joins.
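A tiny sketch of the idea: the shard is a pure, stable function of the user id, so all of a user’s posts live on the same shard (the hash-mod scheme here is illustrative; a real system might use consistent hashing or a lookup service):

```python
import hashlib

NUM_SHARDS = 16  # illustrative

def shard_for_user(user_id: str) -> int:
    # Stable hash so every API server maps the same user to the same shard.
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS
```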

36
Q

G Drive sharding

A

We must shard our entity metadata across multiple clusters of K-V stores. Sharding on entityID means we’ll lose the ability to perform batch operations, which these K-V stores give us out of the box and which we’ll need when we move entities around.

We should shard based on ownerID, which means we can edit the metadata of multiple entities ATOMICALLY with a transaction.

37
Q

G Drive LB

A

Given the traffic that this website needs to serve, we can have a layer of proxies for entity information, load-balanced on a hash of the ownerID. The proxies could also handle some caching.

38
Q

Netflix sharding

A

We can split our user-metadata db into a handful of shards, based on user id, each managing anywhere between 1 and 10 TB of indexed data. This will maintain very quick reads and writes for a given user.

39
Q

write-through

A

A storage method in which data is written to the cache and to the corresponding backing store (e.g., main memory or the database) at the same time.
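A toy sketch of write-through, with a plain dict standing in for the backing store (main memory or a database):

```python
class WriteThroughCache:
    """Toy write-through cache: every write goes to the cache and to the
    backing store at the same time, so the two never disagree."""

    def __init__(self, backing_store: dict):
        self.cache = {}
        self.backing_store = backing_store

    def write(self, key, value):
        self.cache[key] = value          # write to the cache...
        self.backing_store[key] = value  # ...and to the backing store, synchronously

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.backing_store[key]  # cache miss: fall back to the backing store
        self.cache[key] = value
        return value
```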

40
Q

Netflix LB

A

We can use round-robin load balancing to distribute user network requests across our API servers, which can then load-balance db requests according to userId.

41
Q

Netflix caching

A

We can cache static content in our API servers, periodically updating it when new movies and shows are released, and we can also cache user metadata there, using write-through.

42
Q

Slack sharding

A

Since our tables are very large, especially the messages table, we must shard.

Natural approach: shard based on org size. Biggest orgs in their own shards; smaller organizations grouped together in other shards.

BUT we’ll have problems when an org’s size increases dramatically or when activity surges within an org. Hotspots mean latency goes up.

So we add a service that’ll asynchronously measure org activity and “rebalance” shards accordingly. This service can be a strongly consistent K-V store (Etcd / ZooKeeper), mapping orgIds to shards. Our API servers communicate with this service to know which shard to route requests to.

43
Q

idempotent

A

A property of some operations such that no matter how many times you execute them, you achieve the same result.
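A tiny illustration: setting a value is idempotent, incrementing it is not:

```python
balance = {"amount": 0}

def set_amount(value):   # idempotent: retrying it leaves the same state
    balance["amount"] = value

def add_amount(value):   # not idempotent: every retry changes the result
    balance["amount"] += value

set_amount(10); set_amount(10)
assert balance["amount"] == 10
add_amount(10); add_amount(10)
assert balance["amount"] == 30
```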

44
Q

Slack LB

A

We’ll want a load balancer between the clients and the API servers that subscribe to Kafka topics; this load balancer will also talk to the “smart” sharding service to match clients with the right API servers.

45
Q

Airbnb LB

A

Host side - requests to create and delete listings are load-balanced across a set of API servers using round-robin. The API servers are in charge of writing to the SQL db.

Renter side - requests to list, get, and reserve listings are load-balanced across a set of API servers using an API-path-based server-selection strategy.

Note: NO caching at our API servers; we’d run into stale data as reservations and listings come in.