System Design - Random concepts Flashcards
What is a reverse index (in the context of database keyword search)?
A reverse index (more commonly called an inverted index) is a data structure that maps each keyword to the list of documents (or statuses) that contain it. A search query becomes one lookup per keyword, so the search service can retrieve relevant posts without scanning the entire database, which improves search performance and scalability.
https://www.cockroachlabs.com/blog/inverted-indexes/
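A minimal in-memory sketch of the idea, assuming whitespace tokenization and AND semantics for multi-word queries. The names build_inverted_index, search, and posts are made up for illustration; a real search service would also handle punctuation, stemming, and ranking.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercase keyword to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents that contain every keyword in the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Start from the first keyword's posting set, then intersect with the rest.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical sample data.
posts = {
    1: "consistent hashing for sharding",
    2: "sharding a video database",
    3: "video search with an inverted index",
}
index = build_inverted_index(posts)
print(search(index, "video sharding"))
```

Each query touches only the posting sets of its own keywords, which is why the cost does not grow with the number of unrelated documents.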
When building a search service, what are the benefits of pruning?
You can save storage space by pruning (deleting) rarely accessed search queries.
What are bi-grams and tri-grams in the context of keyword search?
A bigram is a sequence of two consecutive characters or words, and a trigram is a sequence of three. Both are used in keyword search, text analysis, and natural language processing. Indexing these small overlapping pieces instead of whole words lets a search match partial words and tolerate misspellings, which improves search accuracy.
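A short sketch of how bigrams and trigrams can be generated, at both character and word level. The function names char_ngrams and word_ngrams are illustrative, not from any particular library.

```python
def char_ngrams(text, n):
    """Return every run of n consecutive characters in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def word_ngrams(text, n):
    """Return every run of n consecutive whitespace-separated words in text."""
    words = text.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

print(char_ngrams("search", 2))  # character bigrams
print(char_ngrams("search", 3))  # character trigrams
print(word_ngrams("fast keyword search", 2))  # word bigrams
```

Two strings that share many trigrams are likely to be similar, which is the basis for fuzzy matching on top of an n-gram index.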
How did YouTube's DB architecture change over time?
- Single DB
- Leader with 2 Read Replicas
- Multiple Shards. Each Shard has a Leader and 2 Replicas
- Multiple Regions (availability zones) with DB replication between zones
- Started using Vitess, which handles sharding. Involves using a keyspace (a logical database that Vitess maps onto one or more shards)
How can you avoid cross-shard join queries?
1) One way is to denormalize the data. This means storing a copy of the related fields in the same table, so that no join is needed. It adds some complexity, because there needs to be a process that checks the denormalized copies stay consistent with the source data.
2) Try to keep related tables in the same shard. This could include Posts, Likes, and Comments.
3) Another way is to do application-level joins. Instead of a database-level join, query the data from each shard separately and combine the results at the application level.
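A sketch of option 3, assuming posts and users live on different shards. The two shards are stood in for by in-memory Python structures, and fetch_posts, fetch_users, and posts_with_authors are hypothetical helpers; in a real system each fetch would be a separate query to its own shard.

```python
# Hypothetical in-memory stand-ins for two separate shards.
USERS_SHARD = {
    1: {"id": 1, "name": "Ada"},
    2: {"id": 2, "name": "Lin"},
}
POSTS_SHARD = [
    {"id": 10, "author_id": 1, "title": "Sharding notes"},
    {"id": 11, "author_id": 2, "title": "Replica lag"},
    {"id": 12, "author_id": 1, "title": "Keyspaces"},
]

def fetch_posts():
    """Query 1: read posts from the posts shard."""
    return list(POSTS_SHARD)

def fetch_users(user_ids):
    """Query 2: batch-read only the needed users from the users shard."""
    return {uid: USERS_SHARD[uid] for uid in user_ids if uid in USERS_SHARD}

def posts_with_authors():
    """Join posts to authors in application code instead of in the database."""
    posts = fetch_posts()
    author_ids = {post["author_id"] for post in posts}
    users = fetch_users(author_ids)
    return [
        {"title": post["title"], "author": users[post["author_id"]]["name"]}
        for post in posts
        if post["author_id"] in users
    ]

print(posts_with_authors())
```

Collecting the author ids first and fetching them in one batch avoids issuing one user query per post.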
What is consistent hashing?
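It is a way of distributing keys evenly across nodes.
When a node is added or removed, only a small fraction of the keys has to move, so minimal data is sent over the network.
It is used for data partitioning and load balancing.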
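A minimal hash-ring sketch of the idea, assuming MD5 as the hash function and 100 virtual nodes per physical node (both arbitrary choices for illustration). HashRing and the node and key names are hypothetical.

```python
import bisect
import hashlib

def _hash(key):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes  # virtual nodes per physical node, for a more even spread
        self._ring = []       # sorted list of (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def get_node(self, key):
        """Return the node owning the first ring position at or after the key's hash."""
        if not self._ring:
            raise ValueError("ring is empty")
        idx = bisect.bisect_right(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]  # wrap around past the end

ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.get_node(k) for k in keys}
ring.add_node("node-d")
after = {k: ring.get_node(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved after adding a node")
```

Every key that moves lands on the newly added node, and the rest stay where they were. With a plain hash(key) % N scheme, changing N would remap most keys instead.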