System Design - Random concepts Flashcards

1
Q

What is an inverted index (sometimes called a reverse index) in the context of database keyword search?

A

An inverted index (the usual name for what is sometimes called a reverse index) is a data structure that maps each keyword to the list of documents (or statuses) that contain it, enabling fast lookups for search queries. This matters because the search service can retrieve relevant posts without scanning the entire database, which significantly improves search performance and scalability.

https://www.cockroachlabs.com/blog/inverted-indexes/
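
A minimal sketch of the idea in Python, assuming whitespace tokenization and an in-memory dictionary. The names `build_inverted_index`, `search`, and the sample `posts` are hypothetical and only for illustration; a real search service would use a proper tokenizer and a persistent index.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercase keyword to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return IDs of documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Start from the first word's posting set, then intersect with the rest.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical sample posts, keyed by document ID.
posts = {
    1: "consistent hashing for load balancing",
    2: "load testing a search service",
    3: "sharding and consistent hashing",
}
index = build_inverted_index(posts)
print(search(index, "consistent hashing"))
```

Each lookup touches only the posting sets of the query words, not every document, which is where the speedup over a full scan comes from.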

2
Q

When building a search service, what are the benefits of pruning?

A

You can save storage space by pruning (deleting) rarely accessed search queries.
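
One possible way to prune, sketched in Python under the assumption that "rarely accessed" means "not accessed within some time window". The function `prune_queries` and the timestamp layout are hypothetical; a count-based threshold would work just as well.

```python
def prune_queries(last_accessed, now, max_age_seconds):
    """Return a copy of last_accessed without entries older than max_age_seconds.

    last_accessed maps a search query to the Unix timestamp of its last use.
    """
    return {
        query: ts
        for query, ts in last_accessed.items()
        if now - ts <= max_age_seconds
    }

# Hypothetical data: timestamps are seconds since the epoch.
DAY = 24 * 60 * 60
now = 1_000 * DAY
stored = {
    "system design": now - 1 * DAY,      # used yesterday, kept
    "consistent hashing": now - 5 * DAY, # used this week, kept
    "obscure typo qwery": now - 90 * DAY # stale, pruned
}
kept = prune_queries(stored, now, max_age_seconds=30 * DAY)
print(sorted(kept))
```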

3
Q

What are bi-grams and tri-grams in the context of keyword search?

A

A bigram is a sequence of two consecutive characters or words, and a trigram is a sequence of three. Both are used in keyword search design, commonly in text analysis, search engines, and natural language processing. Indexing these smaller overlapping pieces of words or phrases improves search accuracy, for example by letting a query match on partial words or tolerate small typos.
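
A short Python sketch of how n-grams can be generated, for both characters and words. The helper names `char_ngrams` and `word_ngrams` are hypothetical, and splitting words on whitespace is a simplifying assumption.

```python
def char_ngrams(text, n):
    """Return the overlapping character n-grams of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def word_ngrams(text, n):
    """Return the overlapping word n-grams of text as tuples, in order."""
    words = text.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

print(char_ngrams("search", 2))  # ['se', 'ea', 'ar', 'rc', 'ch']
print(char_ngrams("search", 3))  # ['sea', 'ear', 'arc', 'rch']
print(word_ngrams("fast keyword search", 2))
```

Two strings that share many trigrams are likely to be similar, which is why a misspelled query such as "serch" can still be matched to "search".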

4
Q

How did YouTube's DB architecture change over time?

A
  1. Single DB
  2. Leader with 2 read replicas
  3. Multiple shards. Each shard has a leader and 2 replicas
  4. Multiple regions (or availability zones) with DB replication between them
  5. Started using Vitess, which handles sharding. Data is organized into keyspaces (logical databases that Vitess maps onto one or more shards)
5
Q

How can you avoid cross-shard join queries?

A

1) One way is to denormalize the data. This means storing a copy of the fields you would otherwise join for (not just the foreign key) in the same table, so a single-shard query returns everything. This adds complexity, because a process is needed to keep the denormalized copies consistent with the source data.
2) Try to keep related tables in the same shard. This could include Posts, Likes, and Comments.
3) Another way is to do application-level joins. Instead of a database-level join, query the data from multiple shards separately and combine the results at the application level.
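
The application-level join in option 3 can be sketched in Python as follows. The two "shards" are hypothetical in-memory stand-ins, and `posts_with_author` is an illustrative name; a real service would issue one batched query per shard over the network.

```python
# Hypothetical stand-ins for data living on two different shards.
users_shard = {1: {"name": "Ada"}, 2: {"name": "Linus"}}
posts_shard = [
    {"post_id": 10, "user_id": 1, "title": "Sharding notes"},
    {"post_id": 11, "user_id": 2, "title": "Replication notes"},
    {"post_id": 12, "user_id": 1, "title": "Caching notes"},
]

def posts_with_author(posts, users):
    """Join posts to users in application code instead of in the database."""
    # Step 1: collect the foreign keys needed from the other shard.
    user_ids = {post["user_id"] for post in posts}
    # Step 2: fetch only those users (one batched lookup, not one per post).
    fetched = {uid: users[uid] for uid in user_ids if uid in users}
    # Step 3: combine the two result sets in memory.
    return [
        {"title": post["title"], "author": fetched[post["user_id"]]["name"]}
        for post in posts
        if post["user_id"] in fetched
    ]

joined = posts_with_author(posts_shard, users_shard)
for row in joined:
    print(row)
```

Batching the lookups in step 2 keeps the number of round trips at one per shard rather than one per row.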

6
Q

What is consistent hashing?

A

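It is a way of distributing keys evenly across a set of nodes.
When a node is added or removed, only a small fraction of the keys (roughly K/N for K keys and N nodes) has to move, so minimal data is sent between nodes.

It is used for data partitioning and load balancing.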
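
A minimal hash ring sketch in Python, assuming MD5 as the hash function and 100 virtual nodes per physical node. Both choices, along with the `HashRing` class and the node names, are illustrative assumptions, not a specific system's implementation.

```python
import bisect
import hashlib

def _hash(key):
    """Hash a string to an integer position on the ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node gets several virtual positions to even out the load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def get_node(self, key):
        """Return the first node at or clockwise from the key's position."""
        if not self._ring:
            raise ValueError("ring is empty")
        idx = bisect.bisect_right(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]  # wrap around the ring

ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.get_node(k) for k in keys}
ring.add_node("node-d")
after = {k: ring.get_node(k) for k in keys}
moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} keys moved after adding a node")
```

Every key that moves lands on the new node; keys never shuffle between the existing nodes. With a plain `hash(key) % N` scheme, by contrast, changing N would remap most keys.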
