System Design Week 2 - Search Engine Flashcards

1
Q

Problem 1) Search Engine

Problem statement: Design a search engine system for an available dataset.
Clarifying questions:
Question 1: Is the data constantly changing, or is it static? For now, assume static data.
Question 2: What is an example of a query and its expected output?
→ Brutus AND Caesar: should return the list of documents in which both keywords are present.
→ Caesar AND Brutus NOT Calpurnia: should return the documents that contain Caesar and Brutus but not Calpurnia.

Question 1) List down the functional requirements

A
  • Given static content data, the data is available to search.
  • Users should be able to search the content by keyword.
  • User persistence
  • Security is out of scope.
2
Q

Question 2) List down the non-functional requirements

A
  • Corpus size: hundreds of billions of documents
  • Throughput: 70,000 queries per second (QPS)
  • High availability
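
To get a feel for what these numbers imply, here is a rough back-of-envelope sketch. Only the 70,000 QPS target and the "hundreds of billions" order of magnitude come from the requirements above. The exact corpus size, the per-shard document capacity, and the per-replica query rate are hypothetical placeholders, not measurements. The sketch also assumes the worst case where every query fans out to every shard.

```python
import math

# From the requirements above.
TARGET_QPS = 70_000

# Assumed example value for "hundreds of billions of documents".
CORPUS_DOCS = 200_000_000_000

# Hypothetical placeholders, not measurements.
DOCS_PER_SHARD = 100_000_000   # documents one shard can index
QPS_PER_REPLICA = 500          # queries per second one replica can serve


def estimate_capacity(corpus_docs, target_qps, docs_per_shard, qps_per_replica):
    """Return (shards, replicas_per_shard, total_nodes).

    Assumes every query is sent to every shard, so each shard must
    sustain the full target QPS across its replicas.
    """
    shards = math.ceil(corpus_docs / docs_per_shard)
    replicas_per_shard = math.ceil(target_qps / qps_per_replica)
    return shards, replicas_per_shard, shards * replicas_per_shard


shards, replicas, total = estimate_capacity(
    CORPUS_DOCS, TARGET_QPS, DOCS_PER_SHARD, QPS_PER_REPLICA
)
print(f"shards={shards}, replicas per shard={replicas}, total nodes={total}")
```

Routing each query only to the shards that contain its keywords, rather than to all shards, would lower the replica count per shard.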
3
Q

Question 3) What are the microservices?

A
  • Search Service
  • Indexing Service
4
Q

Question 4) What is the logical architecture?

A

Diagram: https://drive.google.com/file/d/1pyhZdUyBzeEhK5K6TipXgaoWnt8l9KLg/view?usp=sharing

5
Q

Question 5) What is the schema design?

A
  • Key-value store (NoSQL): keyword → list of documents that contain it
  • Object store: the documents themselves, which are large
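
A minimal in-memory sketch of these two stores may make the split clearer. The class names `PostingStore` and `ObjectStore` are hypothetical, plain dictionaries stand in for the real NoSQL and object-store backends, and integer document IDs are an assumption.

```python
import bisect
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PostingStore:
    """Stand-in for the key-value store: keyword -> sorted list of doc IDs."""
    postings: Dict[str, List[int]] = field(default_factory=dict)

    def add(self, keyword: str, doc_id: int) -> None:
        ids = self.postings.setdefault(keyword, [])
        # Insert in sorted position and skip IDs that are already present.
        pos = bisect.bisect_left(ids, doc_id)
        if pos == len(ids) or ids[pos] != doc_id:
            ids.insert(pos, doc_id)

    def get(self, keyword: str) -> List[int]:
        return self.postings.get(keyword, [])


@dataclass
class ObjectStore:
    """Stand-in for the object store: doc ID -> raw document bytes."""
    blobs: Dict[int, bytes] = field(default_factory=dict)

    def put(self, doc_id: int, content: bytes) -> None:
        self.blobs[doc_id] = content

    def get(self, doc_id: int) -> bytes:
        return self.blobs[doc_id]


index = PostingStore()
documents = ObjectStore()
documents.put(4, b"Brutus speaks")
documents.put(1, b"Brutus and Caesar")
for doc_id in (4, 1, 4):
    index.add("brutus", doc_id)
print(index.get("brutus"))
```

The key-value side holds only small, sorted ID lists, so a lookup never has to touch the large document bodies.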
6
Q

Question 6) What is the API design?

A
  • Search(word)
  • Search(expression)
  • Index(document)
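
The three calls above could look roughly like the following single-process sketch. The class name `SearchEngine`, the method names `search_word` and `search_expression`, and the tiny expression grammar (terms joined by AND, NOT, or AND NOT, evaluated left to right) are assumptions made for illustration. The card only names the three operations.

```python
import re
from collections import defaultdict


class SearchEngine:
    def __init__(self):
        self._postings = defaultdict(set)  # keyword -> set of doc IDs
        self._next_id = 1

    def index(self, document):
        """Index(document): tokenize the text and record each keyword."""
        doc_id = self._next_id
        self._next_id += 1
        for word in re.findall(r"[a-z0-9]+", document.lower()):
            self._postings[word].add(doc_id)
        return doc_id

    def search_word(self, word):
        """Search(word): sorted IDs of documents containing the keyword."""
        return sorted(self._postings.get(word.lower(), set()))

    def search_expression(self, expression):
        """Search(expression): evaluate 'a AND b NOT c' left to right."""
        tokens = expression.lower().split()
        if not tokens:
            return []
        result = set(self._postings.get(tokens[0], set()))
        i = 1
        while i < len(tokens):
            start = i
            if tokens[i] == "and":
                i += 1
            negate = i < len(tokens) and tokens[i] == "not"
            if negate:
                i += 1
            if i == start:
                raise ValueError(f"expected AND or NOT before {tokens[i]!r}")
            if i >= len(tokens):
                raise ValueError("expression ends with an operator")
            docs = self._postings.get(tokens[i], set())
            result = result - docs if negate else result & docs
            i += 1
        return sorted(result)


engine = SearchEngine()
engine.index("Brutus killed Caesar")
engine.index("Caesar and Calpurnia and Brutus")
engine.index("Caesar alone")
print(engine.search_word("Caesar"))
print(engine.search_expression("Brutus and Caesar"))
print(engine.search_expression("caesar and Brutus Not calpurnia"))
```

In the two-service split, `index` would belong to the Indexing Service and the two search methods to the Search Service.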
7
Q

Question 7) What is the Business Logic?

A

The core business logic is to index the documents and store the data across different shards.
The reverse indexer (inverted index) keeps track of which databases (shards) contain each keyword.
If a keyword is searched together with another word (for example, Brutus AND Caesar), the reverse indexer returns the shards to look in for each word, and only the shards common to both are scanned to find the actual documents.
Vertical sharding partitions the data by value. Each shard stores its data in sorted order:
[keyword] -> doc1, doc2, doc3, ...
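
The lookup flow described above can be sketched as follows. The shard contents, the `keyword_to_shards` map, and the function names are hypothetical sample data and helpers, not part of the original design. Because each posting list is kept sorted, two lists can be intersected with a single linear merge pass.

```python
def intersect_sorted(a, b):
    """Intersect two sorted lists of doc IDs in one linear pass."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def search_and(word1, word2, keyword_to_shards, shards):
    """Answer 'word1 AND word2' by scanning only the shards that hold both."""
    common = keyword_to_shards.get(word1, set()) & keyword_to_shards.get(word2, set())
    results = []
    for shard_id in sorted(common):
        shard = shards[shard_id]
        results.extend(intersect_sorted(shard[word1], shard[word2]))
    return sorted(results)


# Hypothetical sample data: shard ID -> {keyword -> sorted doc IDs}.
shards = {
    0: {"brutus": [1, 2, 4], "caesar": [1, 2, 5]},
    1: {"brutus": [11, 31], "calpurnia": [31, 54]},
    2: {"brutus": [173, 174], "caesar": [101, 173]},
}

# Reverse indexer: keyword -> set of shards that contain it.
keyword_to_shards = {}
for shard_id, index in shards.items():
    for keyword in index:
        keyword_to_shards.setdefault(keyword, set()).add(shard_id)

print(search_and("brutus", "caesar", keyword_to_shards, shards))
```

Shard 1 is never scanned for this query because it holds no entry for "caesar", which is the saving the reverse indexer provides.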

8
Q

Question 8) What are the design considerations?

A

Design Considerations: https://drive.google.com/file/d/1_4ZAENR6XOtRIctvrfixoD55g0nGWx2w/view?usp=sharing
