System Design Week 2 - Search Engine Flashcards
Problem 1) Search Engine
Problem statement : Design a search engine system based on an available dataset.
Clarifying questions :
Question 1 : Is the data constantly changing, or is it static? For now, assume static data.
Question 2 : What is an example query and its expected output?
→ Brutus AND Caesar : should return the list of documents that contain both keywords.
→ Caesar AND Brutus NOT Calpurnia : should return the documents that contain both Caesar and Brutus but not Calpurnia.
Question 1) List down the functional requirements
- The static content dataset is available to search.
- Users should be able to search the content by keyword or by a Boolean expression of keywords (AND, NOT).
- User persistence
- Security is out of scope.
Question 2) List down the non-functional requirements
- Corpus size: hundreds of billions of documents
- Throughput: 70,000 queries per second (QPS)
- High availability
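A quick back-of-envelope check on the throughput target. The per-node capacity and the peak factor below are hypothetical assumptions chosen only to show the arithmetic; real numbers would come from load testing.

```python
import math

TARGET_QPS = 70_000              # from the non-functional requirements
SECONDS_PER_DAY = 24 * 60 * 60

# Hypothetical assumptions, not from the requirements:
PER_NODE_QPS = 1_000             # queries a single search node can serve per second
PEAK_FACTOR = 2                  # headroom for peak traffic over the average

queries_per_day = TARGET_QPS * SECONDS_PER_DAY
nodes_needed = math.ceil(TARGET_QPS * PEAK_FACTOR / PER_NODE_QPS)

print(f"{queries_per_day:,} queries/day")  # 6,048,000,000 queries/day
print(f"{nodes_needed} search nodes")       # 140 search nodes
```

Doubling the assumed per-node capacity halves the node count, so this estimate is only as good as the capacity figure plugged into it.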
Question 3) What are the microservices?
- Search Service
- Indexing Service
Question 4) What is the logical architecture?
Diagram: https://drive.google.com/file/d/1pyhZdUyBzeEhK5K6TipXgaoWnt8l9KLg/view?usp=sharing
Question 5) What is the schema design?
- Key-value store (NoSQL): keyword → list of document IDs
- Large documents: object store
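A minimal sketch of this two-store layout, assuming plain Python dicts stand in for the NoSQL key-value store and the object store, and that documents are tokenized on whitespace. `PostingList` and `put_document` are hypothetical names used only for illustration.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class PostingList:
    """Value in the key-value store: one keyword's sorted document IDs."""
    keyword: str
    doc_ids: list[int] = field(default_factory=list)


# Stand-in for the NoSQL key-value store: keyword -> PostingList.
kv_store: dict[str, PostingList] = {}

# Stand-in for the object store: document ID -> raw document bytes.
object_store: dict[int, bytes] = {}


def put_document(doc_id: int, body: bytes) -> None:
    """Save the full document to the object store and index its keywords."""
    object_store[doc_id] = body
    # Whitespace tokenization is a simplifying assumption.
    for word in set(body.decode("utf-8").lower().split()):
        posting = kv_store.setdefault(word, PostingList(word))
        bisect.insort(posting.doc_ids, doc_id)  # keep the ID list sorted


put_document(2, b"Caesar and Calpurnia")
put_document(1, b"Brutus killed Caesar")

print(kv_store["caesar"].doc_ids)  # [1, 2]
print(object_store[2])             # b'Caesar and Calpurnia'
```

Only small document IDs live in the key-value store; the large document bodies stay in the object store and are fetched by ID when needed.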
Question 6) What is the API design?
- Search(word)
- Search(expression)
- Index(document)
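The three calls above can be sketched as a small in-memory class. This is a hypothetical stand-in for the Search and Indexing services, not a real library API; because Python has no method overloading, Search(word) and Search(expression) become two methods, and the expression form is assumed to be a left-to-right chain of AND / NOT terms.

```python
import re


class SearchEngine:
    """In-memory stand-in for the Search and Indexing services (hypothetical)."""

    def __init__(self) -> None:
        self.index: dict[str, set[str]] = {}  # keyword -> document IDs

    def index_document(self, doc_id: str, text: str) -> None:
        """Index(document): add every keyword in `text` to the inverted index."""
        for word in re.findall(r"\w+", text.lower()):
            self.index.setdefault(word, set()).add(doc_id)

    def search_word(self, word: str) -> set[str]:
        """Search(word): documents containing a single keyword (returns a copy)."""
        return set(self.index.get(word.lower(), set()))

    def search_expression(self, expression: str) -> set[str]:
        """Search(expression): evaluate "w1 AND w2 ... NOT w3" left to right.

        OR and parentheses are not supported in this sketch.
        """
        tokens = expression.split()
        if not tokens:
            return set()
        result = self.search_word(tokens[0])
        for op, word in zip(tokens[1::2], tokens[2::2]):
            if op.upper() == "AND":
                result &= self.search_word(word)
            elif op.upper() == "NOT":
                result -= self.search_word(word)
            else:
                raise ValueError(f"unsupported operator: {op}")
        return result


engine = SearchEngine()
engine.index_document("d1", "Brutus killed Caesar")
engine.index_document("d2", "Caesar married Calpurnia; Brutus watched")
engine.index_document("d3", "Caesar alone")

print(sorted(engine.search_expression("Brutus and Caesar")))                # ['d1', 'd2']
print(sorted(engine.search_expression("Caesar AND Brutus NOT Calpurnia")))  # ['d1']
```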
Question 7) What is the Business Logic?
The core business logic is indexing the documents and storing the index data across different shards.
The reverse indexer keeps track of which databases (shards) contain each keyword.
If a keyword is searched together with another keyword (for example, Brutus AND Caesar), the reverse indexer returns the shards to look in for each keyword; only the shards common to both are scanned to find the actual documents.
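A sketch of the shard lookup described above, assuming documents are partitioned across shards and each shard keeps its own local inverted index, so a document containing both keywords must live on a shard that contains both. The shard names, data, and the functions `build_reverse_indexer` and `search_all` are hypothetical. Scanning only the common shards is correct for AND queries under this assumption.

```python
# Hypothetical layout: each shard holds a subset of documents plus a local
# inverted index (keyword -> document IDs). The data is made up.
Shards = dict[str, dict[str, set[int]]]

shards: Shards = {
    "shard-a": {"brutus": {1}, "caesar": {1, 2}},
    "shard-b": {"caesar": {3}, "calpurnia": {3}},
    "shard-c": {"brutus": {5}, "caesar": {5, 6}},
}


def build_reverse_indexer(shards: Shards) -> dict[str, set[str]]:
    """Map each keyword to the set of shards that contain it."""
    reverse: dict[str, set[str]] = {}
    for shard_name, local_index in shards.items():
        for keyword in local_index:
            reverse.setdefault(keyword, set()).add(shard_name)
    return reverse


def search_all(keywords: list[str], shards: Shards,
               reverse: dict[str, set[str]]) -> set[int]:
    """AND query: scan only the shards that contain every keyword."""
    if not keywords:
        return set()
    common_shards = set.intersection(*(reverse.get(k, set()) for k in keywords))
    matches: set[int] = set()
    for shard_name in common_shards:
        local_index = shards[shard_name]
        # Every keyword is present on a common shard, so these lookups succeed.
        matches |= set.intersection(*(local_index[k] for k in keywords))
    return matches


reverse_indexer = build_reverse_indexer(shards)
print(sorted(reverse_indexer["brutus"]))                                  # ['shard-a', 'shard-c']
print(sorted(search_all(["brutus", "caesar"], shards, reverse_indexer)))  # [1, 5]
```

Here shard-b is never scanned for "Brutus AND Caesar", because the reverse indexer shows it holds no Brutus documents.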
Vertical sharding is key: the data is sharded on its values. Sharded data should be stored sorted, for example each keyword's document list:
[keyword] -> doc1, doc2, doc3, ...
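Keeping each document list sorted lets an AND query merge two lists in one linear pass instead of comparing every pair, and the same walk handles NOT. A minimal sketch of that merge; the function names and document IDs are hypothetical.

```python
def intersect_sorted(a: list[int], b: list[int]) -> list[int]:
    """AND: IDs present in both sorted lists, in O(len(a) + len(b)) time."""
    i, j, out = 0, 0, []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def difference_sorted(a: list[int], b: list[int]) -> list[int]:
    """NOT: IDs in sorted list `a` that do not appear in sorted list `b`."""
    i, j, out = 0, 0, []
    while i < len(a):
        if j < len(b) and b[j] < a[i]:
            j += 1              # skip IDs in `b` smaller than the current `a` ID
        elif j < len(b) and b[j] == a[i]:
            i += 1              # excluded ID: drop it from the result
            j += 1
        else:
            out.append(a[i])    # no matching ID in `b`: keep it
            i += 1
    return out


# Hypothetical sorted document lists for three keywords.
brutus = [1, 2, 4, 11, 31]
caesar = [1, 2, 4, 5, 6, 16]
calpurnia = [2, 31, 54]

both = intersect_sorted(brutus, caesar)
print(both)                                # [1, 2, 4]
print(difference_sorted(both, calpurnia))  # [1, 4]
```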
Question 8) What are the Design Considerations?
Design Considerations: https://drive.google.com/file/d/1_4ZAENR6XOtRIctvrfixoD55g0nGWx2w/view?usp=sharing