Distributed Email System Flashcards

1
Q

What are the functional requirements?

A
  • Send and receive emails.
  • Fetch all emails.
  • Filter emails by read and unread status.
  • Search emails by subject, sender, and body.
  • Anti-spam and anti-virus.
2
Q

What are the non-functional requirements?

A

Reliability. We should not lose email data.
Availability. Email and user data should be automatically replicated across multiple nodes to ensure availability. The system should also continue to function despite partial failures.
Scalability. The system should handle a growing number of users and emails without performance degrading.
Flexibility and extensibility. A flexible, extensible system lets us add new features or improve performance easily by adding new components. Traditional email protocols such as POP and IMAP have very limited functionality (more on this in high-level design), so we may need custom protocols to satisfy the flexibility and extensibility requirements.

3
Q

Back-of-the-envelope estimation

A

Let’s do a back-of-the-envelope calculation to determine the scale and to discover some challenges our solution will need to address. By design, email is a storage-heavy application.

  • 1 billion users.
  • Assume the average person sends 10 emails per day. QPS for sending emails = 10^9 * 10 / 10^5 = 100,000, rounding the 86,400 seconds in a day up to 10^5.
  • Assume the average number of emails a person receives in a day is 40 [3] and the average size of email metadata is 50KB. Metadata refers to everything related to an email, excluding attachment files.
  • Assume metadata is stored in a database. Storage requirement for maintaining metadata in 1 year: 1 billion users * 40 emails / day * 365 days * 50 KB = 730 PB.
  • Assume 20% of emails contain an attachment and the average attachment size is 500 KB.
  • Storage for attachments in 1 year is: 1 billion users * 40 emails / day * 365 days * 20% * 500 KB = 1,460 PB

From this back-of-the-envelope calculation, it’s clear we would deal with a lot of data. So, it’s likely that we need a distributed database solution.
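The arithmetic above can be reproduced with a short script. This is only a sketch: the constants are the assumptions stated in the list, decimal units are assumed (1 PB = 10^12 KB), and the function names are made up for illustration.

```python
# Inputs copied from the estimates above; all of them are assumptions.
USERS = 1_000_000_000
SENT_PER_USER_PER_DAY = 10
RECEIVED_PER_USER_PER_DAY = 40
METADATA_KB = 50
ATTACHMENT_PERCENT = 20
ATTACHMENT_KB = 500
SECONDS_PER_DAY_APPROX = 10**5  # 86,400 rounded up for easy arithmetic
KB_PER_PB = 10**12              # decimal units: 1 PB = 10^12 KB


def send_qps():
    """Average queries per second for sending emails."""
    return USERS * SENT_PER_USER_PER_DAY // SECONDS_PER_DAY_APPROX


def emails_received_per_year():
    return USERS * RECEIVED_PER_USER_PER_DAY * 365


def yearly_metadata_pb():
    """Metadata storage added per year, in PB."""
    return emails_received_per_year() * METADATA_KB / KB_PER_PB


def yearly_attachment_pb():
    """Attachment storage added per year, in PB."""
    with_attachment = emails_received_per_year() * ATTACHMENT_PERCENT // 100
    return with_attachment * ATTACHMENT_KB / KB_PER_PB


if __name__ == "__main__":
    print("send QPS:", send_qps())
    print("metadata PB/year:", yearly_metadata_pb())
    print("attachment PB/year:", yearly_attachment_pb())
```

Changing any single constant shows how sensitive the storage totals are to that assumption.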

4
Q

What is SMTP?

A

Simple Mail Transfer Protocol (SMTP) is the standard protocol for sending emails from one mail server to another.
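As a rough illustration, Python's standard smtplib can hand a message to a mail server over SMTP. The sketch below shows a client submitting mail to a server; relaying between servers uses the same protocol. The host, port 587, the use of STARTTLS, and the helper names are assumptions, not something the card specifies.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Build a plain-text email message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_smtp(msg, host, port=587, username=None, password=None):
    """Submit msg to an SMTP server (hypothetical host; port 587 assumed)."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()  # upgrade the connection to TLS before logging in
        if username is not None:
            server.login(username, password)
        server.send_message(msg)
```

A call such as send_via_smtp(build_message(...), "smtp.example.com") would need a real server and credentials, so it is not run here.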

5
Q

What is POP (Post Office Protocol)?

A

POP is a standard mail protocol to receive and download emails from a remote mail server to a local email client. Once emails are downloaded to your computer or phone, they are deleted from the email server, which means you can only access emails on one computer or phone. The details of POP are covered in RFC 1939 [4]. POP requires mail clients to download the entire email. This can take a long time if an email contains a large attachment.
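The download-then-delete behavior described above might look like this with Python's standard poplib. Treat it as a sketch: the host and credentials are placeholders, and the helper names are invented for this example.

```python
import poplib
from email import message_from_bytes


def parse_retr_lines(lines):
    """Join the raw lines returned by a POP3 RETR command into a message."""
    return message_from_bytes(b"\r\n".join(lines))


def download_and_delete(host, username, password):
    """Download every message in full, then mark it for deletion."""
    conn = poplib.POP3_SSL(host)  # hypothetical host; default port is 995
    conn.user(username)
    conn.pass_(password)
    count, _mailbox_size = conn.stat()
    messages = []
    for i in range(1, count + 1):
        # RETR always transfers the whole message, attachments included.
        _resp, lines, _octets = conn.retr(i)
        messages.append(parse_retr_lines(lines))
        conn.dele(i)  # only marks the message; removal happens on QUIT
    conn.quit()  # the server deletes the marked messages here
    return messages
```

Because each message is retrieved in full, a large attachment makes the loop slow, which is the limitation the card points out.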

6
Q

What is IMAP?

A

IMAP is also a standard mail protocol for receiving emails on a local email client. When you read an email, your client connects to a remote mail server, and data is transferred to your local device. IMAP downloads a message only when you open it, and emails are not deleted from the mail server, so you can access them from multiple devices. IMAP is the most widely used protocol for individual email accounts. It works well over slow connections because only the header information is downloaded until a message is opened.
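Fetching only headers, as described above, could be sketched with Python's standard imaplib. The host, credentials, and helper names are assumptions for illustration; BODY.PEEK[HEADER] asks the server for the header section without marking the message as read.

```python
import imaplib
from email import message_from_bytes


def summarize_header(raw_header):
    """Extract sender and subject from raw header bytes."""
    msg = message_from_bytes(raw_header)
    return {"from": msg["From"], "subject": msg["Subject"]}


def list_headers(host, username, password, mailbox="INBOX"):
    """Return header summaries without downloading message bodies."""
    conn = imaplib.IMAP4_SSL(host)  # hypothetical host; default port is 993
    try:
        conn.login(username, password)
        conn.select(mailbox, readonly=True)  # messages stay on the server
        _typ, data = conn.search(None, "ALL")
        summaries = []
        for num in data[0].split():
            _typ, parts = conn.fetch(num, "(BODY.PEEK[HEADER])")
            summaries.append(summarize_header(parts[0][1]))
        return summaries
    finally:
        conn.logout()
```

The full body would be fetched separately, only when the user opens a message.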

7
Q

What role does HTTPS play in the email ecosystem?

A

HTTPS is not technically a mail protocol, but it can be used to access your mailbox, particularly for web-based email. For example, it’s common for Microsoft Outlook to talk to mobile devices over HTTPS, using a custom protocol called ActiveSync.

8
Q

What role does DNS (Domain Name System) play?

A

A DNS server is used to look up the mail exchanger record (MX record) for the recipient’s domain.
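Once the MX records are known, the sending server tries the hosts in order of preference, lowest value first. The sketch below covers only that ordering step; the records are hypothetical, and the DNS query itself is left out because it needs a resolver library outside the Python standard library.

```python
def recipient_domain(address):
    """Return the domain part of an email address, lowercased."""
    return address.rsplit("@", 1)[1].lower()


def order_mx_records(records):
    """Order (preference, hostname) pairs so the lowest preference is first."""
    return [host for _preference, host in sorted(records)]


if __name__ == "__main__":
    # Hypothetical MX records for example.com.
    records = [(20, "mx2.example.com"), (10, "mx1.example.com")]
    print(recipient_domain("Bob@Example.com"))
    print(order_mx_records(records))
```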

9
Q

How can we improve email deliverability?

A

Dedicated IPs. It is recommended to have dedicated IP addresses for sending emails. Email providers are less likely to accept emails from new IP addresses that have no history.

Classify emails. Send different categories of emails from different IP addresses. For example, you may want to avoid sending marketing and important emails from the same servers because it might make ISPs mark all emails as promotional.

Email sender reputation. Warm up new email server IP addresses slowly to build a good reputation, so big providers such as Office365, Gmail, Yahoo Mail, etc. are less likely to put our emails in the spam folder. According to Amazon Simple Email Service [20], it takes about 2 to 6 weeks to warm up a new IP address.

Ban spammers quickly. Spammers should be banned quickly before they have a significant impact on the server’s reputation.

Feedback processing. It’s very important to set up feedback loops with ISPs so we can keep the complaint rate low and ban spam accounts quickly. If an email fails to deliver or a user complains, one of the following outcomes occurs:

  • Hard bounce. This means an email is rejected by an ISP because the recipient’s email address is invalid.
  • Soft bounce. A soft bounce indicates an email failed to deliver due to temporary conditions, such as ISPs being too busy.
  • Complaint. This means a recipient clicks the “report spam” button.
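One way to act on the three outcomes above is sketched below. The reactions are assumptions made for this example and are not prescribed by the card: suppress the address on a hard bounce, queue a retry on a soft bounce, and on a complaint suppress the address and count it against the sending account. The class and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FeedbackHandler:
    suppressed: set = field(default_factory=set)     # never send here again
    retry_queue: list = field(default_factory=list)  # try again later
    complaints: dict = field(default_factory=dict)   # sender account -> count

    def handle(self, event_type, recipient, sender_account):
        if event_type == "hard_bounce":
            # The address is invalid, so retrying cannot succeed.
            self.suppressed.add(recipient)
        elif event_type == "soft_bounce":
            # Temporary condition, so schedule another attempt.
            self.retry_queue.append(recipient)
        elif event_type == "complaint":
            # The recipient reported spam: stop sending, track the sender.
            self.suppressed.add(recipient)
            count = self.complaints.get(sender_account, 0)
            self.complaints[sender_account] = count + 1
        else:
            raise ValueError(f"unknown event type: {event_type}")
```

An account whose complaint count climbs quickly is a candidate for the fast ban mentioned above; the threshold would be a policy choice.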