sdi 3 Flashcards
fb messenger - how large is the average message?
100 bytes
messenger high level design
We need a chat server as the central piece. When a user wants to send a message to another user, they connect to the chat server and send the message to it; the server then passes the message to the other user and also stores it in the database.
pull model
Users periodically ask the server whether there are any new messages for them. The server keeps track of undelivered messages, and when a user connects, it returns all pending messages. To minimize latency, users have to poll quite frequently, and much of the time they get an empty response, which wastes a lot of resources.
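A minimal sketch of the pull model, assuming an in-memory server; `PullChatServer`, `send` and `poll` are hypothetical names. It counts how many polls come back empty to show the wasted work described above.

```python
from collections import defaultdict

class PullChatServer:
    """Hypothetical in-memory server for the pull model."""

    def __init__(self):
        # Undelivered messages, keyed by the receiver's UserID.
        self.pending = defaultdict(list)

    def send(self, sender, receiver, text):
        # The server only queues the message; the receiver must ask for it.
        self.pending[receiver].append((sender, text))

    def poll(self, user):
        # Return and clear everything pending for `user` (often an empty list).
        return self.pending.pop(user, [])

server = PullChatServer()
empty_polls = 0
for tick in range(5):
    if tick == 3:
        server.send("alice", "bob", "hi")
    if not server.poll("bob"):
        empty_polls += 1
print(empty_polls)  # 4
```

Four of the five polls return nothing; polling faster lowers latency but makes that ratio worse.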
push model
Users can keep a connection open with the server and can depend upon the server to notify them whenever there are new messages. The server does not need to keep track of the pending messages, and we will have minimum latency.
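A sketch of the push model using `asyncio`, assuming each open connection can be modelled as an `asyncio.Queue`; the `client` coroutine and the `None` "connection closed" sentinel are illustrative assumptions, not a real protocol. The client never asks for messages; it just waits for the server to push them.

```python
import asyncio

async def client(inbox, received):
    # Hypothetical client: it keeps its "connection" (an asyncio.Queue here)
    # open and simply awaits; it never sends a poll request.
    # None is an assumed sentinel meaning "connection closed".
    while (msg := await inbox.get()) is not None:
        received.append(msg)

async def main():
    inbox = asyncio.Queue()  # stands in for bob's open connection
    received = []
    task = asyncio.create_task(client(inbox, received))
    # The server pushes each message the moment it arrives.
    await inbox.put(("alice", "hi"))
    await inbox.put(("alice", "are you there?"))
    await inbox.put(None)
    await task
    return received

received = asyncio.run(main())
print(received)  # [('alice', 'hi'), ('alice', 'are you there?')]
```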
How can the central chat server keep track of all opened connections to redirect messages to users efficiently?
The server can maintain a hash table where the “key” is the UserID and the “value” is the connection object. Whenever the server receives a message for a user, it looks up that user in the hash table to find the connection to send it on.
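A minimal sketch of the connection table, assuming a plain Python `dict` as the hash table; `Connection` is a hypothetical stand-in for a real socket that just records what was written to it.

```python
class Connection:
    """Hypothetical stand-in for a live socket/connection object."""

    def __init__(self, user_id):
        self.user_id = user_id
        self.outbox = []  # what has been "written" to this socket

    def write(self, msg):
        self.outbox.append(msg)

class ChatServer:
    def __init__(self):
        self.connections = {}  # UserID -> Connection

    def on_connect(self, user_id):
        conn = Connection(user_id)
        self.connections[user_id] = conn
        return conn

    def on_disconnect(self, user_id):
        self.connections.pop(user_id, None)

    def route(self, receiver_id, msg):
        # Average O(1) lookup of the receiver's open connection.
        conn = self.connections.get(receiver_id)
        if conn is None:
            return False  # receiver is not connected
        conn.write(msg)
        return True

server = ChatServer()
bob = server.on_connect("bob")
server.route("bob", "hi")    # True, "hi" is written to bob's connection
server.route("carol", "hi")  # False, carol has no open connection
```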
What will happen when the central chat server receives a message for a user who has gone offline?
If the receiver has disconnected, the server can notify the sender about the delivery failure. If it is a temporary disconnect, e.g., the receiver’s long-poll request just timed out, we can ask the sender to retry sending. This retry could be embedded in the client’s logic so that users don’t have to retype the message. The server can also store the message for a while and retry sending it once the receiver reconnects.
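A sketch of the "store for a while, retry on reconnect" option, assuming a fixed time-to-live; `OfflineBuffer`, `ttl_seconds` and the injectable `clock` are hypothetical. Messages older than the TTL are returned as expired so the sender can be told delivery failed.

```python
import time
from collections import defaultdict

class OfflineBuffer:
    """Holds undelivered messages for a limited time (ttl_seconds is an assumed parameter)."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self.pending = defaultdict(list)  # receiver -> [(stored_at, msg)]

    def store(self, receiver, msg):
        self.pending[receiver].append((self.clock(), msg))

    def on_reconnect(self, receiver, deliver):
        """Retry delivery of still-fresh messages via `deliver`; return the
        expired ones so the sender can be notified of the failure."""
        now = self.clock()
        expired = []
        for stored_at, msg in self.pending.pop(receiver, []):
            if now - stored_at <= self.ttl:
                deliver(msg)
            else:
                expired.append(msg)
        return expired
```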
How should the central chat server process a ‘deliver message’ request?
Upon receiving a new message, the chat server will:
1) Store msg in db
2) Send msg to receiver
3) Send acknowledgment to sender
To send the message to the receiver, the chat server first finds the server that holds the receiver’s connection and passes the message to that server.
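A sketch of the three steps, assuming a list as the database, a dict mapping each UserID to the chat server that holds its connection, and a list per server as its outbound queue; `MessageService` and every name in it are hypothetical.

```python
class MessageService:
    """Sketch of the 'deliver message' steps; all names here are hypothetical."""

    def __init__(self, db, conn_registry, servers):
        self.db = db                   # list used as a stand-in database
        self.registry = conn_registry  # UserID -> id of the chat server holding that connection
        self.servers = servers         # server id -> list acting as that server's outbound queue

    def deliver(self, sender, receiver, text):
        # 1) Store the message in the db first, so it survives a failed delivery.
        msg = {"from": sender, "to": receiver, "text": text}
        self.db.append(msg)
        # 2) Find the chat server holding the receiver's connection and hand the message off.
        server_id = self.registry.get(receiver)
        delivered = server_id is not None
        if delivered:
            self.servers[server_id].append(msg)
        # 3) Acknowledge the sender, reporting whether a hand-off happened.
        return {"ack": True, "delivered": delivered}
```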
How can we maintain sequencing of the messages?
Storing timestamps alone won’t ensure correct ordering of messages for clients, because clocks on different machines are not perfectly synchronized.
We need to keep a sequence number for every message for each client. This number determines the ordering of messages for EACH user. The two clients may see different views of the message sequence, but each user’s view will be consistent across all of their devices.
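A single-process sketch of per-user sequence numbers; `SequenceAssigner` is a hypothetical name, and a real system would need these counters to be durable and shared across servers. In the usage, alice and bob send messages at the same moment and each records their own message first, so their views differ, but every device of a given user sorts by the same numbers.

```python
from collections import defaultdict
from itertools import count

class SequenceAssigner:
    """Assigns a per-user, monotonically increasing sequence number to each
    message that user sees (sent or received). Not a distributed design."""

    def __init__(self):
        self.counters = defaultdict(lambda: count(1))
        self.timelines = defaultdict(list)  # user -> [(seq, msg)]

    def record(self, user, msg):
        seq = next(self.counters[user])
        self.timelines[user].append((seq, msg))
        return seq

    def view(self, user):
        # Every device of `user` orders by the same numbers, so they agree.
        return [m for _, m in sorted(self.timelines[user])]

s = SequenceAssigner()
s.record("alice", "M1")  # alice's own message
s.record("bob", "M2")    # bob's own message
s.record("alice", "M2")  # bob's message reaches alice
s.record("bob", "M1")    # alice's message reaches bob
print(s.view("alice"), s.view("bob"))  # ['M1', 'M2'] ['M2', 'M1']
```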
messenger - what storage system CAN’T we use?
We cannot use an RDBMS like MySQL or a NoSQL store like MongoDB, because we cannot afford to read/write a row from the database every time a user receives/sends a message. This would not only give the basic operations of our service high latency but also create a huge load on the databases.
messenger - what storage system should we use?
- must support a very high rate of small updates and also fetch a range of messages quickly. Solution: a wide-column database like HBase, a column-oriented key-value NoSQL database.
- HBase buffers new data in memory and, once the buffer is full, dumps it to disk.
- HBase is also an efficient database for storing variably sized data.
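A toy model of the buffered write path described in the list above: writes go to an in-memory buffer, and a full buffer is flushed to "disk" as one sorted, immutable batch. `BufferedStore` and `buffer_limit` are hypothetical; HBase's real write path (write-ahead log, MemStore, HFiles) is considerably more involved.

```python
class BufferedStore:
    """Toy write path: buffer writes in memory, flush a sorted batch when full."""

    def __init__(self, buffer_limit=3):
        self.buffer_limit = buffer_limit  # assumed small limit for illustration
        self.memory = {}                  # row key -> value
        self.disk_files = []              # each flush produces one sorted "file"

    def put(self, key, value):
        self.memory[key] = value
        if len(self.memory) >= self.buffer_limit:
            self.flush()

    def flush(self):
        if self.memory:
            self.disk_files.append(sorted(self.memory.items()))
            self.memory = {}

    def get(self, key):
        if key in self.memory:
            return self.memory[key]
        for f in reversed(self.disk_files):  # newest file wins
            for k, v in f:
                if k == key:
                    return v
        return None
```

Many small writes become a few large sequential flushes, which is why this layout suits a high rate of small updates.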
pagination
Clients should paginate while fetching data from the server. Page size could differ between clients, e.g., cell phones have smaller screens, so fewer messages/conversations fit in the viewport.
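A cursor-based pagination sketch, assuming the history is a list ordered oldest to newest and the cursor is an exclusive index; `fetch_page` and its parameters are hypothetical. Each client passes its own `page_size`.

```python
def fetch_page(messages, page_size, before=None):
    """Return up to `page_size` messages older than index `before` plus the
    cursor for the next (older) page; None means there is nothing older.
    `before=None` means start from the newest message."""
    end = len(messages) if before is None else before
    start = max(0, end - page_size)
    page = messages[start:end]
    next_cursor = start if start > 0 else None
    return page, next_cursor

history = [f"m{i}" for i in range(10)]
phone_page, cursor = fetch_page(history, page_size=3)  # ['m7', 'm8', 'm9'], 7
older, _ = fetch_page(history, 3, before=cursor)        # ['m4', 'm5', 'm6']
desktop_page, _ = fetch_page(history, page_size=8)     # larger screen, bigger page
```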
messenger data partitioning
Partitioning based on MessageID: a user’s messages would be spread across separate shards, so fetching a range of messages would be very slow. Bad idea.
Partitioning based on UserID: find the shard number with “hash(UserID) % (number of shards)” and store/retrieve the data there. Since all of a user’s messages live on one shard, fetching chat history is very quick.
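A sketch of `hash(UserID) % (number of shards)`, assuming MD5 from `hashlib` as the hash; `shard_for` is a hypothetical helper. A stable hash matters here: Python's built-in `hash()` of a `str` is salted per process, so it could send the same user to different shards after a restart.

```python
import hashlib

def shard_for(user_id: str, num_shards: int) -> int:
    # Stable across processes and machines, unlike the built-in hash() for str.
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

shard = shard_for("alice", 16)  # always the same shard for "alice"
```

One trade-off of plain modulo: changing the number of shards remaps most users, so their data has to move.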
messenger caching
We can cache a few recent messages (say the last 15) in a few recent conversations that are visible in a user’s viewport (say the last 5). Since we decided to store all of a user’s messages on one shard, the cache for a user should reside entirely on one machine too.
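A per-user cache sketch using the example limits (5 conversations, 15 messages each), assuming an `OrderedDict` to track conversation recency and a bounded `deque` per conversation; `UserChatCache` is a hypothetical name.

```python
from collections import OrderedDict, deque

class UserChatCache:
    """Last `max_msgs` messages of the `max_convs` most recently active conversations."""

    def __init__(self, max_convs=5, max_msgs=15):
        self.max_convs = max_convs
        self.max_msgs = max_msgs
        self.convs = OrderedDict()  # conversation id -> deque of messages

    def add(self, conv_id, msg):
        if conv_id in self.convs:
            self.convs.move_to_end(conv_id)  # mark as most recently active
        else:
            self.convs[conv_id] = deque(maxlen=self.max_msgs)
            if len(self.convs) > self.max_convs:
                self.convs.popitem(last=False)  # evict least recently active
        self.convs[conv_id].append(msg)  # deque drops the oldest beyond max_msgs

    def recent(self, conv_id):
        return list(self.convs.get(conv_id, []))
```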
What will happen when a chat server fails?
It’s extremely hard to fail over TCP connections to other servers; an easier approach is to have clients automatically reconnect if the connection is lost.
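A client-side reconnect sketch, assuming a hypothetical `connect` callable that opens a connection to some chat server and raises `ConnectionError` on failure. The exponential backoff with random jitter is our own choice, not something the notes specify.

```python
import random
import time

def connect_with_retry(connect, max_attempts=5, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call `connect` until it succeeds, backing off exponentially with jitter.
    Re-raises the last ConnectionError after `max_attempts` failures."""
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))  # jitter spreads out reconnects
```

The jitter matters when a chat server dies: all of its clients lose their connections at the same moment, and randomized delays keep them from hammering the remaining servers in lockstep.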
Should we store multiple copies of user messages?
We cannot have only one copy of the user’s data, because if the server holding the data crashes or is down permanently, we can’t recover that data. Either we have to store multiple copies of the data on different servers or use techniques like Reed-Solomon encoding to distribute and replicate it.
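A sketch of the "multiple copies on different servers" option only (Reed-Solomon encoding is not shown), assuming a fixed server list and a replication factor of 3; `replica_servers` is a hypothetical helper that picks a primary by stable hash plus the next servers in list order.

```python
import hashlib

def replica_servers(user_id, servers, replicas=3):
    """Choose `replicas` distinct servers to hold copies of a user's data."""
    if replicas > len(servers):
        raise ValueError("not enough servers for the requested number of copies")
    h = int.from_bytes(hashlib.sha256(user_id.encode("utf-8")).digest()[:8], "big")
    start = h % len(servers)
    return [servers[(start + i) % len(servers)] for i in range(replicas)]

copies = replica_servers("alice", ["s0", "s1", "s2", "s3", "s4"])  # 3 distinct servers
```

If any one of those servers crashes or is down permanently, the other copies still hold the data.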