sdi 2 Flashcards
dropbox design considerations
- We should expect huge read and write volumes.
- Read and write volumes are expected to be roughly equal.
- Internally, files can be stored in small parts or chunks (say 4MB). This provides several benefits, e.g. if a user fails to upload a file, only the failing chunk is retried.
- We can reduce the amount of data exchanged by transferring only the updated chunks.
- All file operations must be ACID (atomic, consistent, isolated, durable).
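A minimal sketch of the "retry only the failing chunk" idea, assuming a hypothetical `send` callback that raises `IOError` when a chunk upload fails. The function names and the retry limit are illustrative, not part of any real client.

```python
from typing import Callable, List

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB, the example chunk size from the notes


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> List[bytes]:
    """Cut data into consecutive pieces of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def upload_with_retry(chunks: List[bytes],
                      send: Callable[[int, bytes], None],
                      max_attempts: int = 3) -> int:
    """Upload every chunk, retrying only the chunk that failed.

    `send` is a hypothetical transport callback that raises IOError on failure.
    Returns the total number of send attempts made.
    """
    attempts = 0
    for index, chunk in enumerate(chunks):
        for attempt in range(1, max_attempts + 1):
            attempts += 1
            try:
                send(index, chunk)
                break  # chunk uploaded; already-uploaded chunks are never resent
            except IOError:
                if attempt == max_attempts:
                    raise  # give up after the last allowed attempt
    return attempts
```

If one chunk out of three fails once, the upload costs four sends rather than a full re-upload of the file.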
dropbox capacity estimation
- Ask for the total number of users and daily active users (DAU)
- Let’s assume on average each user connects from 3 different devices.
- Each user has 200 files/photos
- average file size is 100KB
- 1M active connections per min
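The estimate can be worked through as simple arithmetic. In this sketch the total user count is a placeholder assumption (the notes say to ask for it), and 100KB is treated as 100,000 bytes; only the files-per-user and file-size figures come from the notes.

```python
from typing import Tuple


def estimate_storage(total_users: int,
                     files_per_user: int,
                     avg_file_bytes: int) -> Tuple[int, int]:
    """Return (total number of files, total storage in bytes)."""
    total_files = total_users * files_per_user
    total_bytes = total_files * avg_file_bytes
    return total_files, total_bytes


TOTAL_USERS = 500_000_000   # assumption: not given in the notes
FILES_PER_USER = 200        # from the notes
AVG_FILE_BYTES = 100_000    # 100KB from the notes, taken as decimal kilobytes

files, size_bytes = estimate_storage(TOTAL_USERS, FILES_PER_USER, AVG_FILE_BYTES)
print(f"{files:,} files, {size_bytes / 10**15:.1f} PB")
```

With these placeholder inputs the estimate comes to 100 billion files and 10 PB of storage.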
dropbox high level design
- The user will specify a folder as the workspace on their device. Anything placed in this folder will be uploaded to the cloud, and whenever a file is modified or deleted, the change will be reflected in cloud storage. The user can specify similar workspaces on all their devices, and any modification made on one device will be propagated to all the others.
- 3 types of “main” servers:
1. Block servers work w/ clients to upload/download files from cloud storage
2. Metadata servers
3. Synchronization servers will notify clients about changes
insta high level design
We need to support two scenarios: one to upload photos and the other to view/search photos.
- storage includes:
1. obj storage servers for photos
2. db servers for metadata
pastebin high level design
An application layer serves all the read and write requests. It talks to a storage layer to store and retrieve data.
- storage layer is divided into obj storage and metadata storage
dropbox - what does client application do?
- monitor workspace folder on user’s machine to detect changes
- work with the storage servers to upload, download, and modify actual files in backend Cloud Storage
- interact with the remote Synchronization Service to handle any file metadata updates
dropbox metadata
Keeping a local copy of metadata not only enables us to do offline updates but also saves a lot of round trips to update remote metadata.
http long polling
A way for clients to maintain an open connection with the server.
Client requests information from server w/ expectation that the server may not respond immediately.
If the server has no new data for the client when the poll is received, instead of sending an empty response, the server holds the request open and waits for response information to become available.
Once it does have new info, the server immediately sends an HTTP/S response to the client. Upon receipt of the server response, the client can immediately issue another request.
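The hold-then-respond behavior described above can be sketched in-process with a condition variable. This is a simulation under stated assumptions, not an HTTP server: `LongPollServer`, `publish`, and `poll` are hypothetical names, and a real deployment would put this logic behind an HTTP endpoint.

```python
import threading
from typing import List, Optional


class LongPollServer:
    """Holds each poll open until new data arrives or the timeout expires."""

    def __init__(self) -> None:
        self._messages: List[str] = []
        self._cond = threading.Condition()

    def publish(self, message: str) -> None:
        """Record a new message and wake every held request."""
        with self._cond:
            self._messages.append(message)
            self._cond.notify_all()

    def poll(self, last_seen: int, timeout: float) -> Optional[List[str]]:
        """Return messages after index `last_seen`, waiting up to `timeout` seconds.

        Returns None on timeout, in which case the client simply polls again.
        """
        with self._cond:
            has_new = self._cond.wait_for(
                lambda: len(self._messages) > last_seen, timeout=timeout
            )
            if not has_new:
                return None
            return self._messages[last_seen:]


if __name__ == "__main__":
    server = LongPollServer()
    # Simulate a change arriving while the client's request is being held.
    threading.Timer(0.1, server.publish, args=["file.txt changed"]).start()
    print(server.poll(last_seen=0, timeout=2.0))
```

The client tracks how many messages it has seen and passes that count on its next poll, so no update is delivered twice.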
dropbox client consists of:
- Internal Metadata Database: keeps track of all the files, chunks, their versions, and their location in the file system.
- chunker
- watcher
- indexer
chunker
- splits files into chunks
- reconstructs file from its chunks
- chunking algorithm detects the parts of a file that have been modified by the user and transfers only those parts to the Cloud Storage
We can statically calculate an optimal chunk size based on:
1) Storage devices we use in the cloud
2) Network bandwidth
3) Average file size in the storage
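A minimal sketch of the chunker's change detection, assuming fixed-size chunks compared by SHA-256 hash. The function names are illustrative. Note that with fixed-size chunks an insertion in the middle of a file shifts every later chunk, so all of them are reported as changed; content-defined chunking is a common way around that and is not shown here.

```python
import hashlib
from typing import List


def chunk_hashes(data: bytes, chunk_size: int) -> List[str]:
    """SHA-256 hex digest of each consecutive fixed-size chunk."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def changed_chunks(old: List[str], new: List[str]) -> List[int]:
    """Indices in `new` whose hash differs from `old` or that have no old counterpart."""
    return [i for i, h in enumerate(new) if i >= len(old) or old[i] != h]


def reconstruct(chunks: List[bytes]) -> bytes:
    """Rebuild the file by concatenating its chunks in order."""
    return b"".join(chunks)
```

Only the chunks whose indices come back from `changed_chunks` need to be sent to Cloud Storage.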
watcher
- monitors the local workspace folders and notifies the Indexer of any action performed by the users
- also listens for changes happening on other clients that are broadcast by the Synchronization Service
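One simple way to sketch the local-monitoring half of the watcher is to compare two snapshots of the workspace folder. This polling approach is an assumption made for illustration; real clients typically rely on the operating system's file-change notifications instead. All names here are hypothetical.

```python
import os
from typing import Dict, List, Tuple

# relative path -> (modification time, size in bytes)
Snapshot = Dict[str, Tuple[float, int]]


def take_snapshot(root: str) -> Snapshot:
    """Record the mtime and size of every file under `root`."""
    snap: Snapshot = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            snap[os.path.relpath(full, root)] = (st.st_mtime, st.st_size)
    return snap


def diff_snapshots(before: Snapshot, after: Snapshot) -> List[Tuple[str, str]]:
    """Return (event, path) pairs that the watcher would hand to the Indexer."""
    events: List[Tuple[str, str]] = []
    for path in sorted(after):
        if path not in before:
            events.append(("created", path))
        elif before[path] != after[path]:
            events.append(("modified", path))
    for path in sorted(before):
        if path not in after:
            events.append(("deleted", path))
    return events
```

Calling `take_snapshot` periodically and passing consecutive results to `diff_snapshots` yields the stream of events for the Indexer.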
indexer
- processes events received from the Watcher and updates the internal metadata database w/ info about the chunks
- Once chunks are successfully submitted to Cloud Storage, Indexer communicates w/ Sync Service to broadcast changes to other clients and update remote metadata database.
Should mobile clients sync remote changes immediately?
Unlike desktop or web clients, mobile clients usually sync on demand to save the user’s bandwidth and space.
dropbox metadata db
Sync Service should be able to provide a consistent view of the files using this db, esp if more than 1 user is working w/ the same file simultaneously.
If we choose a NoSQL store such as DynamoDB:
NoSQL stores typically relax ACID guarantees in favor of scalability and performance, so any guarantee the store does not provide must be implemented programmatically in the logic of our Sync Service.
If we use a relational database such as MySQL, the Sync Service implementation will be simpler b/c relational DBs natively support ACID properties.
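To illustrate what native ACID support buys the Sync Service, here is a sketch that uses Python's built-in `sqlite3` as a stand-in for a relational database such as MySQL. The schema, table names, and the `commit_version` helper are assumptions made for this example. The file version and its chunk list are updated in one transaction, and a writer holding a stale version is rejected.

```python
import sqlite3
from typing import List


def open_metadata_db() -> sqlite3.Connection:
    """Create an in-memory metadata database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE files (path TEXT PRIMARY KEY, version INTEGER NOT NULL)")
    conn.execute(
        "CREATE TABLE chunks ("
        " path TEXT NOT NULL, version INTEGER NOT NULL,"
        " idx INTEGER NOT NULL, hash TEXT NOT NULL,"
        " PRIMARY KEY (path, version, idx))"
    )
    conn.commit()
    return conn


def commit_version(conn: sqlite3.Connection, path: str,
                   expected_version: int, chunk_hashes: List[str]) -> bool:
    """Atomically bump the file version and record its chunks.

    Returns False, leaving the database unchanged, if another writer has
    already moved the file past `expected_version` (0 means a new file).
    """
    new_version = expected_version + 1
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            if expected_version == 0:
                conn.execute("INSERT INTO files (path, version) VALUES (?, 1)", (path,))
            else:
                cur = conn.execute(
                    "UPDATE files SET version = ? WHERE path = ? AND version = ?",
                    (new_version, path, expected_version),
                )
                if cur.rowcount != 1:
                    return False  # stale version: nothing was modified
            conn.executemany(
                "INSERT INTO chunks (path, version, idx, hash) VALUES (?, ?, ?, ?)",
                [(path, new_version, i, h) for i, h in enumerate(chunk_hashes)],
            )
        return True
    except sqlite3.IntegrityError:
        return False  # e.g. the file already exists; the transaction was rolled back
```

When two clients both try to commit on top of version 1, exactly one succeeds; the other gets `False` and must sync before retrying.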