Basics Flashcards
Why is sharding/data partitioning used?
Because beyond a certain scale, adding more servers is often cheaper and more feasible than moving the application to a single, more powerful server.
What are the two sharding methods? Quickly explain.
- Horizontal: rows of the same table are stored on different servers
- Vertical: tables belonging to different features are stored on different servers (e.g. Users, Photos, UserLikes)
What is directory-based sharding?
It is a technique of extracting the sharding logic to a lookup service. This moves the complexity away from the app, which queries the lookup service to know where to store/get data from.
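As a rough illustration of the lookup-service idea, here is a minimal Python sketch. The names `ShardLookup`, `App`, `shard_a` and `shard_b` are hypothetical, and plain dicts stand in for real database servers.

```python
class ShardLookup:
    """Hypothetical lookup service: remembers which shard holds each key."""

    def __init__(self):
        self._locations = {}

    def assign(self, key, shard_name):
        self._locations[key] = shard_name

    def locate(self, key):
        return self._locations[key]


class App:
    """The app holds no sharding logic; it only asks the lookup service."""

    def __init__(self, lookup, shards):
        self.lookup = lookup
        self.shards = shards  # shard name -> dict standing in for a DB server

    def put(self, key, value):
        self.shards[self.lookup.locate(key)][key] = value

    def get(self, key):
        return self.shards[self.lookup.locate(key)][key]


lookup = ShardLookup()
lookup.assign("user:1", "shard_a")
lookup.assign("user:2", "shard_b")

shards = {"shard_a": {}, "shard_b": {}}
app = App(lookup, shards)
app.put("user:1", "Alice")
app.put("user:2", "Bob")
```

Because the mapping lives in one place, the partitioning scheme can change without touching the app code.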
Cite three sharding/partition criteria.
- Key/hash based: a hash function applied to a key of the record yields the partition number
- List based (partition per data characteristic): each partition is assigned a list of values (e.g. one partition stores users from Norway, Sweden and Finland)
- Round robin: rows are assigned to partitions in rotation (e.g. row_id % n for n partitions)
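The three criteria above can be sketched as small Python functions. This is only an illustration: the partition count, the choice of SHA-256 as the hash, and the country table are assumptions, not part of the original notes.

```python
import hashlib

NUM_PARTITIONS = 4  # assumed number of partitions


def hash_partition(key: str, n: int = NUM_PARTITIONS) -> int:
    """Key/hash based: a stable hash of the key picks the partition."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % n


# List based: each partition owns an explicit list of values (assumed table).
COUNTRY_TO_PARTITION = {
    "Norway": 0, "Sweden": 0, "Finland": 0,
    "Brazil": 1, "Argentina": 1,
}


def list_partition(country: str) -> int:
    return COUNTRY_TO_PARTITION[country]


def round_robin_partition(row_id: int, n: int = NUM_PARTITIONS) -> int:
    """Round robin: consecutive row ids rotate through the partitions."""
    return row_id % n
```

Note that with the hash and round-robin variants, changing `n` remaps most existing rows, which is one reason rebalancing is hard.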
Cite three challenges of sharding.
- Difficulty of Joins and need for denormalization of data
- Loss of referential integrity enforcement
- Need of rebalancing (re-sharding)
What does the locality of reference principle say?
Recently requested data is likely to be requested again (temporal locality), and data stored near it is likely to be requested soon (spatial locality).
What is a “distributed cache”?
A cache layer composed of many nodes, each of which stores a piece of the overall cache in memory.
Usually, consistent hashing is used to determine which node to query for a given piece of data.
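A minimal sketch of how consistent hashing might pick a cache node. The class name `ConsistentHashRing`, the node names, and the number of virtual points per node are all assumptions for illustration; real cache clients differ in detail.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    def __init__(self, nodes, points_per_node=100):
        # Each node is placed on the ring at many pseudo-random points
        # ("virtual nodes") so that keys spread more evenly.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(points_per_node)
        )
        self._hashes = [h for h, _ in self._ring]

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first point after the key's hash,
        # wrapping around to the start of the ring if needed.
        idx = bisect.bisect_right(self._hashes, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring = ConsistentHashRing(["cache-1", "cache-2", "cache-3"])
owner = ring.node_for("user:42")  # one of the three node names
```

The point of the ring is that adding a node only takes over the keys that now fall on the new node's points; all other keys keep their previous owner.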
What are the three main schemes for cache invalidation during write?
- Write through: write to both the cache and the DB (con: higher write latency)
- Write around: write directly to the DB, bypassing the cache and evicting any stale cached copy (con: the next read of that data is a cache miss)
- Write back: write to the cache only, which writes to the DB asynchronously (con: data loss if the cache fails before the data is persisted)
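The three schemes can be compared side by side in a toy model. In this sketch, `CachedStore` is a hypothetical class, two dicts stand in for the cache and the DB, and an explicit `flush()` call stands in for the asynchronous write of the write-back scheme.

```python
class CachedStore:
    def __init__(self):
        self.cache = {}
        self.db = {}
        self.dirty = set()  # keys written to the cache but not yet to the DB

    def write_through(self, key, value):
        # Both writes happen before returning, hence the higher latency.
        self.cache[key] = value
        self.db[key] = value

    def write_around(self, key, value):
        # Only the DB is written; any stale cached copy is dropped.
        self.db[key] = value
        self.cache.pop(key, None)

    def write_back(self, key, value):
        # Only the cache is written; the DB is updated later by flush().
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        # Stand-in for the async job that persists dirty entries.
        for key in self.dirty:
            self.db[key] = self.cache[key]
        self.dirty.clear()

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.db[key]      # cache miss: go to the DB
        self.cache[key] = value   # and populate the cache
        return value
```

If the process died between `write_back` and `flush`, the dirty entries would be lost, which is the data-loss risk named above.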
What are indexes used for?
To improve the performance of read operations.
What is the drawback of using indexes?
Write operations become slower, because every insert, update or delete must also update the index. Indexes also take up extra storage.
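The trade-off can be seen in a toy in-memory table. This is only a sketch: `UserTable` and its methods are hypothetical, emails are assumed to be unique, and a dict stands in for a real index structure such as a B-tree.

```python
class UserTable:
    def __init__(self):
        self.rows = []
        self.email_index = {}  # email -> position of the row in self.rows

    def insert(self, name, email):
        self.rows.append({"name": name, "email": email})
        # The extra write that every insert pays for the faster reads.
        self.email_index[email] = len(self.rows) - 1

    def find_by_email_scan(self, email):
        # Without an index: check every row until a match is found.
        for row in self.rows:
            if row["email"] == email:
                return row
        return None

    def find_by_email_indexed(self, email):
        # With an index: jump straight to the row.
        pos = self.email_index.get(email)
        return None if pos is None else self.rows[pos]


table = UserTable()
table.insert("Alice", "alice@example.com")
table.insert("Bob", "bob@example.com")
```

Both lookups return the same row; the indexed one avoids the full scan at the cost of maintaining `email_index` on every write.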
What is a proxy server?
A proxy server is an intermediary piece of hardware/software that sits between the client and the back-end server.
Give 3 uses for a proxy server.
- Request logging
- Request filtering
- Request batching (collapsing several requests into one)
What are queues used for?
To enable async communications between systems.
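A small sketch of the idea using Python's standard `queue` and `threading` modules. The task names and the `None` stop signal are assumptions for illustration; in a real system the producer and consumer would usually be separate services talking through a message broker.

```python
import queue
import threading

tasks = queue.Queue()
results = []


def worker():
    # The consumer processes tasks at its own pace.
    while True:
        item = tasks.get()
        if item is None:          # assumed stop signal
            tasks.task_done()
            break
        results.append(item.upper())  # stand-in for slow work
        tasks.task_done()


consumer = threading.Thread(target=worker)
consumer.start()

# The producer only enqueues and moves on; it does not wait for the work.
for name in ["resize photo", "send email"]:
    tasks.put(name)

tasks.put(None)   # tell the worker to stop
tasks.join()      # wait until every queued item has been processed
consumer.join()
```

The producer never calls the worker directly, so a slow or temporarily unavailable consumer does not block it.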
What is redundancy?
Redundancy means duplicating critical data or services in order to increase the reliability of the system.
What does the CAP theorem state?
The CAP theorem states that it is impossible for a distributed software system to simultaneously provide more than two of the following three guarantees (CAP): Consistency, Availability and Partition tolerance.
How is Consistency achieved in a distributed system?
By updating all nodes before allowing further reads, so that every read sees the most recent write.
How is Availability achieved in distributed systems?
Data is replicated across multiple servers.
What does partition tolerance mean?
It means that the system continues to work despite message loss or partial failure.