06 - MongoDB Sharding and Replication Flashcards
Define
shard
subset of your data.
Each shard should be deployed in a ____.
replica set
Define
Config Servers
Store metadata and configuration settings for your cluster
Define
mongos
Query router that provides the interface between client applications and the sharded cluster
When does MongoDB split chunks?
How does this get triggered?
- MongoDB splits a chunk when it grows beyond the configured chunk size.
- Both inserts and updates can trigger a chunk split.
Can a chunk with 1 shard key value be split?
no
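A minimal sketch of the two cards above, not MongoDB's actual splitter. It assumes a chunk is a plain list of documents and uses a document count (the hypothetical `max_docs`) as a stand-in for the configured chunk size. It shows why a chunk holding a single shard key value has no split point.

```python
def split_chunk(docs, key, max_docs):
    """Split an oversized chunk in two around a middle shard key value.

    Returns a list holding one chunk (no split) or two chunks (split).
    """
    if len(docs) <= max_docs:
        return [docs]  # still within the size limit, nothing to do
    ordered = sorted(docs, key=lambda d: d[key])
    values = sorted({d[key] for d in ordered})
    if len(values) < 2:
        # Every document has the same shard key value: no split point exists.
        return [ordered]
    split_at = values[len(values) // 2]
    left = [d for d in ordered if d[key] < split_at]
    right = [d for d in ordered if d[key] >= split_at]
    return [left, right]


mixed = [{"k": i} for i in range(6)]
same = [{"k": 1} for _ in range(6)]
print(len(split_chunk(mixed, "k", 4)))  # oversized, several key values
print(len(split_chunk(same, "k", 4)))   # oversized, one key value
```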
What happens when you shard a collection?
- A single dataset is distributed across multiple databases (the shards), which allows it to be stored on multiple machines
- The large dataset is broken up into smaller chunks
How does MongoDB partition a collection?
MongoDB partitions a collection of documents based on the shard key
- Data is divided into chunks and chunks are allocated to each server.
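The partitioning described above can be sketched as a toy model. The assumptions here are illustrative only: documents are sorted by a hypothetical `user_id` shard key with unique values, chunks are cut by document count rather than size in bytes, and chunks are handed out round-robin instead of by MongoDB's balancer.

```python
def make_chunks(docs, key, chunk_size):
    """Order documents by shard key and cut them into contiguous chunks."""
    ordered = sorted(docs, key=lambda d: d[key])
    return [ordered[i:i + chunk_size] for i in range(0, len(ordered), chunk_size)]


def allocate(chunks, shard_names):
    """Assign chunks to shards round-robin (a stand-in for the balancer)."""
    placement = {name: [] for name in shard_names}
    for i, chunk in enumerate(chunks):
        placement[shard_names[i % len(shard_names)]].append(chunk)
    return placement


docs = [{"user_id": i} for i in range(10)]
chunks = make_chunks(docs, "user_id", 3)
placement = allocate(chunks, ["shardA", "shardB"])
for shard, owned in placement.items():
    print(shard, [[d["user_id"] for d in c] for c in owned])
```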
Why should we choose a shard key carefully?
- The shard key has a direct impact on the cluster’s performance
- A suboptimal shard key can cause performance or scaling issues due to uneven chunk distribution
- You can change your data distribution strategy later by changing your shard key, but a good initial choice avoids that extra work.
What are the benefits of sharding a collection?
Can now handle more requests
Horizontal-scaling or scale out
Differentiate
vertical vs horizontal scaling
Vertical: increasing the power of a single machine
Horizontal: distributing the load across different machines
Ranged/Dynamic Sharding
takes a field/attribute as an input and creates predefined ranges
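A minimal sketch of range-based routing, assuming three hypothetical split points and four hypothetical shard names. Each range includes its lower bound and excludes its upper bound, so a key equal to a boundary falls in the range that starts there.

```python
from bisect import bisect_right

BOUNDARIES = [100, 200, 300]  # hypothetical predefined split points
SHARDS = ["shard0", "shard1", "shard2", "shard3"]  # one shard per range


def route(key_value):
    """Return the shard whose range contains key_value."""
    return SHARDS[bisect_right(BOUNDARIES, key_value)]


for value in (50, 100, 250, 999):
    print(value, "->", route(value))
```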
List
Key attributes of an effective shard key
2
high cardinality
well-distributed frequency
Define
cardinality
number of possible values of that key
- Ex. If a shard key only has 3 possible values, then there can only be 3 chunks max, so at most 3 shards can hold data
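As a rough illustration, cardinality can be estimated on sample data by counting distinct key values. The `region` field and the sample documents are hypothetical.

```python
def cardinality(docs, key):
    """Count the distinct values a candidate shard key takes in the sample."""
    return len({d[key] for d in docs})


sample = [{"region": r} for r in ["EU", "US", "ASIA", "US", "EU", "US"]]
# With only this many distinct values, the data cannot be divided
# into more pieces than that, however many shards are added.
print(cardinality(sample, "region"))
```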
Define
frequency
distribution of the data along the possible values
- Ex. If 95% of records occur with a single shard key value then, due to this hotspot, 95% of the records will be allocated to a single shard.
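The hotspot in the example above can be checked on sample data with a frequency count. This is a sketch only; the `country` field and the 95/5 split are made up to mirror the card.

```python
from collections import Counter


def key_frequency(docs, key):
    """Return the share of documents that carry each shard key value."""
    counts = Counter(d[key] for d in docs)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


sample = [{"country": "US"}] * 95 + [{"country": "CA"}] * 5
freq = key_frequency(sample, "country")
hottest = max(freq, key=freq.get)
print(hottest, freq[hottest])  # the value that would concentrate load on one shard
```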