06 - MongoDB Sharding and Replication Flashcards
Define
shard
Each shard contains a subset of the sharded data.
Each shard should be deployed in a ____.
replica set
Define
Config Servers
Store metadata and configuration settings for your cluster
Define
mongos
Acts as a query router, providing an interface between client applications and the sharded cluster
When does MongoDB split chunks?
How does this get triggered?
- MongoDB splits a chunk when it grows beyond the configured chunk size.
- Both inserts and updates can trigger a chunk split.
Can a chunk with 1 shard key value be split?
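No. Chunk boundaries are shard key values, so documents that share a single shard key value cannot be separated into different chunks.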
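The two cards above can be illustrated with a toy model. This is a minimal sketch, not MongoDB's implementation: it assumes a hypothetical `MAX_CHUNK_DOCS` document count in place of the real byte-based chunk size, and it splits at one of the distinct shard key values so both halves are non-empty. A chunk whose documents all share one shard key value has no such boundary, so it cannot be split.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the configured chunk size; MongoDB measures chunk
# size in bytes, this toy model counts documents instead.
MAX_CHUNK_DOCS = 4


@dataclass
class Chunk:
    docs: list = field(default_factory=list)


def split_chunk(chunk, shard_key):
    """Split an oversized chunk in two; return it unchanged if it cannot be split."""
    if len(chunk.docs) <= MAX_CHUNK_DOCS:
        return [chunk]  # within the size limit, nothing to do
    values = sorted({doc[shard_key] for doc in chunk.docs})
    if len(values) == 1:
        # Every document shares one shard key value: no boundary to split on.
        return [chunk]
    # Split at a distinct value so that both halves are non-empty.
    split_value = values[len(values) // 2]
    lower = [d for d in chunk.docs if d[shard_key] < split_value]
    upper = [d for d in chunk.docs if d[shard_key] >= split_value]
    return [Chunk(lower), Chunk(upper)]


many_keys = Chunk([{"user_id": i} for i in range(6)])
one_key = Chunk([{"user_id": 7} for _ in range(6)])
print([len(c.docs) for c in split_chunk(many_keys, "user_id")])  # [3, 3]
print([len(c.docs) for c in split_chunk(one_key, "user_id")])    # [6]
```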
What happens when you shard a collection?
- Its data is distributed across multiple shards, which allows a single dataset to be stored on multiple machines
- A large dataset is divided into smaller chunks
How does MongoDB partition a collection?
MongoDB partitions a collection of documents based on the shard key.
- The data is divided into chunks, and the chunks are distributed across the shards.
Why should we choose a shard key carefully?
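- The shard key has a direct impact on the cluster’s performance.
- A suboptimal shard key can cause performance or scaling issues due to uneven chunk distribution.
- The data distribution strategy can be changed later by changing the shard key (resharding is supported starting in MongoDB 5.0), but this is a costly operation, so a good initial choice still matters.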
What are the benefits of sharding a collection?
Can handle more requests (higher throughput)
Horizontal scaling (scaling out)
Differentiate
vertical vs horizontal scaling
Vertical: increasing the power of a single machine (e.g. more CPU, RAM, or storage)
Horizontal: distributing the data and load across multiple machines
Ranged/Dynamic Sharding
Takes a field (attribute) as input and assigns each record to a shard based on predefined ranges of that field's values
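A minimal sketch of range-based routing, assuming hypothetical predefined ranges on an `age` field and three hypothetical shard names. Each range has an inclusive lower bound and an exclusive upper bound. Because neighboring values live on the same shard, a range query only needs the shards whose ranges overlap it.

```python
from bisect import bisect_right

# Hypothetical predefined ranges on an "age" field:
#   shard0: age < 30, shard1: 30 <= age < 60, shard2: age >= 60
BOUNDARIES = [30, 60]
SHARDS = ["shard0", "shard1", "shard2"]


def shard_for(value):
    """Return the shard whose range contains value."""
    return SHARDS[bisect_right(BOUNDARIES, value)]


def shards_for_range(lo, hi):
    """Return the shards a query for lo <= value <= hi must contact (assumes lo <= hi)."""
    first = bisect_right(BOUNDARIES, lo)
    last = bisect_right(BOUNDARIES, hi)
    return SHARDS[first:last + 1]


print(shard_for(30))              # shard1
print(shards_for_range(35, 50))   # ['shard1']
```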
List
Key attributes of an effective shard key
2
high cardinality
well-distributed frequency
Define
cardinality
Number of possible (distinct) values of the shard key
- Ex. If a shard key only has 3 possible values, there can be at most 3 chunks, so at most 3 shards can hold data
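The cardinality limit can be checked with a few lines of code. This is a hypothetical helper, not a MongoDB API: it counts distinct shard key values and caps the number of usable shards by that count, following the reasoning that documents with one shard key value must stay in one chunk.

```python
def max_shards_with_data(shard_key_values, num_shards):
    """Upper bound on how many shards can hold data for a given shard key.

    Documents with the same shard key value must stay in one chunk, so the
    number of chunks (and thus of shards that can receive data) is capped
    by the key's cardinality.
    """
    cardinality = len(set(shard_key_values))
    return min(cardinality, num_shards)


sizes = ["small", "medium", "large"] * 1000  # low-cardinality shard key
user_ids = range(3000)                       # high-cardinality shard key
print(max_shards_with_data(sizes, num_shards=10))     # 3
print(max_shards_with_data(user_ids, num_shards=10))  # 10
```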
Define
frequency
How the data is distributed across the possible values (how often each value occurs)
- Ex. If 95% of records share a single shard key value, then, due to this hotspot, 95% of the records will be allocated to a single shard.
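A hypothetical helper for spotting such a hotspot before choosing a shard key: it reports the most frequent value and the share of records that carry it. The `country` data below is made up to mirror the 95% example.

```python
from collections import Counter


def hotspot_share(shard_key_values):
    """Return the most frequent shard key value and the fraction of records with it.

    Records that share one shard key value must share one chunk, and
    therefore one shard.
    """
    value, count = Counter(shard_key_values).most_common(1)[0]
    return value, count / len(shard_key_values)


countries = ["US"] * 95 + ["CA", "MX", "FR", "DE", "JP"]
print(hotspot_share(countries))  # ('US', 0.95)
```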
Hashing
A hash of a set of fields (the shard key) determines which shard a record is allocated to.
* allows more even distribution across shards even when there is not a suitable shard key
* no lookup table needs to be maintained.
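The sketch below illustrates generic hash-based (algorithmic) sharding: the shard is computed from the key itself, so no lookup table is needed. It is a simplification and not MongoDB's exact scheme: MongoDB's hashed sharding still assigns ranges of hash values to chunks, whereas this sketch takes the hash modulo a hypothetical shard count. MD5 is used here only because Python's built-in `hash()` of strings is randomized per process.

```python
import hashlib
from collections import Counter


def shard_for_key(key, num_shards):
    """Compute the shard for a key directly from its hash (no lookup table)."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_shards


# Sequential ids, which would stay together under ranged sharding,
# are scattered across shards by the hash.
counts = Counter(shard_for_key(user_id, 4) for user_id in range(1000))
print(sorted(counts.items()))
```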
List
Drawbacks of hashing
- Queries for multiple records are more likely to be distributed across multiple shards. Whereas ranged sharding reflects the natural structure of the data across shards, hashed sharding typically disregards the meaning of the data, which results in more broadcast operations.
- expensive
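To illustrate the broadcast drawback, this hypothetical sketch compares which shards a range query over 100 consecutive ids must contact under each strategy. It assumes four shards, ranged shards that each own 250 consecutive ids, and a simple MD5-modulo placement as a stand-in for hashing; none of these parameters come from MongoDB. Under ranged placement the query stays on one shard, while under hashed placement the same ids are usually spread over several shards.

```python
import hashlib

NUM_SHARDS = 4
KEYS_PER_RANGE = 250  # hypothetical: ranged shard i owns ids [250*i, 250*(i+1))


def ranged_shard(key):
    return min(key // KEYS_PER_RANGE, NUM_SHARDS - 1)


def hashed_shard(key):
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def shards_touched(keys, shard_fn):
    """Shards a query must contact to fetch all the given keys."""
    return {shard_fn(k) for k in keys}


query_keys = range(100, 200)  # a range query: 100 <= id < 200
print(shards_touched(query_keys, ranged_shard))       # {0}
print(len(shards_touched(query_keys, hashed_shard)))
```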
List
Advantages - Sharded & replicated cluster
3
Increased read/write throughput
Increased storage capacity
High availability
List
Disadvantages - Sharded & replicated cluster
3
Query Overhead
* Additional latency: operations are routed through mongos, and queries that cannot be targeted must be sent to each shard and the results merged
Complexity of administration
* Increased upkeep and maintenance
Increased infrastructure costs
* Additional compute and machines
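Part of the query overhead comes from scatter-gather queries: the router sends the query to every shard and merges the partial results. A minimal sketch of the merge step for a sorted, limited query, assuming hypothetical per-shard results that each shard has already sorted:

```python
import heapq
from itertools import islice

# Hypothetical per-shard results for a query sorted by "price".
shard_results = {
    "shard0": [{"price": 3}, {"price": 10}],
    "shard1": [{"price": 1}, {"price": 7}],
    "shard2": [{"price": 5}],
}


def scatter_gather(results_by_shard, sort_key, limit):
    """Merge sorted per-shard results into one globally sorted, limited list.

    The merge needs the first result from every shard before it can emit
    anything, which is where the extra latency comes from.
    """
    merged = heapq.merge(*results_by_shard.values(), key=lambda d: d[sort_key])
    return list(islice(merged, limit))


print([d["price"] for d in scatter_gather(shard_results, "price", 3)])  # [1, 3, 5]
```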
List
Alternatives to Sharded & Replicated Cluster
2
Vertical Scaling
Replication
Define
MongoDB replication
The process of maintaining copies of the same data set on more than one MongoDB server.
- One drawback of giving every shard its own secondary servers is the cost.
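A toy model of replication, loosely based on MongoDB's oplog approach: writes go to the primary, which records each operation in a log, and secondaries catch up by replaying the entries they have not applied yet. The class names and the explicit `replicate()` step are hypothetical. Real replica sets replicate continuously and also handle elections and failures, which this sketch omits. It also shows where the cost comes from: every secondary stores a full copy of the data.

```python
class Node:
    """One server holding a full copy of the data set."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.applied = 0  # number of oplog entries applied so far

    def apply(self, entry):
        op, key, value = entry
        if op == "set":
            self.data[key] = value
        elif op == "delete":
            self.data.pop(key, None)
        self.applied += 1


class ReplicaSet:
    """Toy replica set: one primary takes writes, secondaries copy them via an oplog."""

    def __init__(self, num_secondaries):
        self.primary = Node("primary")
        self.secondaries = [Node(f"secondary{i}") for i in range(num_secondaries)]
        self.oplog = []

    def write(self, op, key, value=None):
        entry = (op, key, value)
        self.primary.apply(entry)
        self.oplog.append(entry)

    def replicate(self):
        # Each secondary replays only the oplog entries it has not seen yet.
        for node in self.secondaries:
            for entry in self.oplog[node.applied:]:
                node.apply(entry)


rs = ReplicaSet(num_secondaries=2)
rs.write("set", "a", 1)
rs.write("set", "b", 2)
rs.write("delete", "a")
rs.replicate()
print(rs.secondaries[0].data)  # {'b': 2}
```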