06 - MongoDB Sharding and Replication Flashcards

1
Q

Define

shard

A

A subset of the data in a sharded cluster; together, the shards hold the whole data set.

2
Q

Each shard should be deployed in a ____.

A

replica set

3
Q

Define

Config Servers

A

Store metadata and configuration settings for your cluster

4
Q

Define

mongos

A

Query router to interface with client applications

5
Q

When does MongoDB split chunks?
How does this get triggered?

A
  • MongoDB splits chunks when they grow beyond the configured chunk size.
  • Both inserts and updates can trigger a chunk split.
6
Q

Can a chunk with 1 shard key value be split?

A

No. A chunk that holds a single shard key value has no split point, so it cannot be split.
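
The splitting rule can be sketched in a few lines of Python. This is a toy model, not MongoDB's implementation: `MAX_CHUNK_DOCS` and `split_chunk` are hypothetical names, and the chunk size is counted in documents rather than bytes to keep the example short.

```python
from typing import List

MAX_CHUNK_DOCS = 4  # hypothetical stand-in for the configured chunk size


def split_chunk(keys: List[int]) -> List[List[int]]:
    """Split a chunk of shard key values in two, if that is possible."""
    keys = sorted(keys)
    if len(keys) <= MAX_CHUNK_DOCS:
        return [keys]  # still within the size limit, nothing to do
    if keys[0] == keys[-1]:
        return [keys]  # a single shard key value: no split point exists
    # Use the median key as the split point, moving it past the first
    # value when necessary so that both halves are non-empty.
    split_at = keys[len(keys) // 2]
    if split_at == keys[0]:
        split_at = next(k for k in keys if k > keys[0])
    left = [k for k in keys if k < split_at]
    right = [k for k in keys if k >= split_at]
    return [left, right]


small = split_chunk([1, 2])
halves = split_chunk([1, 2, 3, 4, 5, 6])
stuck = split_chunk([7] * 10)
```

Documents that share a key value always land in the same half, which is why the oversized chunk of identical keys in `stuck` stays whole.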

7
Q

What happens when you shard a collection?

A
  • The collection's data is distributed across multiple databases (shards), which allows it to be stored on multiple machines.
  • A larger dataset is broken into smaller chunks.
8
Q

How does MongoDB partition a collection?

A

MongoDB partitions a collection of documents based on the shard key

  • Data is divided into chunks and chunks are allocated to each server.
9
Q

Why should we choose a shard key carefully?

A
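  • The shard key has a direct impact on the cluster’s performance.
  • A suboptimal shard key can lead to performance or scaling issues due to uneven chunk distribution.
  • You can change your data distribution strategy later by changing the shard key (resharding a collection is supported from MongoDB 5.0), but it is better to choose well up front.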
10
Q

What are the benefits of sharding a collection?

A

The cluster can handle more requests.
Enables horizontal scaling (scaling out).

11
Q

Differentiate

vertical vs horizontal scaling

A

Vertical: increasing the power of a single machine
Horizontal: distributing the load across different machines

12
Q

Ranged/Dynamic Sharding

A

Takes a field/attribute of the record as input and allocates the record to a shard based on predefined ranges of that field's values.
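
A minimal sketch of range-based allocation, assuming an integer shard key and three shards. The boundaries and the function name `shard_for_range` are made up for illustration and are not part of any MongoDB API.

```python
from bisect import bisect_right

# Hypothetical range boundaries on the shard key:
# keys below 100 go to shard 0, 100..199 to shard 1, 200 and up to shard 2.
BOUNDARIES = [100, 200]


def shard_for_range(key: int) -> int:
    """Return the index of the shard whose range contains `key`."""
    return bisect_right(BOUNDARIES, key)


placements = {key: shard_for_range(key) for key in (5, 99, 100, 199, 200, 5000)}
```

Neighbouring key values end up on the same shard, which keeps range queries local but lets a run of increasing keys pile up on the last shard.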

13
Q

List

Key attributes of an effective shard key

2

A

high cardinality
well-distributed frequency

14
Q

Define

cardinality

A

number of possible values of that key

  • Ex. If a shard key has only 3 possible values, there can be at most 3 shards.
15
Q

Define

frequency

A

distribution of the data along the possible values

  • Ex. If 95% of records occur with a single shard key value then, due to this hotspot, 95% of the records will be allocated to a single shard.
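
Cardinality and frequency can both be measured on a sample of shard key values before a key is chosen. The sketch below uses an invented sample of country codes; `key_stats` is a hypothetical helper, not a MongoDB function.

```python
from collections import Counter


def key_stats(keys):
    """Return (cardinality, share of the most frequent key value)."""
    counts = Counter(keys)
    cardinality = len(counts)
    top_share = max(counts.values()) / len(keys)
    return cardinality, top_share


# Invented sample: 95 of 100 records share one shard key value.
sample = ["US"] * 95 + ["CA"] * 3 + ["MX"] * 2
cardinality, top_share = key_stats(sample)
```

Here the cardinality is 3 and the most frequent value covers 95% of the sample, so this key would both cap the number of useful shards and create a hotspot.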
16
Q

Hashing

A

A hash computed from a field (or set of fields) determines the allocation of the record to a given shard.
* Allows a more even distribution across shards, even when there is no suitable shard key.
* No lookup table needs to be maintained.
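
A rough sketch of hash-based allocation. It is a simplification: SHA-256 with a modulo is used only for illustration, and MongoDB's own hash function and hash-to-chunk mapping differ. `NUM_SHARDS` and `shard_for_hash` are hypothetical names.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical cluster size


def shard_for_hash(key: str) -> int:
    """Map a shard key value to a shard using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


# Sequential keys, which would cluster together under ranged sharding,
# are spread across the shards by the hash.
placement = Counter(shard_for_hash(f"order-{i}") for i in range(10_000))
```

The shard is computed directly from the key, so nothing has to be looked up to place or find a record.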

17
Q

List

Drawbacks of hashing

A
  1. Queries for multiple records are more likely to be spread across multiple shards. Whereas ranged sharding reflects the natural structure of the data across shards, hashed sharding typically disregards the meaning of the data, so broadcast operations occur more often.
  2. expensive
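
The first drawback can be made concrete by counting how many shards a range query has to visit under each strategy. This is a toy comparison under assumed boundaries and an illustrative hash; none of the names below come from MongoDB.

```python
import hashlib
from bisect import bisect_right

NUM_SHARDS = 4
BOUNDARIES = [250, 500, 750]  # hypothetical ranges for keys 0..999


def ranged_shard(key: int) -> int:
    """Shard chosen by which key range the value falls into."""
    return bisect_right(BOUNDARIES, key)


def hashed_shard(key: int) -> int:
    """Shard chosen by a stable hash of the value (illustrative only)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def shards_touched(route, low: int, high: int) -> set:
    """Shards holding at least one key in the inclusive range [low, high]."""
    return {route(key) for key in range(low, high + 1)}


ranged_hit = shards_touched(ranged_shard, 100, 199)
hashed_hit = shards_touched(hashed_shard, 100, 199)
```

With these boundaries the ranged layout answers the query from a single shard, while the hashed layout scatters the same hundred keys over several shards.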
18
Q

List

Advantages - Sharded & replicated cluster

3

A

Increased read/write throughput
Increased storage capacity
High availability

19
Q

List

Disadvantages - Sharded & replicated cluster

3

A

Query Overhead
* Additional latency: operations pass through the query router, and broadcast queries must hit each shard and merge the results

Complexity of administration
* Increased upkeep and maintenance

Increased infrastructure costs
* Additional compute and machines
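
The merge step behind the query overhead can be sketched as follows, assuming each shard returns its own results already sorted. The data and the name `scatter_gather` are invented for illustration; this is not how mongos is implemented.

```python
import heapq
from itertools import islice

# Hypothetical per-shard results, each already sorted by the sort key.
shard_results = [
    [3, 9, 27],
    [1, 4, 16],
    [2, 8, 32],
]


def scatter_gather(results, limit):
    """Merge sorted per-shard results and keep the first `limit` items."""
    return list(islice(heapq.merge(*results), limit))


top5 = scatter_gather(shard_results, 5)
```

Every shard has to respond before the merged answer is complete, so the slowest shard sets the latency of the whole query.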

20
Q

List

Alternatives to Sharded & Replicated Cluster

2

A

Vertical Scaling
Replication

21
Q

Define

MongoDB replication

A

The process of keeping a copy of the same data set on more than one MongoDB server.

  • One drawback of giving every shard its own secondary servers is the cost.