Database Specialty - DocumentDB Flashcards
1
Q
Amazon DocumentDB – Overview
A
- Fully-managed (non-relational) document database for MongoDB workloads
- JSON documents (nested key-value pairs) stored in collections (≈ tables)
- Compatible w/ majority of MongoDB applications, drivers, and tools
- High performance, scalability, and availability
- Support for flexible indexing, powerful ad-hoc queries, and analytics
- Storage and compute can scale independently
- Supports up to 15 low-latency read replicas (Multi-AZ)
- Auto scaling of storage from 10 GB to 64 TB
- Fault-tolerant and self-healing storage
- Automatic, continuous, incremental backups and PITR
2
Q
Document Database
A
- Stores JSON documents (semi-structured data)
- Key-value pairs can be nested
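
A minimal sketch of a nested JSON document, written in Python. The field names and values are hypothetical and only illustrate nesting.

```python
import json

# Hypothetical document; every field name here is illustrative only.
order = {
    "order_id": "A-1001",
    "customer": {"name": "Ana", "address": {"city": "Lisbon", "zip": "1000-001"}},
    "items": [
        {"sku": "pen", "qty": 2},
        {"sku": "pad", "qty": 1},
    ],
}

# Nested values are reached by walking the key path.
city = order["customer"]["address"]["city"]

# The whole document survives a round trip through JSON text.
round_tripped = json.loads(json.dumps(order))
```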
3
Q
Why document database?
A
- JSON is the de-facto format for data exchange
- DocumentDB makes it easy to insert, query, index, and perform aggregations over JSON data
- Store JSON output from APIs straight into DB and start analysing it
- Flexible document model, data types, and indexing
- Add / remove indexes easily
- Run ad hoc queries for operational and analytics workloads
- For known access patterns – use DynamoDB instead
4
Q
DocumentDB Architecture
A
- 6 copies of your data across 3 AZ (distributed design)
- Lock-free optimistic algorithm (quorum model)
- 4 copies out of 6 needed for writes (4/6 write quorum - data considered durable when at least 4/6 copies acknowledge the write)
- 3 copies out of 6 needed for reads (3/6 read quorum)
- Self-healing with peer-to-peer replication; storage is striped across 100s of volumes
- One DocumentDB Instance takes writes (master)
- Compute nodes on replicas do not need to write/replicate (=improved read performance)
- Log-structured distributed storage layer – passes incremental log records from compute to storage layer (=faster)
- Master + up to 15 Read Replicas serve reads
- Data is continuously backed up to S3 in real time, using storage nodes (compute node performance is unaffected)
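
The 4/6 write and 3/6 read quorum numbers above can be checked with a little arithmetic. This is a sketch of general quorum reasoning, not DocumentDB's internal implementation, and the function names are made up for illustration.

```python
def quorums_overlap(copies: int, write_quorum: int, read_quorum: int) -> bool:
    """True if any read quorum must share at least one copy with any write quorum."""
    return write_quorum + read_quorum > copies


def write_is_durable(acks: int, write_quorum: int = 4) -> bool:
    """A write counts as durable once enough copies have acknowledged it."""
    return acks >= write_quorum


COPIES, WRITE_Q, READ_Q = 6, 4, 3

print(quorums_overlap(COPIES, WRITE_Q, READ_Q))  # True, since 4 + 3 > 6
print(write_is_durable(acks=4))                  # True
print(write_is_durable(acks=3))                  # False
```

Because 4 + 3 > 6, a read quorum always includes at least one copy that saw the latest durable write.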
5
Q
DocumentDB Cluster
A
- Recommended to connect using the cluster endpoint in replica set mode (enables your SDK to auto-discover the cluster arrangement as instances get added or removed from the cluster)
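
A hedged sketch of a replica-set-mode connection string, built with the standard library only. The endpoint and credentials are placeholders. The options `replicaSet=rs0`, `readPreference=secondaryPreferred`, and `retryWrites=false` follow the pattern commonly shown in AWS examples and are assumptions here, not taken from the card.

```python
from urllib.parse import quote_plus, urlencode


def cluster_uri(user: str, password: str, cluster_endpoint: str, port: int = 27017) -> str:
    """Build a MongoDB URI that targets the cluster endpoint in replica set mode."""
    options = {
        "replicaSet": "rs0",                     # assumed replica set name
        "readPreference": "secondaryPreferred",  # send reads to replicas when possible
        "retryWrites": "false",                  # assumed setting from typical examples
    }
    # Credentials are percent-encoded so special characters do not break the URI.
    return (
        f"mongodb://{quote_plus(user)}:{quote_plus(password)}"
        f"@{cluster_endpoint}:{port}/?{urlencode(options)}"
    )


# Placeholder endpoint; substitute the cluster endpoint of a real cluster.
uri = cluster_uri("app", "p@ss", "my-cluster.cluster-example.us-east-1.docdb.amazonaws.com")
```

The resulting string can be handed to a MongoDB driver, which then discovers the instances behind the cluster endpoint.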
6
Q
DocumentDB Replication
A
- Up to 15 read replicas
- ASYNC replication
- Replicas share the same underlying storage layer
- Typically takes 10s of milliseconds (replication lag)
- Minimal performance impact on the primary due to replication process
- Replicas double up as failover targets (standby instance is not needed)
7
Q
DocumentDB HA failovers
A
- Failovers occur automatically
- A replica is automatically promoted to be the new primary during DR
- DocumentDB flips the CNAME of the DB instance to point to the replica and promotes it
- Failover to a replica typically takes 30 seconds (minimal downtime)
- Creating a new instance takes about 8-10 minutes (post failover)
- Failover to a new instance happens on a best-effort basis and can take longer
8
Q
DocumentDB Backup and Restore
A
- Supports automatic backups
- Continuously backs up your data to S3 for PITR (max retention period of 35 days)
- Latest restorable time for a PITR can be up to 5 mins in the past
- The first backup is a full backup. Subsequent backups are incremental
- Take manual snapshots to retain beyond 35 days
- Backup process does not impact cluster performance
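
The PITR window described above can be sketched as a simple time check. This is only an illustration of the 35-day maximum and the 5-minute figure; the function name is hypothetical, and treating 5 minutes as a fixed lag is a conservative assumption.

```python
from datetime import datetime, timedelta, timezone

MAX_RETENTION_DAYS = 35             # maximum retention period from the card
RESTORE_LAG = timedelta(minutes=5)  # latest restorable time may trail "now" by this much


def can_restore_to(target: datetime, now: datetime, retention_days: int = MAX_RETENTION_DAYS) -> bool:
    """Return True if `target` falls inside the assumed PITR window."""
    if not 0 < retention_days <= MAX_RETENTION_DAYS:
        raise ValueError("retention_days must be between 1 and 35")
    earliest = now - timedelta(days=retention_days)
    latest = now - RESTORE_LAG
    return earliest <= target <= latest


now = datetime(2024, 1, 31, 12, 0, tzinfo=timezone.utc)
print(can_restore_to(now - timedelta(hours=1), now, retention_days=7))    # True
print(can_restore_to(now - timedelta(minutes=1), now, retention_days=7))  # False
```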
9
Q
DocumentDB Backup and Restore
A
- Can only restore to a new cluster
- Can restore an unencrypted snapshot to an encrypted cluster (but not the other way round)
- To restore a cluster from an encrypted snapshot, you must have access to the KMS key
- Can only share manual snapshots (to share an automated one, copy it first and share the copy)
- Can’t share a snapshot encrypted using the default KMS key of the a/c
- Snapshots can be shared across accounts, but within the same region
10
Q
DocumentDB Scaling
A
- MongoDB sharding not supported (instead offers read replicas / vertical scaling / storage scaling)
- Vertical scaling (scale up / down) – by resizing instances
- Horizontal scaling (scale out / in) – by adding / removing up to 15 read replicas
- Can scale up a replica independently from other replicas (typically for analytical workloads)
- Automatic scaling storage – 10 GB to 64 TB (no manual intervention needed)
11
Q
DocumentDB Security – IAM & Network
A
- You use IAM to manage DocumentDB resources
- Supports MongoDB default auth SCRAM (Salted Challenge Response Authentication Mechanism) for DB authentication
- Supports built-in roles for DB users with RBAC (role-based access control)
- DocumentDB clusters are VPC-only (use private subnets)
- Clients (MongoDB shell) can run on EC2 in public subnets within VPC
- Can connect to your on-premises IT infra via VPN
12
Q
DocumentDB Security – Encryption
A
- Encryption at rest – with AES-256 using KMS
* Applied to cluster data / replicas / indexes / logs / backups / snapshots
- Encryption in transit – using TLS
* To enable TLS, set tls parameter in the cluster parameter group
- To connect over TLS:
* Download the certificate (public key) from AWS
* Pass the certificate key while connecting to the cluster
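
One way to pass the certificate when connecting is through connection string options. The sketch below appends the standard MongoDB URI options `tls` and `tlsCAFile` to an existing URI. The helper name, the host, and the file name `global-bundle.pem` are assumptions for illustration.

```python
from urllib.parse import urlencode


def with_tls(uri: str, ca_file: str) -> str:
    """Append TLS options to a MongoDB URI; `ca_file` is the downloaded certificate."""
    tls_query = urlencode({"tls": "true", "tlsCAFile": ca_file})
    if uri.endswith("?"):
        separator = ""   # URI already ends with an empty query string
    elif "?" in uri:
        separator = "&"  # URI already carries other options
    else:
        separator = "?"  # first option on this URI
    return f"{uri}{separator}{tls_query}"


# Placeholder host; the CA file name is an assumption about the downloaded bundle.
print(with_tls("mongodb://host.example:27017/?replicaSet=rs0", "global-bundle.pem"))
```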
13
Q
DocumentDB Pricing
A
- On-demand instances – pricing per second with a 10-minute minimum
- IOPS – per million IO requests
- Each DB page read operation from the storage volume counts as one IO (one page = 8KB)
- Write IOs are counted in 4KB units
- DB Storage – per GB per month
- Backups – per GB per month (backups up to 100% of your cluster’s data storage are free)
- Data transfer – per GB
- Can temporarily stop compute instances for up to 7 days
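
The billing units above lend themselves to a quick arithmetic sketch. The price used below is a made-up number, and rounding partial 4KB write units up is an assumption; only the 8KB page, 4KB write unit, per-million IO pricing, and 10-minute minimum come from the card.

```python
import math

WRITE_UNIT_BYTES = 4 * 1024   # write IOs are counted in 4KB units
MIN_BILLED_SECONDS = 10 * 60  # per-second pricing with a 10-minute minimum


def read_ios(pages_read: int) -> int:
    """Each 8KB page read from the storage volume counts as one IO."""
    return pages_read


def write_ios(bytes_written: int) -> int:
    """Count write IOs in 4KB units (partial units rounded up, by assumption)."""
    return math.ceil(bytes_written / WRITE_UNIT_BYTES)


def billed_seconds(run_seconds: int) -> int:
    """Apply the 10-minute minimum to an instance's run time."""
    return max(run_seconds, MIN_BILLED_SECONDS)


def io_cost(total_ios: int, price_per_million: float) -> float:
    """IO charges are quoted per million requests."""
    return total_ios / 1_000_000 * price_per_million


print(write_ios(4097))      # 2
print(billed_seconds(90))   # 600
```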
14
Q
DocumentDB Monitoring
A
- API calls logged with CloudTrail
- Common CloudWatch metrics
- CPU or RAM utilization – CPUUtilization / FreeableMemory
- IOPS metrics – VolumeReadIOPS / VolumeWriteIOPS / WriteIOPS / ReadIOPS
- Database connections – DatabaseConnections
- Network traffic – NetworkThroughput
- Storage volume consumption – VolumeBytesUsed
- Two types of logs can be published/exported to CloudWatch Logs
- Profiler logs
- Audit logs
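
A hedged sketch of how one of these metrics might be requested from CloudWatch. The code only builds the request parameters, so it runs without AWS access. The `AWS/DocDB` namespace and the `DBClusterIdentifier` dimension are assumptions not stated on the card, and the cluster name is a placeholder.

```python
from datetime import datetime, timedelta, timezone


def metric_request(metric_name: str, cluster_id: str, end: datetime,
                   minutes: int = 60, period: int = 300) -> dict:
    """Build keyword arguments for a CloudWatch metric statistics query."""
    return {
        "Namespace": "AWS/DocDB",  # assumed namespace
        "MetricName": metric_name,
        "Dimensions": [{"Name": "DBClusterIdentifier", "Value": cluster_id}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": period,          # seconds per datapoint
        "Statistics": ["Average"],
    }


request = metric_request(
    "CPUUtilization", "my-cluster", datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
)
# With boto3 installed and credentials configured, this could be passed as:
#   boto3.client("cloudwatch").get_metric_statistics(**request)
```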
15
Q
DocumentDB Profiler (profiler logs)
A
- Logs (into CloudWatch Logs) the details of ops performed on your cluster
- Helps identify slow operations and improve query performance
- Accessible from CloudWatch Logs
- To enable profiler:
- Set the parameters – profiler, profiler_threshold_ms, and profiler_sampling_rate
- Enable Logs Exports for profiler logs by modifying the cluster
- Both the steps above are mandatory
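
How the three profiler parameters interact can be sketched as follows. This is an illustrative model, not the service's actual logic: the values shown (`enabled`, 100 ms, 1.0) and the rule that only operations slower than the threshold are sampled are assumptions.

```python
import random

# Hypothetical parameter group settings; the values are illustrative.
profiler_params = {
    "profiler": "enabled",
    "profiler_threshold_ms": 100,
    "profiler_sampling_rate": 1.0,
}


def would_be_profiled(duration_ms: float, threshold_ms: float = 100,
                      sampling_rate: float = 1.0, rng=random.random) -> bool:
    """Model: log an operation if it is slower than the threshold and is sampled."""
    if duration_ms <= threshold_ms:
        return False
    # rng() returns a float in [0.0, 1.0), so a rate of 1.0 keeps every slow op.
    return rng() < sampling_rate


print(would_be_profiled(250))  # True
print(would_be_profiled(50))   # False
```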