Database Specialty - DocumentDB Flashcards

1
Q

Amazon DocumentDB – Overview

A
  • Fully-managed (non-relational) document database for MongoDB workloads
  • JSON documents (nested key-value pairs) stored in collections (≈ tables)
  • Compatible with the majority of MongoDB applications, drivers, and tools
  • High performance, scalability, and availability
  • Support for flexible indexing, powerful ad-hoc queries, and analytics
  • Storage and compute can scale independently
  • Supports up to 15 low-latency read replicas (Multi-AZ)
  • Auto scaling of storage from 10 GB to 64 TB
  • Fault-tolerant and self-healing storage
  • Automatic, continuous, incremental backups and PITR
2
Q

Document Database

A
  • Stores JSON documents (semi-structured data)
  • Key-value pairs can be nested
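As a small illustration of nested key-value pairs, the sketch below builds a hypothetical order document and round-trips it through JSON. Every field name here is invented for the example and is not taken from the deck.

```python
import json

# Hypothetical document; all field names are illustrative only.
order = {
    "_id": "order-1001",
    "customer": {"name": "Ana", "address": {"city": "Lisbon", "zip": "1000-001"}},
    "items": [
        {"sku": "A1", "qty": 2},
        {"sku": "B7", "qty": 1},
    ],
}

# Serialize to JSON text and parse it back; the nesting survives the round trip.
text = json.dumps(order)
restored = json.loads(text)

# Nested values are reached by walking the keys level by level.
city = restored["customer"]["address"]["city"]
total_qty = sum(item["qty"] for item in restored["items"])
```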
3
Q

Why document database?

A
  • JSON is the de-facto format for data exchange
  • DocumentDB makes it easy to insert, query, index, and perform aggregations over JSON data
  • Store JSON output from APIs straight into DB and start analysing it
  • Flexible document model, data types, and indexing
  • Add / remove indexes easily
  • Run ad hoc queries for operational and analytics workloads
  • For known access patterns – use DynamoDB instead
4
Q

DocumentDB Architecture

A
  • 6 copies of your data across 3 AZs (distributed design)
    • Lock-free optimistic algorithm (quorum model)
    • 4 copies out of 6 needed for writes (4/6 write quorum – data is considered durable when at least 4/6 copies acknowledge the write)
    • 3 copies out of 6 needed for reads (3/6 read quorum)
    • Self-healing with peer-to-peer replication
    • Storage is striped across 100s of volumes
  • One DocumentDB instance takes writes (the primary)
  • Compute nodes on replicas do not need to write/replicate (= improved read performance)
  • Log-structured distributed storage layer – passes incremental log records from compute to storage layer (= faster)
  • Primary + up to 15 read replicas serve reads
  • Data is continuously backed up to S3 in real time, using storage nodes (compute node performance is unaffected)
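The quorum numbers on this card can be checked with a little arithmetic. The sketch below is only a model of the 4/6 write and 3/6 read rule, not how the storage layer is implemented. The even split of 2 copies per AZ is an assumption the card does not state.

```python
COPIES = 6          # copies of the data, spread across 3 AZs
WRITE_QUORUM = 4    # acknowledgements needed before a write counts as durable
READ_QUORUM = 3     # copies that must respond to serve a read
COPIES_PER_AZ = 2   # assumption: copies are spread evenly over the 3 AZs


def write_is_durable(acks: int) -> bool:
    """A write is durable once at least 4 of the 6 copies acknowledge it."""
    return acks >= WRITE_QUORUM


def read_is_possible(healthy_copies: int) -> bool:
    """A read needs at least 3 of the 6 copies to respond."""
    return healthy_copies >= READ_QUORUM


def quorums_overlap() -> bool:
    """True when every read quorum shares at least one copy with every write quorum."""
    return READ_QUORUM + WRITE_QUORUM > COPIES


# Losing a whole AZ leaves 4 copies: both reads and writes still succeed.
after_az_loss = COPIES - COPIES_PER_AZ
# Losing one more copy leaves 3: reads still succeed, writes do not.
after_az_plus_one = after_az_loss - 1
```

Because 3 + 4 > 6, any read quorum overlaps any write quorum, so a quorum read always reaches at least one copy that holds the latest durable write.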
5
Q

DocumentDB Cluster

A
  • Recommended to connect using the cluster endpoint in replica set mode (enables your SDK to auto-discover the cluster arrangement as instances get added or removed from the cluster)
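A minimal sketch of what connecting in replica set mode can look like from a client: the connection string names the cluster endpoint and sets the standard MongoDB URI option `replicaSet`. The replica set name `rs0`, the `readPreference` value, the default port, and the helper name `build_docdb_uri` are assumptions for illustration. Check your cluster's console for the exact string to use.

```python
from urllib.parse import quote_plus, urlencode


def build_docdb_uri(user: str, password: str, cluster_endpoint: str, port: int = 27017) -> str:
    """Build a MongoDB URI that targets the cluster endpoint in replica set mode."""
    options = {
        "replicaSet": "rs0",                     # assumed replica set name
        "readPreference": "secondaryPreferred",  # assumed: send reads to replicas when possible
    }
    # Percent-encode the credentials so characters such as '@' or ':' stay valid.
    return (
        f"mongodb://{quote_plus(user)}:{quote_plus(password)}"
        f"@{cluster_endpoint}:{port}/?{urlencode(options)}"
    )


# Hypothetical endpoint and credentials, for illustration only.
uri = build_docdb_uri("app", "p@ss", "example-cluster.example.internal")
```

A driver given this URI treats the cluster as a replica set, which is what lets it discover instances as they are added or removed.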
6
Q

DocumentDB Replication

A
  • Up to 15 read replicas
  • ASYNC replication
  • Replicas share the same underlying
    storage layer
  • Replication lag is typically 10s of
    milliseconds
  • Minimal performance impact on the
    primary due to replication process
  • Replicas double up as failover targets
    (standby instance is not needed)
7
Q

DocumentDB HA failovers

A
  • Failovers occur automatically
  • A replica is automatically promoted to be the new primary during DR
  • DocumentDB flips the CNAME of the DB
    instance to point to the replica and promotes it
  • Failover to a replica typically takes 30 seconds (minimal downtime)
  • Creating a new instance takes about 8-10 minutes (post failover)
  • Failover to a new instance happens on a best-effort basis and can take longer
8
Q

DocumentDB Backup and Restore

A
  • Supports automatic backups
  • Continuously backs up your data to S3 for
    PITR (max retention period of 35 days)
  • Latest restorable time for a PITR can be up
    to 5 mins in the past
  • The first backup is a full backup.
    Subsequent backups are incremental
  • Take manual snapshots to retain beyond
    35 days
  • Backup process does not impact cluster
    performance
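The PITR window described above can be expressed as a simple range check. This is a sketch only. It treats the 5-minute figure as a worst-case lag and uses a hypothetical helper name, `can_restore_to`.

```python
from datetime import datetime, timedelta, timezone

MAX_RETENTION_DAYS = 35             # maximum automatic backup retention
RESTORE_LAG = timedelta(minutes=5)  # latest restorable time may trail "now" by up to this much


def can_restore_to(target: datetime, now: datetime, retention_days: int) -> bool:
    """Return True if `target` falls inside the conservative PITR window."""
    if not 0 < retention_days <= MAX_RETENTION_DAYS:
        raise ValueError("retention_days must be between 1 and 35")
    earliest = now - timedelta(days=retention_days)
    latest = now - RESTORE_LAG  # conservative: assumes the full 5-minute lag
    return earliest <= target <= latest


# Fixed timestamp so the example is reproducible.
now = datetime(2024, 1, 31, 12, 0, tzinfo=timezone.utc)
one_hour_ago = now - timedelta(hours=1)
```

Anything older than the retention period would need a manual snapshot instead.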
9
Q

DocumentDB Backup and Restore

A
  • Can only restore to a new cluster
  • Can restore an unencrypted snapshot to an
    encrypted cluster (but not the other way
    round)
  • To restore a cluster from an encrypted
    snapshot, you must have access to the KMS
    key
  • Can only share manual snapshots (to share an
    automated snapshot, copy it first and share the copy)
  • Can’t share a snapshot encrypted using the
    default KMS key of the account
  • Snapshots can be shared across accounts, but within the same region
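The restore and sharing rules on this card reduce to a few boolean checks. The sketch below encodes them as plain functions for self-testing. The function and parameter names are hypothetical, and this is not an AWS API.

```python
def can_restore(snapshot_encrypted: bool, target_encrypted: bool) -> bool:
    """Unencrypted -> encrypted is allowed; encrypted -> unencrypted is not."""
    return target_encrypted or not snapshot_encrypted


def can_share(is_manual: bool, uses_default_kms_key: bool, same_region: bool) -> bool:
    """Only manual snapshots, not encrypted with the account's default KMS key,
    can be shared, and only with accounts in the same region.

    For an unencrypted snapshot, pass uses_default_kms_key=False.
    """
    return is_manual and not uses_default_kms_key and same_region
```

An automated snapshot fails `can_share` until it has been copied, because the copy is a manual snapshot.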
10
Q

DocumentDB Scaling

A
  • MongoDB sharding not supported (instead offers read replicas / vertical scaling / storage scaling)
  • Vertical scaling (scale up / down) – by resizing instances
  • Horizontal scaling (scale out / in) – by adding / removing up to 15 read replicas
  • Can scale up a replica independently from other replicas (typically for analytical workloads)
  • Automatic scaling storage – 10 GB to 64 TB (no manual intervention needed)
11
Q

DocumentDB Security – IAM & Network

A
  • You use IAM to manage DocumentDB resources
  • Supports MongoDB default auth SCRAM (Salted Challenge
    Response Authentication Mechanism) for DB authentication
  • Supports built-in roles for DB users with RBAC (role-based access control)
  • DocumentDB clusters are VPC-only (use private subnets)
  • Clients (MongoDB shell) can run on EC2 in public subnets within VPC
  • Can connect to your on-premises IT infra via VPN
12
Q

DocumentDB Security – Encryption

A
  • Encryption at rest – with AES-256 using KMS
    • Applied to cluster data / replicas / indexes / logs / backups / snapshots
  • Encryption in transit – using TLS
    • To enable TLS, set the tls parameter in the cluster parameter group
  • To connect over TLS:
    • Download the certificate (public key) from AWS
    • Pass the certificate key while connecting to the cluster
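On the client side, "pass the certificate while connecting" usually means handing the driver the path to the downloaded CA file. The sketch below only builds the option dictionary. `tls` and `tlsCAFile` are standard MongoDB connection options. The file name `global-bundle.pem` and the helper name are assumptions.

```python
def tls_connect_options(ca_file: str) -> dict:
    """Return driver options that turn on TLS and point at the downloaded CA file."""
    if not ca_file:
        raise ValueError("a CA certificate file path is required")
    return {"tls": True, "tlsCAFile": ca_file}


# Assumed file name for the certificate bundle downloaded from AWS.
options = tls_connect_options("global-bundle.pem")

# With a MongoDB driver such as pymongo installed, these options would be
# passed as keyword arguments, e.g. MongoClient(uri, **options).
```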
13
Q

DocumentDB Pricing

A
  • On-demand instances – pricing per second
    with a 10-minute minimum
  • IOPS – per million IO requests
  • Each DB page read operation from the
    storage volume counts as one IO (one page = 8 KB)
  • Write IOs are counted in 4 KB units
  • DB Storage – per GB per month
  • Backups – per GB per month (backups up to
    100% of your cluster’s data storage is free)
  • Data transfer – per GB
  • Can temporarily stop compute instances for
    up to 7 days
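The billing units above lend themselves to a quick worked calculation. This sketch makes no claim about actual prices: `price_per_million` is a caller-supplied, hypothetical number, and rounding a partial 4 KB write unit up to a whole IO is an assumption.

```python
import math

WRITE_UNIT_BYTES = 4 * 1024   # write IOs are counted in 4 KB units
MIN_BILLED_SECONDS = 10 * 60  # per-second billing with a 10-minute minimum


def write_ios(bytes_written: int) -> int:
    """Number of write IOs; a partial unit is assumed to round up."""
    return math.ceil(bytes_written / WRITE_UNIT_BYTES)


def read_ios(pages_read: int) -> int:
    """Each 8 KB page read from the storage volume counts as one IO."""
    return pages_read


def billed_seconds(run_seconds: int) -> int:
    """Instance time billed, applying the 10-minute minimum."""
    return max(run_seconds, MIN_BILLED_SECONDS)


def io_cost(total_ios: int, price_per_million: float) -> float:
    """IO charge for a given (hypothetical) price per million requests."""
    return total_ios / 1_000_000 * price_per_million


def billable_backup_gb(backup_gb: float, cluster_storage_gb: float) -> float:
    """Backup storage up to 100% of the cluster's data storage is free."""
    return max(0.0, backup_gb - cluster_storage_gb)
```

For example, an instance that runs for 90 seconds is still billed for 600 seconds, and 150 GB of backups against 100 GB of cluster storage leaves 50 GB billable.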
14
Q

DocumentDB Monitoring

A
  • API calls logged with CloudTrail
  • Common CloudWatch metrics
    • CPU or RAM utilization – CPUUtilization /
      FreeableMemory
    • IOPS metrics – VolumeReadIOPS /
      VolumeWriteIOPS / WriteIOPS / ReadIOPS
    • Database connections –
      DatabaseConnections
    • Network traffic – NetworkThroughput
    • Storage volume consumption –
      VolumeBytesUsed
  • Two types of logs can be published/exported to CloudWatch Logs
    • Profiler logs
    • Audit logs
15
Q

DocumentDB Profiler (profiler logs)

A
  • Logs (into CloudWatch Logs) the details of ops performed
    on your cluster
  • Helps identify slow operations and improve query
    performance
  • Accessible from CloudWatch Logs
  • To enable profiler:
    • Set the parameters – profiler,
      profiler_threshold_ms, and
      profiler_sampling_rate
    • Enable Logs Exports for Profiler logs by
      modifying the cluster
    • Both the steps above are mandatory
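The first step, setting the three profiler parameters, can be sketched as a small helper that validates the values and returns them as parameter-group settings. The default values and the 0 to 1 range for the sampling rate are assumptions for illustration. The helper name is hypothetical, and the returned dictionary is not an AWS request format.

```python
def profiler_parameters(threshold_ms: int = 100, sampling_rate: float = 1.0) -> dict:
    """Return the three profiler settings as string values for a parameter group."""
    if threshold_ms < 0:
        raise ValueError("threshold_ms must not be negative")
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate is assumed to be a fraction between 0 and 1")
    return {
        "profiler": "enabled",
        "profiler_threshold_ms": str(threshold_ms),    # log ops slower than this
        "profiler_sampling_rate": str(sampling_rate),  # fraction of slow ops to log
    }


# Example: log half of the operations that take longer than 200 ms.
settings = profiler_parameters(threshold_ms=200, sampling_rate=0.5)
```

These settings alone are not enough; the log export step on the card is still required before anything appears in CloudWatch Logs.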
16
Q

DocumentDB audit logs

A
  • Records DDL statements, authentication, authorization, and user management events to CloudWatch Logs
  • Exports your cluster’s auditing records (JSON
    documents) to CloudWatch Logs
  • Accessible from CloudWatch Logs
  • To enable auditing:
    • Set parameter audit_logs=enabled
    • Enable Logs Exports for Audit logs by
      modifying the cluster
    • Both the steps above are mandatory
17
Q

DocumentDB Performance Management

A
  • Use the explain command to identify slow queries
    db.runCommand({explain: {<query>}})
  • Can use db.adminCommand to find and terminate queries
  • Example – to terminate long running / blocked queries
    db.adminCommand({killOp: 1, op: <opid>});
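To make the find-then-terminate flow concrete, the sketch below filters a currentOp-style result for long-running operations and builds the matching killOp command documents. The sample data is invented. The field names `inprog`, `opid`, and `secs_running` follow MongoDB's currentOp output and are assumed to apply here. Nothing is sent to a database.

```python
def long_running_opids(current_op_result: dict, min_seconds: int) -> list:
    """Return the opids of operations that have run for at least min_seconds."""
    return [
        op["opid"]
        for op in current_op_result.get("inprog", [])
        if op.get("secs_running", 0) >= min_seconds
    ]


def kill_commands(opids: list) -> list:
    """Build one killOp command document per opid, mirroring the shell example."""
    return [{"killOp": 1, "op": opid} for opid in opids]


# Invented sample of what a currentOp call might return.
sample = {
    "inprog": [
        {"opid": 101, "secs_running": 2, "op": "query"},
        {"opid": 102, "secs_running": 45, "op": "query"},
        {"opid": 103, "secs_running": 600, "op": "update"},
    ]
}

slow = long_running_opids(sample, min_seconds=30)
commands = kill_commands(slow)
```

Each resulting document has the same shape as the argument to `db.adminCommand` in the example above, so it could be passed to a driver's admin command call after review.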