DynamoDB Flashcards
With DynamoDB you don’t have to worry about
Hardware provisioning, setup and configuration, replication, software patching, or cluster scaling.
DynamoDB is a _______ database.
NoSQL
How much data can a DynamoDB table store?
Any amount of data.
Two types of backup
- On-demand
- Point-in-time recovery
With point-in-time recovery, how many days can you go back in time
35 days with per second granularity.
How is data made highly available and durable?
All of your data is stored on solid-state disks (SSDs) and is automatically replicated across multiple Availability Zones in an AWS Region, providing built-in high availability and data durability.
DynamoDB table terminology
Tables - items and attributes.
What uniquely identifies each item?
Primary key
_______ provides more flexible querying.
Secondary Index
Which attributes in a DD table are schemaless?
All attributes except the primary key.
DynamoDB supports nested attributes up to __ levels deep.
32
Types of primary keys
- Partition key
- Partition key and sort key
A simple primary key, composed of one attribute, is known as a ___________________
Partition key
What is a composite key?
Composite key is made of two attributes - Partition key and sort key.
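As an illustration, a composite key might be defined like this when creating a table (a sketch using boto3-style request parameters; the `Music` table and its attribute names are hypothetical):

```python
# Hypothetical request parameters for creating a table with a
# composite primary key: Artist = partition key, SongTitle = sort key.
# This is the parameter shape accepted by boto3's create_table.
create_table_params = {
    "TableName": "Music",
    "KeySchema": [
        {"AttributeName": "Artist", "KeyType": "HASH"},      # partition key
        {"AttributeName": "SongTitle", "KeyType": "RANGE"},  # sort key
    ],
    "AttributeDefinitions": [
        {"AttributeName": "Artist", "AttributeType": "S"},
        {"AttributeName": "SongTitle", "AttributeType": "S"},
    ],
    "BillingMode": "PAY_PER_REQUEST",
}
```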
DynamoDB uses the partition key value as input to an _____________
internal hash function.
The output from the hash function determines the _______ in which the item will be stored.
partition
All items with the _________ value are stored together, in sorted order by _________.
same partition key, sort key value
The partition key of an item is also known as its ________
hash attribute
The sort key of an item is also known as its________
range attribute.
Each primary key attribute must be a ______________
scalar (meaning that it can hold only a single value)
The only data types allowed for primary key attributes are _____________________
string, number, or binary.
What is a secondary index?
Secondary indexes are optional; they let you query data using alternate keys, in addition to queries against the primary key.
Types of secondary indexes.
- Global secondary index.
- Local secondary index.
Secondary indexes quota
20 global secondary indexes and 5 local secondary indexes per table.
How are indexes updated when a table is updated?
DD maintains indexes automatically.
What are DD streams?
DynamoDB Streams is an optional feature that captures data modification events in DynamoDB tables.
How do events appear in DD streams?
The data about DD events appear in the stream in near-real time, and in the order that the events occurred.
Each event is represented by a _________
stream record.
If you enable a stream on a table, DynamoDB Streams writes a stream record whenever one of these events occurs:
- New item is added
- An item is updated
- An item is deleted from the table
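The stream settings above can be sketched as boto3-style UpdateTable parameters (the `Orders` table name is a hypothetical placeholder):

```python
# StreamSpecification as passed to UpdateTable. NEW_AND_OLD_IMAGES
# captures both the before and after versions of each modified item.
update_params = {
    "TableName": "Orders",  # hypothetical table name
    "StreamSpecification": {
        "StreamEnabled": True,
        "StreamViewType": "NEW_AND_OLD_IMAGES",
    },
}
```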
Each stream record also contains the ______, _______ and __________
Name of the table, the event timestamp, and other metadata.
Stream records have a lifetime of _________ after that, they are automatically removed from the stream.
24 hours;
Table names and index names must be between____ and _____ characters long, and can contain only the following characters:
3 and 255; a-z, A-Z, 0-9, _ (underscore), - (hyphen), . (dot)
Attribute names must be at least ___characters long, but no greater than ____ long.
one; 64 KB
These attribute names must be no greater than 255 characters long -
- Secondary index partition key names.
- Secondary index sort key names.
The minimum length of a string can be zero, if the attribute is not used as a key for an index or table, and is constrained by the maximum DynamoDB item size limit of _____
400 KB.
When your application writes data to a DynamoDB table and receives an _____________, the write has occurred and is durable.
HTTP 200 response (OK)
Data consistency
The data is eventually consistent across all storage locations, usually within one second or less.
DynamoDB supports ______________ and _________ reads.
eventually consistent and strongly consistent
Default read type
eventually consistent
Disadvantages of strongly consistent reads
- A strongly consistent read might not be available if there is a network delay or outage. In this case, DynamoDB may return a server error (HTTP 500).
- Strongly consistent reads may have higher latency than eventually consistent reads.
- Strongly consistent reads are not supported on global secondary indexes.
- Strongly consistent reads use more throughput capacity than eventually consistent reads.
DynamoDB uses __________, unless you specify otherwise
eventually consistent reads
What are the types of read/write capacity modes
- On-demand
- Provisioned (default, free-tier eligible)
What is the purpose of capacity modes
Capacity modes decide how you are charged for read/write throughput and how you manage capacity.
How do you allocate capacity modes for LSIs?
LSIs inherit the capacity mode from the base table.
How to serve requests without capacity planning?
With on-demand capacity mode
How does on-demand capacity mode charge for DD?
On-demand capacity mode offers pay-per-request for reads/writes.
How does on-demand mode work?
When enabled, on-demand capacity mode accommodates workloads as they ramp up and down.
DD tables that use on-demand mode offer:
Single-digit millisecond latency, the SLA commitment, and the security that DD already offers.
When is on-demand capacity mode good?
- You have new tables with unknown workloads
- You have unpredictable application traffic.
- You prefer the ease of paying for only what you use.
Default throughput quotas for on-demand DD tables
40k read request units and 40k write request units per table. Per-account and per-index quotas do not apply in on-demand mode.
Can you make a table on-demand once it is created?
Yes. You can enable on-demand mode using either the create or update table commands.
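A sketch of the parameter shape for switching an existing table to on-demand (boto3-style UpdateTable parameters; the table name is hypothetical):

```python
# Switching an existing table to on-demand capacity mode is done by
# setting BillingMode to PAY_PER_REQUEST in an UpdateTable call.
switch_params = {
    "TableName": "Orders",  # hypothetical table name
    "BillingMode": "PAY_PER_REQUEST",
}
```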
How often can you switch between capacity modes?
You can switch between read/write capacity modes once every 24 hours.
How much read/write throughput do you specify for on-demand?
You don’t have to specify read/write throughput for on-demand tables.
How does DD charge for reads in on-demand mode?
For reads of up to 4 KB of data:
- 1 RRU for one strongly consistent read
- 0.5 RRU for one eventually consistent read (one RRU covers two)
- 2 RRUs for one transactional read
- Items larger than 4 KB require additional RRUs.
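The read pricing above can be sketched as a small calculation (an illustrative helper, not an AWS API):

```python
import math

def read_request_units(item_size_kb: float, read_type: str) -> float:
    """Approximate RRUs consumed by a single on-demand read.

    One RRU covers a strongly consistent read of up to 4 KB; an
    eventually consistent read costs half that, and a transactional
    read costs double. Larger items consume one unit per 4 KB block.
    """
    blocks = math.ceil(item_size_kb / 4)  # 4 KB units, rounded up
    factor = {"strong": 1, "eventual": 0.5, "transactional": 2}[read_type]
    return blocks * factor

# A 10 KB item spans three 4 KB blocks:
print(read_request_units(10, "strong"))         # 3 RRUs
print(read_request_units(10, "eventual"))       # 1.5 RRUs
print(read_request_units(10, "transactional"))  # 6 RRUs
```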
How does DD charge for writes in on-demand mode?
For writes of up to 1 KB:
- 1 WRU per standard write
- 2 WRUs per transactional write
- Items larger than 1 KB require additional WRUs.
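The write pricing above can be sketched the same way (an illustrative helper, not an AWS API):

```python
import math

def write_request_units(item_size_kb: float, transactional: bool = False) -> int:
    """Approximate WRUs consumed by a single on-demand write.

    One WRU covers a standard write of up to 1 KB; a transactional
    write costs double. Larger items consume one unit per 1 KB block.
    """
    blocks = math.ceil(item_size_kb)  # 1 KB units, rounded up
    return blocks * (2 if transactional else 1)

print(write_request_units(1))                      # 1 WRU
print(write_request_units(3.5))                    # 4 WRUs
print(write_request_units(1, transactional=True))  # 2 WRUs
```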
How does on-demand DD adapt to traffic peaks?
Whenever traffic reaches a new peak, DD can immediately accommodate up to double that volume. The new high point then becomes the previous peak used for future scaling.
Given that on-demand DD scales up and down with traffic peaks, can throttling still occur?
Yes. Throttling can occur if you exceed double your previous peak within 30 minutes.
What are the previous peak settings for a newly created DD table with on-demand capacity mode?
The previous peak is 2,000 write request units or 6,000 read request units. You can drive up to double the previous peak immediately, which enables newly created on-demand tables to serve up to 4,000 write request units or 12,000 read request units, or any linear combination of the two.
What are the previous peak settings for an existing table switched to on-demand capacity mode?
The previous peak is half the maximum write capacity units and read capacity units provisioned since the table was created, or the settings for a newly created table with on-demand capacity mode, whichever is higher. In other words, your table will deliver at least as much throughput as it did prior to switching to on-demand capacity mode.
Table Behavior while Switching Read/Write Capacity Mode
When you switch a table from provisioned capacity mode to on-demand capacity mode, DynamoDB makes several changes to the structure of your table and partitions. This process can take several minutes. During the switching period, your table delivers throughput that is consistent with the previously provisioned write capacity unit and read capacity unit amounts. When switching from on-demand capacity mode back to provisioned capacity mode, your table delivers throughput consistent with the previous peak reached when the table was set to on-demand capacity mode.
How is throughput set in provisioned mode?
You specify the number of reads and writes per second that you require for your application.
When is provisioned mode a good option?
- You have predictable traffic
- You run applications whose traffic is consistent or ramps up gradually
- You can forecast capacity requirements to control costs.
How does DD charge for writes in provisioned mode?
For writes of up to 1 KB:
- 1 WCU per standard write
- 2 WCUs per transactional write
- Items larger than 1 KB require additional WCUs.
How does DD charge for reads in provisioned mode?
For reads of up to 4 KB of data:
- 1 RCU for one strongly consistent read
- 0.5 RCU for one eventually consistent read (one RCU covers two)
- 2 RCUs for one transactional read
- Items larger than 4 KB require additional RCUs.
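Putting the provisioned-mode rules together, a capacity-sizing sketch (the traffic numbers are hypothetical, not AWS defaults):

```python
import math

def required_rcus(reads_per_second: int, item_size_kb: float,
                  strongly_consistent: bool = True) -> int:
    """RCUs needed to sustain a steady read rate.

    Each read consumes one RCU per 4 KB block; eventually consistent
    reads cost half as much.
    """
    per_read = math.ceil(item_size_kb / 4)  # 4 KB units per read
    rcus = reads_per_second * per_read
    return rcus if strongly_consistent else math.ceil(rcus / 2)

def required_wcus(writes_per_second: int, item_size_kb: float) -> int:
    """WCUs needed to sustain a steady write rate (1 KB units per write)."""
    return writes_per_second * math.ceil(item_size_kb)

# 80 strongly consistent reads/s of 6 KB items, 20 writes/s of 2 KB items:
print(required_rcus(80, 6))   # 160 RCUs (each read spans two 4 KB units)
print(required_wcus(20, 2))   # 40 WCUs
```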
When calling DescribeTable on an on-demand table, read capacity units and write capacity units are set to ___
0
For provisioned mode, __________ is the maximum amount of capacity that an application can consume from a table or index.
Provisioned throughput
For provisioned mode, when does an application experience throttling?
If your application exceeds your provisioned throughput capacity on a table or index, it is subject to request throttling.
When a request is throttled, it fails with an ________
HTTP 400 code
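Applications typically recover from throttled requests with exponential backoff and jitter (the AWS SDKs do this automatically). A minimal sketch of the delay schedule; the base and cap values here are illustrative, not AWS defaults:

```python
import random

def backoff_delays(attempts: int, base: float = 0.05, cap: float = 2.0):
    """Exponential backoff with full jitter: each retry waits a random
    amount up to min(cap, base * 2**attempt) seconds."""
    return [random.uniform(0, min(cap, base * 2 ** i)) for i in range(attempts)]

for delay in backoff_delays(5):
    print(f"retry after {delay:.3f}s")
```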
If you use the AWS Management Console to create a table or a global secondary index, DynamoDB ________ is enabled by default.
auto scaling
With ____________, you pay a one-time upfront fee and commit to a minimum provisioned usage level over a period of time.
reserved capacity
With ___________, you realize significant cost savings compared to on-demand or provisioned throughput settings.
Reserved Capacity
Reserved capacity is not available in _____________
on-demand mode.
Any capacity that you provision in excess of your reserved capacity is billed at _________ rates
standard provisioned capacity.
Amazon DynamoDB stores data in ___________.
partitions
What is a partition?
A partition is an allocation of storage for a table, backed by solid state drives (SSDs) and automatically replicated across multiple Availability Zones within an AWS Region.
Partition management is handled entirely by ________
DynamoDB
DynamoDB allocates additional partitions to a table in the following situations:
- If you increase the table’s provisioned throughput settings beyond what the existing partitions can support.
- If an existing partition fills to capacity and more storage space is required.
Global secondary indexes in DynamoDB are composed of _________
partitions.
The data in a ______________ is stored separately from the data in its base table
global secondary index
DynamoDB stores and retrieves each item based on its ____________
partition key value.
To read an item from the table, you must specify the _________ for the item.
partition key value
In most cases, the DynamoDB response times can be measured in single-digit milliseconds. However, there are certain use cases that require response times in microseconds. For these use cases, __________ delivers fast response times for accessing_______
DynamoDB Accelerator (DAX) ; eventually consistent data.
What is DAX
DAX or DynamoDB Accelerator is a DynamoDB-compatible caching service that enables you to benefit from fast in-memory performance for demanding applications.
DAX supports _________ encryption.
server-side encryption and encryption in transit.
DAX supports encryption in transit by ensuring all requests and responses between your application and the cluster are encrypted by ___________, and connections to the cluster can be authenticated by ____________
Transport Layer Security (TLS); verification of the cluster's X.509 certificate.
DAX writes data to disk as part of _____________
propagating changes from the primary node to read replicas.
DAX provides access to __________ data from DynamoDB tables, with ____________
eventually consistent; microsecond latency.
A ________ DAX cluster can serve millions of requests per second.
Multi-AZ
DAX is ideal for
- Applications that require the fastest possible response time for reads.
- Applications that read a small number of items more frequently than others.
- Applications that are read-intensive, but are also cost-sensitive. With DynamoDB, you provision the number of reads per second that your application requires. If read activity increases, you can increase your tables’ provisioned read throughput (at an additional cost). Or, you can offload the activity from your application to a DAX cluster, and reduce the number of read capacity units that you need to purchase otherwise.
- Applications that require repeated reads against a large set of data.
DAX is not ideal for the following types of applications:
- Applications that require strongly consistent reads (or that cannot tolerate eventually consistent reads).
- Applications that do not require microsecond response times for reads, or that do not need to offload repeated read activity from underlying tables.
- Applications that are write-intensive, or that do not perform much read activity.
- Applications that are already using a different caching solution with DynamoDB, and are using their own client-side logic for working with that caching solution.
DAX supports applications written in ________, using AWS-provided clients for those programming languages.
Go, Java, Node.js, Python, and .NET
DAX is available only for the _______ platform. (There is no support for the ____________.)
EC2-VPC platform; EC2-Classic platform
What is memory exhaustion in the DAX cluster?
DAX clusters maintain metadata about the attribute names of items they store. That metadata is maintained indefinitely (even after the item has expired or been evicted from the cache). Applications that use an unbounded number of attribute names can, over time, cause memory exhaustion in the DAX cluster. This limitation applies only to top-level attribute names, not nested attribute names. (Example: using a timestamp as a top-level attribute name is a problem, because each item then introduces a new attribute name.)
Amazon DynamoDB Accelerator (DAX) is designed to run within an _________ environment.
Amazon Virtual Private Cloud (Amazon VPC)
You can launch a DAX cluster in your virtual network and control access to the cluster by using __________.
Amazon VPC security groups
To create a DAX cluster, you use the _________. Unless you specify otherwise, your DAX cluster runs within your _______.
AWS Management Console; default VPC
To run your application, you launch an Amazon EC2 instance into your Amazon VPC. You then deploy your _________ on the EC2 instance.
application (with the DAX client)
How are requests handled in DAX?
At runtime, the DAX client directs all of your application’s DynamoDB API requests to the DAX cluster. If DAX can process one of these API requests directly, it does so. Otherwise, it passes the request through to DynamoDB.
How DAX Processes Requests
A DAX cluster consists of one or more nodes. Each node runs its own instance of the DAX caching software. One of the nodes serves as the primary node for the cluster. Additional nodes (if present) serve as read replicas.
Your application can access DAX by specifying the endpoint for the DAX cluster. The DAX client software works with the cluster endpoint to perform intelligent load balancing and routing.
If the request specifies _________, it tries to read the item from DAX:
eventually consistent reads (the default behavior)
If DAX has the item available (known as __________), DAX returns the item to the application without accessing DynamoDB.
a cache hit
If DAX does not have the item available (known as _______), DAX passes the request through to DynamoDB. When it receives the response from DynamoDB, DAX returns the results to the application and __________________
a cache miss; it also writes the results to the cache on the primary node.
If there are any read replicas in the cluster, _______ automatically keeps the replicas in sync with the ______
DAX ; primary node.
What results from DynamoDB are not cached in DAX?
If the request specifies strongly consistent reads, DAX passes the request through to DynamoDB. The results from DynamoDB are not cached in DAX. Instead, they are simply returned to the application.
DAX does not recognize any DynamoDB operations for ___________
managing tables
When is Throttling Exception received?
If the number of requests sent to DAX exceeds the capacity of a node, DAX limits the rate at which it accepts additional requests by returning a ThrottlingException. DAX continuously evaluates your CPU utilization to determine the volume of requests it can process while maintaining a healthy cluster state.
You can monitor the ThrottledRequestCount metric that DAX publishes to ________. If you see these exceptions regularly, you should consider ______
Amazon CloudWatch; scaling up your cluster.
DAX maintains an _______ to store the results from GetItem and BatchGetItem operations.
item cache
The items in the cache represent __________ from DynamoDB, and are stored by their ________values.
eventually consistent data; primary key
The item cache has a ________, which is 5 minutes by default.
Time to Live (TTL) setting
What is Time to Live (TTL) setting in DAX?
DAX assigns a timestamp to every item that it writes to the item cache. An item expires if it has remained in the cache for longer than the TTL setting.
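The TTL check can be modeled as a toy sketch (DAX's internals differ):

```python
import time

def is_expired(write_timestamp: float, ttl_seconds: float = 300) -> bool:
    """An item expires once it has sat in the cache longer than the TTL.
    DAX's default item-cache TTL is 5 minutes (300 seconds)."""
    return time.time() - write_timestamp > ttl_seconds

# An item written 10 minutes ago is expired under the default TTL:
print(is_expired(time.time() - 600))  # True
```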
If you issue a GetItem request on an expired item, this is considered a _________, and DAX sends the ________ request to DynamoDB.
cache miss; GetItem
You can specify the TTL setting for the item/query cache when you ___________
create a new DAX cluster
DAX also maintains a _________ list for the item cache.
least recently used (LRU)
_______ tracks when an item was first written to the cache, and when the item was last read from the cache.
The LRU list
If the ________ becomes full, ________ evicts older items (even if they haven’t expired yet) to make room for new items.
item cache; DAX
_________ is always enabled for the item cache and is ___________
The LRU algorithm; not user-configurable.
If you specify zero as the ___________, items in the item cache will only be refreshed due to an __________
item cache TTL setting; an LRU eviction or a “write-through” operation.
DAX also maintains a __________ to store the results from Query and Scan operations.
query cache
The items in this query cache represent _____________
result sets from queries and scans on DynamoDB tables.
DAX also maintains an _________ list for the query cache
LRU
The ________ tracks when a result set was first written to the cache, and when the result was last read from the cache
LRU List
If you specify ___________, the query response will not be cached.
zero as the query cache TTL setting
A ______ is the smallest building block of a DAX cluster.
node
Each node runs ____________ and _____________
an instance of the DAX software, and maintains a single replica of the cached data.
You can scale your DAX cluster by
- Adding more nodes to the cluster. This increases the overall read throughput of the cluster.
- Using a larger node type. Larger node types provide more capacity and can increase throughput. (You must create a new cluster with the new node type.)
A ________ is a logical grouping of one or more nodes that DAX manages as a ________
cluster; unit
One of the nodes in the cluster is designated as the __________ and the other nodes (if any) are ___________
primary node,; read replicas.
The primary node is responsible for
- Fulfilling application requests for cached data.
- Handling write operations to DynamoDB.
- Evicting data from the cache according to the cluster’s eviction policy.
Read replicas are responsible for
- Fulfilling application requests for cached data.
- Evicting data from the cache according to the cluster’s eviction policy.
However, unlike the _________, ___________ don’t write to DynamoDB.
primary node; read replicas
Read replicas additional purposes:
- Scalability
- High availability
For maximum fault tolerance, you should deploy read replicas in ____________
separate Availability Zones.
A DAX cluster in an AWS Region can interact with DynamoDB tables that are in the _______ Region.
same
What are parameter groups?
Parameter groups are used to manage runtime settings for DAX clusters.
___________ ensures that all the nodes in that cluster are configured in exactly the same way.
Parameter groups
A ____________ acts as a virtual firewall for your VPC, allowing you to control inbound and outbound network traffic.
security group
When you launch a cluster in your VPC, you add an _________ to your security group to allow _________ traffic.
ingress rule; incoming network
The ingress rule specifies the________ for your cluster.
protocol (TCP) and port number (8111)
The applications that are running within your VPC can access the DAX cluster only after ____________
adding the ingress rule to your security group.
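A sketch of that ingress rule as boto3-style authorize_security_group_ingress parameters (both security group IDs are hypothetical placeholders):

```python
# Allow the application's security group to reach the DAX
# cluster's security group on TCP port 8111.
ingress_params = {
    "GroupId": "sg-dax-cluster-id",  # hypothetical DAX cluster security group
    "IpPermissions": [
        {
            "IpProtocol": "tcp",
            "FromPort": 8111,
            "ToPort": 8111,
            "UserIdGroupPairs": [
                {"GroupId": "sg-app-id"}  # hypothetical application security group
            ],
        }
    ],
}
```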
Every DAX cluster provides a __________ for use by your application.
cluster endpoint
Usage of cluster end point
By accessing the cluster using its endpoint, your application does not need to know the hostnames and port numbers of individual nodes in the cluster. Your application automatically “knows” all the nodes in the cluster, even if you add or remove read replicas.
Your application can access a node directly by using its _________. However, we recommend that you treat the DAX cluster as a single unit and access it using the___________ instead.
node endpoint; cluster endpoint
Access to DAX cluster nodes is restricted to ___________. You can use ______ to grant cluster access from Amazon EC2 instances running on specific subnets.
applications running on Amazon EC2 instances within an Amazon VPC environment; subnet groups
What are events in DAX?
DAX records significant events within your clusters, such as the addition or removal of a node.
You can access events using the _________________
AWS Management Console or the DescribeEvents action in the DAX management API.
After you create your DAX cluster, you can access it from an ____________ running in the ___________.
Amazon EC2 instance; same VPC
For your DAX cluster to access DynamoDB tables on your behalf, you must create a__________
service role.
Amazon DynamoDB Accelerator (DAX) is a ________ caching service that is designed to simplify the process of ________
write-through; adding a cache to DynamoDB tables.
In many use cases, the way that your application uses DAX affects the _____________________
consistency of data within the DAX cluster, and the consistency of data between DAX and DynamoDB.
To achieve high availability for your application, we recommend that you provision your DAX cluster with at least _________. Then place those nodes in ______________
three nodes; multiple Availability Zones within a Region.
If you are building an application that uses DAX, that application should be designed so that it can tolerate _________
eventually consistent data.
Every DAX cluster has two distinct caches—____________
an item cache and a query cache
DAX caches the results from _________ requests in its query cache.
Query and Scan
DAX does not invalidate Query or Scan result sets based on _______________
updates to individual items.
The PutItem operation is only reflected in the DAX query cache when the __________
TTL for the Query expires.
To perform a strongly consistent GetItem, BatchGetItem, Query, or Scan request, you set the __________ parameter to true.
ConsistentRead
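As a sketch, the request shape for a strongly consistent GetItem (the table, key names, and values are hypothetical):

```python
# GetItem parameters with ConsistentRead=True. DAX passes such
# requests straight through to DynamoDB instead of serving the cache.
get_item_params = {
    "TableName": "Music",  # hypothetical table
    "Key": {
        "Artist": {"S": "No One You Know"},
        "SongTitle": {"S": "Call Me Today"},
    },
    "ConsistentRead": True,
}
```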
DAX can’t serve ___________ reads by itself because ___________
strongly consistent; it’s not tightly coupled to DynamoDB.
Any subsequent strongly consistent reads would have to be ___________
passed through to DynamoDB.
DAX handles __________requests the same way it handles strongly consistent reads.
TransactGetItems
DAX passes all TransactGetItems requests to DynamoDB. When it receives a response from DynamoDB, DAX returns the results to the client, but it ________________
doesn’t cache the results.
What is Negative Cache?
A negative cache entry occurs when DAX can’t find requested items in an underlying DynamoDB table. Instead of generating an error, DAX caches an empty result and returns that result to the user.
DAX supports negative cache entries in both the ________________
item cache and the query cache.
A negative cache entry remains in the DAX item cache until ________________
its item TTL has expired, it is evicted by the LRU algorithm, or the item is modified using PutItem, UpdateItem, or DeleteItem.
For the DAX management APIs, you can’t scope API actions to _________; the Resource element must be set to “*”. This is different from DAX data plane API operations, such as GetItem, Query, and Scan, which are exposed through the DAX client and can be scoped to _________
a specific resource; specific resources.
Establish a ______ for normal DAX performance in your environment, by measuring performance at various times and under different load conditions.
baseline
To establish a baseline, you should, at a minimum, monitor the following items both during load testing and in production:
- CPU utilization and throttled requests, so that you can determine whether you might need to use a larger node type in your cluster. The CPU utilization of your cluster is available through the CPUUtilization CloudWatch metric.
- Operation latency (as measured on the client side) should remain consistently within your application’s latency requirements.
- Error rates should remain low, as seen from the ErrorRequestCount, FaultRequestCount, and FailedRequestCount CloudWatch metrics.
- Estimated database size and evicted size, so that you can determine whether the cluster’s node type has sufficient memory to hold your working set.
- Client connections, so that you can monitor for any unexplained spikes in connections to the cluster.