All Flashcards

1
Q

What is the difference between Scalar and non-scalar data type?

A

Scalar is a single value (e.g. string, number, boolean). Non-scalar is a set of values (e.g. a set of numbers or strings).

2
Q

What is a document data type?

A

A complex structure with nested attributes (e.g. list, map)

3
Q

What is automatic synchronous replication, and how is it achieved?

A

DDB replicates your data across at least 3 facilities within a region at near real-time speeds. This allows for durability of data - the copies at different facilities act as independent failure domains.

4
Q

What two types of read does DDB offer?

A

Strong Consistency reads and Eventual Consistency reads

5
Q

What is a Strong Consistency read?

A

Provides the most up-to-date data. This type of read must be requested explicitly.
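Since eventual consistency is the default, a strongly consistent read has to be asked for on each request. A minimal sketch that just builds the request parameters (the helper function is hypothetical; the parameter names follow DynamoDB's GetItem API):

```python
def get_item_params(table_name, key, strongly_consistent=True):
    """Build GetItem request parameters. ConsistentRead must be set
    explicitly -- if omitted, DynamoDB defaults to an eventually
    consistent (cheaper, possibly stale) read."""
    return {
        "TableName": table_name,
        "Key": key,
        "ConsistentRead": strongly_consistent,
    }
```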

6
Q

What is an Eventual Consistency read?

A

Provides data that may or may not reflect the latest copy of the data (it is read from any of the data copies). This is the default consistency for all read operations, and it is 50% cheaper.

7
Q

What are the main attributes of RCUs and WCUs?

A

Read Capacity Units: 1 RCU is equal to either 1 strongly consistent table read/sec or 2 eventually consistent table reads/sec. Read in 4KB blocks.
Write Capacity Units: 1 WCU is equal to 1 table write/sec. Written in blocks of 1KB.
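The arithmetic on this card can be sketched in Python (a rough capacity estimator based on the 4 KB / 1 KB block sizes above, not an official AWS calculation):

```python
import math

def rcus_needed(item_size_kb, reads_per_sec, strongly_consistent=True):
    """Reads are metered in 4 KB blocks; an eventually consistent
    read costs half a strongly consistent one."""
    units = math.ceil(item_size_kb / 4) * reads_per_sec
    return units if strongly_consistent else math.ceil(units / 2)

def wcus_needed(item_size_kb, writes_per_sec):
    """Writes are metered in 1 KB blocks."""
    return math.ceil(item_size_kb) * writes_per_sec
```

For example, 10 strongly consistent reads/sec of 6 KB items costs ceil(6/4) = 2 blocks x 10 = 20 RCUs; the same load with eventual consistency costs 10.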

8
Q

What is burst capacity?

A

If read/write load overloads provisioned RCU/WCU capacity (during a burst or spike of activity), DDB provides burst capacity - up to 5 minutes of unused read/write capacity.

9
Q

What is the maximum capacity a table partition can support?

A

1000 WCUs or 3000 RCUs.

10
Q

What is On-Demand Capacity mode, and when does it work best?

A

DynamoDB charges you for the data reads and writes your application performs on your tables - you do not need to specify this throughput in advance.
On-demand capacity mode might be best if you:

Create new tables with unknown workloads.

Have unpredictable application traffic.

Prefer the ease of paying for only what you use.

11
Q

What is a partition?

A

A block of storage allocated by DDB for a table's data.

12
Q

How much data can a partition hold, and what is its optimum throughput?

A

10 GB; about 3000 RCUs and 1000 WCUs.

13
Q

How are partitions managed?

A

DDB manages partitions automatically. Additional partitions are provided for a table if the data storage or capacity is exceeded.

14
Q

How can we remove a partition allocated to a table?

A

Once a partition is allocated to a table, it will not be de-allocated if you scale down the table's capacity. This means we must be careful when bumping up a table's capacity in the short term: after scaling back down, the throughput is spread across the extra, now-unnecessary partitions, leaving each partition with low throughput.
The only ways to improve throughput at that point are to increase the table throughput again (which also increases costs) or to recreate the entire table.

15
Q

What is a partition key?

A

A partition key is also known as a hash key. It can be the whole primary key or part of a composite primary key (along with a sort or range key) to a table. The purpose of the partition key (in DDB’s eyes) is to identify the exact partition the table’s data is stored in.

16
Q

What are Local Secondary Indexes?

A

LSIs act as an alternative sort key to data, but use the same partition key. They must be decided upon table creation.

17
Q

What are Global Secondary Indexes?

A

An index with a partition key and a sort key that can be different from those on the base table. A GSI is stored in its own partition space away from the base table, and as such can be created whenever desired.

18
Q

What types of reads can you perform with LSIs and GSIs?

A

LSIs - both strongly and eventually consistent reads. GSIs - eventually consistent reads only.

19
Q

Why should we avoid using Scan operations?

A

They operate across all partitions of a table, and so use up a lot of RCUs (resulting in a potentially very expensive operation).

20
Q

A conditional write to a DDB is said to be Idempotent. What does this mean?

A

Idempotent - an operation that can be applied multiple times without changing the result beyond the initial application. We can make the same conditional-write request multiple times - only the first request can effect a change.
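A common idempotent pattern is a PutItem guarded by an attribute_not_exists condition: only the first request creates the item, and repeats fail with ConditionalCheckFailedException rather than overwriting. A sketch that just builds the request parameters (the helper name is made up; the ConditionExpression syntax is DynamoDB's):

```python
def idempotent_put_params(table_name, item, key_attr):
    """PutItem parameters whose condition makes the write idempotent:
    it only succeeds while no item with this key exists, so retrying
    the same request cannot apply the change twice."""
    return {
        "TableName": table_name,
        "Item": item,
        "ConditionExpression": f"attribute_not_exists({key_attr})",
    }
```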

21
Q

Does a ConditionalCheckFailedException consume WCUs?

A

Yes - despite there being no write operation, WCUs are still consumed.

22
Q

How much data can DDB return per request?

A

Up to 1 MB can be returned per Query/Scan.

23
Q

What is the LastEvaluatedKey?

A

The primary key of the last item evaluated in the response - effectively a marker for where the next page of results should begin.

24
Q

How can we use LastEvaluatedKey?

A

We can use this data from one query to get the next set of data - we pass it in as the parameter ExclusiveStartKey.
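This pagination loop can be sketched as a generator (query_fn stands in for something like boto3's Table.query; the helper itself is illustrative):

```python
def paginate(query_fn, **params):
    """Yield items across result pages: keep feeding each response's
    LastEvaluatedKey back in as ExclusiveStartKey until DynamoDB
    stops returning one."""
    while True:
        response = query_fn(**params)
        yield from response.get("Items", [])
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            break
        params["ExclusiveStartKey"] = last_key
```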

25
Q

How can we calculate the number of partitions in a DDB ?

A
P = max(Pt, Ps, Pp), where:
Pt = roundUp[(RCUs / 3000) + (WCUs / 1000)]
Ps = roundUp[storage required in GB / 10 GB]
Pp = the previous maximum number of partitions the table has ever had (does not change after altering table storage/throughput)
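The formula translates directly to code (constants per this card's limits; previous_max models the "partitions are never de-allocated" behaviour from card 14):

```python
import math

def partition_count(rcus, wcus, storage_gb, previous_max=0):
    """Max of the throughput-driven, size-driven, and historical
    partition counts (Pt, Ps, Pp)."""
    p_throughput = math.ceil(rcus / 3000 + wcus / 1000)
    p_size = math.ceil(storage_gb / 10)
    return max(p_throughput, p_size, previous_max)
```

For example, 6000 RCUs, 1000 WCUs and 45 GB gives max(ceil(3), ceil(4.5)) = 5 partitions; scaling capacity back down with previous_max=5 still yields 5.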
26
Q

You have a table with a large amount of data (100s of GBs) but a low throughput (standard of 3000RCUs/1000WCUs). What are the consequences of this setup, and how should it be fixed?

A

Each partition in a DDB table holds 10 GB, so a new partition is allocated for every 10 GB of data. However, since our throughput is still low, it is shared between the partitions, so the throughput available to each partition plummets (this would mean throughput must be scaled with size even when the total throughput is not needed - not very cost-efficient).
This should be resolved by extracting the active data into a new table (active means it will require a higher throughput), and less frequently needed data can either be archived or moved to other tables.

27
Q

How do we encourage uniform data distribution across partitions?

A

Use as many unique values for partition keys as possible

Segregate ‘hot’ and ‘cold’ data into separate tables (e.g. student attendance data, where the latest data is most likely to be accessed)

28
Q

Why is it important to have uniform data distribution across partitions?

A

Since RCUs and WCUs are equally distributed among partitions, we want an equal access-requirement across partitions (avoid ‘hot’ partitions)

29
Q

Why are filters not as cost-efficient as use of a direct query?

A

Filters are applied after the entire read, and so more data is read (therefore more RCUs) than is necessary.

30
Q

What should be done if you run out of LSIs/GSIs?

A

Consider creating a table replica with different LSIs/GSIs

31
Q

What is the maximum size of a partition key?

A

Simple partition key: 2 KB
Composite partition key: 1 KB
Sort key: 1 KB

32
Q

What is the maximum size of an item in a table?

A

400 KB

33
Q

How should you decide on what indexes to use in a table?

A

Analyse the application's query access patterns to determine which indexes will be used the most.

34
Q

What is Write Sharding?

A

Distributing writes for a single logical partition key across multiple physical partition key values (shards), typically by appending a suffix to the key value. This can be used to deal with hot partitions.
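One common write-sharding scheme (an illustration, not the only approach) appends a random suffix to the partition key value; reads must then fan out over all suffixes to reassemble the logical key's data:

```python
import random

SHARD_COUNT = 10  # assumed shard count for this example

def sharded_partition_key(base_key):
    """Spread writes for one 'hot' logical key across SHARD_COUNT
    partition key values, e.g. '2024-06-01' -> '2024-06-01#7'."""
    return f"{base_key}#{random.randrange(SHARD_COUNT)}"

def all_shard_keys(base_key):
    """The keys a reader must query to gather the logical key's items."""
    return [f"{base_key}#{n}" for n in range(SHARD_COUNT)]
```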

35
Q

When should you use sort keys/composite keys vs. set types?

A

Sort keys/composite keys should be used for:
- Large item sizes
- If querying multiple items within a partition key is required
Set types should be used for:
- Small item sizes
- If querying individual item attributes in sets is not needed

36
Q

What is the maximum number of table-level operations occurring at once?

A

10 simultaneous table-level requests (e.g. creating, updating or deleting tables).

37
Q

What is the maximum number of items returned per BatchGetItem request?

A

100 items (or up to 16 MB)

38
Q

What is the maximum number of PutItem or DeleteItem requests per BatchWriteItem request?

A

25 (or up to 16 MB)
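Because of the 25-request ceiling, a large write set has to be chunked before calling BatchWriteItem. A minimal sketch of just the chunking (handling UnprocessedItems and retries is omitted):

```python
def batch_write_chunks(write_requests, limit=25):
    """Split write requests into batches no larger than DynamoDB's
    25-request BatchWriteItem limit."""
    return [
        write_requests[i:i + limit]
        for i in range(0, len(write_requests), limit)
    ]
```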

39
Q

What is an HTTP 400?

A

A client-side error code: a problem with the request, an authentication failure, or missing required parameters. The response will normally contain an error message/stack trace.

40
Q

What is an HTTP 5XX?

A

A server-side error code: 500 is an internal server error; 503 means the service is unavailable.

41
Q

What happens when a Provisioned Throughput Exceeded exception occurs?

A

The AWS SDKs automatically retry requests that receive this exception via a mechanism called Exponential Backoff (the request is retried until successful, with an exponentially increasing time gap between attempts).
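The SDKs do this for you, but the mechanism can be sketched as follows (RuntimeError stands in for the throttling exception; the delays are shortened for illustration):

```python
import time

def retry_with_backoff(operation, max_attempts=5, base_delay=0.05):
    """Retry `operation`, doubling the wait before each new attempt
    (exponential backoff); re-raise once attempts are exhausted."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except RuntimeError:  # stand-in for the throttling exception
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
```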

42
Q

What is DDB Adaptive Capacity?

A

An automatic response to non-uniform workloads/increased throughput. It is not to be relied upon and is a short-term solution (like Burst Capacity). Adaptive Capacity can take 5-30 minutes to kick in.

43
Q

What makes up the 400kb item limit?

A

This limit covers both the attribute values and the attribute names of an item - an item can be imagined as a JSON structure of names and values.

44
Q

When splitting a large attribute across multiple items, why should you use a simple Partition key over a composite one?

A

Composite keys (i.e. partition and sort) lead to a non-uniform workload when retrieving data, since all the data being requested falls under a single partition.

45
Q

Why should you avoid using LSIs where possible, and what situation should you use LSIs?

A

Use them when the application requires strongly consistent reads on the index.
They should be avoided in general, since LSIs share the same physical partition space as the table - more indexes means less storage available for the table's items.

46
Q

How does auto scaling work?

A

We provide the min and max capacity throughput with a target utilization percentage. AWS DDB will then auto scale with demand to average the target utilization between the capacity boundaries provided.

47
Q

What is DDB’s response to a sustained increase in throughput?

A

At first, DDB will use some Burst Capacity to manage the throughput. However, if this increase is sustained, it will scale up the throughput capacity.

48
Q

What are the benefits of using DAX?

A

Cost-savings: most results should be processed by DAX, therefore many reads will not affect the table throughput.
Microsecond latency: DAX acts as a cache to provide faster response times.
Prevents Hot Partitions: As a cache, it naturally provides the more frequently accessed data without needing table reads.

49
Q

What are the limitations of DAX?

A

DAX only provides eventually consistent reads, and it is not useful for write-heavy applications.

50
Q

What does the Query cache do, and how do updates to the Item cache affect this?

A

Query cache stores the results of query and scan operations. Updates to DDB and the Item cache do not affect/invalidate the items of the Query cache, and so the Time To Live (TTL) of the Cache values should be chosen based on how long the application can tolerate inconsistent results.

51
Q

How does DAX manage Strongly Consistent Reads?

A

SCRs bypass DAX entirely and read straight from DDB.

52
Q

What information does a DDB stream contain?

A

A 24 hour log of all write operations to a table.

53
Q

How can we use TTL to manage hot and cold partitions in time series data?

A

We can set a TTL on each item and associate a Lambda (via a trigger) to watch for these deletes. The Lambda can copy the deleted item's data to a new table with a lower throughput capacity.

54
Q

How can we implement cross-region replication with global tables?

A

Global tables require the participating tables to be empty at the time they are added to the global table, to have only one replica per region, to have the same table name and keys across regions, and to have streams enabled with both new and old images.
It is recommended to use identical settings for table and indexes across regions (e.g. throughput capacity settings, GSIs).

55
Q

What feature can we use to log API calls to DDB?

A

CloudTrail - this will provide more in-depth information about API calls without needing to be set up manually (such as with CloudWatch Logs).

56
Q

What feature can we use to import and export data to/from DDB, and where does the export go?

A

AWS Data Pipeline - can export as DDB JSON structure and reimport. The exported data goes to an S3 bucket (the export can also be scheduled to occur periodically).

57
Q

How can we migrate data to/from SQL workbench?

A

AWS Redshift

58
Q

Scan operations on a table can be expensive. What other feature could we use in place of Scans?

A

AWS CloudSearch - allows us to upload documents/items from a DDB table and perform full-text searches (useful, for example, for locating items by keyword).