All Flashcards

1
Q

What is the difference between Scalar and non-scalar data type?

A

Scalar is a single value (e.g. string, number, boolean). Non-scalar is a set of values (e.g. a set of numbers or strings).

2
Q

What is a document data type?

A

A complex structure with nested attributes (e.g. list, map)

3
Q

What is automatic synchronous replication, and how is it achieved?

A

DDB replicates your data across at least 3 facilities within a region at near real-time speeds. This allows for durability of data - the copies at different facilities act as independent failure domains.

4
Q

What two types of read does DDB offer?

A

Strong Consistency reads and Eventual Consistency reads

5
Q

What is a Strong Consistency read?

A

Provides the most up-to-date data. This type of read must be requested explicitly.
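Since eventual consistency is the default, a strongly consistent read has to be asked for on each request. A minimal sketch that just builds the request parameters (the helper function is hypothetical; the parameter names follow DynamoDB's GetItem API):

```python
def get_item_params(table_name, key, strongly_consistent=True):
    """Build GetItem request parameters. ConsistentRead must be set
    explicitly -- if omitted, DynamoDB defaults to an eventually
    consistent (cheaper, possibly stale) read."""
    return {
        "TableName": table_name,
        "Key": key,
        "ConsistentRead": strongly_consistent,
    }
```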

6
Q

What is an Eventual Consistency read?

A

Provides data that may or may not reflect the latest copy of the data (it is read from any of the data copies). This is the default consistency for all read operations, and it is 50% cheaper.

7
Q

What are the main attributes of RCUs and WCUs?

A

Read Capacity Units: 1 RCU is equal to either 1 strongly consistent table read/sec or 2 eventually consistent table reads/sec. Read in 4KB blocks.
Write Capacity Units: 1 WCU is equal to 1 table write/sec. Written in blocks of 1KB.
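The arithmetic on this card can be sketched in Python (a rough capacity estimator based on the 4 KB / 1 KB block sizes above, not an official AWS calculation):

```python
import math

def rcus_needed(item_size_kb, reads_per_sec, strongly_consistent=True):
    """Reads are metered in 4 KB blocks; an eventually consistent
    read costs half a strongly consistent one."""
    units = math.ceil(item_size_kb / 4) * reads_per_sec
    return units if strongly_consistent else math.ceil(units / 2)

def wcus_needed(item_size_kb, writes_per_sec):
    """Writes are metered in 1 KB blocks."""
    return math.ceil(item_size_kb) * writes_per_sec
```

For example, 10 strongly consistent reads/sec of 6 KB items costs ceil(6/4) = 2 blocks x 10 = 20 RCUs; the same load with eventual consistency costs 10.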

8
Q

What is burst capacity?

A

If read/write load overloads provisioned RCU/WCU capacity (during a burst or spike of activity), DDB provides burst capacity - up to 5 minutes of unused read/write capacity.

9
Q

What is the maximum capacity a table partition can support?

A

1000 WCUs or 3000 RCUs.

10
Q

What is On-Demand Capacity mode, and when does it work best?

A

DynamoDB charges you for the data reads and writes your application performs on your tables - you do not need to specify this throughput in advance.
On-demand capacity mode might be best if you:

Create new tables with unknown workloads.

Have unpredictable application traffic.

Prefer the ease of paying for only what you use.

11
Q

What is a partition?

A

A block of storage allocated by DDB for a table's data.

12
Q

How much data can a partition hold, and what is its optimum throughput?

A

10 GB; about 3000 RCUs and 1000 WCUs.

13
Q

How are partitions managed?

A

DDB manages partitions automatically. Additional partitions are provided for a table if the data storage or capacity is exceeded.

14
Q

How can we remove a partition allocated to a table?

A

Once a partition is allocated to a table, it will not be de-allocated if you scale down the table's capacity. This means we must be careful when bumping up a table's capacity in the short term: after scaling back down, the throughput is spread across the extra, now-unnecessary partitions, leaving each partition with low throughput.
The only ways to improve throughput at that point are to increase the table throughput again (which also increases costs) or to recreate the entire table.

15
Q

What is a partition key?

A

A partition key is also known as a hash key. It can be the whole primary key or part of a composite primary key (along with a sort or range key) to a table. The purpose of the partition key (in DDB’s eyes) is to identify the exact partition the table’s data is stored in.

16
Q

What are Local Secondary Indexes?

A

LSIs act as an alternative sort key to data, but use the same partition key. They must be decided upon table creation.

17
Q

What are Global Secondary Indexes?

A

An index with a partition key and a sort key that can be different from those on the base table. A GSI is stored in its own partition space away from the base table, and as such can be created whenever desired.

18
Q

What types of reads can you perform with LSIs and GSIs?

A

LSIs - both strongly and eventually consistent reads. GSIs - eventually consistent reads only.

19
Q

Why should we avoid using Scan operations?

A

They operate across all partitions of a table, and so use up a lot of RCUs (resulting in a potentially very expensive operation).

20
Q

A conditional write to a DDB is said to be Idempotent. What does this mean?

A

Idempotent - an operation that can be applied multiple times without changing the result beyond the initial application. We can make the same conditional-write request multiple times - only the first request can effect a change.
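A common idempotent pattern is a PutItem guarded by an attribute_not_exists condition: only the first request creates the item, and repeats fail with ConditionalCheckFailedException rather than overwriting. A sketch that just builds the request parameters (the helper name is made up; the ConditionExpression syntax is DynamoDB's):

```python
def idempotent_put_params(table_name, item, key_attr):
    """PutItem parameters whose condition makes the write idempotent:
    it only succeeds while no item with this key exists, so retrying
    the same request cannot apply the change twice."""
    return {
        "TableName": table_name,
        "Item": item,
        "ConditionExpression": f"attribute_not_exists({key_attr})",
    }
```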

21
Q

Does a ConditionalCheckFailedException consume WCUs?

A

Yes - despite there being no write operation, WCUs are still consumed.

22
Q

How much data can DDB return per request?

A

Up to 1 MB can be returned per Query/Scan.

23
Q

What is the LastEvaluatedKey?

A

The primary key of the last item evaluated in the response - effectively a marker for where the next page of results should begin.

24
Q

How can we use LastEvaluatedKey?

A

We can use this data from one query to get the next set of data - we pass it in as the parameter ExclusiveStartKey.
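This pagination loop can be sketched as a generator (query_fn stands in for something like boto3's Table.query; the helper itself is illustrative):

```python
def paginate(query_fn, **params):
    """Yield items across result pages: keep feeding each response's
    LastEvaluatedKey back in as ExclusiveStartKey until DynamoDB
    stops returning one."""
    while True:
        response = query_fn(**params)
        yield from response.get("Items", [])
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            break
        params["ExclusiveStartKey"] = last_key
```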

25
Q

How can we calculate the number of partitions in a DDB ?

A
P = max(Pt, Ps, Pp), where:
Pt = roundUp[(RCUs / 3000) + (WCUs / 1000)]
Ps = roundUp[storage required in GB / 10 GB]
Pp = the previous maximum number of partitions the table has ever had (does not change after altering table storage/throughput)
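The formula translates directly to code (constants per this card's limits; previous_max models the "partitions are never de-allocated" behaviour from card 14):

```python
import math

def partition_count(rcus, wcus, storage_gb, previous_max=0):
    """Max of the throughput-driven, size-driven, and historical
    partition counts (Pt, Ps, Pp)."""
    p_throughput = math.ceil(rcus / 3000 + wcus / 1000)
    p_size = math.ceil(storage_gb / 10)
    return max(p_throughput, p_size, previous_max)
```

For example, 6000 RCUs, 1000 WCUs and 45 GB gives max(ceil(3), ceil(4.5)) = 5 partitions; scaling capacity back down with previous_max=5 still yields 5.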
26
Q

You have a table with a large amount of data (100s of GBs) but a low throughput (standard of 3000RCUs/1000WCUs). What are the consequences of this setup, and how should it be fixed?

A

Each partition in a DDB table holds 10 GB, so a new partition is allocated for every 10 GB of data. However, since our throughput is still low, it is shared between the partitions, so the throughput available to each partition plummets (this would mean throughput must be scaled with size even when the total throughput is not needed - not very cost-efficient).
This should be resolved by extracting the active data into a new table (active means it will require a higher throughput), and less frequently needed data can either be archived or moved to other tables.

27
Q

How do we encourage uniform data distribution across partitions?

A

Use as many unique values for partition keys as possible

Segregate ‘hot’ and ‘cold’ data into separate tables (e.g. student attendance data, where the latest data is most likely to be accessed)

28
Q

Why is it important to have uniform data distribution across partitions?

A

Since RCUs and WCUs are equally distributed among partitions, we want an equal access-requirement across partitions (avoid ‘hot’ partitions)

29
Q

Why are filters not as cost-efficient as use of a direct query?

A

Filters are applied after the entire read, and so more data is read (therefore more RCUs) than is necessary.

30
Q

What should be done if you run out of LSIs/GSIs?

A

Consider creating a table replica with different LSIs/GSIs

31
Q

What is the maximum size of a partition key?

A

Simple partition key: 2 KB
Composite partition key: 1 KB
Sort key: 1 KB

32
Q

What is the maximum size of an item in a table?

A

400 KB

33
Q

How should you decide on what indexes to use in a table?

A

Analyse the application's query access patterns to determine which indexes will be used the most.

34
Q

What is Write Sharding?

A

Distributing writes for a single logical partition key across multiple physical partition key values (shards), typically by appending a suffix to the key value. This can be used to deal with hot partitions.
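One common write-sharding scheme (an illustration, not the only approach) appends a random suffix to the partition key value; reads must then fan out over all suffixes to reassemble the logical key's data:

```python
import random

SHARD_COUNT = 10  # assumed shard count for this example

def sharded_partition_key(base_key):
    """Spread writes for one 'hot' logical key across SHARD_COUNT
    partition key values, e.g. '2024-06-01' -> '2024-06-01#7'."""
    return f"{base_key}#{random.randrange(SHARD_COUNT)}"

def all_shard_keys(base_key):
    """The keys a reader must query to gather the logical key's items."""
    return [f"{base_key}#{n}" for n in range(SHARD_COUNT)]
```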

35
Q

When should you use sort keys/composite keys vs. set types?

A

Sort keys/composite keys should be used for:
- Large item sizes
- If querying multiple items within a partition key is required
Set types should be used for:
- Small item sizes
- If querying individual item attributes in sets is not needed

36
Q

What is the maximum number of table-level operations occurring at once?

A

10 simultaneous table-level requests (e.g. creating, updating or deleting tables).

37
Q

What is the maximum number of items returned per BatchGetItem request?

A

100 items (or up to 16 MB)

38
Q

What is the maximum number of PutItem or DeleteItem requests per BatchWriteItem request?

A

25 (or up to 16 MB)
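Because of the 25-request ceiling, a large write set has to be chunked before calling BatchWriteItem. A minimal sketch of just the chunking (handling UnprocessedItems and retries is omitted):

```python
def batch_write_chunks(write_requests, limit=25):
    """Split write requests into batches no larger than DynamoDB's
    25-request BatchWriteItem limit."""
    return [
        write_requests[i:i + limit]
        for i in range(0, len(write_requests), limit)
    ]
```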

39
Q

What is an HTTP 400?

A

A client-side error code: a problem with the request, an authentication failure, or missing required parameters. The response will normally contain an error message/stack trace.

40
Q

What is an HTTP 5XX?

A

A server-side error code: 500 is an internal server error; 503 means the service is unavailable.

41
Q

What happens when a Provisioned Throughput Exceeded exception occurs?

A

The AWS SDKs automatically retry requests that receive this exception via a mechanism called Exponential Backoff (the request is retried until successful, with an exponentially increasing time gap between attempts).
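The SDKs do this for you, but the mechanism can be sketched as follows (RuntimeError stands in for the throttling exception; the delays are shortened for illustration):

```python
import time

def retry_with_backoff(operation, max_attempts=5, base_delay=0.05):
    """Retry `operation`, doubling the wait before each new attempt
    (exponential backoff); re-raise once attempts are exhausted."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except RuntimeError:  # stand-in for the throttling exception
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
```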

42
Q

What is DDB Adaptive Capacity?

A

An automatic response to non-uniform workloads/increased throughput. It is not to be relied upon and is a short-term solution (like Burst Capacity). Adaptive Capacity can take 5-30 minutes to kick in.

43
Q

What makes up the 400kb item limit?

A

This limit covers both the attribute values and the attribute names of an item - an item can be imagined as a JSON structure of names and values.

44
Q

When splitting a large attribute across multiple items, why should you use a simple Partition key over a composite one?

A

Composite keys (i.e. partition and sort) lead to a non-uniform workload when retrieving data, since all the data being requested falls under a single partition.

45
Q

Why should you avoid using LSIs where possible, and what situation should you use LSIs?

A

Use them when the application requires strongly consistent reads on the index.
They should be avoided in general, since LSIs share the same physical partition space as the table - more indexes means less storage available for the table's items.

46
Q

How does auto scaling work?

A

We provide the min and max capacity throughput with a target utilization percentage. AWS DDB will then auto scale with demand to average the target utilization between the capacity boundaries provided.

47
Q

What is DDB’s response to a sustained increase in throughput?

A

At first, DDB will use some Burst Capacity to manage the throughput. However, if this increase is sustained, it will scale up the throughput capacity.

48
Q

What are the benefits of using DAX?

A

Cost-savings: most results should be processed by DAX, therefore many reads will not affect the table throughput.
Microsecond latency: DAX acts as a cache to provide faster response times.
Prevents Hot Partitions: As a cache, it naturally provides the more frequently accessed data without needing table reads.

49
Q

What are the limitations of DAX?

A

DAX only provides eventually consistent reads, and it is not useful for write-heavy applications.

50
Q

What does the Query cache do, and how do updates to the Item cache affect this?

A

Query cache stores the results of query and scan operations. Updates to DDB and the Item cache do not affect/invalidate the items of the Query cache, and so the Time To Live (TTL) of the Cache values should be chosen based on how long the application can tolerate inconsistent results.

51
Q

How does DAX manage Strongly Consistent Reads?

A

SCRs bypass DAX entirely and read straight from DDB.

52
Q

What information does a DDB stream contain?

A

A 24 hour log of all write operations to a table.

53
Q

How can we use TTL to manage hot and cold partitions in time series data?

A

We can set a TTL on each item and associate a Lambda (via a trigger) to watch for these deletes. The Lambda can copy the deleted item's data to a new table with a lower throughput capacity.

54
Q

How can we implement cross-region replication with global tables?

A

Global tables require the participating tables to be empty at the time they are added to the global table, to have only one replica per region, to have the same table name and keys across regions, and to have streams enabled with both new and old images.
It is recommended to use identical settings for table and indexes across regions (e.g. throughput capacity settings, GSIs).

55
Q

What feature can we use to log API calls to DDB?

A

CloudTrail - this will provide more in-depth information about API calls without needing to be set up manually (such as with CloudWatch Logs).

56
Q

What feature can we use to import and export data to/from DDB, and where does the export go?

A

AWS Data Pipeline - can export as DDB JSON structure and reimport. The exported data goes to an S3 bucket (the export can also be scheduled to occur periodically).

57
Q

How can we migrate data to/from SQL workbench?

A

AWS Redshift

58
Q

Scan operations on a table can be expensive. What other feature could we use in place of Scans?

A

AWS CloudSearch - allows us to upload documents/items from a DDB table and perform full-text searches (useful, for example, for locating items by keyword).