Databases Flashcards
Does DynamoDB have query join capability?
No
Does DynamoDB support aggregations such as SUM or AVG?
No
How do NoSQL databases scale?
Horizontally
Instead of rows and columns, what terms does DynamoDB use?
Items = rows
Attributes = columns
What is the maximum size for an item in DynamoDB?
400KB
Using DynamoDB, what is the accepted solution to store BLOB data?
Store the object in S3 and store its metadata (such as the S3 object key) in DynamoDB.
What is provisioned mode in DynamoDB?
You specify the number of reads and writes per second and pay for the provisioned capacity regardless of whether it is all used.
Can you auto-scale using DynamoDB provisioned capacity mode?
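Yes. Provisioned mode supports auto scaling between a minimum and maximum capacity that you set. On-demand mode scales automatically with no capacity planning.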
Which is cheaper, DynamoDB provisioned or on-demand mode?
Provisioned mode, provided the workload is steady and predictable enough to plan capacity for.
What can be used in DynamoDB provisioned capacity to temporarily exceed throughput?
Burst capacity. Once exceeded, a ProvisionedThroughputExceededException will be thrown.
When you hit a ProvisionedThroughputExceededException error in DynamoDB, what is the preferred solution?
An exponential backoff retry
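A minimal sketch of an exponential backoff retry with jitter, in Python. The `ThroughputExceeded` exception and the `call_with_backoff` helper are hypothetical names used only for illustration; the AWS SDKs already retry throttled requests with backoff by default.

```python
import random
import time


class ThroughputExceeded(Exception):
    """Hypothetical stand-in for ProvisionedThroughputExceededException."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.05, sleep=time.sleep):
    """Retry `operation`, doubling the wait ceiling after each throttled attempt."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThroughputExceeded:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            # Full jitter: wait a random time up to base_delay * 2^attempt.
            sleep(random.uniform(0, base_delay * (2 ** attempt)))
```

The random jitter keeps many throttled clients from retrying at the same instant.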
In DynamoDB, how large is 1 WCU?
1 WCU = 1 write up to 1KB in size per second.
What are the two types of reads in DynamoDB?
Strongly consistent and eventually consistent reads.
What is an eventually consistent read in DynamoDB?
A read that has the potential to get stale data because the write has not replicated to all the servers on the backend yet.
What is a strongly consistent read in DynamoDB?
A read that returns the most up-to-date data, reflecting every write that succeeded before the read.
How do you set a strongly consistent read in DynamoDB?
Set the ConsistentRead parameter to true in API calls.
How many RCUs is a strongly consistent read?
It is twice the cost of an eventually consistent read.
In DynamoDB, how large is 1 RCU?
One strongly consistent read or two eventually consistent reads per second for items up to 4KB in size.
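The WCU and RCU rules above can be worked as arithmetic. This is a small sketch; the function names are made up for illustration. Item sizes round up to the next 1KB for writes and the next 4KB for reads.

```python
import math


def wcus_needed(item_size_kb, writes_per_second):
    """1 WCU = one write per second for an item up to 1KB."""
    return math.ceil(item_size_kb / 1) * writes_per_second


def rcus_needed(item_size_kb, reads_per_second, strongly_consistent=True):
    """1 RCU = one strongly consistent read per second for an item up to 4KB."""
    total = math.ceil(item_size_kb / 4) * reads_per_second
    # Eventually consistent reads cost half as much.
    return total if strongly_consistent else math.ceil(total / 2)


print(wcus_needed(4.5, 6))                             # 6 writes/s of 4.5KB items
print(rcus_needed(6, 10))                              # 10 strong reads/s of 6KB items
print(rcus_needed(12, 16, strongly_consistent=False))  # 16 eventual reads/s of 12KB items
```

The three calls print 30, 20 and 24.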
In DynamoDB, how are WCUs and RCUs spread?
Evenly among all partitions.
In DynamoDB, what is a ProjectionExpression?
A parameter that limits the attributes returned to the ones you list.
Can a FilterExpression be used with key attributes in DynamoDB?
No. You cannot use it with partition (HASH) or sort (RANGE) key attributes.
In DynamoDB, what is the maximum amount of data that can be returned in one Query API call?
1MB
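A sketch of how the read options above fit together in one Query request. The table and attribute names (`user_id`, `score`) are hypothetical, and the helper only builds the keyword arguments; with boto3 they could be passed as `client.query(**kwargs)`.

```python
def build_query(table_name, user_id, min_score):
    """Build Query parameters for one partition key, filtered and projected."""
    return {
        "TableName": table_name,
        # Key attributes go in the key condition, never in the filter.
        "KeyConditionExpression": "#uid = :uid",
        # The filter applies to non-key attributes only.
        "FilterExpression": "#score >= :min",
        # Return only the attributes listed here.
        "ProjectionExpression": "#uid, #score",
        "ExpressionAttributeNames": {"#uid": "user_id", "#score": "score"},
        "ExpressionAttributeValues": {
            ":uid": {"S": user_id},
            ":min": {"N": str(min_score)},
        },
        # Ask for a strongly consistent read.
        "ConsistentRead": True,
    }
```

If the matching data exceeds 1MB, the response carries a `LastEvaluatedKey` that is sent back as `ExclusiveStartKey` to fetch the next page.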
How does the SCAN operation work in DynamoDB?
It reads the entire table and then filters out the data you do not want. This is inefficient.
How can you improve DynamoDB scan performance if you must use it?
Parallel Scans
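A minimal sketch of a parallel scan. Each worker scans one segment, identified by the Scan API's `Segment` and `TotalSegments` parameters, and follows `LastEvaluatedKey` across pages. The `parallel_scan` helper and its `scan_page` argument are assumptions for illustration; `scan_page` could be a boto3 DynamoDB client's `scan` method.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_scan(scan_page, table_name, total_segments=4):
    """Scan every segment concurrently and return all items."""

    def scan_segment(segment):
        items, start_key = [], None
        while True:
            kwargs = {
                "TableName": table_name,
                "Segment": segment,
                "TotalSegments": total_segments,
            }
            if start_key:
                kwargs["ExclusiveStartKey"] = start_key
            page = scan_page(**kwargs)
            items.extend(page.get("Items", []))
            # A missing LastEvaluatedKey means this segment is finished.
            start_key = page.get("LastEvaluatedKey")
            if not start_key:
                return items

    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        results = pool.map(scan_segment, range(total_segments))
    return [item for segment_items in results for item in segment_items]
```

A parallel scan finishes sooner but consumes read capacity faster, so it can throttle other traffic on the table.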
Can you retry Items that fail in Batch operations in DynamoDB?
Yes
How many put or delete requests can be present in one BatchWriteItem API request?
25 items or 16MB of data. Each item is still limited to 400KB.
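A sketch of batching under the 25-item limit and retrying failed items. Items that DynamoDB could not write come back in the response under `UnprocessedItems`. The `batch_write_all` helper is hypothetical; `batch_write_item` could be a boto3 DynamoDB client's method of the same name.

```python
def batch_write_all(batch_write_item, table_name, requests, batch_size=25):
    """Send write requests in chunks of at most 25, resending unprocessed ones."""
    for start in range(0, len(requests), batch_size):
        pending = {table_name: requests[start:start + batch_size]}
        while pending:
            response = batch_write_item(RequestItems=pending)
            # Anything DynamoDB could not write is returned for another try.
            pending = response.get("UnprocessedItems") or {}
```

A production version would wait with exponential backoff between retries rather than resending immediately.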
What is PartiQL in DynamoDB?
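It is a SQL-compatible query language for DynamoDB. It can only handle CRUD (SELECT, INSERT, UPDATE, DELETE). Joins are still not possible.
Are DynamoDB filters performed on the server or client side?
Server side. A FilterExpression is applied after the items are read and before the results are returned, so it does not reduce the read capacity consumed.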
What is an LSI in DynamoDB?
It uses the same partition key as the base table, but you get an additional sort key.
Up to 5 LSI per table
Must be defined at table creation
What is a GSI in DynamoDB?
It uses an alternative primary key (a partition key and an optional sort key).
Requires WCU and RCU
Can be added AFTER table creation
What happens in DynamoDB when the GSI is throttled?
Writes to the main table will also be throttled.
When it comes to WCU and RCU with LSI and GSI, what is the big difference?
GSI has provisioned capacity
LSI uses the RCU AND WCU of the main table
What is DynamoDB DAX?
A seamless in-memory cache for DynamoDB. It solves the hot key problem.
How many DynamoDB DAX nodes can be in a cluster?
10
How many AZs should you use for a production DAX cluster?
A minimum of three
When would you use ElastiCache instead of DAX with DynamoDB?
When you need to cache an aggregated result computed by your application.
What is DynamoDB Streams?
It is an ordered stream of item-level modifications.
What AWS services can consume a DynamoDB stream?
Lambda, Kinesis Data Streams, Kinesis Client Library
What is the max retention for DynamoDB Streams?
24 hours
What are the DynamoDB stream types?
KEYS_ONLY - Key attributes of the modified item
NEW_IMAGE - Entire Item as it appears after it was modified
OLD_IMAGE - Entire Item as it appears before it was modified
NEW_AND_OLD_IMAGES - Both new and old images of the item
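A sketch of reading stream records in a Lambda-style handler. Which images appear in a record depends on the stream view type chosen above. The `summarize_stream_event` helper is a hypothetical name; the record fields (`eventName`, `dynamodb.Keys`, `dynamodb.OldImage`, `dynamodb.NewImage`) follow the DynamoDB Streams record format.

```python
def summarize_stream_event(event):
    """Return (event name, keys, old image, new image) for each stream record."""
    summaries = []
    for record in event.get("Records", []):
        change = record["dynamodb"]
        summaries.append((
            record["eventName"],     # INSERT, MODIFY, or REMOVE
            change["Keys"],          # key attributes of the modified item
            change.get("OldImage"),  # only with OLD_IMAGE or NEW_AND_OLD_IMAGES
            change.get("NewImage"),  # only with NEW_IMAGE or NEW_AND_OLD_IMAGES
        ))
    return summaries
```

With KEYS_ONLY, both image fields are absent and the helper returns `None` for them.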
With TTL, what is the max time an expired item is held in DynamoDB before it is deleted?
48 hours
Does DynamoDB have backup and restore capability?
Yes. It has point-in-time recovery (PITR), like RDS.
What are DynamoDB Global Tables?
Multi-Region, Multi-Active, fully replicated tables.
What is DynamoDB Local?
Allows you to develop and test apps locally without accessing the web service.
Does DynamoDB support Federated Logins?
Yes.
Is RDS ACID compliant?
Yes. All RDS database engines are ACID compliant.
What is the maximum database volume size in Aurora?
128TB
What is the maximum number of read replicas in Aurora?
15
Can Aurora backup to S3?
Yes, continuous backup to S3 is available.
In Aurora, which version must you use if you want automatic scaling?
Aurora Serverless
What are the two types of LOCKS in RDS?
Shared locks: allow reads and prevent writes.
Exclusive locks: prevent all reads and writes to a resource.
What should the TTL on your DB instance DNS be to support failover?
30 seconds or less.
What is DocumentDB?
It is a NoSQL database similar to MongoDB. JSON Based.
In what increments does DocumentDB grow?
10GB increments
What is Amazon MemoryDB for Redis?
It is a Redis compatible, durable, in-memory database service.
What is Amazon Keyspaces?
Managed Apache Cassandra NoSQL distributed database.
What language do you use to query Amazon Keyspaces?
CQL - Cassandra Query Language
What is Amazon Neptune?
A fully managed graph database.
Is Redshift geared for OLAP or OLTP?
OLAP
What connection types does Redshift support?
ODBC, JDBC
What are the types of nodes used in a Redshift Cluster?
Leader node and compute nodes. Leader nodes create execution plans and delegate the work to the compute nodes.
What is the maximum number of compute nodes you can have in a Redshift cluster?
128 compute nodes
Does each compute node in Redshift have its own compute, memory, and storage?
Yes, although the amounts depend on the node type you choose.
What node type would you use if you want to optimize for storage capacity in Redshift?
Dense Storage nodes. These use HDD volumes
What node type would you use if you want to optimize for compute in Redshift?
Dense compute nodes. These use SSD volumes
What sizes do Redshift nodes come in?
xlarge or 8xlarge
In Redshift, what are compute nodes divided into?
Node slices. Each slice uses a portion of the resources assigned to the compute node to perform a task.
What is Redshift Spectrum?
It allows you to query data in S3 (data lake) and join it to your Redshift tables.
What compression does Redshift support?
Gzip and Snappy
Why is Redshift so performant?
It uses massively parallel processing (MPP), columnar data storage, and column compression.
Where does Redshift back up to?
It replicates within the cluster, takes automated snapshots, and replicates to S3.
What happens to failed nodes in Redshift?
They are automatically replaced.
Do single node clusters support replication in Redshift?
No. There is nowhere to replicate to.
If you want Multi-AZ Redshift support, what cluster type must you use?
You must use an RA3 cluster.
How does Redshift scale?
Vertically and horizontally. A new cluster is created while your old one is available for reads. The CNAME is flipped to the new cluster and data is moved in parallel to new compute nodes.
What are the Redshift distribution styles?
Auto - Based on the size of your data
Even - Distributed across slices in round-robin
Key - Rows are distributed based on a single column
All - The entire table is copied to every node
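A toy model of the Even, Key and All options listed above, to show how rows land on different targets. This is only an illustration: the `distribute` function is hypothetical, and the CRC32 hash is an assumption standing in for whatever hashing Redshift uses internally.

```python
import zlib


def distribute(rows, targets, style, key=None):
    """Return {target_index: [rows]} for a toy model of EVEN, KEY and ALL."""
    placement = {t: [] for t in range(targets)}
    for i, row in enumerate(rows):
        if style == "EVEN":
            # Round-robin, regardless of row contents.
            placement[i % targets].append(row)
        elif style == "KEY":
            # Rows with the same key value always land on the same target.
            digest = zlib.crc32(str(row[key]).encode())
            placement[digest % targets].append(row)
        elif style == "ALL":
            # Every target receives a full copy of the table.
            for t in placement:
                placement[t].append(row)
        else:
            raise ValueError(f"unknown style: {style}")
    return placement
```

Key distribution keeps matching join values together, which is why it suits large tables joined on one column, while All suits small lookup tables.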
What are sort keys in Redshift?
They are like indexes and make for fast range queries.
What is a single sort key in Redshift?
A single column to sort the data
What is a compound sort key in Redshift?
It is made up of all the columns listed in the sort key definition, in the order they are listed. Order is important. This is the default sort type.
What is an interleaved sort key in Redshift?
Gives equal weight to each sort key in the list.
What is the most efficient way to get data into Redshift?
Use the COPY command. When loading from S3, you can use a manifest file to specify exactly which files to load.
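A sketch of a COPY from S3 driven by a manifest. The helpers below only build the manifest JSON and the SQL text. The function names and any table, bucket, or IAM role values passed in are hypothetical placeholders, and the inputs are assumed to be trusted because they are interpolated directly into SQL.

```python
import json


def build_manifest(urls):
    """Build a COPY manifest that lists the exact S3 files to load."""
    entries = [{"url": url, "mandatory": True} for url in urls]
    return json.dumps({"entries": entries}, indent=2)


def build_copy_statement(table, manifest_url, iam_role_arn):
    """Build a COPY statement that reads the file list from a manifest."""
    return (
        f"COPY {table}\n"
        f"FROM '{manifest_url}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "MANIFEST;"
    )
```

Marking an entry as mandatory makes the load fail if that file is missing, instead of silently skipping it.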
When you want to move a Redshift table into S3, what command would you use?
The UNLOAD command.
What does enhanced VPC routing do in Redshift?
When it is enabled, COPY and UNLOAD traffic between the cluster and your data repositories is forced through your VPC. When it is not enabled, that traffic is routed through the internet.
What is auto-copy in Redshift?
It automatically copies files loaded into S3 into Redshift.
What is Aurora zero-ETL?
It automatically replicates from Aurora to Redshift.
What is Redshift Streaming Ingestion?
It loads data from a Kinesis Data Stream or MSK
When you want to copy data that is already in Redshift, what command should you use?
INSERT INTO or CREATE TABLE AS
Can the COPY command decrypt data?
Yes, it can do this as it is being loaded into RedShift.
What is the best practice when loading data that has few columns but lots of rows (a narrow table)?
Load it using a single COPY command. Do not break this up.
How do you copy a Redshift snapshot from region to region when it is encrypted?
You use a snapshot copy grant. In the destination region, you create a KMS key and a copy grant with a unique name that specifies the KMS key ID. You then enable snapshot copying in the source region.
What is Redshift DBLINK?
It connects Redshift to PostgreSQL
Can you import data from DynamoDB to Redshift?
Yes. You can load tables using COPY
Does Redshift work with Data Pipeline?
Yes
What is Redshift WLM?
It prioritizes short fast queries vs. long and slow queries.
What is Redshift Concurrency scaling?
It automatically adds cluster capacity to handle increases in concurrent read queries.
Which queries in Redshift get the most and least concurrency?
Short queries get more concurrency and long queries get little. This is configurable, though, and concurrency scaling can be used.
What are Redshift query monitoring rules?
They monitor your queries and will abort them if they are running longer than allowed.
Can Redshift Automatic WLM support Query hopping?
No. This must be done in manual mode. Hopping means that you are sending the query to a different queue because it timed out.
What is Redshift SQA?
Short Query Acceleration. It can be used in place of WLM queues if you only want to accelerate short queries. It works with CREATE TABLE AS (CTAS) and read-only queries. You can configure how many seconds counts as “short”.
What does the VACUUM command do?
Recovers space from deleted rows
What are the VACUUM types?
FULL - Default. Re-sorts rows and reclaims space
DELETE ONLY - Only reclaims space
SORT ONLY - Only re-sorts rows
REINDEX - Reanalyzes interleaved sort keys, then runs a full vacuum
What is Elastic Resize in Redshift?
It allows you to add or remove nodes of the same type.
The cluster goes down for a few minutes, but it tries to keep connections open.
What is classic resize in Redshift?
Allows you to change node types or number of nodes
What can you do to minimize downtime while resizing your cluster?
Snapshot, restore, and then resize before changing the primary.
What is Redshift data lake export?
It UNLOADs your query to S3 in Apache Parquet format. 2x faster and 6x more compressed
What can I use in Redshift to UNLOAD data to EMR or SageMaker?
Redshift Data Lake Export
What are RA3 nodes in Redshift?
They have managed storage, are SSD based, and let compute and storage scale independently.
What are the spatial data types in Redshift?
Geometry and Geography
What is required to use cross-region data sharing in Redshift?
The RA3 node type is required. It allows you to share live data across regions.
What is AQUA (Advanced Query Accelerator)?
It accelerates the processing of data from S3. Only available on RA3 types.
What commands are used to define access privileges for a user or group in Redshift?
GRANT or REVOKE
Does Redshift automatically scale?
No. You must use Redshift Serverless.
What is a good use case for Redshift Serverless?
Ad-hoc business analysis and lower environments.
How is Redshift Serverless billed?
By RPU + Storage fee. RPU = Redshift Processing Units.
Does Redshift Serverless support Spectrum?
No
What are materialized views in Redshift?
They store the results of a query, not just the query itself as a regular view does. Good for performance.
Can you share data across Redshift clusters for read purposes?
Yes, using Redshift data sharing.
What is a Redshift Lambda UDF?
You can use Lambda inside your SQL queries.
How do you define a Redshift Lambda UDF?
CREATE EXTERNAL FUNCTION
What are Redshift Federated Queries?
They tie Redshift into RDS and Aurora for PostgreSQL and MySQL. They allow access to live data and remove the ETL process. THIS IS READ ONLY.
Using Redshift Federated Queries, can you use RDS to query Redshift?
No. This is one way.