AWS Data Analytics Flashcards by Dan Kositzke

In a single data dashboard, Amazon ___________ can include AWS data, third-party data, big data, spreadsheet data, SaaS data, B2B data, and more.

QuickSight

CloudWatch detailed monitoring sends data from your EC2 instance to CloudWatch in ______ intervals.

1-minute

____________ is an ETL service that captures, transforms, and delivers streaming data to data lakes, data stores, and analytics services.

Kinesis Data Firehose

When Kinesis Data Firehose is configured to send data to Redshift, behind the scenes it has to load the streaming data to _______ first and then issue a ______ command to move the data to Redshift.

S3… COPY…

Within Kinesis Data Analytics, using _________ __________ is a windowing method for analyzing time-based, overlapping groups of data that arrive at inconsistent times by aggregating the data.

stagger windows

What are the three windows you can use to process data in Kinesis Data Analytics?

Stagger Windows
Tumbling Windows
Sliding Windows
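
The difference between tumbling and sliding windows can be sketched in plain Python. This is only an illustration of the windowing idea, not Kinesis Data Analytics SQL; the function names and the sample timestamps are hypothetical, and stagger windows are left out for brevity.

```python
from collections import defaultdict


def tumbling_window_counts(timestamps, size):
    """Count events in fixed, non-overlapping windows of `size` seconds."""
    counts = defaultdict(int)
    for ts in timestamps:
        window_start = (ts // size) * size  # each event belongs to exactly one window
        counts[window_start] += 1
    return dict(sorted(counts.items()))


def sliding_window_counts(timestamps, size):
    """For each event, count the events in the `size` seconds ending at that event."""
    ordered = sorted(timestamps)
    result = []
    start = 0
    for end, ts in enumerate(ordered):
        # Drop events that fell out of the window (ts - size, ts].
        while ordered[start] <= ts - size:
            start += 1
        result.append((ts, end - start + 1))
    return result


events = [1, 2, 9, 11, 12, 25]  # hypothetical event times in seconds
print(tumbling_window_counts(events, 10))  # {0: 3, 10: 2, 20: 1}
print(sliding_window_counts(events, 10))
```

Tumbling windows never overlap, so each event is counted once; sliding windows overlap, so the same event can contribute to several results.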

___________ includes a built-in ML algorithm that can easily provide reliable forecasts for your data.

Amazon QuickSight

_______ is a fast, open-source, distributed SQL query engine designed for interactive analytic queries over large datasets from multiple sources (built by Facebook).

Presto

AWS Glue ETL scripts can be coded in _________ or _________ .

Python… Scala…

Amazon Redshift automatically integrates with ________ but not with an ________ (for encryption keys).

AWS KMS… HSM…

With Amazon Redshift, you can’t migrate to an _______-encrypted cluster by modifying the cluster. This is only possible if you want to enable _______ encryption.

HSM… KMS…

To load data from S3 to Redshift, you can use a __________ _________ that lists out the specific S3 paths you want to be copied over.

manifest file
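
A manifest is a small JSON document with an `entries` list. The sketch below builds one in Python; the bucket name and object keys are hypothetical placeholders, and the helper name `build_copy_manifest` is made up for illustration.

```python
import json


def build_copy_manifest(s3_urls, mandatory=True):
    """Return manifest JSON listing the exact S3 objects a COPY should load."""
    entries = [{"url": url, "mandatory": mandatory} for url in s3_urls]
    return json.dumps({"entries": entries}, indent=2)


# Hypothetical bucket and keys, for illustration only.
manifest = build_copy_manifest([
    "s3://example-bucket/sales/part-0000.gz",
    "s3://example-bucket/sales/part-0001.gz",
])
print(manifest)
```

Setting `mandatory` to true asks COPY to fail if a listed file is missing, rather than silently loading a partial set.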

Using the AWS Glue crawler for compressed files will cause the run time to ____________.

increase… It will take longer because the crawler has to download and decompress the file before reading it.

AWS Glue ___________ crawls only crawl folders that were added since the last crawler run, which can save significant time and cost.

incremental

To enable permissions between S3 and QuickSight, you would need to configure the permissions from the _________ console.

QuickSight

The _________ process re-sorts rows and reclaims space in either a specified table or all tables in the current database in Amazon Redshift.

VACUUM

If QuickSight connects to the data store by using a ________ ________, the data automatically refreshes when you open an associated dataset, analysis, or dashboard.

direct query

________ ______ is a managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data.

Amazon EMR

Can you use AWS Glue triggers to execute a job to run directly after a crawler completes?

No, but you can create an AWS Glue workflow with two triggers: one for the crawler and one for the job. This will achieve the same effect.

The capacity limits of an Amazon Kinesis data stream are defined by the ________ _____ ________ within the data stream.

number of shards
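
As a rough sizing sketch for provisioned mode, the shard count can be estimated from expected throughput. The per-shard figures below (1 MB/s or 1,000 records/s for writes, 2 MB/s for shared-throughput reads) are assumptions taken from the AWS documentation rather than from this card, and `shards_needed` is a hypothetical helper.

```python
import math

# Assumed per-shard limits (provisioned mode, shared-throughput consumers).
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0


def shards_needed(write_mb_per_s, records_per_s, read_mb_per_s):
    """Smallest shard count that satisfies every per-shard limit."""
    return max(
        1,
        math.ceil(write_mb_per_s / WRITE_MB_PER_SHARD),
        math.ceil(records_per_s / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb_per_s / READ_MB_PER_SHARD),
    )


print(shards_needed(write_mb_per_s=4.5, records_per_s=3000, read_mb_per_s=9.0))  # 5
```

Whichever limit is hit first determines the shard count, which is why the function takes the maximum of the three ratios.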

If you want an EMR cluster's log files archived to Amazon S3, you must enable this feature __________ (while / after) launching the cluster.

while

Does Amazon SQS support real time streaming of data?

No.

What are the two Amazon EMR cluster types (regarding how long each one runs)?

(1) persistent / long-running
(2) transient

In Kinesis Data Streams, you can create up to _____ registered consumers per stream.

The two Kinesis Data Streams capacity modes are _________ and _________. These refer to whether the data stream shards are automatically or manually created.

on-demand... provisioned

To detect anomalies in your Kinesis Data Stream, you can use the ________________ function.

RANDOM_CUT_FOREST

Kinesis Data Analytics (KDA) supports _____________, _____________, and _____________ as destinations.

Kinesis Data Streams... Kinesis Data Firehose... Lambda

A common architecture using Kinesis Data Analytics (KDA) might look like this: ___________ --> Kinesis Data Analytics --> ___________ --> S3

Kinesis Data Stream --> Kinesis Data Analytics --> Kinesis Data Firehose --> S3

Apache _______ is a data warehousing system that uses SQL-like queries to analyze structured data stored in Hadoop Distributed File System (HDFS).

Hive

When creating an EMR cluster, what two configuration options can you choose from? The selected option is applied to each node type (primary, core, task) of the cluster.

1. Instance Fleets
2. Uniform Instance Groups (simpler, provides autoscaling)

Can Glue Data Catalog be used to store data, in a similar way to S3?

No, it is only used to store schema information on data gathered from the Glue crawler.

By default, Amazon Redshift clusters are created and situated in _______ AZ(s) within an AWS Region.

1... However, a multi-AZ deployment is also an option

If you have customized networking requirements for using Amazon Redshift, you will need to enable _________________ _______ _________________.

Enhanced VPC Routing

S3 Transfer Acceleration is enabled at the ________ level.

bucket

What are three common CLI commands for moving data to and from S3?

cp (copy)
mv (move)
sync (sync)

What are the 3 API calls for an S3 Multi Part upload?

CreateMultipartUpload
UploadPart
CompleteMultipartUpload

What is the max number of parts for an S3 Multi Part upload?

10,000
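
The 10,000-part ceiling constrains the part size for very large objects. Here is a minimal sketch of that arithmetic; the 5 MiB minimum part size is an assumption taken from the S3 documentation rather than from this card, and both helper names are hypothetical.

```python
import math

MAX_PARTS = 10_000
MIN_PART_SIZE = 5 * 1024 ** 2  # assumed 5 MiB minimum for every part except the last


def choose_part_size(object_size):
    """Smallest part size in bytes that keeps the upload within MAX_PARTS."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    return max(MIN_PART_SIZE, math.ceil(object_size / MAX_PARTS))


def part_count(object_size, part_size):
    """Number of UploadPart calls needed for the given part size."""
    return math.ceil(object_size / part_size)


one_gib = 1024 ** 3
size = choose_part_size(one_gib)
print(size, part_count(one_gib, size))  # 5242880 205
```

For small objects the minimum part size dominates; for multi-terabyte objects the part size has to grow so that the count stays at or below 10,000.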

Does ElastiCache for Memcached support snapshots and replication?

No. Snapshots and replication are not supported for Memcached, just for Redis.

Which AWS database stores data as nodes connected with edges?

Neptune

In relational databases, row-based storage is ideal for OL__P and columnar storage is ideal for OL__P.

OLTP... OLAP

Apache _______ is an analytics framework for processing large datasets. (hint: Databricks is built on top of this)

Spark

What are the 3 data storage options for Amazon EMR?

1. HDFS
2. EMRFS (uses S3)
3. Local Storage (Instance Store / EBS)

An Amazon EMR cluster can have either ____ or ____ primary (aka master) nodes.

1 or 3

How many AZs are used for Amazon EMR clusters?

Only 1 AZ

What are the three node types in an Amazon EMR cluster?

1. Primary/Master Node
2. Core Node
3. Task Node (optional)

What is the name of the API you can use to launch an Amazon EMR cluster?

RunJobFlow API

What is the name of the API you can use to terminate an Amazon EMR cluster?

TerminateJobFlows API

The default limit for Amazon EMR instances is _____. This can be increased upon request.

20 instances (across all your clusters)

When using Amazon EMR, can you SSH directly into a task node?

No, you must first SSH into the master node, and then SSH into the desired node.

Which Amazon EMR node type (primary/master, core, task) hosts data using Hadoop Distributed File System (HDFS) and also runs Hadoop tasks?

Core Node

What are 5 implementations of how you can run Amazon EMR applications? (i.e. Amazon EMR on ______, Amazon EMR _____)

1. Amazon EMR Serverless
2. Amazon EMR on EC2
3. Amazon EMR on AWS Outposts
4. Amazon EMR on EKS
5. Amazon EMR on Local Zones

For Amazon EMR billing, __________ rounds up the runtime duration to the nearest minute, whereas __________ tracks runtime duration to the nearest second.

BilledResourceUtilization... TotalResourceUtilization

Amazon EMR supports what two types of Hive clusters?

1. interactive (customer can run Hive scripts directly on master node)
2. batch (Hive script stored in S3 and referenced)

Amazon Redshift can automatically generate recommendations for managing your warehouse with the feature called _________ __________

Redshift Advisor

Does Redshift support native integration with Amazon SageMaker?

Yes

____________ is a feature of Amazon Redshift that lets you run queries against your data lake in Amazon S3, with no data loading or ETL required.

Redshift Spectrum

Using Amazon Redshift ______ nodes with managed storage allows you to pay separately for storage and compute.

RA3

For Amazon Redshift instances using Dense Compute (DC) and Dense Storage (DS2) clusters, where is the data stored?

On the compute nodes (as opposed to S3 for RA3 clusters and Redshift Serverless)

How is the data stored when using an Amazon Redshift RA3 instance?

Frequently processed data (hot data) is stored on high performance SSDs, and cold data stored in S3.

What service would a customer use to integrate (and/or aggregate) Amazon Redshift with their own on-premises data warehouse?

AWS Data Exchange

How are Redshift Multi-AZ and Redshift Relocation different, regarding RTO?

Redshift Relocation is free and has a 10-60 minute recovery time. Redshift Multi-AZ is more expensive, but has an RTO measured in seconds.

_____________ allows SQL users to create, train, and deploy machine learning models using familiar SQL commands.

Redshift ML

The Amazon Redshift _______ _______ simplifies access to Amazon Redshift because you don’t need to configure drivers and manage database connections. Instead, you can run SQL commands against an Amazon Redshift cluster by simply calling a secured API endpoint.

Data API

How long are Amazon Redshift automatic backups retained vs manual backups?

Automatic: 24 hours
Manual: Indefinitely

How would you monitor the performance of your Amazon Redshift data warehouse cluster?

AWS Management Console, or CloudWatch APIs

Is there a charge for using the Amazon Redshift Data API?

When you launch an Amazon Redshift cluster, what option determines the CPU, RAM, storage capacity, and storage drive type for each node?

The node type

For datasets under 1 TB (compressed), what is the recommended Redshift node type?

DC2 (Dense Compute node)

What are the two EC2 platforms used for launching an Amazon Redshift cluster?

1. EC2-Classic
2. EC2-VPC

In Amazon Redshift, you need to associate a __________ _________ with each cluster that you create in order to configure database settings such as query timeout and date style.

parameter group

The charges that you accrue for using Amazon Redshift are based on _______ nodes and billed at an _________ rate.

compute... hourly...

Which EC2 instance categories does Amazon EMR support (i.e. on-demand, etc.)?

on-demand
spot
reserved

An Amazon Redshift cluster is a set of nodes, which consists of a ________ node and one or more ________ nodes.

leader... compute...

A QuickSight _________ is a user who can create and publish dashboards.

Author

A QuickSight _________ is a user who consumes interactive dashboards.

Reader

Amazon QuickSight __________ Edition offers enhanced functionality such as QuickSight Readers, Private VPC connectivity, and AD connectivity.

Enterprise

_________ _________ __________ __________ are of 30-minute duration each. Each session is charged at $0.30 with maximum charges of $5 per Reader in a month.

Amazon QuickSight Reader sessions
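
The session pricing and the monthly cap quoted on this card combine into a simple calculation. The sketch below uses only those two figures ($0.30 per session, $5 maximum per Reader per month); the function name is hypothetical and current AWS pricing may differ.

```python
SESSION_PRICE_CENTS = 30   # $0.30 per 30-minute session, as stated on the card
MONTHLY_CAP_CENTS = 500    # $5 maximum per Reader per month, as stated on the card


def monthly_reader_charge(sessions):
    """Dollar charge for one Reader in a month, capped at the monthly maximum."""
    if sessions < 0:
        raise ValueError("sessions must be non-negative")
    return min(sessions * SESSION_PRICE_CENTS, MONTHLY_CAP_CENTS) / 100


print(monthly_reader_charge(10))  # 3.0
print(monthly_reader_charge(40))  # 5.0
```

Working in whole cents avoids floating-point drift; the cap is reached at the 17th session of the month.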

Will an Amazon QuickSight Reader be charged if QuickSight is open in a browser in a background tab?

No, only charged when user interacts with page via a page refresh, filtering, clicking, etc.

Can Amazon QuickSight “Authors” or “Readers” invite more users?

No. This can only be done with a QuickSight "Admin" account.

Does Amazon QuickSight connect to both Amazon EC2 and on-premises databases?

Yes

The “Augment with SageMaker” option for Amazon __________ allows your SageMaker ML models to run inferences on your data.

QuickSight

Does QuickSight leverage SageMaker models to perform inference on incremental data or the full data every time it runs?

Inference runs on the full data every time it refreshes.

Amazon QuickSight has an innovative technology called ________ that allows it to select the most appropriate visualizations based on the properties of the data.

AutoGraph

You can use AWS Glue _________ to visually clean up and normalize data without writing code.

DataBrew

How does AWS Glue relate to AWS Lake Formation?

AWS Lake Formation encompasses AWS Glue PLUS additional features.

With AWS Glue _______, data engineers can visually create, run, and monitor ETL workflows.

Studio

The metadata stored in the AWS Glue Data Catalog can be readily accessed from _________________, ______________, _____________, _________________, and third-party services.

AWS Glue ETL... Amazon Athena... Amazon EMR... Amazon Redshift Spectrum

The AWS Glue ________ ____________ is a new feature that allows you to centrally discover, control (i.e. enforce), and evolve data stream schemas.

Schema Registry

The AWS Glue ________ ____________ supports Apache Avro and JSON Schema data formats and Java client applications.

Schema Registry

Does the AWS Glue Schema Registry provide encryption at rest and in transit?

Yes

After you define the flow of your data sources, transformations, and targets in the visual (no-code) interface, AWS Glue Studio will generate __________ __________ code on your behalf.

Apache Spark

Which programming languages does AWS Glue ETL support?

Python and Scala

When building an AWS Glue workflow, what are the two ways to trigger AWS Glue ETL jobs within your workflow?

AWS Glue ETL jobs can either be triggered on a schedule or on a job completion event.

AWS Glue provides default retry behavior that will retry all failures _____ times before sending an error notification to CloudWatch.

three

AWS Glue supports ETL on streams from _______________, _____________, and _____________.

Amazon KDS... Apache Kafka... Amazon MSK

Do you have to use both the Data Catalog and AWS Glue ETL together for the service to work?

No, they can be used independently.

Both AWS Glue and Kinesis Data Analytics can be used to process streaming data. ____________ is recommended when your use cases are primarily ETL and you want to run jobs on a serverless Apache Spark-based platform. ____________ is recommended when your use cases are primarily analytics and you want to run jobs on a serverless Apache Flink-based platform.

AWS Glue... Kinesis Data Analytics

Apache Spark is primarily used for ______ processing, whereas Apache Flink is primarily used for ______ processing.

batch... stream...

Both AWS Glue and Kinesis Data Firehose can be used for streaming ETL. ___________ is recommended for complex ETL, including joining streams, and partitioning the output in Amazon S3 based on the data content. ___________ is recommended when your use cases focus on data delivery and preparing data to be processed after it is delivered.

AWS Glue... Kinesis Data Firehose

The AWS Glue __________ ML Transform can solve record linkage and data deduplication problems.

FindMatches

AWS Glue _______ __________ is a feature of AWS Glue that automatically measures and monitors the quality of data in data lakes and pipelines.

Data Quality

For the following AWS Glue features: Data __________ use DataBrew to transform data without writing any code. Data __________ use the Data Catalog to manage metadata. Data __________ use AWS Glue Studio to author scalable data integration pipelines.

analysts... engineers... engineers

Can Amazon Athena process unstructured, semi-structured, and structured datasets?

Yes, it can process all three

AWS strongly recommends using the ______ command to load large amounts of data into Redshift, as opposed to the _______ command.

COPY... INSERT...

To grant or revoke privilege to load data into a table using a Redshift COPY command, grant or revoke the __________ privilege.

INSERT

To load data from Amazon S3, the Redshift COPY command must have _______ access to the bucket and _______ access for the bucket objects.

LIST... GET...

For Redshift to obtain authorization to access a resource, your cluster must be authenticated using either __________ access control or __________ access control. (________ access control is recommended by AWS)

role-based... key-based... (role-based)

With ___________ access control, your Redshift cluster temporarily assumes an AWS Identity and Access Management (IAM) role on your behalf.

role-based

When loading data into Redshift, you can use a ___________ file to ensure that your COPY command loads only your specified files from Amazon S3.

manifest

When you load data into Redshift from S3 using a COPY command, what do you need to do differently when S3 server-side encryption is enabled?

Nothing. The process is the same whether S3 is encrypted or not.

When using the COPY command to load a table into Amazon Redshift, does the table to be loaded need to already exist in the Redshift database?

Yes

By default, when loading data from DynamoDB into Redshift, do these two services need to be in the same AWS Region?

Yes, but you can also specify a different region using the REGION parameter

When loading data from DynamoDB into Redshift, what happens when DynamoDB attributes do not match a column in the Amazon Redshift table?

These attributes are discarded. Additionally, they consume part of DynamoDB's provisioned throughput since the attributes still have to be read.

After a Redshift load operation is complete, you can query the ______________ system table to verify that the expected files were loaded.

STL_LOAD_COMMITS

To validate the data in the Amazon S3 input files or Amazon DynamoDB table before you actually load the data into Redshift, you can use the __________ option with the COPY command.

NOLOAD

To apply automatic compression when loading data to Redshift, run the COPY command with the __________ option set to ON.

COMPUPDATE
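
The COPY options covered in the cards above (role-based access, a manifest file, NOLOAD, COMPUPDATE) can be combined in one statement. The following is a sketch that only assembles the SQL text; the table name, S3 path, and role ARN are hypothetical placeholders, and `build_copy_statement` is a made-up helper, not part of any AWS SDK.

```python
def build_copy_statement(table, s3_path, iam_role_arn,
                         manifest=False, noload=False, compupdate=None):
    """Assemble a Redshift COPY statement as text (illustrative only).

    Values are interpolated directly, so this must not be used with
    untrusted input.
    """
    parts = [
        f"COPY {table}",
        f"FROM '{s3_path}'",
        f"IAM_ROLE '{iam_role_arn}'",  # role-based access control
    ]
    if manifest:
        parts.append("MANIFEST")       # s3_path points at a manifest file
    if noload:
        parts.append("NOLOAD")         # validate the input without loading it
    if compupdate is not None:
        parts.append(f"COMPUPDATE {'ON' if compupdate else 'OFF'}")
    return "\n".join(parts) + ";"


# Hypothetical names throughout.
print(build_copy_statement(
    "sales",
    "s3://example-bucket/sales/manifest.json",
    "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
    manifest=True,
    compupdate=True,
))
```

A common pattern is to run the statement once with `noload=True` to validate the files, then again without it to perform the actual load.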

When loading data files from Amazon S3 into Redshift, does the order of the columns matter?

Yes, the columns must be in the same order as the Redshift table

The category of SQL commands that manipulate data in a database (INSERT, UPDATE, DELETE) are referred to as _______ _____________ ____________ commands.

Data Manipulation Language (DML)

Does Amazon Redshift support a single merge (or upsert) command to update a table from a single data source?

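Historically no: the documented approach was a staging table with a combination of updates and inserts, which achieves the same thing. Newer Redshift releases also provide a MERGE command.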

The category of SQL commands that can be used to define the database schema, such as CREATE, DROP, ALTER, are referred to as _______ _____________ ____________.

Data Definition Language (DDL)

In a Redshift cluster, each node is further broken down into ___________, which have their own compute and storage associated with each.

slices

AWS recommends creating your Redshift tables with __________ ______, which uses automatic table optimization to choose the sort key.

SORTKEY AUTO

When you create a Redshift table, you can optionally specify one column as the ____________ ______. When the table is loaded with data, the rows are distributed to the node slices according to this key.

distribution key

What are the two types of Redshift table sort keys, and which is preferred?

COMPOUND (preferred)
INTERLEAVED

With compression in Redshift, can the sort key column be compressed?

No, it must always be in its raw form so it is always available for Redshift to use.

Which type of Redshift sort key performs better when using lots of WHERE clauses?

INTERLEAVED

Which type of Redshift sort key performs better when using lots of ORDER BY clauses?

COMPOUND

AWS recommends which distribution style for your Redshift tables?

DISTSTYLE AUTO

When you create a Redshift table, you can designate one of four distribution styles. What are they?

AUTO EVEN KEY ALL
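
To make the EVEN, KEY, and ALL styles concrete, here is a toy simulation of how rows might be placed. It is a sketch under loud assumptions: Redshift's real hash function is internal, so `zlib.crc32` stands in for it; the "targets" blur the distinction between nodes and slices; and AUTO is not modeled because Redshift picks a style itself.

```python
import zlib


def distribute(rows, key, style, num_targets):
    """Place rows on `num_targets` storage targets using a simplified style."""
    targets = [[] for _ in range(num_targets)]
    for i, row in enumerate(rows):
        if style == "EVEN":
            # Round-robin, regardless of the row's values.
            targets[i % num_targets].append(row)
        elif style == "KEY":
            # Rows with the same key value always land on the same target.
            h = zlib.crc32(str(row[key]).encode("utf-8"))  # stand-in hash
            targets[h % num_targets].append(row)
        elif style == "ALL":
            # Every target receives a full copy of the table.
            for target in targets:
                target.append(row)
        else:
            raise ValueError(f"unsupported style: {style}")
    return targets


# Hypothetical sample rows.
rows = [{"id": i, "customer": c} for i, c in enumerate("abcabc")]
print([len(t) for t in distribute(rows, "customer", "EVEN", 3)])  # [2, 2, 2]
print([len(t) for t in distribute(rows, "customer", "ALL", 3)])   # [6, 6, 6]
```

KEY distribution co-locates rows that join on the same column value, which is why it helps large joins, while ALL trades storage for avoiding data movement on small dimension tables.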

When creating a Redshift table with a NOT NULL constraint on a column, does Redshift enforce this?

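Yes. Redshift enforces NOT NULL column constraints. It is the primary key, foreign key, and unique constraints that are informational only and not enforced.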

Redshift Spectrum supports ________ and ________ operations. It does NOT support ________ and ________ operations.

SELECT... INSERT... UPDATE... DELETE...

When resizing a Redshift cluster, the source cluster goes into ____________ mode while the resized cluster is being created.

read-only

The two types of resize operations you can choose for resizing a Redshift cluster are __________ and __________.

classic resize... elastic resize.

The ______ resize operation for a Redshift cluster takes minutes, while a ______ resize operation can take hours to days.

elastic... classic...

When performing an elastic resize of a Redshift cluster, what are the two main constraints?

1. Can't be used from or to a single-node cluster
2. Only available for clusters that use the EC2-VPC platform

For classic resize and elastic resize operations for Redshift clusters, can you cancel the resize operation after it has been started?

For classic resize, yes. For elastic resize, no.

Are the Redshift pause/resume options supported for EC2-Classic clusters?

No, you can only pause/resume EC2-VPC clusters

Which type of Redshift cluster resize uses a snapshot for the operation?

elastic resize

What Redshift operation can sort rows and will only sort tables that are less than 95% sorted?

VACUUM SORT ONLY

What Redshift operation can reclaim disk space and will only run on tables that have more than 5% of the rows marked for deletion?

VACUUM DELETE ONLY

What Redshift VACUUM option will ensure that the operation is not interrupted by (i.e. resources are not diverted to) incoming queries?

BOOST

A faster alternative to performing a full vacuum operation on a Redshift cluster table could be to do a _______ _______. This can be beneficial when you have an extremely unsorted table.

Deep Copy

What AWS service can transfer data to and from AWS at a huge scale (i.e. up to 10 Gbps per agent, which is approximately 100 TB/day)?

AWS DataSync
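
The roughly 100 TB/day figure can be sanity-checked with a few lines of arithmetic. This sketch assumes a sustained rate of 10 gigabits per second per agent and decimal units (1 TB = 1,000 GB); real transfers rarely hold the peak rate all day.

```python
def terabytes_per_day(gigabits_per_second):
    """Convert a sustained link rate in Gbps to decimal terabytes per day."""
    gigabytes_per_second = gigabits_per_second / 8  # 8 bits per byte
    seconds_per_day = 24 * 60 * 60
    return gigabytes_per_second * seconds_per_day / 1000


print(terabytes_per_day(10))  # 108.0
```

At 10 Gbps the ceiling is about 108 TB/day, which matches the "approximately 100 TB/day" on the card.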

What is an Amazon EMR cluster composed of?

A collection of EC2 instances (referred to as "nodes")

Each EC2 instance in an Amazon EMR cluster is called a _______.

node

Every Amazon EMR cluster has a ___________ node, and it's possible to create a single-node cluster with only this node.

primary

The following is an example process using four steps for which AWS service? 1. Submit an input dataset for processing. 2. Process the output of the first step by using a Pig program. 3. Process a second input dataset by using a Hive program. 4. Write an output dataset.

Amazon EMR

When you set up an Amazon EMR cluster in a private subnet, AWS recommends that you also set up _____________________. Otherwise, you will incur additional charges for NAT gateway as the traffic flow will not be contained within your VPC.

VPC endpoints for Amazon S3

Amazon EMR integrates with ___________ to log information about requests made by or on behalf of your AWS account. With this information, you can track who is accessing your cluster when, and the IP address from which they made the request.

CloudTrail

___________ ______ _________ is a web-based integrated development environment (IDE) for fully managed Jupyter notebooks that run on Amazon EMR clusters.

Amazon EMR Studio

What feature of Amazon EMR allows you to browse your data catalog, run SQL queries, and download results before you work with the data in a Studio notebook?

Amazon EMR Studio SQL Explorer

An Amazon EMR Studio is composed of one or more ___________.

Workspaces

___________ ______ _________ does not support EMR clusters with multiple primary nodes.

Amazon EMR Studio

The maximum number of Amazon EMR Studios you can have is _____ per AWS account.

To use SSH to log on to the master/primary node of an Amazon EMR cluster, you will need to associate an __________ ______ ______ ______ with the cluster.

Amazon EC2 key pair

What are two limitations of launching an Amazon EMR cluster with multiple primary nodes?

1. Cannot use instance fleets configuration for the nodes
2. If two of the three primary nodes fail simultaneously, then the cluster will fail

When launching an Amazon EMR cluster with multiple primary nodes, how many core nodes does AWS recommend launching?

At least 4

Amazon EMR on ____________ is ideal for low latency workloads that need to be run in close proximity to on-premises data and applications.

AWS Outposts

Are spot instances or reserved instances supported for Amazon EMR on AWS Outposts?

No, only on-demand instances are supported

By default, when you create an Amazon EMR cluster, what AMI is used?

Amazon Linux AMI

When launching an Amazon EMR cluster and choosing between instance fleets or uniform instance groups, which category of nodes does this decision apply to (primary, core, task)?

All of them

When launching an Amazon EMR cluster with the uniform instance groups configuration, your cluster can include up to _____ instance groups: _____ primary instance group(s) _____ core instance group(s), and up to _____ optional task instance groups.

50... 1... 1... 48

Which EMR node type does not store data?

Task nodes

The DataNode daemons run on which Amazon EMR node type?

Core nodes

Which Amazon EMR storage option is ephemeral, distributed, and best suited for caching results between intermediate job flow steps?

HDFS

Which Amazon EMR storage option would you use to separate your compute and storage and persist data outside of the lifecycle of your cluster?

EMRFS (because it stores data to S3)

Is Kinesis Data Streams a fully managed and serverless AWS service?

Yes

Is Amazon EMR a fully managed and serverless AWS service?

No. However, there is a new option you can use called Amazon EMR Serverless.

The __________ is a Java library that acts as an intermediary between your record processing logic and Kinesis Data Streams.

Kinesis Client Library (KCL)

Can multiple Kinesis Data Streams applications consume data from the same stream?

Yes

A __________ ______ ________ is a set of shards.

Kinesis data stream

A Kinesis data stream _________ contains a sequence of data records. Each data record has a __________ __________, which is the unique identifier of each data record within a shard, but this number may overlap for a data record in a different shard.

shard... sequence number

Data records within a Kinesis data stream shard are composed of what three attributes?

1. sequence number
2. partition key
3. data blob

A data blob is one of three attributes within a __________ within a ________ within a Kinesis data stream. The data blob can be up to ____ MB in size.

data record... shard... 1 MB

By default, the retention period of the data records within a Kinesis data stream is __________, and the max retention period is _________.

24 hours... 365 days

Each data record within a Kinesis data stream shard gets assigned a unique ___________ ____________.

sequence number

In Kinesis Data Streams, a ___________ ______ is used to logically separate sets of data. This is generally not a 1:1 ratio to the shards. Often, one shard will have 100+ of these.

partition key
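
Kinesis maps each partition key to a shard by taking an MD5 hash of the key and locating the resulting 128-bit integer in a shard's hash key range. The sketch below shows the idea under a simplifying assumption: the shards split the hash space evenly, which holds for a freshly created stream but not necessarily after resharding. Both function names are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def hash_key(partition_key):
    """128-bit integer derived from the MD5 digest of the partition key."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for_key(partition_key, num_shards):
    """Index of the shard whose range holds the key, assuming an even split."""
    return hash_key(partition_key) * num_shards // HASH_SPACE


for key in ("device-1", "device-2", "device-3"):  # hypothetical keys
    print(key, shard_for_key(key, 4))
```

Because many distinct keys hash into each range, a single shard typically carries records for many partition keys, and a single very hot key can overload one shard.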

Kinesis Data Streams uses ______________ for encryption.

AWS KMS master keys

In Kinesis Data Streams, to read from or write to an encrypted stream, producer and consumer applications must have permission to access the __________________.

KMS master key

In Kinesis Data Streams, does using server-side encryption incur AWS KMS costs?

Yes

In Kinesis Data Streams, by default, you can create up to _____ data streams with the on-demand capacity mode. This can be increased with a support ticket.

In Kinesis Data Streams, what is the limit for the number of streams per account, using KDS provisioned mode?

No limit

The Kinesis Data Streams GetRecords command can retrieve up to _____ MB of data per call from a single shard, and up to _________ records per call.

10... 10,000...

In Kinesis Data Streams, one read transaction is also referred to as one ________________ call. They are the same thing.

GetRecords

Each Kinesis Data Stream shard can support up to a maximum total data read rate of ____ MB per _________ via GetRecords.

2 MB... second

In Kinesis Data Streams, can you switch the capacity mode of your stream? How often?

Yes. You can switch twice within 24 hours.

A Kinesis data stream in the on-demand mode accommodates up to ________ the peak write throughput observed in the previous 30 days.

double

The Kinesis Data Streams ___________ capacity mode is suited for predictable traffic with capacity requirements that are easy to forecast.

provisioned

In Kinesis Data Streams, can you enable server-side encryption after the stream has been created?

Yes

AWS recommends (for better Kinesis Data Stream scalability) that you migrate all of your producers and consumers that call the _____________ API to instead call the ______________ and _____________ APIs.

DescribeStream... DescribeStreamSummary... ListShards

What API would you use to reshard a KDS stream?

UpdateShardCount

In Kinesis Data Streams, what are the two types of resharding operations?

shard split shard merge
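
Both resharding operations act on the hash key ranges that shards own: a split divides one range in two, and a merge joins two adjacent ranges. The sketch below models ranges as inclusive `(start, end)` tuples; the helper names are hypothetical and no AWS API is called.

```python
def split_shard(hash_range, new_starting_hash_key=None):
    """Split one inclusive hash key range into two child ranges."""
    start, end = hash_range
    if new_starting_hash_key is None:
        # Default to the midpoint for an even split.
        new_starting_hash_key = start + (end - start + 1) // 2
    if not start < new_starting_hash_key <= end:
        raise ValueError("split point must fall inside the parent range")
    return (start, new_starting_hash_key - 1), (new_starting_hash_key, end)


def merge_shards(left, right):
    """Merge two adjacent inclusive hash key ranges into one."""
    if left[1] + 1 != right[0]:
        raise ValueError("shards must be adjacent to merge")
    return (left[0], right[1])


full_range = (0, 2 ** 128 - 1)
low, high = split_shard(full_range)
print(low[1] == 2 ** 127 - 1, high[0] == 2 ** 127)  # True True
print(merge_shards(low, high) == full_range)        # True
```

Splitting adds capacity for a hot range, while merging reclaims capacity from cold ranges; only adjacent ranges can be merged.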

When changing the data retention period for your KDS stream, how quickly does the change take effect?

Within minutes

You can assign your own metadata to streams you create in Amazon Kinesis Data Streams by using _______.

AWS Data Analytics Flashcards

(385 cards)