Collection: 18% (Kinesis Streams/Firehose, MSK, SQS, Data Pipeline, Snow, DMS, IoT Core) Flashcards

Question

Name five differences between Kinesis Firehose and Kinesis Data Streams

Answer 1

a) Firehose is fully managed whereas Streams requires some manual configuration b) Firehose has a somewhat greater latency c) Firehose does not support data storage or replay d) Firehose can load data directly into storage services d) Firehose does not support KCL

Answer 2

Kinesis Data Streams

Answer 3

Direct Connect

Answer 4

Log data can be shared cross-region and cross-account by configuring Kinesis Data Stream subscriptions

Answer 5

Consolidating CloudTrail audit logs OR implementing serverless log analytics using Kinesis Analytics [uncertain, question from Milner post]

Answer 6

It can't be

Answer 7

No Consumers running on EC2 or as Lambda functions must use the Kinesis Client Library to retrieve records from the stream and then emit them using the a storage service connector from the Kinesis Connector Library

Answer 8

S3, Elasticsearch, and Redshift Integration with DynamoDB is not supported, and Kinesis Analytics and Splunk are not storage services

Answer 9

a) decouples collection and processing, which may be operating at different rates b) multiple ingestion streams can be merged to a combined stream for consumption c) multiple endpoints can work on the same data in parallel

Answer 10

a) low latency, achieved through local caching of frequently accessed data b) transfer optimisation, through sending only modified data and by compressing data prior to transfer c) native integration with S3

Answer 11

a) backups and archives to the cloud b) reduction of on-prem storage by using cloud-backed file shares c) on-prem applications that require low latency access to data stored in AWS since data is cached

Answer 12

A user record is a blob of data that has particular meaning to the user A Streams record is an instance of the service API Record structure

Answer 13

Kinesis Streams is designed for the real-time processing of large volumes of data SQS is designed to as a polled buffer for much smaller data packets where the messages are processed independently

Answer 14

a) preserve record ordering (note that SQS FIFO queues also do this) b) route related records to the same consumer c) allow multiple consumers to consume the same stream concurrently

Answer 15

a) track the successful completion of each item independently b) support parallel processing since each message can only be retrieved once c) support priority queues d) scale transparently (Kinesis requires shard numbers to be adjusted manually)

Answer 16

Message Queuing Telemetry Transport, a lightweight pub/sub protocol Using X.509 certificates

Answer 17

A secure, scalable service that manages all active device connections to enable their efficient communication with IoT Core, over one of three low level protocols

Answer 18

a) HTTPS b) Websockets c) MQTT Note that FTP is not supported

Answer 19

A central location for storing attributes related to each connected device, or device that may be connected in the future

Answer 20

The last known state of each connected device, stored as a JSON document, and used as a message channel to send commands to that device through a uniform interface

Answer 21

The Rules Engine enables continuous filtering, transformation and routing of incoming device messages according to configured rules specified using an SQL based syntax

Answer 22

Rule actions specify which action the Rules Engine should take when a rule is triggered on an incoming device message

Answer 23

a) filter and transform incoming device data b) routing device data to other AWS services directly or via Lambda c) triggering CloudWatch alerts

Answer 24

a) Kinesis b) Simple Notification Service c) Simple Queue Service d) CloudWatch e) IoT Analytics

Answer 25

The Zookeeper connection string [uncertain, from Milner post]

Answer 26

a) accelerated log and data feed intake b) realtime metrics and reporting c) loading of aggregate data into a warehouse or MR cluster d) complex stream processing

Answer 27

a) decoupling microservices b) scheduling batch jobs c) distributing tasks to worker nodes

Answer 28

Lambda (function blueprints are available for common transformations) Firehose buffers data using a specified size or interval, then invokes the specified Lambda function on each batch Transformed data is sent back to Firehose for further buffering before being sent to the consumer

Answer 29

Transformations can be retried up to three times Failed records are sent to S3 Errors can also be sent to CloudWatch

Answer 30

Kinesis Connector Library (not to be confused with Kinesis Client Library)

Answer 31

Lambda functions

Answer 32

a) accepting data as soon as it has been produced, without the need for batching b) enabling custom applications to process and analyse streaming data

Answer 33

A sequence of records in a Kinesis stream

Answer 34

b) using PutRecords to write multiple records to one or more shards per request c) integrating with the KCL to provide consumer record aggregation and disaggregation a) providing an automatic and configurable retry mechanism d) submitting CloudWatch metrics to provide performance visibility

Answer 35

a) connecting to a Stream and enumerating its shards b) instantiating a record processor for every shard managed c) pulling records from the stream and pushing them to the corresponding record processor d) checkpointing processed records e) rebalancing shard-worker associations when the worker or shard counts change

Answer 36

a) partition key b) sequence number c) aggregated data blob

Answer 37

A Task Runner installed on the on-premise hosts polls Data Pipeline for work, and issues appropriate commands to run the specified activity, eg running a stored procedure

Answer 38

5 seconds | 24 hours, following which the data is discarded

Answer 39

0 - 7200 seconds (2 hours)

Answer 40

Default: 4 days Maximum: 14 days

Answer 41

Firehose delivers the skipped files to the S3 bucket as a manifest file in the errors folder. The manifest can then be used to manually load the data into Redshift using the COPY command once the issue causing the failure has been addressed.

Answer 42

Firehose will automatically increase the buffer size

Answer 43

Parquet and ORC

Answer 44

By synchronously replicating data across three Availability Zones

Answer 45

a) Data Pipeline, using a RedshiftCopyActivity with S3 and Redshift data nodes b) Lambda, using the Redshift Database Loader

Answer 46

60 - 900 seconds

Answer 47

a) each record is uniquely identified | b) records have a fixed unit of capacity

Answer 48

a) the Rules Engine | b) Device Shadow service

Answer 49

a) Device Gateway b) Message Broker c) Registry d) Device Shadow e) Rules Engine

Answer 50

A high-throughput, topic-based pub/sub service that enables the asynchronous transmission of messages over MQTT between devices and applications

Answer 51

Kinesis Firehose

Answer 52

10,000 records | 10 MiB

Answer 53

500 records 5 MiB, including partition keys 2,500 requests per second

Answer 54

Before delivery to the destination If the stream is configured for transformation, the records are de-aggregated before delivery to Lambda

Answer 55

Standard: 200 milliseconds Fan-out: 70 milliseconds Firehose: 60 seconds

Answer 56

The splitting or merging of shards to meet changing traffic demands Can be performed without restarting the stream and without impact on producers

Answer 57

The default retention period is 24 hours, so the records would have been deleted

Answer 58

a) on EC2 instances | b) as Lambda functions

Answer 59

a) S3 b) DynamoDB c) ElasticSearch Note that RDS and Redshift are not directly integrated

Answer 60

Complete and submit the Firehose Limits form

Answer 61

1 MiB, 1,000 requests and 1,000 records per second in most regions 5 MiB, 2,000 requests and 5,000 records per second in N. Virginia, Oregon and Ireland

Answer 62

200 in most regions 500 in N. Virginia, Oregon and Ireland

Answer 63

gzip, zip. snappy gzip

Answer 64

Volume of data ingested (number of records times the size of each record rounded to nearest 5 kB) Data format conversion, and volume of outgoing data to a destination resident in a VPC

Answer 65

Use of Pipeline is free, but there may be charges for the resources used

Answer 66

Firehose is not integrated with the KCL, rather data is delivered directly to a specified data storage service.

Answer 67

Keys should be generated randomly

Collection: 18% (Kinesis Streams/Firehose, MSK, SQS, Data Pipeline, Snow, DMS, IoT Core) Flashcards

Be able to: a) determine the operational characteristics of the collection system b) select a collection system that handles the frequency, volume and source of data (94 cards)