Collection: 18% (Kinesis Streams/Firehose, MSK, SQS, Data Pipeline, Snow, DMS, IoT Core) Flashcards
Be able to: a) determine the operational characteristics of the collection system b) select a collection system that handles the frequency, volume and source of data
What is the throughput capacity of a PutRecords call to a Kinesis Streams shard?
1 MiB of data (including partition keys) per second
1,000 records per second
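The two write limits above can be turned into a rough shard estimate. This is a minimal sketch, assuming evenly distributed partition keys and a steady ingest rate; the function name `shards_needed` and the sample workloads are hypothetical.

```python
import math

# Per-shard write limits quoted in the card above.
MAX_BYTES_PER_SEC = 1024 * 1024   # 1 MiB per second (including partition keys)
MAX_RECORDS_PER_SEC = 1000        # 1,000 records per second


def shards_needed(records_per_sec: float, avg_record_bytes: float) -> int:
    """Return the smallest shard count that satisfies both write limits."""
    by_bytes = math.ceil(records_per_sec * avg_record_bytes / MAX_BYTES_PER_SEC)
    by_count = math.ceil(records_per_sec / MAX_RECORDS_PER_SEC)
    return max(1, by_bytes, by_count)


# Many small records: the record-count limit is the binding one.
print(shards_needed(5000, 512))
# Fewer large records: the byte limit is the binding one.
print(shards_needed(500, 8192))
```

Whichever limit is hit first determines the shard count, so both have to be checked.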
What is the throughput capacity of a GetRecords call to the Kinesis Streams API?
2 MiB per second and 5 transactions per second, per shard
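Because the read limits apply to the shard rather than to each consumer, polling consumers have to share them. The sketch below assumes the budget is split evenly between consumers that poll the same shard; `per_consumer_budget` is a hypothetical helper, not part of any SDK.

```python
from typing import Tuple

# Per-shard read limits quoted in the card above.
READ_BYTES_PER_SEC = 2 * 1024 * 1024   # 2 MiB per second
READ_CALLS_PER_SEC = 5                 # 5 GetRecords transactions per second


def per_consumer_budget(consumers: int) -> Tuple[float, float]:
    """Return (minimum seconds between polls, bytes per second) per consumer."""
    if consumers < 1:
        raise ValueError("at least one consumer is required")
    poll_interval = consumers / READ_CALLS_PER_SEC
    bytes_per_sec = READ_BYTES_PER_SEC / consumers
    return poll_interval, bytes_per_sec


print(per_consumer_budget(1))
print(per_consumer_budget(4))
```

Each additional polling consumer on a shard lengthens the interval the others must wait between calls.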
What are the default and maximum Kinesis Stream record retention periods?
Default: 24 hours
Maximum: 7 days with extended retention (long-term retention, added later, raises this to 365 days)
What is the difference between the aggregation and collection mechanisms in Kinesis Streams?
Aggregation batches KPL user records into a single Streams record, increasing payload size, providing better throughput and improving shard efficiency
Collection batches multiple Streams records into a single HTTP request, reducing request overhead
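The two batching levels can be illustrated with plain Python. This is a conceptual sketch only: the function names and size limits are hypothetical, and the concatenation used here discards record boundaries, whereas the real KPL encodes them so that consumers can de-aggregate.

```python
from typing import List


def aggregate(user_records: List[bytes], max_bytes: int) -> List[bytes]:
    """Pack user records into as few stream records as fit under max_bytes."""
    stream_records: List[bytes] = []
    current = b""
    for rec in user_records:
        # Start a new stream record when the next user record would overflow.
        if current and len(current) + len(rec) > max_bytes:
            stream_records.append(current)
            current = b""
        current += rec
    if current:
        stream_records.append(current)
    return stream_records


def collect(stream_records: List[bytes], max_per_request: int) -> List[List[bytes]]:
    """Group stream records into batches, one batch per HTTP request."""
    return [stream_records[i:i + max_per_request]
            for i in range(0, len(stream_records), max_per_request)]


user_records = [b"x" * 100 for _ in range(50)]
stream_records = aggregate(user_records, max_bytes=1000)
requests = collect(stream_records, max_per_request=2)
print(len(user_records), len(stream_records), len(requests))  # 50 5 3
```

Aggregation reduces the number of stream records counted against the shard limit; collection reduces the number of HTTP round trips.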
What are the three means of adding a record to a Kinesis Stream?
a) Kinesis Agent
b) Kinesis Streams REST API in the SDK
c) Kinesis Producer Library
What are the two scenarios that may cause a ProvisionedThroughputExceededException? What can be done to address them?
a) Frequent checkpointing
b) Too many shards
Provide additional throughput to the DynamoDB application state table
Name four services that move data into AWS, other than the Snowball family
a) Direct Connect
b) Storage Gateway
c) S3 Transfer Acceleration
d) Database Migration Service
What are the three advantages of using Direct Connect?
a) reduced costs
b) increased bandwidth throughput
c) consistent network performance
Name three advantages of using Snowball
a) scales to petabytes of data
b) faster than transmitting the data over the network
c) avoids creating networking bottlenecks
Name two advantages of using Snowball Edge
a) scales to petabytes of data
b) supports generation of data despite intermittent connectivity
What is the advantage of using Snowmobile?
Scales to exabytes of data
What does Storage Gateway provide?
Hybrid on-prem/cloud storage using a hardware gateway appliance and native integration with S3
What is the advantage of using S3 Transfer Acceleration?
Speeds up uploads from clients that are geographically distant from the S3 bucket's region
Name two advantages of using the Database Migration Service
a) the source database remains fully operational during the migration
b) supports continuous data replication
Where is shard state and checkpointing information stored for Kinesis Data Streams consumers?
DynamoDB, in a table maintained by the Kinesis Client Library, one table for each KCL application
Which service is MSK most similar to?
Kinesis Data Streams
Name two differences between Kinesis Data Streams and MSK
a) MSK performs slightly better
b) Kinesis is more fully managed
What does “checkpointing” refer to in Kinesis Streams?
The tracking of records that have already been processed
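A toy simulation of the idea, assuming an in-memory dict stands in for the checkpoint store and integer sequence numbers stand in for Kinesis sequence numbers. The `consume` function and its `fail_at` parameter are hypothetical and are not the KCL API.

```python
from typing import Callable, Dict, List, Optional, Tuple


def consume(records: List[Tuple[int, str]],
            checkpoints: Dict[str, int],
            shard_id: str,
            handler: Callable[[str], None],
            fail_at: Optional[int] = None) -> None:
    """Process records after the stored checkpoint, checkpointing each success."""
    last_done = checkpoints.get(shard_id, -1)
    for seq, data in records:
        if seq <= last_done:
            continue  # already processed by an earlier run
        if seq == fail_at:
            raise RuntimeError("simulated worker crash")
        handler(data)
        checkpoints[shard_id] = seq  # record progress for the next run


records = [(0, "a"), (1, "b"), (2, "c"), (3, "d")]
checkpoints: Dict[str, int] = {}
seen: List[str] = []

try:
    consume(records, checkpoints, "shard-0", seen.append, fail_at=2)
except RuntimeError:
    pass  # the worker died after checkpointing sequence number 1

# A replacement worker resumes after the checkpoint instead of starting over.
consume(records, checkpoints, "shard-0", seen.append)
print(seen, checkpoints)
```

The sketch checkpoints after every record for simplicity; as the ProvisionedThroughputExceededException card notes, checkpointing too frequently has its own cost.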
Name two consumers that are supported by Streams but not by Firehose
a) Spark
b) KCL
Which Kinesis service supports multiple S3 destinations?
Kinesis Data Streams
Kinesis Connector Library can be used to emit data to which four AWS data services?
a) S3
b) DynamoDB
c) Redshift
d) Elasticsearch
What are the minimum and maximum sizes for a Kinesis Firehose delivery stream buffer?
1 MB to 128 MB
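A rough model of size-based buffering, assuming 1 MB is treated as 1024 * 1024 bytes and ignoring the time-based flush interval that Firehose also applies. The `SizeBuffer` class is hypothetical and only illustrates the 1 to 128 MB range from the card.

```python
from typing import Callable, List

MIN_BUFFER_MB, MAX_BUFFER_MB = 1, 128


class SizeBuffer:
    """Accumulate data and hand it to `deliver` once the size limit is reached."""

    def __init__(self, size_mb: int, deliver: Callable[[bytes], None]) -> None:
        if not MIN_BUFFER_MB <= size_mb <= MAX_BUFFER_MB:
            raise ValueError("buffer size must be between 1 and 128 MB")
        self.limit = size_mb * 1024 * 1024
        self.deliver = deliver
        self.chunks: List[bytes] = []
        self.size = 0

    def add(self, data: bytes) -> None:
        self.chunks.append(data)
        self.size += len(data)
        if self.size >= self.limit:
            self.flush()

    def flush(self) -> None:
        """Deliver everything buffered so far as one object, then reset."""
        if self.chunks:
            self.deliver(b"".join(self.chunks))
            self.chunks, self.size = [], 0


delivered: List[bytes] = []
buf = SizeBuffer(1, delivered.append)
for _ in range(5):
    buf.add(b"x" * 300 * 1024)   # five 300 KiB chunks
print(len(delivered), buf.size)
```

With a 1 MB buffer, the fourth 300 KiB chunk crosses the limit and triggers one delivery; the fifth stays buffered.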
What are the two streaming models supported by Kafka, and which third model enables them to work together?
Queueing, publish and subscribe
Partitioned log
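How a partitioned log supports both models can be sketched in a few lines. This is a simplification under stated assumptions: keys are integers mapped to partitions by modulo (Kafka hashes the key bytes), partitions are assigned round-robin, and offsets are not tracked. `PartitionedLog` is a hypothetical class, not a Kafka client.

```python
from typing import Dict, List


class PartitionedLog:
    """A toy log split into partitions, readable by independent consumer groups."""

    def __init__(self, partitions: int) -> None:
        self.partitions: List[List[str]] = [[] for _ in range(partitions)]

    def append(self, key: int, value: str) -> None:
        # Records with the same key always land in the same partition.
        self.partitions[key % len(self.partitions)].append(value)

    def read(self, group_members: List[str]) -> Dict[str, List[str]]:
        """Assign partitions round-robin within one group; return what each member reads."""
        out: Dict[str, List[str]] = {m: [] for m in group_members}
        for i, part in enumerate(self.partitions):
            out[group_members[i % len(group_members)]].extend(part)
        return out


log = PartitionedLog(2)
for i in range(6):
    log.append(i, f"m{i}")

# Queueing: members of one group divide the partitions between them.
billing = log.read(["b1", "b2"])
# Publish and subscribe: a second group independently sees every record.
audit = log.read(["a1"])
print(billing)
print(audit)
```

Within a group each record is handled once, while separate groups each receive the full log, which is how the one structure serves both models.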
Name the four Data Pipeline components
a) data node (input source or output destination)
b) activity (pipeline action)
c) precondition (readiness check)
d) schedule (activity timing)
Name five differences between Kinesis Firehose and Kinesis Data Streams
a) Firehose is fully managed whereas Streams requires some manual configuration
b) Firehose has a somewhat greater latency
c) Firehose does not support data storage or replay
d) Firehose can load data directly into storage services
e) Firehose does not support KCL
Which service, Kinesis Firehose or Kinesis Data Streams, supports connection to multiple destinations?
Kinesis Data Streams
Which AWS service has largely replaced Data Pipeline?
Lambda
Which service creates a dedicated private network connection between a customer network and AWS?
Direct Connect
How can the CloudWatch Logs agent be integrated with Kinesis?
Log data can be shared cross-region and cross-account by configuring a CloudWatch Logs subscription that delivers to a Kinesis Data Stream
When would it be appropriate to store application logs in S3?
Consolidating CloudTrail audit logs OR implementing serverless log analytics using Kinesis Analytics [uncertain, question from Milner post]
How can the Managed Service for Kafka be integrated with Kinesis Data Analytics?
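It can’t be for SQL-based applications; Kinesis Data Analytics applications built on Apache Flink can read from MSK through the Flink Kafka connector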
Can Kinesis Data Streams integrate directly with any data storage services? If yes, how is it done? If no, what should be done instead?
No
Consumers running on EC2 or as Lambda functions must use the Kinesis Client Library to retrieve records from the stream and then emit them using a storage service connector from the Kinesis Connector Library
Kinesis Firehose can integrate directly with which three data storage services?
S3, Elasticsearch, and Redshift
Integration with DynamoDB is not supported, and Kinesis Analytics and Splunk are not storage services
Name three beneficial abilities of stream processing
a) decouples collection and processing, which may be operating at different rates
b) multiple ingestion streams can be merged to a combined stream for consumption
c) multiple endpoints can work on the same data in parallel
Name three benefits of the Storage Gateway implementation
a) low latency, achieved through local caching of frequently accessed data
b) transfer optimisation, through sending only modified data and by compressing data prior to transfer
c) native integration with S3
What are the three key use cases for Storage Gateway?
a) backups and archives to the cloud
b) reduction of on-prem storage by using cloud-backed file shares
c) on-prem applications that require low-latency access to data stored in AWS, which the gateway serves from its local cache
What is the difference between a KPL user record and a Kinesis Data Stream record?
A user record is a blob of data that has particular meaning to the user
A Streams record is an instance of the service API Record structure