Domain 1 - Collection Flashcards
Kinesis - with KPL PutRecords(), what happens if a single record fails?
The KPL's PutRecords() operation sends multiple records to the stream per request. If a single record fails, it is automatically added back to the KPL buffer and retried; the failure of one record does not affect the processing of the other records in the request.
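For contrast, with the plain SDK's PutRecords you must detect and retry partial failures yourself, which is exactly what the KPL automates. A minimal boto3 sketch (stream name is hypothetical):

```python
import boto3

kinesis = boto3.client("kinesis")

records = [{"Data": b"payload-%d" % i, "PartitionKey": str(i)} for i in range(10)]

while records:
    resp = kinesis.put_records(StreamName="my-stream", Records=records)
    if resp["FailedRecordCount"] == 0:
        break
    # Response entries line up with the request; failed entries carry an ErrorCode.
    # Keep only the failed records and retry them (add backoff in real code).
    records = [rec for rec, result in zip(records, resp["Records"])
               if "ErrorCode" in result]
```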
You are sending many small (~100-byte) data records and would like to ensure you can use Kinesis to receive them. What should you use for optimal throughput with asynchronous features?
1) Kinesis SDK
2) Kinesis Producer Library
3) Kinesis Client Library
4) Kinesis Connector Library
5) Kinesis Agent
2) KPL - through batching (collection and aggregation), we can achieve maximum throughput with the KPL. The KPL also supports an asynchronous API.
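Conceptually, KPL aggregation packs many small user records into one Kinesis record (the real KPL uses a protobuf-based format that the KCL de-aggregates). A hand-rolled sketch of the idea, not the actual KPL API; the stream name is hypothetical:

```python
import boto3

kinesis = boto3.client("kinesis")

# 100 B user records; sent one by one they burn the 1,000 records/s/shard limit
small_records = [b"x" * 100 for _ in range(1000)]

# Length-prefix and concatenate them into a single ~100 KB Kinesis record
blob = b"".join(len(r).to_bytes(4, "big") + r for r in small_records)
kinesis.put_record(StreamName="my-stream", Data=blob, PartitionKey="agg-1")
```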
You would like to collect log files in bulk from your Linux servers running on premises. You need a built-in retry mechanism and monitoring through CloudWatch. Logs should end up in Kinesis. What will help you accomplish this?
1) Kinesis SDK
2) KPL
3) Kinesis Agent
4) Direct Connect
3) Kinesis Agent
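The agent is configured through /etc/aws-kinesis/agent.json; a minimal sketch, with a hypothetical file pattern and stream name:

```json
{
  "cloudwatch.emitMetrics": true,
  "flows": [
    {
      "filePattern": "/var/log/app/*.log",
      "kinesisStream": "my-stream"
    }
  ]
}
```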
You would like to perform batch compression before sending data to Kinesis, in order to maximize the throughput. What should you use?
1) Kinesis SDK
2) KPL’s Compression Feature
3) KPL + Implement Compression Yourself
3) The KPL has no built-in compression feature; compression must be implemented by the end user.
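A minimal sketch of user-side compression before sending to Kinesis; the consumer must decompress symmetrically (stream name is hypothetical):

```python
import gzip
import boto3

kinesis = boto3.client("kinesis")

payload = b'{"sensor": "t-1", "temp": 21.5}' * 100
compressed = gzip.compress(payload)  # repetitive JSON often shrinks 10x or more

kinesis.put_record(StreamName="my-stream", Data=compressed, PartitionKey="t-1")
# Consumer side: gzip.decompress(record["Data"])
```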
You have 10 consumer applications consuming concurrently from one shard in classic mode by issuing GetRecords() calls. What is the average latency for consuming these records for each application?
1) 70ms
2) 200ms
3) 1 sec
4) 2 sec
4) 2 sec - each shard supports up to 5 GetRecords API calls per second, shared across all consumers, so with 10 applications each one can issue its next call only every 2 seconds on average.
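The arithmetic behind the answer, as a quick sketch:

```python
calls_per_shard_per_second = 5   # GetRecords limit per shard, shared by all consumers
consumers = 10

# Each application gets 5 / 10 = 0.5 calls per second,
# i.e. one GetRecords call every 2 seconds on average.
seconds_between_calls = consumers / calls_per_shard_per_second  # -> 2.0
```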
You have 10 consumer applications consuming concurrently from one shard in enhanced fan-out mode. What is the average latency for consuming these records for each application?
1) 70ms
2) 200ms
3) 1 sec
4) 2 sec
1) 70ms - in enhanced fan-out mode, each consumer gets its own 2 MB/s of read throughput per shard and sees an average latency of ~70ms.
Note: enhanced fan-out has a limit of 20 registered consumers per stream
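Enhanced fan-out consumers must first be registered against the stream; a minimal boto3 sketch with hypothetical ARN and name:

```python
import boto3

kinesis = boto3.client("kinesis")

# Each registered consumer gets its own 2 MB/s per shard via SubscribeToShard
resp = kinesis.register_stream_consumer(
    StreamARN="arn:aws:kinesis:us-east-1:123456789012:stream/my-stream",
    ConsumerName="analytics-app-1",
)
print(resp["Consumer"]["ConsumerARN"])
```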
You are consuming from a Kinesis stream with 10 shards that receives on average 8 MB/s of data from various producers using the KPL. You are using the KCL to consume these records, and you observe through CloudWatch metrics that the throughput is only 2 MB/s, so your application is lagging. What is the most likely root cause?
1) You need to split shards some more
2) There is a hot partition
3) CloudWatch is displaying the average throughput, not the aggregated one
4) Your DynamoDB is under-provisioned
4) The KCL uses DynamoDB for checkpointing. Because the table is under-provisioned, checkpointing cannot keep up, which results in lower throughput for your KCL-based application. Make sure to increase the RCU / WCU.
10 shards provide 10 MB/s of ingest and 20 MB/s of egress capacity, well above the 8 MB/s being produced, so 1) and 2) cannot be the answers.
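A fix sketch, assuming the default KCL convention of naming the lease/checkpoint table after the application (names are hypothetical):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# The KCL creates one lease/checkpoint table per application
dynamodb.update_table(
    TableName="my-kcl-application",
    ProvisionedThroughput={
        "ReadCapacityUnits": 50,    # raise RCU for lease scans
        "WriteCapacityUnits": 50,   # raise WCU for checkpoint writes
    },
)
```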
Which of the following statements is wrong?
1) Spark Streaming can write to Kinesis Data Stream
2) Spark Streaming can read from Kinesis Firehose
3) Spark Streaming can read from a Kinesis Data Stream
2) Kinesis Data Firehose is a delivery service that only writes to its configured destinations; it cannot be read from directly, so Spark Streaming cannot consume from it.
(I took a guess on this one since Spark Streaming was never mentioned when we talked about KDF.)
You are looking to decouple jobs and ensure data is deleted after being processed. Which technology would you choose?
1) Kinesis Data Streams
2) Kinesis Data Firehose
3) SQS
3) SQS
The key hints are “decouple jobs” and deletion after processing: SQS deletes messages once they are consumed and acknowledged, whereas Kinesis retains data until the retention period expires.
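A minimal sketch of the SQS consume-then-delete cycle (the queue URL and process() helper are hypothetical):

```python
import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/jobs"  # hypothetical

def process(body):
    print("processing", body)  # stand-in for your job logic

resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10,
                           WaitTimeSeconds=20)  # long polling
for msg in resp.get("Messages", []):
    process(msg["Body"])
    # Delete only after successful processing; otherwise the message becomes
    # visible again once its visibility timeout expires
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```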
Which protocol is not supported by the IoT Device Gateway?
1) MQTT
2) Websockets
3) HTTP 1.1
4) FTP
4) FTP
I took a guess. HTTP 1.1 looks like a trap, but the Device Gateway does support MQTT, WebSockets, and HTTP 1.1; FTP is not supported.
You would like to control the target temperature of your room using an IoT thing thermostat. How can you change its target-temperature state even while it is temporarily offline?
1) Send a message to the IoT broker every 10s until it is acknowledged by the IoT thing
2) Use a rule action that triggers when the device comes back online
3) Change the state of the device shadow
4) Change its metadata in the thing registry
3) That’s precisely the purpose of the device shadow, which gets synchronized with the device when it comes back online.
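Setting the desired state on the shadow via boto3; the thing name and temperature field are hypothetical:

```python
import json
import boto3

# Note: the data-plane client is "iot-data", not "iot"
iot_data = boto3.client("iot-data")

# Set the desired state; AWS IoT syncs it to the device when it reconnects
iot_data.update_thing_shadow(
    thingName="living-room-thermostat",
    payload=json.dumps({"state": {"desired": {"targetTemperature": 21.5}}}).encode(),
)
```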
You have set up Direct Connect at one location to ensure your traffic into AWS goes over a private network. You would like to set up a failover connection that is as reliable and as redundant as possible, as you cannot afford to be down for too long. What backup connection do you recommend?
1) Another Direct Connect Setup
2) Site to site VPN
3) Client side VPN
4) Snowball Connection
2) Site-to-Site VPN - although this is not as private as another Direct Connect setup, it is more reliable because it leverages the many redundant paths of the public internet. It is the correct answer here.
1) Another Direct Connect connection is more private than a Site-to-Site VPN, but as a failover it is less reliable precisely because it does not leverage the public internet. It is the wrong answer here, though it is the standard approach for building a highly available Direct Connect setup.
You would like to transfer data into AWS in less than two days from now. What should you use?
1) Set up Dx
2) Use Public Internet
3) Use AWS Snowball
4) Use AWS Snowmobile
2) Public Internet
1) Dx - when you create a public virtual interface, it can take up to 72 hours for AWS to review and approve your request, and provisioning a new physical connection takes even longer.
3) and 4) obviously take too long, since Snowball and Snowmobile devices have to be shipped back and forth.
From which sources can the input for Kinesis Analytics be obtained?
1) MySQL and Kinesis Data Streams
2) DynamoDB and Kinesis Firehose Delivery Streams
3) Kinesis data streams and Kinesis Firehose delivery streams
4) Kinesis data streams and DynamoDB
3) Kinesis Analytics can only read from Kinesis sources, but both data streams and Firehose delivery streams are supported.
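A sketch of wiring a Kinesis data stream as the input of a Kinesis Analytics (SQL) application; ARNs, role, and schema are hypothetical and trimmed to the essentials:

```python
import boto3

ka = boto3.client("kinesisanalytics")

ka.create_application(
    ApplicationName="my-analytics-app",
    Inputs=[{
        "NamePrefix": "SOURCE_SQL_STREAM",
        "KinesisStreamsInput": {  # or "KinesisFirehoseInput" for a delivery stream
            "ResourceARN": "arn:aws:kinesis:us-east-1:123456789012:stream/my-stream",
            "RoleARN": "arn:aws:iam::123456789012:role/ka-read-role",
        },
        "InputSchema": {
            "RecordFormat": {
                "RecordFormatType": "JSON",
                "MappingParameters": {"JSONMappingParameters": {"RecordRowPath": "$"}},
            },
            "RecordColumns": [{"Name": "temp", "SqlType": "DOUBLE", "Mapping": "$.temp"}],
        },
    }],
)
```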
After real-time analysis has been performed on the input source, where may you send the processed data for further processing?
1) Amazon S3
2) Redshift
3) Athena
4) Kinesis data stream or Firehose
5) All above
4) Kinesis Analytics can have a Kinesis data stream, a Kinesis Firehose delivery stream, or Lambda as its destination.
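A sketch of adding a destination to an existing application; the output key can be "KinesisStreamsOutput", "KinesisFirehoseOutput", or "LambdaOutput" (ARNs and names are hypothetical):

```python
import boto3

ka = boto3.client("kinesisanalytics")

# The current application version is required for the update
app = ka.describe_application(ApplicationName="my-analytics-app")
ka.add_application_output(
    ApplicationName="my-analytics-app",
    CurrentApplicationVersionId=app["ApplicationDetail"]["ApplicationVersionId"],
    Output={
        "Name": "DESTINATION_SQL_STREAM",
        "KinesisFirehoseOutput": {
            "ResourceARN": "arn:aws:firehose:us-east-1:123456789012:deliverystream/my-delivery",
            "RoleARN": "arn:aws:iam::123456789012:role/ka-write-role",
        },
        "DestinationSchema": {"RecordFormatType": "JSON"},
    },
)
```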