Collection Flashcards

1
Q

How long are records accessible in a stream?

A

24 hours

2
Q

To how many days can we raise the period for which a record in a stream is accessible?

A

7 days

3
Q

How do we raise the limit from 24 hours to up to 7 days?

A

By enabling extended data retention

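A minimal sketch of enabling this with boto3 — the stream name is a placeholder, and 168 hours is the 7-day maximum referenced above:

```python
import boto3

kinesis = boto3.client("kinesis")

# Raise retention from the 24-hour default to the 7-day maximum (168 hours).
kinesis.increase_stream_retention_period(
    StreamName="my-stream",  # placeholder stream name
    RetentionPeriodHours=168,
)
```
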
4
Q

What is the maximum size of a data blob within one record?

A

1 MB

5
Q

How many records per second can each shard support?

A

1,000 PUT records

6
Q

What is the maximum number of shards you can have in a stream?

A

No upper limit

7
Q

What is the maximum number of streams you can have in an account?

A

No upper limit

8
Q

How much data per second can a single shard ingest?

A

1 MB of data per second

9
Q

How many writes can a single shard ingest?

A

1,000 records per second

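Putting the two per-shard write limits above together, a quick sizing sketch (the workload numbers are invented for illustration):

```python
import math

# Hypothetical workload
ingest_mb_per_sec = 8.0   # total data rate across all producers
records_per_sec = 5000    # total record rate across all producers

# A shard ingests up to 1 MB/s and 1,000 records/s, so the shard count
# is driven by whichever limit binds first.
shards_for_data = math.ceil(ingest_mb_per_sec / 1.0)
shards_for_records = math.ceil(records_per_sec / 1000.0)
print(max(shards_for_data, shards_for_records))  # -> 8
```
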
10
Q

What is the default shard limit for Virginia, Oregon, and Ireland?

A

500 shards

11
Q

What is the default shard limit outside of these three regions?

A

200 shards

12
Q

How many read transactions per second can each shard support?

A

5

13
Q

How many records can each read transaction provide?

A

10,000 records

14
Q

What is the upper size limit of a read transaction?

A

10 MB

15
Q

GetRecords can retrieve how many MB of data per call from a single shard?

A

10 MB

16
Q

GetRecords can retrieve how many records per call?

A

Up to 10,000

17
Q

How many read transactions is one call to GetRecords counted as?

A

1

18
Q

What is the maximum speed of each shard via GetRecords?

A

2 MB per second

19
Q

If a call to GetRecords returns 10 MB, what do subsequent calls made within the next 5 seconds do?

A

Throw an exception (ProvisionedThroughputExceededException)

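A minimal polling-consumer sketch tying the GetRecords limits above together: each call is one read transaction (5 per second per shard), returns up to 10,000 records or 10 MB, and exceeding the 2 MB/s read limit raises an exception. Stream and shard identifiers are placeholders:

```python
import time
import boto3

kinesis = boto3.client("kinesis")

# Start from the oldest available record in the shard.
iterator = kinesis.get_shard_iterator(
    StreamName="my-stream",          # placeholder
    ShardId="shardId-000000000000",  # placeholder
    ShardIteratorType="TRIM_HORIZON",
)["ShardIterator"]

while iterator:
    try:
        # One call = one read transaction; up to 10,000 records / 10 MB.
        resp = kinesis.get_records(ShardIterator=iterator, Limit=10000)
    except kinesis.exceptions.ProvisionedThroughputExceededException:
        time.sleep(1)  # the 2 MB/s shard read limit was exceeded; back off
        continue
    for record in resp["Records"]:
        print(record["SequenceNumber"], record["Data"])
    iterator = resp.get("NextShardIterator")
    time.sleep(0.2)  # stay within 5 read transactions per second per shard
```
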
20
Q

How many consumers per stream can be registered to use enhanced fan-out?

A

20

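Registering one of those (up to 20) enhanced fan-out consumers is a single call; a sketch with a placeholder stream ARN and consumer name:

```python
import boto3

kinesis = boto3.client("kinesis")

# Each registered consumer gets its own 2 MB/s per shard, independent
# of the other consumers on the stream.
resp = kinesis.register_stream_consumer(
    StreamARN="arn:aws:kinesis:us-east-1:123456789012:stream/my-stream",  # placeholder
    ConsumerName="my-fanout-consumer",                                    # placeholder
)
print(resp["Consumer"]["ConsumerARN"])
```
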
21
Q

What three things does a record consist of?

A
  1. Sequence Number
  2. Partition Key
  3. Data Blob
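
All three parts appear in a single PutRecord call and its response: the producer supplies the partition key and data blob, and Kinesis assigns the sequence number. A minimal sketch with placeholder names:

```python
import boto3

kinesis = boto3.client("kinesis")

resp = kinesis.put_record(
    StreamName="my-stream",    # placeholder
    Data=b'{"temp": 21.5}',    # the data blob (up to 1 MB)
    PartitionKey="device-42",  # routes the record to a shard
)
# The sequence number is assigned by Kinesis, not by the producer.
print(resp["ShardId"], resp["SequenceNumber"])
```
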
22
Q

You are accumulating data from IoT devices and you must send data within 10 seconds to Amazon Elasticsearch Service. That data should also be consumed by other services when needed. Which service do you recommend using?

  • Kinesis Data Streams
  • Kinesis Data Firehose
  • SQS
  • Database Migration Service
A

Kinesis Data Streams

23
Q

You need a managed service that can deliver data to Amazon S3 and scale automatically for you. You want to be billed only for the actual usage of the service and be able to handle peak loads. Which service do you recommend?

  • Kinesis Data Streams
  • Kinesis Data Firehose
  • SQS
  • Kinesis Analytics
A

Kinesis Data Firehose

24
Q

You are sending a lot of 100-byte data records and would like to ensure you can use Kinesis to receive your data. What should you use to ensure optimal throughput with asynchronous features?

  • Kinesis SDK
  • Kinesis Producer Library
  • Kinesis Client Library
  • Kinesis Connector Library
  • Kinesis Agent
A

Kinesis Producer Library

(Through batching (collection and aggregation), we can achieve maximum throughput using the KPL. The KPL also supports an asynchronous API.)

25
Q

You would like to collect log files en masse from your Linux servers running on premises. You need a retry mechanism embedded and monitoring through CloudWatch. Logs should end up in Kinesis. What will help you accomplish this?

  • Kinesis SDK
  • Kinesis Producer Library
  • Kinesis Agent
  • Direct Connect
A

Kinesis Agent

26
Q

You would like to perform batch compression before sending data to Kinesis, in order to maximize the throughput. What should you use?

  • Kinesis SDK
  • Kinesis Producer Library Compression Feature
  • Kinesis Producer Library + Implement Compression Yourself
A

Kinesis Producer Library + Implement Compression Yourself

Compression must be implemented by the end user

27
Q

You have 10 consumer applications consuming concurrently from one shard, in classic mode, by issuing GetRecords() commands. What is the average latency for consuming these records for each application?

  • 70 ms
  • 200 ms
  • 1 sec
  • 2 sec
A

2 sec (You can issue up to 5 GetRecords API calls per second, so it’ll take 2 seconds for each consuming application before they can issue their next call)

28
Q

You have 10 consumer applications consuming concurrently from one shard, in enhanced fan-out mode. What is the average latency for consuming these records for each application?

  • 70 ms
  • 200 ms
  • 1 sec
  • 2 sec
A

70 ms (No matter how many consumers you have, in enhanced fan-out mode each consumer receives 2 MB per second of throughput and has an average latency of 70 ms.)

29
Q

You would like to have data delivered in near real time to Amazon Elasticsearch, and the data delivery to be managed by AWS. What should you use?

  • Kinesis Client Library (KCL)
  • Kinesis Connector Library
  • Kinesis Firehose

A

Kinesis Firehose

30
Q

You are consuming from a Kinesis stream with 10 shards that receives on average 8 MB/s of data from various producers using the KPL. You are therefore using the KCL to consume these records, and observe through the CloudWatch metrics that the throughput is 2 MB/s, and therefore your application is lagging. What’s the most likely root cause for this issue?

  • You need to split shards some more
  • There’s a hot partition
  • CloudWatch is displaying the average throughput metric, not the aggregate one
  • Your DynamoDB table is under-provisioned
A

Your DynamoDB table is under-provisioned

(Because it's under-provisioned, checkpointing does not happen fast enough, resulting in lower throughput for your KCL-based application. Make sure to increase the RCU/WCU, as sketched below.)
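
The fix the explanation points at — raising the checkpoint table's capacity — might look like this in boto3 (the table name and capacity values are invented; the KCL names the table after your application):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Raise the provisioned throughput of the KCL checkpoint table so that
# checkpointing can keep up with the stream's actual throughput.
dynamodb.update_table(
    TableName="my-kcl-application",  # placeholder: KCL application name
    ProvisionedThroughput={
        "ReadCapacityUnits": 50,
        "WriteCapacityUnits": 50,
    },
)
```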

31
Q

You would like to increase the capacity of your Kinesis streams. What should you do?

  • Split Shards
  • Merge Shards
  • Turn on Auto Scaling
A

Split Shards

32
Q

Which of the following statements is wrong?

  • Spark Streaming can read from Kinesis Data Streams
  • Spark Streaming can read from Kinesis Data Firehose
  • Spark Streaming can write to Kinesis Data Streams
A

Spark Streaming can read from Kinesis Data Firehose

33
Q

Which of the following does Kinesis Data Firehose not write to?

  • S3
  • Redshift
  • DynamoDB
  • Elasticsearch
  • Splunk
A

DynamoDB

34
Q

You are looking to decouple jobs and ensure data is deleted after being processed. Which technology would you choose?

  • Kinesis Data Streams
  • Kinesis Data Firehose
  • SQS
A

SQS

35
Q

You are collecting data from IoT devices at scale and would like to forward that data into Kinesis Data Firehose. How should you proceed?

  • Send that data into an IoT topic and define a rule action
  • Use enhanced fan-out for the IoT topic and send that data into Kinesis Data Streams
  • Create an SNS topic, send the IoT data there, and use AWS Lambda
A

Send that data into an IoT topic and define a rule action
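
A sketch of that rule action — forwarding everything published on an IoT topic into a Firehose delivery stream. The topic, rule, role, and stream names are all placeholders:

```python
import boto3

iot = boto3.client("iot")

iot.create_topic_rule(
    ruleName="iot_to_firehose",  # placeholder
    topicRulePayload={
        "sql": "SELECT * FROM 'devices/telemetry'",  # placeholder IoT topic
        "actions": [{
            "firehose": {
                "roleArn": "arn:aws:iam::123456789012:role/iot-firehose",  # placeholder
                "deliveryStreamName": "my-delivery-stream",                # placeholder
            }
        }],
        "ruleDisabled": False,
    },
)
```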

36
Q

Which protocol is not supported by the IoT Device Gateway?

  • MQTT
  • Websockets
  • HTTP 1.1
  • FTP
A

FTP

37
Q

You would like to control the target temperature of your room using an IoT thing thermostat. How can you change its state for target temperature even in the case it’s temporarily offline?

  • Send a message to the IoT broker every 10 seconds until it is acknowledged by the IoT thing
  • Use a rule action that triggers when the device comes back online
  • Change the state of the device shadow
  • Change its metadata in the thing registry
A

Change the state of the device shadow

That’s precisely the purpose of the device shadow, which gets synchronized with the device when it comes back online.
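
Changing the shadow’s desired state might look like this with boto3 (the thing name and temperature field are illustrative):

```python
import json
import boto3

iot_data = boto3.client("iot-data")

# Write the desired target temperature into the device shadow; the device
# synchronizes with the shadow when it comes back online.
iot_data.update_thing_shadow(
    thingName="room-thermostat",  # placeholder
    payload=json.dumps({"state": {"desired": {"targetTemperature": 22}}}),
)
```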

38
Q

You are looking to continuously replicate a MySQL database that’s on premises to Aurora. Which service will allow you to do so securely?

  • AWS Direct Connect
  • Database Migration Service
  • AWS Lambda
A

Database Migration Service (DMS is fully secure)

39
Q

You have setup Direct Connect on one location to ensure your traffic into AWS is going over a private network. You would like to setup a failover connection, that must be as reliable and as redundant as possible, as you cannot afford to be down for too long. What backup connection do you recommend?

  • Another Direct Connect Setup
  • Site to Site VPN
  • Client Side VPN
  • Snowball Connection
A

Site-to-Site VPN (although this is not as private as another Direct Connect setup, it is definitely more reliable, as it leverages the public web. It is the correct answer here)

40
Q

You would like to transfer data into AWS in less than two days from now. What should you use?

  • Setup Direct Connect
  • Use Public Internet
  • Use AWS Snowball
  • Use AWS Snowmobile
A

Use the Public Internet

41
Q

If Kinesis Firehose experiences data delivery issues to S3, it will retry delivery to S3 for a period of ________.

A

24 hours

42
Q

Which service does Kinesis Firehose not load streaming data into?

  • Amazon S3
  • Amazon Redshift
  • Amazon Elasticsearch Service
  • DynamoDB
  • Splunk
A

DynamoDB

43
Q

Regarding SQS, which of the following are true? (Choose 3)

  • A queue can only be created in limited regions, and you should check the SQS website to see which are supported.
  • Messages can be retained in queues for up to 14 days.
  • A queue can be created in any region.
  • Messages can be retained in queues for up to 7 days.
  • Messages can be sent and read simultaneously.
A
  • Messages can be retained in queues for up to 14 days.
  • A queue can be created in any region.
  • Messages can be sent and read simultaneously.
44
Q

For which of the following AWS services can you not create a rule action in AWS IoT? (Choose 2)

  • CloudWatch
  • Aurora
  • Kinesis Firehose
  • Redshift
  • Kinesis Streams
  • DynamoDB
A

Aurora

Redshift

45
Q

For an unknown reason, data delivery from Kinesis Firehose to your Redshift cluster has failed. Kinesis Firehose retries the data delivery every 5 minutes for a maximum period of 60 minutes; however, none of the retries deliver the data to Redshift. Kinesis Firehose skips the files and moves on to the next batch of files in S3. How can you ensure that the undelivered data is eventually loaded into Redshift?

  • Skipped files are delivered to your S3 bucket as a manifest file in an errors folder. Run the COPY command manually to load the skipped files after you have determined why they failed to load.
  • Check the STL_LOAD_ERRORS table in Redshift, find the files that failed to load, and manually load the data in those files using the COPY command.
  • Create a Lambda function that automatically loads these files into Redshift by reading the manifest after the retries have completed and running the COPY command.
  • Check CloudWatch Logs to determine which files in S3 were skipped by Kinesis Firehose, fix the files, and manually load them into Redshift.
A

Skipped files are delivered to your S3 bucket as a manifest file in an errors folder. Run the COPY command manually to load the skipped files after you have determined why they failed to load.

(Amazon Kinesis Firehose retries data delivery every 5 minutes for up to a maximum period of 60 minutes. After 60 minutes, Amazon Kinesis Firehose skips the current batch of S3 objects that are ready for COPY and moves on to the next batch. The information about the skipped objects is delivered to your S3 bucket as a manifest file in the errors folder, which you can use for manual backfill. For information about how to COPY data manually with manifest files, see Using a Manifest to Specify Data Files.)
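
One way to run that manual COPY is through the Redshift Data API; the table, manifest path, and role below are placeholders, and MANIFEST tells COPY to load exactly the files listed rather than a key prefix:

```python
import boto3

redshift_data = boto3.client("redshift-data")

redshift_data.execute_statement(
    ClusterIdentifier="my-cluster",  # placeholder
    Database="warehouse",            # placeholder
    DbUser="etl_user",               # placeholder
    Sql=(
        "COPY my_table "
        "FROM 's3://my-bucket/errors/manifest' "  # manifest written by Firehose
        "IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy' "
        "MANIFEST;"
    ),
)
```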

46
Q

Which of the following AWS IoT components transforms messages and routes them to different AWS services?

  • Device Shadow
  • Device Gateway
  • Rules Engine
  • Rule Actions
A

Rules Engine

47
Q

Your company is launching an IoT device that will send data to AWS. All the data generated by the millions of devices your company is going to sell will be stored in DynamoDB for use by the Engineering team. Each customer’s data, however, will only be stored in DynamoDB for 30 days. A mobile application will be used to control the IoT device, and easy user sign-up and sign-in to the mobile application are requirements. The engineering team is designing the application to scale to millions of users. Their preference is to not have to worry about building, securing, and scaling authentication for the mobile application. They also want to use their own identity provider. Which option would be the best choice for their mobile application?

  • Use LDAP.
  • Use an Amazon Cognito identity pool.
  • Use a SAML identity provider.
  • Since everyone uses Facebook, Amazon, and Google, keep it simple and use all three.
A

Use an Amazon Cognito identity pool.
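
A sketch of creating such an identity pool, wired to the team’s own (developer) identity provider — every name here is a placeholder:

```python
import boto3

cognito = boto3.client("cognito-identity")

pool = cognito.create_identity_pool(
    IdentityPoolName="iot_mobile_app",        # placeholder
    AllowUnauthenticatedIdentities=False,
    DeveloperProviderName="login.mycompany",  # placeholder custom provider
)
print(pool["IdentityPoolId"])
```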

48
Q

Your team has successfully migrated the corporate data warehouse to Redshift. So far, all the data coming into the ETL pipeline for the data warehouse has been from other corporate systems also running on AWS. However, after signing some new business deals with a 3rd party, they will be securely sending files directly to S3. The data in these files needs to be ingested into Redshift. Members of your team are debating the most efficient and best automated way to introduce this change into the ETL pipeline. Which of the following options would you suggest? (Choose 2)

  • Use Data Pipeline.
  • Procure a new 3rd party tool that integrates with S3 and Redshift that provides powerful scheduling capabilities.
  • Work with the 3rd party’s IT team to install the Data Pipeline Task Runner package, then coordinate a VPN connection from their data center to AWS.
  • Use Lambda (AWS Redshift Database Loader).
  • Use the SWF service to write a custom workflow to process the incoming files from the 3rd party.
  • Run a cron job on a t2.micro instance that will execute Linux shell scripts.
A
Use Data Pipeline.
Use Lambda (AWS Redshift Database Loader).

You can use Data Pipeline (with a RedshiftCopyActivity, S3 and Redshift data nodes, and a schedule) or Lambda (using the AWS Redshift Database Loader). Further information: https://aws.amazon.com/blogs/big-data/a-zero-administration-amazon-redshift-database-loader/

49
Q

Kinesis Firehose buffers incoming data before delivering the data to your S3 bucket. What are the buffer size ranges?

  • 8 MB to 64 MB
  • 2 MB to 128 MB
  • 1 MB to 128 MB
  • 4 MB to 256 MB
A

1 MB to 128 MB

Each delivery stream stores data records for up to 24 hours in case the delivery destination is unavailable. The PutRecordBatch() operation can take up to 500 records per call or 4 MB per call, whichever is smaller. Buffer size ranges from 1 MB to 128 MB.
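
The buffering limits are set per destination when the delivery stream is created; a minimal sketch with placeholder ARNs, using the 1–128 MB range for S3:

```python
import boto3

firehose = boto3.client("firehose")

firehose.create_delivery_stream(
    DeliveryStreamName="my-delivery-stream",  # placeholder
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-s3",  # placeholder
        "BucketARN": "arn:aws:s3:::my-bucket",                    # placeholder
        "BufferingHints": {
            "SizeInMBs": 128,  # 1-128 MB for the S3 destination
            "IntervalInSeconds": 300,
        },
    },
)
```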

50
Q

True or False: Data Pipeline does not integrate with on-premise servers.

A

False

AWS provides you with a Task Runner package that you install on your on-premises hosts. Once installed, the package polls Data Pipeline for work to perform. If it detects that an activity needs to run on your on-premises host (based on the schedule in Data Pipeline), the Task Runner will issue the appropriate command to run the activity, which can be running a stored procedure, a database dump, or another database activity. Further information: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-how-remote-taskrunner-client.html

51
Q

What are the main uses of Kinesis Data Streams? (Choose 2)

  • They can accept data as soon as it has been produced, without the need for batching
  • They can carry out real-time reporting and analysis of streamed data
  • They can provide long term storage of data
  • They can undertake the loading of streamed data directly into data stores
A
  • They can accept data as soon as it has been produced, without the need for batching
  • They can carry out real-time reporting and analysis of streamed data
52
Q

Data delivery from your Kinesis Firehose delivery stream to the destination is falling behind. When this happens, you need to manually change the buffer size to catch up and ensure that the data is delivered to the destination.

  • False
  • True
A

False

In circumstances where data delivery to the destination is falling behind data ingestion into the delivery stream, Amazon Kinesis Firehose raises the buffer size automatically to catch up and make sure that all data is delivered to the destination.

53
Q

What does a partition key segregate and route?

A

Records to different shards of a stream

54
Q

Who specifies the partition key while adding data to an Amazon Kinesis data stream?

A

Data Producer

55
Q

What is a unique identifier for each record?

A

Sequence Number

56
Q

When does Amazon Kinesis assign sequence numbers?

A

When a Data Producer calls PutRecords

57
Q

What is the KPL?

A

The Kinesis Producer Library

58
Q

A web application emits multiple types of events to Amazon Kinesis Streams for operational reporting. Critical events must be captured immediately before processing can continue, but informational events do not need to delay processing. What is the most appropriate solution to record these different types of events?

  • Log all events using the Kinesis Producer Library.
  • Log critical events using the Kinesis Producer Library, and log informational events using the PutRecords API method.
  • Log critical events using the PutRecords API method, and log informational events using the Kinesis Producer Library.
  • Log all events using the PutRecords API method.
A

Log critical events using the PutRecords API method, and log informational events using the Kinesis Producer Library.

(The core of this question is how to send event messages to Kinesis synchronously vs. asynchronously. The critical events must be sent synchronously, and the informational events can be sent asynchronously. The Kinesis Producer Library (KPL) implements an asynchronous send function, so it can be used for the informational messages. PutRecords is a synchronous send function, so it must be used for the critical events.)
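
For the critical-events half of that answer, a synchronous batch write might look like this (stream name and payloads are placeholders); the call blocks until Kinesis acknowledges the batch, unlike the KPL’s buffered asynchronous sends:

```python
import boto3

kinesis = boto3.client("kinesis")

# Synchronous batch write: the response comes back only after Kinesis has
# accepted (or rejected) each record, so processing can safely continue.
resp = kinesis.put_records(
    StreamName="events-stream",  # placeholder
    Records=[
        {"Data": b'{"event": "payment_failed"}', "PartitionKey": "user-1"},
        {"Data": b'{"event": "fraud_alert"}', "PartitionKey": "user-2"},
    ],
)
# Check for per-record failures before moving on.
assert resp["FailedRecordCount"] == 0
```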

59
Q

A mobile application collects data that must be stored in multiple Availability Zones within five minutes of being captured in the app.
What architecture securely meets these requirements?

  • The mobile app should write to an S3 bucket that allows anonymous PutObject calls.
  • The mobile app should authenticate with an Amazon Cognito identity that is authorized to write to an Amazon Kinesis Firehose with an Amazon S3 destination.
  • The mobile app should authenticate with an embedded IAM access key that is authorized to write to an Amazon Kinesis Firehose with an Amazon S3 destination.
  • The mobile app should call a REST-based service that stores data on Amazon EBS. Deploy the service on multiple EC2 instances across two Availability Zones.
A

The mobile app should authenticate with an Amazon Cognito identity that is authorized to write to an Amazon Kinesis Firehose with an Amazon S3 destination.

(It is essential when writing mobile applications that you consider the security of both how the application authenticates and how it stores credentials. Option A uses an anonymous Put, which may allow other apps to write counterfeit data. Option B is the right answer, because using Amazon Cognito gives you the ability to securely authenticate pools of users on any type of device at scale. Option C would put credentials directly into the application, which is strongly discouraged because applications can be decompiled, which can compromise the keys. Option D does not meet the availability requirements: although the EC2 instances are running in different Availability Zones, the EBS volumes attached to each instance only store data in a single Availability Zone.)

60
Q

A data engineer needs to collect data from multiple Amazon Redshift clusters within a business and consolidate the data into a single central data warehouse. Data must be encrypted at all times while at rest or in flight.
What is the most scalable way to build this data collection process?

  • Run an ETL process that connects to the source clusters using SSL to issue a SELECT query for new data, and then write to the target data warehouse using an INSERT command over another SSL secured connection.
  • Use AWS KMS data key to run an UNLOAD ENCRYPTED command that stores the data in an unencrypted S3 bucket; run a COPY command to move the data into the target cluster.
  • Run an UNLOAD command that stores the data in an S3 bucket encrypted with an AWS KMS data key; run a COPY command to move the data into the target cluster.
  • Connect to the source cluster over an SSL client connection, and write data records to Amazon Kinesis Firehose to load into your target data warehouse.
A

Use AWS KMS data key to run an UNLOAD ENCRYPTED command that stores the data in an unencrypted S3 bucket; run a COPY command to move the data into the target cluster.

(The most scalable solutions are the UNLOAD/COPY solutions because they work in parallel, which eliminates A and D as answers. Option C is incorrect because the data would not be encrypted in flight, and you cannot encrypt an entire bucket with a KMS key. Option B meets the encryption requirements: the UNLOAD ENCRYPTED command automatically stores the data encrypted using client-side encryption and uses HTTPS to encrypt the data during the transfer to S3.)
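
A sketch of the UNLOAD/COPY pair via the Redshift Data API. All identifiers are placeholders; this variant uses KMS_KEY_ID with ENCRYPTED (server-side SSE-KMS), while the MASTER_SYMMETRIC_KEY variant described in the explanation gives client-side encryption — check the Redshift docs for the exact option you need:

```python
import boto3

redshift_data = boto3.client("redshift-data")

# On a source cluster: unload the data encrypted under a KMS key.
redshift_data.execute_statement(
    ClusterIdentifier="source-cluster",  # placeholder
    Database="sales",                    # placeholder
    DbUser="etl_user",                   # placeholder
    Sql=(
        "UNLOAD ('SELECT * FROM orders') "
        "TO 's3://transfer-bucket/orders/' "
        "IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-unload' "
        "KMS_KEY_ID '1234abcd-12ab-34cd-56ef-1234567890ab' "  # placeholder key
        "ENCRYPTED;"
    ),
)

# On the central cluster: COPY the files in (SSE-KMS objects are
# decrypted transparently, given permission to use the key).
redshift_data.execute_statement(
    ClusterIdentifier="central-cluster",  # placeholder
    Database="warehouse",                 # placeholder
    DbUser="etl_user",                    # placeholder
    Sql=(
        "COPY orders FROM 's3://transfer-bucket/orders/' "
        "IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy';"
    ),
)
```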