Kinesis Flashcards
Kinesis
a managed alternative to Apache Kafka
a big data streaming tool that lets you collect application logs, metrics, IoT data, clickstreams, basically anything that is real-time big data.
overall, Kinesis is associated with real-time big data. (exam)
compatible with
many stream-processing frameworks, such as Apache Spark, Apache NiFi, etc.
These frameworks let you perform computations in real time on data that arrives through a stream.
data replication
the data is automatically replicated to 3 Availability Zones.
three Kinesis sub-products
- Kinesis Streams: people often just call it Kinesis; it lets you ingest streams at scale with low latency. (exam)
- Kinesis Analytics: perform real-time analytics on streams using SQL: filters, computations, aggregations in real time
- Kinesis Firehose: load your stream into other parts of AWS such as S3, Redshift, ElasticSearch, and so on.
How does data get into the streams?
our clickstreams, IoT devices, metrics, and logs produce data directly into our Kinesis streams.
Kinesis Analytics
Once the data is in Kinesis Streams, you typically want to process it: computations, metrics, monitoring, alerting, whatever you need. For this you perform computations in real time with Kinesis Analytics.
Kinesis Firehose
once these computations are done, it’s good to store the results somewhere: S3, a database, Redshift, et cetera.
Firehose is how you put the data in S3 or in Redshift, for example.
Streams are divided into
ordered Shards or Partitions.
Shard
think of it as one little queue.
our producers produce
to a Kinesis stream; say this one has three shards.
The data goes into one of the shards, and the consumers consume from the shards as well.
if we wanna scale up our stream
we just add shards
if we want higher throughput, we increase the number of shards (see the sketch below).
the data does not stay in a shard forever
By default it is retained for one day; each shard can be configured to keep your data for up to 7 days.
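A minimal boto3 sketch of these operations (the stream name and shard counts are made up; the calls are the standard Kinesis API):

```python
import boto3

kinesis = boto3.client("kinesis")

# Create a stream with 3 shards.
kinesis.create_stream(StreamName="my-stream", ShardCount=3)

# Scale up: more shards = more throughput.
kinesis.update_shard_count(
    StreamName="my-stream",
    TargetShardCount=6,
    ScalingType="UNIFORM_SCALING",
)

# Extend retention from the default 24 hours to the 7-day maximum.
kinesis.increase_stream_retention_period(
    StreamName="my-stream",
    RetentionPeriodHours=168,
)
```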
why would you have such short data retention?
because Kinesis is just a massive highway, a massive pipe: you want to process your data, do something with it, and put it somewhere else as soon as possible.
difference with SQS
Kinesis is also awesome because it allows you to reprocess and replay data. With SQS, once the data was consumed, it was gone.
With Kinesis the data is still there; it only expires after the retention period.
multiple consumers
You’re also able to have multiple applications consume
the same stream, sort of like an SNS mindset.
We just need one stream of data,
and many applications, many consumers, can consume that same stream.
Kinesis is not a database
Once the data is inserted into Kinesis you cannot delete it.
This is called immutability: you append data over time, like a log, and then process it using consumers.
The data stays in Kinesis for one to seven days, during which you do something with it.
shards size
streams are made of many shards.
A shard represents one megabyte per second or 1,000 messages per second on the write side. So a producer can write up to 1,000 messages per second
or one megabyte per second per shard.
On the read side you get two megabytes per second
of throughput per shard
you’re going to pay for
how many shards you provisioned. You can have as many shards as you want, but if you over-provision your shards and don’t use them to full capacity, you overpay. Similarly, if you have more throughput than your shards can handle, you’re going to have throughput issues.
batch
You have the ability to batch messages and calls.
And this allows you to efficiently push messages into Kinesis.
to reduce costs and increase throughput.
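As a sketch, batching with the PutRecords API in boto3 might look like this (stream name, keys, and payloads are illustrative):

```python
import json
import boto3

kinesis = boto3.client("kinesis")

# Send up to 500 records per PutRecords call instead of
# one HTTP request per message.
records = [
    {"Data": json.dumps({"event": i}).encode(), "PartitionKey": f"user-{i % 10}"}
    for i in range(100)
]
response = kinesis.put_records(StreamName="my-stream", Records=records)

# PutRecords is not all-or-nothing: inspect per-record failures.
if response["FailedRecordCount"] > 0:
    failed = [r for r in response["Records"] if "ErrorCode" in r]
    print(f"{len(failed)} records failed and should be retried")
```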
the records will be ordered
per shard
resharding
splitting a shard to increase capacity (adds a shard)
merging
combining two adjacent shards to decrease capacity (deletes a shard)
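A hedged sketch of both resharding operations with boto3 (shard IDs and the hash key are illustrative; in practice you would read the real ranges from list_shards):

```python
import boto3

kinesis = boto3.client("kinesis")

# Split a hot shard in two. NewStartingHashKey must fall inside
# the shard's own hash key range; this value assumes the shard
# covers the full 128-bit space.
kinesis.split_shard(
    StreamName="my-stream",
    ShardToSplit="shardId-000000000000",
    NewStartingHashKey=str(2**127),
)

# Merge two adjacent, under-used shards back into one.
kinesis.merge_shards(
    StreamName="my-stream",
    ShardToMerge="shardId-000000000001",
    AdjacentShardToMerge="shardId-000000000002",
)
```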
producers sending data
you need to send the data along with a partition key (the message key),
and the message key is whatever string you want.
And this key will get hashed to determine the shard ID.
So the key is basically a way for you to route the data
to a specific shard.
the same key always goes to the same partition
if you want all your data in order for the same key, provide that key with every data point, and the records will be in order for that key.
When your data is produced, the messages know which shard to go to because of the message key.
sequence number
when messages are sent to a shard, they get a sequence number, and that sequence number is always increasing
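For example, a single PutRecord call in boto3 (names and payload invented) shows both ideas: the partition key picks the shard, and the response carries the per-shard sequence number:

```python
import json
import boto3

kinesis = boto3.client("kinesis")

response = kinesis.put_record(
    StreamName="my-stream",
    Data=json.dumps({"action": "login"}).encode(),
    PartitionKey="user-1234",  # same key -> same shard -> ordered
)
print(response["ShardId"])         # e.g. shardId-000000000002
print(response["SequenceNumber"])  # always increasing within the shard
```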
if you need to choose a partition key
you need to choose one that is going to be
highly distributed (exam)
which prevents hot partitions.
if your key isn’t well distributed, all your data will go through the same shard, and that one shard will be overwhelmed.
if our application has one million users,
user ID is a great key: realistically, users act at different times, and we still get ordering per user ID, which is our message key. So user ID is a good one.
Very distributed, very active and useful from a business perspective.
But if you use country ID as the key and it turns out that 90% of your users are in one country, say
the United States, then it’s not a good key, because
most of your data will go to one shard.
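To see why, here is a small illustration (not the exact Kinesis internals, though Kinesis does MD5-hash the partition key into a 128-bit space that the shards divide up): a user-ID key spreads records across shards, while a country key concentrates them on one.

```python
import hashlib
from collections import Counter

def shard_for(key: str, num_shards: int) -> int:
    # Approximate Kinesis routing: MD5 the key into a 128-bit
    # integer, then map it onto evenly sized shard ranges.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h * num_shards // 2**128

# A well-distributed key (user ID) balances the 5 shards...
print(Counter(shard_for(f"user-{i}", 5) for i in range(100_000)))

# ...while a skewed key (country) creates a hot partition.
countries = ["US"] * 90 + ["CA"] * 5 + ["FR"] * 5
print(Counter(shard_for(c, 5) for c in countries))
```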
ProvisionedThroughputExceeded
this exception occurs when you go over the limits,
that is, when we send more data than was provisioned:
we exceed the number of megabytes per second
or transactions per second.
to produce messages you use
the CLI, but you can also use the SDK or producer libraries
from various frameworks.
ProvisionedThroughputExceeded solutions
For this you can:
- use retries with exponential backoff (sketched below)
- increase the number of shards
- ensure that your partition key is a good, well-distributed one
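A sketch of the retry-with-backoff approach in boto3 (stream name, retry count, and delays are arbitrary choices):

```python
import time
import random
import boto3
from botocore.exceptions import ClientError

kinesis = boto3.client("kinesis")

def put_with_backoff(data: bytes, key: str, max_retries: int = 5):
    for attempt in range(max_retries):
        try:
            return kinesis.put_record(
                StreamName="my-stream", Data=data, PartitionKey=key
            )
        except ClientError as e:
            if e.response["Error"]["Code"] != "ProvisionedThroughputExceededException":
                raise
            # Exponential backoff with jitter: ~1s, 2s, 4s, ...
            time.sleep(2 ** attempt + random.random())
    raise RuntimeError("still throttled after retries")
```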
you can use a normal consumer using
the CLI, the SDK, or the Kinesis Client Library (available in Java, Node, Python, Ruby, or .NET).
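A bare-bones SDK consumer for one shard might look like this sketch (stream and shard names invented); the KCL below handles all of this bookkeeping for you:

```python
import boto3

kinesis = boto3.client("kinesis")

# Start reading from the oldest record still retained
# (TRIM_HORIZON); LATEST would read only new records.
it = kinesis.get_shard_iterator(
    StreamName="my-stream",
    ShardId="shardId-000000000000",
    ShardIteratorType="TRIM_HORIZON",
)["ShardIterator"]

while it is not None:
    resp = kinesis.get_records(ShardIterator=it, Limit=100)
    for record in resp["Records"]:
        print(record["SequenceNumber"], record["Data"])
    if resp["MillisBehindLatest"] == 0:
        break  # caught up; a real consumer would keep polling
    it = resp.get("NextShardIterator")
```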
Kinesis Client Library
it also uses DynamoDB to checkpoint the offsets
and to track the other workers and share the work amongst shards.
There will be a DynamoDB table, and the Kinesis apps
that use the KCL checkpoint their progress in Amazon DynamoDB and synchronize their work between them to consume messages from different shards.
Kinesis Security
- we can control access and authorization to Kinesis using IAM policies
- encryption in flight using HTTPS endpoints
- encryption at rest using KMS (see the sketch after this list)
- it is also possible to encrypt and decrypt the data client side, but it’s much harder to implement: you need to write your own code
- VPC Endpoints are available for Kinesis, so you can access it privately within a VPC
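For instance, encryption at rest can be switched on with one call; a sketch (the stream name is made up, and alias/aws/kinesis is the AWS-managed key):

```python
import boto3

kinesis = boto3.client("kinesis")

# Enable server-side encryption at rest with KMS.
kinesis.start_stream_encryption(
    StreamName="my-stream",
    EncryptionType="KMS",
    KeyId="alias/aws/kinesis",
)
```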
Kinesis data firehose
a fully managed service.
There is no administration needed, it scales automatically,
and it is fully serverless:
we’re not going to provision anything in advance.
It’s near real time (Kinesis Streams is real time): 60 seconds minimum latency for non-full batches.
Kinesis data firehose used for
to load data into Redshift, Amazon S3, ElasticSearch and Splunk. (exam)
Kinesis data firehose minimum data
it writes a minimum of about 32 MB of data at a time when loading into these stores.
Kinesis data firehose supports
many formats, conversions, transformations, and compression.
handy with CSV, JSON
Kinesis data firehose you pay for
the amount of data going through Firehose.
You don’t pay for provisioning Firehose, but you do for Kinesis Data Streams.
Kinesis data firehose data sources
could be the Kinesis Producer Library, a Kinesis agent,
or a Kinesis data stream.
Both the agent and a Kinesis data stream can send data directly into Firehose.
can even be CloudWatch Logs or CloudWatch Events.
And then you can do some transformations using a Lambda function.
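Writing directly to Firehose from the SDK is a one-liner; a sketch with an invented delivery stream name:

```python
import json
import boto3

firehose = boto3.client("firehose")

# Firehose buffers this record and delivers it to the configured
# destination (e.g. S3) in near real time. The trailing newline
# keeps records separated in the delivered S3 objects.
firehose.put_record(
    DeliveryStreamName="my-delivery-stream",
    Record={"Data": json.dumps({"event": "click"}).encode() + b"\n"},
)
```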
what is the difference between kinesis data streams
and Firehose?
Streams =
- you write custom code: you need to write your own producer and, most of the time, your own consumer
- it’s real time, with about 200 ms latency
- you must manage scaling yourself, via shard splitting and shard merging,
which means doing capacity planning over time
- you can store data for one to seven days before it expires; if you need a place to just store data for three days,
Kinesis Data Streams is a great way to do it
- thanks to that storage, you get replay capability
- it supports multiple consumers
Firehose
- fully managed, with no capacity to provision
- you send data to S3, Splunk, Redshift, and ElasticSearch
- serverless data transformations with Lambda
- near real time
- automated scaling
- there is no data storage, so you cannot replay from Firehose
Kinesis Analytics
it can take data from Kinesis Data Streams and Kinesis Data Firehose and perform queries on it.
The output of these queries can be analyzed by your analytics tools or fed to real-time outputs.
performs a real time analytics using SQL.
auto scaling,
managed
no servers to provision,
continuous: it’s going to be real time
from these queries you can create new streams, which can be consumed again by consumers or by Kinesis Data Firehose
Kinesis Analytics you only pay for
the actual consumption rate of kinesis data analytics.
Data Ordering for Kinesis vs SQS FIFO
in Kinesis, if we have 5 shards and 100 truck IDs, more or less 20 trucks will be assigned to each shard (partition), based on the hashed ID of the objects we are processing.
We can therefore have only 5 parallel consumers.
In SQS FIFO there is only one queue, but we can create message groups with IDs: with 100 groups based on the object IDs, we can have up to 100 parallel consumers.
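A sketch of the SQS FIFO side (queue URL and IDs invented): each MessageGroupId is its own ordered lane, so 100 truck IDs allow up to 100 parallel consumers.

```python
import boto3

sqs = boto3.client("sqs")

sqs.send_message(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/trucks.fifo",
    MessageBody='{"truck": 42, "gps": [48.85, 2.35]}',
    MessageGroupId="truck-42",          # ordering scope: one truck
    MessageDeduplicationId="msg-0001",  # or enable content-based dedup
)
```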
SQS vs SNS vs Kinesis —– SQS
- the consumers pull data and the data is going to be deleted right after being consumed.
- you can have as many consumers as you want
- you don’t need to provision throughput, it will scale automatically for you.
- there is no ordering guarantee unless you use a FIFO queue, but FIFO queues have limited throughput
- there is an individual message delay capability: you can take a message and say “be consumed in 15 minutes” (see the sketch below)
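The delay capability in the last bullet looks like this in boto3 (queue URL invented; 900 seconds is the SQS per-message maximum):

```python
import boto3

sqs = boto3.client("sqs")

# This message stays invisible to consumers for 15 minutes.
sqs.send_message(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/my-queue",
    MessageBody="process me later",
    DelaySeconds=900,
)
```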
SQS vs SNS vs Kinesis —– SNS
- pub/sub so you push data to many subscribers.
- you can have up to 10 million subscribers per topic, and up to 100,000 topics.
- the data is not persisted, so that means that it’s lost if not delivered.
- you don’t need to provision the throughput in advance
- if you want to persist the data, deliver it to many SQS queues:
- you can use a fan-out architecture to integrate it with SQS
SQS vs SNS vs Kinesis —– Kinesis
- a pull model: like SQS we pull data, whereas SNS pushes data.
- We can have as many consumers as we want, but we can only have one consumer per shard.
- the possibility to replay data is available; we could reprocess a whole day of data
- meant for real-time big data, analytics, and ETL (exam): real-time ingestion of IoT data, for example;
any time you see “real-time big data”, think Kinesis
- There is ordering but it’s at the shard level.
- the data expires after X number of days
- there is some data retention but it’s temporary.
- you must provision your throughput in advance