Section 19: AWS Integration & Messaging: SQS, SNS & Kinesis: Kinesis Flashcards

1
Q

What does AWS Kinesis do?

A

Collect, process, and analyze real-time video and data streams

2
Q

What kind of real-time data is Kinesis well suited to ingest?

A

logs, metrics, website clickstreams, IoT telemetry data

3
Q

There are four types of Kinesis. What are they? (names only)

A
  • Kinesis Data Streams
  • Kinesis Data Firehose
  • Kinesis Data Analytics
  • Kinesis Video Streams
4
Q

Which Kinesis type is best to: capture, process, and store data streams?

Your options are: Kinesis Data Streams, Kinesis Data Firehose, Kinesis Data Analytics, Kinesis Video Streams.

A

Kinesis Data Streams

5
Q

Which Kinesis type is best to: load data streams into AWS data stores?

Your options are: Kinesis Data Streams, Kinesis Data Firehose, Kinesis Data Analytics, Kinesis Video Streams.

A

Kinesis Data Firehose

6
Q

Which Kinesis option is best suited to: analyze data streams with SQL or Apache Flink?

Your options are: Kinesis Data Streams, Kinesis Data Firehose, Kinesis Data Analytics, Kinesis Video Streams.

A

Kinesis Data Analytics

7
Q

Which Kinesis option is best to: capture, process, and store video streams?

Your options are: Kinesis Data Streams, Kinesis Data Firehose, Kinesis Data Analytics, Kinesis Video Streams.

A

Kinesis Video Streams.

8
Q

What is a shard, in terms of data?

A

A part of a dataset, when that dataset has been partitioned.

9
Q

Can applications, clients, the AWS SDK, the Kinesis Producer Library (KPL), and Kinesis Agents all be Kinesis Data Stream producers?

A

Yes, and multiple producer types can write to the same stream at the same time.

10
Q

Is a Kinesis Data Stream stream made up of shards?

A

Yes. Again, as it pertains to data, a shard is a part of a dataset.

11
Q

Can you scale up the number of shards in a Kinesis Data Stream?

A

Yes

12
Q
  1. What is the max size of a record that can be sent to a Kinesis Data Stream?
  2. What is the MB/sec throughput at which records can be sent to a Kinesis Data Stream shard?
  3. What is the equivalent messages-per-second throughput at which records can be sent to a Kinesis Data Stream shard?
A
  1. 1 MB.
  2. 1 MB/sec
  3. 1000 msg/sec
13
Q

When sending a record from a producer to a Kinesis stream-shard, what three things does the record consist of?

A
  1. A sequence number (unique per partition key within a shard)
  2. A partition key (used to group data by shard within a stream; the producer must specify it when putting records into the stream)
  3. A data blob (up to 1 MB)
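To make the three parts concrete, here is a minimal sketch (Python/boto3; the stream name and payload are made up for illustration). The producer supplies the partition key and data blob, and Kinesis assigns the sequence number:

```python
import json
import boto3

kinesis = boto3.client("kinesis")

response = kinesis.put_record(
    StreamName="my-stream",  # hypothetical stream name
    Data=json.dumps({"event": "click"}).encode(),  # the data blob (up to 1 MB)
    PartitionKey="user-42",  # must be specified by the producer
)

# Kinesis assigns the sequence number; the producer never sets it.
print(response["ShardId"], response["SequenceNumber"])
```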
14
Q

Can all of the following be consumers of Kinesis Data Stream records?
* lambda
* kinesis data firehose
* kinesis data analytics
* custom consumer (aws sdk) - Classic or Enhanced Fan Out
* Kinesis Client Library (KCL) - Library to simplify reading from a data stream
* applications running on EC2 (shown on one of the slides, but not another)

A

Yes

15
Q

What are the three things that are in each record being sent from a Kinesis Data Stream stream-shard to a Kinesis Data Stream consumer?

A
  • Partition key
  • Sequence number
  • data blob
16
Q

What are the two rates at which records can be sent from a Kinesis Data Stream stream-shard to a Kinesis Data Stream consumer?

A
  1. 2 MB/sec (shared) per shard - all consumers
    OR
  2. 2 MB/sec (enhanced) per shard per consumer

Note: these aren’t mutually exclusive per stream. Enhanced fan-out is enabled per registered consumer; consumers that don’t register share the classic 2 MB/sec per shard.

17
Q

About Kinesis Data Streams:
1. what is the retention
2. do you have the ability to reprocess (replay) data
3. can data be deleted once it’s been inserted into Kinesis?

A
  1. Between 1 day and 365 days, inclusive
  2. yes
  3. Nope. data inserted into Kinesis is immutable.
18
Q

About Kinesis Data Streams:
4. Does data that shares the same partition key go into the same shard? (This is confusing if you assumed a shard was just a partition; the partition key determines which shard, i.e. which partition, a record lands in.)

A

Yes; this is what gives you ordering (records that share a partition key are stored, and read back, in order within their shard).

19
Q
  1. What producers are available to a Kinesis Data Stream?
A
  1. AWS SDK;
  2. Kinesis Producer Library (KPL);
  3. Kinesis Agent
20
Q
  1. What are the consumers available to a Kinesis Data Stream?
A
  1. You can write your own consumer using the Kinesis Client Library (KCL) or the AWS SDK.
  2. Alternatively, you can use a managed consumer like AWS Lambda, Kinesis Data Firehose, or Kinesis Data Analytics.
21
Q

Kinesis Data Streams have two capacity modes, what are they?

A
  1. Provisioned mode
  2. On-demand mode
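A hedged sketch of choosing a mode at stream creation with boto3; the stream names are made up, and `StreamModeDetails` is how the mode is expressed in the CreateStream API:

```python
import boto3

kinesis = boto3.client("kinesis")

# Provisioned mode: you choose (and later manually scale) the shard count.
kinesis.create_stream(
    StreamName="provisioned-stream",  # hypothetical name
    ShardCount=2,
    StreamModeDetails={"StreamMode": "PROVISIONED"},
)

# On-demand mode: no shard count to manage; capacity scales with traffic.
kinesis.create_stream(
    StreamName="on-demand-stream",  # hypothetical name
    StreamModeDetails={"StreamMode": "ON_DEMAND"},
)
```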
22
Q

You need the following things, do you use Kinesis Data Stream Provisioned capacity mode, or Kinesis Data Stream On-Demand capacity mode?
* you need to choose the number of shards provisioned, and scale manually or using API
* you need each shard to have up to 1MB/s in (or 1000 records per second)
* you need each shard to get 2MB/s out (for a classic or enhanced fan-out consumer)
* you need to pay per shard provisioned per hour

A

Capacity mode: Provisioned

23
Q

You need the following things, do you use Kinesis Data Stream Provisioned capacity mode, or Kinesis Data Stream On-Demand capacity mode?
* you don’t want to provision or manage the capacity
* you’re perfectly happy with 4 MB/s in or 4000 records per second (the default capacity provisioned for this mode)
* you’re very happy for your stream to scale automatically based on the observed throughput peak during the last 30 days
* you’re happy to pay per stream per hour & data in/out per GB

A

Capacity mode: On-demand mode

24
Q

Let’s talk about Kinesis Data Stream Security:
1. how do you control access/authorization?
2. can you do encryption in flight? using what?
3. can you do encryption at rest? using what?

A
  1. Control access/authorization using IAM policies
  2. Encryption in flight using HTTPS endpoints
  3. Encryption at rest using KMS
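For encryption at rest specifically, a minimal boto3 sketch (the stream name is assumed) that enables server-side KMS encryption on an existing stream:

```python
import boto3

kinesis = boto3.client("kinesis")

# Enable server-side encryption at rest on an existing stream.
kinesis.start_stream_encryption(
    StreamName="my-stream",     # hypothetical stream name
    EncryptionType="KMS",
    KeyId="alias/aws/kinesis",  # the AWS-managed KMS key for Kinesis
)
```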
25
Q

Let’s talk about Kinesis Data Stream Security:
1. can you implement encryption/decryption of data on the client side?
2. are vpc endpoints available for kinesis to access within VPC?
3. how can you monitor api calls?

A
  1. You can implement encryption/decryption of data on the client side, but it is harder
  2. VPC endpoints are available for Kinesis to access within a VPC
  3. You can monitor API calls using CloudTrail
26
Q

What do Kinesis producers do?

A

put data records into data streams

27
Q

What are the Kinesis Data Stream producer options? (It’s possible these producers are not specific to Data Streams but also apply to the other Kinesis sub-services: Firehose, Video Streams, Analytics.)

A
  • AWS SDK (simple producer)
  • Kinesis Producer Library (KPL): C++, Java, batching, compression, retries
  • Kinesis Agent (monitors log files)
28
Q

Kinesis producer write throughput?

A

1MB/sec or 1000 records/sec per shard

29
Q

Kinesis producer APIs?

A
  • PutRecord API
  • PutRecords API - this one will help reduce costs and increase throughput
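A small sketch of the batched call (boto3; stream name and payloads are made up). Note PutRecords is not all-or-nothing, so the response’s FailedRecordCount matters:

```python
import json
import boto3

kinesis = boto3.client("kinesis")

# One call carries many records: fewer HTTP round trips, lower cost,
# and higher throughput than per-record PutRecord calls.
records = [
    {"Data": json.dumps({"n": i}).encode(), "PartitionKey": f"key-{i}"}
    for i in range(10)
]
response = kinesis.put_records(StreamName="my-stream", Records=records)

# PutRecords is not all-or-nothing: inspect FailedRecordCount and
# retry only the entries that failed.
print(response["FailedRecordCount"])
```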
30
Q

T/F

A hash function is used to map record partition keys (from the producer) to the shard handling that particular set of partition keys.

A

True
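A toy illustration of the idea (Kinesis performs this server-side; the two shard ranges here are assumptions for the demo): MD5-hash the partition key into the 128-bit hash-key space, then find the shard whose range contains the result.

```python
import hashlib

def shard_for_key(partition_key: str, shard_ranges: list[tuple[int, int]]) -> int:
    """MD5-hash the key into the 128-bit hash-key space, then return
    the index of the shard whose range contains that value."""
    h = int(hashlib.md5(partition_key.encode()).hexdigest(), 16)
    for i, (lo, hi) in enumerate(shard_ranges):
        if lo <= h <= hi:
            return i
    raise ValueError("shard ranges must cover the full hash-key space")

# Two shards splitting the 128-bit space in half (demo ranges).
MAX_HASH = 2**128 - 1
ranges = [(0, MAX_HASH // 2), (MAX_HASH // 2 + 1, MAX_HASH)]
print(shard_for_key("truck_1", ranges))  # same key -> same shard, always
```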

31
Q

What is the SubscribeToShard API?

A

The SubscribeToShard API is a high-performance streaming API that pushes data from shards to consumers over a persistent connection without a request cycle from the client. The SubscribeToShard API uses the HTTP/2 protocol to deliver data to registered consumers whenever new data arrives on the shard, typically within 70 milliseconds, offering approximately 65% faster delivery compared to the GetRecords API. The consumers will enjoy fast delivery even when multiple registered consumers are reading from the same shard.

32
Q

Q: What is enhanced fan-out?
Q: When should I use enhanced fan-out?

A
  • Enhanced fan-out is an optional feature for Kinesis Data Streams consumers that provides logical 2 MB/second throughput pipes between consumers and shards. This allows you to scale the number of consumers reading from a data stream in parallel, while maintaining high performance.
  • You should use enhanced fan-out if you have, or expect to have, multiple consumers retrieving data from a stream in parallel, or if you have at least one consumer that requires the use of the SubscribeToShard API to provide sub-200 millisecond data delivery speeds between producers and consumers.
33
Q
  1. Why might you be seeing “ProvisionedThroughputExceeded”?
  2. What can you do about it (option 1)?
  3. What can you do about it (option 2)?
  4. What can you do about it (option 3)?
A
  1. You might see it if your producer traffic grew to more than double the previous peak within a 15-minute window.
  2. Add more shards, then retry the throttled requests.
  3. Retry with exponential backoff.
  4. Use a highly distributed partition key.
34
Q

What is high distribution, as it relates to keys?

A

Key distribution
* An indexed storage system will use keys to determine where it should store, or look for, associated data. One strategy to optimize access to data is to spread that data out over a number of storage locations. As a result, because each location should have less data, it will be faster to find something within a given collection. Consider having one million records: searching for a given record will, in the worst case, require the system to look at one million different elements to find what was asked for. On the other hand, if those one million records were divided into one-thousand groups, being able to limit your search to one group would reduce that worst-case to only one thousand elements.

  • The relative probability that a key will direct a search to a given location is known as the “key distribution”. An even distribution means that, for any given key, the probability of being directed to any location is as likely as another and finding data will therefore be more efficient.
35
Q

What does “retries with exponential backoff” mean?

A

Retries with exponential backoff is a technique that retries an operation with an exponentially increasing wait time between attempts (the exponential backoff), until the operation succeeds or a maximum retry count is reached.

https://learn.microsoft.com/en-us/dotnet/architecture/microservices/implement-resilient-applications/implement-retries-exponential-backoff (everything else is from aws and steph’s course)
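A minimal, generic sketch of the technique in Python; the names and delay values here are arbitrary:

```python
import random
import time

def with_backoff(operation, max_retries=5, base_delay=0.1):
    """Retry `operation` with exponentially growing waits, plus jitter,
    until it succeeds or max_retries attempts have been made."""
    for attempt in range(max_retries):
        try:
            return operation()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            # Waits 0.1 s, 0.2 s, 0.4 s, ... with random jitter so many
            # throttled producers don't all retry in lockstep.
            time.sleep(base_delay * 2**attempt + random.uniform(0, 0.05))
```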

36
Q

What are some Kinesis Data Stream Consumers?

A
  • lambda
  • kinesis data firehose
  • kinesis data analytics
  • custom consumer (aws sdk) - Classic or Enhanced Fan Out
  • Kinesis Client Library (KCL) - Library to simplify reading from a data stream
  • applications running on EC2 (shown on one of the slides, but not another)
37
Q

What do Kinesis Data Stream Consumers do?

A

get data records from data streams and process them

38
Q

There are two Kinesis custom consumer options. What are they? (names only)

A

Shared (Classic) Fan-out and Enhanced Fan-out consumer

39
Q

What is the main difference between Kinesis Custom Consumers: Shared (Classic) Fan-Out Consumer and Enhanced Fan-out Consumer?

A
  • With the Shared (Classic) Fan-out Consumer you get 2 MB/sec per shard across all consumers. That means if four consumers are getting records from Shard 1, each consumer might be taking up 0.5 MB/sec of throughput (it was not mentioned whether throughput is distributed evenly; those numbers are made up for illustration).
  • With Enhanced Fan-out Consumers you get 2 MB/sec per consumer per shard. That means if four consumers are reading from Shard 1, each consumer can use 2 MB/sec of throughput, and together the four consumers can use 8 MB/sec.
40
Q

What is the API consumers use to obtain data records from Kinesis Data Stream Shards?

A

For the Shared (Classic) Fan-out Consumer the API is GetRecords(), but for the Enhanced Fan-out Consumer it’s SubscribeToShard().

41
Q

Does this describe Kinesis Shared Classic Fan-Out Consumers or enhanced fan-out Consumers?
* allows only (or is perhaps best suited to) a low number of consuming applications
* max 5 GetRecords API calls/sec
* latency ~200 ms
* minimizes costs
* consumers poll data from Kinesis using the GetRecords API call
* returns up to 10 MB (then throttles for 5 seconds) or up to 10,000 records

A

Shared (Classic) Fan-Out Consumer

42
Q

Is Kinesis Consumer Shared (Classic) Fan-out Consumer the kind where consumers poll data from Kinesis using GetRecords API, or the kind where Kinesis pushes data to consumers over HTTP/2 using SubscribeToShard API?

A

It’s the kind where consumers poll data from Kinesis using GetRecords API.

43
Q

Is Kinesis Consumer Enhanced Fan-out Consumer the kind where consumers poll data from Kinesis using GetRecords API, or the kind where Kinesis pushes data to consumers over HTTP/2 using SubscribeToShard API?

A

The kind where kinesis pushes data to consumers over HTTP/2 using SubscribeToShard API.

44
Q

T/F does this describe enhanced fan out consumer?

  • multiple consuming applications for the same stream
  • 2MB/sec per consumer per shard
  • Latency ~70ms
  • Higher costs ($$$)
  • Kinesis pushes data to consumers over HTTP/2 SubscribeToShard API
  • Soft limit of 5 consumer applications (KCL) per data stream (default)
A

T
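A hedged boto3 sketch of the enhanced fan-out flow described above; the stream ARN, consumer name, and shard ID are made up, and in practice you’d wait for the consumer to become ACTIVE and re-subscribe every 5 minutes (subscriptions expire):

```python
import boto3

kinesis = boto3.client("kinesis")

# Register this application as an enhanced fan-out consumer.
consumer = kinesis.register_stream_consumer(
    StreamARN="arn:aws:kinesis:us-east-1:123456789012:stream/my-stream",
    ConsumerName="my-app",
)
# (In practice, wait here until the consumer's status is ACTIVE.)

# Kinesis now pushes records over HTTP/2; boto3 surfaces the push
# as an iterable event stream.
response = kinesis.subscribe_to_shard(
    ConsumerARN=consumer["Consumer"]["ConsumerARN"],
    ShardId="shardId-000000000000",
    StartingPosition={"Type": "LATEST"},
)
for event in response["EventStream"]:
    print(event["SubscribeToShardEvent"]["Records"])
```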

45
Q

Kinesis Consumers - AWS Lambda
1. Does it support classic and enhanced fan-out consumers?
2. What happens if you’re using kinesis data streams with aws lambda and an error occurs?

A
  1. yes
  2. Lambda will retry until it succeeds or the data expires.
46
Q

How many batches per shard can kinesis data streams + aws lambda handle simultaneously?

A

10

47
Q

When using Kinesis Consumers with AWS Lambda:
1. are records read in batches?
2. can you configure batch size and batch window?

A
  1. Yes
  2. Yes
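These knobs live on the Lambda event source mapping. A sketch (boto3; the ARNs and function name are hypothetical) showing batch size, batch window, and the per-shard parallelism from card 46:

```python
import boto3

lambda_client = boto3.client("lambda")

lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:kinesis:us-east-1:123456789012:stream/my-stream",
    FunctionName="process-records",    # hypothetical Lambda function
    StartingPosition="LATEST",
    BatchSize=100,                     # max records per invocation
    MaximumBatchingWindowInSeconds=5,  # wait up to 5 s to fill a batch
    ParallelizationFactor=10,          # up to 10 concurrent batches per shard
)
```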
48
Q

You want to use Kinesis Data Streams because you have a ton of streaming data. What’s a popular AWS service for processing records, and where is a popular place to save records after that?

A
  1. AWS Lambda
  2. DynamoDB
49
Q

What is the Kinesis Client Library (KCL)?

A

A java library that helps read records from a Kinesis Data Stream with distributed applications sharing the read workload.

50
Q

Is a Java library basically just a Python package?

A

No; it sounds like both Java and Python have packages and libraries. Libraries tend to bundle many packages, or at least a bunch of smaller pieces of code.

51
Q

If you’re using the Kinesis Client Library, each shard is to be read by only one KCL instance (where a KCL instance might run on EC2, Elastic Beanstalk, or on-premises). Does this mean each shard needs its own KCL instance?

A

No. It means that two shards could be read by the same KCL instance, but each shard will only ever be read by one instance.

52
Q

What can KCL run on?

A

EC2, Elastic Beanstalk, on-premises

53
Q

when using KCL, are records read in order at the shard level?

A

Yes.

54
Q

When using KCL, is progress checkpointed into DynamoDB? Do you need a kind of access for that?

A
  1. Yes
  2. Yes, IAM access (so KCL can write checkpoints to DynamoDB)
55
Q
  1. What type(s) of kinesis data stream consumer does KCL 1.x support?
  2. KCL 2.x?
A
  1. Shared (classic) fan-out consumers only
  2. Both shared and enhanced fan-out consumers
56
Q

How do you increase the stream capacity? (what is stream capacity?)

A
  1. Stream capacity is, I think, the number of shards in your stream.
  2. You can increase it by shard splitting (a Kinesis operation): you split one “hot shard” into two shards, and each resulting shard gets its own 1 MB/s of incoming data. You can’t split a shard into more than two shards in a single operation, and there is no automatic scaling; splitting happens manually. (See the sketch below.)
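A sketch of a split with boto3, assuming a stream named my-stream and splitting its first shard down the middle of its hash-key range:

```python
import boto3

kinesis = boto3.client("kinesis")

# Find the hot shard's hash-key range, then split it down the middle.
shard = kinesis.list_shards(StreamName="my-stream")["Shards"][0]
lo = int(shard["HashKeyRange"]["StartingHashKey"])
hi = int(shard["HashKeyRange"]["EndingHashKey"])

kinesis.split_shard(
    StreamName="my-stream",
    ShardToSplit=shard["ShardId"],
    NewStartingHashKey=str((lo + hi) // 2),  # midpoint of the range
)
```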
57
Q

You maybe don’t have a ton of use for your shards at the moment. How can you save costs?

A
  • You can decrease stream capacity by merging shards (shard merging is another Kinesis operation).
  • You can merge two shards with low traffic (shards with low traffic are called “cold shards”, btw). You can’t merge more than two shards in a single operation.
  • Old shards are closed and will be deleted once their data expires.
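And the corresponding merge operation, sketched with boto3 (the shard IDs are hypothetical; the two shards must be adjacent in hash-key space):

```python
import boto3

kinesis = boto3.client("kinesis")

# Merge two adjacent cold shards into one.
kinesis.merge_shards(
    StreamName="my-stream",
    ShardToMerge="shardId-000000000001",
    AdjacentShardToMerge="shardId-000000000002",
)
```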
58
Q

Popular producers (also called sources, though that terminology should be verified) for Kinesis Data Firehose?

A
  • Apps
  • Clients
  • SDKs
  • Kinesis Agents
  • AWS IoT
  • Amazon CloudWatch (Logs and Events)
  • Kinesis Data Streams
59
Q

What size of record can go through Kinesis Data Firehose?

A

Up to 1 MB (the AWS docs say 1024 KB for Direct PUT or Kinesis Data Streams sources; see card 63).
60
Q

What is a destination in Kinesis Data Firehose?

A

A destination is the data store where your data will be delivered. Kinesis Data Firehose currently supports Amazon S3, Amazon Redshift, Amazon OpenSearch Service, Splunk, Datadog, NewRelic, Dynatrace, Sumo Logic, LogicMonitor, MongoDB, and HTTP End Point as destinations.

61
Q

Q: What is Streaming ETL?

A

Streaming ETL is the processing and movement of real-time data from one place to another. ETL is short for the database functions extract, transform, and load. Extract refers to collecting data from some source. Transform refers to any processes performed on that data. Load refers to sending the processed data to a destination, such as a warehouse, a data lake, or an analytical tool.

62
Q

Q: What is a delivery stream in Kinesis Data Firehose?

A

A delivery stream is the underlying entity of Kinesis Data Firehose. You use Firehose by creating a delivery stream and then sending data to it. You can create a Kinesis Data Firehose delivery stream through the Firehose Console or the CreateDeliveryStream operation. For more information, see Creating a Delivery Stream.
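A minimal boto3 sketch of CreateDeliveryStream with an S3 destination; the role and bucket ARNs are placeholders:

```python
import boto3

firehose = boto3.client("firehose")

firehose.create_delivery_stream(
    DeliveryStreamName="my-delivery-stream",  # hypothetical name
    DeliveryStreamType="DirectPut",           # producers send data directly
    S3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-role",
        "BucketARN": "arn:aws:s3:::my-bucket",
    },
)
```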

63
Q

Q: What is a record in Kinesis Data Firehose?

A
  • A record is the data of interest your data producer sends to a delivery stream. The maximum size of a record (before Base64-encoding) is 1024 KB if your data source is Direct PUT or Kinesis Data Streams. The maximum size of a record (before Base64-encoding) is 10 MB if your data source is Amazon MSK.
  • Okay, this is possibly noteworthy. Steph says the max size of a record is 1 MB; the AWS docs say the max size (within the parameters mentioned above) is 1024 KB. 1 MB is 1000 KB, so I’m thinking he rounded here, and wondering where else he might have rounded. If you see this on the exam, go with either; maybe 1024 KB first. https://aws.amazon.com/kinesis/data-firehose/faqs/
64
Q

How does Kinesis Data Firehose get data to its destinations (or maybe this is just one way it does so)?

A

batch writes.

65
Q

What AWS service is popularly used (or is it perhaps the only option?) by Kinesis Data Firehose for data transformation?

A

AWS Lambda (Lambda functions).

66
Q

When you’re thinking about Kinesis Data Firehose, what is important to remember about costs?

A

You pay for data going through Firehose.

67
Q

Does Kinesis Data Firehose work in real time?

A

No, it works in near real time. There is a 60-second minimum latency for non-full batches, or a minimum of 1 MB of data at a time (you configure the buffer interval and buffer size, and whichever threshold is hit first triggers delivery).
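Those two thresholds are the delivery stream’s buffering hints. Extending the CreateDeliveryStream sketch from card 62 (same placeholder ARNs):

```python
import boto3

firehose = boto3.client("firehose")

firehose.create_delivery_stream(
    DeliveryStreamName="my-delivery-stream",
    DeliveryStreamType="DirectPut",
    S3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-role",
        "BucketARN": "arn:aws:s3:::my-bucket",
        # Flush a batch when either threshold is hit first:
        "BufferingHints": {"SizeInMBs": 1, "IntervalInSeconds": 60},
    },
)
```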

68
Q

Can Kinesis Data Firehose send failed data, or all data, to a backup S3 bucket?

A

Yes

69
Q

What’s your guess for when you’d use Kinesis Data Streams vs Kinesis Data Firehose?

A

My guess: you’d use Kinesis Data Streams when you’re like, “No, I need real-time (~200 ms) processing/movement, I don’t mind writing custom code for my producers/consumers to get it, and yes, I need data storage for 1 to 365 days plus replay capability.” You’d use Kinesis Data Firehose when you’re like, “Actually, I don’t care whether processing happens in ~200 ms; I’m perfectly happy with the processing/transforming/storage taking about 60 seconds, I like that Firehose is fully managed by AWS, and I’ll gladly pay for all that. Also, I don’t need data storage or replay capability.”

70
Q

T/F does this look accurate?

  1. Kinesis Data Firehose:
    * streaming service for ingest at scale
    * write custom code for producers/consumers
    * real time (~200 ms)
    * manage your own scaling using shard splitting and merging
    * data storage for 1 to 365 days
    * supports replay capability
  2. Kinesis Data Streams:
    * load streaming data into S3/Redshift/OpenSearch/3rd party/custom HTTP
    * fully managed
    * near real-time (buffer time of about 60 seconds)
    * automatic scaling
    * no data storage
    * doesn’t support replay capability
A

Nope. 1 is actually Kinesis Data Streams, and 2 is actually Kinesis Data Firehose.

71
Q

Okay, Kinesis Data Analytics for SQL Applications.
1. What do they call producers? (Hint: it’s not “producers”.)
2. What do they call consumers? (Hint: it’s not “consumers”.)

A
  1. sources
  2. sinks (eye roll; like they couldn’t just use the same term)
72
Q

Okay, so what are the two options for Kinesis Data Analytics for SQL Application sources?

A
  1. Kinesis Data Streams
  2. Kinesis Data Firehose
73
Q

Aside from sources (Kinesis Data Streams or Kinesis Data Firehose), what are the other inputs needed for Kinesis Data Analytics for SQL Applications?

A
  1. SQL Statements
  2. Reference data from S3 (used to enrich streaming data; note that I’m not 100% sure you have to add the reference data from S3, and a hands-on would probably clear that up)
74
Q

What are the sink options for Kinesis Data Analytics for SQL Applications (just names)?

A

Okay, unfortunately AWS is confusing about this.
1. Per Steph’s slides, Kinesis Data Analytics for SQL Applications can connect directly to the sink options Kinesis Data Streams (which can connect on to AWS Lambda or applications) and Kinesis Data Firehose (which can connect to S3, to Redshift via COPY through S3, or to other Firehose destinations).
2. However, the AWS doc https://docs.aws.amazon.com/kinesisanalytics/latest/dev/what-is.html is not clearly written. In one place it says what Steph says above, but in the next paragraph it says “Kinesis Data Analytics supports Amazon Kinesis Data Firehose (Amazon S3, Amazon Redshift, Amazon OpenSearch Service, and Splunk), AWS Lambda, and Amazon Kinesis Data Streams as destinations.” Possibly that second paragraph refers to Kinesis Data Analytics as a whole, while the part Steph and AWS agree on is specific to SQL Applications; then again, the quote is from a page titled “Kinesis Data Analytics for SQL Applications”. Good luck figuring that out.

75
Q
  1. Give the Amazon Kinesis Data Analytics for SQL Applications elevator pitch.
  2. Name a few use cases for Kinesis Data Analytics for SQL Applications.
A
  1. A: With Amazon Kinesis Data Analytics for SQL Applications, you can process and analyze streaming data using standard SQL (first line from an AWS doc). B: Real-time analytics on Kinesis Data Streams and Firehose using SQL (Steph’s slide).
  2. The service enables you to quickly author and run powerful SQL code against streaming sources to perform time-series analytics, feed real-time dashboards, and create real-time metrics.
76
Q

So let’s say the two output/sink options for Kinesis Data Analytics for SQL Applications are Kinesis Data Streams and Kinesis Data Firehose. When would you use the former, and when would you use the latter?

A
  1. You’d use Kinesis Data Streams as an output when you want to create streams out of the real-time analytics queries.
  2. You’d use Kinesis Data Firehose as an output when you want to send analytics query results to destinations.
77
Q
  1. Is Kinesis Data Analytics for SQL Applications fully managed?
  2. Does it scale automatically or manually?
  3. Do you pay for anything?
A
  1. fully managed, no servers to provision
  2. automatic scaling
  3. pay for actual consumption rate
78
Q

Kinesis Data Analytics for Apache Flink: what is its new name?

A

Amazon Managed Service for Apache Flink

79
Q

Amazon Managed Service for Apache Flink: what was its old name?

A

Kinesis Data Analytics for Apache Flink

80
Q

What is Apache Flink?

A

Stateful Computations over Data Streams
Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink has been designed to run in all common cluster environments, perform computations at in-memory speed and at any scale.

source: https://flink.apache.org/, since this quote didn’t come from steph or aws.

81
Q

What Does Kinesis Data Analytics for Apache Flink (Amazon Managed Service for Apache Flink) do?

A

It uses Flink (Java, Scala or SQL) to process and analyze streaming data.

82
Q

  1. What AWS services seem like popular inputs for Kinesis Data Analytics for Apache Flink (Amazon Managed Service for Apache Flink)? (No name like “sources” or “consumers” is given for inputs this time.)
  2. Is Firehose an option?
A
  • Kinesis Data Streams
  • Amazon MSK (Managed Streaming for Apache Kafka)
  • Another slide says that Flink does not read from Firehose, but that you can use Kinesis Analytics for SQL instead.
83
Q

T/F

The Kinesis Data Analytics for Apache Flink (Amazon Managed Service for Apache Flink) flowchart doesn’t really show anything going on other than an input going into the service.

A

True.

84
Q

Is Kinesis Data Analytics for Apache Flink (Amazon Managed Service for Apache Flink) aws managed?

A

Yes. Apache Flink applications run on an AWS-managed cluster (I hope I’m getting this wording correct).

85
Q

What does aws management include for Kinesis Data Analytics for Apache Flink (Amazon Managed Service for Apache Flink)

A
  • provisioning compute resources, parallel computation, automatic scaling
  • application backups (implemented as checkpoints and snapshots)
  • the ability to use any Apache Flink programming feature
86
Q

Is there application backup when using Kinesis Data Analytics for Apache Flink (Amazon Managed Service for Apache Flink)?

A

Yes. application backups are implemented as checkpoints and snapshots.

87
Q

Can you use any Apache Flink programming feature when using Kinesis Data Analytics for Apache Flink (Amazon Managed Service for Apache Flink)?

A

Yes

88
Q

Ordering data into (getting data in an orderly manner into) Kinesis

Say you have 100 trucks (truck_1, …, truck_100) on the road sending their GPS positions regularly into AWS. You want to consume the data in order for each truck, so that you can track their movement accurately. How should you send that data into Kinesis?

A
  • Send using a “Partition Key” value of the truck_id (e.g. “truck_1”).
  • The same key will always go to the same shard.
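A sketch of what that looks like from a producer (boto3; the stream name is made up). Every record for a given truck uses the truck ID as its partition key:

```python
import json
import boto3

kinesis = boto3.client("kinesis")

def send_position(truck_id: str, lat: float, lon: float) -> None:
    """All records for one truck share a partition key, so they land in
    the same shard and are read back in order for that truck."""
    kinesis.put_record(
        StreamName="truck-positions",  # hypothetical stream name
        Data=json.dumps({"truck": truck_id, "lat": lat, "lon": lon}).encode(),
        PartitionKey=truck_id,
    )

send_position("truck_1", 47.61, -122.33)
```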
89
Q

Ordering data into (getting data in an orderly manner into) SQS

For SQS standard, is there ordering? How?

A

For SQS standard, there is no ordering.

90
Q

For SQS FIFO is there ordering? How does it work if you have one consumer?

A

For SQS FIFO, if you don’t use a Group ID, messages are consumed in the order they are sent, with only one consumer (I think they mean that only one consumer is used with this method).

91
Q

For SQS FIFO is there ordering? How does it work if you have more than one consumer?

A

If you want to scale out the number of consumers, and you want messages that are related to each other to be grouped, then you use a Group ID (similar to a partition key in Kinesis).
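The SQS analogue, sketched with boto3 (the queue URL is made up): MessageGroupId plays the partition-key role, preserving order within a group while letting different groups be consumed in parallel.

```python
import json
import boto3

sqs = boto3.client("sqs")

sqs.send_message(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/trucks.fifo",
    MessageBody=json.dumps({"truck": "truck_1", "lat": 47.61}),
    MessageGroupId="truck_1",               # ordering within this group
    MessageDeduplicationId="truck_1-0001",  # or enable content-based dedup
)
```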

92
Q

Kinesis vs SQS ordering.

Assume you have 100 trucks, 5 kinesis shards, 1 sqs fifo. For Kinesis Data Streams:
1. how many trucks per shard do you have on average?
2. will trucks have their data ordered within each shard?
3. what is the max amount of consumers you can have in parallel?
4. what is the max MB/s you can receive?

A
  1. 20
  2. yes
  3. 5
  4. 5 MB/s
93
Q

Kinesis vs SQS ordering.

Assume you have 100 trucks, 5 kinesis shards, 1 sqs fifo. For SQS FIFO:
1. how many SQS queues do you have?
2. how many Group IDs
3. up to how many consumers can you have?
4. up to how many messages per second can you have?

A
  1. one
  2. 100
  3. 100 (one consumer per Group ID)
  4. 3000 if using batching
94
Q

Which diagram is Kinesis, which is SQS, and which is SNS? (Refers to three architecture diagrams from the slides, not reproduced here.)

A
  1. SQS
  2. SNS
  3. Kinesis