D4: Analysis Flashcards

Question 1

Q

What are the main sources of data for Kinesis Analytics?

Answer

A

Kinesis Data Streams (KDS) and Kinesis Data Firehose (KDF)
S3 can hold reference data, loaded as a reference table, to join with and enrich the incoming stream

Question 2

Q

How can Kinesis Analytics reference tables be used?

Answer

A

You store a mapping file in S3 and make it available as a reference table in Kinesis Analytics.
You use a SQL JOIN to join the streaming data with that reference table.
E.g., the data contains zip codes, and you want to enrich it with city names.
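
The enrichment idea can be sketched in plain Python. This is only an illustration of the join logic, not Kinesis Analytics SQL: the `ZIP_TO_CITY` mapping, the record fields, and the `enrich` helper are all hypothetical, and the sketch assumes unmatched zip codes are kept with no city (a left join).

```python
# Hypothetical reference data, standing in for a mapping file stored in S3.
ZIP_TO_CITY = {"10001": "New York", "60601": "Chicago"}


def enrich(records, reference):
    """Join each streaming record with the reference table on its zip code."""
    enriched = []
    for record in records:
        # .get() returns None when the zip code is missing from the table,
        # so unmatched records are kept (left-join behavior, an assumption).
        city = reference.get(record["zip"])
        enriched.append({**record, "city": city})
    return enriched


# Hypothetical incoming records.
incoming = [
    {"order_id": 1, "zip": "10001"},
    {"order_id": 2, "zip": "99999"},
]
result = enrich(incoming, ZIP_TO_CITY)
```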

Question 3

Q

How are errors handled in Kinesis Analytics?

Answer

A

Records that hit an error condition are written to an in-application error stream (error_stream).

Question 4

Q

What are the possible destinations for Kinesis Analytics?

Answer

A

KDS, KDF, and Lambda
Sending data to Lambda opens up the other destinations that Lambda integrates with (e.g., SNS, SQS, S3, DynamoDB, Redshift, CloudWatch)
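
A minimal sketch of a Lambda destination handler, assuming the output event carries a `records` list with `recordId` and base64-encoded `data`, and that the function reports `Ok` or `DeliveryFailed` per record. That event shape should be checked against the AWS documentation. The `forward` helper and `DELIVERED` list are hypothetical stand-ins for the real downstream call (SNS, DynamoDB, and so on).

```python
import base64
import json

# Hypothetical sink: stands in for a real downstream call (SNS, DynamoDB, ...).
DELIVERED = []


def forward(payload):
    """Placeholder for delivering one decoded record to another service."""
    DELIVERED.append(payload)


def lambda_handler(event, context):
    """Decode each output record, forward it, and report a per-record result."""
    results = []
    for record in event["records"]:
        try:
            # Assumes each record's data is base64-encoded JSON.
            payload = json.loads(base64.b64decode(record["data"]))
            forward(payload)
            results.append({"recordId": record["recordId"], "result": "Ok"})
        except Exception:
            # Mark the record as failed so it is not silently dropped.
            results.append(
                {"recordId": record["recordId"], "result": "DeliveryFailed"}
            )
    return {"records": results}
```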

Question 5

Q

How does KDA for Apache Flink work?

Answer

A

Flink is an open-source framework for processing data streams.
You develop a Flink application, store it in S3, and reference it when you set up KDA for Flink.
Serverless: you don't need to worry about where or how Flink runs.

Question 6

Q

What are some common use cases for KDA?

Answer

A

Streaming ETL
Continuous metric generation
Responsive analytics, e.g., computing the availability or success rate of a customer-facing API over time and sending that to CloudWatch
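
The availability example can be sketched as a per-window success ratio. This is plain Python rather than a KDA application. It assumes events arrive as hypothetical `(timestamp_seconds, status_code)` pairs, that any status below 500 counts as a success, and that windows are fixed 60-second buckets. Publishing the result to CloudWatch is left out.

```python
from collections import defaultdict


def availability_by_window(events, window_seconds=60):
    """Return {window_start: success_ratio} over fixed (tumbling) windows."""
    totals = defaultdict(lambda: [0, 0])  # window start -> [successes, total]
    for timestamp, status_code in events:
        # Round the timestamp down to the start of its window.
        window_start = int(timestamp // window_seconds) * window_seconds
        totals[window_start][1] += 1
        if status_code < 500:  # assumption: only 5xx responses are failures
            totals[window_start][0] += 1
    return {start: ok / total for start, (ok, total) in sorted(totals.items())}


# Hypothetical API call log: (seconds since start, HTTP status).
events = [(0, 200), (30, 500), (61, 200), (90, 200)]
availability = availability_by_window(events)
```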

Question 7

Q

What are the KDA cost model and security basics?

Answer

A

Serverless: you pay for what you consume, measured in Kinesis Processing Units (KPUs)
1 KPU = 1 vCPU + 4 GB of memory
Use IAM permissions to access the streaming source
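
The KPU arithmetic can be sketched as below. The per-KPU resources come from the card above. The helper names are hypothetical, and the hourly price is a caller-supplied placeholder, not an actual AWS rate.

```python
KPU_VCPU = 1       # vCPUs per KPU
KPU_MEMORY_GB = 4  # GB of memory per KPU


def kpu_resources(kpus):
    """Total compute and memory provided by a given number of KPUs."""
    return {"vcpu": kpus * KPU_VCPU, "memory_gb": kpus * KPU_MEMORY_GB}


def estimated_cost(kpus, hours, price_per_kpu_hour):
    """Pay-for-what-you-consume estimate: KPUs x hours x hourly price.

    price_per_kpu_hour is a placeholder; look up the real rate for your region.
    """
    return kpus * hours * price_per_kpu_hour


resources = kpu_resources(3)       # 3 vCPU and 12 GB of memory
cost = estimated_cost(2, 10, 0.5)  # 0.5 is a made-up price per KPU-hour
```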

Question 8

Q

What is KDA Schema Discovery?

Answer

A

KDA can sample records from an incoming stream and infer the schema from them

Question 9

Q

What is RANDOM_CUT_FOREST?

Answer

A

It is a SQL function used for anomaly/outlier detection on numeric columns in the stream
Example: detect anomalous subway ridership during NYC marathon.
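
To make the idea of flagging outliers in a numeric column concrete, here is a deliberately simpler stand-in. It is not Random Cut Forest: it uses a z-score against the values seen so far. The function name, threshold, warm-up length, and ridership numbers are all hypothetical.

```python
import statistics


def zscore_anomalies(values, threshold=3.0, warmup=5):
    """Flag values far from the mean of everything seen before them.

    A simple z-score stand-in for illustration, NOT Random Cut Forest.
    """
    flagged = []
    for i, value in enumerate(values):
        history = values[:i]
        if len(history) < warmup:
            continue  # not enough history to judge yet
        mean = statistics.mean(history)
        stdev = statistics.pstdev(history)
        if stdev == 0:
            continue  # flat history: z-score is undefined, so skip
        if abs(value - mean) / stdev > threshold:
            flagged.append((i, value))
    return flagged


# Hypothetical ridership counts with one spike (think: marathon day).
ridership = [100, 102, 98, 101, 99, 100, 400, 101]
anomalies = zscore_anomalies(ridership)
```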