D4: Analysis Flashcards
What are the main sources of data for Kinesis Analytics?
Kinesis Data Streams (KDS) and Kinesis Data Firehose (KDF)
S3 can also hold reference tables used to join with and enrich the incoming data
How can Kinesis Analytics Reference Tables be used?
- You store a mapping file in S3 and make that available as a reference table in Kinesis Analytics
- You use a JOIN in SQL to join the incoming stream data with the reference table.
- E.g., you have zip codes in the data but want to enrich each record with the city name.
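The zip code example can be sketched in plain Python to show what the join does conceptually. This is only an illustration, not Kinesis Analytics code: the `ZIP_TO_CITY` mapping, the `enrich` helper, and the record fields are all hypothetical, and the lookup behaves like a LEFT JOIN (unmatched records keep a city of `None`).

```python
# Hypothetical reference data, standing in for a mapping file stored in S3.
ZIP_TO_CITY = {"10001": "New York", "94105": "San Francisco"}


def enrich(records, reference):
    """Attach a city name to each record, like a LEFT JOIN on zip code."""
    return [{**record, "city": reference.get(record["zip"])} for record in records]


# Hypothetical incoming stream records.
stream = [{"zip": "10001", "amount": 12.5}, {"zip": "00000", "amount": 3.0}]
enriched = enrich(stream, ZIP_TO_CITY)
```

In the real service the same idea is expressed as a SQL JOIN between the in-application stream and the reference table.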
How are errors handled in Kinesis Analytics?
There is an error stream that records will be written to when there are error conditions.
What are the possible destinations for Kinesis Analytics?
KDS, KDF, and Lambda
Once you send data to Lambda, that opens up the other destinations Lambda integrates with (e.g. SNS, SQS, S3, DynamoDB, Redshift, CloudWatch, etc.)
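A minimal sketch of a Lambda function used as a destination might look like the following. It assumes the event carries a `records` list whose items have a `recordId` and base64-encoded `data`, and that the function reports a per-record `result` of `Ok` or `DeliveryFailed`; verify that shape against the current AWS documentation. The `deliver` helper is hypothetical and stands in for whatever downstream call (SNS, DynamoDB, and so on) you would make.

```python
import base64
import json


def deliver(payload):
    """Hypothetical downstream delivery; swap in an SNS, S3, or DynamoDB call."""
    print(payload)


def handler(event, context):
    """Decode each record, forward it, and report a per-record result."""
    results = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            deliver(payload)
            result = "Ok"
        except Exception:
            # Assumed status for records that could not be delivered.
            result = "DeliveryFailed"
        results.append({"recordId": record["recordId"], "result": result})
    return {"records": results}
```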
How does KDA for Apache Flink work?
- Flink is an open source framework for handling data streams.
- You develop a Flink application, store it in S3, and reference it when you set up KDA for Flink
- Serverless: you don't need to worry about where/how Flink runs
What are some common use cases for KDA?
- Streaming ETL
- Continuous metric generation
- Responsive analytics, e.g. computing the availability or success rate of a customer-facing API over time and sending that to CloudWatch
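The responsive analytics case can be sketched as a success-rate calculation over fixed (tumbling) time windows. This is a plain-Python illustration, not KDA code: the `success_rate_by_window` helper, the 60-second window, the sample events, and the choice to count any non-5xx status as a success are all assumptions.

```python
from collections import defaultdict


def success_rate_by_window(events, window_seconds=60):
    """Group (timestamp, status_code) pairs into tumbling windows and
    return {window_start: fraction of non-5xx responses}."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for timestamp, status in events:
        window = int(timestamp // window_seconds) * window_seconds
        totals[window] += 1
        if status < 500:  # assumption: only 5xx responses count as failures
            successes[window] += 1
    return {window: successes[window] / totals[window] for window in sorted(totals)}


# Hypothetical API call log: (seconds since start, HTTP status).
events = [(0, 200), (10, 500), (59, 200), (61, 200), (119, 503)]
rates = success_rate_by_window(events)
```

Each per-window rate is the kind of value you would then publish as a CloudWatch metric.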
KDA Cost Model, Security
Serverless, you pay for what you consume
1 KPU = 1 vCPU + 4GB mem
Use IAM permissions to access streaming source
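The KPU definition makes capacity and cost easy to estimate. The sketch below uses the 1 vCPU + 4 GB figure from this card; the price per KPU-hour is a placeholder assumption (it varies by region and over time), and the `estimate` helper is hypothetical.

```python
KPU_VCPU = 1        # from the card: 1 KPU = 1 vCPU
KPU_MEMORY_GB = 4   # from the card: 1 KPU = 4 GB memory
ASSUMED_PRICE_PER_KPU_HOUR = 0.11  # placeholder; check current regional pricing


def estimate(kpus, hours, price=ASSUMED_PRICE_PER_KPU_HOUR):
    """Return the resources and approximate cost for a number of KPUs."""
    return {
        "vcpu": kpus * KPU_VCPU,
        "memory_gb": kpus * KPU_MEMORY_GB,
        "cost": round(kpus * hours * price, 2),
    }


# Example: an application averaging 2 KPUs for one day.
est = estimate(kpus=2, hours=24)
```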
What is KDA Schema Discovery?
KDA can analyze an incoming stream to discover the schema
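To make the idea concrete, here is a toy sketch of inferring column types from a few sample records. It is not how KDA implements discovery (the service exposes this through its own schema discovery feature); the `infer_schema` helper, the type names, and the sample records are assumptions for illustration only.

```python
def infer_schema(samples):
    """Map each column to a SQL-like type name based on sample values."""

    def type_name(value):
        if isinstance(value, bool):  # check bool first: bool is a subclass of int
            return "BOOLEAN"
        if isinstance(value, int):
            return "INTEGER"
        if isinstance(value, float):
            return "DOUBLE"
        return "VARCHAR"

    schema = {}
    for record in samples:
        for column, value in record.items():
            inferred = type_name(value)
            # Fall back to VARCHAR when samples disagree on a column's type.
            if schema.setdefault(column, inferred) != inferred:
                schema[column] = "VARCHAR"
    return schema


# Hypothetical sample of incoming records.
samples = [
    {"zip": "10001", "riders": 120, "fare": 2.75},
    {"zip": "94105", "riders": 98, "fare": 3.0},
]
schema = infer_schema(samples)
```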
What is RANDOM_CUT_FOREST?
It is a SQL function used for anomaly/outlier detection on numeric columns in the stream
Example: detect anomalous subway ridership during NYC marathon.