Development (data processing, analytics and big data solutions) Flashcards
What service would you use to ANALYSE data using SQL?
Athena
What service would allow you to COLLECT and PROCESS large amounts of data on the availability of flights to display to customers?
Kinesis Data Streams because it specializes in handling real-time streaming data
You want to put some data in and search it.
What is a good option for this?
OpenSearch
If you hear “real time streaming of data” what’s usually the answer?
Kinesis, because it ingests real-time data and keeps it available until we are ready to process it
What would you use to write standard SQL queries on STREAMING DATA?
Kinesis Data Analytics, which runs real-time analytics on streaming data using SQL queries
A supermarket wants to analyze real-time data about users based on their clicks on its web page.
What service should you use?
Kinesis Data Firehose, because it can ingest the clickstream data and deliver it to an analytics service
You have a video camera doorbell. You want to STORE the data for potential playback and for ML ANALYTICS.
What is a good option?
Kinesis Video Streams
You want to LOAD a video and transform it into DIFFERENT FORMATS
What is a good option?
Elemental Media Services
What is Amazon’s version of Zoom?
Amazon Chime
You want to LOAD streaming data into Kinesis Data Analytics, S3, Redshift and OpenSearch.
What should you use?
Kinesis Data Firehose, because it can deliver the data in one step
With Kinesis Data Streams you would need to add a Lambda function or another consumer to deliver the data
What would you do to maintain strong performance if shards are hot?
Split the shards
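A minimal sketch of how a split point for a hot shard could be chosen, assuming you want to divide the shard's hash key range evenly. The function name `split_point` is hypothetical; the result is the kind of value you would pass as `NewStartingHashKey` to the Kinesis `SplitShard` operation.

```python
def split_point(starting_hash_key: str, ending_hash_key: str) -> str:
    """Return the midpoint of a shard's hash key range as a decimal string.

    Kinesis reports hash keys as decimal strings, so the inputs and the
    result use the same representation.
    """
    start = int(starting_hash_key)
    end = int(ending_hash_key)
    midpoint = (start + end) // 2
    # The new starting hash key must fall strictly inside the parent range.
    if midpoint <= start:
        raise ValueError("hash key range is too small to split")
    return str(midpoint)


# A single-shard stream covers the full 128-bit hash key space.
full_range_end = str(2**128 - 1)
new_starting_hash_key = split_point("0", full_range_end)

# Assumed usage with boto3 (not executed here):
# kinesis.split_shard(StreamName=..., ShardToSplit=...,
#                     NewStartingHashKey=new_starting_hash_key)
```

An even split only helps when the hot traffic is spread across the range; if one partition key is responsible for the load, splitting will not distribute it.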
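A stream needs 5 MB/s of ingest and 40 MB/s of read throughput.
What is the minimum number of shards it needs?
20 shards
Each shard supports 1 MB/s of writes and 2 MB/s of reads, so:
5 / 1 = 5 shards for writes
40 / 2 = 20 shards for reads
The larger of the two, 20, is the minimum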
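The shard arithmetic above can be sketched as a small function. This is a simplified illustration: the name `min_shards` is hypothetical, and it considers only the per-shard MB/s limits quoted in the card (1 MB/s write, 2 MB/s read), not any records-per-second limits.

```python
import math

# Per-shard throughput limits as stated in the flashcard.
WRITE_MB_PER_SHARD = 1  # MB/s of ingest per shard
READ_MB_PER_SHARD = 2   # MB/s of read per shard


def min_shards(ingest_mb_per_s: float, read_mb_per_s: float) -> int:
    """Return the smallest shard count that covers both write and read throughput."""
    shards_for_write = math.ceil(ingest_mb_per_s / WRITE_MB_PER_SHARD)
    shards_for_read = math.ceil(read_mb_per_s / READ_MB_PER_SHARD)
    # Both requirements must be met, so take the larger; a stream needs at least 1 shard.
    return max(shards_for_write, shards_for_read, 1)


print(min_shards(5, 40))  # 20
```

Here the read side drives the answer: writes alone would need only 5 shards.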
What can you use to simplify data integration and ETL (Extract, Transform, Load) processes?
Glue
Glue simplifies data integration and ETL processes by automatically cataloging, cleaning, and transforming data
Name the 3 steps for processing real-time data using Kinesis.
Data producers (temperature sensors, clickstream data, sales information, gathered and batched so it can be processed in real time) → Kinesis → Compute (EC2 cluster, Lambda functions, etc.)
Name 2 benefits of Kinesis.
Durability
Scalability