Development (data processing, analytics and big data solutions) Flashcards
What service would you use to ANALYSE data using SQL?
Athena
What service would allow you to COLLECT and PROCESS large amounts of data on the availability of flights to display to customers?
Kinesis Data Streams because it specializes in handling real-time streaming data
You want to put some data in and search it.
What is a good option for this?
OpenSearch
If you hear “real time streaming of data” what’s usually the answer?
Kinesis because it takes that real time data and has it ready for us whenever we want to process it
What would you use to write standard SQL queries on STREAMING DATA?
Kinesis Analytics is best for real time analytics on streaming data using SQL queries
A supermarket wants to analyze real-time data about users based on their clicks on the web page
What service should you use?
Kinesis Data Firehose because it can ingest the clickstream data and send it to an analytics service
You have a video camera doorbell. You want to STORE the data for potential playback and for ML ANALYTICS
What is a good option?
Kinesis Video Streams
You want to LOAD a video and transform it into DIFFERENT FORMATS
What is a good option?
Elemental Media Services
Name Amazon’s version of Zoom.
Chime
You want to LOAD streaming data into Kinesis Data Analytics, S3, Redshift and OpenSearch
What should you use?
Kinesis Data Firehose because it can do it in one step
With Kinesis Data Streams you would need to add a consumer, such as a Lambda function, to deliver the data
What would you do to maintain strong performance if shards are hot?
Split the shards
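As a rough illustration of what splitting a hot shard involves: a split needs a new starting hash key inside the parent shard's hash key range, and picking the midpoint divides the range evenly. The sketch below only computes that midpoint. The function name `split_point` is hypothetical, and the keys are assumed to arrive as decimal strings.

```python
def split_point(starting_hash_key: str, ending_hash_key: str) -> str:
    """Return the midpoint of a shard's hash key range as a decimal string.

    Hypothetical helper: it only does the arithmetic and makes no AWS calls.
    """
    start = int(starting_hash_key)
    end = int(ending_hash_key)
    mid = (start + end) // 2
    # The new starting key must fall strictly inside the parent's range.
    if mid <= start:
        raise ValueError("hash key range is too narrow to split")
    return str(mid)


# Example: a single shard covering the full 128-bit hash key space.
full_range_end = str(2**128 - 1)
print(split_point("0", full_range_end))
```

An even split only helps when the traffic is spread across the range. If one partition key is responsible for the heat, a different split point or a different partition key scheme may be needed.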
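A stream needs 5 MB/s of ingest (write) and 40 MB/s of read throughput
What is the minimum number of shards it needs?
20 shards
because each shard supports 1 MB/s of writes and 2 MB/s of reads
so
5/1 = 5 shards for write
40/2 = 20 shards for read
The larger of the two requirements wins, so 20 shards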
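The shard arithmetic above can be sketched in a few lines. This is a minimal sketch that assumes the per-shard limits used in the card (1 MB/s write, 2 MB/s read) and ignores other limits such as records per second. The name `min_shards` is made up for illustration.

```python
import math

WRITE_MB_PER_SHARD = 1  # MB/s of ingest one shard can accept (from the card)
READ_MB_PER_SHARD = 2   # MB/s of read one shard can serve (from the card)


def min_shards(ingest_mb_s: float, read_mb_s: float) -> int:
    """Smallest shard count that satisfies both the write and read needs."""
    write_shards = math.ceil(ingest_mb_s / WRITE_MB_PER_SHARD)
    read_shards = math.ceil(read_mb_s / READ_MB_PER_SHARD)
    # Whichever side needs more shards decides the answer.
    return max(write_shards, read_shards)


print(min_shards(5, 40))  # prints 20
```

Rounding up matters when the numbers do not divide evenly: 3 MB/s of reads still needs 2 shards, not 1.5.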
What can you use to simplify data integration and ETL (Extract, Transform, Load) processes
Glue
Glue simplifies data integration and ETL processes by automatically cataloging, cleaning, and transforming data
Name the 3 steps for processing real time data using Kinesis?
Data (temperature sensors, clickstream data, sales info, gathered and batched so it can be processed in real time) → Kinesis → Compute (EC2 cluster, Lambda functions, etc.)
Name 2 benefits of Kinesis?
Durability
Scalability
What is a shard?
A shard is the base unit of capacity of a data stream
When would you re-shard?
You would re-shard (increase the number of shards) when the data rate increases
Does the number of shards directly impact its ability to process incoming data?
Yes
Name 8 ways Kinesis policies can be implemented?
- IAM policies (users, roles or groups)
- Resource based policies
- Fine-grained access control (e.g., tags)
- Kinesis Data Streams Access Control
- Amazon CloudWatch Metrics and Alarms
- Cross-Account Access
- Integration with AWS Organizations
- Key Rotation and Encryption Policies
Name 4 types of Kinesis
Kinesis streams
Kinesis video
Kinesis analytics
Kinesis Firehose
You need to perform log analysis.
What would you use?
OpenSearch because it's used for full-text search & analytics
It is suitable for building search engines, log analysis and monitoring solutions
What service would you use if you want to LOAD live stream data into REDSHIFT?
Kinesis Data Firehose because it simplifies the process of capturing and loading data into data stores like Redshift by automatically handling the data delivery and ensuring that it is efficiently and reliably loaded for analysis.
What is the difference between Athena and Aurora?
Athena is a serverless query service for S3
Aurora is a relational database service
Name the 4 types of Kinesis:
- Kinesis Data Streams: Collects and processes large streams of data records in real time.
- Kinesis Data Firehose: Loads streaming data into AWS data stores for near real-time analytics.
- Kinesis Data Analytics: Processes streaming data using SQL for real-time insights.
- Kinesis Video Streams: Securely streams video from connected devices to AWS for analysis and processing.