Big Data Flashcards
Redshift characteristics
- It’s for BI applications
- It’s relational (but not a replacement for RDS in traditional applications)
- It can store up to 16 PB of data
EMR (Elastic MapReduce)
A managed big data platform that lets you process vast amounts of data using open-source tools such as Spark, HBase, Flink, Hudi, and Presto. It’s commonly used for ETL workloads.
How does EMR (Elastic MapReduce) work under the hood?
EMR is a managed fleet of EC2 instances running open-source tools (Spark, HBase, Flink, Hudi, and Presto).
Do EC2 rules apply to EMR (Elastic MapReduce)?
Yes. You can use Reserved Instances (RIs) and Spot Instances to reduce your costs.
Does EMR (Elastic MapReduce) live inside a VPC?
Yes. EMR clusters are launched into a subnet of a VPC.
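The EMR cards above (EC2 under the hood, Spot pricing, VPC placement) all show up in a single cluster request. Below is a minimal sketch, assuming boto3's EMR `run_job_flow` call: it only builds the keyword arguments and does not contact AWS. The release label, instance types, group names, and subnet ID are hypothetical placeholders. Spot is used for the task group only, a common choice because task nodes hold no HDFS data, so a Spot interruption there is cheaper to absorb.

```python
def build_emr_cluster_request(name: str, subnet_id: str, task_count: int = 2) -> dict:
    """Build kwargs for boto3's EMR run_job_flow (not called here).

    Pass the result as: boto3.client("emr").run_job_flow(**request)
    """
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # assumed release label; pick a current one
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                # Primary node: On-Demand, since losing it ends the cluster.
                {"Name": "primary", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                # Core nodes store HDFS data, so they also stay On-Demand.
                {"Name": "core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
                # Task nodes only compute, so Spot capacity cuts cost safely.
                {"Name": "task", "InstanceRole": "TASK", "Market": "SPOT",
                 "InstanceType": "m5.xlarge", "InstanceCount": task_count},
            ],
            # The subnet decides which VPC (and AZ) the EC2 instances live in.
            "Ec2SubnetId": subnet_id,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


request = build_emr_cluster_request("demo-cluster", "subnet-0123456789abcdef0", task_count=3)
```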
Does Redshift support Multi-AZ deployments?
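Not by default. A Redshift cluster has traditionally been a Single-AZ deployment. You can create multiple clusters in different AZs, but they’re technically separate deployments, so a single cluster isn’t highly available out of the box. (Newer RA3 clusters can optionally be configured as Multi-AZ deployments.)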
Which service is built for real-time (streaming) data?
Kinesis
AWS queuing services
SQS and Kinesis can both act as queues, and each has its pros and cons. SQS is simpler to set up and use; Kinesis is built for real-time streaming and can retain data for up to a year (365 days), whereas SQS keeps messages for at most 14 days.
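The retention difference between SQS (at most 14 days) and Kinesis (up to 365 days) is visible in how each is configured. Below is a minimal sketch, assuming boto3's `set_queue_attributes` (SQS, retention in seconds) and `increase_stream_retention_period` (Kinesis, retention in hours). It only builds and validates the arguments; the stream name is a hypothetical placeholder.

```python
SQS_MIN_RETENTION_SECONDS = 60
SQS_MAX_RETENTION_SECONDS = 14 * 24 * 60 * 60  # 1,209,600 s = 14 days
KINESIS_MIN_RETENTION_HOURS = 24
KINESIS_MAX_RETENTION_HOURS = 365 * 24         # 8,760 h = 365 days


def sqs_retention_attributes(days: int) -> dict:
    """Attributes for sqs.set_queue_attributes(QueueUrl=..., Attributes=...)."""
    seconds = days * 24 * 60 * 60
    if not SQS_MIN_RETENTION_SECONDS <= seconds <= SQS_MAX_RETENTION_SECONDS:
        raise ValueError(f"SQS cannot retain messages for {days} days")
    # SQS attribute values are passed as strings.
    return {"MessageRetentionPeriod": str(seconds)}


def kinesis_retention_request(stream_name: str, days: int) -> dict:
    """Kwargs for kinesis.increase_stream_retention_period(**request)."""
    hours = days * 24
    if not KINESIS_MIN_RETENTION_HOURS <= hours <= KINESIS_MAX_RETENTION_HOURS:
        raise ValueError(f"Kinesis cannot retain records for {days} days")
    return {"StreamName": stream_name, "RetentionPeriodHours": hours}
```

A year of retention is accepted for the Kinesis stream, while asking SQS for more than 14 days raises `ValueError`.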
Serverless SQL
Athena
How do you query data in S3?
Athena
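Querying S3 with Athena means submitting plain SQL against a catalog database and telling Athena where in S3 to write the results. Below is a minimal sketch, assuming boto3's Athena `start_query_execution` call; it only builds the arguments. The database name and bucket are hypothetical, and `primary` is Athena's default workgroup.

```python
def build_athena_query(sql: str, database: str, output_location: str,
                       workgroup: str = "primary") -> dict:
    """Kwargs for boto3.client("athena").start_query_execution(**params).

    After submitting, poll get_query_execution(QueryExecutionId=...) until the
    query finishes, then read rows with get_query_results.
    """
    if not output_location.startswith("s3://"):
        raise ValueError("output_location must be an s3:// URI")
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena is serverless: results land in S3 rather than on a server you manage.
        "ResultConfiguration": {"OutputLocation": output_location},
        "WorkGroup": workgroup,
    }


params = build_athena_query(
    "SELECT status, count(*) FROM access_logs GROUP BY status",  # hypothetical table
    "logs_db",
    "s3://example-bucket/athena-results/",
)
```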
Serverless ETL
Glue. Paired with Athena, it can help create the schema for your data: Glue crawlers populate the Glue Data Catalog, which Athena uses for its table definitions.
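The Glue-plus-Athena pairing usually starts with a crawler that scans an S3 prefix and writes table definitions into a Data Catalog database. Below is a minimal sketch, assuming boto3's Glue `create_crawler` and `start_crawler` calls; it only builds the arguments. The crawler name, IAM role ARN, database, and S3 path are hypothetical placeholders.

```python
def build_glue_crawler_request(name: str, role_arn: str, database: str,
                               s3_path: str) -> dict:
    """Kwargs for boto3.client("glue").create_crawler(**request).

    Run it afterwards with start_crawler(Name=name); the tables it infers
    appear in `database`, where Athena can query them with SQL.
    """
    return {
        "Name": name,
        "Role": role_arn,          # IAM role the crawler assumes to read S3
        "DatabaseName": database,  # Data Catalog database that receives the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


crawler = build_glue_crawler_request(
    "logs-crawler",
    "arn:aws:iam::123456789012:role/GlueCrawlerRole",
    "logs_db",
    "s3://example-bucket/logs/",
)
```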
Big Data/BI dashboard/data visualisation?
QuickSight
Elasticsearch
Excels when combined with Logstash and Kibana. Together they form the ELK stack, a very common way to search your server logs.
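Searching server logs in an ELK stack comes down to sending a query DSL body to Elasticsearch's `_search` endpoint. Below is a minimal sketch that builds such a body in Python, assuming logs were shipped by Logstash with its default `message` and `@timestamp` fields; the `logstash-*` index pattern mentioned in the comment is an assumption.

```python
import json


def recent_errors_query(term: str = "error", minutes: int = 15, size: int = 50) -> dict:
    """Query DSL body for e.g. GET logstash-*/_search (index pattern assumed)."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Full-text match on the log line itself (scored).
                "must": [{"match": {"message": term}}],
                # Time window as a non-scoring filter, relative to now.
                "filter": [{"range": {"@timestamp": {"gte": f"now-{minutes}m"}}}],
            }
        },
        # Newest log lines first.
        "sort": [{"@timestamp": {"order": "desc"}}],
    }


print(json.dumps(recent_errors_query("timeout", minutes=30), indent=2))
```

Kibana builds equivalent queries behind its search bar; writing the body by hand is mainly useful for scripts and alerts.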