Chapter 15 - Big Data Flashcards
1
Q
Redshift
A
Is a relational data warehouse, but it is not a replacement for RDS.
- It’s built for analytics, not as the database behind standard applications
- Technically a single Availability Zone service, so it’s not highly available
- You can create multiple clusters in different Availability Zones, but there’s no one-click button that just makes it highly available.
- Doing that means duplicating your data over and over again.
2
Q
EMR
A
Amazon Elastic MapReduce
- Is made up of standard EC2 instances
- They are built and managed by AWS, but we can use cost-saving measures:
  - Spot Instances
  - Reserved Instances
- If we can be flexible about when our workloads run, we can save a lot of money
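A minimal sketch of the Spot idea in Python, assuming boto3's EMR `run_job_flow` API, where a list like this would be passed as `Instances={"InstanceGroups": ...}`. Keeping the primary and core nodes On-Demand and putting only the task nodes on the Spot market is one common pattern, not the only one. The instance type and node counts are hypothetical placeholders.

```python
def emr_instance_groups(instance_type: str, core_count: int, task_count: int) -> list[dict]:
    """Build the InstanceGroups list for boto3's EMR run_job_flow call.

    Hypothetical helper: the instance type and counts are placeholders.
    """
    return [
        # One primary (MASTER) node, On-Demand so the cluster itself survives.
        {"Name": "Primary", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
         "InstanceType": instance_type, "InstanceCount": 1},
        # CORE nodes store HDFS data, so they also stay On-Demand here.
        {"Name": "Core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
         "InstanceType": instance_type, "InstanceCount": core_count},
        # TASK nodes hold no HDFS data, so a Spot interruption only costs
        # work that gets retried, which is where the cheap capacity goes.
        {"Name": "Task", "InstanceRole": "TASK", "Market": "SPOT",
         "InstanceType": instance_type, "InstanceCount": task_count},
    ]


groups = emr_instance_groups("m5.xlarge", core_count=2, task_count=4)
for group in groups:
    print(group["Name"], group["Market"], group["InstanceCount"])
```

Only the extra processing capacity is exposed to Spot interruptions, which is why this split is a common way to cut EMR costs.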
3
Q
Kinesis
A
Kinesis is the only service that offers anything real-time related.
- If you see the word real-time, think Kinesis.
- Kinesis can act as a queue
- Kinesis Data Streams - when real-time is a requirement
- Kinesis Data Firehose - when you want automatic scaling and ease of use
- SQS is easier and simpler, but it’s not real-time
- SQS can only store messages for up to 14 days.
- Kinesis is faster (it is real-time), but it’s a bit more complicated
- Kinesis can store data for up to a year (365 days) if its retention period is configured for it.
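To make the retention limits concrete, here is a sketch in Python that converts a retention requirement into the units each setting uses. SQS's `MessageRetentionPeriod` queue attribute is in seconds (60 to 1,209,600), and a Kinesis data stream's `RetentionPeriodHours` is in hours (24 to 8,760). The helper names are hypothetical; the boto3 call is only reached if you run `extend_kinesis_retention` with AWS credentials.

```python
SQS_MIN_SECONDS = 60                 # SQS lower bound: 1 minute
SQS_MAX_SECONDS = 14 * 24 * 60 * 60  # 1,209,600 seconds = 14 days
KINESIS_MIN_HOURS = 24               # Kinesis Data Streams default/minimum
KINESIS_MAX_HOURS = 365 * 24         # 8,760 hours = 365 days


def sqs_retention_seconds(days: float) -> int:
    """Value for the SQS MessageRetentionPeriod attribute (seconds)."""
    seconds = int(days * 24 * 60 * 60)
    if not SQS_MIN_SECONDS <= seconds <= SQS_MAX_SECONDS:
        raise ValueError(f"SQS cannot retain messages for {days} days (max 14)")
    return seconds


def kinesis_retention_hours(days: float) -> int:
    """Value for RetentionPeriodHours on a Kinesis data stream (hours)."""
    hours = int(days * 24)
    if not KINESIS_MIN_HOURS <= hours <= KINESIS_MAX_HOURS:
        raise ValueError(f"Kinesis cannot retain records for {days} days (max 365)")
    return hours


def extend_kinesis_retention(stream_name: str, days: float) -> None:
    """Raise a stream's retention with boto3 (needs AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without boto3

    boto3.client("kinesis").increase_stream_retention_period(
        StreamName=stream_name,
        RetentionPeriodHours=kinesis_retention_hours(days),
    )
```

A 30-day requirement fits Kinesis (720 hours) but makes `sqs_retention_seconds` raise, which is the same trade-off the card describes.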
4
Q
Athena
A
Serverless service for running SQL queries directly against data stored in S3. If you’re trying to query anything inside of S3, think Athena
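A sketch of what an Athena query looks like, assuming boto3's Athena `start_query_execution` call: the query is plain SQL, it runs against a database defined in the Glue Data Catalog, and the results are written to an S3 location. The `weblogs` database, `access_logs` table and results bucket are hypothetical names.

```python
def athena_query_request(sql: str, database: str, output_s3: str) -> dict:
    """Keyword arguments for boto3's Athena start_query_execution call."""
    if not output_s3.startswith("s3://"):
        raise ValueError("Athena writes query results to an S3 location")
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_s3},
    }


def run_athena_query(request: dict) -> str:
    """Start the query and return its execution ID (needs AWS credentials)."""
    import boto3  # lazy import: building the request does not need boto3

    response = boto3.client("athena").start_query_execution(**request)
    return response["QueryExecutionId"]


# Hypothetical database, table and bucket names.
request = athena_query_request(
    "SELECT status, COUNT(*) FROM access_logs GROUP BY status",
    database="weblogs",
    output_s3="s3://example-athena-results/",
)
```

There is no cluster to provision: the data stays in S3, and Athena only needs the SQL, the catalog database and a place to put results.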
5
Q
Glue
A
Serverless ETL tool
Can create the schema for your data stored in S3 (in the Glue Data Catalog), and then Athena can query it
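A sketch of the Glue-then-Athena flow, assuming boto3's Glue `create_crawler` and `start_crawler` calls: the crawler scans an S3 prefix and writes the inferred table definitions into a Data Catalog database, which Athena can then query. The crawler name, IAM role ARN, database and S3 path are hypothetical placeholders.

```python
def glue_crawler_request(name: str, role_arn: str, database: str, s3_path: str) -> dict:
    """Keyword arguments for boto3's Glue create_crawler call.

    The crawler infers a schema from the files under s3_path and stores the
    resulting table definitions in the given Data Catalog database.
    """
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_run_crawler(request: dict) -> None:
    """Create the crawler and start it (needs AWS credentials)."""
    import boto3  # lazy import: building the request does not need boto3

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


# Hypothetical names throughout.
request = glue_crawler_request(
    "weblogs-crawler",
    "arn:aws:iam::123456789012:role/ExampleGlueRole",
    "weblogs",
    "s3://example-bucket/logs/",
)
```

After the crawler finishes, the tables it created in the `weblogs` database are what an Athena query would reference.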
6
Q
QuickSight
A
Data visualization: how do we take all of this big data, all of this information, and put it in a graph, something that a human can look at?
7
Q
Elasticsearch
A
Allows you to create an ELK stack (Elasticsearch, Logstash, and Kibana)
- Third-party logging solution, something that allows you to analyze logs and unstructured data
- We’re not looking for a proprietary AWS tool
- Very common third-party way to look over the logs coming from your EC2 instances or even on-premises servers
- Elasticsearch equals ELK equals logs.
- We are just looking for a way to analyze and visualize our logs without using CloudWatch Logs.
- If the scenario is looking for a third-party logging solution, look for Elasticsearch, Logstash, and Kibana, or the ELK stack.