Chapter 15 - Big Data Flashcards

1
Q

Redshift

A

A data warehouse: it is relational (you query it with SQL), but it is built for analytics and is not a replacement for RDS.

  • We’re not using it as the database behind standard applications
  • Traditionally a single Availability Zone service, so it’s not highly available out of the box
  • You can create multiple clusters in different Availability Zones, but there’s no one-click button that just makes it highly available.
  • Doing that means duplicating your data into every one of those clusters.
2
Q

EMR

A

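Amazon EMR (Elastic MapReduce)

  • Is made up of standard EC2 instances
  • The cluster is provisioned and managed by AWS, but because the nodes are EC2 instances we can use the usual cost-saving measures:
    • Spot Instances (spot market)
    • Reserved Instances
  • If we can time our workloads (for example, run batch jobs when cheap Spot capacity is available), we can save a lot of money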
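Because EMR nodes are ordinary EC2 instances, the Spot savings above can be written straight into the cluster definition. The sketch below is a hypothetical helper (`build_instance_groups` is not an AWS API) that only builds the `InstanceGroups` list accepted by boto3's EMR `run_job_flow`; it makes no AWS call. Keeping master and core nodes on-demand while putting task nodes on Spot is a common pattern assumed here, not a requirement.

```python
def build_instance_groups(instance_type, core_count, task_count, spot_task_bid=None):
    """Hypothetical helper: EMR instance-group specs with on-demand
    master/core nodes and Spot task nodes."""
    groups = [
        {
            "Name": "Master",
            "InstanceRole": "MASTER",
            "Market": "ON_DEMAND",
            "InstanceType": instance_type,
            "InstanceCount": 1,
        },
        {
            "Name": "Core",
            "InstanceRole": "CORE",
            "Market": "ON_DEMAND",  # core nodes store HDFS data, so keep them stable
            "InstanceType": instance_type,
            "InstanceCount": core_count,
        },
    ]
    if task_count > 0:
        task = {
            "Name": "Task (Spot)",
            "InstanceRole": "TASK",
            "Market": "SPOT",  # interruptible, cheaper capacity for pure compute
            "InstanceType": instance_type,
            "InstanceCount": task_count,
        }
        if spot_task_bid is not None:
            # Max price in USD as a string; if omitted, EMR caps Spot at the on-demand price
            task["BidPrice"] = spot_task_bid
        groups.append(task)
    return groups
```

You would pass the result as `Instances={"InstanceGroups": groups, ...}` to `boto3.client("emr").run_job_flow(...)`, together with the cluster's other settings (name, release label, IAM roles).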
3
Q

Kinesis

A

Kinesis is the go-to service for anything real-time.

  • If you see the word real-time, think Kinesis.
  • Kinesis can act as a queue
  • Kinesis Data Streams - when real-time is a requirement
  • Kinesis Data Firehose - when you want automatic scaling and ease of use (delivery is near real-time)
  • SQS is easier and simpler, but it’s not real-time
  • SQS can only store messages for up to 14 days.
  • Kinesis is faster (it is real-time), but it’s a bit more complicated
  • Kinesis Data Streams can store data for up to a year (365 days) if retention is configured for it.
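The retention limits above are easy to mix up because each service takes a different unit. This is a minimal sketch with hypothetical helper names: SQS takes `MessageRetentionPeriod` in seconds (60 seconds up to 14 days), while Kinesis Data Streams takes `RetentionPeriodHours` (24 hours up to 365 days). No AWS call is made; the functions only compute and validate the values.

```python
SQS_MIN_SECONDS = 60
SQS_MAX_SECONDS = 14 * 24 * 60 * 60  # 14 days = 1,209,600 seconds
KINESIS_MIN_HOURS = 24               # 1 day (the default)
KINESIS_MAX_HOURS = 365 * 24         # 365 days = 8,760 hours


def sqs_retention_attribute(days):
    """Hypothetical helper: SQS queue attribute for a retention given in days."""
    seconds = int(days * 24 * 60 * 60)
    if not SQS_MIN_SECONDS <= seconds <= SQS_MAX_SECONDS:
        raise ValueError(f"SQS cannot retain messages for {days} days (max 14)")
    # SQS queue attribute values are passed as strings
    return {"MessageRetentionPeriod": str(seconds)}


def kinesis_retention_hours(days):
    """Hypothetical helper: Kinesis Data Streams retention in hours."""
    hours = int(days * 24)
    if not KINESIS_MIN_HOURS <= hours <= KINESIS_MAX_HOURS:
        raise ValueError(f"Kinesis Data Streams retention must be 1-365 days, got {days}")
    return hours
```

With boto3 these values would feed `sqs.set_queue_attributes(QueueUrl=..., Attributes=sqs_retention_attribute(14))` and `kinesis.increase_stream_retention_period(StreamName=..., RetentionPeriodHours=kinesis_retention_hours(365))`; asking SQS for 30 days fails, which is exactly the exam distinction.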
4
Q

Athena

A

If you’re trying to query anything stored in S3 with SQL, without loading it into a database first, think Athena
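To make "query S3 with Athena" concrete, here is a minimal sketch of starting a query and polling until it finishes. `start_query_execution` and `get_query_execution` are real boto3 Athena client methods; the wrapper name, the database, and the S3 output location are hypothetical. The client is passed in so the function can be exercised without AWS credentials.

```python
import time


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Start an Athena query over data in S3 and wait for a final state.

    `client` is expected to behave like boto3.client("athena").
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},  # where results land in S3
    )
    query_id = started["QueryExecutionId"]
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, state
        time.sleep(poll_seconds)  # still QUEUED or RUNNING
```

In practice you would call it as `run_athena_query(boto3.client("athena"), "SELECT ...", "my_db", "s3://my-bucket/athena-results/")`, with the database and bucket names replaced by your own.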

5
Q

Glue

A

Serverless ETL tool

Can create the schema for your data stored in S3 (for example, a Glue crawler infers it and writes table definitions to the Glue Data Catalog), and then Athena can query it
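The Glue-then-Athena hand-off can be sketched in a few lines. `create_crawler` and `start_crawler` are real boto3 Glue client methods; the helper name, crawler name, IAM role ARN, database, and S3 path are hypothetical placeholders. Once the crawler has run, the tables it writes to the Data Catalog are what Athena queries.

```python
def catalog_s3_data(glue, crawler_name, role_arn, database, s3_path):
    """Point a Glue crawler at an S3 prefix so it infers the schema and
    records table definitions in the Glue Data Catalog.

    `glue` is expected to behave like boto3.client("glue").
    """
    glue.create_crawler(
        Name=crawler_name,
        Role=role_arn,          # IAM role the crawler assumes to read the bucket
        DatabaseName=database,  # Data Catalog database that receives the tables
        Targets={"S3Targets": [{"Path": s3_path}]},
    )
    glue.start_crawler(Name=crawler_name)  # runs asynchronously
```

The crawler runs in the background, so a real workflow would wait for it to finish before pointing Athena at the new tables.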

6
Q

QuickSight

A

Visualizing data: how do we take all of this big data, all of this information, and turn it into graphs and dashboards that a human can look at?

7
Q

Elasticsearch

A

Allows you to create an ELK stack (Elasticsearch, Logstash, and Kibana)

  • A third-party logging solution: something that allows you to analyze logs and other unstructured data
  • Use it when we’re not looking for a proprietary AWS tool
  • A very common third-party way to look over logs coming from your EC2 instances or even on-prem servers
  • Elasticsearch equals ELK equals logs.
  • We just want a way to analyze and visualize our logs without using CloudWatch Logs.
  • If the scenario is looking for a third-party logging solution, look for Elasticsearch, Logstash, and Kibana, i.e. the ELK stack.
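In an ELK stack, Logstash (or another shipper) parses log lines and sends them to Elasticsearch, often through the `_bulk` API, which takes newline-delimited JSON: an action line, then the document, with a trailing newline on the body. The sketch below is a simplified stand-in for that shipping step, not Logstash itself; the helper name, the `message` field, and the index name are assumptions.

```python
import json


def to_bulk_body(log_lines, index):
    """Turn raw log lines into an Elasticsearch _bulk request body (NDJSON).

    Each document is preceded by an action line; the body must end with a newline.
    """
    if not log_lines:
        return ""  # nothing to index
    parts = []
    for line in log_lines:
        parts.append(json.dumps({"index": {"_index": index}}))  # action line
        parts.append(json.dumps({"message": line}))  # the log line as a document
    return "\n".join(parts) + "\n"
```

The resulting string would be POSTed to the cluster's `/_bulk` endpoint with `Content-Type: application/x-ndjson`, after which the documents can be searched and charted in Kibana.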