Big Data (Redshift) Flashcards by John M

How large can Redshift scale?

16 Petabytes

What is EMR and what should you use it for?

Elastic MapReduce
Lets you process vast amounts of data from various sources
Stores the results in S3
Runs on a managed fleet of EC2 instances running:
- Big Data: Hadoop or Spark
- BI: Hive and Pig
https://tutorialsdojo.com/amazon-emr/

When using EMR, where does it run?

EC2 Fleets.
EC2 rules apply.

What is AWS Kinesis?

Allows you to ingest, process, and analyze real-time streaming data.

What are the two forms of Kinesis?

Kinesis Data Streams
- Real time
- Unlimited destinations
- Roll-your-own with the Kinesis SDK
Kinesis Firehose
- Near real time (60s)
- Limited destinations
- Plug and play
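
The "roll-your-own" versus "plug and play" difference shows up in how a producer writes a record. Below is a minimal sketch, assuming the client objects are boto3 clients (`boto3.client("kinesis")` and `boto3.client("firehose")`); the helper names and the stream names in the usage note are hypothetical, chosen only for illustration.

```python
import json


def send_to_data_stream(client, stream_name, event, partition_key):
    """Write one event to a Kinesis data stream (you build the consumers)."""
    return client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=partition_key,  # decides which shard receives the record
    )


def send_to_firehose(client, delivery_stream_name, event):
    """Write one event to a Firehose delivery stream (AWS delivers it onward)."""
    return client.put_record(
        DeliveryStreamName=delivery_stream_name,
        # Newline-delimit records so they can be split apart at the destination
        Record={"Data": (json.dumps(event) + "\n").encode("utf-8")},
    )
```

With real clients this might be called as `send_to_data_stream(boto3.client("kinesis"), "clicks", {"user": "u1"}, "u1")`. Note that only the Data Streams call needs a partition key, because there you are responsible for sharding and for the consuming side.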

What are the valid destinations of Kinesis Firehose?

S3
Redshift
Elasticsearch
Splunk

What is Kinesis Data Analytics?

Pair it with Firehose or Data Streams
Analyze with SQL
Fully managed
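
To get a feel for "analyze with SQL", here is a rough local analogy: counting events per 60-second window. It uses Python's built-in `sqlite3` purely as a stand-in; it is not the Kinesis Data Analytics SQL dialect, and the real service runs such queries continuously over a stream rather than once over a table. The `clicks` table and the sample events are made up.

```python
import sqlite3

# Hypothetical click events: (epoch_seconds, page)
events = [
    (0, "/home"), (12, "/home"), (59, "/cart"),
    (60, "/home"), (75, "/cart"), (119, "/cart"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (ts INTEGER, page TEXT)")
conn.executemany("INSERT INTO clicks VALUES (?, ?)", events)

# Bucket each event into a 60-second window, then count clicks per page
rows = conn.execute(
    """
    SELECT (ts / 60) * 60 AS window_start, page, COUNT(*) AS clicks
    FROM clicks
    GROUP BY window_start, page
    ORDER BY window_start, page
    """
).fetchall()

for window_start, page, clicks in rows:
    print(window_start, page, clicks)
```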

How do you choose Kinesis vs. SQS?

Kinesis is for real-time streaming data; SQS is a message queue

Which one scales automatically: Kinesis Data Streams or Kinesis Firehose?

Kinesis Firehose

What is AWS Glue?

ETL (extract, transform, load) tool
Allows you to cleanse data for a data warehouse or data lake.
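
As a rough illustration of what "cleansing" means in the transform step, here is a plain-Python sketch. Real Glue jobs are typically written as Spark (PySpark or Scala) or Python shell scripts; the `cleanse` function, the field names, and the sample rows below are all hypothetical.

```python
def cleanse(records):
    """Drop incomplete rows, normalise fields, and remove duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        email = (rec.get("email") or "").strip().lower()
        if not email:        # drop rows missing the key field
            continue
        if email in seen:    # drop duplicates of an already-seen email
            continue
        seen.add(email)
        cleaned.append({
            "email": email,
            "country": (rec.get("country") or "unknown").strip().upper(),
        })
    return cleaned


raw = [
    {"email": " Ann@Example.com ", "country": "us"},
    {"email": "ann@example.com", "country": "US"},
    {"email": None, "country": "de"},
    {"email": "bob@example.com", "country": None},
]
print(cleanse(raw))
```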
