Big Data (Redshift) Flashcards
1
Q
How large can Redshift scale?
A
16 Petabytes
2
Q
What is EMR and what should you use it for?
A
- Elastic MapReduce
- Allows you to process VAST amounts of data from various platforms
- Stores the result in S3
- Runs on a managed fleet of EC2 instances running:
  - Big Data: Hadoop or Spark
  - BI: Hive and Pig
- https://tutorialsdojo.com/amazon-emr/
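To make the MapReduce model behind EMR's name concrete, here is a minimal single-process sketch of a word count in plain Python. It does not use Hadoop, Spark, or any EMR API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and only illustrate the map, group-by-key, and reduce steps that a cluster would distribute across EC2 instances.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Run the three steps in sequence on an in-memory list of lines."""
    return reduce_phase(shuffle(map_phase(lines)))
```

On a real cluster the input would typically come from S3 and the result would be written back there; this sketch keeps everything in memory.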
3
Q
When using EMR, where does it run?
A
- EC2 Fleets.
- EC2 rules apply.
4
Q
What is AWS Kinesis?
A
- Allows you to ingest, process, and analyze real-time streaming data.
5
Q
What are the two forms of Kinesis?
A
- Kinesis Data Streams
  - Real time
  - Unlimited destinations
  - Roll-your-own with the Kinesis SDK
- Kinesis Firehose
  - Near real time (60s)
  - Limited destinations
  - Plug and play
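As a rough illustration of the "roll-your-own" side of Kinesis Data Streams, the sketch below builds the arguments for a single `put_record` call. `StreamName`, `Data`, and `PartitionKey` are the parameters the boto3 Kinesis client expects; the helper name `build_put_record_request`, the stream name `example-stream`, and the event payload are assumptions made up for this example.

```python
import json


def build_put_record_request(stream_name, event, partition_key):
    """Build keyword arguments for the Kinesis put_record call (hypothetical helper)."""
    return {
        "StreamName": stream_name,
        # Kinesis stores the record payload as raw bytes.
        "Data": json.dumps(event).encode("utf-8"),
        # Records with the same partition key are routed to the same shard.
        "PartitionKey": partition_key,
    }


# Hypothetical stream name and payload, for illustration only.
request = build_put_record_request(
    "example-stream", {"sensor": "a1", "temp": 21.5}, "a1"
)

# With boto3 installed and AWS credentials configured, this could be sent as:
#   import boto3
#   boto3.client("kinesis").put_record(**request)
```

The call itself is left as a comment so the sketch runs without AWS access. With Firehose, by contrast, delivery to the destination is configured on the delivery stream rather than written by hand.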
6
Q
What are the valid destinations of Kinesis Firehose?
A
- S3
- Redshift
- Elasticsearch
- Splunk
7
Q
What is Kinesis Data Analytics?
A
- Pair this with Firehose or Data Streams
- Analyze with SQL
- Fully managed
8
Q
How do you choose between Kinesis and SQS?
A
- Kinesis is real-time
9
Q
Which one scales automatically: Kinesis Data Streams or Kinesis Firehose?
A
- Kinesis Firehose
10
Q
What is AWS Glue?
A
- ETL Tool
- Allows you to cleanse data for a data warehouse or data lake.
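To show what "cleansing" means in an ETL step, here is a minimal extract-transform-load sketch in plain Python. It does not use the Glue API; the sample data, column names, and the functions `extract`, `transform`, and `load` are all assumptions for illustration. A real Glue job would read from and write to stores such as S3.

```python
import csv
import io

# Hypothetical raw input with stray whitespace, mixed case, and a missing email.
RAW_CSV = """id,email,country
1, Alice@Example.com ,us
2,,de
3,bob@example.com,DE
"""


def extract(text):
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Drop rows without an email and normalize the remaining fields."""
    cleaned = []
    for row in rows:
        email = row["email"].strip().lower()
        if not email:
            continue  # skip incomplete records
        cleaned.append(
            {
                "id": int(row["id"]),
                "email": email,
                "country": row["country"].strip().upper(),
            }
        )
    return cleaned


def load(rows):
    """Serialize cleaned rows back to CSV text (stand-in for a warehouse write)."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["id", "email", "country"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Calling `load(transform(extract(RAW_CSV)))` chains the three stages; the row with the empty email is dropped and the other two are normalized.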