Big Data (Redshift) Flashcards

1
Q

How large can Redshift scale?

A

16 petabytes

2
Q

What is EMR and what should you use it for?

A
  • Elastic MapReduce
  • Allows you to process vast amounts of data from various platforms
  • Stores the results in S3
  • Runs on a managed fleet of EC2 instances running:
    • Big Data: Hadoop or Spark
    • BI: Hive and Pig
  • https://tutorialsdojo.com/amazon-emr/
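
To make the "MapReduce" part of the name concrete, here is a minimal pure-Python sketch of the map, shuffle, and reduce phases using a word count. It is only an illustration of the pattern that Hadoop runs at cluster scale; it does not use EMR, Hadoop, or Spark, and the function names are made up for this card.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into one count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Run the three phases in order on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


counts = word_count(["big data on EMR", "big clusters process big data"])
print(counts)
```

On a real cluster the map and reduce steps run in parallel across many EC2 instances, and the shuffle moves data between them over the network.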
3
Q
  • When using EMR, where does it run?
A
  • On a managed fleet of EC2 instances.
  • Standard EC2 rules apply.
4
Q

What is AWS Kinesis?

A
  • Allows you to ingest, process, and analyze real-time streaming data.
5
Q

What are the two forms of Kinesis?

A
  • Kinesis Data Streams
    • Real time
    • Unlimited destinations
    • Roll-your-own with the Kinesis SDK
  • Kinesis Firehose
    • Near real time (about 60 s delay)
    • Limited destinations
    • Plug and play
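
The "roll-your-own" versus "plug and play" difference shows up in the producer code. The sketch below assumes the clients behave like `boto3.client("kinesis")` and `boto3.client("firehose")`; they are passed in as parameters so the functions can be exercised without AWS credentials. The function names, stream names, and event shape are hypothetical.

```python
import json


def send_to_data_stream(kinesis_client, stream_name, event, partition_key):
    """Write one record to a Kinesis data stream.

    The caller picks the partition key, which decides the target shard,
    and must also build its own consumers to read the stream.
    """
    return kinesis_client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=partition_key,
    )


def send_to_firehose(firehose_client, delivery_stream_name, event):
    """Write one record to a Firehose delivery stream.

    No partition key and no consumer code: Firehose buffers the records
    and delivers them to the destination configured on the stream.
    """
    payload = (json.dumps(event) + "\n").encode("utf-8")  # newline-delimited JSON
    return firehose_client.put_record(
        DeliveryStreamName=delivery_stream_name,
        Record={"Data": payload},
    )
```

With real clients this would be called as, for example, `send_to_data_stream(boto3.client("kinesis"), "clicks", {"id": 1}, "user-1")`, assuming a stream named `clicks` exists.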
6
Q

What are the valid destinations of Kinesis Firehose?

A
  • S3
  • Redshift
  • Elasticsearch
  • Splunk
7
Q

What is Kinesis Data Analytics?

A
  • Pair this with Firehose or Data Streams
  • Analyze with SQL
  • Fully managed
8
Q

How do you choose between Kinesis and SQS?

A
  • Kinesis is real-time
9
Q

Which one scales automatically: Kinesis Data Streams or Kinesis Firehose?

A
  • Kinesis Firehose
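
For contrast, a Data Streams stream in provisioned mode is sized by hand in shards. The sketch below estimates a shard count from the documented per-shard write limits (about 1 MB/s and 1,000 records/s); treating 1 MB as 1,000,000 bytes is a simplifying assumption, and the function name is hypothetical.

```python
import math

# Approximate per-shard write limits for provisioned Kinesis Data Streams.
SHARD_WRITE_BYTES_PER_SEC = 1_000_000  # about 1 MB/s (assumed as 10^6 bytes)
SHARD_WRITE_RECORDS_PER_SEC = 1_000    # 1,000 records/s


def shards_needed(records_per_sec, avg_record_bytes):
    """Return the smallest shard count that satisfies both write limits."""
    by_records = math.ceil(records_per_sec / SHARD_WRITE_RECORDS_PER_SEC)
    by_bytes = math.ceil(
        records_per_sec * avg_record_bytes / SHARD_WRITE_BYTES_PER_SEC
    )
    # A stream always has at least one shard.
    return max(by_records, by_bytes, 1)


print(shards_needed(5_000, 500))    # record-count limit dominates
print(shards_needed(100, 50_000))   # byte-rate limit dominates
```

Firehose needs no such calculation, which is the point of this card.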
10
Q

What is AWS Glue?

A
  • ETL (extract, transform, load) tool
  • Allows you to cleanse data for a data warehouse or data lake.
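
As a rough picture of what "cleansing" means in an ETL job, here is a pure-Python sketch of a transform step. It does not use the Glue API; the field names (`id`, `email`) and the rules are assumptions chosen for illustration.

```python
def cleanse(rows):
    """Transform step of a toy ETL job.

    Trims and lowercases emails, drops rows that lack an id or email,
    and removes duplicate ids (the first occurrence wins).
    """
    seen_ids = set()
    clean = []
    for row in rows:
        record_id = row.get("id")
        email = (row.get("email") or "").strip().lower()
        if record_id is None or not email:
            continue  # incomplete row
        if record_id in seen_ids:
            continue  # duplicate row
        seen_ids.add(record_id)
        clean.append({"id": record_id, "email": email})
    return clean


raw = [
    {"id": 1, "email": "  Ann@Example.com "},
    {"id": 1, "email": "ann@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": "bob@example.com"},
]
cleaned = cleanse(raw)
print(cleaned)
```

In a real pipeline the extract step would read `raw` from a source such as S3, and the load step would write `cleaned` to the data warehouse or data lake.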