Big Data Flashcards
1
Q
Redshift
A
- a fully managed petabyte-scale data warehouse
- a very large relational database
2
Q
Redshift - size
A
- up to 16 PB of data per cluster, so you don't have to split up large data sets
3
Q
Redshift - relational
A
- use standard SQL and your existing BI tools to interact with it
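Because Redshift speaks SQL, a typical BI query is an aggregate over a large table. The sketch below uses Python's built-in sqlite3 purely as a local stand-in for a Redshift connection (an assumption for illustration; a real client would connect to the cluster endpoint with a PostgreSQL-compatible driver), and the sales table, its columns, and its rows are hypothetical.

```python
import sqlite3

def revenue_by_region(conn: sqlite3.Connection) -> list[tuple[str, float]]:
    """BI-style aggregate: total revenue per region, largest first."""
    query = """
        SELECT region, SUM(amount) AS revenue
        FROM sales
        GROUP BY region
        ORDER BY revenue DESC
    """
    return conn.execute(query).fetchall()

# Hypothetical sample data in an in-memory database (stand-in for the warehouse).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("us-east", 120.0), ("eu-west", 80.0), ("us-east", 30.0), ("ap-south", 95.0)],
)
print(revenue_by_region(conn))  # [('us-east', 150.0), ('ap-south', 95.0), ('eu-west', 80.0)]
```

The same SELECT would run unchanged in most SQL engines, which is the point of the card: existing SQL skills and BI tools carry over.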
4
Q
Redshift use cases
A
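- BI (business intelligence) applications
- not a replacement for standard RDS: Redshift is built for analytics (OLAP), while RDS serves transactional (OLTP) workloads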
5
Q
Limitations of Redshift
A
- not highly available by default
- a cluster traditionally exists within only one AZ (RA3 clusters can now be deployed Multi-AZ)
6
Q
EMR
A
Elastic MapReduce
- ETL (Extract, Transform, Load)
- an AWS managed big data platform that allows you to process vast amounts of data using open-source tools such as Spark, Hive, HBase, Flink, Hudi, and Presto
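EMR's name comes from the MapReduce model that frameworks such as Hadoop and Spark distribute across the cluster's EC2 instances. The pure-Python sketch below only illustrates the map, shuffle, and reduce steps in a single process; it is not EMR or Spark code, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line: str) -> list[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: combine the values for each key (here, a sum)."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(lines: list[str]) -> dict[str, int]:
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle_phase(mapped))

print(word_count(["spark hive spark", "Hive presto"]))  # {'spark': 2, 'hive': 2, 'presto': 1}
```

On a real cluster each phase runs in parallel on many nodes, which is why the EC2 fleet size (and its cost) matters.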
7
Q
EMR exam tips
A
- open-source cluster
- a managed fleet of EC2 instances running open-source tools
- EC2 rules apply: use Spot Instances and Reserved Instances (RIs) to reduce your costs
- it processes and moves data
8
Q
Kinesis
A
- a big highway to transport data
- allows you to ingest, process, and analyze real-time streaming data
9
Q
Kinesis Data Streams
A
- real-time streaming for ingesting data
- you're responsible for creating the consumer and scaling the stream
- older than Firehose
- a lot of overhead to configure
- does not automatically scale in provisioned mode (the later on-demand capacity mode scales automatically)
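With Data Streams you size the stream yourself in shards. Kinesis maps each record's partition key through an MD5 hash to a 128-bit integer and writes the record to the shard whose hash key range contains it. The local sketch below models that routing under the assumption that the key space is split evenly across shards; the function names are hypothetical, and a real producer calls the service (for example PutRecord) rather than computing shards itself.

```python
import hashlib

MAX_HASH_KEY = 2**128 - 1  # Kinesis hash keys are 128-bit integers

def shard_ranges(shard_count: int) -> list[tuple[int, int]]:
    """Split the hash key space evenly across shards (simplifying assumption)."""
    size = (MAX_HASH_KEY + 1) // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # The last shard absorbs any remainder so the whole space is covered.
        end = MAX_HASH_KEY if i == shard_count - 1 else start + size - 1
        ranges.append((start, end))
    return ranges

def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard whose range contains MD5(partition_key)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    for index, (start, end) in enumerate(shard_ranges(shard_count)):
        if start <= hash_key <= end:
            return index
    raise ValueError("hash key outside every shard range")

for key in ["sensor-1", "sensor-2", "sensor-3"]:
    print(key, "->", shard_for_key(key, 4))
```

Each shard has fixed throughput, so a growing stream needs more shards; in provisioned mode, triggering that resharding is your job.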
10
Q
Kinesis Data Firehose
A
- data transfer tool to deliver data to S3, Redshift, Elasticsearch (now Amazon OpenSearch Service), and Splunk
- speed: near real time, typically within 60 seconds (it buffers data before delivering)
- plug and play with AWS architecture
- scales automatically
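Firehose trades a little latency for zero management: it buffers incoming records and delivers a batch when either a size threshold or a time threshold is reached, whichever comes first, which is why delivery is near real time rather than instant. The class below is a local toy model of that size-or-time rule, not the Firehose API; the parameter names, the fake clock, and the deliver callback (standing in for an S3 write) are assumptions for illustration.

```python
import time
from typing import Callable

class BufferedDelivery:
    """Toy model of size-or-time buffering before delivery (hypothetical names)."""

    def __init__(self, buffer_bytes: int, buffer_seconds: float,
                 deliver: Callable[[list[bytes]], None],
                 clock: Callable[[], float] = time.monotonic):
        self.buffer_bytes = buffer_bytes
        self.buffer_seconds = buffer_seconds
        self.deliver = deliver
        self.clock = clock
        self.records: list[bytes] = []
        self.size = 0
        self.opened_at = clock()

    def put(self, record: bytes) -> None:
        """Add a record, then flush if either threshold has been reached."""
        if not self.records:
            self.opened_at = self.clock()  # a new batch starts its timer
        self.records.append(record)
        self.size += len(record)
        self.maybe_flush()

    def maybe_flush(self) -> None:
        """Deliver the batch once it is big enough or old enough."""
        if not self.records:
            return
        too_big = self.size >= self.buffer_bytes
        too_old = self.clock() - self.opened_at >= self.buffer_seconds
        if too_big or too_old:
            self.deliver(self.records)
            self.records, self.size = [], 0

# Demo with a fake clock so the timing is deterministic.
batches: list[list[bytes]] = []
now = [0.0]
sink = BufferedDelivery(buffer_bytes=10, buffer_seconds=60,
                        deliver=batches.append, clock=lambda: now[0])
sink.put(b"abc")       # 3 bytes at 0 s: stays buffered
sink.put(b"defghijk")  # 11 bytes total: size threshold triggers delivery
now[0] = 30.0
sink.put(b"x")         # new batch opened at 30 s
now[0] = 95.0
sink.maybe_flush()     # batch is 65 s old: time threshold triggers delivery
print(batches)  # [[b'abc', b'defghijk'], [b'x']]
```

A real service would also check the timer in the background; this sketch only checks on put() or an explicit maybe_flush() call.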
11
Q
Kinesis Data Analytics
A
- paired with Kinesis Data Firehose or Kinesis Data Streams as its source
- lets you analyze streaming data using SQL
- easy and simple to use
- serverless (fully managed)
- pay per use
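A common Kinesis Data Analytics pattern is counting events per key in fixed, non-overlapping (tumbling) time windows, which in SQL is roughly a GROUP BY on the key plus a time window. The pure-Python sketch below computes the same kind of result locally for a small list of hypothetical (timestamp_seconds, key) events; it illustrates the windowing idea only and is not the service's API or SQL dialect.

```python
from collections import Counter, defaultdict

def tumbling_window_counts(events, window_seconds: int) -> dict[int, Counter]:
    """Count events per key in fixed, non-overlapping windows.

    events: iterable of (timestamp_seconds, key) pairs (hypothetical shape).
    Returns {window_start_seconds: Counter({key: count})}.
    """
    windows: dict[int, Counter] = defaultdict(Counter)
    for timestamp, key in events:
        window_start = int(timestamp // window_seconds) * window_seconds
        windows[window_start][key] += 1
    return dict(windows)

# Hypothetical ticker events: (seconds since start, symbol).
events = [(3, "AMZN"), (42, "AMZN"), (59, "MSFT"), (61, "AMZN"), (118, "MSFT")]
print(tumbling_window_counts(events, 60))
# {0: Counter({'AMZN': 2, 'MSFT': 1}), 60: Counter({'AMZN': 1, 'MSFT': 1})}
```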
12
Q
How long can Kinesis store data?
A
Kinesis Data Streams keeps data for 24 hours by default, configurable up to one year (365 days)
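The retention window is plain arithmetic: 24 hours by default and at most 365 × 24 = 8,760 hours. The helper below is a hypothetical sketch (not an AWS SDK call) that computes roughly when a record ages out of a stream for a given retention setting.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_RETENTION_HOURS = 24    # Kinesis Data Streams default retention
MAX_RETENTION_HOURS = 365 * 24  # one year = 8,760 hours

def record_expires_at(arrival: datetime,
                      retention_hours: int = DEFAULT_RETENTION_HOURS) -> datetime:
    """Return roughly when a record that arrived at `arrival` ages out."""
    if not DEFAULT_RETENTION_HOURS <= retention_hours <= MAX_RETENTION_HOURS:
        raise ValueError(
            f"retention must be {DEFAULT_RETENTION_HOURS}-{MAX_RETENTION_HOURS} hours")
    return arrival + timedelta(hours=retention_hours)

arrival = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(record_expires_at(arrival))                       # 2024-01-02 00:00:00+00:00
print(record_expires_at(arrival, MAX_RETENTION_HOURS))  # 2024-12-31 00:00:00+00:00
```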
13
Q
When to use SQS over Kinesis?
A
- slightly delayed message delivery is acceptable
- little configuration needed
- simple to use
14
Q
When to use Kinesis over SQS?
A
- real-time message delivery is required
- more complicated to configure
- mostly used for big data applications
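A core difference behind the SQS-versus-Kinesis choice is consumption semantics: a queue hands each message to one consumer and then deletes it, while a stream keeps records for its retention period so several consumers can each read (and replay) them from their own position. The two classes below are hypothetical toy models of that difference only; they ignore SQS visibility timeouts, ordering guarantees, and Kinesis shards.

```python
from collections import deque
from typing import Optional

class ToyQueue:
    """SQS-like model: each message goes to one consumer and is then deleted."""

    def __init__(self) -> None:
        self._messages: deque[str] = deque()

    def send(self, body: str) -> None:
        self._messages.append(body)

    def receive_and_delete(self) -> Optional[str]:
        """Pop the oldest message; return None when the queue is empty."""
        return self._messages.popleft() if self._messages else None

class ToyStream:
    """Kinesis-like model: records stay put; each consumer tracks its position."""

    def __init__(self) -> None:
        self._records: list[str] = []

    def put(self, data: str) -> None:
        self._records.append(data)

    def read_from(self, position: int) -> list[str]:
        """Return every record at or after `position` without removing any."""
        return self._records[position:]

queue, stream = ToyQueue(), ToyStream()
for event in ["click", "view", "buy"]:
    queue.send(event)
    stream.put(event)

print(queue.receive_and_delete())  # click (now gone for every other consumer)
print(stream.read_from(0))         # ['click', 'view', 'buy'] (consumer A)
print(stream.read_from(0))         # ['click', 'view', 'buy'] (consumer B replays it)
```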
15
Q
What is the easiest way to process streaming data going through Kinesis using SQL?
A
Kinesis Data Analytics