Big Data Flashcards
What is Redshift?
A fully managed, petabyte-scale data warehouse service in the cloud.
How much information can Redshift hold?
16 petabytes
Is Redshift relational?
Yes
What is a typical use case for Redshift?
Business Intelligence
Is Redshift a better RDS?
No. Redshift is built for analytics (OLAP) and is not meant to replace RDS, which serves transactional (OLTP) workloads.
What is EMR?
A managed big data platform that allows you to process vast amounts of data (AWS's ETL tool).
What is Kinesis?
A service that allows you to ingest, process, and analyze real-time streaming data (think of it as a huge data highway).
What is Kinesis Data Streams for?
Real-time streaming for ingesting data.
What is Kinesis Data Firehose for?
A data transfer tool for delivering information to S3, Redshift, Elasticsearch, or Splunk.
What is the downside to Kinesis Data Streams?
It takes a lot of work to set up (you must specify shards and write a data consumer).
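To make "specify shards" concrete, here is a minimal sketch of a shard-count estimate for a provisioned-mode stream. It assumes the per-shard limits AWS documents (1 MB/s or 1,000 records/s for writes, 2 MB/s for reads); the function name `shards_needed` and the traffic figures are hypothetical.

```python
import math

# Per-shard limits for a provisioned-mode stream (assumed from AWS documentation).
WRITE_MB_PER_SHARD = 1.0        # MB per second written
WRITE_RECORDS_PER_SHARD = 1000  # records per second written
READ_MB_PER_SHARD = 2.0         # MB per second read


def shards_needed(write_mb_per_s, records_per_s, read_mb_per_s):
    """Return the smallest shard count that satisfies every per-shard limit."""
    by_write_bytes = math.ceil(write_mb_per_s / WRITE_MB_PER_SHARD)
    by_write_records = math.ceil(records_per_s / WRITE_RECORDS_PER_SHARD)
    by_read_bytes = math.ceil(read_mb_per_s / READ_MB_PER_SHARD)
    # A stream always has at least one shard.
    return max(1, by_write_bytes, by_write_records, by_read_bytes)


# Hypothetical workload: 5 MB/s in, 2,000 records/s, 4 MB/s out.
print(shards_needed(5, 2000, 4))
```

The tightest of the three limits decides the answer, which is why the estimate takes the maximum rather than the sum.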
What can Kinesis Data Firehose be thought of as?
A simpler data stream.
What is Kinesis Data Analytics?
A service that allows us to analyze data in the pipeline using standard SQL.
When would you choose Kinesis over SQS for messages?
If messages need real-time delivery.
Does Kinesis Data Streams or Kinesis Data Firehose automatically scale?
Kinesis Data Firehose. Data Streams in provisioned mode requires you to manage the shards yourself.
What is AWS Athena?
An interactive query service that makes it easy to analyze data in S3 using SQL. It allows you to query data directly in S3 without loading it into a database.
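As an illustration of querying S3 data with SQL, here is a minimal sketch that submits an Athena query through boto3's `start_query_execution`. The table `web_logs`, the database `analytics`, and the bucket `s3://example-athena-results/` are hypothetical placeholders, and running the query assumes boto3 is installed and AWS credentials are configured.

```python
def build_athena_request(sql, database, output_location):
    """Build the keyword arguments for Athena's start_query_execution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(sql, database, output_location):
    """Submit the query and return its execution ID (requires boto3 and credentials)."""
    import boto3  # imported here so the request builder works without boto3

    client = boto3.client("athena")
    response = client.start_query_execution(
        **build_athena_request(sql, database, output_location)
    )
    return response["QueryExecutionId"]


# Hypothetical names throughout; substitute your own table, database, and bucket.
request = build_athena_request(
    "SELECT status, COUNT(*) AS hits FROM web_logs GROUP BY status",
    "analytics",
    "s3://example-athena-results/",
)
print(request["QueryString"])
```

The data stays in S3 the whole time. Athena reads it in place and writes only the result set to the output location.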