01 - Analytics Flashcards

1
Q

Amazon Athena

A
  • Serverless query service to perform analytics against S3 objects
  • Uses standard SQL to query the files
  • Built on Presto; supports CSV, JSON, ORC, Avro, and Parquet
  • Pricing: $5.00 per TB of data scanned
  • Use compressed or columnar data for cost savings (less data scanned)
  • Use cases: Business intelligence / analytics / reporting, analyze & query VPC Flow Logs, ELB Logs, CloudTrail trails, etc…
  • Exam Tip: to analyze data in S3 using serverless SQL, use Athena
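The pricing bullet can be turned into a quick back-of-the-envelope estimate. The sketch below assumes the flat $5.00 per TB rate quoted on this card and ignores any per-query minimums or rounding; the function name and the workload sizes (2 TB raw, 0.25 TB after compression and columnar conversion) are hypothetical and only illustrate why scanning less data costs less.

```python
PRICE_PER_TB_USD = 5.00  # rate quoted on this card


def athena_query_cost(tb_scanned: float, price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Estimate the cost in USD of a query that scans `tb_scanned` terabytes."""
    if tb_scanned < 0:
        raise ValueError("tb_scanned must be non-negative")
    return tb_scanned * price_per_tb


# Hypothetical workload: a query that scans a full 2 TB of raw CSV.
raw_cost = athena_query_cost(2.0)

# Assumed for illustration: the same query scans only 0.25 TB once the data
# is compressed and stored in a columnar format such as Parquet.
optimized_cost = athena_query_cost(0.25)

print(f"raw: ${raw_cost:.2f}, optimized: ${optimized_cost:.2f}")
```

Under these assumed sizes the estimate drops from $10.00 to $1.25 per query, which is the cost-saving effect the card describes.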
2
Q

AWS Glue

A
  • Managed extract, transform, and load (ETL) service
  • Useful to prepare and transform data for analytics
  • Fully serverless service
3
Q

Amazon Elasticsearch Service (Amazon ES)

A
  • With Elasticsearch, you can search any field, even with partial matches
  • It’s common to use Elasticsearch as a complement to another database
  • Elasticsearch also has some usage for Big Data applications
  • You can provision a cluster of instances
  • Built-in integrations: Amazon Kinesis Data Firehose, AWS IoT, and Amazon CloudWatch Logs for data ingestion
  • Security through Cognito & IAM, KMS encryption, SSL & VPC
  • Comes with Kibana (visualization) & Logstash (log ingestion) – ELK stack
4
Q

Amazon EMR

A
  • EMR stands for “Elastic MapReduce”
  • EMR helps create Hadoop clusters (Big Data) to analyze and process vast amounts of data
  • The clusters can be made of hundreds of EC2 instances
  • Also supports Apache Spark, HBase, Presto, Flink…
  • EMR takes care of all the provisioning and configuration
  • Supports auto scaling and integrates with Spot Instances
  • Use cases: data processing, machine learning, web indexing, big data…