Analytics Flashcards

Question 1

Q

Amazon Athena

Answer

A

interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL
Serverless, so there is no infrastructure to manage, and you pay only for the queries you run
Athena uses a managed Data Catalog (AWS Glue) to store information and schemas about the databases and tables that you create for your data stored in Amazon S3
Athena is optimized for fast performance w/ Amazon S3
With Athena, there’s no need for complex ETL jobs to prepare data for analysis
makes it easy for anyone w/ SQL skills to quickly analyze large-scale datasets
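
A minimal sketch of what "query data in S3 using SQL" could look like from Python. The table `web_logs`, the database `analytics_db`, and the bucket `s3://example-athena-results/` are hypothetical names chosen for illustration. The request shape follows boto3's Athena `start_query_execution` call, which runs only if you call `run_query` yourself with boto3 installed and AWS credentials configured.

```python
def build_athena_request(sql, database, output_location):
    """Build keyword arguments for the boto3 Athena client's start_query_execution."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(request):
    """Submit the query. Needs boto3 and AWS credentials; not called in this sketch."""
    import boto3  # imported lazily so the sketch loads without boto3 installed

    client = boto3.client("athena")
    response = client.start_query_execution(**request)
    return response["QueryExecutionId"]


# Hypothetical table, database, and results bucket, for illustration only.
request = build_athena_request(
    sql="SELECT status, COUNT(*) AS hits FROM web_logs GROUP BY status",
    database="analytics_db",
    output_location="s3://example-athena-results/",
)
```

Athena writes query results to the S3 location given in the request, so the only setup in this sketch is a table definition in the Data Catalog and a results bucket.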

Question 2

Q

Amazon Kinesis

Answer

A

makes it easy to collect, process, and analyze real-time, streaming data so you can get timely insights and react quickly to new info

examples of streaming data use cases include

purchases from online stores
stock prices
game data (statistics and results as the gamer plays)
social network data
geospatial data (think uber.com)
IoT sensor data

Different types: Data Streams, Data Analytics, Firehose
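
As a rough illustration of the "purchases from online stores" use case, the sketch below packages one purchase event as a Kinesis Data Streams record. The stream name `example-purchases` and the event fields are made up for this card. The request shape follows boto3's Kinesis `put_record` call, which runs only if you call `send_record` yourself with boto3 installed and AWS credentials configured.

```python
import json


def build_kinesis_record(stream_name, event, partition_key):
    """Build keyword arguments for the boto3 Kinesis client's put_record."""
    return {
        "StreamName": stream_name,
        "Data": json.dumps(event).encode("utf-8"),  # payload is sent as bytes
        "PartitionKey": partition_key,  # records with the same key go to the same shard
    }


def send_record(record):
    """Send the record. Needs boto3 and AWS credentials; not called in this sketch."""
    import boto3  # imported lazily so the sketch loads without boto3 installed

    client = boto3.client("kinesis")
    return client.put_record(**record)


# Hypothetical purchase event and stream name, for illustration only.
purchase = {"order_id": "o-1001", "store": "online", "amount": 42.5}
record = build_kinesis_record(
    "example-purchases", purchase, partition_key=purchase["order_id"]
)
```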

Question 3

Q

AWS Data Pipeline

Answer

A

web service that helps you reliably process and move data b/w different AWS compute and storage services, as well as on-premises data sources, at specified intervals
With AWS Data Pipeline, you can regularly access your data where it's stored, transform and process it at scale, and transfer the results to services such as Amazon S3, Amazon RDS, Amazon DynamoDB, and Amazon EMR

Question 4

Q

AWS Glue

Answer

A

fully managed, pay-as-you-go, extract, transform, and load (ETL) service that automates the time-consuming steps of data preparation for analytics
Glue discovers data, transforms it, and prepares it for analytics
Glue provides a unified view of data through the Glue Data Catalog that Athena (and other services) can use
automatically discovers and profiles data via the Glue Data Catalog, recommends and generates ETL code to transform your source data into target schemas
simply point AWS Glue to your data stored on AWS, and AWS Glue discovers data and stores the associated metadata (e.g. table definition and schema) in the AWS Glue Data Catalog
Once cataloged, data is immediately searchable, queryable, and available for ETL
Works with data lakes (e.g. data on S3), data warehouses (including Redshift), and data stores (including RDS or Amazon EC2 databases)
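
To make "discovers data and stores the associated metadata" more concrete, here is a toy schema-inference sketch. It is not Glue's actual algorithm or API. The function `infer_schema`, the sample records, and the `orders` table name are all hypothetical; the sketch only shows the general idea of deriving a table definition from the data itself.

```python
def infer_schema(records):
    """Infer a column -> type-name mapping from sample records (toy logic)."""
    schema = {}
    for record in records:
        for column, value in record.items():
            type_name = type(value).__name__
            seen = schema.get(column)
            if seen is None:
                schema[column] = type_name
            elif seen != type_name:
                # Conflicting types across records: fall back to string.
                schema[column] = "str"
    return schema


# Hypothetical sample data standing in for files discovered on S3.
sample = [
    {"order_id": 1, "amount": 19.99, "country": "DE"},
    {"order_id": 2, "amount": 5.5, "country": "US"},
]

# A catalog entry would hold metadata like this, not the data itself.
table_definition = {"name": "orders", "columns": infer_schema(sample)}
```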

Question 5

Q

Amazon QuickSight

Answer

A

fast, cloud-powered business intelligence service that makes it easy to deliver insights to everyone in your organization
fully managed service
QuickSight lets you easily create and publish interactive dashboards that include ML insights
Dashboards can then be accessed from any device, and embedded into your applications, portals, and websites

Collect and load data - clickstreams, sales orders, IoT, financial data, etc
Data sources - seamlessly connect to your data wherever it lives - in the cloud, in 3rd party applications, or on-premises
Amazon QuickSight - first BI service w/ pay-per-session pricing

Interactive dashboards, email reports, embedded analytics

Question 6

Q

Amazon Elastic MapReduce (EMR)

Answer

A

provides a managed Hadoop framework that makes it easy, fast, and cost-effective to process vast amounts of data across dynamically scalable Amazon EC2 instances.

Apache Hadoop is an open source framework that is used to efficiently store and process large datasets ranging in size from gigabytes to petabytes of data. Instead of using one large computer to store and process the data, Hadoop clusters multiple computers so they can analyze massive datasets in parallel and finish more quickly.
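
The idea of splitting work across machines and combining the results can be sketched with a word count in the MapReduce style. This is a single-machine simulation and not Hadoop's API: threads stand in for cluster nodes, and the names `map_chunk`, `merge_counts`, and `word_count` are invented for this card.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(lines):
    """Map step: count words in one chunk of the input."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial counts."""
    return left + right


def word_count(lines, workers=3):
    # Split the input into one chunk per worker (round-robin).
    chunks = [lines[i::workers] for i in range(workers)]
    # Each worker counts its own chunk independently of the others.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    # Merge the partial counts into one total.
    return reduce(merge_counts, partials, Counter())


lines = ["big data big cluster", "data in parallel", "big parallel jobs"]
totals = word_count(lines)
```

Because each chunk is counted independently, adding more workers (or, on a real cluster, more nodes) lets the map step scale with the size of the input.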