Analytics Flashcards
analytics
the act of querying or processing your data
data warehouse
a data storage solution that aggregates massive amounts of historical data from disparate sources
uses of a data warehouse
querying, reporting, analytics and business intelligence
- not used for transaction processing
Redshift
AWS’s data warehousing solution
- improves speed and efficiency when querying
- scales to petabytes of data; Redshift Spectrum can query exabytes of data in S3
Use cases for Redshift
- to consolidate multiple databases for reporting
- when you need a relational database for analytical queries rather than transaction processing
Athena
a query service for Amazon S3
- can analyze S3 data using SQL
- pay per query
- serverless
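Pay per query means the bill follows the amount of data each query scans, not how long a server runs. A rough sketch of that arithmetic, assuming a rate of 5 USD per terabyte scanned and a 10 MB minimum per query (both figures are assumptions for illustration; check current pricing for your region):

```python
# Hypothetical figures for illustration; actual pricing varies by region.
PRICE_PER_TB_USD = 5.0               # assumed rate per terabyte scanned
MIN_BYTES_PER_QUERY = 10 * 1024**2   # assumed 10 MB minimum charge per query
BYTES_PER_TB = 1024**4


def estimate_query_cost(bytes_scanned: int) -> float:
    """Estimate the cost in USD of one query from the bytes it scans."""
    billable = max(bytes_scanned, MIN_BYTES_PER_QUERY)
    return billable / BYTES_PER_TB * PRICE_PER_TB_USD


full_scan = estimate_query_cost(2 * BYTES_PER_TB)   # query that scans 2 TB
small_scan = estimate_query_cost(200 * 1024**3)     # query that scans 200 GB
print(f"{full_scan:.2f} {small_scan:.2f}")
```

Under these assumed figures, a query that scans less data costs proportionally less, which is why the volume of data read matters more than query runtime.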
Glue
data integration service that prepares your data for analytics
- ETL service
- prepares and loads data
- helps you discover and catalog your data so you can better understand it
- serverless
ETL
Extract, Transform and Load
data integration
the process of preparing and combining data for analytics, machine learning, and application development. It involves multiple tasks, such as discovering and extracting data from various sources; enriching, cleaning, normalizing, and combining data; and loading and organizing data in databases, data warehouses, and data lakes.
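The extract, clean, and load steps described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not AWS Glue itself: the CSV content, the `orders` table, and the in-memory SQLite database standing in for a data warehouse are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; a real source would be a file, API, or database.
RAW_CSV = """order_id,customer,amount
1, alice ,19.99
2,BOB,5.00
3,,12.50
"""


def extract(text: str) -> list[dict]:
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict]) -> list[tuple]:
    """Transform: drop rows with no customer, normalize names, cast types."""
    cleaned = []
    for row in rows:
        customer = row["customer"].strip().lower()
        if not customer:
            continue  # cleaning step: skip incomplete records
        cleaned.append((int(row["order_id"]), customer, float(row["amount"])))
    return cleaned


def load(rows: list[tuple], conn: sqlite3.Connection) -> None:
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a data warehouse
load(transform(extract(RAW_CSV)), conn)
result = conn.execute(
    "SELECT customer, amount FROM orders ORDER BY order_id"
).fetchall()
print(result)  # [('alice', 19.99), ('bob', 5.0)]
```

The row with a missing customer is dropped during the transform step, so only two cleaned records reach the target table.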
Kinesis
allows you to analyze data and video streams in real time
- supports audio, video, application logs, website clickstreams, and IoT telemetry data
Use case for Kinesis
analyze logs in near real time for application monitoring or fraud detection
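A sketch of what this kind of stream analysis might look like: records are processed one at a time as they arrive, and an alert is raised when a user has too many failed logins inside a sliding time window. No Kinesis API is used here; the plain list stands in for a stream, and the record layout, window length, and threshold are all hypothetical.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60   # hypothetical window length
MAX_FAILURES = 3      # hypothetical alert threshold


def detect_suspicious(events):
    """Yield (timestamp, user) whenever a user exceeds MAX_FAILURES failed
    logins within the last WINDOW_SECONDS. Events are consumed one at a time."""
    recent = defaultdict(deque)  # user -> timestamps of recent failures
    for timestamp, user, status in events:
        if status != "FAILED":
            continue
        window = recent[user]
        window.append(timestamp)
        # Evict failures that have fallen out of the time window.
        while window and window[0] <= timestamp - WINDOW_SECONDS:
            window.popleft()
        if len(window) > MAX_FAILURES:
            yield timestamp, user


# Hypothetical log records: (seconds since start, user, login status).
stream = [
    (0, "alice", "FAILED"),
    (5, "bob", "OK"),
    (10, "alice", "FAILED"),
    (20, "alice", "FAILED"),
    (30, "alice", "FAILED"),
    (200, "alice", "FAILED"),
]
alerts = list(detect_suspicious(stream))
print(alerts)  # [(30, 'alice')]
```

The fourth failure within a minute triggers the alert; the failure at 200 seconds does not, because the earlier ones have left the window by then.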
EMR
Elastic MapReduce
- helps you process large amounts of data
- analyzes data using Apache Hadoop
- works with big data frameworks like Apache Spark
Data Pipeline
helps you move data between compute and storage services, whether they run on AWS or on-premises
- moves data at specific intervals or based on conditions
- sends notifications of success or failure
Use case for Data Pipeline
to move data from S3 to Redshift
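The idea of a conditional move with a success or failure notification can be sketched as follows. This is a toy model, not the Data Pipeline API: `run_pipeline`, the precondition, the lists standing in for an S3 bucket and a Redshift table, and the message format are all hypothetical.

```python
def run_pipeline(source, target, precondition, notify):
    """Copy records from source to target if the precondition holds,
    then report the outcome through notify."""
    if not precondition(source):
        notify("SKIPPED: precondition not met")
        return False
    try:
        target.extend(source)  # stand-in for the S3 to Redshift copy
    except Exception as exc:
        notify(f"FAILED: {exc}")
        return False
    notify(f"SUCCESS: copied {len(source)} records")
    return True


messages = []                      # stand-in for a notification channel
staged_files = ["2024-01-01.csv"]  # hypothetical objects waiting in a bucket
warehouse_table = []               # stand-in for a warehouse table


def has_data(source):
    """Hypothetical precondition: only run when there is something to move."""
    return len(source) > 0


run_pipeline(staged_files, warehouse_table, has_data, messages.append)
run_pipeline([], warehouse_table, has_data, messages.append)
print(messages)
```

The first run copies the staged record and reports success; the second is skipped because its source is empty, which mirrors a move that only happens when a condition is met.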