Services - Analytics Flashcards

1
Q

Athena - Characteristics

A
  • It’s a serverless, interactive query service that simplifies analysis of data in S3 using standard SQL
  • Pay only for the queries you run
  • No ETL process is needed because it queries data directly in S3. Supports many input formats such as CSV, JSON, TSV, and others
  • Queries can be executed in parallel
2
Q

Athena - Creation steps

A
  1. Create an S3 bucket
  2. Create a metadata database
  3. Create a schema
  4. (Optional) Fine tune the serializer/deserializer (serde)
  5. Run the query
  6. Access the history (to rerun previous queries or save them)
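Steps 2 to 5 can be sketched as the SQL statements one might submit to Athena. This is a minimal illustration, not an official workflow: the helper `build_athena_ddl`, the database `logs_db`, the table `access_logs`, its columns, and the bucket `s3://example-bucket/logs/` are all hypothetical names. The code only builds the statements as strings and does not call AWS.

```python
def build_athena_ddl(database, table, columns, s3_location, delimiter=","):
    """Return DDL strings for the metadata database and the table schema."""
    cols = ",\n  ".join(f"`{name}` {sql_type}" for name, sql_type in columns)
    # Step 2: the metadata database
    create_db = f"CREATE DATABASE IF NOT EXISTS {database}"
    # Steps 3 and 4: the schema, with an explicit SerDe and field delimiter
    create_table = (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"  {cols}\n"
        ")\n"
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'\n"
        f"WITH SERDEPROPERTIES ('field.delim' = '{delimiter}')\n"
        f"LOCATION '{s3_location}'"
    )
    return [create_db, create_table]


# Hypothetical inputs, for illustration only
statements = build_athena_ddl(
    database="logs_db",
    table="access_logs",
    columns=[("request_time", "string"), ("status", "int")],
    s3_location="s3://example-bucket/logs/",
)

# Step 5: a query that could then be run against the table
query = "SELECT status, COUNT(*) AS hits FROM logs_db.access_logs GROUP BY status"
```

Each string could be pasted into the Athena query editor in order; the data files themselves stay in the S3 location and are never loaded elsewhere.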
3
Q

Athena - Use cases

A
  • Extract info from auto-generated log files
  • Query exported spreadsheets
  • Get info from non-AWS database export
4
Q

Elasticsearch Service - Characteristics

A
  • Now named Amazon OpenSearch Service. It’s a managed service for OpenSearch, a distributed, open-source search and analytics suite
  • Provides a highly scalable system with fast access and response to large volumes of data with an integrated visualization tool, named OpenSearch Dashboards
  • Pay based on three dimensions: instance hours (hours available); storage needed; and data transferred in and out of OpenSearch Service
  • Can load streaming data from Kinesis Data Firehose and CloudWatch Logs directly
  • Can load streaming data from S3, Kinesis Data Streams and DynamoDB by using Lambda functions as event handlers
5
Q

Elasticsearch Service - Features

A
  • Encryption, authentication, authorization, and auditing features
  • Offers SQL query syntax
  • Reporting, notifications, and asynchronous search
  • Anomaly detection on data ingested
  • Identify performance problems with OpenTelemetry data
6
Q

EMR - Characteristics

A
  • Stands for Elastic MapReduce. Lets you easily run and scale Hadoop clusters.
  • Integrates with Kinesis, DynamoDB, Redshift, CloudFormation, CloudWatch, Data Pipeline, S3
7
Q

EMR - Hadoop definition

A
  • It’s an open source, highly scalable distributed system that processes massive amounts of data
  • Uses Hadoop Distributed File System (HDFS)
  • Processes structured, unstructured, or semi-structured data
  • Supports tools like MapReduce, Spark, and others
8
Q

EMR - Other characteristics 1

A
  • Creates and scales managed Hadoop clusters of EC2 instances
  • Provides EMRFS (EMR File System) for accessing S3, plus connectors for DynamoDB, Kinesis, and Redshift
  • The architecture consists of a master node, core nodes (where data is stored), and task nodes (compute only)
  • No need to manually provision, configure or tune Hadoop clusters
9
Q

EMR - Other characteristics 2

A
  • Pay as you go: avoid paying for idle EC2 instances, or take advantage of EC2 Spot and Reserved Instances
  • Can use security groups, isolated VPCs, and encryption to restrict access
  • Monitors, identifies and replaces poorly performing instances
10
Q

Kinesis - Characteristics

A
  • It’s a managed and scalable service that helps to collect, process, and analyze real-time streaming data
  • Can ingest real-time data such as video, audio, application logs, website clickstreams, and IoT telemetry data
11
Q

Kinesis - Video and data capabilities

A
  • Kinesis Video Streams: streams video from connected devices to AWS for analytics, ML, and other processing
  • Kinesis Data Streams: real-time data streaming service that can capture gigabytes of data per second from hundreds of thousands of sources
  • Kinesis Data Firehose: captures, transforms, and loads data streams into AWS data stores for near real-time analytics with existing business intelligence tools
  • Firehose can transform data using Lambda functions and then deliver it to S3, Redshift, and Elasticsearch (OpenSearch) Service
  • Kinesis Data Analytics: process data streams in real time with SQL or Apache Flink
  • Data Analytics can ingest data from Kinesis Data Streams and Kinesis Data Firehose
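The Lambda transformation step in Firehose can be sketched as a handler that receives a batch of base64-encoded records and returns each one with the same `recordId`, a `result` status, and the re-encoded `data`. The JSON payload shape and the `level` field below are assumptions made up for illustration; only the record envelope follows the Firehose transformation contract.

```python
import base64
import json


def lambda_handler(event, context):
    """Upper-case the (hypothetical) 'level' field of each JSON record."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            payload["level"] = str(payload.get("level", "")).upper()
            # Newline-delimit the records so downstream stores can split them
            encoded = base64.b64encode(
                (json.dumps(payload) + "\n").encode("utf-8")
            ).decode("utf-8")
            output.append(
                {"recordId": record["recordId"], "result": "Ok", "data": encoded}
            )
        except (ValueError, AttributeError):
            # Invalid base64, invalid JSON, or a payload that is not an object:
            # return the original data marked as failed
            output.append(
                {
                    "recordId": record["recordId"],
                    "result": "ProcessingFailed",
                    "data": record["data"],
                }
            )
    return {"records": output}
```

Records marked `ProcessingFailed` are not silently lost in this sketch; they are handed back unchanged so the delivery stream can treat them as failures.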
12
Q

Kinesis - Pricing

A
  • Video Streams: pay for the volume of data ingested, stored, and consumed through the service. WebRTC capabilities are also charged
  • Data Streams: pay as you go. Based on two core dimensions (Shard Hour and PUT Payload Unit) and other optional dimensions
  • Data Firehose: pay for the volume of data ingested into the service
  • Data Analytics: pay for what you use. Based on the number of Kinesis Processing Units (KPUs) used to run your application
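The two core Data Streams dimensions can be worked through with a small calculation. This is a rough sketch only: it assumes a PUT Payload Unit covers up to 25 KB of a record (taken here as 25 * 1024 bytes), and the function names, the rates, and the 730-hour month are placeholder parameters, not real prices.

```python
import math

# Assumption: one PUT Payload Unit covers up to 25 KB of a single record
PUT_PAYLOAD_UNIT_BYTES = 25 * 1024


def put_payload_units(record_size_bytes):
    """Number of PUT Payload Units billed for one record of the given size."""
    return max(1, math.ceil(record_size_bytes / PUT_PAYLOAD_UNIT_BYTES))


def monthly_cost(shards, records_per_second, record_size_bytes,
                 shard_hour_rate, rate_per_million_units, hours=730):
    """Shard Hour cost plus PUT Payload Unit cost; rates are caller-supplied."""
    shard_cost = shards * hours * shard_hour_rate
    units = put_payload_units(record_size_bytes) * records_per_second * 3600 * hours
    put_cost = units / 1_000_000 * rate_per_million_units
    return shard_cost + put_cost
```

For example, a 26 KB record counts as two units, so keeping records just under the 25 KB boundary halves the PUT portion of the bill in this model.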
13
Q

Kinesis - Streams Characteristics 1

A
  • A Kinesis stream is a set of shards that receives data records from producers and makes them available to consumers.
  • A shard is a uniquely identified sequence of data records in a stream
  • A partition key is used to group data by shard within a stream
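How a partition key selects a shard can be illustrated with a short sketch. Kinesis hashes the partition key with MD5 into a 128-bit integer, and each shard owns a range of that hash space. The even split below and the helper names `even_shard_ranges` and `shard_for_key` are assumptions for illustration; real shard ranges come from the stream's own description.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def even_shard_ranges(shard_count):
    """Split the hash space into contiguous, evenly sized (start, end) ranges."""
    step = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * step
        # The last shard absorbs any remainder so the whole space is covered
        end = HASH_SPACE - 1 if i == shard_count - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key, ranges):
    """Return the index of the shard whose hash range contains the key's hash."""
    hash_value = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    for index, (start, end) in enumerate(ranges):
        if start <= hash_value <= end:
            return index
    raise ValueError("hash value outside all shard ranges")
```

Because the mapping depends only on the key, all records sharing a partition key land in the same shard, which is what keeps them ordered relative to each other.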
14
Q

Kinesis - Streams Characteristics 2

A
  • Consists of producers, the Kinesis Streams application, and consumers
  • Consumers can store data in S3, Redshift, and DynamoDB
  • The default retention period is 24 hours. It can be extended to 168 hours (7 days), and current limits allow up to 8,760 hours (365 days)
15
Q

QuickSight - Characteristics

A
  • Lets you create and publish interactive BI dashboards, and receive answers in seconds through natural language queries
  • Can embed BI dashboards in applications
  • Can scale to tens of thousands of users without setup
16
Q

QuickSight - Creation workflow

A
  1. Create a dataset
  2. Prepare data
  3. Create an analysis
  4. Create a visual, modify it, and add more visuals
  5. Add scenes (captured iterations) to a story (a series of iterations on an analysis)
  6. Publish it as a dashboard
17
Q

QuickSight - Payment schema

A
  • Authors pay a monthly or yearly subscription, in both Enterprise and Standard editions
  • Readers pay per session (up to a monthly maximum) or for a number of provisioned sessions, only in Enterprise edition
  • Alerts are charged by the number of metrics evaluated within a range, only in Enterprise edition
  • SPICE, an in-memory data store that scales automatically, is charged by its size. Only in Enterprise edition
18
Q

QuickSight - Tools

A
  • X-ray lets you see patterns and anomalies in data
  • QuickSight Q lets you ask business questions in natural language and receive answers with visualizations to gain insights from the data