Services - Analytics Flashcards

Question 1

Q

Athena - Characteristics

Answer

A

It’s a serverless, interactive query service that simplifies analyzing data in S3 using standard SQL
Pay only for the queries you run
No ETL process is needed: it queries data directly in S3. Supports many input formats like CSV, JSON, TSV, and others
Queries can be executed in parallel

Question 2

Q

Athena - Creation steps

Answer

A

Create S3 bucket
Create a metadata database
Create a schema
(Optional) Fine-tune the serializer/deserializer (SerDe)
Run the query
Access the history (to rerun previous queries or save them)
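
The metadata database, schema, and query steps above can be sketched as the SQL statements one would submit to Athena. This is a minimal sketch, not a definitive setup: the bucket, database, table, and column names are hypothetical, and the CSV SerDe is only one possible choice for the optional SerDe step.

```python
from typing import List

# Hypothetical placeholders, not taken from the flashcards.
BUCKET = "example-log-bucket"
DATABASE = "logs_db"
TABLE = "access_logs"


def athena_statements(bucket: str, database: str, table: str) -> List[str]:
    """Return SQL for the metadata database, schema, and query steps."""
    create_database = f"CREATE DATABASE IF NOT EXISTS {database}"
    # The schema is an external table: the files stay where they are in S3.
    create_table = (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        "  request_time string,\n"
        "  status string,\n"
        "  path string\n"
        ")\n"
        # Optional step: choose a SerDe that matches the file format (CSV here).
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'\n"
        f"LOCATION 's3://{bucket}/logs/'"
    )
    query = (
        f"SELECT status, COUNT(*) AS hits FROM {database}.{table} "
        "GROUP BY status ORDER BY hits DESC"
    )
    return [create_database, create_table, query]


if __name__ == "__main__":
    for statement in athena_statements(BUCKET, DATABASE, TABLE):
        print(statement + ";\n")
```

Each statement can be pasted into the Athena query editor in order; the S3 bucket from the first step must already exist and contain the files.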

Question 3

Q

Athena - Use cases

Answer

A

Extract info from auto-generated log files
Query exported spreadsheets
Get info from non-AWS database exports

Question 4

Q

Elasticsearch Service - Characteristics

Answer

A

Now named Amazon OpenSearch Service. OpenSearch is a distributed, open-source search and analytics suite
Provides a highly scalable system with fast access and response to large volumes of data with an integrated visualization tool, named OpenSearch Dashboards
Pay based on three dimensions: instance hours (hours available); storage needed; and data transferred in and out of OpenSearch Service
Can load streaming data from Kinesis Data Firehose and CloudWatch Logs directly
Can load streaming data from S3, Kinesis Data Streams and DynamoDB by using Lambda functions as event handlers
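
A Lambda event handler for the Kinesis Data Streams case might look like the sketch below. It only builds the newline-delimited body that the OpenSearch bulk API accepts; actually sending it to a domain (endpoint, request signing) is left out. The index name and the assumption that each record carries a JSON document are hypothetical.

```python
import base64
import json

INDEX_NAME = "app-logs"  # hypothetical index name


def build_bulk_body(event: dict, index_name: str = INDEX_NAME) -> str:
    """Turn a Kinesis Data Streams Lambda event into a _bulk request body."""
    lines = []
    for record in event["Records"]:
        kinesis = record["kinesis"]
        # Kinesis hands Lambda each record's payload base64-encoded.
        document = json.loads(base64.b64decode(kinesis["data"]))
        # One action line, then the document, each as a single JSON line.
        action = {"index": {"_index": index_name, "_id": kinesis["sequenceNumber"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(document))
    # The bulk API expects newline-delimited JSON that ends with a newline.
    return "\n".join(lines) + "\n"


def handler(event, context):
    body = build_bulk_body(event)
    # Posting `body` to the domain's _bulk endpoint is omitted in this sketch.
    return {"documents": len(event["Records"]), "bytes": len(body)}
```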

Question 5

Q

Elasticsearch Service - Features

Answer

A

Encryption, authentication, authorization, and auditing features
Offers SQL query syntax
Reporting, notifications, and asynchronous search
Anomaly detection on data ingested
Identify performance problems with OpenTelemetry data

Question 6

Q

EMR - Characteristics

Answer

A

Named Elastic MapReduce. Lets you easily run and scale Hadoop clusters.
Integrates with Kinesis, DynamoDB, Redshift, CloudFormation, CloudWatch, Data Pipeline, S3

Question 7

Q

EMR - Hadoop definition

Answer

A

It’s an open source, highly scalable distributed system that processes massive amounts of data
Uses Hadoop Distributed File System (HDFS)
Processes structured, unstructured, or semi-structured data
Supports tools like MapReduce, Spark, and others
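
To make the MapReduce model concrete, here is a single-process word count split into map, shuffle, and reduce phases. It is only an illustration of the programming model; on a real Hadoop cluster the same phases run in parallel across nodes and read from HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data on Hadoop", "data on S3"]))
```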

Question 8

Q

EMR - Other characteristics 1

Answer

A

Creates and scales managed Hadoop clusters of EC2 instances
Provides EMRFS (EMR File System) for reading and writing S3, plus connectors for DynamoDB, Kinesis, and Redshift
The architecture consists of a master node, core nodes (where data is stored), and task nodes (compute only)
No need to manually provision, configure or tune Hadoop clusters
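
The three node roles can be sketched as a cluster layout. The dictionaries below follow the shape of the InstanceGroups entries accepted by boto3's EMR run_job_flow call, but nothing is sent to AWS here; the instance type, group names, and counts are assumptions for illustration.

```python
from typing import Dict, List


def instance_groups(core_count: int, task_count: int,
                    instance_type: str = "m5.xlarge") -> List[Dict]:
    """Describe one master node, N core nodes, and optional task nodes."""
    groups = [
        # Master node: coordinates the cluster; exactly one in this sketch.
        {"Name": "Master", "InstanceRole": "MASTER",
         "InstanceType": instance_type, "InstanceCount": 1},
        # Core nodes: run tasks and store data.
        {"Name": "Core", "InstanceRole": "CORE",
         "InstanceType": instance_type, "InstanceCount": core_count},
    ]
    if task_count > 0:
        # Task nodes: compute only, so they can be added or removed freely.
        groups.append({"Name": "Task", "InstanceRole": "TASK",
                       "InstanceType": instance_type, "InstanceCount": task_count})
    return groups


if __name__ == "__main__":
    for group in instance_groups(core_count=2, task_count=4):
        print(group)
```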

Question 9

Q

EMR - Other characteristics 2

Answer

A

Pay as you go: avoid paying for idle EC2 instances, or take advantage of EC2 Spot and Reserved Instances
Can use security groups, isolated VPCs, and encryption to restrict access
Monitors, identifies and replaces poorly performing instances

Question 10

Q

Kinesis - Characteristics

Answer

A

It’s a managed and scalable service that helps to collect, process, and analyze real-time streaming data
Can ingest real-time data such as video, audio, application logs, website clickstreams, and IoT telemetry data

Question 11

Q

Kinesis - Video and data capabilities

Answer

A

Kinesis Video Streams: streams video from connected devices to AWS for analytics, ML, and other processing
Kinesis Data Streams: real-time data streaming service that can capture gigabytes of data per second from hundreds of thousands of sources
Kinesis Data Firehose: captures, transforms, and loads data streams into AWS data stores for near real-time analytics with existing business intelligence tools
Firehose can transform records with Lambda functions, then deliver them to S3, Redshift, and OpenSearch Service (formerly Elasticsearch Service)
Kinesis Data Analytics: processes data streams in real time with SQL or Apache Flink
Data Analytics can ingest data from Kinesis Data Streams and Kinesis Data Firehose
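
The Lambda transformation step in Firehose can be sketched as follows. Firehose passes a batch of base64-encoded records and expects each one back with the same recordId, a result status, and re-encoded data. The transformation itself (upper-casing a "level" field) is a hypothetical example, as is the assumption that every record is a JSON object.

```python
import base64
import json


def handler(event, context):
    """Transform each Firehose record and return it in the expected envelope."""
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))
        # Hypothetical transformation: normalize the "level" field.
        payload["level"] = str(payload.get("level", "info")).upper()
        # A trailing newline keeps delivered objects line-delimited.
        data = (json.dumps(payload) + "\n").encode("utf-8")
        output.append({
            "recordId": record["recordId"],  # must match the incoming record
            "result": "Ok",                  # other values: Dropped, ProcessingFailed
            "data": base64.b64encode(data).decode("utf-8"),
        })
    return {"records": output}
```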

Question 12

Q

Kinesis - Pricing

Answer

A

Video Streams: pay for the volume of data ingested, stored, and consumed through the service. WebRTC capabilities are also charged
Data Streams: pay as you go. Based on two core dimensions (Shard Hour and PUT Payload Unit) and other optional dimensions
Data Firehose: pay for the volume of data ingested into the service
Data Analytics: pay for what you use. Based on the number of Kinesis Processing Units (KPUs) used to run your application
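
The two core Data Streams dimensions can be combined into a rough estimate. A PUT Payload Unit is counted in 25 KB chunks of each record, so a 30 KB record uses two units. The rates are deliberately left as parameters because they vary by region and over time; the figures in the usage line are made up, and treating 25 KB as 25 * 1024 bytes is an assumption.

```python
import math

# One PUT Payload Unit covers up to 25 KB of a single record (1 KB = 1024 bytes assumed).
PUT_PAYLOAD_UNIT_BYTES = 25 * 1024


def put_payload_units(record_bytes: int) -> int:
    """Number of PUT Payload Units consumed by one record."""
    return max(1, math.ceil(record_bytes / PUT_PAYLOAD_UNIT_BYTES))


def data_streams_cost(shards: int, hours: float, records: int, record_bytes: int,
                      shard_hour_rate: float, rate_per_million_units: float) -> float:
    """Estimate cost from the Shard Hour and PUT Payload Unit dimensions only."""
    shard_cost = shards * hours * shard_hour_rate
    units = records * put_payload_units(record_bytes)
    put_cost = units / 1_000_000 * rate_per_million_units
    return shard_cost + put_cost


if __name__ == "__main__":
    # Made-up rates, for illustration only.
    print(data_streams_cost(shards=2, hours=720, records=50_000_000,
                            record_bytes=2048, shard_hour_rate=0.02,
                            rate_per_million_units=0.02))
```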

Question 13

Q

Kinesis - Streams Characteristics 1

Answer

A

A Kinesis stream is a set of shards that receives data records from producers and makes them available to consumers.
A shard is a uniquely identified sequence of data records in a stream
A partition key is used to group data by shard within a stream
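
How a partition key selects a shard can be sketched like this. Kinesis maps each partition key to a 128-bit integer with MD5 and routes the record to the shard whose hash key range contains that value. The sketch assumes the shards split the hash space evenly, which holds for a freshly created stream but not necessarily after resharding.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 digests are 128 bits wide


def hash_key(partition_key: str) -> int:
    """Map a partition key to its 128-bit integer hash value."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard whose range holds the key (even split assumed)."""
    range_size = HASH_SPACE // shard_count
    # The last shard absorbs any remainder of the hash space.
    return min(hash_key(partition_key) // range_size, shard_count - 1)


if __name__ == "__main__":
    for key in ["user-1", "user-2", "user-3"]:
        print(key, "-> shard", shard_for_key(key, shard_count=4))
```

Records with the same partition key always land on the same shard, which is what keeps them ordered relative to each other.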

Question 14

Q

Kinesis - Streams Characteristics 2

Answer

A

Consists of producers, the Kinesis Streams application, and consumers
Can store data in S3, Redshift and DynamoDB
The default retention period is 24 hours. It can be increased up to 168 hours (7 days) with extended retention, or up to 8,760 hours (365 days) with long-term retention

Question 15

Q

QuickSight - Characteristics

Answer

A

Lets you create and publish interactive BI dashboards, and receive answers in seconds through natural language queries
Can embed BI dashboards in applications
Can scale to tens of thousands of users without setup

Question 16

Q

QuickSight - Creation workflow

Answer

A

Create a dataset
Prepare data
Create an analysis
Create a visual, modify it, and add more visuals
Add scenes (captured iterations) to a story (a series of iterations on an analysis)
Publish it as a dashboard

Question 17

Q

QuickSight - Pricing model

Answer

A

Authors pay a monthly or yearly subscription, in Enterprise and Standard editions
Readers pay per session up to a monthly cap, or for a provisioned number of sessions, only in Enterprise edition
Alerts are charged by number of metrics evaluated within a range, only in Enterprise edition
SPICE, an in-memory data store that scales automatically, is charged by its capacity. Only in Enterprise edition

Question 18

Q

QuickSight - Tools

Answer

A

X-ray lets you see patterns and anomalies in data
QuickSight Q lets you ask business questions in natural language and receive answers with visualizations to gain insights from the data