Section 22: Data and Analytics Flashcards by Michael Fickley

A serverless, interactive analytics service built on open-source frameworks, supporting open-table and file formats

Amazon Athena

Amazon Athena is commonly paired with _____ in order to create reports and dashboards

Amazon QuickSight

This service is the best tool available when you need to analyze data in S3 using serverless SQL

Amazon Athena

What are four ways you can enhance the performance of Amazon Athena?

Use columnar data formats (such as Apache Parquet or ORC) so each query scans less data, which also lowers cost
Compress data for smaller retrievals
Partition your datasets in S3
Use larger files to minimize per-file overhead
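
As a rough illustration of the partitioning tip, here is a minimal sketch that builds a Hive-style (`key=value`) S3 prefix, the layout Athena can use to skip partitions a query does not need. The bucket name, table name, and the `partition_prefix` helper are hypothetical and exist only for this example.

```python
from datetime import date


def partition_prefix(bucket: str, table: str, day: date) -> str:
    """Build a Hive-style partitioned S3 prefix (key=value path segments)."""
    return (
        f"s3://{bucket}/{table}/"
        f"year={day.year}/month={day.month:02d}/day={day.day:02d}/"
    )


# Hypothetical bucket and table names, for illustration only.
print(partition_prefix("example-logs-bucket", "access_logs", date(2024, 1, 15)))
# s3://example-logs-bucket/access_logs/year=2024/month=01/day=15/
```

A query that filters on `year`, `month`, and `day` would then only need to read objects under the matching prefix.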

If you have data in sources other than Amazon S3, you can use this Athena feature to query the data in place or build pipelines that extract data from multiple data sources and store them in Amazon S3

Amazon Athena Federated Query

How much do Athena queries cost to run?

$5.00 per TB of data scanned
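
The per-TB price makes the cost easy to estimate from the amount of data a query scans. The sketch below is only a back-of-the-envelope calculation: it assumes a binary terabyte (1024^4 bytes) and ignores any minimum charge, rounding, or regional price differences. `estimate_query_cost` is a hypothetical helper, not part of any AWS SDK.

```python
BYTES_PER_TB = 1024 ** 4  # assumption: binary terabyte


def estimate_query_cost(bytes_scanned: int, price_per_tb: float = 5.00) -> float:
    """Estimate the cost in USD of a query from the bytes it scans."""
    return bytes_scanned / BYTES_PER_TB * price_per_tb


full_scan = estimate_query_cost(BYTES_PER_TB)          # 1 TB scanned
reduced_scan = estimate_query_cost(BYTES_PER_TB // 4)  # 0.25 TB scanned
print(full_scan, reduced_scan)  # 5.0 1.25
```

This is why columnar formats, compression, and partitioning matter for cost: anything that cuts the bytes scanned cuts the bill proportionally.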

This service is a fully managed, petabyte-scale data warehouse service in the cloud

Amazon Redshift

What types of nodes comprise a Redshift Cluster?

Leader Node
Compute Nodes

What are three ways you can insert data into Redshift?

Kinesis Data Firehose
COPY command loading from Amazon S3
Insert in batches from EC2 instance using JDBC driver
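
For the COPY option, a minimal sketch of what the statement might look like is shown below, assembled as a string in Python. The table name, S3 URI, and IAM role ARN are placeholders, and `build_copy_statement` is a hypothetical helper; in practice you would send the resulting SQL through your own Redshift connection.

```python
def build_copy_statement(table: str, s3_uri: str, iam_role_arn: str) -> str:
    """Assemble a Redshift COPY statement that loads Parquet files from S3.

    Plain string formatting is used for readability; do not pass
    untrusted input into these arguments.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS PARQUET;"
    )


# Placeholder values, for illustration only.
print(
    build_copy_statement(
        "sales",
        "s3://example-bucket/sales/",
        "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
    )
)
```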

Feature that allows you to efficiently query and retrieve structured and semi-structured data from files in Amazon S3 without having to load the data into Amazon Redshift tables

Redshift Spectrum

An open source, distributed search and analytics suite derived from Elasticsearch that makes it easy for you to perform interactive log analytics, real-time application monitoring, website search, and more

Amazon OpenSearch Service

AWS service that is a cloud big data solution for petabyte-scale data processing, interactive analytics, and machine learning using open-source frameworks such as Apache Spark, Apache Hive, and Presto

Amazon EMR (Elastic MapReduce)

EMR node type that coordinates and manages the health of all your other nodes

Master Node (called the primary node in current AWS documentation)

EMR node type that runs tasks and stores data

Core Node

EMR node type that only runs tasks - typically it is a good practice to use Spot instances for these nodes

Task Node
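
The three EMR node types can be sketched as a cluster layout. The example below builds a list shaped like the `InstanceGroups` parameter used when launching a cluster through the AWS SDK, with Spot capacity only on the task group. The instance type, counts, and the `instance_groups` helper are assumptions for illustration; nothing here calls AWS.

```python
from typing import Dict, List, Union

Group = Dict[str, Union[str, int]]


def instance_groups(core_count: int, task_count: int,
                    instance_type: str = "m5.xlarge") -> List[Group]:
    """Describe one master, N core, and M task nodes; only task nodes use Spot."""
    def group(name: str, role: str, market: str, count: int) -> Group:
        return {
            "Name": name,
            "InstanceRole": role,
            "Market": market,
            "InstanceType": instance_type,
            "InstanceCount": count,
        }

    return [
        group("Master", "MASTER", "ON_DEMAND", 1),       # coordinates the cluster
        group("Core", "CORE", "ON_DEMAND", core_count),  # runs tasks and stores data
        group("Task", "TASK", "SPOT", task_count),       # runs tasks only
    ]


for g in instance_groups(core_count=2, task_count=4):
    print(g["InstanceRole"], g["Market"], g["InstanceCount"])
```

Task nodes hold no data, so losing a Spot instance costs only the work in progress, which is why Spot is usually reserved for that group.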

A cloud-native, serverless, business intelligence service with native ML integrations and usage-based pricing, used to create interactive dashboards

Amazon QuickSight

Engine that performs in-memory computations if you import data directly into QuickSight

SPICE Engine

A serverless data integration service that makes it easier to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development

AWS Glue

AWS Glue feature that prevents the re-processing of old data

Glue Job Bookmarks
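
The idea behind a job bookmark can be illustrated with a toy model: remember what earlier runs processed and hand only new inputs to the next run. This is a conceptual sketch, not how Glue stores or tracks bookmark state; the `BookmarkedJob` class and the file names are hypothetical.

```python
from typing import Iterable, List, Set


class BookmarkedJob:
    """Toy stand-in for a job bookmark: remembers which inputs were processed."""

    def __init__(self) -> None:
        self.processed: Set[str] = set()

    def run(self, available_keys: Iterable[str]) -> List[str]:
        """Return only the keys not seen in earlier runs, then record them."""
        new_keys = [k for k in available_keys if k not in self.processed]
        self.processed.update(new_keys)
        return new_keys


job = BookmarkedJob()
first = job.run(["a.json", "b.json"])
second = job.run(["a.json", "b.json", "c.json"])
print(first)   # ['a.json', 'b.json']
print(second)  # ['c.json']
```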

AWS Glue feature that allows you to combine and replicate data across multiple data stores using SQL

Glue Elastic Views

AWS Glue feature that allows you to clean and normalize data using pre-built transformations

AWS Glue DataBrew

AWS Glue feature that provides you with a GUI to create, run, and monitor ETL jobs

Glue Studio

AWS Glue feature that allows you to run streaming ETL jobs that can be integrated with Kinesis Data Streams, Apache Kafka, Amazon MSK, etc.

Glue Streaming ETL

AWS service that easily creates secure data lakes, making data available for wide-ranging analytics

AWS Lake Formation

This service enables you to quickly author and run powerful SQL code against streaming sources to perform time series analytics, feed real-time dashboards, and create real-time metrics

Kinesis Data Analytics for SQL Applications

How long can you retain data in Amazon MSK?

As long as you want, provided you pay for the underlying EBS storage

A fully managed service for Apache Kafka that makes it easier for developers to build and run highly available, secure, and scalable applications based on Apache Kafka

Amazon Managed Streaming for Apache Kafka (Amazon MSK)

You would like to have a database that is efficient at performing analytical queries on large sets of columnar data. You would like to connect to this Data Warehouse using a reporting and dashboard tool such as Amazon QuickSight. Which AWS technology do you recommend?

Amazon Redshift

You have many log files stored in an S3 bucket and want to run a quick, preferably serverless, analysis that filters the logs to find users who attempted an unauthorized action. Which AWS service allows you to do so?

Amazon Athena
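
The kind of SQL filter this scenario calls for can be tried locally. The sketch below uses Python's built-in `sqlite3` as a stand-in for Athena; the table name, column names, sample rows, and the choice of HTTP status 403 as the marker for an unauthorized attempt are all assumptions made for the example.

```python
import sqlite3

# In-memory database standing in for log data that would live in S3.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE access_logs (user_name TEXT, action TEXT, status INTEGER)"
)
conn.executemany(
    "INSERT INTO access_logs VALUES (?, ?, ?)",
    [
        ("alice", "GetObject", 200),
        ("bob", "DeleteBucket", 403),
        ("carol", "PutObject", 403),
        ("bob", "PutObject", 403),
    ],
)

# Find each user with at least one denied (403) request.
unauthorized_users = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT user_name FROM access_logs "
        "WHERE status = 403 ORDER BY user_name"
    )
]
print(unauthorized_users)  # ['bob', 'carol']
```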

As a Solutions Architect, you have been instructed to prepare a disaster recovery plan for a Redshift cluster. What should you do?

Enable Automated Snapshots, then configure the Redshift cluster to automatically copy snapshots to another AWS region

Which feature in Redshift forces all COPY and UNLOAD traffic moving between your cluster and data repositories through your VPC?

Enhanced VPC Routing

You are running a gaming website that is using DynamoDB as its data store. Users have been asking for a search feature to find other gamers by name, with partial matches if possible. Which AWS technology do you recommend to implement this feature?

Amazon OpenSearch Service
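
A partial-match search could be expressed as a wildcard query in the search request body. The sketch below only builds that body as a Python dictionary; the field name `gamer_name` and the `partial_name_query` helper are hypothetical, and it assumes the fragment contains no wildcard characters of its own.

```python
import json
from typing import Any, Dict


def partial_name_query(fragment: str, field: str = "gamer_name") -> Dict[str, Any]:
    """Build a search body that matches names containing the given fragment."""
    return {"query": {"wildcard": {field: {"value": f"*{fragment}*"}}}}


print(json.dumps(partial_name_query("drag"), indent=2))
```

Leading wildcards can be slow on large indexes, so this is a starting point for small data sets rather than a tuned design.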

An AWS service that allows you to create, run, and monitor ETL (extract, transform, and load) jobs in a few clicks

AWS Glue

A company is using AWS to host its public websites and internal applications. These websites and applications generate a lot of logs and traces. There is a requirement to store those logs centrally and to search and analyze them efficiently in real time to detect errors and threats. Which AWS service can help them efficiently store and analyze logs?

Amazon OpenSearch Service

This service makes it easy and cost-effective for data engineers and analysts to run applications built using open source big data frameworks such as Apache Spark, Hive, or Presto without having to operate or manage clusters

Amazon EMR (Elastic MapReduce)

An e-commerce company has all its historical data, such as orders, customers, revenues, and sales for previous years, hosted on a Redshift cluster. There is a requirement to generate dashboards and reports showing the revenues and total sales from previous years, so that it is easier to define the requirements for the next year. The DevOps team is assigned to find an AWS service that can help define those dashboards and has native integration with Redshift. Which AWS service is best suited?

Amazon QuickSight

Which AWS Glue feature allows you to save and track the data that has already been processed during a previous run of a Glue ETL job?

Glue Job Bookmarks

You are a DevOps engineer in a machine learning company which has 3 TB of JSON files stored in an S3 bucket. There’s a requirement to do some analytics on those files using Amazon Athena, and you have been tasked to find a way to convert those files’ format from JSON to Apache Parquet. Which AWS service is best suited?

AWS Glue

You have an on-premises application that is used together with an on-premises Apache Kafka to receive a stream of clickstream events from multiple websites. You have been tasked to migrate this application as soon as possible without any code changes. You decided to host the application on an EC2 instance. What is the best option you recommend to migrate Apache Kafka?

Amazon MSK

You have data stored in RDS and S3 buckets, and you are using AWS Lake Formation as a data lake to collect, move, and catalog data so you can do some analytics. You have a lot of big data and ML engineers in the company, and you want to control access to part of the data as it might contain sensitive information. What can you use?

Lake Formation Fine-grained Access Control

Which AWS service is most appropriate when you want to perform real-time analytics on streams of data?

Amazon Kinesis Data Analytics

(41 cards)