Section 22: Data and Analytics Flashcards

1
Q

A serverless, interactive analytics service built on open-source frameworks, supporting open-table and file formats

A

Amazon Athena

2
Q

Amazon Athena is commonly paired with _____ in order to create reports and dashboards

A

Amazon QuickSight

3
Q

This service is the best tool available when you need to analyze data in S3 using serverless SQL

A

Amazon Athena

4
Q

What are four ways you can enhance the performance of Amazon Athena?

A

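Use columnar data formats (such as Apache Parquet or ORC) so that less data is scanned, which also reduces cost
Compress data for smaller retrievals
Partition your datasets in S3
Use larger files, since many small files add overhead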
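
As an illustration of the partitioning tip, here is a minimal sketch of building Hive-style partitioned S3 keys (key=value path segments), a layout that lets a query filter on date touch only the matching prefixes. The function name, the `logs/` prefix, and the year/month/day partition scheme are assumptions chosen for the example, not something prescribed by the card.

```python
from datetime import date


def partitioned_key(prefix: str, day: date, filename: str) -> str:
    """Build a Hive-style partitioned S3 key (hypothetical layout).

    Example result: logs/year=2024/month=01/day=15/part-0000.parquet
    """
    return (
        f"{prefix.rstrip('/')}/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"
        f"{filename}"
    )


# Hypothetical object written by an ingestion job.
key = partitioned_key("logs/", date(2024, 1, 15), "part-0000.parquet")
print(key)
```

A query that filters on year, month, and day can then skip every prefix that does not match, which reduces the amount of data scanned.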

5
Q

If you have data in sources other than Amazon S3, you can use this Athena feature to query the data in place or build pipelines that extract data from multiple data sources and store them in Amazon S3

A

Amazon Athena Federated Query

6
Q

How much do Athena queries cost to run?

A

$5.00 per TB of data scanned
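
To make the pricing concrete, the following is a rough sketch of the arithmetic using the $5.00 per TB figure from the card. It assumes 1 TB means 1024^4 bytes and ignores any minimum charge or rounding that may apply per query; the function name is hypothetical.

```python
BYTES_PER_TB = 1024 ** 4  # assumption: binary terabyte


def athena_scan_cost(bytes_scanned: int, price_per_tb: float = 5.00) -> float:
    """Estimate query cost from bytes scanned (ignores minimums and rounding)."""
    return bytes_scanned / BYTES_PER_TB * price_per_tb


# Scanning a full 3 TB dataset versus half a terabyte after pruning.
full_scan = athena_scan_cost(3 * BYTES_PER_TB)
pruned_scan = athena_scan_cost(BYTES_PER_TB // 2)
print(f"full: ${full_scan:.2f}, pruned: ${pruned_scan:.2f}")
```

Because cost follows bytes scanned, anything that reduces the scan (columnar formats, compression, partitioning) reduces the bill in the same proportion.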

7
Q

This service is a fully managed, petabyte-scale data warehouse service in the cloud

A

Amazon Redshift

8
Q

What types of nodes comprise a Redshift Cluster?

A

Leader Node
Compute Nodes

9
Q

What are three ways you can insert data into Redshift?

A

Kinesis Data Firehose
COPY command from Amazon S3
Insert in batches from EC2 instance using JDBC driver

10
Q

Feature that allows you to efficiently query and retrieve structured and semi-structured data from files in Amazon S3 without having to load the data into Amazon Redshift tables

A

Redshift Spectrum

11
Q

An open source, distributed search and analytics suite derived from Elasticsearch that makes it easy for you to perform interactive log analytics, real-time application monitoring, website search, and more

A

Amazon OpenSearch Service

12
Q

AWS service that is a cloud big data solution for petabyte-scale data processing, interactive analytics, and machine learning using open-source frameworks such as Apache Spark, Apache Hive, and Presto

A

Amazon EMR (Elastic MapReduce)

13
Q

EMR node type that coordinates and manages the health of all your other nodes

A

Master Node

14
Q

EMR node type that runs tasks and stores data

A

Core Node

15
Q

EMR node type that only runs tasks - typically it is a good practice to use Spot instances for these nodes

A

Task Node

16
Q

A cloud-native, serverless, business intelligence service with native ML integrations and usage-based pricing, used to create interactive dashboards

A

Amazon QuickSight

17
Q

Engine that performs in-memory computations if you import data directly into QuickSight

A

SPICE Engine

18
Q

A serverless data integration service that makes it easier to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development

A

AWS Glue

19
Q

AWS Glue feature that prevents the re-processing of old data

A

Glue Job Bookmarks
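
The idea behind a bookmark can be sketched conceptually as follows. This is only an illustration of incremental processing with remembered state, not how Glue stores or applies bookmarks internally; `run_job` and the file names are hypothetical.

```python
def run_job(available_keys, bookmark):
    """Process only keys not yet recorded in the bookmark.

    Returns the list of newly processed keys and the updated bookmark set.
    """
    new_keys = sorted(k for k in available_keys if k not in bookmark)
    # A real ETL job would transform and write the new data here.
    return new_keys, bookmark | set(new_keys)


bookmark = set()
first_run, bookmark = run_job(["a.json", "b.json"], bookmark)
second_run, bookmark = run_job(["a.json", "b.json", "c.json"], bookmark)
print(first_run)
print(second_run)
```

The second run handles only the file that arrived after the first run, which is the behavior the card describes as preventing re-processing of old data.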

20
Q

AWS Glue feature that allows you to combine and replicate data across multiple data stores using SQL

A

Glue Elastic Views

21
Q

AWS Glue feature that allows you to clean and normalize data using pre-built transformations

A

AWS Glue DataBrew

22
Q

AWS Glue feature that provides you with a GUI to create, run, and monitor ETL jobs

A

Glue Studio

23
Q

AWS Glue feature that allows you to run streaming ETL jobs that can be integrated with Kinesis Data Streams, Apache Kafka, Amazon MSK, etc.

A

Glue Streaming ETL

24
Q

AWS service that easily creates secure data lakes, making data available for wide-ranging analytics

A

AWS Lake Formation

25
Q

This service enables you to quickly author and run powerful SQL code against streaming sources to perform time series analytics, feed real-time dashboards, and create real-time metrics

A

Kinesis Data Analytics for SQL Applications

26
Q

How long can you retain data in Amazon MSK?

A

For as long as you want, provided you pay for the underlying EBS storage

27
Q

A fully managed service for Apache Kafka that makes it easier for developers to build and run highly available, secure, and scalable applications based on Apache Kafka

A

Amazon Managed Streaming for Apache Kafka (Amazon MSK)

28
Q

You would like to have a database that is efficient at performing analytical queries on large sets of columnar data. You would like to connect to this Data Warehouse using a reporting and dashboard tool such as Amazon QuickSight. Which AWS technology do you recommend?

A

Amazon Redshift

29
Q

You have a lot of log files stored in an S3 bucket and want to run a quick analysis on them, serverless if possible, to filter the logs and find users who attempted an unauthorized action. Which AWS service allows you to do so?

A

Amazon Athena

30
Q

As a Solutions Architect, you have been instructed to prepare a disaster recovery plan for a Redshift cluster. What should you do?

A

Enable Automated Snapshots, then configure the Redshift cluster to automatically copy snapshots to another AWS region

31
Q

Which feature in Redshift forces all COPY and UNLOAD traffic moving between your cluster and data repositories to go through your VPC?

A

Enhanced VPC Routing

32
Q

You are running a gaming website that is using DynamoDB as its data store. Users have been asking for a search feature to find other gamers by name, with partial matches if possible. Which AWS technology do you recommend to implement this feature?

A

Amazon OpenSearch Service

33
Q

An AWS service that allows you to create, run, and monitor ETL (extract, transform, and load) jobs in a few clicks

A

AWS Glue

34
Q

A company is using AWS to host its public websites and internal applications. These websites and applications generate a lot of logs and traces. There is a requirement to store those logs centrally and to search and analyze them efficiently in real time to detect errors and potential threats. Which AWS service can help them efficiently store and analyze logs?

A

Amazon OpenSearch Service

35
Q

This service makes it easy and cost-effective for data engineers and analysts to run applications built using open source big data frameworks such as Apache Spark, Hive, or Presto without having to operate or manage clusters

A

Amazon EMR (Elastic MapReduce); the EMR Serverless option is the one that removes cluster management

36
Q

An e-commerce company has all its historical data such as orders, customers, revenues, and sales for the previous years hosted on a Redshift cluster. There is a requirement to generate some dashboards and reports indicating the revenues from the previous years and the total sales, so it will be easy to define the requirements for the next year. The DevOps team is assigned to find an AWS service that can help define those dashboards and has native integration with Redshift. Which AWS service is best suited?

A

Amazon QuickSight

37
Q

Which AWS Glue feature allows you to save and track the data that has already been processed during a previous run of a Glue ETL job?

A

Glue Job Bookmarks

38
Q

You are a DevOps engineer in a machine learning company which has 3 TB of JSON files stored in an S3 bucket. There’s a requirement to do some analytics on those files using Amazon Athena and you have been tasked to find a way to convert those files’ format from JSON to Apache Parquet. Which AWS service is best suited?

A

AWS Glue

39
Q

You have an on-premises application that is used together with an on-premises Apache Kafka to receive a stream of clickstream events from multiple websites. You have been tasked to migrate this application as soon as possible without any code changes. You decided to host the application on an EC2 instance. What is the best option you recommend to migrate Apache Kafka?

A

Amazon MSK

40
Q

You have data stored in RDS and S3 buckets, and you are using AWS Lake Formation as a data lake to collect, move, and catalog data so you can do some analytics. You have a lot of big data and ML engineers in the company and you want to control access to part of the data as it might contain sensitive information. What can you use?

A

Lake Formation Fine-grained Access Control

41
Q

Which AWS service is most appropriate when you want to perform real-time analytics on streams of data?

A

Amazon Kinesis Data Analytics