*Analytics* Flashcards

1
Q

Amazon Kinesis Data Streams

A

This service is a massively scalable and durable real-time data streaming service. KDS can continuously capture gigabytes of data per second from hundreds of thousands of sources such as website clickstreams, database event streams, financial transactions, social media feeds, IT logs, and location-tracking events. The data collected is available in milliseconds to enable real-time analytics use cases such as real-time dashboards, real-time anomaly detection, dynamic pricing, and more.
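
As a rough illustration of how one stream can absorb records from many sources, the sketch below mimics the way Kinesis Data Streams routes a record: the partition key is hashed with MD5 into a 128-bit integer, and the record goes to the shard whose hash key range contains that value. The evenly split ranges and the helper names `shard_ranges` and `shard_for_key` are assumptions made for this example, not part of the service API.

```python
import hashlib

HASH_SPACE = 2 ** 128  # an MD5 digest is a 128-bit hash key


def shard_ranges(shard_count):
    """Split the hash key space into contiguous, roughly equal ranges (assumed layout)."""
    step = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * step
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == shard_count - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key, ranges):
    """Return the index of the shard whose range contains the key's MD5 hash."""
    hash_key = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    for index, (start, end) in enumerate(ranges):
        if start <= hash_key <= end:
            return index
    raise ValueError("hash key outside every range")


ranges = shard_ranges(4)
print(shard_for_key("user-42", ranges))  # same key always lands on the same shard
```

Records that share a partition key always reach the same shard, which is why key choice affects how evenly load spreads.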

2
Q

Reversed

This service makes it easy to securely stream video from connected devices to AWS for analytics, machine learning (ML), playback, and other processing.

  • Automatically provisions and elastically scales all the infrastructure needed to ingest streaming video data from millions of devices. It also durably stores, encrypts, and indexes video data in your streams, and allows you to access your data through easy-to-use APIs.
  • Enables you to play back video for live and on-demand viewing, and quickly build applications that take advantage of computer vision and video analytics through integration with Amazon Rekognition Video, and libraries for ML frameworks such as Apache MXNet, TensorFlow, and OpenCV.

A

Amazon Kinesis Video Streams

3
Q

Reversed

An interactive query service that easily analyzes data in Amazon S3 using standard SQL.

Serverless: no infrastructure to manage. Pay only for the queries that you run.

Simply point to data in Amazon S3, define the schema, and start querying using standard SQL.

Integrates out of the box with the AWS Glue Data Catalog, which lets you create a unified metadata repository across various services, crawl data sources to discover schemas, populate your catalog with new and modified table and partition definitions, and maintain schema versioning.

A

Amazon Athena

4
Q

Reversed

This service is a web service that helps you reliably process and move data between different AWS compute and storage services, as well as on-premises data sources, at specified intervals.

Regularly access your data where it’s stored, transform and process it at scale, and efficiently transfer the results to AWS services such as Amazon S3, Amazon RDS, Amazon DynamoDB, and Amazon EMR.

Easily create complex data processing workloads that are fault tolerant, repeatable, and highly available.

This service also allows you to move and process data that was previously locked up in on-premises data silos.

A

AWS Data Pipeline

5
Q

Amazon CloudSearch

A

A managed service in the AWS Cloud that makes it simple and cost-effective to set up, manage, and scale a search solution for your website or application. Supports 34 languages and popular search features such as highlighting, autocomplete, and geospatial search.

6
Q

Amazon Kinesis

A

This service makes it easy to collect, process, and analyze real-time, streaming data so you can get timely insights and react quickly to new information.

Offers key capabilities to cost-effectively process streaming data at any scale, along with the flexibility to choose the tools that best suit the requirements of your application.

Ingest real-time data such as video, audio, application logs, website clickstreams, and IoT telemetry data for machine learning, analytics, and other applications.

Process and analyze data as it arrives and respond instantly instead of having to wait until all your data is collected before the processing can begin.

7
Q

Amazon Kinesis Data Analytics

A

This service is the easiest way to analyze streaming data, gain actionable insights, and respond to your business and customer needs in real time.

Reduces the complexity of building, managing, and integrating streaming applications with other AWS services. SQL users can easily query streaming data or build entire streaming applications using templates and an interactive SQL editor. Java developers can quickly build sophisticated streaming applications using open source Java libraries and AWS integrations to transform and analyze data in real-time.
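
The kind of windowed aggregation that a streaming SQL query expresses can be sketched in plain Python. This is only an illustration of the idea, not the service's API: the event tuples, the 60-second window, and the function name `tumbling_window_counts` are all hypothetical.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds=60):
    """Count events per (window start, key) over fixed, non-overlapping windows.

    `events` is an iterable of (epoch_seconds, key) tuples.
    """
    counts = Counter()
    for timestamp, key in events:
        # Align each timestamp to the start of the window it falls in.
        window_start = timestamp - (timestamp % window_seconds)
        counts[(window_start, key)] += 1
    return dict(counts)


events = [(0, "checkout"), (15, "checkout"), (59, "search"), (60, "checkout")]
print(tumbling_window_counts(events))
```

Each event is assigned to exactly one window, so results for a window can be emitted as soon as that window closes instead of waiting for the full data set.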

8
Q

Amazon Redshift

A

This service is the most widely used cloud data warehouse. It makes it fast, simple and cost effective to analyze all your data using standard SQL and your existing Business Intelligence (BI) tools.

Run complex analytic queries against terabytes to petabytes of structured and semistructured data, using sophisticated query optimization, columnar storage on high-performance storage, and massively parallel query execution. Most results come back in seconds. You can start small for just $0.25 per hour with no commitments and scale out to petabytes of data for $1,000 per terabyte per year, less than a tenth the cost of traditional on-premises solutions.

9
Q

AWS Glue

A

AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics.

Point AWS Glue to your data stored on AWS, and AWS Glue discovers your data and stores the associated metadata (e.g. table definition and schema) in the AWS Glue Data Catalog. Once cataloged, your data is immediately searchable, queryable, and available for ETL.
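
To make "discovers your data and stores the associated metadata" more concrete, here is a minimal sketch of schema inference over sample records. It is not how AWS Glue crawlers are implemented; the function name `infer_schema`, the type names, and the widening rules are assumptions chosen for illustration.

```python
def infer_schema(records):
    """Infer a column -> type-name mapping from a list of dict records."""
    type_names = {bool: "boolean", int: "bigint", float: "double", str: "string"}
    schema = {}
    for record in records:
        for column, value in record.items():
            # Anything not in the table above (None, lists, dicts) falls back to string.
            inferred = type_names.get(type(value), "string")
            previous = schema.get(column)
            if previous is None or previous == inferred:
                schema[column] = inferred
            elif {previous, inferred} == {"bigint", "double"}:
                schema[column] = "double"  # widen integers to floating point
            else:
                schema[column] = "string"  # conflicting types fall back to string
    return schema


sample = [{"id": 1, "price": 2}, {"id": 2, "price": 2.5, "name": "widget"}]
print(infer_schema(sample))
```

The resulting mapping plays the role of a table definition: once stored in a catalog, it lets query and ETL tools read the data without re-scanning it to learn its shape.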

10
Q

Amazon Managed Streaming for Apache Kafka (Amazon MSK)

A

Amazon Managed Streaming for Apache Kafka (Amazon MSK) is a fully managed service that makes it easy for you to build and run applications that use Apache Kafka to process streaming data.

Apache Kafka is an open-source platform for building real-time streaming data pipelines and applications.

With Amazon MSK, you can use Apache Kafka APIs to populate data lakes, stream changes to and from databases, and power machine learning and analytics applications.

11
Q

Amazon FinSpace

A

Amazon FinSpace is a data management and analytics service purpose-built for the financial services industry (FSI). FinSpace reduces the time you spend finding and preparing petabytes of financial data to be ready for analysis from months to minutes.

12
Q

Reversed

The industry-leading cloud big data platform for processing vast amounts of data using open source tools such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi, and Presto. This service makes it easy to set up, operate, and scale your big data environments by automating time-consuming tasks like provisioning capacity and tuning clusters.

You can run petabyte-scale analysis at less than half of the cost of traditional on-premises solutions and over 3x faster than standard Apache Spark. You can run workloads on Amazon EC2 instances, on Amazon Elastic Kubernetes Service (EKS) clusters, or on-premises.

A

Amazon EMR (Elastic MapReduce)

13
Q

Amazon QuickSight

A

This service is a fast, cloud-powered business intelligence (BI) service that makes it easy for you to deliver insights to everyone in your organization.

Create and publish interactive dashboards that can be accessed from browsers or mobile devices. You can embed dashboards into your applications, providing your customers with powerful self-service analytics. This service easily scales to tens of thousands of users without any software to install, servers to deploy, or infrastructure to manage.

14
Q

Amazon Kinesis Data Firehose

A

This service is the easiest way to reliably load streaming data into data stores and analytics tools. It can capture, transform, and load streaming data into Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, and Splunk, enabling near real-time analytics with existing business intelligence tools and dashboards you’re already using today.

It is a fully managed service that automatically scales to match the throughput of your data and requires no ongoing administration. It can also batch, compress, transform, and encrypt the data before loading it, minimizing the amount of storage used at the destination and increasing security.
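
The batch-then-compress step described above can be sketched with the Python standard library. This is only an illustration of why batching and compression reduce storage at the destination; it does not call the service, and the helper names and the batch size of 3 are hypothetical.

```python
import gzip
import json


def batch_records(records, max_batch_size=3):
    """Group records into lists of at most `max_batch_size` items."""
    return [records[i:i + max_batch_size] for i in range(0, len(records), max_batch_size)]


def compress_batch(batch):
    """Serialize a batch as newline-delimited JSON and gzip it."""
    payload = "\n".join(json.dumps(record, sort_keys=True) for record in batch)
    return gzip.compress(payload.encode("utf-8"))


def decompress_batch(blob):
    """Reverse `compress_batch`, returning the list of records."""
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.splitlines()]


records = [{"id": i, "event": "click"} for i in range(7)]
for batch in batch_records(records):
    blob = compress_batch(batch)
    print(len(batch), "records ->", len(blob), "bytes")
```

Writing one compressed object per batch, rather than one object per record, is what keeps the number and size of objects at the destination manageable.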

15
Q

Reversed

This service makes it easy to collect, process, and analyze real-time, streaming data so you can get timely insights and react quickly to new information.

Offers key capabilities to cost-effectively process streaming data at any scale, along with the flexibility to choose the tools that best suit the requirements of your application.

Ingest real-time data such as video, audio, application logs, website clickstreams, and IoT telemetry data for machine learning, analytics, and other applications.

Process and analyze data as it arrives and respond instantly instead of having to wait until all your data is collected before the processing can begin.

A

Amazon Kinesis

16
Q

Reversed

A managed service in the AWS Cloud that makes it simple and cost-effective to set up, manage, and scale a search solution for your website or application. Supports 34 languages and popular search features such as highlighting, autocomplete, and geospatial search.

A

Amazon CloudSearch

17
Q

AWS Data Exchange

A

AWS Data Exchange makes it easy to find, subscribe to, and use third-party data in the cloud. For data providers, AWS Data Exchange makes it easy to reach the millions of AWS customers migrating to the cloud by removing the need to build and maintain infrastructure for data storage, delivery, billing, and entitling.

18
Q

Reversed

This service is the easiest way to analyze streaming data, gain actionable insights, and respond to your business and customer needs in real time.

Reduces the complexity of building, managing, and integrating streaming applications with other AWS services. SQL users can easily query streaming data or build entire streaming applications using templates and an interactive SQL editor. Java developers can quickly build sophisticated streaming applications using open source Java libraries and AWS integrations to transform and analyze data in real-time.

A

Amazon Kinesis Data Analytics

19
Q

AWS Lake Formation

A

AWS Lake Formation is a service that makes it easy to set up a secure data lake in days. A data lake is a centralized, curated, and secured repository that stores all your data, both in its original form and prepared for analysis. A data lake enables you to break down data silos and combine different types of analytics to gain insights and guide better business decisions.

Define where your data resides and what data access and security policies you want to apply. Lake Formation then collects and catalogs data from databases and object storage, moves the data into your new Amazon S3 data lake, cleans and classifies data using machine learning algorithms, and secures access to your sensitive data. Your users can then access a centralized catalog of data which describes available data sets and their appropriate usage.

Your users then leverage these data sets with their choice of analytics and machine learning services, like Amazon EMR for Apache Spark, Amazon Redshift, Amazon Athena, Amazon SageMaker, and Amazon QuickSight.

20
Q

Amazon EMR (Elastic MapReduce)

A

The industry-leading cloud big data platform for processing vast amounts of data using open source tools such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi, and Presto. This service makes it easy to set up, operate, and scale your big data environments by automating time-consuming tasks like provisioning capacity and tuning clusters.

You can run petabyte-scale analysis at less than half of the cost of traditional on-premises solutions and over 3x faster than standard Apache Spark. You can run workloads on Amazon EC2 instances, on Amazon Elastic Kubernetes Service (EKS) clusters, or on-premises.

21
Q

Reversed

This service makes it easy to deploy, secure, operate, and scale search and analytics clusters. You can search, analyze, and visualize data in real time. You get easy-to-use APIs and real-time analytics capabilities to power use cases such as log analytics, full-text search, application monitoring, and clickstream analytics, with enterprise-grade availability, scalability, and security. The service offers integrations with open-source tools like Kibana and Logstash for data ingestion and visualization.

It also integrates seamlessly with other AWS services such as Amazon Virtual Private Cloud (Amazon VPC), AWS Key Management Service (AWS KMS), Amazon Kinesis Data Firehose, AWS Lambda, AWS Identity and Access Management (IAM), Amazon Cognito, and Amazon CloudWatch, so that you can go from raw data to actionable insights quickly.

A

Amazon Elasticsearch Service

22
Q

Reversed

This service is a fast, cloud-powered business intelligence (BI) service that makes it easy for you to deliver insights to everyone in your organization.

Create and publish interactive dashboards that can be accessed from browsers or mobile devices. You can embed dashboards into your applications, providing your customers with powerful self-service analytics. This service easily scales to tens of thousands of users without any software to install, servers to deploy, or infrastructure to manage.

A

Amazon QuickSight

23
Q

Reversed

This service is the easiest way to reliably load streaming data into data stores and analytics tools. It can capture, transform, and load streaming data into Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, and Splunk, enabling near real-time analytics with existing business intelligence tools and dashboards you’re already using today.

It is a fully managed service that automatically scales to match the throughput of your data and requires no ongoing administration. It can also batch, compress, transform, and encrypt the data before loading it, minimizing the amount of storage used at the destination and increasing security.

A

Amazon Kinesis Data Firehose

24
Q

Amazon Athena

A

An interactive query service that easily analyzes data in Amazon S3 using standard SQL.

Serverless: no infrastructure to manage. Pay only for the queries that you run.

Simply point to data in Amazon S3, define the schema, and start querying using standard SQL.

Integrates out of the box with the AWS Glue Data Catalog, which lets you create a unified metadata repository across various services, crawl data sources to discover schemas, populate your catalog with new and modified table and partition definitions, and maintain schema versioning.
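
The "point to data, define the schema, query" workflow can be sketched by generating the two SQL statements involved. The code below only builds strings and does not contact AWS. The database `weblogs`, table `requests`, column list, and bucket path are hypothetical, and the JSON SerDe is just one possible choice for JSON-formatted data.

```python
def create_table_ddl(database, table, columns, s3_location):
    """Build an Athena CREATE EXTERNAL TABLE statement for JSON data in S3."""
    column_sql = ",\n  ".join(f"`{name}` {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"  {column_sql}\n"
        ")\n"
        "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'\n"
        f"LOCATION '{s3_location}'"
    )


# Hypothetical names: nothing here refers to a real bucket or table.
ddl = create_table_ddl(
    "weblogs",
    "requests",
    [("status", "int"), ("path", "string")],
    "s3://example-bucket/logs/",
)
query = "SELECT status, COUNT(*) AS hits FROM weblogs.requests GROUP BY status ORDER BY hits DESC"

print(ddl)
print(query)
```

The table is external, so the statement only records where the files live and how to read them; the data itself stays in Amazon S3.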

25
Q

Amazon Kinesis Video Streams

A

This service makes it easy to securely stream video from connected devices to AWS for analytics, machine learning (ML), playback, and other processing.

  • Automatically provisions and elastically scales all the infrastructure needed to ingest streaming video data from millions of devices. It also durably stores, encrypts, and indexes video data in your streams, and allows you to access your data through easy-to-use APIs.
  • Enables you to play back video for live and on-demand viewing, and quickly build applications that take advantage of computer vision and video analytics through integration with Amazon Rekognition Video, and libraries for ML frameworks such as Apache MXNet, TensorFlow, and OpenCV.

26
Q

Reversed

This service is the most widely used cloud data warehouse. It makes it fast, simple and cost effective to analyze all your data using standard SQL and your existing Business Intelligence (BI) tools.

Run complex analytic queries against terabytes to petabytes of structured and semistructured data, using sophisticated query optimization, columnar storage on high-performance storage, and massively parallel query execution. Most results come back in seconds. You can start small for just $0.25 per hour with no commitments and scale out to petabytes of data for $1,000 per terabyte per year, less than a tenth the cost of traditional on-premises solutions.

A

Amazon Redshift

27
Q

AWS Data Pipeline

A

This service is a web service that helps you reliably process and move data between different AWS compute and storage services, as well as on-premises data sources, at specified intervals.

Regularly access your data where it’s stored, transform and process it at scale, and efficiently transfer the results to AWS services such as Amazon S3, Amazon RDS, Amazon DynamoDB, and Amazon EMR.

Easily create complex data processing workloads that are fault tolerant, repeatable, and highly available.

This service also allows you to move and process data that was previously locked up in on-premises data silos.

28
Q

Amazon Elasticsearch Service

A

Amazon Elasticsearch Service makes it easy to deploy, secure, operate, and scale Elasticsearch to search, analyze, and visualize data in real time. You get easy-to-use APIs and real-time analytics capabilities to power use cases such as log analytics, full-text search, application monitoring, and clickstream analytics, with enterprise-grade availability, scalability, and security. The service offers integrations with open-source tools like Kibana and Logstash for data ingestion and visualization.

It also integrates seamlessly with other AWS services such as Amazon Virtual Private Cloud (Amazon VPC), AWS Key Management Service (AWS KMS), Amazon Kinesis Data Firehose, AWS Lambda, AWS Identity and Access Management (IAM), Amazon Cognito, and Amazon CloudWatch, so that you can go from raw data to actionable insights quickly.

29
Q

Reversed

This service is a massively scalable and durable real-time data streaming service. KDS can continuously capture gigabytes of data per second from hundreds of thousands of sources such as website clickstreams, database event streams, financial transactions, social media feeds, IT logs, and location-tracking events. The data collected is available in milliseconds to enable real-time analytics use cases such as real-time dashboards, real-time anomaly detection, dynamic pricing, and more.

A

Amazon Kinesis Data Streams