AI, Machine Learning, Analytics Technology and Services Flashcards

1
Q

What is Redshift?

A

Redshift is a data warehousing service used for reporting and analytics that can store and query petabytes of data.

2
Q

What does Redshift allow you to do with multiple sources of data?

A

Redshift allows you to combine multiple sources of data into one place, enabling you to perform analytics on the data.

3
Q

What does MPP stand for and what does it mean in the context of Redshift?

A

MPP stands for massively parallel processing. In the context of Redshift, it means that complex queries are distributed and executed in parallel across multiple compute nodes.
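The fan-out/combine idea behind MPP can be illustrated with a plain-Python sketch (not Redshift code; `node_sum` and `mpp_sum` are made-up names, and threads stand in for compute nodes):

```python
from concurrent.futures import ThreadPoolExecutor

def node_sum(chunk):
    # each "compute node" aggregates its own slice of the data
    return sum(chunk)

def mpp_sum(values, nodes=4):
    # distribute rows across nodes, aggregate in parallel, combine the partials
    chunks = [values[i::nodes] for i in range(nodes)]
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partials = pool.map(node_sum, chunks)
    return sum(partials)

# parallel and serial aggregation agree; MPP just makes the work concurrent
assert mpp_sum(list(range(100))) == sum(range(100))
```

The same divide-aggregate-combine pattern is what lets an MPP warehouse scan billions of rows by giving each node only a fraction of them.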

4
Q

What are the benefits of using Redshift for OLAP?

A

Redshift is designed for Online Analytical Processing (OLAP), making it great for analytics and reporting. It provides automated data management, including backup, replication, and scaling without downtime.

5
Q

What is Redshift Serverless?

A

Redshift Serverless is an option that simplifies the use of Redshift by eliminating the need to manage any infrastructure. It automatically provisions and scales data warehouse capacity.

6
Q

What are some use cases for Redshift?

A

Some use cases for Redshift include complex querying and reporting for businesses that need to analyze large volumes of data, integration with data lakes for querying structured and unstructured data, and operational analytics for making time-sensitive decisions based on real-time data.

7
Q

What are the advantages of using Redshift Serverless for unpredictable workloads?

A

Redshift Serverless eliminates the need to manage infrastructure, allowing you to focus on analyzing your data. Because it automatically provisions and scales capacity, it is a great option for unpredictable workloads.

8
Q

What is a data lake and how can Redshift integrate with it?

A

A data lake is a central repository of structured and unstructured data, often stored in S3. Redshift can integrate with a data lake, allowing you to query that data directly.

9
Q

What type of workload is Redshift designed for?

A

Redshift is designed for business intelligence workloads, specifically reporting and analytics.

10
Q

What are some automated data management features provided by Redshift?

A

Redshift provides automated data backup, replication, and scaling without any downtime.

11
Q

What is the main purpose of Kinesis?

A

The main purpose of Kinesis is to collect, process, and analyze streaming data in real time.

12
Q

What does the name “Kinesis” mean and why is it a fitting name for this service?

A

“Kinesis” is a Greek word that means movement or motion. It is a fitting name for this service because Kinesis deals with data that is in motion, moving from one place to another.

13
Q

How does Kinesis data streams store and retain data?

A

Kinesis Data Streams stores data in shards, which are ordered sequences of data records. Data is retained for 24 hours by default, with a maximum retention of 365 days.

14
Q

What is the difference between streaming data and static data?

A

Streaming data refers to data that is generated continuously by multiple data sources or producers, while static data is data at rest, stored on disk, in S3, or in a database.

15
Q

Give examples of the types of data that can be handled by Kinesis.

A

Examples of data that can be handled by Kinesis include financial transactions, stock prices, in-game data, social media feeds, location tracking data, IoT sensor data, clickstream data, and application log files.

16
Q

What are shards in Kinesis data streams?

A

Shards are the storage and throughput units of a Kinesis data stream: each shard is an ordered sequence of data records, each record has a unique sequence number, and a stream is made up of one or more shards.
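Kinesis routes each record to a shard by MD5-hashing its partition key and mapping the 128-bit result into a shard's hash-key range. The sketch below assumes evenly split ranges (`shard_for_key` is a made-up helper, not an AWS API):

```python
import hashlib

MAX_HASH = 2 ** 128  # MD5 digests span a 128-bit hash-key space

def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index, Kinesis-style: MD5 the key,
    treat the digest as a 128-bit integer, and find which evenly sized
    hash-key range it falls into."""
    h = int.from_bytes(hashlib.md5(partition_key.encode()).digest(), "big")
    return h * shard_count // MAX_HASH

# The same partition key always maps to the same shard, which is what
# preserves per-key ordering within a stream.
assert shard_for_key("user-42", 4) == shard_for_key("user-42", 4)
```

This is why a hot partition key can overload a single shard: all of that key's records land in one place.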

17
Q

What role do data consumers play in the Kinesis architecture?

A

Data consumers in the Kinesis architecture consume data from the shards and process it. They can perform various actions on the data, such as running algorithms, analyzing sentiment, or generating recommendations.

18
Q

Give examples of actions that data consumers can perform on the data.

A

Data consumers can perform actions such as running algorithms on stock prices, sentiment analysis on social media feeds, or analyzing clickstream data to generate product recommendations.

19
Q

What are some possible destinations for data after it has been processed by data consumers?

A

After being processed by data consumers, the data can be sent to permanent storage destinations such as DynamoDB, S3, Elastic MapReduce, or Redshift.

20
Q

Explain the main purpose of Kinesis data streams and Kinesis video streams.

A

Kinesis Data Streams is designed for handling general streaming data, while Kinesis Video Streams is specifically designed for streaming video from connected video devices.

21
Q

What is Kinesis Data Firehose?

A

Kinesis Data Firehose, also known as Kinesis Firehose, is a fully managed service that allows you to capture, transform, and load data streams into AWS data stores for near real-time analytics.

22
Q

What are the primary functions of Kinesis Data Firehose?

A

The primary functions of Kinesis Data Firehose are capturing, transforming, and loading data. It scales automatically, dynamically adjusting its resources to handle varying data volumes, and typically processes and delivers data within 60 seconds for near real-time insights.
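The delivery behavior can be illustrated with a toy buffer that emits records in batches once a count or time threshold is reached. This is a simplified stand-in for Firehose's real buffering hints (which are configured in MBs and seconds); all names here are made up:

```python
import time

class FirehoseLikeBuffer:
    """Toy illustration of Firehose-style buffering: records accumulate
    and are delivered as a batch once a size or time threshold is hit."""

    def __init__(self, deliver, max_records=500, max_seconds=60):
        self.deliver = deliver          # destination callback (e.g. write to S3)
        self.max_records = max_records
        self.max_seconds = max_seconds
        self.records = []
        self.started = time.monotonic()

    def put(self, record):
        self.records.append(record)
        if (len(self.records) >= self.max_records
                or time.monotonic() - self.started >= self.max_seconds):
            self.flush()

    def flush(self):
        if self.records:
            self.deliver(self.records)  # one batched delivery per flush
            self.records = []
            self.started = time.monotonic()

batches = []
buf = FirehoseLikeBuffer(batches.append, max_records=3, max_seconds=60)
for i in range(7):
    buf.put(i)
# Two full batches delivered; the last record stays buffered until the
# next threshold is reached.
```

Batching like this is why Firehose is "near real-time" rather than real-time: delivery waits for the buffer, not for each record.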

23
Q

Is there any data retention in Kinesis Data Firehose?

A

No, Kinesis Data Firehose does not retain data.

24
Q

Can you transform data with Kinesis Data Firehose before loading it into storage?

A

Yes, you can transform and customize the data using AWS Lambda before loading it into permanent storage.
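A transformation Lambda for Firehose receives base64-encoded records and must return each one with a status, following the transformation record contract (recordId / result / data). The uppercasing transform below is chosen purely as an example:

```python
import base64

def handler(event, context):
    """Sketch of a Firehose data-transformation Lambda. Each incoming
    record carries base64-encoded data; the response must echo the
    recordId, set a result ("Ok", "Dropped", or "ProcessingFailed"),
    and return the transformed data re-encoded as base64."""
    output = []
    for record in event["records"]:
        payload = base64.b64decode(record["data"]).decode("utf-8")
        transformed = payload.upper() + "\n"   # example transformation only
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(transformed.encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}
```

Firehose invokes this function on buffered batches before loading the results into the destination, so the transform adds little per-record overhead.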

25
Q

What tools can be used for analytics after data is loaded by Kinesis Data Firehose?

A

Business intelligence tools can be used for analytics after data is loaded into its final destination by Kinesis Data Firehose.

26
Q

What monitoring tools are integrated with Kinesis Data Firehose?

A

Kinesis Data Firehose includes integrated monitoring with CloudWatch.

27
Q

What happens if there is an error in data processing within Kinesis Data Firehose?

A

Kinesis Data Firehose has automatic error retries if something goes wrong.

28
Q

Does Kinesis Data Firehose retain data temporarily?

A

No. Unlike Kinesis Data Streams, Firehose does not retain data; it only buffers records briefly before delivering them to their destination.

29
Q

What does a data lake refer to in the context of Kinesis Data Firehose?

A

A data lake refers to a large-scale data repository, typically built on S3, into which Firehose can deliver streaming data for storage and later analysis.

30
Q

What AWS service can you use to transform data in Kinesis Data Firehose?

A

You can use AWS Lambda to transform data in Kinesis Data Firehose.

31
Q

What are some use cases for Kinesis Data Firehose?

A

Use cases include real-time analytics, feeding data into data lakes, log data management, and IoT data integration.

32
Q

What are some common destinations for data after processing in Kinesis Data Firehose?

A

Common destinations include Amazon S3, Amazon Redshift, and Amazon OpenSearch Service.

33
Q

What is the difference between Kinesis Data Streams and Kinesis Data Firehose?

A

Kinesis Data Streams captures and stores streaming data (with configurable retention), whereas Kinesis Data Firehose captures, transforms, and loads data continuously into data stores without retaining it.

34
Q

What is Amazon Athena?

A

Amazon Athena is an interactive query service that enables you to run standard SQL queries on data stored in Amazon S3.

35
Q

What type of queries can you run with Amazon Athena?

A

You can run standard SQL queries with Amazon Athena.

36
Q

What is a key feature of Amazon Athena regarding infrastructure?

A

Amazon Athena is serverless, meaning there is nothing to provision and manage.

37
Q

How do you pay for using Amazon Athena?

A

You pay per query, based on the amount of data scanned (priced per terabyte).

37
Q

Is there a need for complex ETL processes when using Amazon Athena?

A

No, there is no need for complex extract, transform, and load (ETL) processes when using Amazon Athena. It works directly with data stored in S3.
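Because Athena works directly on S3 data with standard SQL, submitting a query is mostly a matter of parameters. The sketch below only builds the arguments for Athena's StartQueryExecution API (the database, table, and bucket names are hypothetical placeholders), leaving the actual boto3 call commented out:

```python
def athena_query_params(sql, database, results_bucket):
    """Build the parameter dict for athena.start_query_execution (boto3).
    All names passed in are assumed/hypothetical examples."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to an S3 location you choose
        "ResultConfiguration": {
            "OutputLocation": f"s3://{results_bucket}/athena-results/"
        },
    }

params = athena_query_params(
    "SELECT status, COUNT(*) FROM access_logs GROUP BY status",  # standard SQL
    database="weblogs",                 # hypothetical Glue/Athena database
    results_bucket="my-athena-results", # hypothetical bucket
)
# boto3.client("athena").start_query_execution(**params)  # needs AWS credentials
```

No cluster, loading step, or ETL pipeline appears anywhere in this flow, which is the point of the card above.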

38
Q

What are some use cases for Amazon Athena?

A

Use cases for Amazon Athena include querying log files stored in S3, analyzing AWS cost and usage reports, generating business reports on data stored in S3, and running queries on clickstream data stored in S3.

39
Q

What is AWS Glue used for?

A

AWS Glue is used to prepare your data for analytics and machine learning.

40
Q

Why is AWS Glue important for analytics and machine learning?

A

AWS Glue is important because it prepares and transforms data, making it ready for use by analytics applications and machine learning models.

41
Q

What is the purpose of the data catalog created by AWS Glue?

A

The data catalog serves as the central repository containing metadata about the data, including its type and format.

42
Q

Into which AWS services can transformed data be loaded using AWS Glue?

A

Transformed data can be loaded into AWS services like RDS, Redshift, S3, or Athena.

43
Q

What are some specific transformations AWS Glue can perform on data?

A

AWS Glue can categorize data, clean it, remove duplicates, and join multiple datasets.

44
Q

What does AWS Glue do with your data?

A

AWS Glue crawls your data and creates the data catalog, which is the central repository containing the metadata, such as the type or format of your data.

45
Q

What can AWS Glue do after creating the data catalog?

A

After creating the data catalog, AWS Glue can extract data from various sources, transform it (e.g., categorize, clean, remove duplicates, or join multiple datasets), and then load it into other AWS services.
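The transformations above (cleaning, deduplicating, joining) can be illustrated in plain Python. In practice Glue runs such transforms as managed Spark jobs, so this is only a conceptual sketch with made-up data and helper names:

```python
def remove_duplicates(rows, key):
    """Keep the first row seen for each key value (a common cleanup step)."""
    seen, out = set(), []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            out.append(row)
    return out

def join(left, right, key):
    """Simple inner join of two datasets on a shared key."""
    index = {row[key]: row for row in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]

orders = [{"cust": 1, "total": 30}, {"cust": 1, "total": 30}, {"cust": 2, "total": 5}]
custs = [{"cust": 1, "name": "Ada"}, {"cust": 2, "name": "Bo"}]

clean = remove_duplicates(orders, "cust")  # drop the duplicate order row
joined = join(clean, custs, "cust")        # enrich orders with customer names
```

After a transform like this, Glue would load the joined dataset into a target such as S3 or Redshift and record its schema in the data catalog.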

46
Q

What is AWS Data Exchange used for?

A

AWS Data Exchange allows you to securely exchange and use data provided by third parties on a subscription basis.

47
Q

Who provides the data products available on AWS Data Exchange?

A

Data products are available from a variety of suppliers, including financial services, healthcare, weather, manufacturing, and telecommunications.

48
Q

What can the data from AWS Data Exchange be used for?

A

The data can be used for analytics, machine learning workloads, and decision-making.

49
Q

Can you give an example use case for AWS Data Exchange?

A

An example use case is analyzing customer spending patterns based on geographic location using data products provided by companies like MasterCard, Experian, and Equifax.

50
Q

What is Elastic MapReduce (EMR)?

A

Elastic MapReduce (EMR) is a big data platform provided by AWS that supports large-scale parallel data processing and petabyte-scale interactive analysis.

51
Q

What types of data does EMR support?

A

EMR supports structured data (e.g., financial transaction data), semi-structured data (e.g., text or documentation), and unstructured data (e.g., application logs or clickstream data).

52
Q

Give an example of a use case for EMR.

A

One example of a use case for EMR is processing genomic data using statistical algorithms and predictive models to discover hidden patterns and find correlations.

53
Q

How can EMR help in analyzing clickstream data?

A

EMR can analyze clickstream data to understand customer preferences or market trends.

54
Q

What are some of the data sources from which EMR can extract data?

A

EMR can extract data from sources like S3, DynamoDB, or Redshift.

55
Q

Which real-time data streaming service is compatible with EMR for event analysis?

A

EMR can be used to analyze events from streaming data sources in real time using Amazon Kinesis.

56
Q

Name some popular open-source frameworks supported by EMR.

A

EMR supports popular open-source frameworks like Apache Spark, Apache Hive, Presto, and Hadoop.

57
Q

What are the benefits of using EMR as a fully managed big data solution?

A

The benefits of EMR include not having to provision and manage infrastructure, configure and manage open-source applications, or do capacity planning; it dynamically scales as required by the workload. It is also optimized for performance and, AWS claims, is faster and less costly than deploying an on-premises big data solution.

58
Q

How does AWS claim EMR compares in cost to deploying your own big data solution on-premises?

A

AWS claims that EMR is less than 50% of the cost of deploying your own big data solution on-premises.

59
Q

What is Amazon OpenSearch?

A

Amazon OpenSearch is a fully managed service based on open-source Elasticsearch technology. It is compatible with the Elasticsearch open-source APIs, with Logstash for data collection and processing, and with Kibana for search and data visualization.
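Documents are typically sent to OpenSearch's _bulk API as newline-delimited JSON, alternating an action line with a document line. A small builder sketch (the index name is a placeholder, and the endpoint comment is an assumption about a typical setup):

```python
import json

def bulk_body(index, docs):
    """Build an NDJSON body for the OpenSearch/Elasticsearch _bulk API:
    one action line ({"index": ...}) followed by one document line per
    doc, terminated by a trailing newline as the API requires."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

body = bulk_body("app-logs", [
    {"level": "ERROR", "msg": "timeout"},
    {"level": "INFO", "msg": "started"},
])
# POST this body to https://<your-domain-endpoint>/_bulk with
# Content-Type: application/x-ndjson (endpoint is a placeholder).
```

Log shippers such as Logstash produce exactly this kind of batched payload when feeding an OpenSearch domain.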

60
Q

Which open-source technologies is Amazon OpenSearch compatible with?

A

Amazon OpenSearch is compatible with industry-standard Elasticsearch open-source APIs, Logstash, and Kibana.

61
Q

Why might a business choose to use Amazon OpenSearch?

A

A business might choose Amazon OpenSearch because it is a fully managed service that simplifies the use of open-source Elasticsearch technology while supporting data collection, processing, and visualization tools like Logstash and Kibana. It is suitable for various analytics use cases, including log, application, security, and business data analytics.

62
Q

What AWS services can you ingest data from into Amazon OpenSearch?

A

You can ingest data into Amazon OpenSearch from AWS services such as CloudWatch Logs, S3, DynamoDB, and Firehose.

63
Q

Name a tool that is used for data collection and processing in conjunction with Amazon OpenSearch.

A

Logstash is used for data collection and processing in conjunction with Amazon OpenSearch.

64
Q

What tool can be used with Amazon OpenSearch for search and data visualization?

A

Kibana is used with Amazon OpenSearch for search and data visualization.

65
Q

List some use cases for Amazon OpenSearch.

A

Use cases for Amazon OpenSearch include log analytics, application monitoring, security analytics, and business data analytics.

66
Q

How does Amazon OpenSearch relate to Elasticsearch?

A

Amazon OpenSearch is a fully managed service that is based on open-source Elasticsearch technology and is compatible with Elasticsearch open-source APIs.

67
Q

What kind of analytics can you perform using Amazon OpenSearch?

A

Using Amazon OpenSearch, you can perform log analytics, application monitoring, security analytics, and business data analytics.

68
Q

Can you use Amazon OpenSearch with AWS CloudWatch Logs? If so, how?

A

Yes, you can use Amazon OpenSearch with AWS CloudWatch Logs by ingesting data from CloudWatch Logs into Amazon OpenSearch.

69
Q

You are building an application that will be used to analyze customer spending patterns. Your application relies on data from third parties in the retail and financial services sector. Which of the following services can be used to securely exchange and use data provided by third parties on a subscription basis?

A

AWS Data Exchange

70
Q

Which AWS service can be used to perform sentiment analysis on customer feedback data?

A

Amazon Comprehend

71
Q

Which AWS service captures, transforms, and loads data continuously into data stores?

A

Kinesis Data Firehose

72
Q

You would like to use Apache Kafka to process a continuous stream of data that you need to track and analyze in real-time. You are looking for a fully managed service to avoid building and maintaining your own Kafka platform. Which AWS service can you use?

A

Amazon MSK (Managed Streaming for Apache Kafka)

73
Q

You would like to use deep learning technology to add natural-sounding speech to your website so that the contents of certain web pages can be read out loud to help people who are visually impaired. Which AWS service can you use to implement this?

A

Polly

74
Q

Your company needs a solution to collect website clickstream data in real time so that it can be processed for real-time insights. Which AWS service do you suggest?

A

Kinesis enables you to collect, process, and analyze streaming data in real time.

75
Q

Which AWS service can be used to extract text and data from documents, including driver's license numbers or passport numbers, to help verify the identity of loan applicants?

A

Amazon Textract

76
Q

Which AWS service can be used to create a data catalog and perform ETL (Extract, Transform, and Load) on your data so that it can be used by your data analytics and machine learning applications?

A

AWS Glue

77
Q

You need to query data stored in S3 using standard SQL queries. Which of the following AWS services will enable you to do this?

A

Athena is an interactive query service for data in S3. It enables you to query data stored in S3 using standard SQL.

78
Q

Your genomics application generates a large amount of unstructured and semi-structured data. You now require a Big Data solution so that you can process the data to identify patterns and trends, using open-source technologies like Apache Spark and Apache Hive. Which of the following services would you recommend?

A

Amazon EMR (Elastic MapReduce)

79
Q

What can you use to group and visualize AWS resources by project, environment, or application?

A

Tags

80
Q

Which service allows you to create dashboards where you visualize metrics produced by services and applications on AWS?

A

Amazon CloudWatch

81
Q

Which service provides account-wide recommendations around cost optimization, service limits, and security best practices?

A

Trusted Advisor

82
Q

Which service allows you to most easily convert an existing application into a cloud-hosted Software-as-a-Service?

A

AppStream will handle hosting, scaling, and user management for your application and help you convert it into a SaaS product for your employees or customers.

83
Q

Which of the following is NOT a function of AWS Audit Manager?

  1. Centralize audit data from AWS Config and various security services.
  2. Generate insights and recommendations to help you adhere to the Well-Architected Framework.
  3. Use pre-built frameworks to help you meet industry-specific security and configuration standards.
  4. Find root causes of noncompliance and generate reports.
A

Generate insights and recommendations to help you adhere to the Well-Architected Framework.

84
Q

How long are CloudWatch Logs stored by default?

A

Indefinitely

85
Q

What feature can be used to analyze your workloads and generate action plans to help you achieve more reliable and cost-effective architecture?

  1. Systems Manager
  2. Audit Manager
  3. AWS Config
  4. The Well-Architected Tool
A

The Well-Architected Tool helps you use the Well-Architected Framework as a set of lenses through which to analyze your workloads. You can use it to learn about the Well-Architected Framework and generate action plans to bring your architectures into alignment with it.

86
Q

What does AWS Config do?

  1. AWS Config allows you to take automated actions on large groups of cloud resources.
  2. AWS Config allows you to set up account-wide rules and detect non-compliant resources.
  3. AWS Config allows you to set up account-wide rules and enforce compliance by disallowing the creation of non-compliant resources.
  4. AWS Config allows you to audit non-compliant resources and generate audit reports.
A

AWS Config allows you to set up account-wide rules and detect non-compliant resources.

87
Q

Which service will notify you about service events, outages, planned changes, and account notifications?

  1. Trusted Advisor
  2. CloudWatch Alarms
  3. AWS Health Dashboard
  4. Systems Manager
A

AWS Health Dashboard will give you a view of all outages across AWS, as well as a personal dashboard that displays only those services and Regions that are relevant to your cloud resources.

88
Q

How can you receive a notification when CPU utilization of your EC2 instance reaches 90%?

  1. Trusted Advisor will automatically generate a recommendation when CPU utilization of your EC2 instance reaches 90%.
  2. Systems Manager automatically tracks CPU utilization and will notify administrators when it exceeds 90% on any given instance.
  3. Create a CloudWatch alarm that triggers when CPU utilization reaches 90%.
  4. Create a CloudTrail Alarm that triggers when CPU utilization reaches 90%.
A

CloudWatch alarms can be used to send notifications or trigger automated events when metrics reach defined thresholds.
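The alarm described here can be expressed as parameters for CloudWatch's PutMetricAlarm API. The sketch below only builds the parameter dict (the instance ID and SNS topic ARN are placeholders), with the actual boto3 call left commented out:

```python
def cpu_alarm_params(instance_id, topic_arn):
    """Build parameters for cloudwatch.put_metric_alarm (boto3): alarm
    when the average CPUUtilization of one EC2 instance reaches 90%.
    The instance ID and topic ARN are assumed placeholder values."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,                # evaluate 5-minute averages
        "EvaluationPeriods": 1,
        "Threshold": 90.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [topic_arn],  # e.g. an SNS topic that emails you
    }

params = cpu_alarm_params(
    "i-0123456789abcdef0",                          # placeholder instance
    "arn:aws:sns:us-east-1:111122223333:alerts",    # placeholder SNS topic
)
# boto3.client("cloudwatch").put_metric_alarm(**params)  # needs AWS credentials
```

When the alarm fires, CloudWatch publishes to the SNS topic listed in AlarmActions, which is how the notification reaches you.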
