AI, Machine Learning, Analytics Technology and Services Flashcards

1
Q

What is Redshift?

A

Redshift is a data warehousing service used for reporting and analytics that can store and query petabytes of data.

2
Q

What does Redshift allow you to do with multiple sources of data?

A

Redshift allows you to combine multiple sources of data into one place, enabling you to perform analytics on the data.

3
Q

What does MPP stand for and what does it mean in the context of Redshift?

A

MPP stands for massively parallel processing. In the context of Redshift, it means that complex queries are distributed and executed in parallel across multiple compute nodes.
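The fan-out/combine idea behind MPP can be illustrated with a plain-Python sketch (not Redshift code; `node_sum` and `mpp_sum` are made-up names, and threads stand in for compute nodes):

```python
from concurrent.futures import ThreadPoolExecutor

def node_sum(chunk):
    # each "compute node" aggregates its own slice of the data
    return sum(chunk)

def mpp_sum(values, nodes=4):
    # distribute rows across nodes, aggregate in parallel, combine the partials
    chunks = [values[i::nodes] for i in range(nodes)]
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partials = pool.map(node_sum, chunks)
    return sum(partials)

# parallel and serial aggregation agree; MPP just makes the work concurrent
assert mpp_sum(list(range(100))) == sum(range(100))
```

The same divide-aggregate-combine pattern is what lets an MPP warehouse scan billions of rows by giving each node only a fraction of them.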

4
Q

What are the benefits of using Redshift for OLAP?

A

Redshift is designed for Online Analytical Processing (OLAP), making it great for analytics and reporting. It provides automated data management, including backup, replication, and scaling without downtime.

5
Q

What is Redshift Serverless?

A

Redshift Serverless is an option that simplifies the use of Redshift by eliminating the need to manage any infrastructure. It automatically provisions and scales data warehouse capacity.

6
Q

What are some use cases for Redshift?

A

Some use cases for Redshift include complex querying and reporting for businesses that need to analyze large volumes of data, integration with data lakes for querying structured and unstructured data, and operational analytics for making time-sensitive decisions based on real-time data.

7
Q

What are the advantages of using Redshift Serverless for unpredictable workloads?

A

Redshift Serverless eliminates the need to manage infrastructure, allowing you to focus on analyzing your data. Because it automatically provisions and scales capacity, it is a great option for unpredictable workloads.

8
Q

What is a data lake and how can Redshift integrate with it?

A

A data lake is a central repository of structured and unstructured data, often stored in S3. Redshift can integrate with a data lake, allowing you to query that data directly.

9
Q

What type of workload is Redshift designed for?

A

Redshift is designed for business intelligence workloads, specifically reporting and analytics.

10
Q

What are some automated data management features provided by Redshift?

A

Redshift provides automated data backup, replication, and scaling without any downtime.

11
Q

What is the main purpose of Kinesis?

A

The main purpose of Kinesis is to collect, process, and analyze streaming data in real time.

12
Q

What does the name “Kinesis” mean and why is it a fitting name for this service?

A

“Kinesis” is a Greek word that means movement or motion. It is a fitting name for this service because Kinesis deals with data that is in motion, moving from one place to another.

13
Q

How does Kinesis data streams store and retain data?

A

Kinesis Data Streams stores data in shards, which are ordered sequences of data records. Data is retained for 24 hours by default, with a maximum retention of 365 days.

14
Q

What is the difference between streaming data and static data?

A

Streaming data refers to data that is generated continuously by multiple data sources or producers, while static data is data at rest, stored on disk, in S3, or in a database.

15
Q

Give examples of the types of data that can be handled by Kinesis.

A

Examples of data that can be handled by Kinesis include financial transactions, stock prices, in-game data, social media feeds, location tracking data, IoT sensor data, clickstream data, and application log files.

16
Q

What are shards in Kinesis data streams?

A

Shards are the storage and throughput units of a Kinesis data stream: each shard is an ordered sequence of data records, each record has a unique sequence number, and a stream is made up of one or more shards.
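Kinesis routes each record to a shard by MD5-hashing its partition key and mapping the 128-bit result into a shard's hash-key range. The sketch below assumes evenly split ranges (`shard_for_key` is a made-up helper, not an AWS API):

```python
import hashlib

MAX_HASH = 2 ** 128  # MD5 digests span a 128-bit hash-key space

def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index, Kinesis-style: MD5 the key,
    treat the digest as a 128-bit integer, and find which evenly sized
    hash-key range it falls into."""
    h = int.from_bytes(hashlib.md5(partition_key.encode()).digest(), "big")
    return h * shard_count // MAX_HASH

# The same partition key always maps to the same shard, which is what
# preserves per-key ordering within a stream.
assert shard_for_key("user-42", 4) == shard_for_key("user-42", 4)
```

This is why a hot partition key can overload a single shard: all of that key's records land in one place.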

17
Q

What role do data consumers play in the Kinesis architecture?

A

Data consumers in the Kinesis architecture consume data from the shards and process it. They can perform various actions on the data, such as running algorithms, analyzing sentiment, or generating recommendations.

18
Q

Give examples of actions that data consumers can perform on the data.

A

Data consumers can perform actions such as running algorithms on stock prices, sentiment analysis on social media feeds, or analyzing clickstream data to generate product recommendations.

19
Q

What are some possible destinations for data after it has been processed by data consumers?

A

After being processed by data consumers, the data can be sent to permanent storage destinations such as DynamoDB, S3, Elastic MapReduce, or Redshift.

20
Q

Explain the main purpose of Kinesis data streams and Kinesis video streams.

A

Kinesis Data Streams is designed for handling general streaming data, while Kinesis Video Streams is specifically designed for streaming video from connected video devices.

21
Q

What is Kinesis Data Firehose?

A

Kinesis Data Firehose, also known as Kinesis Firehose, is a fully managed service that allows you to capture, transform, and load data streams into AWS data stores for near real-time analytics.

22
Q

What are the primary functions of Kinesis Data Firehose?

A

The primary functions of Kinesis Data Firehose are capturing, transforming, and loading data. It scales automatically, dynamically adjusting its resources to handle varying data volumes, and typically processes and delivers data within 60 seconds for near real-time insights.
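The delivery behavior can be illustrated with a toy buffer that emits records in batches once a count or time threshold is reached. This is a simplified stand-in for Firehose's real buffering hints (which are configured in MBs and seconds); all names here are made up:

```python
import time

class FirehoseLikeBuffer:
    """Toy illustration of Firehose-style buffering: records accumulate
    and are delivered as a batch once a size or time threshold is hit."""

    def __init__(self, deliver, max_records=500, max_seconds=60):
        self.deliver = deliver          # destination callback (e.g. write to S3)
        self.max_records = max_records
        self.max_seconds = max_seconds
        self.records = []
        self.started = time.monotonic()

    def put(self, record):
        self.records.append(record)
        if (len(self.records) >= self.max_records
                or time.monotonic() - self.started >= self.max_seconds):
            self.flush()

    def flush(self):
        if self.records:
            self.deliver(self.records)  # one batched delivery per flush
            self.records = []
            self.started = time.monotonic()

batches = []
buf = FirehoseLikeBuffer(batches.append, max_records=3, max_seconds=60)
for i in range(7):
    buf.put(i)
# Two full batches delivered; the last record stays buffered until the
# next threshold is reached.
```

Batching like this is why Firehose is "near real-time" rather than real-time: delivery waits for the buffer, not for each record.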

23
Q

Is there any data retention in Kinesis Data Firehose?

A

No, Kinesis Data Firehose does not retain data.

24
Q

Can you transform data with Kinesis Data Firehose before loading it into storage?

A

Yes, you can transform and customize the data using AWS Lambda before loading it into permanent storage.
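A transformation Lambda for Firehose receives base64-encoded records and must return each one with a status, following the transformation record contract (recordId / result / data). The uppercasing transform below is chosen purely as an example:

```python
import base64

def handler(event, context):
    """Sketch of a Firehose data-transformation Lambda. Each incoming
    record carries base64-encoded data; the response must echo the
    recordId, set a result ("Ok", "Dropped", or "ProcessingFailed"),
    and return the transformed data re-encoded as base64."""
    output = []
    for record in event["records"]:
        payload = base64.b64decode(record["data"]).decode("utf-8")
        transformed = payload.upper() + "\n"   # example transformation only
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(transformed.encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}
```

Firehose invokes this function on buffered batches before loading the results into the destination, so the transform adds little per-record overhead.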

25
Q

What tools can be used for analytics after data is loaded by Kinesis Data Firehose?

A

Business intelligence tools can be used for analytics after data is loaded into its final destination by Kinesis Data Firehose.

26
Q

What monitoring tools are integrated with Kinesis Data Firehose?

A

Kinesis Data Firehose includes integrated monitoring with CloudWatch.

27
Q

What happens if there is an error in data processing within Kinesis Data Firehose?

A

Kinesis Data Firehose has automatic error retries if something goes wrong.

28
Q

Does Kinesis Data Firehose retain data temporarily?

A

No. Unlike Kinesis Data Streams, Firehose does not retain data; it only buffers records briefly before delivering them to their destination.

29
Q

What does a data lake refer to in the context of Kinesis Data Firehose?

A

A data lake refers to a large-scale data repository, typically built on S3, into which Firehose can deliver streaming data for storage and later analysis.

30
Q

What AWS service can you use to transform data in Kinesis Data Firehose?

A

You can use AWS Lambda to transform data in Kinesis Data Firehose.

31
Q

What are some use cases for Kinesis Data Firehose?

A

Use cases include real-time analytics, feeding data into data lakes, log data management, and IoT data integration.

32
Q

What are some common destinations for data after processing in Kinesis Data Firehose?

A

Common destinations include Amazon S3, Amazon Redshift, and Amazon OpenSearch Service.

33
Q

What is the difference between Kinesis Data Streams and Kinesis Data Firehose?

A

Kinesis Data Streams captures and stores streaming data (with configurable retention), whereas Kinesis Data Firehose captures, transforms, and loads data continuously into data stores without retaining it.

34
Q

What is Amazon Athena?

A

Amazon Athena is an interactive query service that enables you to run standard SQL queries on data stored in Amazon S3.

35
Q

What type of queries can you run with Amazon Athena?

A

You can run standard SQL queries with Amazon Athena.

36
Q

What is a key feature of Amazon Athena regarding infrastructure?

A

Amazon Athena is serverless, meaning there is nothing to provision and manage.

37
Q

How do you pay for using Amazon Athena?

A

You pay per query, based on the amount of data scanned (priced per terabyte).

37
Q

Is there a need for complex ETL processes when using Amazon Athena?

A

No, there is no need for complex extract, transform, and load (ETL) processes when using Amazon Athena. It works directly with data stored in S3.
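Because Athena works directly on S3 data with standard SQL, submitting a query is mostly a matter of parameters. The sketch below only builds the arguments for Athena's StartQueryExecution API (the database, table, and bucket names are hypothetical placeholders), leaving the actual boto3 call commented out:

```python
def athena_query_params(sql, database, results_bucket):
    """Build the parameter dict for athena.start_query_execution (boto3).
    All names passed in are assumed/hypothetical examples."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to an S3 location you choose
        "ResultConfiguration": {
            "OutputLocation": f"s3://{results_bucket}/athena-results/"
        },
    }

params = athena_query_params(
    "SELECT status, COUNT(*) FROM access_logs GROUP BY status",  # standard SQL
    database="weblogs",                 # hypothetical Glue/Athena database
    results_bucket="my-athena-results", # hypothetical bucket
)
# boto3.client("athena").start_query_execution(**params)  # needs AWS credentials
```

No cluster, loading step, or ETL pipeline appears anywhere in this flow, which is the point of the card above.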

38
Q

What are some use cases for Amazon Athena?

A

Use cases for Amazon Athena include querying log files stored in S3, analyzing AWS cost and usage reports, generating business reports on data stored in S3, and running queries on clickstream data stored in S3.

39
Q

What is AWS Glue used for?

A

AWS Glue is used to prepare your data for analytics and machine learning.

40
Q

Why is AWS Glue important for analytics and machine learning?

A

AWS Glue is important because it prepares and transforms data, making it ready for use by analytics applications and machine learning models.

41
Q

What is the purpose of the data catalog created by AWS Glue?

A

The data catalog serves as the central repository containing metadata about the data, including its type and format.

42
Q

Into which AWS services can transformed data be loaded using AWS Glue?

A

Transformed data can be loaded into AWS services like RDS, Redshift, S3, or Athena.

43
Q

What are some specific transformations AWS Glue can perform on data?

A

AWS Glue can categorize data, clean it, remove duplicates, and join multiple datasets.

44
Q

What does AWS Glue do with your data?

A

AWS Glue crawls your data and creates the data catalog, which is the central repository containing the metadata, such as the type or format of your data.

45
Q

What can AWS Glue do after creating the data catalog?

A

After creating the data catalog, AWS Glue can extract data from various sources, transform it (e.g., categorize, clean, remove duplicates, or join multiple datasets), and then load it into other AWS services.
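The transformations above (cleaning, deduplicating, joining) can be illustrated in plain Python. In practice Glue runs such transforms as managed Spark jobs, so this is only a conceptual sketch with made-up data and helper names:

```python
def remove_duplicates(rows, key):
    """Keep the first row seen for each key value (a common cleanup step)."""
    seen, out = set(), []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            out.append(row)
    return out

def join(left, right, key):
    """Simple inner join of two datasets on a shared key."""
    index = {row[key]: row for row in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]

orders = [{"cust": 1, "total": 30}, {"cust": 1, "total": 30}, {"cust": 2, "total": 5}]
custs = [{"cust": 1, "name": "Ada"}, {"cust": 2, "name": "Bo"}]

clean = remove_duplicates(orders, "cust")  # drop the duplicate order row
joined = join(clean, custs, "cust")        # enrich orders with customer names
```

After a transform like this, Glue would load the joined dataset into a target such as S3 or Redshift and record its schema in the data catalog.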

46
Q

What is AWS Data Exchange used for?

A

AWS Data Exchange allows you to securely exchange and use data provided by third parties on a subscription basis.

47
Q

Who provides the data products available on AWS Data Exchange?

A

Data products are available from a variety of suppliers, including financial services, healthcare, weather, manufacturing, and telecommunications.

48
Q

What can the data from AWS Data Exchange be used for?

A

The data can be used for analytics, machine learning workloads, and decision-making.

49
Q

Can you give an example use case for AWS Data Exchange?

A

An example use case is analyzing customer spending patterns based on geographic location using data products provided by companies like MasterCard, Experian, and Equifax.

50
Q

What is Elastic MapReduce (EMR)?

A

Elastic MapReduce (EMR) is a big data platform provided by AWS that supports large-scale parallel data processing and petabyte-scale interactive analysis.

51
Q

What types of data does EMR support?

A

EMR supports structured data (e.g., financial transaction data), semi-structured data (e.g., text or documentation), and unstructured data (e.g., application logs or clickstream data).

52
Q

Give an example of a use case for EMR.

A

One example of a use case for EMR is processing genomic data using statistical algorithms and predictive models to discover hidden patterns and find correlations.

53
Q

How can EMR help in analyzing clickstream data?

A

EMR can analyze clickstream data to understand customer preferences or market trends.

54
Q

What are some of the data sources from which EMR can extract data?

A

EMR can extract data from sources like S3, DynamoDB, or Redshift.

55
Q

Which real-time data streaming service is compatible with EMR for event analysis?

A

EMR can be used to analyze events from streaming data sources in real time using Amazon Kinesis.

56
Q

Name some popular open-source frameworks supported by EMR.

A

EMR supports popular open-source frameworks like Apache Spark, Apache Hive, Presto, and Hadoop.

57
Q

What are the benefits of using EMR as a fully managed big data solution?

A

The benefits of EMR include not having to provision and manage infrastructure, configure and manage open-source applications, or do capacity planning; it dynamically scales as required by the workload. It is also optimized for performance and, AWS claims, is faster and less costly than deploying an on-premises big data solution.

58
Q

How does AWS claim EMR compares in cost to deploying your own big data solution on-premises?

A

AWS claims that EMR is less than 50% of the cost of deploying your own big data solution on-premises.

59
Q

What is Amazon OpenSearch?

A

Amazon OpenSearch is a fully managed service based on open-source Elasticsearch technology. It is compatible with the Elasticsearch open-source APIs, with Logstash for data collection and processing, and with Kibana for search and data visualization.
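Documents are typically sent to OpenSearch's _bulk API as newline-delimited JSON, alternating an action line with a document line. A small builder sketch (the index name is a placeholder, and the endpoint comment is an assumption about a typical setup):

```python
import json

def bulk_body(index, docs):
    """Build an NDJSON body for the OpenSearch/Elasticsearch _bulk API:
    one action line ({"index": ...}) followed by one document line per
    doc, terminated by a trailing newline as the API requires."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

body = bulk_body("app-logs", [
    {"level": "ERROR", "msg": "timeout"},
    {"level": "INFO", "msg": "started"},
])
# POST this body to https://<your-domain-endpoint>/_bulk with
# Content-Type: application/x-ndjson (endpoint is a placeholder).
```

Log shippers such as Logstash produce exactly this kind of batched payload when feeding an OpenSearch domain.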

60
Q

Which open-source technologies is Amazon OpenSearch compatible with?

A

Amazon OpenSearch is compatible with industry-standard Elasticsearch open-source APIs, Logstash, and Kibana.

61
Q

Why might a business choose to use Amazon OpenSearch?

A

A business might choose Amazon OpenSearch because it is a fully managed service that simplifies the use of open-source Elasticsearch technology while supporting data collection, processing, and visualization tools like Logstash and Kibana. It is suitable for various analytics use cases, including log, application, security, and business data analytics.

62
Q

What AWS services can you ingest data from into Amazon OpenSearch?

A

You can ingest data into Amazon OpenSearch from AWS services such as CloudWatch Logs, S3, DynamoDB, and Firehose.

63
Q

Name a tool that is used for data collection and processing in conjunction with Amazon OpenSearch.

A

Logstash is used for data collection and processing in conjunction with Amazon OpenSearch.

64
Q

What tool can be used with Amazon OpenSearch for search and data visualization?

A

Kibana is used with Amazon OpenSearch for search and data visualization.

65
Q

List some use cases for Amazon OpenSearch.

A

Use cases for Amazon OpenSearch include log analytics, application monitoring, security analytics, and business data analytics.

66
Q

How does Amazon OpenSearch relate to Elasticsearch?

A

Amazon OpenSearch is a fully managed service that is based on open-source Elasticsearch technology and is compatible with Elasticsearch open-source APIs.

67
Q

What kind of analytics can you perform using Amazon OpenSearch?

A

Using Amazon OpenSearch, you can perform log analytics, application monitoring, security analytics, and business data analytics.

68
Q

Can you use Amazon OpenSearch with AWS CloudWatch Logs? If so, how?

A

Yes, you can use Amazon OpenSearch with AWS CloudWatch Logs by ingesting data from CloudWatch Logs into Amazon OpenSearch.

69
Q

You are building an application that will be used to analyze customer spending patterns. Your application relies on data from third parties in the retail and financial services sector. Which of the following services can be used to securely exchange and use data provided by third parties on a subscription basis?

A

AWS Data Exchange

70
Q

Which AWS service can be used to perform sentiment analysis on customer feedback data?

A

Amazon Comprehend

71
Q

Which AWS service captures, transforms, and loads data continuously into data stores?

A

Kinesis Data Firehose

72
Q

You would like to use Apache Kafka to process a continuous stream of data that you need to track and analyze in real-time. You are looking for a fully managed service to avoid building and maintaining your own Kafka platform. Which AWS service can you use?

A

Amazon MSK (Managed Streaming for Apache Kafka)

73
Q

You would like to use deep learning technology to add natural-sounding speech to your website so that the contents of certain web pages can be read out loud to help people who are visually impaired. Which AWS service can you use to implement this?

A

Polly

74
Q

Your company needs a solution to collect website clickstream data in real time so that it can be processed for real-time insights. Which AWS service do you suggest?

A

Kinesis enables you to collect, process, and analyze streaming data in real time.

75
Q

Which AWS service can be used to extract text and data from documents, including driver's license numbers or passport numbers, to help verify the identity of loan applicants?

A

Amazon Textract

76
Q

Which AWS service can be used to create a data catalog and perform ETL (Extract, Transform, and Load) on your data so that it can be used by your data analytics and machine learning applications?

A

AWS Glue

77
Q

You need to query data stored in S3 using standard SQL queries. Which of the following AWS services will enable you to do this?

A

Athena is an interactive query service for data in S3. It enables you to query data stored in S3 using standard SQL.

78
Q

Your genomics application generates a large amount of unstructured and semi-structured data. You now require a Big Data solution so that you can process the data to identify patterns and trends, using open-source technologies like Apache Spark and Apache Hive. Which of the following services would you recommend?

A

Amazon EMR (Elastic MapReduce)

79
Q

What can you use to group and visualize AWS resources by project, environment, or application?

A

Tags

80
Q

Which service allows you to create dashboards where you visualize metrics produced by services and applications on AWS?

A

Amazon CloudWatch

81
Q

Which service provides account-wide recommendations around cost optimization, service limits, and security best practices?

A

Trusted Advisor

82
Q

Which service allows you to most easily convert an existing application into a cloud-hosted Software-as-a-Service?

A

AppStream will handle hosting, scaling, and user management for your application and help you convert it into a SaaS product for your employees or customers.

83
Q

Which of the following is NOT a function of AWS Audit Manager?

  1. Centralize audit data from AWS Config and various security services.
  2. Generate insights and recommendations to help you adhere to the Well-Architected Framework.
  3. Use pre-built frameworks to help you meet industry-specific security and configuration standards.
  4. Find root causes of noncompliance and generate reports.
A

Generate insights and recommendations to help you adhere to the Well-Architected Framework.

84
Q

How long are CloudWatch Logs stored by default?

A

Indefinitely

85
Q

What feature can be used to analyze your workloads and generate action plans to help you achieve more reliable and cost-effective architecture?

  1. Systems Manager
  2. Audit Manager
  3. AWS Config
  4. The Well-Architected Tool
A

The Well-Architected Tool helps you use the Well-Architected Framework as a set of lenses through which to analyze your workloads. You can use it to learn about the Well-Architected Framework and generate action plans to bring your architectures into alignment with it.

86
Q

What does AWS Config do?

  1. AWS Config allows you to take automated actions on large groups of cloud resources.
  2. AWS Config allows you to set up account-wide rules and detect non-compliant resources.
  3. AWS Config allows you to set up account-wide rules and enforce compliance by disallowing the creation of non-compliant resources.
  4. AWS Config allows you to audit non-compliant resources and generate audit reports.
A

AWS Config allows you to set up account-wide rules and detect non-compliant resources.

87
Q

Which service will notify you about service events, outages, planned changes, and account notifications?

  1. Trusted Advisor
  2. CloudWatch Alarms
  3. AWS Health Dashboard
  4. Systems Manager
A

AWS Health Dashboard will give you a view of all outages across AWS, as well as a personal dashboard that displays only those services and Regions that are relevant to your cloud resources.

88
Q

How can you receive a notification when CPU utilization of your EC2 instance reaches 90%?

  1. Trusted Advisor will automatically generate a recommendation when CPU utilization of your EC2 instance reaches 90%.
  2. Systems Manager automatically tracks CPU utilization and will notify administrators when it exceeds 90% on any given instance.
  3. Create a CloudWatch alarm that triggers when CPU utilization reaches 90%.
  4. Create a CloudTrail Alarm that triggers when CPU utilization reaches 90%.
A

CloudWatch alarms can be used to send notifications or trigger automated events when metrics reach defined thresholds.
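The alarm described here can be expressed as parameters for CloudWatch's PutMetricAlarm API. The sketch below only builds the parameter dict (the instance ID and SNS topic ARN are placeholders), with the actual boto3 call left commented out:

```python
def cpu_alarm_params(instance_id, topic_arn):
    """Build parameters for cloudwatch.put_metric_alarm (boto3): alarm
    when the average CPUUtilization of one EC2 instance reaches 90%.
    The instance ID and topic ARN are assumed placeholder values."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,                # evaluate 5-minute averages
        "EvaluationPeriods": 1,
        "Threshold": 90.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [topic_arn],  # e.g. an SNS topic that emails you
    }

params = cpu_alarm_params(
    "i-0123456789abcdef0",                          # placeholder instance
    "arn:aws:sns:us-east-1:111122223333:alerts",    # placeholder SNS topic
)
# boto3.client("cloudwatch").put_metric_alarm(**params)  # needs AWS credentials
```

When the alarm fires, CloudWatch publishes to the SNS topic listed in AlarmActions, which is how the notification reaches you.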
