AI Practice Test #2 Flashcards

1
Q

Neural network

A

Neural networks consist of layers of nodes (neurons) that process input data, adjusting the weights of connections between nodes through training to recognize patterns and make predictions.

Neural networks are composed of multiple layers of interconnected nodes (neurons). These nodes process input data and adjust the weights of the connections between them during the training phase. This process allows the network to learn to recognize patterns and make predictions based on the data.

via - https://aws.amazon.com/what-is/neural-network/
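To make the idea of "adjusting the weights during training" concrete, here is a minimal sketch of a single sigmoid neuron trained with plain gradient descent. It is the smallest building block of a network rather than a full multi-layer model; the AND dataset, the learning rate, and all function names are illustrative assumptions, not taken from the AWS page.

```python
import math

def sigmoid(z):
    """Squash a raw score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(samples, epochs=2000, lr=0.5):
    """Train one sigmoid neuron; returns the learned weights and bias."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            pred = sigmoid(w[0] * x1 + w[1] * x2 + b)
            err = pred - target  # gradient of the log loss w.r.t. the raw score
            # Nudge each weight against its share of the error.
            w[0] -= lr * err * x1
            w[1] -= lr * err * x2
            b -= lr * err
    return w, b

def predict(w, b, x1, x2):
    return sigmoid(w[0] * x1 + w[1] * x2 + b)

# Hypothetical training data: the logical AND of two inputs.
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_neuron(AND_DATA)
```

After training, `predict(weights, bias, 1, 1)` is above 0.5 and the other three input pairs fall below it, which is the "pattern" this tiny model has learned.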

2
Q

Cloud computing

A

Cloud computing refers to the on-demand delivery of IT resources and applications via the internet with pay-as-you-go pricing.

Cloud computing, as defined by AWS, is the on-demand delivery of IT resources and applications over the internet with pay-as-you-go pricing. This allows businesses to access computing power, storage, and applications as needed without investing in physical infrastructure.

https://aws.amazon.com/what-is-cloud-computing/

3
Q

Reinforcement learning

A

Reinforcement learning focuses on an agent learning optimal actions through interactions with the environment and feedback, while supervised learning involves training models on labeled data to make predictions.

Reinforcement learning is characterized by an agent that learns to make optimal decisions through interactions with the environment, receiving feedback in the form of rewards or penalties. This feedback helps the agent learn a policy to maximize cumulative rewards. In contrast, supervised learning involves training models using labeled datasets to make predictions or classifications based on the input data.

via - https://aws.amazon.com/what-is/reinforcement-learning/

4
Q

Feature engineering

A

Feature engineering for structured data often involves tasks such as normalization and handling missing values, while for unstructured data, it involves tasks such as tokenization and vectorization.

Feature engineering for structured data typically includes tasks like normalization, handling missing values, and encoding categorical variables. For unstructured data, such as text or images, feature engineering involves different tasks like tokenization (breaking down text into tokens), vectorization (converting text or images into numerical vectors), and extracting features that can represent the content meaningfully.
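The tasks named above can be sketched in a few lines of standard-library Python. This is a minimal illustration, assuming mean imputation for missing values, min-max scaling for normalization, and a bag-of-words count for vectorization; real pipelines usually rely on dedicated libraries, and every function name here is hypothetical.

```python
import re

def impute_mean(values):
    """Structured data: replace missing entries (None) with the column mean."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]

def min_max_normalize(values):
    """Structured data: rescale a numeric column to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def tokenize(text):
    """Unstructured data: break text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def bag_of_words(docs):
    """Unstructured data: turn each document into a vector of token counts."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    vectors = [[tokenize(doc).count(tok) for tok in vocab] for doc in docs]
    return vocab, vectors

ages = min_max_normalize(impute_mean([10, None, 30]))
vocab, vectors = bag_of_words(["red apple", "green apple apple"])
```

Here `ages` becomes `[0.0, 0.5, 1.0]`, `vocab` is `['apple', 'green', 'red']`, and `vectors` is `[[1, 0, 1], [2, 1, 0]]`.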

5
Q

Structured data

A

Structured data can include numerical and categorical data.

Structured data may require less preprocessing.

6
Q

Unstructured data

A

Unstructured data includes text, images, audio, and so on.

Unstructured data typically requires more extensive preprocessing.

7
Q

Self-supervised learning

A

Self-supervised learning works by giving models vast amounts of raw data that is almost entirely or completely unlabeled; the models then generate the labels themselves.

Foundation models use self-supervised learning to create labels from input data. In self-supervised learning, models are provided vast amounts of raw completely unlabeled data and then the models generate the labels themselves. This means no one has instructed or trained the model with labeled training data sets.
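One common way a model can "generate the labels itself" is next-word prediction: each word in the raw text becomes the label for the words that precede it. The sketch below assumes that setup and a hypothetical `next_word_pairs` helper; it only builds the training pairs and does not train anything.

```python
def next_word_pairs(text, context_size=2):
    """Derive (context, label) training pairs from raw, unlabeled text."""
    words = text.split()
    pairs = []
    for i in range(context_size, len(words)):
        context = tuple(words[i - context_size:i])
        label = words[i]  # the label comes from the data itself
        pairs.append((context, label))
    return pairs

pairs = next_word_pairs("the cat sat on the mat")
# pairs[0] is (("the", "cat"), "sat")
```

No human annotated anything here: the four pairs produced from this one sentence are all derived from the text's own word order.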

8
Q

Reinforcement learning

A

Reinforcement learning is a method with reward values attached to the different steps that the algorithm must go through. So the model’s goal is to accumulate as many reward points as possible and eventually reach an end goal.
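As a rough illustration of rewards attached to steps and an end goal, here is a sketch of tabular Q-learning on a made-up five-cell corridor. The environment, the reward values (+1.0 at the goal, -0.01 per other step), and the hyperparameters are all assumptions chosen for the example; Q-learning is one reinforcement learning algorithm among many.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the end goal
ACTIONS = (-1, +1)    # index 0 = step left, index 1 = step right

def step(state, action_index):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else -0.01  # small penalty for every wasted step
    return next_state, reward, done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table of action values with epsilon-greedy Q-learning."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon:
                action = 0 if rng.random() < 0.5 else 1  # explore
            else:
                action = max((0, 1), key=lambda a: q[state][a])  # exploit
            next_state, reward, done = step(state, action)
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q

q_table = train()
# Greedy action for each non-goal cell after training.
policy = [max((0, 1), key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
```

With these settings the learned `policy` picks action 1 (step right) in every cell, which is the route that collects the most cumulative reward.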

9
Q

Supervised learning

A

In supervised learning, models are supplied with labeled and defined training data to assess for correlations. The sample data specifies both the input and the output for the model. For example, images of handwritten figures are annotated to indicate which number they correspond to. A supervised learning system could recognize the clusters of pixels and shapes associated with each number, given sufficient examples.

Data labeling is the process of categorizing input data with its corresponding defined output values. Labeled training data is required for supervised learning. For example, millions of apple and banana images would need to be tagged with the words “apple” or “banana.” Then machine learning applications could use this training data to guess the name of the fruit when given a fruit image.
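The apple and banana example can be sketched with a nearest-centroid classifier, one of the simplest supervised methods. The two numeric features (redness, elongation) and their values are invented for illustration; a real image classifier would learn from pixels rather than hand-picked features.

```python
def fit_centroids(examples):
    """Average the feature vectors of each label in the labeled training data."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def classify(centroids, features):
    """Predict the label whose centroid is closest to the new example."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

# Hypothetical labeled data: (redness, elongation) -> fruit name.
labeled = [
    ((0.9, 0.1), "apple"),
    ((0.8, 0.2), "apple"),
    ((0.2, 0.9), "banana"),
    ((0.1, 0.8), "banana"),
]
centroids = fit_centroids(labeled)
guess = classify(centroids, (0.85, 0.15))  # "apple"
```

The labels in `labeled` play the role of the tagged images: without them, `fit_centroids` would have nothing to learn from.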

10
Q

Data labeling

A

Data labeling is the process of categorizing input data with its corresponding defined output values. Labeled training data is required for supervised learning. For example, millions of apple and banana images would need to be tagged with the words “apple” or “banana.” Then machine learning applications could use this training data to guess the name of the fruit when given a fruit image.

11
Q

Unsupervised learning

A

Unsupervised learning algorithms train on unlabeled data. They scan through new data, trying to find meaningful structure in the inputs without any predetermined outputs. They can spot patterns and categorize data. For example, unsupervised algorithms could group news articles from different news sites into common categories like sports and crime. They can use natural language processing to comprehend meaning and emotion in the article.
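Grouping without labels can be illustrated with a tiny k-means clustering sketch. It assumes two clusters and one-dimensional numeric points as a stand-in for richer inputs such as article text; the data and the `kmeans_1d` helper are hypothetical.

```python
def kmeans_1d(points, iterations=10):
    """Split unlabeled 1-D points into two clusters with k-means."""
    centers = [min(points), max(points)]  # simple deterministic starting guess
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest center.
        labels = [0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
                  for p in points]
        # Move each center to the mean of the points assigned to it.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels, centers

labels, centers = kmeans_1d([1.0, 1.2, 0.8, 8.0, 8.3, 7.9])
# labels is [0, 0, 0, 1, 1, 1]
```

The algorithm is never told which group a point belongs to; the two groups emerge from the spacing of the data alone.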

12
Q

Amazon Q Business

A

Amazon Q Business is a fully managed, generative-AI powered assistant that you can configure to answer questions, provide summaries, generate content, and complete tasks based on your enterprise data. It allows end users to receive immediate, permissions-aware responses from enterprise data sources with citations, for use cases such as IT, HR, and benefits help desks.

Amazon Q Business also helps streamline tasks and accelerate problem-solving. You can use Amazon Q Business to create and share task automation applications or perform routine actions like submitting time-off requests and sending meeting invites.

13
Q

Amazon Q Developer

A

Amazon Q Developer assists developers and IT professionals with all their tasks—from coding, testing, and upgrading applications, to diagnosing errors, performing security scanning and fixes, and optimizing AWS resources.

14
Q

Amazon Q in QuickSight

A

With Amazon Q in QuickSight, customers get a generative BI assistant that allows business analysts to use natural language to build BI dashboards in minutes and easily create visualizations and complex calculations.

15
Q

Amazon Q in Connect

A

Amazon Connect is the contact center service from AWS. Amazon Q helps customer service agents provide better customer service. Amazon Q in Connect uses real-time conversation with the customer along with relevant company content to automatically recommend what to say or what actions an agent should take to better assist customers.

16
Q

SageMaker model cards

A

SageMaker model cards include information about the model such as intended use and risk rating of a model, training details and metrics, evaluation results, and observations. AI service cards provide transparency about AWS AI services’ intended use, limitations, and potential impacts.

You can use Amazon SageMaker Model Cards to document critical details about your machine learning (ML) models in a single place for streamlined governance and reporting. You can catalog details such as the intended use and risk rating of a model, training details and metrics, evaluation results and observations, and additional call-outs such as considerations, recommendations, and custom information.

AI Service Cards are a form of responsible AI documentation that provides customers with a single place to find information on the intended use cases and limitations, responsible AI design choices, and deployment and performance optimization best practices for AI services from AWS.

17
Q

Token

A

A token is a sequence of characters that a model can interpret or predict as a single unit of meaning

A sequence of characters that a model can interpret or predict as a single unit of meaning. For example, with text models, a token could correspond not just to a word, but also to a part of a word with grammatical meaning (such as “-ed”), a punctuation mark (such as “?”), or a common phrase (such as “a lot”).

18
Q

Embedding

A

An embedding is a vector of numerical values that represents condensed information obtained by transforming input into that vector.

The process of condensing information by transforming input into a vector of numerical values, known as the embeddings, in order to compare the similarity between different objects by using a shared numerical representation. For example, sentences can be compared to determine the similarity in meaning, images can be compared to determine visual similarity, or text and image can be compared to see if they’re relevant to each other.
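Comparing similarity through a shared numerical representation is often done with cosine similarity. The sketch below uses tiny hand-written vectors as stand-ins for real embeddings (which typically have hundreds or thousands of dimensions and come from a trained model); the words and values are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional embeddings.
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.85, 0.15, 0.05],
    "invoice": [0.0, 0.2, 0.95],
}

close = cosine_similarity(embeddings["cat"], embeddings["kitten"])
far = cosine_similarity(embeddings["cat"], embeddings["invoice"])
```

Here `close` is much larger than `far`, matching the intuition that "cat" and "kitten" are closer in meaning than "cat" and "invoice".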

19
Q

Knowledge Bases for Amazon Bedrock

A

Use Knowledge Bases for Amazon Bedrock to supply the FM with contextual information from the company’s private data using Retrieval Augmented Generation (RAG).

With the comprehensive capabilities of Amazon Bedrock, you can experiment with a variety of top FMs, customize them privately with your data using techniques such as fine-tuning and retrieval-augmented generation (RAG), and create managed agents that execute complex business tasks—from booking travel and processing insurance claims to creating ad campaigns and managing inventory—all without writing any code.

Using Knowledge Bases for Amazon Bedrock, you can provide foundation models with contextual information from your company’s private data for Retrieval Augmented Generation (RAG), enhancing response relevance and accuracy. This fully managed feature handles the entire RAG workflow, eliminating the need for custom data integrations and management.

via - https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base.html

20
Q

Retrieval Augmented Generation (RAG)

A

Retrieval-Augmented Generation (RAG) is the process of optimizing the output of a large language model, so it references an authoritative knowledge base outside of its training data sources before generating a response. Large Language Models (LLMs) are trained on vast volumes of data and use billions of parameters to generate original output for tasks like answering questions, translating languages, and completing sentences. RAG extends the already powerful capabilities of LLMs to specific domains or an organization’s internal knowledge base, all without the need to retrain the model. It is a cost-effective approach to improving LLM output so it remains relevant, accurate, and useful in various contexts.
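The retrieve-then-generate flow can be sketched without calling any model. This minimal example assumes a keyword-overlap retriever over a made-up list of documents and stops at building the augmented prompt; production systems usually retrieve with embeddings and a vector store, and the function names here are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase word set used for a crude relevance score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, top_k=1):
    """Return the top_k documents sharing the most words with the query."""
    query_tokens = tokenize(query)
    ranked = sorted(documents,
                    key=lambda doc: len(query_tokens & tokenize(doc)),
                    reverse=True)
    return ranked[:top_k]

def build_prompt(query, documents, top_k=1):
    """Prepend retrieved context so the model can answer from it."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, top_k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

# Hypothetical internal knowledge base.
knowledge_base = [
    "Employees accrue 20 vacation days per year.",
    "The office is closed on public holidays.",
    "Expense reports are due by the fifth of each month.",
]
question = "How many vacation days do employees get per year?"
prompt = build_prompt(question, knowledge_base)
```

The model's weights are never touched: only the prompt changes, which is why updating the knowledge base is enough to update the answers.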

21
Q

Reinforcement learning from human feedback (RLHF)

A

Reinforcement learning from human feedback (RLHF) is a machine learning (ML) technique that uses human feedback to optimize ML models to self-learn more efficiently. Reinforcement learning (RL) techniques train software to make decisions that maximize rewards, making their outcomes more accurate. RLHF incorporates human feedback in the rewards function, so the ML model can perform tasks more aligned with human goals, wants, and needs. RLHF is used throughout generative artificial intelligence (generative AI) applications, including in large language models (LLM).

22
Q

Small language model (SLM)

A

A small language model (SLM) is an AI model designed to process and generate human language, with a compact architecture, fewer parameters, and lower computational requirements compared to large language models (LLMs).

A small language model (SLM) optimized for deployment on edge devices is specifically designed to be lightweight, efficient, and capable of running on devices with limited computational resources. Deploying the model directly on the edge device eliminates the need for network communication with a central server, thereby achieving the required low-latency inference needed for real-time IoT applications.

https://aws.amazon.com/about-aws/whats-new/2024/05/amazon-bedrock-mistral-small-foundation-model/

23
Q

Edge device

A

In computer networking, an edge device is a device that provides an entry point into enterprise or service provider core networks.[1] Examples include routers,[2] routing switches, integrated access devices (IADs), multiplexers, and a variety of metropolitan area network (MAN) and wide area network (WAN) access devices. Edge devices also provide connections into carrier and service provider networks. An edge device that connects a local area network to a high speed switch or backbone (such as an ATM switch) may be called an edge concentrator.[3]

24
Q

Central API

A

A central API with an asynchronous inference endpoint introduces network latency.

Using a central API with asynchronous inference endpoints still involves network communication, which can result in latency.

25
Q

AWS Audit Manager

A

AWS Audit Manager helps automate the collection of evidence to continuously audit your AWS usage. It simplifies the process of assessing risk and compliance with regulations and industry standards, making it an essential tool for governance in AI systems.

26
Q

AWS Artifact

A

AWS Artifact provides on-demand access to AWS’ compliance reports and online agreements. It is useful for obtaining compliance documentation but does not provide continuous auditing or automated evidence collection.

27
Q

AWS Trusted Advisor

A

AWS Trusted Advisor offers guidance to help optimize your AWS environment for cost savings, performance, security, and fault tolerance. While it provides recommendations for best practices, it does not focus on auditing or evidence collection for compliance.

28
Q

AWS CloudTrail

A

AWS CloudTrail records AWS API calls for auditing purposes and delivers log files for compliance and operational troubleshooting. It is crucial for tracking user activity but does not automate compliance assessments or evidence collection.

29
Q

Foundation Model: Amazon Titan

A

Amazon Titan foundation models, developed by Amazon Web Services (AWS), are pre-trained on extensive datasets, making them robust and versatile models suitable for a wide range of applications. Amazon Titan foundation models (FMs) provide customers with a breadth of high-performing image, multimodal, and text model choices, via a fully managed API. Amazon Titan models are created by AWS and pretrained on large datasets, making them powerful, general-purpose models built to support a variety of use cases, while also supporting the responsible use of AI.

30
Q

Foundation Model: Llama

A

Llama is a series of large language models trained on publicly available data. They are built on the transformer architecture, enabling them to handle variable-length input sequences (bounded by the model's context window) and produce output sequences of varying lengths. A notable feature of Llama models is their capacity to generate coherent and contextually appropriate text.

31
Q

Foundation Model: Jurassic

A

The Jurassic family of models from AI21 Labs supports use cases such as question answering, summarization, draft generation, advanced information extraction, and ideation for tasks requiring intricate reasoning and logic.

32
Q

Foundation Model: Claude

A

Claude is Anthropic’s frontier, state-of-the-art large language model that offers important features for enterprises like advanced reasoning, vision analysis, code generation, and multilingual processing.

33
Q

Amazon SageMaker Model Dashboard

A

Amazon SageMaker Model Dashboard is a centralized repository of all models created in your account. The models are generally the outputs of SageMaker training jobs, but you can also import models trained elsewhere and host them on SageMaker. Model Dashboard provides a single interface for IT administrators, model risk managers, and business leaders to track all deployed models and aggregate data from multiple AWS services to provide indicators about how your models are performing. You can view details about model endpoints, batch transform jobs, and monitoring jobs for additional insights into model performance.

The dashboard’s visual display helps you quickly identify which models have missing or inactive monitors, so you can ensure all models are periodically checked for data drift, model drift, bias drift, and feature attribution drift. Lastly, the dashboard’s ready access to model details helps you dive deep, so you can access logs, infrastructure-related information, and resources to help you debug monitoring failures.

34
Q

Amazon SageMaker Model Monitor

A

Amazon SageMaker Model Monitor monitors the quality of Amazon SageMaker machine learning models in production. With Model Monitor, you can set up continuous monitoring with a real-time endpoint, continuous monitoring with a batch transform job that runs regularly, and on-schedule monitoring for asynchronous batch transform jobs.

35
Q

Amazon SageMaker Ground Truth

A

Amazon SageMaker Ground Truth offers the most comprehensive set of human-in-the-loop capabilities, allowing you to harness the power of human feedback across the ML lifecycle to improve the accuracy and relevancy of models. You can complete a variety of human-in-the-loop tasks with SageMaker Ground Truth, from data generation and annotation to model review, customization, and evaluation, either through a self-service or an AWS-managed offering.

36
Q

Amazon SageMaker Clarify

A

SageMaker Clarify helps identify potential bias during data preparation without writing code. You specify input features, such as gender or age, and SageMaker Clarify runs an analysis job to detect potential bias in those features.

37
Q

Multimodal model

A

A multimodal model can accept a mix of input types such as audio/text and create a mix of output types such as video/image.

A multimodal model is an artificial intelligence system designed to process and understand multiple types of data, such as text, images, audio, and video. Unlike unimodal models, which handle a single type of data, multimodal models can integrate and make sense of information from various sources, allowing them to perform more complex and versatile tasks.

Multimodal models represent a significant advancement in AI, enabling the integration and understanding of multiple types of data. By combining different modalities, these models can perform a wide range of complex tasks, making them highly versatile and powerful tools in various fields.

38
Q

Amazon Textract

A

Automatically extract printed text, handwriting, layout elements, and data from any document. Amazon Textract is a machine learning (ML) service that automatically extracts text, handwriting, layout elements, and data from scanned documents. It goes beyond simple optical character recognition (OCR) to identify, understand, and extract specific data from documents.

39
Q

Amazon Forecast

A

Forecast business outcomes easily and accurately using machine learning. Amazon Forecast uses machine learning (ML) to generate more accurate demand forecasts with just a few clicks, without requiring any prior ML experience. Amazon Forecast uses ML to learn not only the best algorithm for each item but the best ensemble of algorithms for each item, automatically creating the best model for your data.

40
Q

Amazon Kendra

A

Easy-to-use enterprise search service that’s powered by machine learning. Amazon Kendra is a highly accurate and easy-to-use enterprise search service that’s powered by machine learning (ML). It allows developers to add search capabilities to their applications so their end users can discover information stored within the vast amount of content spread across their company.

41
Q

Fine-tuning

A

Fine-tuning changes the weights of the FM.

Fine-tuning a pre-trained foundation model is an affordable way to take advantage of its broad capabilities while customizing the model on your own small corpus. Fine-tuning is a customization method that involves further training and does change the weights of your model.

42
Q

Retrieval-augmented generation (RAG)

A

Retrieval-augmented generation (RAG) does not change the weights of the FM.

Retrieval-Augmented Generation (RAG) is the process of optimizing the output of a large language model, so it references an authoritative knowledge base outside of its training data sources before generating a response. Large Language Models (LLMs) are trained on vast volumes of data and use billions of parameters to generate original output for tasks like answering questions, translating languages, and completing sentences. RAG extends the already powerful capabilities of LLMs to specific domains or an organization’s internal knowledge base, all without the need to retrain the model.

Retrieval Augmented Generation (RAG) allows you to customize a model’s responses when you want the model to consider new knowledge or up-to-date information. When your data changes frequently, like inventory or pricing, it’s not practical to fine-tune and update the model while it’s serving user queries.

43
Q

Prompt engineering

A

Prompt engineering does NOT change the weights of the FM.

Another recommended way to first customize a foundation model to a specific use case is through prompt engineering. Providing your foundation model with well-engineered, context-rich prompts can help achieve desired results without any fine-tuning or changing of model weights.
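A context-rich prompt is often built by prepending an instruction and a few worked examples (few-shot prompting). The sketch below only assembles the prompt text; the sentiment task, the "Review:" and "Sentiment:" field names, and the helper function are illustrative assumptions, and no model is called.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, worked examples, and the new input into one prompt."""
    lines = [instruction, ""]
    for text, label in examples:
        lines += [f"Review: {text}", f"Sentiment: {label}", ""]
    # Leave the final answer slot empty for the model to complete.
    lines += [f"Review: {query}", "Sentiment:"]
    return "\n".join(lines)

few_shot_prompt = build_few_shot_prompt(
    "Classify each review as positive or negative.",
    [("Great battery life.", "positive"), ("Stopped working after a week.", "negative")],
    "The screen is sharp and bright.",
)
```

Everything the model needs is carried in the prompt itself, so the same foundation model can be steered toward a new task with its weights unchanged.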

44
Q

Amazon SageMaker Model Cards

A

Describes how a model should be used in a production environment

Use Amazon SageMaker Model Cards to document critical details about your machine learning (ML) models in a single place for streamlined governance and reporting.

Catalog details such as the intended use and risk rating of a model, training details and metrics, evaluation results and observations, and additional call-outs such as considerations, recommendations, and custom information.

Model cards provide prescriptive guidance on what information to document and include fields for custom information. Specifying the intended uses of a model helps ensure that model developers and users have the information they need to train or deploy the model responsibly.

The intended uses of a model go beyond technical details and describe how a model should be used in production, the scenarios in which it is appropriate to use a model, and additional considerations such as the type of data to use with the model or any assumptions made during development.

45
Q

Amazon Textract

A

Amazon Textract is a machine learning (ML) service that automatically extracts text, handwriting, layout elements, and data from scanned documents. It goes beyond simple optical character recognition (OCR) to identify, understand, and extract specific data from documents. Today, many companies manually extract data from scanned documents such as PDFs, images, tables, and forms, or through simple OCR software that requires manual configuration (which often must be updated when the form changes). To overcome these manual and expensive processes, Textract uses ML to read and process any type of document, accurately extracting text, handwriting, tables, and other data with no manual effort.

You can use one of AWS’s pre-trained or custom features to quickly automate document processing, whether you’re automating loan processing or extracting information from invoices and receipts. Textract provides you the ability to customize the pre-trained features to meet the document processing needs specific to your business. Textract can extract the data in minutes instead of hours or days.

Textract use cases: via - https://docs.aws.amazon.com/textract/latest/dg/what-is.html

46
Q

Amazon Transcribe

A

Amazon Transcribe is an automatic speech recognition service that uses machine learning models to convert audio to text. You can use Amazon Transcribe as a standalone transcription service or add speech-to-text capabilities to any application.

47
Q

Amazon Comprehend

A

Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find meaning and insights in text. Natural Language Processing (NLP) is a way for computers to analyze, understand, and derive meaning from textual information in a smart and useful way. By utilizing NLP, you can extract important phrases, sentiments, syntax, key entities such as brand, date, location, person, etc., and the language of the text.

48
Q

Amazon Rekognition

A

Amazon Rekognition is a cloud-based image and video analysis service that makes it easy to add advanced computer vision capabilities to your applications. The service is powered by proven deep learning technology and it requires no machine learning expertise to use. Amazon Rekognition includes a simple, easy-to-use API that can quickly analyze any image or video file that’s stored in Amazon S3. While Rekognition can be used to extract text from images, Rekognition specializes in identifying text located spatially within an image, for instance, words displayed on street signs, t-shirts, or license plates. It’s not the ideal choice for images containing more than 100 words, as this exceeds its limitation.

49
Q

Temperature

A

Temperature is an inference parameter, typically a value between 0 and 1, that regulates the creativity of the model’s responses on Amazon Bedrock. Use a lower temperature if you want more deterministic responses. Use a higher temperature if you want more creative or varied responses to the same prompt; higher settings are also where hallucinated responses are more likely to appear.

A lower temperature results in more deterministic responses, so there is less chance of hallucinations.

A higher temperature results in a higher likelihood of hallucinations.

via - https://docs.aws.amazon.com/bedrock/latest/userguide/inference-parameters.html
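The effect of temperature is commonly explained as dividing the model's raw scores (logits) by the temperature before applying softmax. The sketch below shows that general mechanism with invented logits; it is not a description of how any particular Bedrock model is implemented, and it requires a temperature greater than zero.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; lower temperature sharpens the result."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
low_temp = softmax_with_temperature(logits, 0.1)
high_temp = softmax_with_temperature(logits, 1.0)
```

At temperature 0.1 nearly all probability lands on the top-scoring token, so sampling is close to deterministic; at 1.0 the lower-scoring tokens keep a meaningful share, which makes varied (and occasionally wrong) outputs more likely.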

50
Q

Hierarchical relationship between Artificial Intelligence (AI), Machine Learning (ML), Deep Learning (DL), and Generative AI (GenAI)?

A

Artificial Intelligence > Machine Learning > Deep Learning > Generative AI

The correct hierarchy is as follows:

Artificial Intelligence (AI): The broadest field encompassing all aspects of creating machines that can perform tasks that typically require human intelligence.

Machine Learning (ML): A subset of AI focused on algorithms and statistical models that enable machines to improve their performance on tasks through experience.

Deep Learning (DL): A subset of ML that uses neural networks with many layers to learn from large amounts of data, allowing for more complex and abstract representations.

Generative AI (GenAI): A subset of Deep Learning focused on models that can generate new content, such as text, images, or music, by learning from existing data.

via - https://docs.aws.amazon.com/whitepapers/latest/aws-caf-for-ai/aws-caf-for-ai.html

51
Q

Amazon SageMaker Data Wrangler

A

Amazon SageMaker Data Wrangler reduces the time it takes to aggregate and prepare tabular and image data for ML from weeks to minutes. With SageMaker Data Wrangler, you can simplify the process of data preparation and feature engineering, and complete each step of the data preparation workflow (including data selection, cleansing, exploration, visualization, and processing at scale) from a single visual interface. You can use SQL to select the data that you want from various data sources and import it quickly. Next, you can use the data quality and insights report to automatically verify data quality and detect anomalies, such as duplicate rows and target leakage. SageMaker Data Wrangler contains over 300 built-in data transformations, so you can quickly transform data without writing code.

With the SageMaker Data Wrangler data selection tool, you can quickly access and select your tabular and image data from various popular sources - such as Amazon Simple Storage Service (Amazon S3), Amazon Athena, Amazon Redshift, AWS Lake Formation, Snowflake, and Databricks - and over 50 other third-party sources - such as Salesforce, SAP, Facebook Ads, and Google Analytics. You can also write queries for data sources using SQL and import data directly into SageMaker from various file formats, such as CSV, Parquet, JSON, and database tables.

How Data Wrangler works: via - https://aws.amazon.com/sagemaker/data-wrangler/

52
Q

Amazon SageMaker Model Dashboard

A

Amazon SageMaker Model Dashboard is a centralized portal, accessible from the SageMaker console, where you can view, search, and explore all of the models in your account. You can track which models are deployed for inference and if they are used in batch transform jobs or hosted on endpoints.

53
Q

Amazon SageMaker Clarify

A

SageMaker Clarify helps identify potential bias during data preparation without writing code. You specify input features, such as gender or age, and SageMaker Clarify runs an analysis job to detect potential bias in those features.

54
Q

Amazon SageMaker Feature Store

A

Amazon SageMaker Feature Store is a fully managed, purpose-built repository to store, share, and manage features for machine learning (ML) models. Features are inputs to ML models used during training and inference.

55
Q

AWS Trainium

A

Leverage AWS Trainium for high-performance, cost-effective Deep Learning training.

AWS Trainium is the machine learning (ML) chip that AWS purpose-built for deep learning (DL) training of 100B+ parameter models. Each Amazon Elastic Compute Cloud (Amazon EC2) Trn1 instance deploys up to 16 Trainium accelerators to deliver a high-performance, low-cost solution for DL training in the cloud.

https://aws.amazon.com/machine-learning/trainium/

56
Q

AWS Inferentia

A

Leverage AWS Inferentia for deep learning (DL) and generative AI inference applications

AWS Inferentia accelerators are designed by AWS to deliver high performance at the lowest cost in Amazon EC2 for your deep learning (DL) and generative AI inference applications. The first-generation AWS Inferentia accelerator powers Amazon Elastic Compute Cloud (Amazon EC2) Inf1 instances, which deliver up to 2.3x higher throughput and up to 70% lower cost per inference than comparable Amazon EC2 instances.

https://aws.amazon.com/machine-learning/inferentia/

57
Q

Image processing

A

Image processing focuses on enhancing and manipulating images for visual quality

Image processing is primarily concerned with the techniques used to enhance and manipulate images, such as filtering, noise reduction, and image transformation.

58
Q

Computer vision

A

Computer vision involves interpreting and understanding the content of images to make decisions

Computer vision focuses on interpreting and understanding the content of images to make decisions, such as object detection, facial recognition, and scene understanding. Computer vision often uses machine learning algorithms to achieve these tasks.

https://aws.amazon.com/what-is/computer-vision/

59
Q

Inference parameter: Response length

A

Response length

Response length represents the minimum or maximum number of tokens to return in the generated response.

via - https://docs.aws.amazon.com/bedrock/latest/userguide/inference-parameters.html

60
Q

Inference parameter: Stop sequence

A

Stop sequences specify the sequences of characters that stop the model from generating further tokens. If the model generates a stop sequence that you specify, it will stop generating after that sequence.

61
Q

Inference parameter: Top P

A

Top P represents the percentage of most likely candidates that the model considers for the next token.

62
Q

Inference parameter: Top K

A

Top K represents the number of most likely candidates that the model considers for the next token.

63
Q

Amazon Q Developer

A

(1) Understand and manage your cloud infrastructure on AWS

Amazon Q Developer helps you understand and manage your cloud infrastructure on AWS. With this capability, you can list and describe your AWS resources using natural language prompts, minimizing friction in navigating the AWS Management Console and compiling all information from documentation pages.

For example, you can ask Amazon Q Developer, “List all of my Lambda functions”. Amazon Q Developer then responds with the set of your AWS Lambda functions, along with deep links so you can navigate to each resource easily.

(2) Get answers to your AWS account-specific cost-related questions using natural language

Amazon Q Developer can get answers to AWS cost-related questions using natural language. This capability works by retrieving and analyzing cost data from AWS Cost Explorer.

via - https://docs.aws.amazon.com/amazonq/latest/qdeveloper-ug/what-is.html

via - https://aws.amazon.com/blogs/aws/amazon-q-developer-now-generally-available-includes-new-capabilities-to-reimagine-developer-experience/

64
Q

Rule-Based Application

A

A rule-based application is the most suitable choice for this scenario. Probability questions, like calculating the chance of drawing a spade from a deck of cards, are based on well-defined mathematical rules and formulas. A rule-based system can be programmed with these rules to provide precise answers to such questions, making it an efficient and straightforward solution. This approach ensures accuracy, is easy to implement, and requires no training data, making it ideal for helping students understand fundamental mathematical concepts.
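A minimal sketch of such a rule-based approach, assuming a standard 52-card deck. The rule table and the function name `draw_probability` are hypothetical, chosen only for illustration:

```python
from fractions import Fraction

# Hypothetical rule table: event name -> number of favorable cards
# in a standard 52-card deck.
FAVORABLE_CARDS = {
    "spade": 13,
    "ace": 4,
    "face card": 12,
}
DECK_SIZE = 52


def draw_probability(event: str) -> Fraction:
    """Return P(event) for a single draw as favorable / total outcomes."""
    return Fraction(FAVORABLE_CARDS[event], DECK_SIZE)


print(draw_probability("spade"))  # 1/4
```

No training data is involved: the answer follows directly from the counting rule, which is why the result is exact.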

65
Q

Reinforcement Learning (RL)

A

Reinforcement Learning (RL) is a machine learning technique used for decision-making tasks where an agent learns by interacting with an environment and receiving feedback in the form of rewards or penalties. RL is better suited for dynamic and complex environments, such as games or robotic control, where exploration and adaptation are necessary. It is not appropriate for solving straightforward mathematical problems with well-defined answers, as it does not leverage existing mathematical rules and requires significant computational resources for training.

66
Q

Supervised learning

A

Supervised learning involves training a model on a labeled dataset to predict outcomes based on input features. While effective for tasks like image recognition or language translation, it is not suitable for answering mathematical questions that have precise, rule-based answers. Building a dataset of probability questions and answers would be inefficient and unnecessary, as the app can directly use mathematical formulas to provide correct responses without requiring model training.

67
Q

Unsupervised learning

A

Unsupervised learning is designed to identify patterns and structures in data without any predefined labels, making it useful for tasks such as clustering or dimensionality reduction. However, it is not applicable for answering specific mathematical questions like those involving probability, which require exact calculations based on established mathematical principles. Therefore, unsupervised learning does not provide a direct or efficient means to achieve the app’s objective.

68
Q

Amazon Inspector

A

Amazon Inspector is an automated security assessment service that helps improve the security and compliance of applications deployed on AWS. It automatically assesses applications for exposure, vulnerabilities, and deviations from best practices, making it an essential tool for ensuring the security of AI systems.

69
Q

AWS Config

A

AWS Config is a service that enables you to assess, audit, and evaluate the configurations of your AWS resources. While it is important for governance and compliance monitoring, it does not perform automated security assessments of applications.

70
Q

AWS Audit Manager

A

AWS Audit Manager helps you continuously audit your AWS usage to simplify how you assess risk and compliance with regulations and industry standards. It focuses on audit and compliance reporting rather than automated security assessments.

71
Q

AWS Artifact

A

AWS Artifact provides on-demand access to AWS’ compliance reports and online agreements. It helps with compliance reporting but does not offer automated security assessments of applications.

72
Q

Data access control

A

Data access control involves authentication and authorization of users

Data access control is about managing who can access data and what actions they can perform, typically through mechanisms like authentication and authorization.

73
Q

Data Integrity

A

Data integrity ensures the data is accurate, consistent, and unaltered

Data integrity focuses on maintaining the accuracy, consistency, and trustworthiness of data throughout its lifecycle, ensuring that data remains unaltered and accurate during storage, processing, and transmission.
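One common way to check that data is unaltered is to compare cryptographic digests. This is a sketch using Python's standard library. The sample record and the function names are illustrative assumptions, not part of any AWS service:

```python
import hashlib
import hmac


def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unaltered(data: bytes, expected_digest: str) -> bool:
    """Check the data against a digest recorded when it was stored."""
    # compare_digest performs a constant-time comparison.
    return hmac.compare_digest(sha256_digest(data), expected_digest)


original = b"account=42,balance=100"  # hypothetical record
stored_digest = sha256_digest(original)

print(is_unaltered(original, stored_digest))                   # True
print(is_unaltered(b"account=42,balance=900", stored_digest))  # False
```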

74
Q

Model Evaluation

A

Model evaluation on Amazon Bedrock involves a comprehensive process of preparing data, training models, selecting appropriate metrics, testing and analyzing results, ensuring fairness and bias detection, tuning performance, and continuous monitoring. Model Evaluation on Amazon Bedrock helps you to incorporate Generative AI into your application by giving you the power to select the foundation model that gives you the best results for your particular use case.

75
Q

Amazon Bedrock Guardrails

A

Guardrails for Amazon Bedrock enables you to implement safeguards for your generative AI applications based on your use cases and responsible AI policies. You can create multiple guardrails tailored to different use cases and apply them across multiple foundation models (FM), providing a consistent user experience and standardizing safety and privacy controls across generative AI applications. You can use guardrails with text-based user inputs and model responses.

76
Q

Amazon SageMaker Model Monitor

A

This tool is used for monitoring machine learning models in production to detect data and prediction quality issues. While it helps maintain model performance, it does not assist in model selection or content moderation.

77
Q

Amazon Comprehend

A

Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find insights and relationships in text. It is not specifically designed for selecting models or moderating content generated by LLMs.

78
Q

Amazon SageMaker Clarify

A

Amazon SageMaker Clarify is used to detect bias in machine learning models and data. While it is crucial for ensuring fairness and transparency, it does not help with model selection or content moderation for generative AI applications.

79
Q

Toxicity

A

Toxicity refers to AI model-generated content that can be deemed as offensive, disturbing, or inappropriate.

Example of toxicity: the AI model generates harmful or offensive content about a specific group.

80
Q

Hallucination

A

Hallucination refers to AI model-generated assertions or claims that sound true but are incorrect

Example of hallucination: the AI model generates a response that is irrelevant or incorrect.

81
Q

Foundation models

A

Foundation models can perform a wide range of tasks across different domains by leveraging their extensive pre-training on large datasets

Foundation models are a form of generative artificial intelligence (generative AI). They generate output from one or more inputs (prompts) in the form of human language instructions.

In general, an FM uses learned patterns and relationships to predict the next item in a sequence. For example, with image generation, the model analyzes the image and creates a sharper, more clearly defined version of the image. Similarly, with text, the model predicts the next word in a string of text based on the previous words and their context. It then selects the next word using probability distribution techniques.

Foundation models use self-supervised learning to create labels from input data. This means no one has instructed or trained the model with labeled training data sets. This feature separates LLMs from previous ML architectures, which use supervised or unsupervised learning.

Foundation models, even though they are pre-trained, can continue to learn from data inputs or prompts during inference. This means that you can develop comprehensive outputs through carefully curated prompts. Tasks that FMs can perform include language processing, visual comprehension, code generation, and human-centered engagement.

via - https://aws.amazon.com/what-is/foundation-models/

82
Q

Feature Engineering

A

Feature Engineering involves selecting, modifying, or creating features from raw data to improve the performance of machine learning models, and it is important because it can significantly enhance model accuracy and efficiency

Feature Engineering is the process of selecting, modifying, or creating new features from raw data to enhance the performance of machine learning models. It is crucial because it can lead to significant improvements in model accuracy and efficiency by providing the model with better representations of the data.

via - https://docs.aws.amazon.com/wellarchitected/latest/machine-learning-lens/feature-engineering.html
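As an illustration of creating features from raw data, here is a small sketch. The record layout (`timestamp`, `amount`, `items`) and the derived feature names are hypothetical:

```python
from datetime import datetime


def engineer_features(raw: dict) -> dict:
    """Derive model-friendly features from one raw transaction record."""
    ts = datetime.fromisoformat(raw["timestamp"])
    return {
        "hour_of_day": ts.hour,                # created from the timestamp
        "is_weekend": int(ts.weekday() >= 5),  # Saturday=5, Sunday=6
        "amount_per_item": raw["amount"] / raw["items"],  # combined feature
    }


record = {"timestamp": "2024-06-01T14:30:00", "amount": 60.0, "items": 3}
features = engineer_features(record)
print(features)
```

The raw timestamp string is hard for most models to use directly, while the hour and weekend flag expose patterns the model can learn from.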

83
Q

ChatGPT

A

ChatGPT

ChatGPT or Chat Generative Pretrained Transformer is an example of a Transformer model. Transformer-based models use a self-attention mechanism. They weigh the importance of different parts of an input sequence when processing each element in the sequence.

To understand how transformer-based models work, imagine a sentence as a sequence of words. Self-attention helps the model focus on the relevant words as it processes each word. To capture different types of relationships between words, the transformer-based generative model runs multiple self-attention mechanisms in parallel within each layer, called attention heads. Each head learns to attend to different parts of the input sequence. This allows the model to simultaneously consider various aspects of the data.
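The weighting idea behind self-attention can be sketched in plain Python. This is a simplified single head that uses the input vectors directly as queries, keys, and values. Real transformer models apply learned projection matrices first, which are omitted here as a simplifying assumption:

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Single-head scaled dot-product self-attention (identity projections)."""
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        # Score every position against the current one.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # Each output is a weighted average of all value vectors.
        outputs.append([
            sum(w * value[i] for w, value in zip(weights, vectors))
            for i in range(dim)
        ])
    return outputs


attended = self_attention([[1.0, 0.0], [0.0, 1.0]])
```

Each output vector mixes information from every position, with larger weights on the positions most similar to the current one.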

84
Q

Diffusion model

A

Diffusion models work by first corrupting data with noise through a forward diffusion process and then learning to reverse this process to denoise the data. They use neural networks to predict and remove the noise step by step, ultimately generating new, structured data from random noise.
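The forward (noising) half of this process can be sketched as follows, assuming the common Gaussian formulation x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * noise. The noise schedule values are hypothetical, and the learned reverse (denoising) network is not shown:

```python
import math
import random


def forward_diffusion(x0, betas, rng):
    """Corrupt x0 step by step with Gaussian noise.

    Each step applies x_t = sqrt(1 - beta) * x_{t-1} + sqrt(beta) * noise.
    """
    x = list(x0)
    for beta in betas:
        x = [
            math.sqrt(1.0 - beta) * xi + math.sqrt(beta) * rng.gauss(0.0, 1.0)
            for xi in x
        ]
    return x


rng = random.Random(0)
noisy = forward_diffusion([1.0, 2.0, 3.0], betas=[0.1] * 50, rng=rng)
```

After enough steps the result is close to pure random noise. Generation runs the learned reverse process starting from such noise.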

85
Q

Amazon Comprehend

A

Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find meaning and insights in text. Natural Language Processing (NLP) is a way for computers to analyze, understand, and derive meaning from textual information in a smart and useful way. By utilizing NLP, you can extract important phrases, sentiments, syntax, key entities such as brand, date, location, person, etc., and the language of the text.

You can use Amazon Comprehend to identify the language of the text, extract key phrases, places, people, brands, or events, understand sentiment about products or services, and identify the main topics from a library of documents. The source of this text could be web pages, social media feeds, emails, or articles. You can also feed Amazon Comprehend a set of text documents, and it will identify topics (or groups of words) that best represent the information in the collection. The output from Amazon Comprehend can be used to understand customer feedback, provide a better search experience through search filters, and use topics to categorize documents.

How Amazon Comprehend works: via - https://aws.amazon.com/comprehend/

86
Q

Amazon Transcribe

A

Amazon Transcribe is an automatic speech recognition service that uses machine learning models to convert audio to text. You can use Amazon Transcribe as a standalone transcription service or add speech-to-text capabilities to any application.

87
Q

Amazon Translate

A

Amazon Translate is a text translation service that uses advanced machine learning technologies to provide high-quality translation on demand. You can use Amazon Translate to translate unstructured text documents or to build applications that work in multiple languages.

88
Q

Amazon Rekognition

A

Amazon Rekognition is a cloud-based image and video analysis service that makes it easy to add advanced computer vision capabilities to your applications. You can add features that detect objects, text, and unsafe content, analyze images/videos, and compare faces to your application using Rekognition’s APIs.

89
Q

Machine learning implementation

A

Difficulty in collecting and preparing high-quality data for training models

One of the main challenges in machine learning implementation is the difficulty in collecting and preparing high-quality data for training models. High-quality data is essential for building effective machine learning models, and ensuring that the data is clean, relevant, and well-prepared can be a complex and time-consuming process.

There are many machine learning algorithms available, but the challenge lies in other aspects of implementation.

While computational power can be a challenge for very large models, it is not a primary challenge for most machine learning implementations due to the availability of powerful computing resources.

Machine learning has a wide range of applications in real-world scenarios, and its use is not particularly limited.

90
Q

How can you prevent model overfitting in machine learning?

A

By using techniques such as cross-validation, regularization, and pruning to simplify the model and improve its generalization

To prevent overfitting, techniques such as cross-validation, regularization, and pruning are employed. Cross-validation helps ensure the model generalizes well to unseen data by dividing the data into multiple training and validation sets. Regularization techniques, such as L1 and L2 regularization, penalize complex models to reduce overfitting. Pruning simplifies decision trees by removing branches that have little importance.

via - https://docs.aws.amazon.com/machine-learning/latest/dg/model-fit-underfitting-vs-overfitting.html

Increasing the complexity of the model can lead to overfitting, as it may start capturing noise or random fluctuations in the training data.

Training on a small subset of data may lead to underfitting rather than preventing overfitting.

Avoiding model validation or testing does not prevent overfitting; it is essential to validate and test models to ensure they generalize well to new data.
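Two of the techniques named above, cross-validation and L2 regularization, can be sketched in plain Python. The function names are hypothetical, and the fold split is a simple contiguous one with no shuffling:

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    fold_sizes = [
        n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, val
        start += size


def l2_penalized_loss(errors, weights, lam):
    """Mean squared error plus an L2 penalty lam * sum(w^2)."""
    mse = sum(e * e for e in errors) / len(errors)
    return mse + lam * sum(w * w for w in weights)


folds = list(k_fold_indices(10, 3))
```

Every sample is used for validation exactly once, and the penalty term grows with the size of the weights, which discourages overly complex fits.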

91
Q

Model training in Deep Learning

A

Model training in deep learning involves using large datasets to adjust the weights and biases of a neural network through multiple iterations, using techniques such as gradient descent to minimize the error

In Deep Learning, model training involves feeding large datasets into the neural network and adjusting the weights and biases through multiple iterations. Techniques such as gradient descent are used to minimize the error by computing the gradient of the loss function and updating the weights to reduce the prediction error. Model training in deep learning involves initializing a neural network, feeding it data, calculating losses, adjusting weights using optimization algorithms, and iterating through this process until the model achieves satisfactory performance. Proper data preparation, validation, and hyperparameter tuning are crucial steps to ensure the model generalizes well to new, unseen data.

https://aws.amazon.com/what-is/artificial-intelligence/

Weights and biases in a neural network are not set manually; they are learned during the training process.

Data is crucial for training deep learning models; the network learns from input data.

Deep learning primarily uses neural networks rather than support vector machines and decision trees, which are more common in traditional machine learning.
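The training loop described above (compute the loss gradient, update weights and biases, iterate) can be illustrated on the smallest possible model, a single linear unit. The data, learning rate, and epoch count are illustrative assumptions:

```python
def train_linear_model(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0  # weight and bias are learned, not set manually
    n = len(xs)
    for _ in range(epochs):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) w.r.t. w and b.
        grad_w = (2.0 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = (2.0 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
        # Step against the gradient to reduce the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1.
w, b = train_linear_model([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

A deep network repeats the same idea across many layers, with backpropagation supplying the gradients for every weight.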

92
Q

Inference Parameter: Top P

A

Top P represents the percentage of most likely candidates that the model considers for the next token. Choose a lower value to decrease the size of the pool and limit the options to more likely outputs. Choose a higher value to increase the size of the pool and allow the model to consider less likely outputs.

via - https://docs.aws.amazon.com/bedrock/latest/userguide/inference-parameters.html
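A sketch of how Top P filtering is commonly implemented (nucleus sampling): keep the most likely tokens until their cumulative probability reaches the threshold. The token probabilities are made up, and individual models may implement the parameter differently:

```python
def top_p_filter(probs, top_p):
    """Keep the most likely tokens until cumulative probability >= top_p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = {}, 0.0
    for token, p in ranked:
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the surviving candidates sum to 1.
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}


probs = {"a": 0.5, "b": 0.3, "c": 0.2}  # hypothetical next-token distribution
narrow = top_p_filter(probs, 0.4)
wider = top_p_filter(probs, 0.7)
```

With the lower value only the single most likely token survives. The higher value admits a second, less likely candidate.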

93
Q

Inference Parameter: Temperature

A

Temperature is a value between 0 and 1, and it regulates the creativity of the model’s responses. Use a lower temperature if you want more deterministic responses, and use a higher temperature if you want more creative or different responses for the same prompt on Amazon Bedrock.
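The effect of temperature can be sketched with a softmax over hypothetical logits. Dividing by a lower temperature sharpens the distribution toward the top token, while a higher one flattens it. This is the usual formulation, assumed here rather than taken from any specific model:

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by a positive temperature."""
    scaled = [logit / temperature for logit in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
low_temp = softmax_with_temperature(logits, 0.2)
high_temp = softmax_with_temperature(logits, 1.0)
```

At the lower temperature nearly all probability lands on the first token, so repeated runs give nearly identical output.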

94
Q

Inference Parameter: Top K

A

Top K represents the number of most likely candidates that the model considers for the next token. Choose a lower value to decrease the size of the pool and limit the options to more likely outputs. Choose a higher value to increase the size of the pool and allow the model to consider less likely outputs.
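A minimal sketch of Top K filtering over a made-up next-token distribution. The function name is hypothetical:

```python
def top_k_filter(probs, k):
    """Keep the k most likely tokens and renormalize their probabilities."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(p for _, p in ranked)
    return {token: p / total for token, p in ranked}


probs = {"a": 0.5, "b": 0.3, "c": 0.2}  # hypothetical next-token distribution
top_two = top_k_filter(probs, 2)
```

Unlike Top P, the pool size here is fixed at k regardless of how the probability mass is spread.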

95
Q

Inference Parameter: Stop sequences

A

Stop sequences specify the sequences of characters that stop the model from generating further tokens. If the model generates a stop sequence that you specify, it will stop generating after that sequence.
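The behavior can be imitated on already-generated text. This sketch cuts the text at the earliest stop sequence and drops the sequence itself, which is an assumption. Whether the stop sequence appears in the returned output depends on the model:

```python
def truncate_at_stop_sequence(text, stop_sequences):
    """Cut text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


generated = "Answer: 42\nHuman: next question"  # hypothetical model output
print(truncate_at_stop_sequence(generated, ["\nHuman:"]))  # Answer: 42
```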

96
Q

Amazon Q in Connect

A

Amazon Connect is the contact center service from AWS. Amazon Q helps customer service agents provide better customer service. Amazon Q in Connect uses real-time conversation with the customer along with relevant company content to automatically recommend what to say or what actions an agent should take to better assist customers.

97
Q

Amazon Q Developer

A

Amazon Q Developer assists developers and IT professionals with all their tasks—from coding, testing, and upgrading applications, to diagnosing errors, performing security scanning and fixes, and optimizing AWS resources.

98
Q

Amazon Q Business

A

Amazon Q Business is a fully managed, generative-AI powered assistant that you can configure to answer questions, provide summaries, generate content, and complete tasks based on your enterprise data. It allows end users to receive immediate, permissions-aware responses from enterprise data sources with citations, for use cases such as IT, HR, and benefits help desks.

99
Q

Amazon Q in QuickSight

A

With Amazon Q in QuickSight, customers get a generative BI assistant that allows business analysts to use natural language to build BI dashboards in minutes and easily create visualizations and complex calculations.

100
Q

Semi-supervised learning

A

Semi-supervised learning is when you apply both supervised and unsupervised learning techniques to a common problem. This technique relies on using a small amount of labeled data and a large amount of unlabeled data to train systems. First, the labeled data is used to partially train the machine learning algorithm. After that, the partially trained algorithm labels the unlabeled data. This process is called pseudo-labeling. The model is then re-trained on the resulting data mix without being explicitly programmed.

via - https://aws.amazon.com/compare/the-difference-between-machine-learning-supervised-and-unsupervised/

Fraud identification

Within a large set of transactional data, there’s a subset of labeled data where experts have confirmed fraudulent transactions. For a more accurate result, the machine learning solution would train first on the unlabeled data and then with the labeled data.

Sentiment analysis

When considering the breadth of an organization’s text-based customer interactions, it may not be cost-effective to categorize or label sentiment across all channels. An organization could train a model on the larger unlabeled portion of data first, and then a sample that has been labeled. This would provide the organization with a greater degree of confidence in customer sentiment across the business.
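The pseudo-labeling steps described in this card can be sketched with a deliberately tiny model, a one-dimensional nearest-centroid classifier. The model choice, data, and function names are illustrative assumptions:

```python
def fit_centroids(points, labels):
    """Return the mean of the points for each label."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, x):
    """Assign x to the label with the nearest centroid."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))


def pseudo_label_training(labeled_x, labeled_y, unlabeled_x):
    # Step 1: partially train on the small labeled set.
    centroids = fit_centroids(labeled_x, labeled_y)
    # Step 2: let the partially trained model label the unlabeled data.
    pseudo_y = [predict(centroids, x) for x in unlabeled_x]
    # Step 3: retrain on the labeled plus pseudo-labeled mix.
    return fit_centroids(labeled_x + unlabeled_x, labeled_y + pseudo_y)


model = pseudo_label_training(
    [1.0, 9.0], ["low", "high"], [0.0, 2.0, 8.0, 10.0]
)
```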

101
Q

Neural network

A

supervised learning

A neural network solution is a more complex supervised learning technique. To produce a given outcome, it takes some given inputs and performs one or more layers of mathematical transformation based on adjusting data weightings. An example of a neural network technique is predicting a digit from a handwritten image.

102
Q

Clustering

A

unsupervised learning

Clustering is an unsupervised learning technique that groups certain data inputs, so they may be categorized as a whole. There are various types of clustering algorithms depending on the input data. An example of clustering is identifying different types of network traffic to predict potential security incidents.
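As one concrete clustering algorithm, here is a sketch of k-means on one-dimensional values. The starting centroids and data are made up, and no labels are used at any point:

```python
def k_means_1d(values, centroids, iterations=10):
    """Group 1-D values around the starting centroids using k-means updates."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        # Assignment step: attach each value to its nearest centroid.
        for v in values:
            nearest = min(
                range(len(centroids)), key=lambda i: abs(v - centroids[i])
            )
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


centers, groups = k_means_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [0.0, 5.0])
```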

103
Q

Dimensionality reduction

A

unsupervised learning

Dimensionality reduction is an unsupervised learning technique that reduces the number of features in a dataset. It’s often used to preprocess data for other machine learning functions and reduce complexity and overheads. For example, it may blur out or crop background features in an image recognition application.

104
Q

Amazon Bedrock and its capabilities

A

With Amazon Bedrock, you will be charged for model inference and customization. You have a choice of two pricing plans for inference:

1. On-Demand and Batch: This mode allows you to use FMs on a pay-as-you-go basis without having to make any time-based term commitments.
2. Provisioned Throughput: This mode allows you to provision sufficient throughput to meet your application’s performance requirements in exchange for a time-based term commitment.

Smaller models are cheaper to use than larger models

The cost of generative AI models can vary. It’s important to weigh the trade-offs between model size and speed. Larger models tend to be more accurate but are costly and have limited deployment options. In contrast, smaller models are more affordable and faster, offering more deployment flexibility.

You can use a customized model only in the Provisioned Throughput mode

With the Provisioned Throughput mode, you can purchase model units for a specific base or custom model. The Provisioned Throughput mode is primarily designed for large consistent inference workloads that need guaranteed throughput. Custom models can only be accessed using Provisioned Throughput.

via - https://aws.amazon.com/bedrock/pricing/

105
Q

Amazon Comprehend

A

Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to discover insights from text. Amazon Comprehend provides Custom Entity Recognition, Custom Classification, Key Phrase Extraction, Sentiment Analysis, Entity Recognition, and more APIs so you can easily integrate natural language processing into your applications. You simply call the Amazon Comprehend APIs in your application and provide the location of the source document or text. The APIs will output entities, key phrases, sentiment, and language in a JSON format, which you can use in your application.

Amazon Comprehend ML capabilities can be used to detect and redact personally identifiable information (PII) in customer emails, support tickets, product reviews, social media, and more. No ML experience is required. For example, you can analyze support tickets and knowledge articles to detect PII entities and redact the text before you index the documents in the search solution. After that, search solutions are free of PII entities in documents. Redacting PII entities helps you protect privacy and comply with local laws and regulations.

How Comprehend works: via - https://aws.amazon.com/comprehend/

106
Q

Amazon Textract

A

Amazon Textract is a machine learning (ML) service that automatically extracts text, handwriting, layout elements, and data from scanned documents. It goes beyond simple optical character recognition (OCR) to identify, understand, and extract specific data from documents. Text extracted from images can be sent to Amazon Comprehend to recognize PII.

107
Q

Amazon Lex

A

Amazon Lex is a fully managed artificial intelligence (AI) service with advanced natural language models to design, build, test, and deploy conversational interfaces in applications. Amazon Lex leverages the power of Generative AI and Large Language Models (LLMs) to enhance the builder and customer experience.

108
Q

Amazon Kendra

A

Amazon Kendra is a highly accurate and easy-to-use enterprise search service that’s powered by machine learning (ML). It allows developers to add search capabilities to their applications so their end users can discover information stored within the vast amount of content spread across their company. Companies use Amazon Comprehend to filter out PII before pushing the documents/data to Kendra.

109
Q

Amazon Mechanical Turk

A

Amazon Mechanical Turk provides a marketplace for outsourcing various tasks to a distributed workforce

Amazon Mechanical Turk provides a marketplace for outsourcing various tasks to a distributed workforce, while Amazon SageMaker Ground Truth is specifically designed for creating labeled datasets for machine learning, incorporating both automated and human labeling

Amazon Mechanical Turk provides an on-demand, scalable, human workforce to complete jobs that humans can do better than computers. Amazon Mechanical Turk software formalizes job offers to the thousands of Workers willing to do piecemeal work at their convenience. The software also retrieves work performed and compiles it for you, the Requester, who pays the Workers for satisfactory work (only). Optional qualification tests enable you to select competent Workers.

Amazon Mechanical Turk (MTurk) is a marketplace that allows businesses to outsource tasks to a distributed workforce.

110
Q

Amazon SageMaker Ground Truth

A

Amazon SageMaker Ground Truth is specifically designed for creating labeled datasets for machine learning, incorporating both automated and human labeling

Amazon SageMaker Ground Truth helps you build high-quality training datasets for your machine learning models. With Ground Truth, you can use workers from Amazon Mechanical Turk, a vendor company that you choose, or an internal, private workforce, along with machine learning, to create a labeled dataset. You can use the labeled dataset output from Ground Truth to train your own models. You can also use the output as a training dataset for an Amazon SageMaker model.

Ground Truth is designed specifically for creating labeled datasets for machine learning, using both automated and human labeling, often leveraging MTurk for the human labeling component.

111
Q

Amazon SageMaker Ground Truth

A

SageMaker Ground Truth enables the creation of high-quality labeled datasets by incorporating human feedback in the labeling process, which can be used to improve reinforcement learning models

Amazon SageMaker Ground Truth offers the most comprehensive set of human-in-the-loop capabilities for incorporating human feedback across the ML lifecycle to improve model accuracy and relevancy. You can complete various human-in-the-loop tasks, from data generation and annotation to reward model generation, model review, and customization through a self-service or AWS managed offering.

SageMaker Ground Truth helps in creating high-quality labeled datasets by incorporating human feedback, which is crucial for training and refining reinforcement learning models. This human feedback ensures that the data used for training accurately reflects real-world scenarios, enhancing the effectiveness of RLHF.

SageMaker Ground Truth includes a data annotator for RLHF capabilities. You can give direct feedback and guidance on output that a model has generated by ranking, classifying, or doing both for its responses for RL outcomes. The data, referred to as comparison and ranking data, is effectively a reward model or reward function, which is then used to train the model. You can use comparison and ranking data to customize an existing model for your use case or to fine-tune a model that you build from scratch.

112
Q

Bias versus variance trade-off

A

The bias versus variance trade-off refers to the challenge of balancing the error due to the model’s complexity (variance) and the error due to incorrect assumptions in the model (bias), where high bias can cause underfitting and high variance can cause overfitting

The bias versus variance trade-off in machine learning is about finding a balance between bias (error due to overly simplistic assumptions in the model, leading to underfitting) and variance (error due to the model being too sensitive to small fluctuations in the training data, leading to overfitting). The goal is to achieve a model that generalizes well to new data.
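The trade-off is often written as a decomposition of expected squared error. The notation below is an assumption not found in the card: the data follow $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, and $\hat{f}$ is the model learned from a random training set.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Making a model more flexible typically lowers the bias term and raises the variance term, so the goal is the level of complexity that minimizes their sum.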

113
Q

underfitting

A

high bias

114
Q

overfitting

A

high variance

115
Q

Amazon Elastic Compute Cloud (EC2)

A

Infrastructure as a Service (IaaS)

Cloud Computing can be broadly divided into three types - Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS).

IaaS contains the basic building blocks for cloud IT. It typically provides access to networking features, computers (virtual or on dedicated hardware), and data storage space. IaaS gives the highest level of flexibility and management control over IT resources.

EC2 gives you full control over managing the underlying OS, virtual network configurations, storage, data, and applications. So EC2 is an example of an IaaS service.

Overview of the types of Cloud Computing: via - https://aws.amazon.com/types-of-cloud-computing/

116
Q

Elastic Beanstalk

A

Platform as a Service (PaaS) - PaaS removes the need to manage underlying infrastructure (usually hardware and operating systems), and allows you to focus on the deployment and management of your applications. You don’t need to worry about resource procurement, capacity planning, software maintenance, patching, or any of the other undifferentiated heavy lifting involved in running your application.

Elastic Beanstalk is an example of a PaaS service. You can simply upload your code and Elastic Beanstalk automatically handles the deployment, from capacity provisioning, load balancing, and auto-scaling to application health monitoring.

117
Q

Amazon Rekognition

A

Software as a Service (SaaS) - SaaS provides you with a complete product that is run and managed by the service provider. With a SaaS offering, you don’t have to think about how the service is maintained or how the underlying infrastructure is managed. You only need to think about how you will use that particular software. Amazon Rekognition is an example of a SaaS service.

118
Q

Model parameters

A

Model parameters are values that define a model and its behavior in interpreting input and generating responses.

Model parameters are values that define a model and its behavior in interpreting input and generating responses. Model parameters are controlled and updated by providers. You can also update model parameters to create a new model through the process of model customization. In other words, model parameters are the internal variables of the model that are learned and adjusted during the training process. These parameters directly influence the output of the model for a given input. Examples include the weights and biases in a neural network.

via - https://docs.aws.amazon.com/bedrock/latest/userguide/key-definitions.html

119
Q

Hyperparameters

A

Hyperparameters are values that can be adjusted for model customization to control the training process

Hyperparameters are values that can be adjusted for model customization to control the training process and, consequently, the output custom model. In other words, hyperparameters are external configurations set before the training process begins. They control the training process and the structure of the model but are not adjusted by the training algorithm itself. Examples include the learning rate, the number of layers in a neural network, etc.
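
A minimal sketch of the distinction, assuming a toy one-feature linear model trained by gradient descent (the function name and data are hypothetical): `learning_rate` and `epochs` are hyperparameters chosen before training, while `w` and `b` are model parameters that the training loop learns.

```python
def train_linear_model(xs, ys, learning_rate, epochs):
    """Fit y = w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0  # model parameters: updated by training
    n = len(xs)
    for _ in range(epochs):  # epochs: hyperparameter, fixed before training
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= learning_rate * grad_w  # learning_rate: hyperparameter
        b -= learning_rate * grad_b
    return w, b

# Toy data generated from y = 3x + 1, so training should recover w ≈ 3, b ≈ 1.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [3 * x + 1 for x in xs]
w, b = train_linear_model(xs, ys, learning_rate=0.1, epochs=2000)
print(f"learned parameters: w={w:.3f}, b={b:.3f}")
```

Changing `learning_rate` or `epochs` changes how training proceeds, but the training algorithm never adjusts those two values itself.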

120
Q

Provisioned Throughput

A

A level of throughput that you purchase for a base or custom model in order to increase the amount and/or rate of tokens processed during model inference. When you purchase Provisioned Throughput for a model, a provisioned model is created that can be used to carry out model inference. For more information, see Increase model invocation capacity with Provisioned Throughput in Amazon Bedrock.

121
Q

Playground

A

A user-friendly graphical interface in the AWS Management Console in which you can experiment with running model inference to familiarize yourself with Amazon Bedrock. Use the playground to test out the effects of different models, configurations, and inference parameters on the responses generated for different prompts that you enter. For more information, see Generate responses in a visual interface using playgrounds.

122
Q

Orchestration

A

The process of coordinating between foundation models and enterprise data and applications in order to carry out a task. For more information, see Automate tasks in your application using conversational agents.

123
Q

Agent

A

An application that carries out orchestrations by cyclically interpreting inputs and producing outputs using a foundation model. An agent can be used to carry out customer requests. For more information, see Automate tasks in your application using conversational agents.

124
Q

Machine learning

A

Machine learning is a subset of artificial intelligence that involves training algorithms to learn from data

Machine learning models perform more specific data analysis tasks - like classifying transactions as genuine or fraudulent, labeling images, or predicting the maintenance schedule of factory equipment.

125
Q

Artificial intelligence

A

Artificial intelligence encompasses a wider range of technologies aimed at simulating human intelligence

Artificial intelligence is an umbrella term for different strategies and techniques used to make machines more human-like. AI includes everything from smart assistants like Alexa, chatbots, and image generators to robotic vacuum cleaners and self-driving cars. In contrast, machine learning is a subset of artificial intelligence, focusing specifically on training algorithms to learn from data and make predictions or decisions.

126
Q

Amazon SageMaker Model Cards

A

Use Amazon SageMaker Model Cards to document critical details about your machine learning (ML) models in a single place for streamlined governance and reporting.

Catalog details such as the intended use and risk rating of a model, training details and metrics, evaluation results and observations, and additional call-outs such as considerations, recommendations, and custom information.

Specifying the intended uses of a model helps ensure that model developers and users have the information they need to train or deploy the model responsibly. The intended uses of a model go beyond technical details and describe how a model should be used in production, the scenarios in which it is appropriate to use a model, and additional considerations such as the type of data to use with the model or any assumptions made during development.

https://docs.aws.amazon.com/sagemaker/latest/dg/model-cards.html

127
Q

Amazon SageMaker Clarify

A

SageMaker Clarify helps identify potential bias during data preparation without writing code. You specify input features, such as gender or age, and SageMaker Clarify runs an analysis job to detect potential bias in those features.

128
Q

Amazon SageMaker Canvas

A

SageMaker Canvas offers a no-code interface that can be used to create highly accurate machine learning models without any machine learning experience or writing a single line of code. SageMaker Canvas provides access to ready-to-use models, including foundation models from Amazon Bedrock or Amazon SageMaker JumpStart, or you can build your own custom ML model using AutoML powered by SageMaker Autopilot.

129
Q

Amazon SageMaker Model Monitor

A

Amazon SageMaker Model Monitor monitors the quality of Amazon SageMaker machine learning models in production. With Model Monitor, you can set up continuous monitoring with a real-time endpoint, continuous monitoring with a batch transform job that runs regularly, and on-schedule monitoring for asynchronous batch transform jobs.

130
Q

Large Language Models (LLMs)

A

Large Language Models (LLMs) are non-deterministic

Large Language Models (LLMs) are non-deterministic, which implies that the generated text may be different for every user that uses the same prompt.

You can use the inference parameter Temperature (having a value between 0 and 1), which regulates the creativity of LLMs’ responses. Use a lower temperature if you want more deterministic responses, and use a higher temperature if you want creative or different responses for the same prompt from LLMs on Amazon Bedrock.

Large language models (LLMs) are one class of FMs.

131
Q

Discriminative models

A

Discriminative models are used for classification. They focus on distinguishing between different categories based on the features they observe.

132
Q

Generative models

A

Generative models are used for creation. They learn the patterns and features of the data they have seen and can generate new, similar data. LLMs are generative models.

133
Q

Data residency

A

Data residency is concerned with the physical location of data storage

Data residency refers to the geographical or physical location where data is stored, which is crucial for compliance with regional laws and regulations.

134
Q

data retention

A

Data retention defines the policies for how long data should be stored and maintained

Data retention, on the other hand, involves policies and practices related to how long data should be kept, archived, or deleted, ensuring that data is available when needed and disposed of when no longer required.

135
Q

Amazon SageMaker Studio

A

Amazon SageMaker Studio offers a broad set of fully managed integrated development environments (IDEs) for ML development, including JupyterLab, Code Editor based on Code-OSS (Visual Studio Code – Open Source), and RStudio.

136
Q

Inference - Real-time hosting services

A

Real-time inference is ideal for inference workloads where you have real-time, interactive, low-latency requirements. You can deploy your model to SageMaker hosting services and get an endpoint that can be used for inference. These endpoints are fully managed and support autoscaling.

137
Q

Inference - Serverless Inference

A

Used for workloads that have idle periods between traffic spikes and can tolerate cold starts.

138
Q

Inference - Asynchronous Inference

A

Used for requests with large payload sizes (up to 1 GB), long processing times, and near real-time latency requirements.

139
Q

Inference - Batch transform

A

To get predictions for an entire dataset, use SageMaker batch transform.

140
Q

Amazon SageMaker Feature Store

A

Amazon SageMaker Feature Store is a fully managed, purpose-built repository to store, share, and manage features for machine learning (ML) models. Features are inputs to ML models used during training and inference. For example, in an application that recommends a music playlist, features could include song ratings, listening duration, and listener demographics.

You can ingest data into SageMaker Feature Store from a variety of sources, such as application and service logs, clickstreams, sensors, and tabular data from Amazon Simple Storage Service (Amazon S3), Amazon Redshift, AWS Lake Formation, Snowflake, and Databricks Delta Lake.

How Feature Store works: via - https://aws.amazon.com/sagemaker/feature-store/

141
Q

Amazon SageMaker Clarify

A

SageMaker Clarify helps identify potential bias during data preparation without writing code. You specify input features, such as gender or age, and SageMaker Clarify runs an analysis job to detect potential bias in those features.

142
Q

Amazon SageMaker Data Wrangler

A

Amazon SageMaker Data Wrangler reduces the time it takes to aggregate and prepare tabular and image data for ML from weeks to minutes. With SageMaker Data Wrangler, you can simplify the process of data preparation and feature engineering, and complete each step of the data preparation workflow (including data selection, cleansing, exploration, visualization, and processing at scale) from a single visual interface.

143
Q

Amazon SageMaker Ground Truth

A

Amazon SageMaker Ground Truth offers the most comprehensive set of human-in-the-loop capabilities, allowing you to harness the power of human feedback across the ML lifecycle to improve the accuracy and relevancy of models. You can complete a variety of human-in-the-loop tasks with SageMaker Ground Truth, from data generation and annotation to model review, customization, and evaluation, either through a self-service or an AWS-managed offering.

144
Q

Inference parameters - Temperature

A

LLMs on Amazon Bedrock come with several inference parameters that you can set to control the response from the models.

Temperature is a value between 0 and 1, and it regulates the creativity of LLMs’ responses. Use a lower temperature if you want more deterministic responses, and use a higher temperature if you want creative or different responses for the same prompt from LLMs on Amazon Bedrock.
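
A minimal sketch of how temperature reshapes a next-token distribution, assuming the common approach of dividing the model's raw scores (logits) by the temperature before applying softmax. The logits are made up, and this simplified version rejects a temperature of exactly 0 rather than falling back to always picking the top token.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("this sketch requires temperature > 0")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
low = softmax_with_temperature(logits, 0.2)
high = softmax_with_temperature(logits, 1.0)
print("temperature 0.2:", [round(p, 3) for p in low])
print("temperature 1.0:", [round(p, 3) for p in high])
```

At the lower temperature almost all probability mass sits on the top-scoring token, so sampling is close to deterministic. At the higher temperature the alternatives keep a meaningful share, so repeated runs vary more.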

via - https://docs.aws.amazon.com/bedrock/latest/userguide/inference-parameters.html

145
Q

Inference parameters - Top P

A

Top P represents the percentage of most likely candidates that the model considers for the next token. Choose a lower value to decrease the size of the pool and limit the options to more likely outputs. Choose a higher value to increase the size of the pool and allow the model to consider less likely outputs.
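
A minimal sketch of Top P filtering, assuming the usual interpretation: keep the most likely tokens until their cumulative probability reaches the Top P value, then renormalize. The function name and the token probabilities are hypothetical.

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose cumulative probability reaches top_p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}  # renormalize to sum to 1

next_token_probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
narrow_pool = top_p_filter(next_token_probs, 0.7)
wide_pool = top_p_filter(next_token_probs, 0.9)
print(sorted(narrow_pool))
print(sorted(wide_pool))
```

With the lower value only the two most likely tokens remain in the pool. The higher value also admits the third.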

146
Q

Inference parameters - Top K

A

Top K represents the number of most likely candidates that the model considers for the next token. Choose a lower value to decrease the size of the pool and limit the options to more likely outputs. Choose a higher value to increase the size of the pool and allow the model to consider less likely outputs.
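
A minimal sketch of Top K filtering, assuming the usual interpretation: keep the K most likely tokens and renormalize their probabilities. The function name and the token probabilities are hypothetical.

```python
def top_k_filter(probs, top_k):
    """Keep only the top_k most likely tokens and renormalize."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]
    total = sum(p for _, p in ranked)
    return {token: p / total for token, p in ranked}

next_token_probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
top_two = top_k_filter(next_token_probs, 2)
print(top_two)
```

Unlike Top P, the size of the pool is fixed by the value itself, regardless of how the probability mass is spread across candidates.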

147
Q

Inference parameters - Stop sequences

A

Stop sequences specify the sequences of characters that stop the model from generating further tokens. If the model generates a stop sequence that you specify, it will stop generating after that sequence.
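
A minimal sketch of the effect of stop sequences, applied here to text after the fact rather than during generation. Whether the stop sequence itself appears in the returned text can vary by model; this hypothetical helper assumes it is excluded.

```python
def truncate_at_stop_sequence(text, stop_sequences):
    """Cut text at the earliest occurrence of any stop sequence (assumed excluded)."""
    cut = len(text)
    for stop in stop_sequences:
        if not stop:
            continue  # ignore empty strings, which would match at position 0
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]

raw_output = "Answer: 42\nHuman: and what about 43?"
trimmed = truncate_at_stop_sequence(raw_output, ["\nHuman:", "###"])
print(repr(trimmed))
```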

148
Q

Cloud computing

A

Cloud computing is the on-demand delivery of IT resources over the Internet with pay-as-you-go pricing. Instead of buying, owning, and maintaining physical data centers and servers, you can access technology services, such as computing power, storage, and databases, on an as-needed basis from a cloud provider like Amazon Web Services (AWS).

149
Q

Agility

A

Agility refers to the ability of the cloud to give you easy access to a broad range of technologies so that you can innovate faster and build nearly anything that you can imagine. You can quickly spin up resources as you need them – from infrastructure services, such as compute, storage, and databases, to the Internet of Things, machine learning, data lakes and analytics, and much more.

150
Q

Elasticity

A

With cloud computing elasticity, you don’t have to over-provision resources upfront to handle peak levels of business activity in the future. Instead, you provision the number of resources that you need. You can scale these resources up or down instantly to grow and shrink capacity as your business needs change.

151
Q

Cost savings of Cloud

A

The cloud allows you to trade capital expenses (such as data centers and physical servers) for variable expenses, and only pay for IT as you consume it. Plus, the variable expenses are much lower than what you would pay to do it yourself because of the economies of scale.

152
Q

Ability to deploy globally in minutes

A

With the cloud, you can expand to new geographic regions and deploy globally in minutes. For example, AWS has infrastructure all over the world, so you can deploy your application in multiple physical locations with just a few clicks. Putting applications in closer proximity to end users reduces latency and improves their experience.

153
Q

Generative AI

A

Generative AI is important because it can autonomously create novel and complex data, enhancing creativity and efficiency in various domains

Generative AI is important because it can autonomously create novel and complex data, which significantly enhances creativity and efficiency across various fields, such as content creation, design, and problem-solving.

via - https://aws.amazon.com/what-is/generative-ai/

154
Q

Multi-modal embedding model

A

The company should use a multi-modal embedding model, which is designed to represent and align different types of data (such as text and images) in a shared embedding space, allowing the chatbot to understand and interpret both forms of input simultaneously

A multi-modal embedding model is the most suitable choice for this task because it enables the integration of multiple types of data, such as text and images, into a unified representation. This allows the chatbot to effectively process and understand queries containing both text and visual content by aligning them in a shared embedding space, facilitating more accurate and context-aware responses.

You can generate embeddings for your content and store them in a vector database. When an end user submits any combination of text and image as a search query, the model generates embeddings for the search query and matches them to the stored embeddings to provide relevant search and recommendations results to end users. For example, a stock photography company with hundreds of millions of images can use the model to power its search functionality, so users can search for images using a phrase, image, or a combination of image and text. You can further customize the model to enhance its understanding of your unique content and provide more meaningful results using image-text pairs for fine-tuning.
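
The search flow described above can be sketched as follows. The three-dimensional vectors are made up for illustration: in practice a multi-modal embedding model would produce them, and a vector database would store and search them. Matching by cosine similarity is also an assumption, though it is a common choice.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_embedding, stored_embeddings, top_n=2):
    """Rank stored items by similarity to the query embedding."""
    scored = [(cosine_similarity(query_embedding, vector), item)
              for item, vector in stored_embeddings.items()]
    scored.sort(reverse=True)
    return [item for _, item in scored[:top_n]]

# Hypothetical embeddings of catalog images in a shared text-image space.
catalog = {
    "beach_photo": [0.9, 0.1, 0.0],
    "city_photo": [0.1, 0.9, 0.1],
    "forest_photo": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of a combined text-and-image query.
query = [0.8, 0.2, 0.1]
matches = search(query, catalog)
print(matches)
```

Because text and images are mapped into the same space, the same similarity search works whether the query was a phrase, an image, or both.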

155
Q

multi-modal generative model

A

While a multi-modal generative model can generate outputs based on multiple types of input data, it is more complex and typically used for generating new content rather than interpreting and responding to queries. In addition, it is costlier to build and maintain a multi-modal generative model compared to a multi-modal embedding model. A multi-modal embedding model is more efficient for understanding and processing combined text and image inputs, whereas a generative model may be excessive if the primary goal is to process and respond to existing multi-modal queries.

156
Q

text-only language model

A

A text-only language model cannot handle image data because it is designed to process and generate text exclusively. This model lacks the capability to understand or incorporate visual information, making it unsuitable for a chatbot that needs to interpret both text and images in user queries.

157
Q

convolutional neural network (CNN)

A

A convolutional neural network (CNN) is designed specifically for image recognition and processing tasks and is highly effective for analyzing visual data. However, it cannot process text-based inputs and therefore cannot fulfill the requirement of handling multi-modal queries that include both text and images. A CNN would need to be combined with other models to process text, which adds complexity without directly addressing the multi-modal nature of the queries.

158
Q

VPC endpoint

A

The company should use a VPC endpoint for Amazon S3 that allows secure, private connectivity between the VPC and Amazon S3, without the need for an internet connection, ensuring data is transferred securely within the AWS network

A VPC endpoint for Amazon S3 is the most appropriate choice because it creates a private connection between the VPC and Amazon S3 over the AWS network, without requiring internet access. This VPC endpoint allows the SageMaker model deployed within the VPC to securely access data from Amazon S3 directly, using the internal AWS network paths. It provides enhanced security by keeping data traffic within the AWS infrastructure and not exposing it to the public internet.

You can use two types of VPC endpoints to access Amazon S3: gateway endpoints and interface endpoints (by using AWS PrivateLink). A gateway endpoint is a gateway that you specify in your route table to access Amazon S3 from your VPC over the AWS network. Interface endpoints extend the functionality of gateway endpoints by using private IP addresses to route requests to Amazon S3 from within your VPC, on premises, or from a VPC in another AWS Region by using VPC peering or AWS Transit Gateway.
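
A minimal sketch of creating a gateway endpoint for Amazon S3 with the AWS SDK for Python (boto3), assuming boto3 is installed and credentials are configured. The VPC ID, route table ID, Region, and helper function names are placeholders for illustration. Building the request is separated from the API call so that the parameters can be inspected without touching an AWS account.

```python
def gateway_endpoint_request(vpc_id, region, route_table_ids):
    """Build the parameters for an S3 gateway endpoint in the given VPC."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        # Gateway endpoints are attached to route tables, not network interfaces.
        "RouteTableIds": list(route_table_ids),
    }

def create_s3_gateway_endpoint(vpc_id, region, route_table_ids):
    """Create the endpoint. Requires boto3 and valid AWS credentials."""
    import boto3  # imported here so the request builder runs without boto3

    ec2 = boto3.client("ec2", region_name=region)
    return ec2.create_vpc_endpoint(
        **gateway_endpoint_request(vpc_id, region, route_table_ids)
    )

# Placeholder identifiers; replace with real values before calling the API.
request = gateway_endpoint_request(
    "vpc-0123456789abcdef0", "us-east-1", ["rtb-0123456789abcdef0"]
)
print(request)
```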

159
Q

Internet Gateway

A

An Internet Gateway enables VPC resources to communicate with the internet, but it is not suitable in this scenario because the VPC does not have internet access, and the objective is to securely access S3 without exposing data traffic to the public internet. Using an Internet Gateway would not only require additional security configurations but also go against the company’s requirement to avoid internet access.

160
Q

Amazon SageMaker Inference endpoint

A

A SageMaker Inference endpoint allows clients to invoke deployed models and receive predictions but does not serve the purpose of connecting a SageMaker model to Amazon S3. It is not designed to handle data access between the SageMaker model and S3, so it would not meet the company’s requirement for secure data retrieval from S3 within a VPC.

161
Q

NAT Gateway

A

A NAT Gateway allows instances in a private subnet to access the internet, but it still routes traffic through the public internet, which may not align with the company’s need for secure and private data transfer. Since the goal is to avoid internet access and maintain secure connectivity between the VPC and Amazon S3, a NAT Gateway is not the appropriate solution.

162
Q

Amazon Rekognition

A

Amazon Rekognition is a cloud-based image and video analysis service that makes it easy to add advanced computer vision capabilities to your applications. The service is powered by proven deep learning technology and it requires no machine learning expertise to use. Amazon Rekognition includes a simple, easy-to-use API that can quickly analyze any image or video file that’s stored in Amazon S3.

Using Rekognition’s APIs, you can add features to your application that detect objects, text, and unsafe content, analyze images and videos, and compare faces. With Amazon Rekognition’s face recognition APIs, you can detect, analyze, and compare faces for a wide variety of use cases, including user verification, cataloging, people counting, and public safety.

via - https://docs.aws.amazon.com/rekognition/latest/dg/text-detection.html

163
Q

Amazon Textract

A

Amazon Textract is a document analysis service that detects and extracts printed text, handwriting, structured data (such as fields of interest and their values), and tables from images and scans of documents. Amazon Textract’s machine learning models have been trained on millions of documents so that virtually any document type you upload is automatically recognized and processed for text extraction.

While Amazon Textract can detect text from images and documents from a wide range of file formats, Rekognition is trained on locating and identifying even small text from moving videos and images at various angles. Hence, Rekognition is optimal here.

164
Q

Amazon SageMaker image classification algorithm

A

The Amazon SageMaker image classification algorithm is a supervised learning algorithm that supports multi-label classification. It takes an image as input and outputs one or more labels assigned to that image. It uses a convolutional neural network that can be trained from scratch or trained using transfer learning when a large number of training images are not available. The SageMaker image classification algorithm requires some ML experience to train and tune the model, whereas Rekognition is already trained to identify labels.

165
Q

Amazon SageMaker JumpStart

A

Amazon SageMaker JumpStart is a machine learning (ML) hub that can help you accelerate your ML journey. With SageMaker JumpStart, you can evaluate, compare, and select Foundation Models (FMs) quickly based on pre-defined quality and responsibility metrics to perform tasks like article summarization and image generation. Pretrained models are fully customizable for your use case with your data, and you can easily deploy them into production with the user interface or SDK.

166
Q

object detection

A

The company should use object detection, which involves identifying and locating specific objects within an image

Object detection is the correct concept for this use case. It is a computer vision technique that identifies instances of objects within images and videos, such as detecting and classifying different types of animals. Object detection models, such as those based on Convolutional Neural Networks (CNNs) like YOLO (You Only Look Once) or Faster R-CNN, can analyze visual data to accurately locate and identify various animal species within an image, making it the most appropriate choice for this task.

167
Q

Named Entity Recognition

A

Named Entity Recognition (NER) is a text-based natural language processing technique, not a computer vision method. It identifies and classifies named entities in text, such as people, organizations, or locations. Since NER does not handle visual data or the identification of objects in images, it is not suitable for recognizing different types of animals in images.

168
Q

Face recognition

A

Face recognition is a computer vision technique that focuses exclusively on detecting and verifying human faces in images or video streams. It uses algorithms to match facial features against a database of known faces but is not designed to recognize or classify non-human objects, such as different animal species. As a result, face recognition cannot be applied effectively for identifying various types of animals in images.