AI Practice Test #2 Flashcards
Neural network
Neural networks consist of layers of nodes (neurons) that process input data, adjusting the weights of connections between nodes through training to recognize patterns and make predictions
Neural networks are composed of multiple layers of interconnected nodes (neurons). These nodes process input data and adjust the weights of the connections between them during the training phase. This process allows the network to learn to recognize patterns and make predictions based on the data.
via - https://aws.amazon.com/what-is/neural-network/
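The weight adjustment described in this card can be sketched with a single neuron. This is a minimal illustration, not a full multi-layer network: the OR-gate data, the learning rate, the epoch count, and the names `train_neuron` and `predict` are all assumptions chosen for the example.

```python
import math

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def predict(weights, bias, inputs):
    """Weighted sum of the inputs plus a bias, passed through the sigmoid."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train_neuron(samples, epochs=2000, lr=0.5):
    """Adjust the weights with gradient descent on (inputs, target) pairs."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            # Gradient of the log loss with respect to the weighted sum.
            error = predict(weights, bias, inputs) - target
            weights = [w - lr * error * x for w, x in zip(weights, inputs)]
            bias -= lr * error
    return weights, bias

# Hypothetical training data: the logical OR of two binary inputs.
OR_SAMPLES = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias = train_neuron(OR_SAMPLES)
predictions = [round(predict(weights, bias, x)) for x, _ in OR_SAMPLES]
```

After training, `predictions` should match the targets in `OR_SAMPLES`. A real network stacks many such neurons in layers and propagates the error backward through all of them.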
Cloud computing
Cloud computing refers to the on-demand delivery of IT resources and applications via the internet with pay-as-you-go pricing.
Cloud computing, as defined by AWS, is the on-demand delivery of IT resources and applications over the internet with pay-as-you-go pricing. This allows businesses to access computing power, storage, and applications as needed without investing in physical infrastructure.
https://aws.amazon.com/what-is-cloud-computing/
Reinforcement learning
Reinforcement learning focuses on an agent learning optimal actions through interactions with the environment and feedback, while supervised learning involves training models on labeled data to make predictions
Reinforcement learning is characterized by an agent that learns to make optimal decisions through interactions with the environment, receiving feedback in the form of rewards or penalties. This feedback helps the agent learn a policy to maximize cumulative rewards. In contrast, supervised learning involves training models using labeled datasets to make predictions or classifications based on the input data.
via - https://aws.amazon.com/what-is/reinforcement-learning/
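The agent, environment, and reward loop can be sketched with tabular Q-learning, one common reinforcement learning algorithm. The five-position corridor, the reward of 1 at the goal, and every name and hyperparameter below are assumptions made up for this illustration.

```python
import random

N_STATES = 5       # corridor positions 0..4; position 4 is the goal
ACTIONS = (-1, 1)  # step left or step right

def step(state, action):
    """Environment: move, stay inside the corridor, reward 1 only at the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def q_learning(episodes=200, alpha=0.5, gamma=0.9, seed=0):
    """Learn action values from random interaction with the environment."""
    rng = random.Random(seed)
    q = [{a: 0.0 for a in ACTIONS} for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(1000):  # safety cap on episode length
            action = rng.choice(ACTIONS)  # explore with random actions
            next_state, reward = step(state, action)
            best_next = max(q[next_state].values())
            # Move the estimate toward reward plus discounted future value.
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = next_state
            if state == N_STATES - 1:
                break
    return q

q_table = q_learning()
# Greedy policy: in each non-goal position, pick the action with the highest value.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
```

No labeled examples are involved: the agent only sees rewards, and the learned policy is to step right (toward the goal) from every position.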
Feature engineering
Feature engineering for structured data often involves tasks such as normalization and handling missing values, while for unstructured data, it involves tasks such as tokenization and vectorization
Feature engineering for structured data typically includes tasks like normalization, handling missing values, and encoding categorical variables. For unstructured data, such as text or images, feature engineering involves different tasks like tokenization (breaking down text into tokens), vectorization (converting text or images into numerical vectors), and extracting features that can represent the content meaningfully.
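A minimal sketch of these tasks in plain Python, assuming a tiny made-up dataset. The helper names are hypothetical, mean imputation and min-max scaling are just one choice among several, and the bag-of-words count is only one simple form of vectorization.

```python
def fill_missing(values):
    """Structured data: replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def min_max_normalize(values):
    """Structured data: rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # all values equal, nothing to scale
    return [(v - lo) / (hi - lo) for v in values]

def tokenize(text):
    """Unstructured data: break text into lowercase word tokens."""
    return text.lower().split()

def bag_of_words(tokens, vocabulary):
    """Unstructured data: turn tokens into a vector of word counts."""
    return [tokens.count(word) for word in vocabulary]

ages = [20, None, 40]                    # hypothetical numeric column
filled = fill_missing(ages)              # [20, 30.0, 40]
normalized = min_max_normalize(filled)   # [0.0, 0.5, 1.0]

tokens = tokenize("The cat saw the dog")
vector = bag_of_words(tokens, ["the", "cat", "dog", "bird"])  # [2, 1, 1, 0]
```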
Structured data
Structured data can include numerical and categorical data
Structured data may require less preprocessing than unstructured data.
Unstructured data
Unstructured data includes text, images, audio, et cetera
Unstructured data typically requires more extensive preprocessing.
Self-supervised learning
Models are given vast amounts of raw data that is almost entirely or completely unlabeled, and they generate the labels themselves.
Foundation models use self-supervised learning to create labels from input data. In self-supervised learning, models are provided vast amounts of raw completely unlabeled data and then the models generate the labels themselves. This means no one has instructed or trained the model with labeled training data sets.
Reinforcement learning
Reinforcement learning is a method with reward values attached to the different steps that the algorithm must go through. So the model’s goal is to accumulate as many reward points as possible and eventually reach an end goal.
Supervised learning
In supervised learning, models are supplied with labeled and defined training data to assess for correlations. The sample data specifies both the input and the output for the model. For example, images of handwritten figures are annotated to indicate which number they correspond to. A supervised learning system could recognize the clusters of pixels and shapes associated with each number, given sufficient examples.
Data labeling
Data labeling is the process of categorizing input data with its corresponding defined output values. Labeled training data is required for supervised learning. For example, millions of apple and banana images would need to be tagged with the words “apple” or “banana.” Then machine learning applications could use this training data to guess the name of the fruit when given a fruit image.
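How labeled data drives a prediction can be sketched with a one-nearest-neighbor classifier. The two numeric features per fruit (imagine length and roundness) and all values below are invented for illustration; a real image model would learn from pixels rather than hand-picked numbers.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_neighbor(labeled_examples, features):
    """Return the label of the training example closest to the new input."""
    _, label = min(labeled_examples, key=lambda ex: squared_distance(ex[0], features))
    return label

# Hypothetical labeled training data: (features, label) pairs.
LABELED_FRUIT = [
    ((7.0, 0.9), "apple"),
    ((7.5, 0.8), "apple"),
    ((18.0, 0.2), "banana"),
    ((20.0, 0.3), "banana"),
]

guess_round = nearest_neighbor(LABELED_FRUIT, (7.2, 0.85))
guess_long = nearest_neighbor(LABELED_FRUIT, (19.0, 0.25))
```

Without the "apple" and "banana" tags on the training examples, the classifier would have nothing to return, which is why supervised learning depends on labeling.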
Unsupervised learning
Unsupervised learning algorithms train on unlabeled data. They scan through new data, trying to establish meaningful connections within the inputs without predetermined outputs. They can spot patterns and categorize data. For example, unsupervised algorithms could group news articles from different news sites into common categories like sports, crime, etc. They can use natural language processing to comprehend meaning and emotion in the article.
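Grouping unlabeled data can be sketched with k-means clustering, one common unsupervised algorithm, here reduced to one dimension. The points, the starting centroids, and the function name are assumptions for the example.

```python
def k_means_1d(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-centering."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical unlabeled measurements; no category is given for any point.
POINTS = [1, 2, 3, 10, 11, 12]
final_centroids, final_clusters = k_means_1d(POINTS, centroids=[1.0, 12.0])
```

The algorithm is never told which group a point belongs to; the two clusters emerge from the distances alone.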
Amazon Q Business
Amazon Q Business is a fully managed, generative-AI powered assistant that you can configure to answer questions, provide summaries, generate content, and complete tasks based on your enterprise data. It allows end users to receive immediate, permissions-aware responses from enterprise data sources with citations, for use cases such as IT, HR, and benefits help desks.
Amazon Q Business also helps streamline tasks and accelerate problem-solving. You can use Amazon Q Business to create and share task automation applications or perform routine actions like submitting time-off requests and sending meeting invites.
Amazon Q Developer
Amazon Q Developer assists developers and IT professionals with all their tasks—from coding, testing, and upgrading applications, to diagnosing errors, performing security scanning and fixes, and optimizing AWS resources.
Amazon Q in QuickSight
With Amazon Q in QuickSight, customers get a generative BI assistant that allows business analysts to use natural language to build BI dashboards in minutes and easily create visualizations and complex calculations.
Amazon Q in Connect
Amazon Connect is the contact center service from AWS. Amazon Q helps customer service agents provide better customer service. Amazon Q in Connect uses real-time conversation with the customer along with relevant company content to automatically recommend what to say or what actions an agent should take to better assist customers.
SageMaker model cards
SageMaker model cards include information about the model such as intended use and risk rating of a model, training details and metrics, evaluation results, and observations. AI service cards provide transparency about AWS AI services’ intended use, limitations, and potential impacts
You can use Amazon SageMaker Model Cards to document critical details about your machine learning (ML) models in a single place for streamlined governance and reporting. You can catalog details such as the intended use and risk rating of a model, training details and metrics, evaluation results and observations, and additional call-outs such as considerations, recommendations, and custom information.
AI Service Cards are a form of responsible AI documentation that provides customers with a single place to find information on the intended use cases and limitations, responsible AI design choices, and deployment and performance optimization best practices for AI services from AWS.
Token
A token is a sequence of characters that a model can interpret or predict as a single unit of meaning
A sequence of characters that a model can interpret or predict as a single unit of meaning. For example, with text models, a token could correspond not just to a word, but also to a part of a word with grammatical meaning (such as “-ed”), a punctuation mark (such as “?”), or a common phrase (such as “a lot”).
Embedding
Embedding is a vector of numerical values that represents condensed information obtained by transforming input into that vector
The process of condensing information by transforming input into a vector of numerical values, known as the embeddings, in order to compare the similarity between different objects by using a shared numerical representation. For example, sentences can be compared to determine the similarity in meaning, images can be compared to determine visual similarity, or text and image can be compared to see if they’re relevant to each other.
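Comparing embeddings usually comes down to a vector similarity measure. The sketch below uses cosine similarity, a common choice. The three-dimensional vectors are made up for illustration; real embeddings come from a trained model and typically have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, hand-written for the example.
EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.3],
}

cat_vs_kitten = cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["kitten"])
cat_vs_car = cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["car"])
```

With these vectors, "cat" scores much closer to "kitten" than to "car", which is the behavior a shared numerical representation is meant to provide.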
Knowledge Bases for Amazon Bedrock
Use Knowledge Bases for Amazon Bedrock to supplement contextual information from the company’s private data to the FM using Retrieval Augmented Generation (RAG)
With the comprehensive capabilities of Amazon Bedrock, you can experiment with a variety of top FMs, customize them privately with your data using techniques such as fine-tuning and retrieval-augmented generation (RAG), and create managed agents that execute complex business tasks—from booking travel and processing insurance claims to creating ad campaigns and managing inventory—all without writing any code.
Using Knowledge Bases for Amazon Bedrock, you can provide foundation models with contextual information from your company’s private data for Retrieval Augmented Generation (RAG), enhancing response relevance and accuracy. This fully managed feature handles the entire RAG workflow, eliminating the need for custom data integrations and management.
via - https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base.html
Retrieval Augmented Generation (RAG)
Retrieval-Augmented Generation (RAG) is the process of optimizing the output of a large language model, so it references an authoritative knowledge base outside of its training data sources before generating a response. Large Language Models (LLMs) are trained on vast volumes of data and use billions of parameters to generate original output for tasks like answering questions, translating languages, and completing sentences. RAG extends the already powerful capabilities of LLMs to specific domains or an organization’s internal knowledge base, all without the need to retrain the model. It is a cost-effective approach to improving LLM output so it remains relevant, accurate, and useful in various contexts.
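The retrieve-then-generate flow can be sketched without any model at all. This toy version ranks documents by word overlap and pastes the best match into the prompt. The knowledge base, the scoring rule, and the function names are assumptions for illustration; production systems typically retrieve with embeddings and a vector store, and then send the prompt to an LLM.

```python
def words(text):
    """Lowercase word set with surrounding punctuation stripped."""
    return {w.strip(".,?!") for w in text.lower().split()}

def retrieve(question, documents, top_n=1):
    """Rank documents by how many words they share with the question."""
    ranked = sorted(documents, key=lambda d: len(words(question) & words(d)), reverse=True)
    return ranked[:top_n]

def build_prompt(question, documents):
    """Augment the question with retrieved context before calling a model."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context.\nContext:\n{context}\nQuestion: {question}"

# Hypothetical internal knowledge base that the model was never trained on.
KNOWLEDGE_BASE = [
    "Employees accrue 15 vacation days per year.",
    "The office is closed on public holidays.",
    "Expense reports are due by the fifth of each month.",
]

question = "How many vacation days do employees get per year?"
prompt = build_prompt(question, KNOWLEDGE_BASE)
```

The model's weights are untouched; only the prompt changes, which is why RAG can track data that is updated frequently.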
Reinforcement learning from human feedback (RLHF)
Reinforcement learning from human feedback (RLHF) is a machine learning (ML) technique that uses human feedback to optimize ML models to self-learn more efficiently. Reinforcement learning (RL) techniques train software to make decisions that maximize rewards, making their outcomes more accurate. RLHF incorporates human feedback in the rewards function, so the ML model can perform tasks more aligned with human goals, wants, and needs. RLHF is used throughout generative artificial intelligence (generative AI) applications, including in large language models (LLM).
Small language model (SLM)
A small language model (SLM) is an AI model designed to process and generate human language, with a compact architecture, fewer parameters, and lower computational requirements compared to large language models (LLMs).
A small language model (SLM) optimized for deployment on edge devices is specifically designed to be lightweight, efficient, and capable of running on devices with limited computational resources. Deploying the model directly on the edge device eliminates the need for network communication with a central server, thereby achieving the required low-latency inference needed for real-time IoT applications.
https://aws.amazon.com/about-aws/whats-new/2024/05/amazon-bedrock-mistral-small-foundation-model/
Edge device
In computer networking, an edge device is a device that provides an entry point into enterprise or service provider core networks. Examples include routers, routing switches, integrated access devices (IADs), multiplexers, and a variety of metropolitan area network (MAN) and wide area network (WAN) access devices. Edge devices also provide connections into carrier and service provider networks. An edge device that connects a local area network to a high speed switch or backbone (such as an ATM switch) may be called an edge concentrator.
Central API
A central API and an asynchronous inference endpoint introduce network latency.
Using a central API with asynchronous inference endpoints still involves network communication, which can result in latency.
AWS Audit Manager
AWS Audit Manager helps automate the collection of evidence to continuously audit your AWS usage. It simplifies the process of assessing risk and compliance with regulations and industry standards, making it an essential tool for governance in AI systems.
AWS Artifact
AWS Artifact provides on-demand access to AWS’ compliance reports and online agreements. It is useful for obtaining compliance documentation but does not provide continuous auditing or automated evidence collection.
AWS Trusted Advisor
AWS Trusted Advisor offers guidance to help optimize your AWS environment for cost savings, performance, security, and fault tolerance. While it provides recommendations for best practices, it does not focus on auditing or evidence collection for compliance.
AWS CloudTrail
AWS CloudTrail records AWS API calls for auditing purposes and delivers log files for compliance and operational troubleshooting. It is crucial for tracking user activity but does not automate compliance assessments or evidence collection.
Foundation Model: Amazon Titan
Amazon Titan foundation models, developed by Amazon Web Services (AWS), are pre-trained on extensive datasets, making them robust and versatile models suitable for a wide range of applications. Amazon Titan foundation models (FMs) provide customers with a breadth of high-performing image, multimodal, and text model choices, via a fully managed API. Amazon Titan models are created by AWS and pretrained on large datasets, making them powerful, general-purpose models built to support a variety of use cases, while also supporting the responsible use of AI.
Foundation Model: Llama
Llama is a series of large language models trained on publicly available data. They are built on the transformer architecture, enabling them to handle input sequences of varying lengths and produce output sequences of varying lengths. A notable feature of Llama models is their capacity to generate coherent and contextually appropriate text.
Foundation Model: Jurassic
The Jurassic family of models from AI21 Labs supports use cases such as question answering, summarization, draft generation, advanced information extraction, and ideation for tasks requiring intricate reasoning and logic.
Foundation Model: Claude
Claude is Anthropic’s frontier, state-of-the-art large language model that offers important features for enterprises like advanced reasoning, vision analysis, code generation, and multilingual processing.
Amazon SageMaker Model Dashboard
Amazon SageMaker Model Dashboard is a centralized repository of all models created in your account. The models are generally the outputs of SageMaker training jobs, but you can also import models trained elsewhere and host them on SageMaker. Model Dashboard provides a single interface for IT administrators, model risk managers, and business leaders to track all deployed models and aggregate data from multiple AWS services to provide indicators about how your models are performing. You can view details about model endpoints, batch transform jobs, and monitoring jobs for additional insights into model performance.
The dashboard’s visual display helps you quickly identify which models have missing or inactive monitors, so you can ensure all models are periodically checked for data drift, model drift, bias drift, and feature attribution drift. Lastly, the dashboard’s ready access to model details helps you dive deep, so you can access logs, infrastructure-related information, and resources to help you debug monitoring failures.
Amazon SageMaker Model Monitor
Amazon SageMaker Model Monitor monitors the quality of Amazon SageMaker machine learning models in production. With Model Monitor, you can set up continuous monitoring with a real-time endpoint, continuous monitoring with a batch transform job that runs regularly, and on-schedule monitoring for asynchronous batch transform jobs.
Amazon SageMaker Ground Truth
Amazon SageMaker Ground Truth offers the most comprehensive set of human-in-the-loop capabilities, allowing you to harness the power of human feedback across the ML lifecycle to improve the accuracy and relevancy of models. You can complete a variety of human-in-the-loop tasks with SageMaker Ground Truth, from data generation and annotation to model review, customization, and evaluation, either through a self-service or an AWS-managed offering.
Amazon SageMaker Clarify
SageMaker Clarify helps identify potential bias during data preparation without writing code. You specify input features, such as gender or age, and SageMaker Clarify runs an analysis job to detect potential bias in those features.
Multimodal model
A multimodal model can accept a mix of input types such as audio/text and create a mix of output types such as video/image
A multimodal model is an artificial intelligence system designed to process and understand multiple types of data, such as text, images, audio, and video. Unlike unimodal models, which handle a single type of data, multimodal models can integrate and make sense of information from various sources, allowing them to perform more complex and versatile tasks.
Multimodal models represent a significant advancement in AI, enabling the integration and understanding of multiple types of data. By combining different modalities, these models can perform a wide range of complex tasks, making them highly versatile and powerful tools in various fields.
Amazon Textract
Automatically extract printed text, handwriting, layout elements, and data from any document. Amazon Textract is a machine learning (ML) service that automatically extracts text, handwriting, layout elements, and data from scanned documents. It goes beyond simple optical character recognition (OCR) to identify, understand, and extract specific data from documents.
Amazon Forecast
Forecast business outcomes easily and accurately using machine learning. Amazon Forecast uses machine learning (ML) to generate more accurate demand forecasts with just a few clicks, without requiring any prior ML experience. Amazon Forecast uses ML to learn not only the best algorithm for each item but the best ensemble of algorithms for each item, automatically creating the best model for your data.
Amazon Kendra
Easy-to-use enterprise search service that’s powered by machine learning. Amazon Kendra is a highly accurate and easy-to-use enterprise search service that’s powered by machine learning (ML). It allows developers to add search capabilities to their applications so their end users can discover information stored within the vast amount of content spread across their company.
Fine-tuning
Fine-tuning changes the weights of the FM.
Fine-tuning a pre-trained foundation model is an affordable way to take advantage of its broad capabilities while customizing the model on your own small corpus. Fine-tuning is a customization method that involves further training and does change the weights of your model.
Retrieval-augmented generation (RAG)
Retrieval-augmented generation (RAG) does not change the weights of the FM.
Retrieval-Augmented Generation (RAG) is the process of optimizing the output of a large language model, so it references an authoritative knowledge base outside of its training data sources before generating a response. Large Language Models (LLMs) are trained on vast volumes of data and use billions of parameters to generate original output for tasks like answering questions, translating languages, and completing sentences. RAG extends the already powerful capabilities of LLMs to specific domains or an organization’s internal knowledge base, all without the need to retrain the model.
Retrieval Augmented Generation (RAG) allows you to customize a model’s responses when you want the model to consider new knowledge or up-to-date information. When your data changes frequently, like inventory or pricing, it’s not practical to fine-tune and update the model while it’s serving user queries.
Prompt engineering
Prompt engineering does NOT change the weights of the FM.
Another recommended way to first customize a foundation model to a specific use case is through prompt engineering. Providing your foundation model with well-engineered, context-rich prompts can help achieve desired results without any fine-tuning or changing of model weights.
Amazon SageMaker Model Cards
Describes how a model should be used in a production environment
Use Amazon SageMaker Model Cards to document critical details about your machine learning (ML) models in a single place for streamlined governance and reporting.
Catalog details such as the intended use and risk rating of a model, training details and metrics, evaluation results and observations, and additional call-outs such as considerations, recommendations, and custom information.
Model cards provide prescriptive guidance on what information to document and include fields for custom information. Specifying the intended uses of a model helps ensure that model developers and users have the information they need to train or deploy the model responsibly.
The intended uses of a model go beyond technical details and describe how a model should be used in production, the scenarios in which it is appropriate to use a model, and additional considerations such as the type of data to use with the model or any assumptions made during development.
Amazon Textract
Amazon Textract is a machine learning (ML) service that automatically extracts text, handwriting, layout elements, and data from scanned documents. It goes beyond simple optical character recognition (OCR) to identify, understand, and extract specific data from documents. Today, many companies manually extract data from scanned documents such as PDFs, images, tables, and forms, or through simple OCR software that requires manual configuration (which often must be updated when the form changes). To overcome these manual and expensive processes, Textract uses ML to read and process any type of document, accurately extracting text, handwriting, tables, and other data with no manual effort.
You can use one of AWS’s pre-trained or custom features to quickly automate document processing, whether you’re automating loan processing or extracting information from invoices and receipts. Textract provides you the ability to customize the pre-trained features to meet the document processing needs specific to your business. Textract can extract the data in minutes instead of hours or days.
Textract use cases: via - https://docs.aws.amazon.com/textract/latest/dg/what-is.html
Amazon Transcribe
Amazon Transcribe is an automatic speech recognition service that uses machine learning models to convert audio to text. You can use Amazon Transcribe as a standalone transcription service or add speech-to-text capabilities to any application.
Amazon Comprehend
Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find meaning and insights in text. Natural Language Processing (NLP) is a way for computers to analyze, understand, and derive meaning from textual information in a smart and useful way. By utilizing NLP, you can extract important phrases, sentiments, syntax, key entities such as brand, date, location, person, etc., and the language of the text.
Amazon Rekognition
Amazon Rekognition is a cloud-based image and video analysis service that makes it easy to add advanced computer vision capabilities to your applications. The service is powered by proven deep learning technology and it requires no machine learning expertise to use. Amazon Rekognition includes a simple, easy-to-use API that can quickly analyze any image or video file that’s stored in Amazon S3. While Rekognition can be used to extract text from images, Rekognition specializes in identifying text located spatially within an image, for instance, words displayed on street signs, t-shirts, or license plates. It’s not the ideal choice for images containing more than 100 words, as this exceeds its limitation.
Temperature
Temperature is a value between 0 and 1 that regulates the creativity of the model’s responses. Use a lower temperature if you want more deterministic responses. Use a higher temperature if you want more creative or varied responses to the same prompt on Amazon Bedrock; higher values also make hallucinated responses more likely.
A lower value of temperature results in deterministic responses, so there are fewer chances of hallucinations.
A higher temperature results in a higher likelihood of hallucinations.
via - https://docs.aws.amazon.com/bedrock/latest/userguide/inference-parameters.html
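One common way temperature is applied is to divide the model's raw scores (logits) by the temperature before converting them to probabilities. The sketch below assumes that mechanism; the logits are made up, and the function name is hypothetical rather than part of any Bedrock API.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [logit / temperature for logit in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens.
LOGITS = [2.0, 1.0, 0.5]

low_temp = softmax_with_temperature(LOGITS, 0.2)   # most mass on the top token
high_temp = softmax_with_temperature(LOGITS, 1.0)  # mass spread more evenly
```

At the lower temperature the top candidate takes nearly all of the probability, so sampling is close to deterministic. At the higher temperature the less likely candidates are sampled more often, which is where varied and sometimes incorrect output comes from.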
Hierarchical relationship between Artificial Intelligence (AI), Machine Learning (ML), Deep Learning (DL), and Generative AI (GenAI)?
Artificial Intelligence > Machine Learning > Deep Learning > Generative AI
The correct hierarchy is as follows:
Artificial Intelligence (AI): The broadest field encompassing all aspects of creating machines that can perform tasks that typically require human intelligence.
Machine Learning (ML): A subset of AI focused on algorithms and statistical models that enable machines to improve their performance on tasks through experience.
Deep Learning (DL): A subset of ML that uses neural networks with many layers to learn from large amounts of data, allowing for more complex and abstract representations.
Generative AI (GenAI): A subset of Deep Learning focused on models that can generate new content, such as text, images, or music, by learning from existing data.
via - https://docs.aws.amazon.com/whitepapers/latest/aws-caf-for-ai/aws-caf-for-ai.html
Amazon SageMaker Data Wrangler
Amazon SageMaker Data Wrangler reduces the time it takes to aggregate and prepare tabular and image data for ML from weeks to minutes. With SageMaker Data Wrangler, you can simplify the process of data preparation and feature engineering, and complete each step of the data preparation workflow (including data selection, cleansing, exploration, visualization, and processing at scale) from a single visual interface. You can use SQL to select the data that you want from various data sources and import it quickly. Next, you can use the data quality and insights report to automatically verify data quality and detect anomalies, such as duplicate rows and target leakage. SageMaker Data Wrangler contains over 300 built-in data transformations, so you can quickly transform data without writing code.
With the SageMaker Data Wrangler data selection tool, you can quickly access and select your tabular and image data from various popular sources - such as Amazon Simple Storage Service (Amazon S3), Amazon Athena, Amazon Redshift, AWS Lake Formation, Snowflake, and Databricks - and over 50 other third-party sources - such as Salesforce, SAP, Facebook Ads, and Google Analytics. You can also write queries for data sources using SQL and import data directly into SageMaker from various file formats, such as CSV, Parquet, JSON, and database tables.
How Data Wrangler works: via - https://aws.amazon.com/sagemaker/data-wrangler/
Amazon SageMaker Model Dashboard
Amazon SageMaker Model Dashboard is a centralized portal, accessible from the SageMaker console, where you can view, search, and explore all of the models in your account. You can track which models are deployed for inference and if they are used in batch transform jobs or hosted on endpoints.
Amazon SageMaker Feature Store
Amazon SageMaker Feature Store is a fully managed, purpose-built repository to store, share, and manage features for machine learning (ML) models. Features are inputs to ML models used during training and inference.
AWS Trainium
Leverage AWS Trainium for high-performance, cost-effective Deep Learning training.
AWS Trainium is the machine learning (ML) chip that AWS purpose-built for deep learning (DL) training of 100B+ parameter models. Each Amazon Elastic Compute Cloud (Amazon EC2) Trn1 instance deploys up to 16 Trainium accelerators to deliver a high-performance, low-cost solution for DL training in the cloud.
https://aws.amazon.com/machine-learning/trainium/
AWS Inferentia
Leverage AWS Inferentia for deep learning (DL) and generative AI inference applications.
AWS Inferentia accelerators are designed by AWS to deliver high performance at the lowest cost in Amazon EC2 for your deep learning (DL) and generative AI inference applications. The first-generation AWS Inferentia accelerator powers Amazon Elastic Compute Cloud (Amazon EC2) Inf1 instances, which deliver up to 2.3x higher throughput and up to 70% lower cost per inference than comparable Amazon EC2 instances.
https://aws.amazon.com/machine-learning/inferentia/
Image processing
Image processing focuses on enhancing and manipulating images for visual quality
Image processing is primarily concerned with the techniques used to enhance and manipulate images, such as filtering, noise reduction, and image transformation.
Computer vision
Computer vision involves interpreting and understanding the content of images to make decisions
Computer vision, on the other hand, focuses on interpreting and understanding the content of images to make decisions, such as object detection, facial recognition, and scene understanding. Computer vision often uses machine learning algorithms to achieve these tasks.
https://aws.amazon.com/what-is/computer-vision/
Inference parameter: Response length
Response length
Response length represents the minimum or maximum number of tokens to return in the generated response.
via - https://docs.aws.amazon.com/bedrock/latest/userguide/inference-parameters.html
Inference parameter: Stop sequence
Stop sequences specify the sequences of characters that stop the model from generating further tokens. If the model generates a stop sequence that you specify, it will stop generating after that sequence.
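How a maximum response length and stop sequences cut off generation can be sketched with a simulated token stream. The token list and the `generate` function are hypothetical stand-ins for a real model, and whether the stop sequence itself appears in the returned text varies by model; this sketch keeps it.

```python
def generate(token_stream, max_tokens, stop_sequences):
    """Append tokens until the length limit or a stop sequence is reached."""
    output = ""
    count = 0
    for token in token_stream:
        if count >= max_tokens:
            break  # response length limit reached
        output += token
        count += 1
        if any(output.endswith(stop) for stop in stop_sequences):
            break  # the model produced a stop sequence
    return output

# Hypothetical tokens a model might emit one at a time.
TOKENS = ["Hello", ",", " world", ".", " END", " extra", " text"]

stopped = generate(TOKENS, max_tokens=10, stop_sequences=[" END"])
truncated = generate(TOKENS, max_tokens=3, stop_sequences=[" END"])
```

Here `stopped` ends at the stop sequence even though the token budget was not used up, while `truncated` ends after three tokens because the length limit was hit first.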
Inference parameter: Top P
Top P represents the percentage of most likely candidates that the model considers for the next token, measured as a share of the cumulative probability distribution.
Inference parameter: Top K
Top K represents the number of most likely candidates that the model considers for the next token.
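The difference between Top K and Top P can be sketched as two filters over the same candidate list. The probabilities are illustrative, the function names are hypothetical, and the Top P filter assumes the cumulative-probability (nucleus) interpretation.

```python
def top_k_filter(probs, k):
    """Keep the k most likely candidate tokens."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    return [token for token, _ in ranked[:k]]

def top_p_filter(probs, p):
    """Keep the most likely tokens until their cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append(token)
        cumulative += prob
        if cumulative >= p:
            break
    return kept

# Hypothetical next-token probabilities for "I hear the hoof beats of ...".
NEXT_TOKEN_PROBS = {"horses": 0.7, "zebras": 0.2, "unicorns": 0.1}

k2 = top_k_filter(NEXT_TOKEN_PROBS, 2)      # ["horses", "zebras"]
p07 = top_p_filter(NEXT_TOKEN_PROBS, 0.7)   # ["horses"]
p08 = top_p_filter(NEXT_TOKEN_PROBS, 0.8)   # ["horses", "zebras"]
```

Top K always keeps a fixed number of candidates, while Top P keeps however many are needed to cover the requested share of probability, so it adapts to how confident the model is.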
Amazon Q Developer
(1) Understand and manage your cloud infrastructure on AWS
Amazon Q Developer helps you understand and manage your cloud infrastructure on AWS. With this capability, you can list and describe your AWS resources using natural language prompts, minimizing friction in navigating the AWS Management Console and compiling all information from documentation pages.
For example, you can ask Amazon Q Developer, “List all of my Lambda functions”. Then, Amazon Q Developer returns the response with a set of your AWS Lambda functions as requested, as well as deep links so you can navigate to each resource easily.
(2) Get answers to your AWS account-specific cost-related questions using natural language
Amazon Q Developer can get answers to AWS cost-related questions using natural language. This capability works by retrieving and analyzing cost data from AWS Cost Explorer.
via - https://docs.aws.amazon.com/amazonq/latest/qdeveloper-ug/what-is.html
via - https://aws.amazon.com/blogs/aws/amazon-q-developer-now-generally-available-includes-new-capabilities-to-reimagine-developer-experience/
Rule-Based Application
A rule-based application is the most suitable choice for this scenario. Probability questions, like calculating the chance of drawing a spade from a deck of cards, are based on well-defined mathematical rules and formulas. A rule-based system can be programmed with these rules to provide precise answers to such questions, making it an efficient and straightforward solution. This approach ensures accuracy, is easy to implement, and requires no training data, making it ideal for helping students understand fundamental mathematical concepts.
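The spade example can be computed directly from the counting rule (favorable outcomes divided by total outcomes), with no training data. This is a minimal sketch; the deck representation and the `probability` helper are assumptions for illustration.

```python
from fractions import Fraction

SUITS = ["spades", "hearts", "diamonds", "clubs"]
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]

# A standard 52-card deck as (rank, suit) pairs.
DECK = [(rank, suit) for suit in SUITS for rank in RANKS]

def probability(event, outcomes):
    """Rule: P(event) = favorable outcomes / total equally likely outcomes."""
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))

p_spade = probability(lambda card: card[1] == "spades", DECK)  # 1/4
p_ace = probability(lambda card: card[0] == "A", DECK)         # 1/13
```

Because the answer follows from a fixed formula, the result is exact and repeatable, which a trained model cannot guarantee.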
Reinforcement Learning (RL)
Reinforcement Learning (RL) is a machine learning technique used for decision-making tasks where an agent learns by interacting with an environment and receiving feedback in the form of rewards or penalties. RL is better suited for dynamic and complex environments, such as games or robotic control, where exploration and adaptation are necessary. It is not appropriate for solving straightforward mathematical problems with well-defined answers, as it does not leverage existing mathematical rules and requires significant computational resources for training.
Supervised learning
Supervised learning involves training a model on a labeled dataset to predict outcomes based on input features. While effective for tasks like image recognition or language translation, it is not suitable for answering mathematical questions that have precise, rule-based answers. Building a dataset of probability questions and answers would be inefficient and unnecessary, as the app can directly use mathematical formulas to provide correct responses without requiring model training.
Unsupervised learning
Unsupervised learning is designed to identify patterns and structures in data without any predefined labels, making it useful for tasks such as clustering or dimensionality reduction. However, it is not applicable for answering specific mathematical questions like those involving probability, which require exact calculations based on established mathematical principles. Therefore, unsupervised learning does not provide a direct or efficient means to achieve the app’s objective.
Amazon Inspector
Amazon Inspector is an automated security assessment service that helps improve the security and compliance of applications deployed on AWS. It automatically assesses applications for exposure, vulnerabilities, and deviations from best practices, making it an essential tool for ensuring the security of AI systems.
AWS Config
AWS Config is a service that enables you to assess, audit, and evaluate the configurations of your AWS resources. While it is important for governance and compliance monitoring, it does not perform automated security assessments of applications.
AWS Audit Manager
AWS Audit Manager helps you continuously audit your AWS usage to simplify how you assess risk and compliance with regulations and industry standards. It focuses on audit and compliance reporting rather than automated security assessments.
AWS Artifact
AWS Artifact provides on-demand access to AWS’ compliance reports and online agreements. It helps with compliance reporting but does not offer automated security assessments of applications.
Data access control
Data access control involves authentication and authorization of users
Data access control is about managing who can access data and what actions they can perform, typically through mechanisms like authentication and authorization.
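The two mechanisms can be sketched separately: authentication checks who the user is, and authorization checks what that user may do. Everything below (the `USERS` and `PERMISSIONS` tables, the user "alice", the fixed salt) is a hypothetical in-memory example, not a production design.

```python
import hashlib
import hmac


def hash_password(password, salt):
    # PBKDF2 from the standard library; the iteration count is a demo value.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 10_000)


# Hypothetical user store and role-to-permission mapping.
USERS = {"alice": {"salt": b"demo-salt", "role": "analyst"}}
USERS["alice"]["pw_hash"] = hash_password("correct horse", USERS["alice"]["salt"])
PERMISSIONS = {"analyst": {"read"}, "admin": {"read", "write", "delete"}}


def authenticate(username, password):
    """Authentication: is this user who they claim to be?"""
    user = USERS.get(username)
    if user is None:
        return False
    candidate = hash_password(password, user["salt"])
    return hmac.compare_digest(candidate, user["pw_hash"])


def authorize(username, action):
    """Authorization: is this user's role allowed to perform the action?"""
    role = USERS[username]["role"]
    return action in PERMISSIONS.get(role, set())


def can_access(username, password, action):
    return authenticate(username, password) and authorize(username, action)


print(can_access("alice", "correct horse", "read"))   # True
print(can_access("alice", "correct horse", "write"))  # False
```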
Data Integrity
Data integrity ensures the data is accurate, consistent, and unaltered
Data integrity, on the other hand, focuses on maintaining the accuracy, consistency, and trustworthiness of data throughout its lifecycle, ensuring that data remains unaltered and accurate during storage, processing, and transmission.
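One common way to check that data is unaltered is to compare cryptographic checksums taken before and after storage or transmission. The sketch below assumes SHA-256 and uses hypothetical helper names; it illustrates the idea rather than any specific AWS feature.

```python
import hashlib
import hmac


def checksum(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unaltered(data: bytes, expected_checksum: str) -> bool:
    """True if the data still matches the checksum recorded earlier."""
    return hmac.compare_digest(checksum(data), expected_checksum)


original = b"patient_id,score\n17,0.93\n"
recorded = checksum(original)  # stored alongside the data

print(is_unaltered(original, recorded))                        # True
print(is_unaltered(b"patient_id,score\n17,0.39\n", recorded))  # False
```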
Model Evaluation
Model evaluation on Amazon Bedrock involves a comprehensive process of preparing data, training models, selecting appropriate metrics, testing and analyzing results, ensuring fairness and bias detection, tuning performance, and continuous monitoring. Model Evaluation on Amazon Bedrock helps you to incorporate Generative AI into your application by giving you the power to select the foundation model that gives you the best results for your particular use case.
Amazon Bedrock Guardrails
Guardrails for Amazon Bedrock enables you to implement safeguards for your generative AI applications based on your use cases and responsible AI policies. You can create multiple guardrails tailored to different use cases and apply them across multiple foundation models (FMs), providing a consistent user experience and standardizing safety and privacy controls across generative AI applications. You can use guardrails with text-based user inputs and model responses.
Amazon SageMaker Model Monitor
This tool is used for monitoring machine learning models in production to detect data and prediction quality issues. While it helps maintain model performance, it does not assist in model selection or content moderation.
Amazon Comprehend
Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find insights and relationships in text. It is not specifically designed for selecting models or moderating content generated by LLMs.
Amazon SageMaker Clarify
Amazon SageMaker Clarify is used to detect bias in machine learning models and data. While it is crucial for ensuring fairness and transparency, it does not help with model selection or content moderation for generative AI applications.
Toxicity
Toxicity refers to AI model-generated content that can be deemed as offensive, disturbing, or inappropriate.
An example of toxicity is an AI model generating harmful or offensive content about a specific group.
Hallucination
Hallucination refers to AI model-generated assertions or claims that sound true but are incorrect
An example of hallucination is an AI model generating an irrelevant or incorrect response.
Foundation models
Foundation models can perform a wide range of tasks across different domains by leveraging their extensive pre-training on large datasets
Foundation models are a form of generative artificial intelligence (generative AI). They generate output from one or more inputs (prompts) in the form of human language instructions.
In general, an FM uses learned patterns and relationships to predict the next item in a sequence. For example, with image generation, the model analyzes the image and creates a sharper, more clearly defined version of the image. Similarly, with text, the model predicts the next word in a string of text based on the previous words and their context. It then selects the next word using probability distribution techniques.
Foundation models use self-supervised learning to create labels from input data. This means no one has instructed or trained the model with labeled training data sets. This feature separates LLMs from previous ML architectures, which use supervised or unsupervised learning.
Foundation models, even though they are pre-trained, can continue to learn from data inputs or prompts during inference. This means that you can develop comprehensive outputs through carefully curated prompts. Tasks that FMs can perform include language processing, visual comprehension, code generation, and human-centered engagement.
via - https://aws.amazon.com/what-is/foundation-models/
Feature Engineering
Feature Engineering involves selecting, modifying, or creating features from raw data to improve the performance of machine learning models, and it is important because it can significantly enhance model accuracy and efficiency
Feature Engineering is the process of selecting, modifying, or creating new features from raw data to enhance the performance of machine learning models. It is crucial because it can lead to significant improvements in model accuracy and efficiency by providing the model with better representations of the data.
via - https://docs.aws.amazon.com/wellarchitected/latest/machine-learning-lens/feature-engineering.html
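Two typical feature engineering steps, sketched in plain Python: rescaling a numeric column and one-hot encoding a categorical column. The function names and sample values are illustrative assumptions; in practice a library such as pandas or scikit-learn would usually handle this.

```python
def min_max_scale(values):
    """Rescale numeric values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column carries no signal
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Encode categories as 0/1 indicator vectors, in sorted category order."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


print(min_max_scale([10, 20, 30]))      # [0.0, 0.5, 1.0]
print(one_hot(["red", "blue", "red"]))  # [[0, 1], [1, 0], [0, 1]]
```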
ChatGPT
ChatGPT or Chat Generative Pretrained Transformer is an example of a Transformer model. Transformer-based models use a self-attention mechanism. They weigh the importance of different parts of an input sequence when processing each element in the sequence.
To understand how transformer-based models work, imagine a sentence as a sequence of words. Self-attention helps the model focus on the relevant words as it processes each word. To capture different types of relationships between words, the transformer-based generative model employs multiple attention heads in each layer (multi-head attention). Each head learns to attend to different parts of the input sequence. This allows the model to simultaneously consider various aspects of the data.
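A minimal single-head sketch of scaled dot-product self-attention in plain Python. As a simplifying assumption, the token vectors are used directly as queries, keys, and values; a real transformer first applies learned projection matrices and runs several heads in parallel.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """For each query, weigh every value by the query-key similarity."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Three toy "word" vectors standing in for embeddings.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, tokens, tokens)
print(weights[0])  # how much the first token attends to each token
```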
Diffusion model
Diffusion models work by first corrupting data with noise through a forward diffusion process and then learning to reverse this process to denoise the data. They use neural networks to predict and remove the noise step by step, ultimately generating new, structured data from random noise.
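The forward (noising) step can be sketched with one common closed-form formulation, assumed here: x_t = sqrt(a) * x_0 + sqrt(1 - a) * noise, where a (called `alpha_bar` below) shrinks toward 0 as more noise is added. The reverse step shown uses the true noise as a stand-in for what a trained neural network would predict, so this is an illustration of the idea, not a working generative model.

```python
import math
import random


def forward_diffuse(x0, alpha_bar, rng):
    """Corrupt clean data x0 with Gaussian noise at noise level alpha_bar."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
          for x, e in zip(x0, noise)]
    return xt, noise


def remove_noise(xt, predicted_noise, alpha_bar):
    """Invert the forward step given a noise estimate (here: the true noise)."""
    return [(x - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
            for x, e in zip(xt, predicted_noise)]


rng = random.Random(0)
x0 = [0.2, -0.5, 1.0]
xt, noise = forward_diffuse(x0, alpha_bar=0.5, rng=rng)
recovered = remove_noise(xt, noise, alpha_bar=0.5)
print(xt)
print(recovered)  # matches x0 up to floating-point error
```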
Amazon Comprehend
Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find meaning and insights in text. Natural Language Processing (NLP) is a way for computers to analyze, understand, and derive meaning from textual information in a smart and useful way. By utilizing NLP, you can extract important phrases, sentiments, syntax, key entities such as brand, date, location, person, etc., and the language of the text.
You can use Amazon Comprehend to identify the language of the text, extract key phrases, places, people, brands, or events, understand sentiment about products or services, and identify the main topics from a library of documents. The source of this text could be web pages, social media feeds, emails, or articles. You can also feed Amazon Comprehend a set of text documents, and it will identify topics (or groups of words) that best represent the information in the collection. The output from Amazon Comprehend can be used to understand customer feedback, provide a better search experience through search filters, and use topics to categorize documents.
via - https://aws.amazon.com/comprehend/
Amazon Transcribe
Amazon Transcribe is an automatic speech recognition service that uses machine learning models to convert audio to text. You can use Amazon Transcribe as a standalone transcription service or add speech-to-text capabilities to any application.
Amazon Translate
Amazon Translate is a text translation service that uses advanced machine learning technologies to provide high-quality translation on demand. You can use Amazon Translate to translate unstructured text documents or to build applications that work in multiple languages.
Amazon Rekognition
Amazon Rekognition is a cloud-based image and video analysis service that makes it easy to add advanced computer vision capabilities to your applications. You can add features that detect objects, text, and unsafe content, analyze images/videos, and compare faces to your application using Rekognition’s APIs.
Machine learning implementation
Difficulty in collecting and preparing high-quality data for training models
One of the main challenges in machine learning implementation is the difficulty in collecting and preparing high-quality data for training models. High-quality data is essential for building effective machine learning models, and ensuring that the data is clean, relevant, and well-prepared can be a complex and time-consuming process.
There are many machine learning algorithms available, but the challenge lies in other aspects of implementation.
While computational power can be a challenge for very large models, it is not a primary challenge for most machine learning implementations due to the availability of powerful computing resources.
Machine learning has a wide range of applications in real-world scenarios, and its use is not particularly limited.
How can you prevent model overfitting in machine learning?
By using techniques such as cross-validation, regularization, and pruning to simplify the model and improve its generalization
To prevent overfitting, techniques such as cross-validation, regularization, and pruning are employed. Cross-validation helps ensure the model generalizes well to unseen data by dividing the data into multiple training and validation sets. Regularization techniques, such as L1 and L2 regularization, penalize complex models to reduce overfitting. Pruning simplifies decision trees by removing branches that have little importance.
via - https://docs.aws.amazon.com/machine-learning/latest/dg/model-fit-underfitting-vs-overfitting.html
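Two of these techniques can be sketched in a few lines: splitting data into k folds for cross-validation, and adding an L2 penalty to a loss. The helper names are hypothetical and the loss is assumed to be mean squared error; libraries such as scikit-learn provide production versions.

```python
def k_fold_indices(n_samples, k):
    """Return k (train_indices, validation_indices) pairs covering all samples."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        folds.append((train, validation))
        start = stop
    return folds


def l2_penalized_loss(errors, weights, lam):
    """Mean squared error plus an L2 penalty that discourages large weights."""
    mse = sum(e * e for e in errors) / len(errors)
    return mse + lam * sum(w * w for w in weights)


for train, validation in k_fold_indices(10, 3):
    print(len(train), validation)

print(l2_penalized_loss([1.0, -1.0], [2.0], lam=0.5))  # 3.0
```

Each sample lands in exactly one validation fold, so every data point is used to check generalization once.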
Increasing the complexity of the model can lead to overfitting, as it may start capturing noise or random fluctuations in the training data.
Training on a small subset of data may lead to underfitting rather than preventing overfitting.
Avoiding model validation or testing does not prevent overfitting; it is essential to validate and test models to ensure they generalize well to new data.
Model training in Deep Learning
Model training in deep learning involves using large datasets to adjust the weights and biases of a neural network through multiple iterations, using techniques such as gradient descent to minimize the error
In Deep Learning, model training involves feeding large datasets into the neural network and adjusting the weights and biases through multiple iterations. Techniques such as gradient descent are used to minimize the error by computing the gradient of the loss function and updating the weights to reduce the prediction error. Model training in deep learning involves initializing a neural network, feeding it data, calculating losses, adjusting weights using optimization algorithms, and iterating through this process until the model achieves satisfactory performance. Proper data preparation, validation, and hyperparameter tuning are crucial steps to ensure the model generalizes well to new, unseen data.
https://aws.amazon.com/what-is/artificial-intelligence/
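The training loop described above can be sketched on the smallest possible "network": a single linear unit with one weight and one bias, trained by gradient descent on mean squared error. The data, learning rate, and epoch count are arbitrary assumptions chosen for illustration; deep learning frameworks automate the gradient computation for many layers.

```python
def train_linear(xs, ys, lr=0.05, epochs=2000):
    """Fit y ~ w * x + b by repeatedly stepping against the loss gradient."""
    w, b = 0.0, 0.0  # initialize parameters
    n = len(xs)
    for _ in range(epochs):
        preds = [w * x + b for x in xs]                      # forward pass
        errors = [p - y for p, y in zip(preds, ys)]          # prediction error
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        w -= lr * grad_w                                     # update step
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1
w, b = train_linear(xs, ys)
print(round(w, 3), round(b, 3))
```

The weight and bias are never set by hand: they start at zero and are learned from the data through the repeated updates.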
Weights and biases in a neural network are not set manually; they are learned during the training process.
Data is crucial for training deep learning models; the network learns from input data.
Deep learning primarily uses neural networks rather than support vector machines and decision trees, which are more common in traditional machine learning.