Acronyms Flashcards
Learn each of the main acronyms in AI
What is NER?
Named Entity Recognition
- Extracts predefined, general-purpose entities from text, such as people, places, organizations, dates, and other standard categories.
What is NLP?
Natural Language Processing
What are the main subsets of AI?
Machine Learning, Deep Learning, and Generative AI; each is a subset of the previous (AI ⊃ ML ⊃ DL ⊃ GenAI).
What is an LLM?
Large Language Model (ChatGPT, for example)
What does non-deterministic mean?
The generated text may be different on every run, even for users submitting the same prompt.
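This non-determinism usually comes from sampling the next token from a probability distribution rather than always picking the most likely one. A minimal sketch of temperature-based sampling (a generic illustration, not any specific model's implementation):

```python
import math
import random

def sample_next_token(logits, temperature=1.0):
    """Sample a token index from raw logits.
    Higher temperature flattens the distribution (more randomness);
    lower temperature makes the top token dominate."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(probs)), weights=probs, k=1)[0]
```

With a very low temperature the same prompt yields the same token almost every time; at higher temperatures, different runs diverge.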
What is Amazon Titan?
High-performing Foundation Model (FM) from Amazon. Offers Image, Text, and Multi-modal models.
What is Fine-Tuning a Model?
Adapting a copy of a foundation model with your own data. Must use “Provisioned Throughput”.
What is Instruction-based Fine Tuning?
Improves the performance of a pre-trained FM on domain-specific tasks. Uses labeled examples that are prompt-response pairs. Usually cheaper, as the computation is less intense and less data is required.
What is domain adaptation fine-tuning?
Tunes the model to be an expert in a specific domain (e.g., the entire AWS documentation).
What is Single-Turn Messaging?
Part of instruction-based fine-tuning.
- System (optional): context for the conversation
- Messages: an array of message objects, each containing:
  1) Role: user or assistant
  2) Content: the text content of the message
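The schema above can be sketched as a Python dict. Field names here follow the card's description and are illustrative; the exact payload shape varies by model and provider.

```python
# Illustrative single-turn training record matching the card's schema:
# an optional system context plus one user/assistant exchange.
single_turn_example = {
    "system": "You are a helpful AWS assistant.",  # optional
    "messages": [
        {"role": "user", "content": "What is Amazon S3?"},
        {"role": "assistant", "content": "Amazon S3 is an object storage service."},
    ],
}
```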
What is Multi-Turn Messaging?
Instruction-based fine-tuning for a full conversation (multiple user/assistant exchanges).
What is Transfer Learning?
The broader concept of reusing a pre-trained model to adapt it to a new related task. Fine-tuning is an example of Transfer Learning.
What does Automatic Evaluation provide when evaluating a model?
Built-in task types: text summarization, Q&A, text classification, and open-ended text generation. Scores are calculated automatically, providing quality control.
What are the automated metrics to evaluate an FM?
ROUGE: Recall-Oriented Understudy for Gisting Evaluation
BLEU: Bilingual Evaluation Understudy
BERTScore: Bidirectional Encoder Representations from Transformers
Perplexity: How well the model predicts the next token (lower is better)
What is ROUGE?
Recall-Oriented Understudy for Gisting Evaluation
- Evaluates automatic summarization and machine translation systems.
- ROUGE-N: number of matching n-grams between reference and generated text
- ROUGE-L: Longest common subsequence between reference and generated text.
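The ROUGE-N idea can be shown with a toy recall computation in plain Python (a simplified sketch, not the full ROUGE implementation, which also reports precision and F1):

```python
from collections import Counter

def rouge_n_recall(reference, candidate, n=1):
    """ROUGE-N recall: fraction of the reference's n-grams
    that also appear in the candidate (with clipped counts)."""
    def ngrams(text, n):
        tokens = text.lower().split()
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    ref_counts = Counter(ngrams(reference, n))
    cand_counts = Counter(ngrams(candidate, n))
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0
    overlap = sum(min(c, cand_counts[g]) for g, c in ref_counts.items())
    return overlap / total
```

A candidate identical to the reference scores 1.0; a candidate covering only part of the reference scores proportionally lower.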
What is BLEU?
Bilingual Evaluation Understudy
- Evaluates the quality of generated text, especially for translations.
- Looks at a combination of n-gram precisions (n = 1, 2, 3, 4), with a brevity penalty for overly short outputs.
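A simplified BLEU sketch, combining clipped n-gram precisions with a brevity penalty (the real metric handles multiple references and smoothing, omitted here):

```python
import math
from collections import Counter

def bleu(reference, candidate, max_n=4):
    """Simplified BLEU: brevity penalty times the geometric mean
    of clipped n-gram precisions for n = 1..max_n."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    precisions = []
    for n in range(1, max_n + 1):
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        total = sum(cand_ngrams.values())
        if total == 0:
            return 0.0  # candidate too short to form n-grams
        clipped = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        precisions.append(clipped / total)
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # brevity penalty: punish candidates shorter than the reference
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * geo_mean
```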
What is BERTScore?
Bidirectional Encoder Representations from Transformers
- Measures semantic similarity between generated and reference text using BERT embeddings
- Capable of detecting more nuance between the texts than simple n-gram overlap
What is Perplexity?
Measures how well the model predicts the next token.
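The standard formula: perplexity is the exponential of the average negative log-probability the model assigned to each actual next token, so a model that always predicts correctly scores 1.0 (lower is better):

```python
import math

def perplexity(token_probs):
    """Perplexity from the probabilities the model assigned to the
    actual next tokens: exp(mean negative log-probability)."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)
```

For example, a model that assigns probability 0.25 to every correct token has perplexity 4, as if it were guessing uniformly among 4 options.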
What is RAG?
Retrieval-Augmented Generation
- Allows an FM to reference a data source outside of its training data
- Bedrock handles creating the Vector Embeddings in a DB of your choice
- Useful where real-time data needs to be fed into the FM
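The retrieve-then-generate flow can be sketched in a few lines. Here `embed` and `generate` are hypothetical callables standing in for an embedding model and an FM; retrieval is a cosine-similarity lookup over the document set:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def rag_answer(question, documents, embed, generate):
    """Minimal RAG loop: embed the question, retrieve the most similar
    document, and generate with that document prepended as context."""
    q_vec = embed(question)
    best = max(documents, key=lambda d: cosine(q_vec, embed(d)))
    prompt = f"Context: {best}\n\nQuestion: {question}"
    return generate(prompt)
```

In a managed setup like Bedrock Knowledge Bases, the embedding, vector storage, and retrieval steps are handled by the service.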
What are the valid types of RAG Vector databases?
Amazon OpenSearch Service
Amazon DocumentDB (NoSQL)
Amazon Aurora (relational)
Amazon RDS for PostgreSQL (relational)
Amazon Neptune (graph DB)
What are the main RAG datasources?
S3, Confluence, SharePoint, Salesforce, Web pages
What is Tokenization?
Converting raw text into a sequence of tokens.
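A toy word-level tokenizer illustrates the idea; note that real LLM tokenizers use subword schemes (e.g. BPE), so rare words get split into smaller units:

```python
import re

def simple_tokenize(text):
    """Toy tokenizer: split on words and punctuation.
    Real LLM tokenizers operate on learned subword units instead."""
    return re.findall(r"\w+|[^\w\s]", text.lower())
```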
What is a “Context Window”?
The number of tokens an LLM can consider when generating text. The FIRST factor to evaluate when considering a model.
What are Embeddings?
Creation of vectors (arrays of numeric values) from text, images, or audio.
- Can capture lots of dimensions
- Embedding models can power search applications
- Words with a semantic relationship have similar embeddings.
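"Similar embeddings" is usually measured with cosine similarity, which compares the direction of two vectors (a generic sketch; real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors.
    Close to 1.0 means the vectors point in similar directions,
    i.e. the inputs are semantically related."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

This is the core operation behind embedding-powered search: rank documents by their cosine similarity to the query's embedding.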