Domain 2 Flashcards

1
Q

What is generative AI?

A

Generative AI is a subset of deep learning. Like deep learning, it is a multipurpose technology, but it generates new, original content rather than finding or classifying existing content. Generative AI can create text, images, audio, video, and even code.

2
Q

True or false: Gen AI foundation models have small numbers of parameters.

A

False. Foundation models are very large, complex neural network models with billions of parameters that are learned during the training (pre-training) phase.

3
Q

What is the current core element of Gen AI?

A

Transformer network. Transformers were introduced in the 2017 paper “Attention Is All You Need.” Many LLMs, including the GPT models behind ChatGPT, are built on the transformer architecture. These LLMs are pre-trained on massive amounts of text data from the internet, which lets them build up a broad knowledge base. They can then be fine-tuned for specific tasks with relatively little additional data.

4
Q

Prompt

A

The input that you send to your generative model is called the prompt.

5
Q

inference

A

Inference is the process a trained machine learning model uses to draw conclusions from brand-new data. A model capable of inference can do so without being given examples of the desired result.

6
Q

Completion

A

The output that the model generates in response to the prompt is called the completion.

7
Q

Context window

A
8
Q

Tokens

A

A token is the fundamental unit of data processed by algorithms, especially in natural language processing (NLP) and machine learning. A token is a component of a larger piece of data and may represent a word, part of a word, a character, or a punctuation mark.

9
Q

LLMs’ vocabulary

A

The set of all tokens the model’s tokenizer can recognize. Each token in the vocabulary has a unique token ID (input ID).

10
Q

Tokenizer

A

The tokenizer converts your text into numbers before it is sent to the model: it splits the text into tokens and maps each token to a token ID, a process called tokenization. These token IDs are how the model reads and interprets text.

11
Q

Prompt engineering

A

Prompt engineering is the process of creating and refining inputs, or prompts, for AI models to produce the desired outputs. It involves providing context, instructions, and examples that guide the model to understand the user’s intent and respond in a meaningful way.

12
Q

What is in-context learning?

A

In-context learning is a way to get the model to produce better completions by including examples of the task you want it to carry out inside the prompt. Depending on the number of examples, this is called zero-shot (no examples), one-shot (one example), or few-shot (several examples) inference.
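A minimal sketch of how zero-, one-, and few-shot prompts differ, assuming a hypothetical sentiment-classification task. The `build_prompt` and `shot_type` helpers and the Review/Sentiment layout are illustrative choices, not a required format, and no model is called.

```python
def build_prompt(task, examples, query):
    """Assemble a prompt: instruction, then worked examples, then the new input to complete."""
    lines = [task, ""]
    for text, label in examples:
        # Each example shows the model the input format and the desired output.
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The new input ends with an open label for the model to complete.
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


def shot_type(examples):
    n = len(examples)
    return "zero-shot" if n == 0 else "one-shot" if n == 1 else "few-shot"


TASK = "Classify the sentiment of each review as Positive or Negative."
EXAMPLES = [  # hypothetical labeled examples
    ("I loved this product!", "Positive"),
    ("It broke after one day.", "Negative"),
]

for k in range(len(EXAMPLES) + 1):
    print(f"--- {shot_type(EXAMPLES[:k])} ---")
    print(build_prompt(TASK, EXAMPLES[:k], "Great value for the money."))
```

The zero-shot prompt relies only on the instruction; each added example spends more of the context window but shows the model the exact output format expected.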

13
Q

These influence the model’s completion to the prompt

A

Inference configuration parameters, such as temperature, top-k, top-p, and the maximum number of output tokens
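A sketch of how two common inference parameters change which token is picked, assuming made-up logits over a four-word vocabulary. The names `temperature` and `top_k` follow common usage, but exact parameter names and ranges differ by model and provider; `sample_next_token` is a hypothetical helper.

```python
import math
import random


def softmax(logits):
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits, temperature=1.0, top_k=None, rng=random):
    """Pick a token index from raw logits using temperature and optional top-k filtering."""
    if temperature <= 0:
        # Treat temperature 0 as greedy decoding: always take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Lower temperature sharpens the distribution; higher temperature flattens it.
    scaled = [x / temperature for x in logits]
    # Rank token indices from most to least likely.
    candidates = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    if top_k is not None:
        candidates = candidates[:top_k]  # keep only the k most likely tokens
    probs = softmax([scaled[i] for i in candidates])
    return rng.choices(candidates, weights=probs, k=1)[0]


vocab = ["cake", "donut", "banana", "apple"]  # hypothetical vocabulary
logits = [2.0, 1.5, 0.3, -1.0]                 # hypothetical model scores

for temp in (0.5, 1.0, 2.0):
    print(temp, [round(p, 2) for p in softmax([x / temp for x in logits])])

rng = random.Random(0)
print(vocab[sample_next_token(logits, temperature=0)])  # greedy: prints "cake"
print(vocab[sample_next_token(logits, temperature=0.7, top_k=2, rng=rng)])
```

Low temperature and small top-k make completions more predictable; raising either gives more varied completions for the same prompt.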

14
Q

Every language-based generative AI model has one of these. It converts human text into a vector of token IDs, also called input IDs. Each input ID represents a token in the model’s vocabulary.

A

tokenizer
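A toy word-level tokenizer, assuming a vocabulary built from a single sentence. Production LLM tokenizers use subword schemes (for example, byte-pair encoding) with vocabularies of tens of thousands of tokens, so the hypothetical `ToyTokenizer` class only illustrates the text to token IDs to text round trip.

```python
import re


class ToyTokenizer:
    """A toy word-level tokenizer; real LLM tokenizers use subword schemes such as BPE."""

    def __init__(self, corpus):
        # Reserve ID 0 for unknown tokens; assign the rest in first-seen order.
        self.vocab = {"<unk>": 0}
        for token in self._split(corpus):
            if token not in self.vocab:
                self.vocab[token] = len(self.vocab)
        self.id_to_token = {i: t for t, i in self.vocab.items()}

    @staticmethod
    def _split(text):
        # Lowercase, then split into words and individual punctuation marks.
        return re.findall(r"\w+|[^\w\s]", text.lower())

    def encode(self, text):
        # Map each token to its ID; tokens outside the vocabulary become <unk>.
        return [self.vocab.get(t, 0) for t in self._split(text)]

    def decode(self, ids):
        return " ".join(self.id_to_token[i] for i in ids)


tok = ToyTokenizer("The teacher teaches the student.")
ids = tok.encode("The student teaches!")
print(tok.vocab)
print(ids)               # [1, 4, 3, 0]
print(tok.decode(ids))   # the student teaches <unk>
```

The `<unk>` result for "!" shows why real tokenizers fall back to subwords or bytes: any text can then be encoded without unknown tokens.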

15
Q

What is a vector?

A

A vector is an ordered list of numbers that represents the features or attributes of some entity or concept. In generative AI, vectors might represent words, phrases, sentences, or other units. The power of vector representations lies in their ability to encode relationships between items and capture meaningful associations, analogies, and hierarchies.

16
Q

What are embedding vectors (also called embeddings)?

A

Embeddings are numerical vector representations of an entity. They capture the semantic meaning of inputs such as text, images, video, or audio. For example, text embeddings encode the meaning and context of tokens within a large body of text.
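A sketch of why embeddings are useful: cosine similarity scores how closely two vectors point in the same direction, so semantically related items score higher. The three-dimensional vectors below are made-up values, not output from a real embedding model.

```python
import math


def cosine_similarity(a, b):
    """Dot product divided by the product of vector lengths: 1 = same direction, 0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional embeddings; real models use hundreds or thousands of dimensions.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.75, 0.2],
    "car": [0.1, 0.2, 0.95],
}

print(cosine_similarity(embeddings["cat"], embeddings["kitten"]))  # close to 1
print(cosine_similarity(embeddings["cat"], embeddings["car"]))     # much lower
```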

17
Q

Name an innovation of transformers

A

An innovation of transformers is the self-attention mechanism. This mechanism helps the model weigh the importance of different parts of the input when generating each output token. As a result, the model can capture long-range dependencies and contextual relationships that were difficult to learn with previous architectures.
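A minimal scaled dot-product self-attention in plain Python, assuming a single head and, as a simplification, no learned query/key/value projections (real transformers learn a separate weight matrix for each). Each row of the returned weights shows how much one token attends to every token in the input; the input vectors are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(X):
    """Scaled dot-product self-attention with Q = K = V = X (no learned projections)."""
    d = len(X[0])
    outputs, all_weights = [], []
    for q in X:
        # Score this token's query against every token's key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in X]
        weights = softmax(scores)  # attention this token pays to each token; sums to 1
        # Output is the attention-weighted sum of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, X)) for j in range(d)]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical embeddings for three tokens
out, attn = self_attention(X)
for row in attn:
    print([round(w, 3) for w in row])
```

Because every token scores every other token in one step, distance in the sequence does not weaken the connection, which is how self-attention captures long-range dependencies.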

18
Q

What are position embeddings?

A

Transformers also introduce position embeddings, which encode the position of each token in the sequence. They help the model distinguish between identical tokens that appear in different positions, which is important for understanding sentence structure and word order.
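A sketch of one way to build position embeddings: the fixed sinusoidal encoding from the original transformer paper, where even dimensions use sin(pos / 10000^(2i/d)) and odd dimensions use the matching cosine. The result is added to a hypothetical token embedding. Many later models learn their position embeddings instead, so treat this as one illustrative scheme.

```python
import math


def sinusoidal_position_encoding(num_positions, d_model):
    """Return a num_positions x d_model table of fixed sin/cos position encodings."""
    pe = []
    for pos in range(num_positions):
        row = []
        for j in range(d_model):
            i = j // 2  # each sin/cos pair shares one frequency
            angle = pos / (10000 ** (2 * i / d_model))
            row.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe


# Hypothetical embedding for the same token appearing at positions 0 and 3.
d_model = 4
bank = [0.2, -0.1, 0.4, 0.3]
pe = sinusoidal_position_encoding(5, d_model)
at_0 = [e + p for e, p in zip(bank, pe[0])]
at_3 = [e + p for e, p in zip(bank, pe[3])]
print(at_0 != at_3)  # True: same token, different positions, different model inputs
```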

19
Q

encoder

A

The encoder generates an embedding, or vector representation, for each input token.

20
Q

Self-attestation

A

Self-attestation is a process where an organization or individual declares that they comply with a specific set of rules or standards without the need for third-party verification. Self-attestation is often used in the context of cybersecurity and compliance, but it can also be used in other contexts.

21
Q

decoder

A

Decoders, also known as generators, translate latent vectors back into meaningful output data. They reconstruct the data based on the patterns and relationships learned in the encoded space, resulting in outputs that often exhibit…

22
Q

Softmax output

A

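The softmax output is a vector of probabilities that represent the likelihood of each class label in a multi-class classification problem. The softmax function is an activation function that is often used in the final layer of a neural network model. In an LLM, the classes are the tokens in the vocabulary, so the softmax output gives the probability of each token being the next one.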
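A numerically stable softmax, applied to hypothetical logits for a three-word vocabulary. Subtracting the maximum logit before exponentiating avoids overflow without changing the result, because the same factor cancels in the numerator and denominator.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability; probabilities are unchanged
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores (logits) for the next token over a 3-word vocabulary.
vocab = ["sunny", "rainy", "cloudy"]
probs = softmax([2.0, 1.0, 0.1])
for word, p in zip(vocab, probs):
    print(f"{word}: {p:.3f}")
```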

23
Q

What percentage of data is used for pre-training after the data quality curation step?

A

1% to 3%

24
Q

Researchers have found that the larger a model is…

A

the more likely it is to work without additional in-context learning or further training. This observation that capability increases with size has driven the development of larger and larger models.

25
Q

Name the two kinds of Generative AI

A

Unimodal and multimodal

26
Q

What is unimodal Gen AI?

A

Unimodal models work with one data modality. LLMs are an example of unimodal generative AI because the input and the output, or completion, are text.

27
Q

What is multimodal Gen AI?

A

Multimodal models add other modalities, such as images, video, or audio. They can understand diverse data sources and can provide more robust forecasts. Multimodal generative AI use cases include marketing, image captioning, product design, customer service, chatbots, and avatars.

28
Q

What are two important classes of models that go beyond text-only applications?

A

Multimodal and diffusion models

29
Q

What are some examples of multimodal tasks?

A

Examples include image captioning, where the model generates text descriptions of images; visual question answering, where the model answers questions about image content; and text-to-image synthesis, which generates images from textual descriptions.

30
Q

What are some examples of models that produce realistic and diverse images?

A

DALL-E, Stable Diffusion, and Midjourney

31
Q

These models are a class of generative models that learn to reverse a gradual noising process.

A

Diffusion models. Diffusion-based architectures offer a higher degree of control over the quality and diversity of the images they generate.

32
Q

forward diffusion

A

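The forward diffusion process starts from a training sample (for example, an image) and gradually adds small, controlled amounts of Gaussian noise over many steps. Each step depends only on the previous one, forming a Markov chain, and after enough steps the sample is nearly indistinguishable from pure Gaussian noise. This fixed noising process produces the noisy data that the model learns to reverse.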
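A sketch of the forward (noising) process, assuming the linear noise schedule from the DDPM formulation (β rising from 0.0001 to 0.02 over 1,000 steps). Because the Gaussian steps compose, x_t can be sampled directly from x_0 as sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, where ᾱ_t is the running product of (1 − β). The four-value "image" is made up, and steps are indexed from 0.

```python
import math
import random


def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    # Noise added per step grows linearly from beta_start to beta_end.
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]


def alpha_bars(betas):
    # alpha_bar_t = product of (1 - beta_s) for every step s up to and including t.
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0, t, abar, rng):
    """Sample x_t given x_0 in one shot: sqrt(abar_t) * x0 + sqrt(1 - abar_t) * Gaussian noise."""
    a = abar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]


T = 1000
abar = alpha_bars(linear_beta_schedule(T))
x0 = [0.8, -0.5, 0.3, 1.0]  # hypothetical "image" of four pixel values
rng = random.Random(0)
for t in (0, 250, 999):
    xt = q_sample(x0, t, abar, rng)
    print(t, round(abar[t], 4), [round(v, 2) for v in xt])
```

As ᾱ_t falls toward 0, the original signal's weight vanishes and x_t approaches pure noise; the reverse process is trained to undo these steps.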

33
Q

reverse diffusion

A

Reverse diffusion is the opposite of the forward diffusion process: a model learns to recover the original data from noisy data. Training finds reverse Markov transitions that maximize the likelihood of the training data.

34
Q

Stable Diffusion

A

A generative artificial intelligence (generative AI) model that produces unique photorealistic images from text and image prompts.

35
Q

T or F: Diffusion models tend to produce lower quality outputs with less diversity and consistency, and they’re less stable and harder to train.

A

False. Compared with earlier approaches such as GANs, diffusion models tend to produce higher-quality outputs with more diversity and consistency, and they are more stable and easier to train.

36
Q

What are some use cases for Generative AI?

A

writing or rewriting pieces of text to adapt to different audiences
text summarization
code generation and completion
information extraction, question answering, classification, identifying harmful content, translation, recommendation engines, personalized marketing and ads, chatbots, customer service agents, and search

37
Q

What is the new name of Amazon CodeWhisperer?

A

Amazon Q Developer

38
Q

What Gen AI services help with virtual production and 3D content creation?

A

Amazon Nimble Studio and Amazon Sumerian

39
Q

Generative adversarial networks (GANs)

A

A GAN trains two neural networks against each other: a generator that produces new data and a discriminator that tries to tell generated data from real training data. Given a training set, this technique learns to generate new data with the same statistics as the training set. For example, a GAN trained on photographs can generate new photographs that look at least superficially authentic to human observers, with many realistic characteristics. Though originally proposed as a form of generative model for unsupervised learning, GANs have also proved useful for semi-supervised learning, fully supervised learning, and reinforcement learning.

40
Q

Variational autoencoders (VAEs)

A

An artificial neural network architecture introduced by Diederik P. Kingma and Max Welling. It belongs to the families of probabilistic graphical models and variational Bayesian methods.

In addition to being seen as an autoencoder neural network architecture, variational autoencoders can also be studied within the mathematical formulation of variational Bayesian methods, connecting a neural encoder network to its decoder through a probabilistic latent space (for example, as a multivariate Gaussian distribution) that corresponds to the parameters of a variational distribution.
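A sketch of the probabilistic latent space in a VAE, assuming a hypothetical encoder has already produced a mean and a log-variance for each latent dimension. `reparameterize` draws a latent sample as μ + σ·ε (the reparameterization trick), and `kl_to_standard_normal` computes the KL-divergence term that keeps the latent distribution close to a standard Gaussian; both are illustrative helpers, not a library API.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """z = mu + sigma * eps with eps ~ N(0, 1); in an autodiff framework, gradients flow through mu and log_var."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


# Hypothetical encoder outputs for one input: a 2-D Gaussian over the latent space.
mu, log_var = [0.5, -1.0], [0.0, -0.7]
z = reparameterize(mu, log_var, random.Random(42))
print("latent sample:", z)  # this sample would be passed to the decoder
print("KL term:", kl_to_standard_normal(mu, log_var))
```

During training, the VAE loss combines this KL term with a reconstruction error between the decoder's output and the original input.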

41
Q

What are the stages in the Gen AI project life cycle?

A

Identify the use case
Experiment and select
Adapt, align, and augment
Evaluate
Deploy and iterate
Monitor

42
Q

The most important step in any project is to define the scope as accurately and narrowly as you can.

A

Think about what function the LLM will have in your specific application. Do you need the model to carry out many different tasks, including long-form text generation, or is the task much more specific, like named entity recognition, so that your model only needs to be good at one thing? Getting specific about what you need your model to do can save you time and, perhaps more importantly, compute costs.

43
Q

What are the steps in the foundation model lifecycle?

A

Data selection
Model selection (use a foundation model or build your own)
Pre-training
Fine-tuning
Evaluation
Guardrails
Deployment
Monitoring
Feedback

44
Q

What is an additional fine-tuning technique?

A

Reinforcement learning from human feedback (RLHF)

45
Q

Reinforcement learning

A

Reinforcement learning (RL) is a machine learning technique that teaches software how to make decisions that achieve the best outcomes. It’s based on the idea of learning through trial and error, and it’s often used in robotics and gaming. RL algorithms learn by interacting with an environment and observing how it responds. They use a reward-and-punishment system to reinforce actions that help them achieve their goals and discourage actions that don’t.
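A minimal trial-and-error example, assuming the simplest RL setting: a multi-armed bandit with three hypothetical slot machines whose payout probabilities the agent does not know. The epsilon-greedy agent mostly exploits the action with the best estimated reward and occasionally explores a random one; `run_bandit` is an illustrative helper.

```python
import random


def run_bandit(true_payout_probs, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent that learns by trial and error which machine pays best."""
    rng = random.Random(seed)
    n = len(true_payout_probs)
    counts = [0] * n
    values = [0.0] * n  # running estimate of each action's average reward
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n)  # explore: try a random action
        else:
            action = max(range(n), key=lambda a: values[a])  # exploit the best estimate
        # The environment responds with a reward of 1 or 0.
        reward = 1.0 if rng.random() < true_payout_probs[action] else 0.0
        counts[action] += 1
        # Incremental mean: rewarded actions' estimates rise, unrewarded ones fall.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts


values, counts = run_bandit([0.2, 0.5, 0.8])  # hypothetical payout probabilities
print("estimated values:", [round(v, 2) for v in values])
print("times chosen:", counts)
```

The exploration rate `epsilon` is the design trade-off: too low and the agent may lock onto a mediocre action early; too high and it wastes steps on actions it already knows are poor.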

46
Q

What are three advantages to generative AI?

A

adaptability
responsiveness
simplicity

47
Q

Do LLMs learn from interacting with you?

A

No. Each time you prompt an LLM, it does not actually remember earlier conversations unless they are included in the prompt. It is similar to asking a different child to do every single task. Therefore, you can’t train it over time on the specifics of your business or the style you want it to write in, but you could with fine-tuning.

48
Q

What is fine-tuning?

A

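Fine-tuning is an alternative approach to GenAI development that involves training an LLM on a smaller, specialized, labeled dataset and adjusting the model’s parameters and embeddings based on the new data. This is different from retrieval-augmented generation (RAG), which supplies relevant external data in the prompt at inference time without changing the model’s parameters.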

49
Q

What are two methods for model interpretability?

A

Intrinsic Analysis
Post Hoc Analysis

50
Q

ROUGE (Recall-Oriented Understudy for Gisting Evaluation)

A

ROUGE is primarily used to assess the quality of automatically generated summaries by comparing them to human-generated reference summaries.
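A simplified ROUGE-N sketch, assuming lowercase whitespace tokenization and a single reference; full ROUGE implementations add stemming, other tokenization rules, and variants such as ROUGE-L. Recall is the fraction of reference n-grams that also appear in the generated text, with each n-gram matched at most as many times as it occurs in both. The sentences are hypothetical.

```python
from collections import Counter


def rouge_n(candidate, reference, n=1):
    """ROUGE-N recall, precision, and F1 from clipped n-gram overlap."""
    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum((cand & ref).values())  # Counter & keeps the minimum count per n-gram
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"recall": recall, "precision": precision, "f1": f1}


reference = "it is cold outside"
candidate = "it is very cold outside"
print(rouge_n(candidate, reference, n=1))
print(rouge_n(candidate, reference, n=2))
```

For this pair, ROUGE-1 recall is 1.0 (every reference word appears in the candidate) and precision is 0.8 (the extra word "very" is unmatched); ROUGE-2 recall drops to 2/3 because "very" breaks the bigram "is cold".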

51
Q

BLEU (Bilingual Evaluation Understudy)

A

BLEU is an algorithm designed to evaluate the quality of machine-translated text by comparing it to human-generated translations.
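A simplified single-reference, sentence-level BLEU, assuming whitespace tokenization and no smoothing: the score is the geometric mean of clipped n-gram precisions multiplied by a brevity penalty for candidates shorter than the reference. Standard BLEU uses n-grams up to length 4, which is the default here; the sentences are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Geometric mean of clipped n-gram precisions (n = 1..max_n) times a brevity penalty."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        c, r = ngram_counts(cand, n), ngram_counts(ref, n)
        clipped = sum((c & r).values())  # each n-gram counts at most as often as in the reference
        total = max(sum(c.values()), 1)
        if clipped == 0:
            return 0.0  # no smoothing in this sketch
        log_precision_sum += math.log(clipped / total)
    geo_mean = math.exp(log_precision_sum / max_n)
    # Brevity penalty: 1 if the candidate is longer than the reference, else exp(1 - r/c).
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * geo_mean


reference = "the cat is on the mat"
print(sentence_bleu("the cat is on the mat", reference))            # 1.0
print(sentence_bleu("the cat sat on the mat", reference, max_n=2))
```

The second call scores about 0.707: 5 of 6 unigrams and 3 of 5 bigrams match, and sqrt(5/6 × 3/5) = sqrt(0.5).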

52
Q

Autoregressive models

A

A class of machine learning (ML) models that predict the next component in a sequence from the previous components in that sequence. LLMs generate text this way: each new token is appended to the input and used to predict the token after it.
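A toy illustration of autoregressive generation, assuming a bigram word-count model in place of a neural network: each step predicts the next word from the last word generated so far, appends it, and repeats. `train_bigram` and `generate` are hypothetical helpers; real LLMs condition on the whole preceding sequence (up to the context window) and often sample rather than always taking the top choice.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count which word follows which: a toy stand-in for a trained language model."""
    follows = defaultdict(Counter)
    words = corpus.lower().split()
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows


def generate(model, prompt, max_new_tokens=5):
    tokens = prompt.lower().split()
    for _ in range(max_new_tokens):
        candidates = model.get(tokens[-1])
        if not candidates:
            break  # no known continuation: stop generating
        # Greedy step: append the most frequent next word, then feed the longer sequence back in.
        tokens.append(candidates.most_common(1)[0][0])
    return " ".join(tokens)


corpus = "the model reads the prompt and the model writes the answer"  # hypothetical training text
model = train_bigram(corpus)
print(generate(model, "the", max_new_tokens=4))  # the model reads the model
```

The repetitive output shows a known weakness of purely greedy decoding, which is one reason inference parameters such as temperature exist.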

53
Q

Analyzing large amounts of business data to forecast future values, or to detect outliers and understand their root cause, is complex, time-consuming, and not always accurate. AWS provides a business metric analysis ML solution that uses Amazon Lookout for Metrics and Amazon Forecast to solve these problems. It uses machine learning to analyze large volumes of data while dynamically adapting to changing business requirements.

A
54
Q

Organizations need to evaluate the potential return on investment and weigh the costs and benefits of FMs for their application. Additionally, it’s important to understand the metrics for comparing operational costs.

A
55
Q

What are the advantages of using out-of-the-box (OOTB) AWS services for Gen AI?

A

Accessibility
Lower barrier to entry
Efficiency
Cost-effectiveness
Speed to market
Ability to meet business objectives

56
Q

AWS Nitro

A

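Security layer: the AWS Nitro System is the underlying platform for modern Amazon EC2 instances and uses dedicated hardware to isolate customer workloads.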

57
Q

Three critical components of AI systems that need to be secured

A

Input
Model
Output

58
Q

Give some examples of AI system vulnerabilities

A

prompt injection, data poisoning, and model inversion vulnerabilities

59
Q

Two pricing models for LLMs

A

Host your own infrastructure (pay for the compute you provision)
Token-based pricing (pay per input and output token)
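A sketch of the arithmetic behind the two pricing models, using hypothetical prices (not actual AWS rates) and a hypothetical workload of 100,000 requests per month, each with 500 input and 200 output tokens. Token-based cost scales with usage, while self-hosted cost is driven by provisioned hours (730 hours is roughly one month). `token_cost` and `hosting_cost` are illustrative helpers.

```python
def token_cost(input_tokens, output_tokens, price_in_per_1k, price_out_per_1k):
    """Cost under token-based pricing, with separate per-1,000-token rates for input and output."""
    return input_tokens / 1000 * price_in_per_1k + output_tokens / 1000 * price_out_per_1k


def hosting_cost(hours, hourly_rate):
    """Cost of self-hosted infrastructure billed by provisioned hours, regardless of traffic."""
    return hours * hourly_rate


# Hypothetical prices and workload, for illustration only.
requests_per_month = 100_000
per_request = token_cost(500, 200, price_in_per_1k=0.0008, price_out_per_1k=0.0016)
monthly_tokens = per_request * requests_per_month
monthly_hosted = hosting_cost(hours=730, hourly_rate=1.50)
print(f"token-based: ${monthly_tokens:,.2f}  self-hosted: ${monthly_hosted:,.2f}")
```

With these made-up numbers, token-based pricing costs about $72 per month versus $1,095 for an always-on instance; at much higher traffic the comparison can flip, which is why usage volume drives the choice.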

60
Q

AWS ML Stack

A

AWS AI services (APIs, SDKs)

AWS ML services (Amazon SageMaker)

AWS ML frameworks and infrastructure (Amazon EC2, GPUs, and more)

61
Q

SageMaker JumpStart

A

Pretrained models
Fine-tune with your data
Deploy using SageMaker

62
Q

Amazon Bedrock

A

Gives you the ability to interact with different best-in-class models, including Amazon’s own models (Titan). Amazon Bedrock also adds the capability to import custom weights for supported model architectures and serve the custom model by using on-demand mode.

63
Q

What is Amazon Bedrock’s Playground?

A

Playgrounds in Amazon Bedrock let you experiment by running model inference against the different base foundation models supported in the service, to help you find the best fit for your use case. The model you select determines which inference parameters you can adjust. Varying the values of those inference parameters produces different completions.

64
Q

Remember that with generative AI, you can use vector databases, where data is stored as embeddings. These embeddings are vectors that can be compressed, stored, and indexed for advanced searches.

A
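A brute-force stand-in for a vector database search, assuming made-up three-dimensional document embeddings. Real vector databases typically use approximate nearest-neighbor indexes and may compress vectors; the hypothetical `InMemoryVectorIndex` class simply compares the query against every stored vector by cosine similarity and returns the top k.

```python
import math


class InMemoryVectorIndex:
    """Brute-force stand-in for a vector database: store embeddings, return the k most similar."""

    def __init__(self):
        self.ids, self.vectors = [], []

    @staticmethod
    def _normalize(vector):
        norm = math.sqrt(sum(x * x for x in vector))
        return [x / norm for x in vector]

    def add(self, doc_id, vector):
        # Store a unit-length copy so a dot product equals cosine similarity.
        self.ids.append(doc_id)
        self.vectors.append(self._normalize(vector))

    def query(self, vector, k=2):
        q = self._normalize(vector)
        scored = [(sum(a * b for a, b in zip(q, v)), doc_id)
                  for doc_id, v in zip(self.ids, self.vectors)]
        scored.sort(reverse=True)  # highest similarity first
        return [(doc_id, round(score, 3)) for score, doc_id in scored[:k]]


index = InMemoryVectorIndex()
index.add("refund-policy", [0.9, 0.1, 0.0])   # hypothetical document embeddings
index.add("shipping-times", [0.1, 0.9, 0.1])
index.add("return-label", [0.8, 0.2, 0.1])
print(index.query([1.0, 0.0, 0.0], k=2))       # the two refund-related documents
```

Scanning every vector is fine for a handful of documents, but its cost grows linearly with the collection, which is why production systems use dedicated indexes.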