Domain 3 Flashcards

1
Q

T or F: LLMs are cost-effective and easy to maintain

A

False. The duration and cost of training a model are important considerations because training can be expensive in terms of hardware, storage, and more.

2
Q

Name five considerations of using foundation models

A

Latency constraints, inference speed, real-time requirements, architecture, and complexity.

3
Q

T or F: Accuracy is recommended with datasets that are not evenly distributed or imbalanced.

A

False. Accuracy is not recommended for datasets that are not evenly distributed or imbalanced. For example, a model that always predicts the majority class of a 95/5 dataset scores 95% accuracy while never detecting the minority class.

4
Q

Name some metrics you can use to evaluate model performance

A

Such metrics might include accuracy, precision, recall, F1 score, root mean squared error (RMSE), mean average precision (MAP), and mean absolute error (MAE).
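As an illustration (not from the source material), here is a minimal sketch computing several of these metrics with scikit-learn; the label and prediction arrays are made-up toy data.

```python
# Toy illustration of common evaluation metrics using scikit-learn.
# The labels and predictions below are made-up example data.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, mean_squared_error, mean_absolute_error)

# Classification example
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))

# Regression example
y_true_reg = [3.0, 5.0, 2.5, 7.0]
y_pred_reg = [2.8, 5.4, 2.0, 7.5]
print("RMSE:", mean_squared_error(y_true_reg, y_pred_reg) ** 0.5)
print("MAE :", mean_absolute_error(y_true_reg, y_pred_reg))
```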

5
Q

Be aware of biases that might be present in the training data. It’s important to understand how to mitigate risks, address ethical concerns, and make informed decisions about model selection and fine-tuning.

A
6
Q

Another consideration is the availability and compatibility of the pre-trained model

A

You should check whether the model is compatible with your framework, language, and environment, and confirm that it has a license and documentation. You should also check whether the model has been updated and maintained regularly and whether it has any known issues or limitations.

7
Q

interpretability

A

The ability to interpret and explain model outcomes is important. Being transparent refers to interpretability: it means being able to explain mathematically, through coefficients and formulas, why a model makes a certain prediction. This interpretability is possible if the model is simple enough, but foundation models are not interpretable by design because they are extremely complex. If interpretability is a requirement, then pre-trained foundation models might not be the best choice.

8
Q

How is explainability different from interpretability?

A

Explainability attempts to explain this black box by approximating it locally with a simpler model that is interpretable.
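To make the idea concrete, here is a hedged, minimal sketch of the local-surrogate approach (in the spirit of techniques such as LIME, not any specific library's API): perturb the input around one instance, query the black-box model, and fit a simple interpretable model to those local predictions.

```python
# Minimal sketch of local surrogate explainability: approximate a black-box
# model around one instance with a linear model. The black_box function is
# a stand-in for any opaque model.
import numpy as np
from sklearn.linear_model import LinearRegression

def black_box(X):
    # Stand-in for an uninterpretable model (e.g., a large ensemble or NN).
    return np.sin(X[:, 0]) + X[:, 1] ** 2

instance = np.array([0.5, 1.0])

# Sample perturbations in a small neighborhood of the instance.
rng = np.random.default_rng(0)
neighborhood = instance + rng.normal(scale=0.1, size=(200, 2))
targets = black_box(neighborhood)

# Fit a simple, interpretable surrogate locally.
surrogate = LinearRegression().fit(neighborhood, targets)
print("local feature weights:", surrogate.coef_)
```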

9
Q

Greater complexity might lead to enhanced performance…

A

…but it can also increase costs. The more complicated the model, the harder it is to explain its outputs. There are other considerations too, such as hardware constraints, maintenance and updates, data privacy, transfer learning, and more.

10
Q

What is inference?

A

Inference is where you process new data through the model to make predictions. It is the process of generating an output from an input that you provide to the model.

11
Q

Amazon Bedrock foundation models support what inference parameters?

A

Temperature, Top K, and Top P, to control randomness and diversity in the response. Amazon Bedrock also supports parameters such as response length, penalties, and stop sequences to limit the length of the response.
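For illustration, a minimal boto3 sketch of passing these parameters when invoking a model on Amazon Bedrock. The request-body fields vary by model provider; the Anthropic Claude Messages format and the model ID below are assumptions for the sketch, not details given by the source.

```python
# Sketch: setting temperature, top_k, and top_p on an Amazon Bedrock invocation.
# Request-body fields are provider-specific; this assumes an Anthropic Claude model.
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 300,            # response length limit
    "temperature": 0.5,           # randomness
    "top_k": 250,                 # cut-off on candidate tokens
    "top_p": 0.9,                 # probability-mass cut-off
    "messages": [
        {"role": "user",
         "content": [{"type": "text", "text": "Summarize the benefits of RAG."}]}
    ],
}

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # assumed model ID
    body=json.dumps(body),
)
print(json.loads(response["body"].read()))
```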

12
Q

What is a prerequisite to creating a vector database?

A

A vector database is filled with dense vectors by processing input data, generally text, with an ML model, generally an embedding model. So it’s important to understand that a machine learning model, along with the indexing technology itself, is a prerequisite to creating a vector database. Vector databases are the factual reference of foundation model-based applications, helping the model retrieve trustworthy data. Foundation models use vector databases as an external data source to improve their capabilities in search, recommendation, and text generation use cases. Vector databases add capabilities for efficient and fast lookup, and they provide data management, fault tolerance, authentication and access control, and a query engine.
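As a hedged illustration of the "embedding model as prerequisite" point, here is a sketch that turns text into a dense vector with an embedding model on Amazon Bedrock before it would be stored in a vector database. The Titan Embeddings request format and model ID are assumptions based on common usage, not details given by the source.

```python
# Sketch: producing a dense vector from text with an embedding model,
# the prerequisite step before populating a vector database.
# Assumes the Amazon Titan Embeddings request/response format.
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def embed(text: str) -> list:
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v1",       # assumed model ID
        body=json.dumps({"inputText": text}),
    )
    payload = json.loads(response["body"].read())
    return payload["embedding"]                     # dense vector to store/index

vector = embed("Our return policy allows refunds within 30 days.")
print(len(vector), vector[:5])
```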

13
Q

Knowledge Bases for Amazon Bedrock

A

give you the ability to collect data sources into a repository of information. This way, you can build an application that takes advantage of retrieval augmented generation (RAG).

14
Q

What two components does RAG have?

A

RAG combines two components: a retriever component, which searches through a knowledge base, and a generator component, which produces outputs based on the retrieved information. This combination helps the model access up-to-date and domain-specific knowledge beyond its training data.
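A minimal sketch of the two components, assuming embeddings are already available as NumPy vectors and that a generate() helper wraps whatever foundation model you use; both are illustrative stand-ins, not APIs named by the source.

```python
# Sketch of the two RAG components: a retriever (vector similarity search)
# and a generator (a foundation model prompted with the retrieved context).
import numpy as np

# --- Retriever: tiny in-memory "knowledge base" of (text, embedding) pairs ---
documents = ["Refunds are allowed within 30 days.",
             "Shipping takes 3-5 business days."]
doc_vectors = np.random.rand(len(documents), 384)   # placeholder embeddings

def retrieve(query_vector, k=1):
    sims = doc_vectors @ query_vector / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector))
    top = np.argsort(sims)[::-1][:k]
    return [documents[i] for i in top]

# --- Generator: stand-in for a call to a foundation model ---
def generate(prompt):
    return f"<model completion for: {prompt[:60]}...>"

query = "How long do I have to return an item?"
query_vector = np.random.rand(384)                  # placeholder query embedding
context = "\n".join(retrieve(query_vector, k=1))
prompt = f"Use the context to answer.\nContext:\n{context}\n\nQuestion: {query}"
print(generate(prompt))
```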

15
Q

AWS services that help store embeddings within vector databases.

A

Examples include Amazon OpenSearch Service, Amazon Aurora, Redis, Amazon Neptune, Amazon DocumentDB (with MongoDB compatibility), and Amazon RDS for PostgreSQL.

16
Q

Amazon RDS for PostgreSQL also supports the pgvector extension

A

to store embeddings and perform efficient searches.
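A hedged sketch of what using pgvector on PostgreSQL (for example, Amazon RDS for PostgreSQL) might look like with psycopg2; the connection details, table name, and the tiny three-dimensional vectors are made up for brevity.

```python
# Sketch: storing embeddings and running a similarity search with pgvector.
# Connection details, table name, and the 3-dimensional vectors are illustrative.
import psycopg2

conn = psycopg2.connect(host="my-rds-endpoint", dbname="mydb",
                        user="myuser", password="mypassword")
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute("""CREATE TABLE IF NOT EXISTS items (
                   id bigserial PRIMARY KEY,
                   content text,
                   embedding vector(3));""")
cur.execute("INSERT INTO items (content, embedding) VALUES (%s, %s)",
            ("refund policy", "[0.1, 0.2, 0.3]"))

# Nearest-neighbor search with the L2 distance operator (<->).
cur.execute("SELECT content FROM items ORDER BY embedding <-> %s LIMIT 5",
            ("[0.1, 0.2, 0.25]",))
print(cur.fetchall())

conn.commit()
cur.close()
conn.close()
```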

17
Q

Agents for Amazon Bedrock

A

is a fully managed AI capability from AWS that helps you build applications with foundation models. Agents can automatically break down tasks and generate the required orchestration logic or write custom code. Agents can securely connect to your databases through APIs, ingest and structure the data for machine consumption, and augment it with contextual details to produce more accurate responses and fulfill requests.

18
Q

Temperature:

A

Adjusts the randomness of the model’s response. A lower temperature results in more focused responses, while a higher temperature leads to more diverse outputs.

19
Q

Top-k

A

defines the cut-off for the number of words (tokens) for each completion to choose from, ordered by their probabilities. A lower Top K value reduces the chance of an unusual word being selected.

20
Q

Top-p

A

Top P works similarly to Top K. It is the percentage of most-likely candidates that the model considers for the next token.

Choose a lower value to decrease the size of the pool and limit the options to more likely outputs.

Choose a higher value to increase the size of the pool and allow the model to consider less likely outputs.
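To make temperature, Top K, and Top P concrete, here is an illustrative NumPy sketch of how the three parameters shape the next-token distribution before sampling; it is a conceptual toy, not the sampler any particular provider uses.

```python
# Toy illustration of temperature, top-k, and top-p (nucleus) sampling
# over next-token probabilities. Conceptual only; real samplers differ in detail.
import numpy as np

logits = np.array([2.0, 1.5, 0.5, 0.1, -1.0])       # scores for 5 candidate tokens

def sample(logits, temperature=1.0, top_k=None, top_p=None, seed=0):
    rng = np.random.default_rng(seed)
    probs = np.exp(logits / temperature)             # lower temperature -> sharper
    probs /= probs.sum()

    if top_k is not None:                            # keep only the k most likely
        cutoff = np.sort(probs)[-top_k]
        probs = np.where(probs >= cutoff, probs, 0.0)
        probs /= probs.sum()

    if top_p is not None:                            # smallest set with mass >= p
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        keep = order[: np.searchsorted(cumulative, top_p) + 1]
        mask = np.zeros_like(probs)
        mask[keep] = probs[keep]
        probs = mask / mask.sum()

    return rng.choice(len(probs), p=probs)

print(sample(logits, temperature=0.3, top_k=3, top_p=0.9))
```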

21
Q

The BERT score is a metric developed to assess the quality of generated responses compared to a set of reference responses. It uses pre-trained models to calculate semantic similarity between the generated responses and the reference answers.

A
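As an illustrative aside (not from the source), the open-source bert-score package can compute this kind of semantic-similarity score; the sentences below are toy examples.

```python
# Toy example of computing BERTScore between generated and reference text
# using the open-source bert-score package (pip install bert-score).
from bert_score import score

candidates = ["The cat sat on the mat."]
references = ["A cat was sitting on the mat."]

# Returns precision, recall, and F1 tensors, one entry per candidate/reference pair.
P, R, F1 = score(candidates, references, lang="en")
print(f"BERTScore F1: {F1[0].item():.3f}")
```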
22
Q

ROUGE, or Recall-Oriented Understudy for Gisting Evaluation, is a set of metrics and a software package used for evaluating automatic summarization and machine translation software in natural language processing. The metrics compare an automatically produced summary or translation against a reference summary or translation (or a set of references) produced by humans.

A
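For illustration, a small sketch with the open-source rouge-score package; the summary strings are toy data, not from the source.

```python
# Toy example of ROUGE scoring with the rouge-score package
# (pip install rouge-score). Compares a generated summary to a reference.
from rouge_score import rouge_scorer

reference = "The committee approved the budget for next year."
generated = "The committee approved next year's budget."

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
scores = scorer.score(reference, generated)
for name, result in scores.items():
    print(name, f"F1={result.fmeasure:.3f}")
```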
23
Q

What is few-shot prompting?

A

It is when you provide a few examples to help LLMs perform better and calibrate their output to meet your expectations.
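A small illustrative few-shot prompt (the reviews and labels are made up):

```python
# Illustrative few-shot prompt: a handful of labeled examples followed by the
# new input the model should classify. The examples are made up.
few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: "The battery lasts all day and the screen is gorgeous."
Sentiment: Positive

Review: "It stopped working after a week and support never replied."
Sentiment: Negative

Review: "Setup took five minutes and everything just worked."
Sentiment:"""
print(few_shot_prompt)
```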

24
Q

What is zero-shot prompting?

A

It is when you provide a prompt with no examples; for instance, a sentiment classification prompt that includes no labeled examples.

25
Q

You can also use a prompt template.

A

Templates might include instructions, few-shot examples, and specific content and questions for different use cases
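A minimal sketch of such a template using plain Python string formatting; the instruction text and field names are arbitrary placeholders.

```python
# Minimal prompt template sketch using plain string formatting.
# The instruction text and field names are arbitrary placeholders.
TEMPLATE = """You are a customer-support assistant.
Answer using only the context below. If the answer is not in the context, say so.

Context:
{context}

Question:
{question}

Answer:"""

prompt = TEMPLATE.format(
    context="Refunds are accepted within 30 days of purchase.",
    question="Can I return an item after three weeks?",
)
print(prompt)
```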

26
Q

Use chain-of-thought prompting to break down the reasoning process into intermediate steps. This kind of prompting can improve the quality and coherence of the final output.

A
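As an illustration, a chain-of-thought style prompt simply asks for intermediate steps before the final answer; the arithmetic word problem below is made up.

```python
# Illustrative chain-of-thought prompt: the instruction asks the model to
# reason through intermediate steps before giving the final answer.
cot_prompt = """Question: A bakery sold 14 muffins in the morning and twice as many
in the afternoon. How many muffins were sold in total?

Think through the problem step by step, showing each intermediate calculation,
and then state the final answer on its own line."""
print(cot_prompt)
```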
27
Q

prompt tuning,

A

where the actual prompt text is replaced with a continuous embedding vector that is optimized during training. This technique helps the prompt be fine-tuned for a specific task while keeping the rest of the model parameters frozen, which can be more efficient than full fine-tuning.
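For illustration, the Hugging Face PEFT library exposes this technique; the base model name and the configuration values below are assumptions for the sketch, not prescribed by the source.

```python
# Sketch of prompt tuning with the Hugging Face PEFT library: a small set of
# virtual prompt embeddings is trained while the base model stays frozen.
# Model name and hyperparameters are illustrative assumptions.
from transformers import AutoModelForCausalLM
from peft import PromptTuningConfig, PromptTuningInit, TaskType, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("gpt2")

config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.TEXT,
    prompt_tuning_init_text="Classify the sentiment of this review:",
    num_virtual_tokens=8,
    tokenizer_name_or_path="gpt2",
)

model = get_peft_model(base_model, config)
model.print_trainable_parameters()   # only the prompt embeddings are trainable
```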

28
Q

What are common tasks supported by LLMs on Amazon Bedrock?

A

A few are classification, question and answer with and without context, summarization, open-ended text generation, code generation, math, and reasoning or logical thinking.

29
Q

Latent space

A

is the encoded knowledge of language in a large language model. It’s the stored patterns of data that capture relationships and, when prompted, reconstruct language from those patterns.

30
Q

If you prompt a language model and get a dissatisfactory or negative response, then your prompt might be insufficient for the model.

A

But it’s also possible, especially if your model is smaller, that the model’s latent space might not have enough information about the topic of your prompt. That situation can cause the model to hallucinate: a model that doesn’t know the exact specifics of a prompt, because the knowledge isn’t in its latent space, will choose the closest match. This result might be interpreted as a mistake, but the model is actually functioning correctly.

31
Q

Prompt engineering has several key techniques.

First, be specific and provide clear instructions or specifications for the task at hand; for example, include the desired format, examples, comparisons, style, tone, output length, and detailed context.

Second, include examples of the desired behavior and direction, such as sample texts, data formats, templates, code, graphs, charts, and more.

Third, experiment and use an iterative process to test prompts and understand how the modifications alter the responses.

Fourth, know the strengths and weaknesses of your model.

Fifth, balance simplicity and complexity in your prompts to avoid vague, unrelated, or unexpected answers.

Sixth, especially for prompt engineers, use multiple comments to offer more context without cluttering the prompt.

Seventh, add guardrails.

A
32
Q

What is prompt injection?

A

It describes attacks that manipulate the prompt.

33
Q

jailbreaking

A

When an attacker tries to bypass the guardrails that you have established, this is called jailbreaking. It differs from other prompt attacks because it specifically targets the safety measures that have been put in place, such as guardrails.

34
Q

Hijacking

A

is an attempt to change or manipulate the original prompt with new instructions.

35
Q

Poisoning

A

is another risk of prompt engineering where harmful instructions are embedded in messages, emails, web pages, and more

36
Q

Guardrails

A

Guardrails provide safety and privacy controls to manage interactions in your generative AI applications. You can define topics within the context of your application that are not desirable, and you can set words to be blocked. You can configure thresholds to filter content across categories that might be harmful and to detect prompt attacks such as jailbreaks and prompt injections. You can also filter inputs that might contain sensitive data.

37
Q

What are the key elements of training a foundation model?

A

They include pre-training, fine-tuning, and continuous pre-training

38
Q

pre-training,

A

which is a complex process. It requires millions of GPU (graphics processing unit) compute hours, terabytes to petabytes of data, trillions of tokens, trial and error, and time.

39
Q

Fine-tuning

A

is a process that extends the training of the model to improve the generation of completions for a specific task. It is a supervised learning process and you use a dataset of labeled examples to update the weights of the LLM

40
Q

This happens when the whole fine-tuning process modifies the weights of the original LLM. It can improve performance on the single fine-tuned task, but it can degrade performance on other tasks.

A

Catastrophic forgetting

41
Q

Parameter-efficient fine-tuning, PEFT,

A

is a process and set of techniques that freeze or preserve the parameters and weights of the original LLM and fine-tune or train a small number of task-specific adapter layers and parameters. PEFT reduces the compute and memory that are needed because it fine-tunes only a small set of model parameters.

42
Q

Low-rank adaptation or LoRA,

A

is a popular PEFT technique that also preserves or freezes the original weights of the foundation model and injects new trainable low-rank matrices into each layer of the transformer architecture.
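For illustration, a minimal LoRA configuration with the Hugging Face PEFT library; the base model, target modules, and rank are assumptions that vary by architecture.

```python
# Sketch of LoRA with the Hugging Face PEFT library: the base weights stay
# frozen and small low-rank matrices are added to chosen layers.
# Model name, target modules, and rank are illustrative assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("gpt2")

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor
    lora_dropout=0.05,
    target_modules=["c_attn"],  # attention projection in GPT-2; varies by model
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()   # a small fraction of the original weights
```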

43
Q

T or F: PEFT and LoRA modify the weights of your model, but not the representations.

A

True

44
Q

Multitask fine-tuning

A

is an extension of fine-tuning a single task. Multitask fine-tuning requires a lot of data. For example, the dataset might contain examples that instruct a model to complete multiple tasks such as reviews or ratings, summarization, translating code, and more. This produces an instruction tuned model that has learned how to complete many different tasks simultaneously.

45
Q

What fine-tuning process modifies the weights of the model to adapt to domain-specific data?

A

Domain adaptation fine-tuning gives you the ability to use the pre-trained foundation models and adapt them to specific tasks by using limited domain-specific data. You can use domain adaptation fine-tuning to help your model work with domain-specific language such as industry jargon, technical terms, or other specialized data

46
Q

Amazon SageMaker JumpStart provides the capability to fine-tune a large language model, particularly a text generation model, on a domain-specific dataset. You can fine-tune models with your custom dataset to improve performance in specific domains.

A
47
Q

reinforcement learning from human feedback, or RLHF

A

RLHF uses reinforcement learning to fine-tune the LLM with human feedback data to better align the model with human preferences

48
Q

If you need low-code data preparation, you can use this to create data flows that define your ML data pre-processing and feature engineering workflows with little to no coding.

A

Amazon SageMaker Canvas

49
Q

Suppose that your data preparation needs to include detecting bias in your data. You can use this to analyze your data and detect potential biases across multiple facets.

A

Amazon SageMaker Clarify

50
Q

One optimization technique is to improve application performance by reducing the size of the LLM. This action can reduce inference latency because a smaller model loads more quickly. However,

A

remember that reducing the size of the model might decrease its performance.

51
Q

Other optimization techniques include

A

making the prompt more concise, reducing the size and number of the retrieved snippets, and reducing the length of the generation through inference parameters and the prompt.

52
Q

Metrics such as accuracy or root mean square error (RMSE) are more straightforward to calculate because the predictions are deterministic and can be compared against the labels. Can you use these for generative AI?

A

No. The output of generative AI models is non-deterministic, which makes the evaluation more difficult.

53
Q

Recall Oriented Understudy for Gisting Evaluation, or ROUGE,

A

is a set of metrics and a software package. It is used to evaluate automatic summarization tasks and machine translation software in natural language processing. It evaluates how well the generated output compares to a reference.

54
Q

Bilingual Evaluation Understudy, or BLEU,

A

is an algorithm that is used for translation tasks. It evaluates the quality of text which has been machine translated from one natural language to another.
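A toy example with NLTK's sentence-level BLEU; the sentences are made up, and a smoothing function is used because very short examples otherwise yield zero n-gram counts.

```python
# Toy example of BLEU with NLTK (pip install nltk). Smoothing is applied
# because very short sentences otherwise produce zero higher-order n-gram counts.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ["the", "cat", "is", "on", "the", "mat"]
candidate = ["the", "cat", "sat", "on", "the", "mat"]

smoothie = SmoothingFunction().method1
print(sentence_bleu([reference], candidate, smoothing_function=smoothie))
```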

55
Q

You can use this to evaluate large language models, LLMs, and create model evaluation jobs. A model evaluation job helps to evaluate and compare model quality and metrics for text-based foundation models from SageMaker JumpStart.

A

Amazon SageMaker Clarify

56
Q

Amazon Bedrock provides this, which can automatically compare generated responses against a human reference and calculate a semantic similarity score (BERTscore). It is suitable for evaluating faithfulness and hallucinations in text-generation tasks.

A

an evaluation module

57
Q

General Language Understanding Evaluation, GLUE,

A

It is a collection of natural language tasks, such as sentiment analysis and question answering. You can use these tasks to evaluate and compare model performance across a set of language tasks. Then, you can use the benchmark to measure and compare the model performance

58
Q

Massive Multitask Language Understanding, MMLU,

A

evaluates the knowledge and problem-solving capabilities of the model. To perform well, models must have extensive world knowledge and problem-solving ability. The models are tested on more than basic language understanding, such as history, mathematics, laws, computer science, and more.

59
Q

The Beyond the Imitation Game Benchmark, BIG-bench,

A

focuses on tasks that are beyond the capabilities of the current language models. It contains tasks such as math, biology, physics, bias, linguistics, reasoning, childhood development, software development, and more.

60
Q

Holistic Evaluation of Language Models, HELM,

A

which is a benchmark to help improve model transparency. It offers users guidance on which model performs well for a given task. HELM is a combination of metrics for tasks such as summarization, question and answer, sentiment analysis, and bias detection

61
Q

Instruction tuning

A

refers to the process of providing specific labeled examples to train a model on a specific task. Instruction tuning can help models follow specific tasks or responses to prompts in a specific way

62
Q

Length

A

Foundation models typically support parameters that limit the length of the response. Examples of these parameters are provided below.

Response length – An exact value to specify the minimum or maximum number of tokens to return in the generated response.

Penalties – Specify the degree to which to penalize outputs in a response. Examples include the following.

The length of the response.

Repeated tokens in a response.

Frequency of tokens in a response.

Types of tokens in a response.

Stop sequences – Specify sequences of characters that stop the model from generating further tokens. If the model generates a stop sequence that you specify, it will stop generating after that sequence.
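As an illustrative sketch of such length controls, here is a boto3 call using the Amazon Titan Text request format; the model ID and field names are assumptions, and other providers name these parameters differently.

```python
# Sketch: limiting response length and setting stop sequences on Amazon Bedrock.
# Assumes the Amazon Titan Text request format; field names differ per provider.
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

body = {
    "inputText": "Write a short product description for a travel mug.",
    "textGenerationConfig": {
        "maxTokenCount": 150,          # response length limit
        "stopSequences": ["User:"],    # stop generating after this sequence
        "temperature": 0.5,
        "topP": 0.9,
    },
}

response = bedrock.invoke_model(
    modelId="amazon.titan-text-express-v1",   # assumed model ID
    body=json.dumps(body),
)
print(json.loads(response["body"].read()))
```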
