Fine-Tuning and Evaluating Models Flashcards

1
Q

____ will change the weights of the base foundation model.

A

Fine-tuning

2
Q

The process of adapting a copy of a foundation model with your own data is called ____.

A

fine-tuning

3
Q

Training data for fine-tuning a foundation model must adhere to a specific ____ and be stored in ____.

A

format / S3

4
Q

You must use “____” to run a fine-tuned model; on-demand (pay-per-use) pricing is not available for custom models.

A

provisioned throughput
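
For context on the card above, here is a minimal boto3 sketch of buying provisioned throughput for a fine-tuned (custom) model and then invoking it through the resulting ARN. The custom model ARN, the throughput name, and the request body are hypothetical placeholders, and the body format depends on the base model family.

```python
# Sketch: provisioned throughput for a fine-tuned model (hypothetical values).
import json
import boto3

bedrock = boto3.client("bedrock")
runtime = boto3.client("bedrock-runtime")

# Fine-tuned (custom) models cannot be invoked on-demand; buy provisioned throughput first.
pt = bedrock.create_provisioned_model_throughput(
    provisionedModelName="my-finetuned-pt",                                   # hypothetical name
    modelId="arn:aws:bedrock:us-east-1:111122223333:custom-model/my-model",   # your custom model ARN
    modelUnits=1,
)
# (In practice, wait until the provisioned throughput is InService before invoking.)

# Invoke the fine-tuned model through the provisioned throughput ARN.
response = runtime.invoke_model(
    modelId=pt["provisionedModelArn"],
    body=json.dumps({"inputText": "Summarize our refund policy."}),  # body format depends on the base model
)
print(json.loads(response["body"].read()))
```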

5
Q

True/False: All foundation models in Amazon Bedrock can be fine-tuned.

A

False: not all foundation models in Amazon Bedrock support fine-tuning.

6
Q

____-based fine-tuning improves the performance of a pre-trained FM on domain-specific tasks.

A

Instruction

7
Q

What are domain-specific tasks in the context of training models?

A

A model is further trained on a particular field or area of knowledge.

8
Q

Instruction-based fine-tuning uses ____ examples that are ____ pairs.

A

labeled / prompt-response
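
As a concrete illustration of the card above, here is a short Python sketch that writes labeled prompt-response pairs to a JSONL training file. The prompt/completion field names follow the format used for Bedrock text-model fine-tuning, but verify the exact schema for your chosen base model; the example content and bucket path are hypothetical.

```python
# Sketch: building a JSONL file of labeled prompt-response pairs for instruction-based fine-tuning.
import json

examples = [
    {"prompt": "What is the return window for electronics?",
     "completion": "Electronics can be returned within 30 days of purchase."},
    {"prompt": "Do you ship internationally?",
     "completion": "Yes, we ship to over 40 countries."},
]

with open("train.jsonl", "w") as f:
    for record in examples:
        f.write(json.dumps(record) + "\n")   # one JSON object per line

# The file is then uploaded to S3, e.g. s3://my-bucket/train.jsonl (hypothetical bucket).
```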

9
Q

When you want to continue pre-training a model, you need to provide ____ data to continue training the FM.

A

unlabeled

10
Q

To make a model an expert in a specific domain, you must perform ____ fine-tuning.

A

domain-adaptation

11
Q

Feeding the entire AWS documentation to a model to make it an expert on AWS is an example of ____ fine-tuning.

A

domain-adaptation

12
Q

When performing continued pre-training on a model, it is good to feed the model ____ terminology.

A

industry-specific

13
Q

____ messaging is a subcategory of instruction-based fine-tuning that contains an array of message objects, each containing a role and content field.

A

Single-Turn

14
Q

____ messaging is a subcategory of instruction-based fine-tuning for conversations, such as chatbots.

A

Multi-Turn

15
Q

With Multi-Turn messaging, you must alternate between “____” and “____” roles.

A

user / assistant
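
To make the single-turn vs. multi-turn distinction concrete, here is a hedged Python sketch of both record types: an optional system field plus a messages array of role/content objects, with roles alternating between user and assistant in the multi-turn case. The conversation content is hypothetical, and the exact schema should be checked against your model's fine-tuning documentation.

```python
# Sketch: single-turn vs. multi-turn training records for instruction-based fine-tuning.
import json

single_turn = {
    "system": "You are a polite customer-support assistant.",   # optional context
    "messages": [
        {"role": "user", "content": "Where is my order #1234?"},
        {"role": "assistant", "content": "Your order shipped yesterday and arrives Friday."},
    ],
}

multi_turn = {
    "messages": [  # roles must alternate user / assistant
        {"role": "user", "content": "I want to change my delivery address."},
        {"role": "assistant", "content": "Sure, what is the new address?"},
        {"role": "user", "content": "42 Main Street, Springfield."},
        {"role": "assistant", "content": "Done! Your order will be delivered to 42 Main Street."},
    ],
}

with open("chat-train.jsonl", "w") as f:
    for record in (single_turn, multi_turn):
        f.write(json.dumps(record) + "\n")
```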

16
Q

Re-training an FM requires a ____ budget because it involves more intensive computation.

A

higher

17
Q

Instruction-based fine-tuning is usually ____ because the computations are less intensive and less data is typically required.

A

cheaper

18
Q

True/False: Running a fine-tuned model is more expensive because you have to use provisioned throughput.

A

True

19
Q

When fine-tuning a model, you must prepare the ____, do the fine-tuning, and ____ the model.

A

data / evaluate
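
Here is a minimal boto3 sketch of the middle step (the fine-tuning job itself), assuming the training data has already been prepared in S3; evaluation follows once the job completes. The role ARN, bucket names, job/model names, and hyperparameter values are hypothetical placeholders.

```python
# Sketch: starting a Bedrock fine-tuning job on data already prepared in S3 (hypothetical values).
import boto3

bedrock = boto3.client("bedrock")

job = bedrock.create_model_customization_job(
    jobName="support-bot-finetune-1",
    customModelName="support-bot-v1",
    roleArn="arn:aws:iam::111122223333:role/BedrockFineTuneRole",
    baseModelIdentifier="amazon.titan-text-express-v1",   # a fine-tunable base model
    customizationType="FINE_TUNING",                      # vs. CONTINUED_PRE_TRAINING
    trainingDataConfig={"s3Uri": "s3://my-bucket/train.jsonl"},
    outputDataConfig={"s3Uri": "s3://my-bucket/output/"},
    hyperParameters={"epochCount": "2", "learningRate": "0.00001"},  # hypothetical values
)
print(job["jobArn"])  # then poll get_model_customization_job(...) and evaluate the resulting model
```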

20
Q

____ is the broader concept of re-using a pre-trained model to adapt it to a new related task.

A

Transfer Learning (fine-tuning is a type of transfer learning)

21
Q

Transfer learning is widely used for ____ classification and ____ processing.

A

image / natural language

22
Q

A chatbot designed with a particular persona or tone, or geared towards a specific purpose, is a use case for ____.

A

fine-tuning a model

23
Q

Training using more up-to-date information than what the language model previously accessed is a use case of ____.

A

fine-tuning a model

24
Q

Training with exclusive data (e.g. your historical emails or messages, internal records) is a use case of ____.

A

fine-tuning a model

25
Q

____ are curated collections of data designed specifically for evaluating the performance of language models.

A

Benchmark datasets

26
Q

Benchmark datasets are helpful to measure ____, ____, ____, and ____.

A

accuracy, speed, efficiency, scalability

27
Q

Some benchmark datasets allow you to very quickly detect any kind of ____ and potential ____ against a group of people.

A

bias / discrimination

28
Q

True/False: You can create your own benchmark dataset that is specific to your business.

A

True

29
Q

____ evaluation of a model is where users compare model generated answers to benchmark answers.

A

Human

30
Q

____ evaluation of a model is where a “judge” model automatically compares the benchmark answers to the model generated answers.

A

Automatic

31
Q

The purpose of the ____ automated metric for evaluating an FM is to evaluate automatic summarization and machine translation systems.

A

ROUGE - Recall-Oriented Understudy for Gisting Evaluation

32
Q

The ____ automated metric for evaluating an FM measures the number of matching n-grams between reference and generated text.

A

ROUGE-N: N is the size of the n-gram (in words)

33
Q

The ____ automated metric for evaluating an FM compares the longest common subsequence between reference and generated text.

A

ROUGE-L
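
An illustrative (not official-library) Python sketch of the two ideas behind ROUGE-N and ROUGE-L: n-gram overlap recall and the longest common subsequence length.

```python
# Toy ROUGE-N recall and the LCS length at the core of ROUGE-L.
from collections import Counter

def rouge_n_recall(reference, generated, n=1):
    """Fraction of reference n-grams that also appear in the generated text."""
    ref, gen = reference.split(), generated.split()
    ref_ngrams = Counter(tuple(ref[i:i+n]) for i in range(len(ref) - n + 1))
    gen_ngrams = Counter(tuple(gen[i:i+n]) for i in range(len(gen) - n + 1))
    overlap = sum(min(count, gen_ngrams[g]) for g, count in ref_ngrams.items())
    return overlap / max(sum(ref_ngrams.values()), 1)

def lcs_length(reference, generated):
    """Longest common subsequence length (word level)."""
    a, b = reference.split(), generated.split()
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

print(rouge_n_recall("the cat sat on the mat", "the cat lay on the mat", n=2))  # 0.6
print(lcs_length("the cat sat on the mat", "the cat lay on the mat"))           # 5
```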

34
Q

The ____ automated metric for evaluating an FM evaluates the quality of generated text, especially for translations.

A

BLEU: Bilingual Evaluation Understudy

35
Q

The BLEU automated metric considers both ____ and ____ too much brevity.

A

precision / penalizes
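
A simplified sketch of BLEU's two ingredients from the card above: n-gram precision and a brevity penalty that punishes candidates much shorter than the reference. Real BLEU implementations add smoothing and support multiple references.

```python
# Toy BLEU: geometric mean of n-gram precisions times a brevity penalty.
import math
from collections import Counter

def ngram_precision(reference, candidate, n):
    ref, cand = reference.split(), candidate.split()
    ref_ngrams = Counter(tuple(ref[i:i+n]) for i in range(len(ref) - n + 1))
    cand_ngrams = Counter(tuple(cand[i:i+n]) for i in range(len(cand) - n + 1))
    matched = sum(min(count, ref_ngrams[g]) for g, count in cand_ngrams.items())
    return matched / max(sum(cand_ngrams.values()), 1)

def bleu(reference, candidate, max_n=4):
    precisions = [ngram_precision(reference, candidate, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    r, c = len(reference.split()), len(candidate.split())
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)   # penalize overly short output
    return brevity_penalty * geo_mean

print(bleu("the cat is on the mat", "the cat is on the mat"))  # 1.0 (perfect match)
print(bleu("the cat is on the mat", "the cat is on"))          # ~0.61: correct but too short
```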

36
Q

The ____ automated metric for evaluating an FM looks at the semantic similarity between generated and reference text.

A

BERTScore

37
Q

Which automated metric for evaluating an FM uses pre-trained BERT models to compare the contextualized embeddings of both texts and computes the cosine similarity between them?

A

BERTScore

38
Q

True/False: The automatic metric BERTScore is capable of capturing more nuance between texts when evaluating a foundation model.

A

True

39
Q

The ____ automated metric measures how well the model predicts the next token (lower is better).

A

perplexity
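
A tiny worked example of perplexity: the exponential of the average negative log-likelihood the model assigns to the true next tokens, so lower values mean better predictions. The probability values below are made up for illustration.

```python
# Toy perplexity calculation from per-token probabilities of the correct next token.
import math

def perplexity(token_probabilities):
    """token_probabilities: probability the model gave to each true next token."""
    nll = [-math.log(p) for p in token_probabilities]
    return math.exp(sum(nll) / len(nll))

confident_model = [0.9, 0.8, 0.95, 0.85]   # high probability on the right tokens
uncertain_model = [0.2, 0.1, 0.3, 0.25]

print(perplexity(confident_model))   # ~1.15 — low, better
print(perplexity(uncertain_model))   # ~5.1  — high, worse
```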

40
Q

Which business metric is used to evaluate a model by gathering user feedback and assessing their satisfaction with the model's responses?

A

User Satisfaction

41
Q

Which business metric used to evaluate a model calculates the average revenue per user?

A

Average Revenue Per User (ARPU)

42
Q

Which business metric is used to evaluate a model by measuring the model's ability to perform tasks across different domains?

A

Cross-Domain Performance

43
Q

Which business metric is used to evaluate whether a model's recommendations generate desired outcomes, such as purchases?

A

Conversion Rate

44
Q

Which business metric is used to evaluate a model's efficiency in computation and resource utilization (e.g., improving production-line efficiency)?

A

Efficiency

45
Q

Identify the Model Evaluation task type option described when creating an automatic model evaluation.
The model summarizes text based on the prompts that you provide.

a) general text generation b) question and answer
c) text summarization d) text classification

A

text summarization

46
Q

Identify the Model Evaluation task type option described when creating an automatic model evaluation.
The model performs natural language processing and text generation tasks

a) general text generation b) question and answer
c) text summarization d) text classification

A

general text generation

47
Q

Identify the Model Evaluation task type option described when creating an automatic model evaluation.
The model categorizes text into predefined classes based on the input dataset.

a) general text generation b) question and answer
c) text summarization d) text classification

A

text classification

48
Q

Identify the Model Evaluation task type option described when creating an automatic model evaluation.
The answers the model provides are based on your prompts.

a) general text generation b) question and answer
c) text summarization d) text classification

A

question and answer

49
Q

Identify the Model Evaluation metric option described when creating an automatic model evaluation.
Examines the model's ability to encode factual knowledge about the real world.

a) accuracy
b) toxicity
c) robustness

A

accuracy

50
Q

Identify the Model Evaluation metric option described when creating an automatic model evaluation.
Gauges the propensity to generate harmful, offensive, or inappropriate content.

a) accuracy
b) toxicity
c) robustness

A

toxicity

51
Q

Identify the Model Evaluation metric option described when creating an automatic model evaluation.
Assesses the degree to which minor, semantic-preserving changes impact the model's output.

a) accuracy
b) toxicity
c) robustness

A

robustness

52
Q

When building an automatic model evaluation, what two options are provided for choosing a dataset?

A

use a built-in dataset or provide your own

53
Q

What is the difference in model evaluation task type options when you are setting up automatic vs human model evaluation?

A

With human evaluation, you have the option to provide a custom task type.

54
Q

Identify the model evaluation metric described.
Measures the linguistic quality of the generated text.

a) Coherence
b) Fluency
c) Correctness
d) Accuracy

A

Fluency

55
Q

Identify the model evaluation metric described.
Measures the organization and structure of the generated text.

a) Coherence
b) Fluency
c) Correctness
d) Consistency

A

Coherence

56
Q

Identify the model evaluation metric described.
Measures the harmfulness of the generated text.

a) Coherence
b) Toxicity
c) Correctness
d) Relevance

A

Toxicity

57
Q

Identify the model evaluation metric described.
Indicates the accuracy of the generated text.

a) Accuracy
b) Completeness
c) Correctness
d) Relevance

A

Accuracy

58
Q

Identify the model evaluation metric described.
Measures a generated summary’s factual consistency.

a) Accuracy
b) Completeness
c) Correctness
d) Consistency

A

Consistency

59
Q

Identify the model evaluation metric described.
Measures a generated summary’s inclusion of relevant knowledge or facts

a) Accuracy
b) Completeness
c) Relevance
d) Consistency

A

Relevance

60
Q

Identify the model evaluation metric described.
Measures a generated answer’s satisfaction in the context of the question.

a) Accuracy
b) Completeness
c) Relevance
d) Correctness

A

Correctness

61
Q

Identify the model evaluation metric described.
Measures a generated answer’s inclusion of all relevant information.

a) Accuracy
b) Completeness
c) Relevance
d) Correctness

A

Completeness

62
Q

____ allows a foundation model to reference a data source outside its training data without being fine-tuned.

A

RAG - Retrieval Augmented Generation

63
Q

RAG is often used when ____ data needs to be fed into a foundation model.

A

real-time

64
Q

How does Amazon Bedrock use RAG?

A

It searches an external knowledge base for information related to the query, then passes the retrieved data along with the original query to the foundation model.
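
A sketch of this flow using the Bedrock Knowledge Bases retrieve-and-generate API: the service retrieves relevant chunks from the knowledge base and passes them, together with the original query, to the chosen foundation model. The knowledge base ID and model ARN are placeholders, and the parameter shapes should be verified against the current boto3 documentation.

```python
# Sketch: RAG with a Bedrock knowledge base via retrieve_and_generate (hypothetical IDs).
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")

response = agent_runtime.retrieve_and_generate(
    input={"text": "What is our parental-leave policy?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB12345678",  # hypothetical knowledge base
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
        },
    },
)

print(response["output"]["text"])       # grounded answer
print(response.get("citations", []))    # which documents were retrieved
```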

65
Q

What two Amazon services can be used as a RAG vector database with Bedrock?

A

OpenSearch Service and Aurora

66
Q

What are the 5 vector database options provided by Amazon Bedrock?

A

The AWS services OpenSearch Service and Aurora, as well as MongoDB, Redis, and Pinecone.

67
Q

A ____ model takes information from S3 and creates the entries in the vector database.

A

embeddings
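
A sketch of that ingestion step: invoking the Amazon Titan embeddings model on a document chunk to produce the vector that would then be written to the vector database. The request/response field names assume the Titan Text Embeddings format; verify against the model documentation.

```python
# Sketch: turning a document chunk into an embedding vector with Amazon Titan.
import json
import boto3

runtime = boto3.client("bedrock-runtime")

chunk = "Our warranty covers manufacturing defects for 24 months."

response = runtime.invoke_model(
    modelId="amazon.titan-embed-text-v1",
    body=json.dumps({"inputText": chunk}),
)
vector = json.loads(response["body"].read())["embedding"]

print(len(vector))   # embedding dimensionality
# 'vector' (plus the original chunk and metadata) would then be inserted
# into the vector database: OpenSearch, Aurora, Pinecone, etc.
```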

68
Q

What are the two embeddings models AWS makes available?

A

Amazon Titan / Cohere

69
Q

What are two common relational databases used as RAG vector databases?

A

Amazon Aurora - proprietary on AWS
Amazon RDS for PostgreSQL - open-source

70
Q

What graph database can be used as a RAG vector database?

A

Amazon Neptune

71
Q

True/False: Amazon DocumentDB (with MongoDB compatibility) can be used as a vector database.

A

True: it supports real-time similarity queries and can store millions of vector embeddings.

72
Q

What are some key features of Amazon OpenSearch Service for RAG vector database use?

A

real-time similarity queries, store millions of vector embeddings, scalable index management, and fast nearest-neighbor (kNN) search capability

73
Q

What are some common RAG data sources supported by Amazon Bedrock?

A

Amazon S3, Confluence, SharePoint, Salesforce, web pages

74
Q

Describe a RAG use case for a Customer Service Chatbot.

A

the knowledge base could be products, features, specifications, troubleshooting guides and FAQs. The RAG application is the chatbot that can answer customer queries.

75
Q

Describe a RAG use case for Legal Research and Analysis.

A

the knowledge base could be the laws, regulations, case precedents, legal opinions and expert analysis. The RAG application is a chatbot that can provide relevant information for specific legal queries.

76
Q

Describe a RAG use case for Healthcare Question-Answering

A

the knowledge base could be diseases, treatments, clinical guidelines, research papers, patients, etc. The RAG application is a chatbot that can answer complex medical queries.