Fine-Tuning and Evaluating Models Flashcards

1
Q

____ will change the weights of the base foundation model.

A

Fine-tuning

2
Q

The process of adapting a copy of a foundation model with your own data is called ____.

A

fine-tuning

3
Q

Training data for fine-tuning a foundation model must adhere to a specific ____ and be stored in ____.

A

format / S3

4
Q

You must use “____” to run a fine-tuned model; on-demand (pay-per-use) pricing is not available for custom models.

A

provisioned throughput
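
For context on the card above, here is a minimal boto3 sketch of buying provisioned throughput for a fine-tuned (custom) model and then invoking it through the resulting ARN. The custom model ARN, the throughput name, and the request body are hypothetical placeholders, and the body format depends on the base model family.

```python
# Sketch: provisioned throughput for a fine-tuned model (hypothetical values).
import json
import boto3

bedrock = boto3.client("bedrock")
runtime = boto3.client("bedrock-runtime")

# Fine-tuned (custom) models cannot be invoked on-demand; buy provisioned throughput first.
pt = bedrock.create_provisioned_model_throughput(
    provisionedModelName="my-finetuned-pt",                                   # hypothetical name
    modelId="arn:aws:bedrock:us-east-1:111122223333:custom-model/my-model",   # your custom model ARN
    modelUnits=1,
)
# (In practice, wait until the provisioned throughput is InService before invoking.)

# Invoke the fine-tuned model through the provisioned throughput ARN.
response = runtime.invoke_model(
    modelId=pt["provisionedModelArn"],
    body=json.dumps({"inputText": "Summarize our refund policy."}),  # body format depends on the base model
)
print(json.loads(response["body"].read()))
```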

5
Q

True/False: All foundation models in Amazon Bedrock can be fine-tuned.

A

False: not all foundation models in Amazon Bedrock support fine-tuning.

6
Q

____-based fine-tuning improves the performance of a pre-trained FM on domain-specific tasks.

A

Instruction

7
Q

What are domain-specific tasks in the context of training models?

A

A model is further trained on a particular field or area of knowledge.

8
Q

Instruction-based fine-tuning uses ____ examples that are ____ pairs.

A

labeled / prompt-response
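
As a concrete illustration of the card above, here is a short Python sketch that writes labeled prompt-response pairs to a JSONL training file. The prompt/completion field names follow the format used for Bedrock text-model fine-tuning, but verify the exact schema for your chosen base model; the example content and bucket path are hypothetical.

```python
# Sketch: building a JSONL file of labeled prompt-response pairs for instruction-based fine-tuning.
import json

examples = [
    {"prompt": "What is the return window for electronics?",
     "completion": "Electronics can be returned within 30 days of purchase."},
    {"prompt": "Do you ship internationally?",
     "completion": "Yes, we ship to over 40 countries."},
]

with open("train.jsonl", "w") as f:
    for record in examples:
        f.write(json.dumps(record) + "\n")   # one JSON object per line

# The file is then uploaded to S3, e.g. s3://my-bucket/train.jsonl (hypothetical bucket).
```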

9
Q

When you want to continue pre-training a model, you need to provide ____ data to continue training the FM.

A

unlabeled

10
Q

To make a model an expert in a specific domain, you must perform ____ fine-tuning.

A

domain-adaptation

11
Q

Feeding the entire AWS documentation to a model to make it an expert on AWS is an example of ____ fine-tuning.

A

domain-adaptation

12
Q

When performing continued pre-training on a model, it is good to feed the model ____ terminology.

A

industry-specific

13
Q

____ messaging is a subcategory of instruction-based fine-tuning that contains an array of message objects, each containing a role and content field.

A

Single-Turn

14
Q

____ messaging is a subcategory of instruction-based fine-tuning for conversations, such as chatbots.

A

Multi-Turn

15
Q

With Multi-Turn messaging, you must alternate between “____” and “____” roles.

A

user / assistant
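
To make the single-turn vs. multi-turn distinction concrete, here is a hedged Python sketch of both record types: an optional system field plus a messages array of role/content objects, with roles alternating between user and assistant in the multi-turn case. The conversation content is hypothetical, and the exact schema should be checked against your model's fine-tuning documentation.

```python
# Sketch: single-turn vs. multi-turn training records for instruction-based fine-tuning.
import json

single_turn = {
    "system": "You are a polite customer-support assistant.",   # optional context
    "messages": [
        {"role": "user", "content": "Where is my order #1234?"},
        {"role": "assistant", "content": "Your order shipped yesterday and arrives Friday."},
    ],
}

multi_turn = {
    "messages": [  # roles must alternate user / assistant
        {"role": "user", "content": "I want to change my delivery address."},
        {"role": "assistant", "content": "Sure, what is the new address?"},
        {"role": "user", "content": "42 Main Street, Springfield."},
        {"role": "assistant", "content": "Done! Your order will be delivered to 42 Main Street."},
    ],
}

with open("chat-train.jsonl", "w") as f:
    for record in (single_turn, multi_turn):
        f.write(json.dumps(record) + "\n")
```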

16
Q

Re-training an FM requires a ____ budget because it involves more intensive computation.

A

higher

17
Q

Instruction-based fine-tuning is usually ____ because the computations are less intensive and less data is typically required.

A

cheaper

18
Q

True/False: Running a fine-tuned model is more expensive because you have to use provisioned throughput.

A

True

19
Q

When fine-tuning a model, you must prepare the ____, do the fine-tuning, and ____ the model.

A

data / evaluate
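
Here is a minimal boto3 sketch of the middle step (the fine-tuning job itself), assuming the training data has already been prepared in S3; evaluation follows once the job completes. The role ARN, bucket names, job/model names, and hyperparameter values are hypothetical placeholders.

```python
# Sketch: starting a Bedrock fine-tuning job on data already prepared in S3 (hypothetical values).
import boto3

bedrock = boto3.client("bedrock")

job = bedrock.create_model_customization_job(
    jobName="support-bot-finetune-1",
    customModelName="support-bot-v1",
    roleArn="arn:aws:iam::111122223333:role/BedrockFineTuneRole",
    baseModelIdentifier="amazon.titan-text-express-v1",   # a fine-tunable base model
    customizationType="FINE_TUNING",                      # vs. CONTINUED_PRE_TRAINING
    trainingDataConfig={"s3Uri": "s3://my-bucket/train.jsonl"},
    outputDataConfig={"s3Uri": "s3://my-bucket/output/"},
    hyperParameters={"epochCount": "2", "learningRate": "0.00001"},  # hypothetical values
)
print(job["jobArn"])  # then poll get_model_customization_job(...) and evaluate the resulting model
```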

20
Q

____ is the broader concept of re-using a pre-trained model to adapt it to a new related task.

A

Transfer Learning (fine-tuning is a type of transfer learning)

21
Q

Transfer learning is widely used for ____ classification and ____ processing.

A

image / natural language

22
Q

A chatbot designed with a particular persona or tone, or geared towards a specific purpose, is a use case for ____.

A

fine-tuning a model

23
Q

Training using more up-to-date information than what the language model previously accessed is a use case of ____.

A

fine-tuning a model

24
Q

Training with exclusive data (e.g. your historical emails or messages, internal records) is a use case of ____.

A

fine-tuning a model

25
Q

____ are curated collections of data designed specifically for evaluating the performance of language models.

A

Benchmark datasets

26
Q

Benchmark datasets are helpful to measure ____, ____, ____, and ____.

A

accuracy, speed, efficiency, scalability

27
Q

Some benchmark datasets allow you to very quickly detect any kind of ____ and potential ____ against a group of people.

A

bias / discrimination

28
Q

True/False: You can create your own benchmark dataset that is specific to your business.

A

True

29
Q

____ evaluation of a model is where users compare model generated answers to benchmark answers.

A

Human

30
Q

____ evaluation of a model is where a “judge” model automatically compares the benchmark answers to the model generated answers.

A

Automatic

31
Q

The purpose of the ____ automated metric for evaluating an FM is to evaluate automatic summarization and machine translation systems.

A

ROUGE - Recall-Oriented Understudy for Gisting Evaluation

32
Q

The ____ automated metric for evaluating an FM measures the number of matching n-grams between reference and generated text.

A

ROUGE-N: N is the size of the n-gram (in words)

33
Q

The ____ automated metric for evaluating an FM compares the longest common subsequence between reference and generated text.

A

ROUGE-L
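
An illustrative (not official-library) Python sketch of the two ideas behind ROUGE-N and ROUGE-L: n-gram overlap recall and the longest common subsequence length.

```python
# Toy ROUGE-N recall and the LCS length at the core of ROUGE-L.
from collections import Counter

def rouge_n_recall(reference, generated, n=1):
    """Fraction of reference n-grams that also appear in the generated text."""
    ref, gen = reference.split(), generated.split()
    ref_ngrams = Counter(tuple(ref[i:i+n]) for i in range(len(ref) - n + 1))
    gen_ngrams = Counter(tuple(gen[i:i+n]) for i in range(len(gen) - n + 1))
    overlap = sum(min(count, gen_ngrams[g]) for g, count in ref_ngrams.items())
    return overlap / max(sum(ref_ngrams.values()), 1)

def lcs_length(reference, generated):
    """Longest common subsequence length (word level)."""
    a, b = reference.split(), generated.split()
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

print(rouge_n_recall("the cat sat on the mat", "the cat lay on the mat", n=2))  # 0.6
print(lcs_length("the cat sat on the mat", "the cat lay on the mat"))           # 5
```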

34
Q

The ____ automated metric for evaluating an FM evaluates the quality of generated text, especially for translations.

A

BLEU: Bilingual Evaluation Understudy

35
Q

The BLEU automated metric considers both ____ and ____ too much brevity.

A

precision / penalizes
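
A simplified sketch of BLEU's two ingredients from the card above: n-gram precision and a brevity penalty that punishes candidates much shorter than the reference. Real BLEU implementations add smoothing and support multiple references.

```python
# Toy BLEU: geometric mean of n-gram precisions times a brevity penalty.
import math
from collections import Counter

def ngram_precision(reference, candidate, n):
    ref, cand = reference.split(), candidate.split()
    ref_ngrams = Counter(tuple(ref[i:i+n]) for i in range(len(ref) - n + 1))
    cand_ngrams = Counter(tuple(cand[i:i+n]) for i in range(len(cand) - n + 1))
    matched = sum(min(count, ref_ngrams[g]) for g, count in cand_ngrams.items())
    return matched / max(sum(cand_ngrams.values()), 1)

def bleu(reference, candidate, max_n=4):
    precisions = [ngram_precision(reference, candidate, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    r, c = len(reference.split()), len(candidate.split())
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)   # penalize overly short output
    return brevity_penalty * geo_mean

print(bleu("the cat is on the mat", "the cat is on the mat"))  # 1.0 (perfect match)
print(bleu("the cat is on the mat", "the cat is on"))          # ~0.61: correct but too short
```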

36
Q

The ____ automated metric for evaluating an FM looks at the semantic similarity between generated and reference text.

A

BERTScore

37
Q

Which automated metric for evaluating an FM uses pre-trained BERT models to compare the contextualized embeddings of both texts and computes the cosine similarity between them?

A

BERTScore

38
Q

True/False: The automatic metric BERTScore is capable of capturing more nuance between texts when evaluating a foundation model.

A

True

39
Q

The ____ automated metric measures how well the model predicts the next token (lower is better).

A

perplexity
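
A tiny worked example of perplexity: the exponential of the average negative log-likelihood the model assigns to the true next tokens, so lower values mean better predictions. The probability values below are made up for illustration.

```python
# Toy perplexity calculation from per-token probabilities of the correct next token.
import math

def perplexity(token_probabilities):
    """token_probabilities: probability the model gave to each true next token."""
    nll = [-math.log(p) for p in token_probabilities]
    return math.exp(sum(nll) / len(nll))

confident_model = [0.9, 0.8, 0.95, 0.85]   # high probability on the right tokens
uncertain_model = [0.2, 0.1, 0.3, 0.25]

print(perplexity(confident_model))   # ~1.15 — low, better
print(perplexity(uncertain_model))   # ~5.1  — high, worse
```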

40
Q

Which business metric is used to evaluate a model by gathering user feedback and assessing their satisfaction with the model's responses?

A

User Satisfaction

41
Q

Which business metric used to evaluate a model calculates the average revenue per user?

A

Average Revenue Per User (ARPU)

42
Q

Which business metric is used to evaluate a model by measuring the model's ability to perform tasks across different domains?

A

Cross-Domain Performance

43
Q

Which business metric is used to evaluate whether a model's recommendations generate desired outcomes, such as purchases?

A

Conversion Rate

44
Q

Which business metric is used to evaluate a model's efficiency in computation and resource utilization (e.g., improving production-line efficiency)?

A

Efficiency

45
Q

Identify the Model Evaluation task type option described when creating an automatic model evaluation.
The model summarizes text based on the prompts that you provide.

a) general text generation b) question and answer
c) text summarization d) text classification

A

text summarization

46
Q

Identify the Model Evaluation task type option described when creating an automatic model evaluation.
The model performs natural language processing and text generation tasks

a) general text generation b) question and answer
c) text summarization d) text classification

A

general text generation

47
Q

Identify the Model Evaluation task type option described when creating an automatic model evaluation.
The model categorizes text into predefined classes based on the input dataset.

a) general text generation b) question and answer
c) text summarization d) text classification

A

text classification

48
Q

Identify the Model Evaluation task type option described when creating an automatic model evaluation.
The answers the model provides are based on your prompts.

a) general text generation b) question and answer
c) text summarization d) text classification

A

question and answer

49
Q

Identify the Model Evaluation metric option described when creating an automatic model evaluation.
Examines the model's ability to encode factual knowledge about the real world.

a) accuracy
b) toxicity
c) robustness

A

accuracy

50
Q

Identify the Model Evaluation metric option described when creating an automatic model evaluation.
Gauges the propensity to generate harmful, offensive, or inappropriate content.

a) accuracy
b) toxicity
c) robustness

A

toxicity

51
Q

Identify the Model Evaluation metric option described when creating an automatic model evaluation.
Assesses the degree to which minor, semantic-preserving changes impact the model's output.

a) accuracy
b) toxicity
c) robustness

A

robustness

52
Q

When building an automatic model evaluation, what two options are provided for choosing a dataset?

A

use a built-in dataset or provide your own

53
Q

What is the difference in model evaluation task type options when you are setting up automatic vs human model evaluation?

A

With human evaluation, you have the option to provide a custom task type.

54
Q

Identify the model evaluation metric described.
Measures the linguistic quality of the generated text.

a) Coherence
b) Fluency
c) Correctness
d) Accuracy

A

Fluency

55
Q

Identify the model evaluation metric described.
Measures the organization and structure of the generated text.

a) Coherence
b) Fluency
c) Correctness
d) Consistency

A

Coherence

56
Q

Identify the model evaluation metric described.
Measures the harmfulness of the generated text.

a) Coherence
b) Toxicity
c) Correctness
d) Relevance

A

Toxicity

57
Q

Identify the model evaluation metric described.
Indicates the accuracy of the generated text.

a) Accuracy
b) Completeness
c) Correctness
d) Relevance

A

Accuracy

58
Q

Identify the model evaluation metric described.
Measures a generated summary’s factual consistency.

a) Accuracy
b) Completeness
c) Correctness
d) Consistency

A

Consistency

59
Q

Identify the model evaluation metric described.
Measures a generated summary’s inclusion of relevant knowledge or facts

a) Accuracy
b) Completeness
c) Relevance
d) Consistency

A

Relevance

60
Q

Identify the model evaluation metric described.
Measures a generated answer’s satisfaction in the context of the question.

a) Accuracy
b) Completeness
c) Relevance
d) Correctness

A

Correctness

61
Q

Identify the model evaluation metric described.
Measures a generated answer’s inclusion of all relevant information.

a) Accuracy
b) Completeness
c) Relevance
d) Correctness

A

Completeness

62
Q

____ allows a foundation model to reference a data source outside its training data without being fine-tuned.

A

RAG - Retrieval Augmented Generation

63
Q

RAG is often used when ____ data needs to be fed into a foundation model.

A

real-time

64
Q

How does Amazon Bedrock use RAG?

A

It searches an external knowledge base for information related to the query, then passes the retrieved data along with the original query to the foundation model.
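
A sketch of this flow using the Bedrock Knowledge Bases retrieve-and-generate API: the service retrieves relevant chunks from the knowledge base and passes them, together with the original query, to the chosen foundation model. The knowledge base ID and model ARN are placeholders, and the parameter shapes should be verified against the current boto3 documentation.

```python
# Sketch: RAG with a Bedrock knowledge base via retrieve_and_generate (hypothetical IDs).
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")

response = agent_runtime.retrieve_and_generate(
    input={"text": "What is our parental-leave policy?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB12345678",  # hypothetical knowledge base
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
        },
    },
)

print(response["output"]["text"])       # grounded answer
print(response.get("citations", []))    # which documents were retrieved
```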

65
Q

What two Amazon services can be used as a RAG vector database with Bedrock?

A

OpenSearch Service and Aurora

66
Q

What are the 5 vector database options provided by Amazon Bedrock?

A

The AWS services OpenSearch Service and Aurora, as well as MongoDB, Redis, and Pinecone.

67
Q

A ____ model takes information from S3 and creates the entries in the vector database.

A

embeddings
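
A sketch of that ingestion step: invoking the Amazon Titan embeddings model on a document chunk to produce the vector that would then be written to the vector database. The request/response field names assume the Titan Text Embeddings format; verify against the model documentation.

```python
# Sketch: turning a document chunk into an embedding vector with Amazon Titan.
import json
import boto3

runtime = boto3.client("bedrock-runtime")

chunk = "Our warranty covers manufacturing defects for 24 months."

response = runtime.invoke_model(
    modelId="amazon.titan-embed-text-v1",
    body=json.dumps({"inputText": chunk}),
)
vector = json.loads(response["body"].read())["embedding"]

print(len(vector))   # embedding dimensionality
# 'vector' (plus the original chunk and metadata) would then be inserted
# into the vector database: OpenSearch, Aurora, Pinecone, etc.
```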

68
Q

What are the two embeddings models AWS makes available?

A

Amazon Titan / Cohere

69
Q

What are two common relational databases used as RAG vector databases?

A

Amazon Aurora - proprietary on AWS
Amazon RDS for PostgreSQL - open-source

70
Q

What graph database can be used as a RAG vector database?

A

Amazon Neptune

71
Q

True/False: Amazon DocumentDB (with MongoDB compatibility) can be used as a vector database.

A

True: it supports real-time similarity queries and can store millions of vector embeddings.

72
Q

What are some key features of Amazon OpenSearch Service for RAG vector database use?

A

real-time similarity queries, store millions of vector embeddings, scalable index management, and fast nearest-neighbor (kNN) search capability

73
Q

What are some common RAG data sources supported by Amazon Bedrock?

A

Amazon S3, Confluence, SharePoint, Salesforce, web pages

74
Q

Describe a RAG use case for a Customer Service Chatbot.

A

the knowledge base could be products, features, specifications, troubleshooting guides and FAQs. The RAG application is the chatbot that can answer customer queries.

75
Q

Describe a RAG use case for Legal Research and Analysis.

A

the knowledge base could be the laws, regulations, case precedents, legal opinions and expert analysis. The RAG application is a chatbot that can provide relevant information for specific legal queries.

76
Q

Describe a RAG use case for Healthcare Question-Answering

A

the knowledge base could be diseases, treatments, clinical guidelines, research papers, patients, etc. The RAG application is a chatbot that can answer complex medical queries.