Developing Generative AI Solutions Flashcards

1
Q

Key Metrics in Defining a Use Case

A

Cost savings
Time savings
Quality improvements
Customer satisfaction
Productivity gains

2
Q

Generative AI App Lifecycle

A

Define the use case → select a foundation model → improve performance (prompt engineering, fine-tuning, agents) → evaluate results → deploy the application.
3
Q

Latency is the most crucial criterion for

A

a real-time application on resource-constrained mobile devices.

4
Q

Prompt engineering refers to

A

the process of carefully crafting the input prompts or instructions given to the model to generate desired outputs or behaviors. It aims to optimize the prompts to steer the model’s generation in the desired direction, using the model’s capabilities while mitigating potential biases or undesirable outputs.

5
Q

PE: Augmentation:

A

Incorporating additional information or constraints into the prompts, such as examples, demonstrations, or task-specific instructions, to guide the model’s generation process
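
For illustration, a minimal few-shot augmentation sketch in Python; the task and example reviews are hypothetical:

# A few-shot prompt: task instructions plus worked examples
# steer the model toward the desired output and format.
few_shot_prompt = """Classify each review as Positive or Negative.

Review: "The battery lasts all day."
Sentiment: Positive

Review: "The screen cracked within a week."
Sentiment: Negative

Review: "Setup was quick and painless."
Sentiment:"""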

6
Q

PE: Tuning:

A

Iteratively refining and adjusting the prompts based on the model’s outputs and performance, often through human evaluation or automated metrics.
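
A minimal sketch of automated prompt tuning in Python; generate and score are hypothetical placeholders for a model call and a quality metric:

# Try each candidate prompt, score the model's output against a
# reference, and keep the best-scoring prompt.
def tune_prompts(candidates, generate, score, reference):
    best_prompt, best_score = None, float("-inf")
    for prompt in candidates:
        output = generate(prompt)      # model call (placeholder)
        s = score(output, reference)   # e.g., ROUGE or F1
        if s > best_score:
            best_prompt, best_score = prompt, s
    return best_prompt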

7
Q

PE: Ensembling:

A

Combining multiple prompts or generation strategies to improve the overall quality and robustness of the outputs
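
A minimal majority-vote ensembling sketch in Python; generate is a hypothetical placeholder for a model call:

from collections import Counter

# Ask the model with several prompt variants and keep the
# answer the most variants agree on.
def ensemble_answer(prompts, generate):
    answers = [generate(p) for p in prompts]
    return Counter(answers).most_common(1)[0][0]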


8
Q

PE: Mining:

A

Exploring and identifying effective prompts through techniques like prompt searching, prompt generation, or prompt retrieval from large prompt libraries

9
Q

PE: Design:

A

Crafting clear, unambiguous, and context-rich prompts that effectively communicate the desired task or output to the model

10
Q

Fine-tuning

A

Fine-tuning refers to the process of taking a pre-trained language model and further training it on a specific task or domain-specific dataset. Fine-tuning allows the model to adapt its knowledge and capabilities to better suit the requirements of the business use case.

11
Q

There are two ways to fine-tune a model:

A

1
Instruction fine-tuning uses examples of how the model should respond to a specific instruction (an illustrative record format is sketched after this list). Prompt tuning is a type of instruction fine-tuning.

2
Reinforcement learning from human feedback (RLHF) provides human feedback data, resulting in a model that is better aligned with human preferences.
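
An illustrative instruction fine-tuning record in Python; the field names are an assumption, since real schemas vary by tool and provider:

# Each record pairs an instruction (plus optional input) with the
# desired response; field names here are illustrative only.
training_examples = [
    {
        "instruction": "Summarize the customer ticket in one sentence.",
        "input": "Order #1234 arrived damaged and support has not replied.",
        "output": "A customer received a damaged order and is awaiting a support reply.",
    },
]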

12
Q

Fine-tuning is particularly useful when

A

the target task has a limited amount of training data. This is because the pre-trained model can provide a strong foundation of general knowledge, which is then specialized during fine-tuning.

13
Q

Pursuing a more customized approach, such as training a model from scratch or heavily fine-tuning a pre-trained model, can potentially yield higher accuracy and better performance tailored to the specific use case. However,

A

this customization comes at a higher cost in terms of computational resources, data acquisition, and specialized expertise required for training and optimization.

14
Q

By using these, organizations can achieve higher levels of automation, consistency, and efficiency in their cloud operations, while also improving visibility, control, and auditability of the processes involved.

A

By using agents for multi-step tasks.
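
A minimal agent-loop sketch in Python; plan_next_step, the step fields, and the tools mapping are all hypothetical placeholders:

# The model plans the next action, a tool executes it, and the
# observation is fed back until the model signals completion.
def run_agent(task, plan_next_step, tools, max_steps=10):
    history = [task]
    for _ in range(max_steps):
        step = plan_next_step(history)            # model picks an action
        if step["name"] == "finish":
            return step["result"]
        observation = tools[step["name"]](step["args"])
        history.append(observation)               # context for next turn
    raise RuntimeError("Agent did not finish within max_steps")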

15
Q

Fine-tuning a pre-trained language model on domain-specific data is generally

A

the most cost-effective approach for customizing the model to a specific domain while maintaining high performance.

16
Q

Benchmark Data Sets
The General Language Understanding Evaluation (GLUE)

A

benchmark is a collection of datasets for evaluating language understanding tasks like text classification, question answering, and natural language inference.
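
One way to load a GLUE task for evaluation, assuming the Hugging Face datasets library is installed:

from datasets import load_dataset

# SST-2 is GLUE's sentiment-classification task.
sst2 = load_dataset("glue", "sst2")
print(sst2["train"][0])  # a sentence with its 0/1 sentiment label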

17
Q

SuperGLUE is an extension of GLUE with

A

more challenging tasks and a focus on compositional language understanding.

18
Q

Stanford Question Answering Dataset (SQuAD) is a dataset for

A

evaluating question-answering capabilities.

19
Q

Workshop on Machine Translation (WMT) is a series of datasets and tasks for

A

evaluating machine translation systems.

20
Q

Automated Metrics
Automated metrics can provide a quick and scalable way to evaluate foundation model performance. These metrics typically measure specific aspects of the model’s outputs, such as

A

Perplexity (a measure of how well the model predicts the next token; sketched below)
BLEU score (for evaluating machine translation)
F1 score (for evaluating classification or entity recognition tasks)
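
A minimal perplexity sketch in Python, assuming you can read off the probability the model assigned to each actual next token:

import math

# Perplexity is the exponential of the average negative
# log-likelihood of the observed tokens; lower is better.
def perplexity(token_probs):
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

print(perplexity([0.5, 0.25, 0.8]))  # ~2.15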

21
Q

Recall-Oriented Understudy for Gisting Evaluation (ROUGE) is a set of metrics used for

A

evaluating automatic summarization and machine translation systems. It measures the quality of a generated summary or translation by comparing it to one or more reference summaries or translations.
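
A simplified ROUGE-1 recall sketch in Python (real ROUGE implementations use clipped counts and also report precision and F1):

# Fraction of reference unigrams that also appear in the
# generated summary (recall-oriented, as the name says).
def rouge1_recall(candidate, reference):
    cand_tokens = set(candidate.lower().split())
    ref_tokens = reference.lower().split()
    matches = sum(1 for t in ref_tokens if t in cand_tokens)
    return matches / len(ref_tokens)

print(rouge1_recall("the cat sat on the mat", "a cat sat on a mat"))  # 4/6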

22
Q

Bilingual Evaluation Understudy (BLEU) is a metric used to evaluate the quality of

A

machine-generated text, particularly in the context of machine translation. It measures the similarity between a generated text and one or more reference translations, considering both precision and brevity.
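
A quick check with NLTK, assuming it is installed; bigram weights are used because these toy sentences are too short for the default 4-gram BLEU:

from nltk.translate.bleu_score import sentence_bleu

reference = "the cat sat on the mat".split()
hypothesis = "the cat is on the mat".split()

# weights=(0.5, 0.5) averages unigram and bigram precision.
print(sentence_bleu([reference], hypothesis, weights=(0.5, 0.5)))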

23
Q

BERTScore is a metric that evaluates

A

the semantic similarity between a generated text and one or more reference texts. It uses pre-trained Bidirectional Encoder Representations from Transformers (BERT) models to compute contextualized embeddings for the input texts, and then calculates the cosine similarity between them.
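
Example usage, assuming the bert-score package (pip install bert-score):

from bert_score import score

# Candidates and references are parallel lists of strings;
# score() returns precision, recall, and F1 tensors.
P, R, F1 = score(
    ["the weather is cold today"],
    ["it is freezing outside today"],
    lang="en",
)
print(F1.mean().item())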

24
Q

Human evaluators can provide qualitative feedback on factors like

A

coherence, relevance, factuality, and overall quality of the model’s outputs.

25
Q

Metrics like ROUGE, BLEU, and BERTScore provide an initial assessment of

A

the foundation model’s capabilities.