Implement generative AI solutions (10–15%) Flashcards

1
Q

Describe the four-stage process to develop and implement a plan for responsible AI when using generative models.

A
  1. Identify potential harms that are relevant to your planned solution.
  2. Measure the presence of these harms in the outputs generated by your solution.
  3. Mitigate the harms at multiple layers in your solution to minimize their presence and impact, and ensure transparent communication about potential risks to users.
  4. Operate the solution responsibly by defining and following a deployment and operational readiness plan.
2
Q

What steps do you take to measure identified potential harms?

A
  1. Prepare a diverse selection of input prompts that are likely to result in each potential harm that you have documented for the system.
  2. Submit the prompts to the system and retrieve the generated output.
  3. Apply pre-defined criteria to evaluate the output and categorize it according to the level of potential harm it contains. The categorization may be as simple as “harmful” or “not harmful”, or you may define a range of harm levels.
3
Q

Should testing of potential harms be manual or automated?

A

In most scenarios, you should start by manually testing and evaluating a small set of inputs to ensure the test results are consistent and your evaluation criteria are sufficiently well defined. Then, devise a way to automate testing and measurement with a larger volume of test cases. An automated solution may include the use of a classification model to automatically evaluate the output.

Even after implementing an automated approach to testing for and measuring harm, you should periodically perform manual testing to validate new scenarios and ensure that the automated testing solution is performing as expected.
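The automated measurement step can be sketched as follows. This is a minimal illustration only: the keyword rule stands in for a trained classification model, and `HARMFUL_TERMS` and the sample outputs are hypothetical evaluation criteria, not part of any Azure service.

```python
# Minimal sketch of automated harm measurement. A real solution would
# submit prompts to the deployed model and evaluate each response with a
# classification model; a keyword rule stands in for both here.

HARMFUL_TERMS = {"weapon", "exploit"}  # hypothetical evaluation criteria


def classify_output(text: str) -> str:
    """Categorize a generated output as 'harmful' or 'not harmful'."""
    lowered = text.lower()
    return "harmful" if any(t in lowered for t in HARMFUL_TERMS) else "not harmful"


def measure(outputs: list[str]) -> dict[str, int]:
    """Tally harm categories across a batch of generated outputs."""
    counts = {"harmful": 0, "not harmful": 0}
    for out in outputs:
        counts[classify_output(out)] += 1
    return counts


results = measure(["How to build a weapon", "A recipe for soup"])
```

Periodic manual spot-checks against the same categories then validate that the automated classifier still agrees with human judgment.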

4
Q

What are the four layers at which you can mitigate potential harms from a generative AI model?

A
  1. Model
  2. Safety System
  3. Metaprompt and grounding
  4. User experience
5
Q

What are examples of mitigations that could be used in the model layer?

A
  1. Selecting a model that is appropriate for the intended solution use. For example, while GPT-4 may be a powerful and versatile model, in a solution that is required only to classify small, specific text inputs, a simpler model might provide the required functionality with lower risk of harmful content generation.
  2. Fine-tuning a foundational model with your own training data so that the responses it generates are more likely to be relevant and scoped to your solution scenario.
6
Q

Define the safety system layer

A

The safety system layer includes platform-level configurations and capabilities that help mitigate harm.

7
Q

Describe the metaprompt and grounding layer

A

The metaprompt and grounding layer focuses on the construction of prompts that are submitted to the model. Harm mitigation techniques that you can apply at this layer include:
1. Specifying metaprompts or system inputs that define behavioral parameters for the model.
2. Applying prompt engineering to add grounding data to input prompts, maximizing the likelihood of a relevant, nonharmful output.
3. Using a retrieval augmented generation (RAG) approach to retrieve contextual data from trusted data sources and include it in prompts.
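The techniques above can be combined in a single sketch: a system message defines behavioral parameters, and retrieved chunks ground the prompt. The function, the grounding text, and the assistant persona are all illustrative, not an Azure API; in a real RAG solution the chunks would come from a trusted data source rather than a hard-coded list.

```python
# Sketch of the metaprompt-and-grounding layer: a system message sets
# behavioral parameters, and retrieved context grounds the user prompt.


def build_grounded_messages(system_prompt: str,
                            grounding_chunks: list[str],
                            user_question: str) -> list[dict]:
    """Assemble a ChatCompletion-style message list with grounding context."""
    context = "\n".join(grounding_chunks)
    return [
        {"role": "system",
         "content": f"{system_prompt}\nAnswer ONLY from the context below.\nContext:\n{context}"},
        {"role": "user", "content": user_question},
    ]


messages = build_grounded_messages(
    "You are a polite assistant for Contoso support.",  # behavioral parameters
    ["Contoso support hours are 9am-5pm weekdays."],    # retrieved grounding data
    "When is support open?",
)
```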

8
Q

How can you find the regions available for a service using a CLI command?

A

az account list-locations

9
Q

What are the current types of models available in Azure OpenAI?

A
  1. GPT-4 models are the latest generation of generative pretrained transformer (GPT) models that can generate natural language and code completions based on natural language prompts.
  2. GPT-3.5 models can generate natural language and code completions based on natural language prompts. In particular, GPT-35-turbo models are optimized for chat-based interactions and work well in most generative AI scenarios.
  3. Embeddings models convert text into numeric vectors, and are useful in language analytics scenarios such as comparing text sources for similarities.
  4. DALL-E models are used to generate images based on natural language prompts. Currently, DALL-E models are in preview.
  5. Whisper models are used to convert speech to text.
  6. Text to speech models are used to convert text to speech.
10
Q

What are some prompt types and what could the completions look like?

A

Task type | Prompt example | Completion example
Classifying content | Tweet: I enjoyed the trip. | Sentiment: Positive
Generating new content | List ways of traveling | 1. Bike 2. Car …
Holding a conversation | A friendly AI assistant | See examples
Transformation (translation and symbol conversion) | English: Hello | French: bonjour
Summarizing content | Provide a summary of the content {text} | The content shares methods of machine learning.
Picking up where you left off | One way to grow tomatoes is to | plant seeds.
Giving factual responses | How many moons does Earth have? | One.

11
Q

What factors can affect the quality of completions you’ll get from a generative AI solution?

A
  1. The way a prompt is engineered.
  2. The model parameters
  3. The data the model is trained on, which can be adapted through model fine-tuning with customization.

You have more control over the completions returned by training a custom model than through prompt engineering and parameter adjustment.

12
Q

What are the available endpoints for Azure OpenAI?

A
  1. Completion - model takes an input prompt, and generates one or more predicted completions.
  2. ChatCompletion - model takes input in the form of a chat conversation (where roles are specified with the message they send), and the next chat completion is generated.
  3. Embeddings - model takes input and returns a vector representation of that input.
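The shape of the input each endpoint expects can be sketched as request bodies. Parameter names follow the public Azure OpenAI REST API; the prompt text and the max_tokens value are illustrative.

```python
# Hedged sketch of the request body each Azure OpenAI endpoint accepts.

completion_body = {  # Completion: a plain input prompt
    "prompt": "Write a tagline for a bike shop.",
    "max_tokens": 50,
}

chat_completion_body = {  # ChatCompletion: a conversation with roles
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Suggest a tagline for a bike shop."},
    ],
}

embeddings_body = {  # Embeddings: input to convert to a numeric vector
    "input": "Write a tagline for a bike shop.",
}
```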
13
Q

When prompt engineering, what parameters are most likely to make an impact on the completion?

A

temperature and top_p (top_probability) are the most likely to impact a model’s response as they both control randomness in the model, but in different ways.
Higher values produce more creative and random responses, but will likely be less consistent or focused. Responses expected to be fictional or unique benefit from higher values for these parameters, whereas content desired to be more consistent and concrete should use lower values.
Specifically, high temperature allows for more variation in sentence structure and high top_p allows for more variation in words that are used (using a variety of synonyms).
Try adjusting these parameters with the same prompt to see how they impact the response. It's recommended to adjust either temperature or top_p, but not both at the same time.

14
Q

What does the system message do in a generative AI prompt?

A

The system message is included at the beginning of the prompt and is used to prime the model with context, instructions, or other information relevant to the use case. You can use the system message to describe the assistant’s personality, define what the model should and should not answer, and define the format of model responses.

15
Q

You have an Azure OpenAI solution. The solution uses a specific GPT-35-Turbo model version that was current during initial deployment. Auto-update is disabled.
Sometime later, you investigate the deployed solution and discover that it uses a newer version of the model.
Why was the model version updated?

A

The model version reached its retirement date.

As your use of Azure OpenAI evolves, and you start to build and integrate with applications, you might want to manually control model updates so that you can first test and validate whether model performance remains consistent for a use case before performing an upgrade.
When you select a specific model version for a deployment, that version remains selected until you either manually update it or the model reaches its retirement date. When the retirement date is reached, the model automatically upgrades to the default version.

16
Q

You are creating an application that references the Azure OpenAI REST API for a DALL-E model.
You plan to use thumbnails of the images that DALL-E generates and display them in a table on a webpage.
You need to find the image URLs in the JSON response.
Which element should you review?

A

the result element

The result from the initial request does not immediately return the results of the image generation process. Instead, the response includes an operation-location header with a URL for a callback service that your application code can poll until the results of the image generation are ready. The result element includes a collection of url elements, each of which references a PNG image file generated from the prompt.
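Extracting the URLs from the final polled response can be sketched as below. The status/result/data/url shape follows the DALL-E preview API as described above, but the sample payload itself is illustrative, not real output.

```python
# Sketch: pulling image URLs out of the polled DALL-E result.
# The sample payload mimics the preview API's response shape.

sample_response = {
    "status": "succeeded",
    "result": {
        "data": [
            {"url": "https://example.com/generated-1.png"},
            {"url": "https://example.com/generated-2.png"},
        ]
    },
}


def image_urls(response: dict) -> list[str]:
    """Return the url of each generated image from the result element."""
    return [item["url"] for item in response["result"].get("data", [])]
```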

17
Q

You are deploying an Azure OpenAI service.
You plan to use your own data in the models you will deploy.
You need to ensure that the model can index your data sources.
Which additional Azure service should you deploy?

A

Azure AI Search

Azure OpenAI on your data enables developers to use supported AI chat models that can reference specific sources of information to ground the response. Adding this information allows the model to reference both the specific data provided and its pretrained knowledge to provide more effective responses. Azure OpenAI on your data utilizes the search ability of Azure AI Search to add the relevant data chunks to a prompt.
Azure OpenAI on your data still uses a stateless API to connect to the model, which removes the requirement of training a custom model with your data and simplifies the interaction with the AI model. Azure AI Search first finds the useful information to answer the prompt, and Azure OpenAI forms the response based on that information.
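A sketch of the extra data-source section such a chat request carries, assuming the shape used by the Azure OpenAI "on your data" preview REST API; the endpoint, key, and index name are placeholders, and the exact field names may differ by API version.

```python
# Hedged sketch (assumed preview REST API shape): a chat request gains a
# dataSources section pointing at the Azure AI Search resource. All
# values below are placeholders.

data_sources = [
    {
        "type": "AzureCognitiveSearch",  # API identifier for Azure AI Search
        "parameters": {
            "endpoint": "https://<search-resource>.search.windows.net",
            "key": "<search-admin-key>",
            "indexName": "company-docs",
        },
    }
]

chat_request = {
    "dataSources": data_sources,
    "messages": [{"role": "user", "content": "What is our refund policy?"}],
}
```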

18
Q

You are building a GPT-based chat application that will answer questions about your company.
You plan to use the Using your data feature in Azure OpenAI to ground the model with company data.
Which six types of files can you use to ground the model? Each correct answer presents a complete solution.

A

Currently, only TXT, MD, HTML, PDF, Microsoft Word, and PowerPoint files are supported by the “Using your data” feature in Azure OpenAI.

19
Q

What parameters are required in the body of a call to the Azure OpenAI image generation (DALL-E) endpoint?

A

The request must contain the following parameters in a JSON body:

prompt: The description of the image to be generated
n: The number of images to be generated
size: The resolution of the image to be generated (256x256, 512x512, or 1024x1024)
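Those three parameters as a JSON body (the prompt text here is illustrative):

```python
import json

# The three required parameters for an image-generation request.
request_body = {
    "prompt": "A badger wearing a tuxedo",  # description of the image
    "n": 1,                                 # number of images to generate
    "size": "512x512",                      # 256x256, 512x512, or 1024x1024
}

payload = json.dumps(request_body)  # serialized JSON body for the request
```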