GCP AI Blog Flashcards

Question

Prompt Management

Answer 1

What it is: Vertex AI Prompt Management, now in preview, provides a library of prompts for use among teams, including versioning, the option to restore old prompts, and AI-generated suggestions to improve prompt performance. Why it matters: This feature makes it easier for organizations to get the best performance from gen AI models at scale, and to iterate more quickly from experimentation to production. Customers can compare prompt iterations side by side to assess how small changes impact outputs, and the service offers features like notes and tagging to boost collaboration.

Answer 2

What it is: We now support Gen AI Evaluation in GA to help users evaluate any generative model or application. While leaderboards and reports offer insights into overall model performance, they don't reveal how a model handles your specific needs. The Gen AI Evaluation Service helps you define your own evaluation criteria, ensuring a clear understanding of how well generative AI models and applications align with your unique use case. Users can define their own evaluation metrics and also access our templates for metrics with various criteria (e.g. text quality, instruction following, fluency). Why it matters: Evaluation is important at every step of your gen AI development process including model selection, prompt engineering, and model customization. Evaluating gen AI is integrated within Vertex AI to help you launch and reuse evaluations as needed.

Answer 3

Vertex AI Agent Builder allows you to easily and quickly build and customize AI Agents - for any skill level. A core component of the Vertex AI Agent Builder is Vertex AI Search, enabling you to ground the models in your data or the web.

Answer 4

What it is: High-fidelity mode is powered by a version of Gemini 1.5 Flash that’s been fine-tuned to only use customer-provided content to generate answers and ensures high levels of factuality in response. Why it matters: In data-intensive industries like financial services, healthcare, and insurance, generative AI use cases often require the generated response to be sourced from only the provided context, not the model’s world knowledge. Grounding with high-fidelity, announced in experimental preview, is purpose-built to support such grounding use cases, including summarization across multiple documents, data extraction against a set corpus of financial data, or processing across a predefined set of documents.

Answer 5

What it is: Vector Search, the ultra high performance vector database powering Vertex AI Search, DIY RAG, and other embedding use cases at global scale, now offers hybrid search in Public Preview. Why it matters: Embeddings are numerical representations that capture semantic relationships across complex data (text, images, etc.). Embeddings power multiple use cases, including recommendation systems, ad serving, and semantic search for RAG. Hybrid search combines vector-based and keyword-based search techniques to ensure the most relevant and accurate responses for users.

Answer 6

What it is: An agent development SDK and container runtime for LangChain. With LangChain on Vertex AI you can select the model you want to work with, define tools to access external APIs, structure the interface between the user and the system components in an orchestration framework, and deploy the framework to a managed runtime. Why it matters: LangChain on Vertex AI simplifies and speeds up deployment while being secure, private and scalable.

Answer 7

What it is: Vertex AI extensions are pre-built reusable modules to connect a foundation model to a specific API or tool. For example, our new code interpreter extension enables models to execute tasks that entail running Python code, such as data analysis, data visualization, and mathematical operations. Vertex AI function calling enables a user to describe a set of functions or APIs and have Gemini models intelligently select, for a given query, the right API or function to call, along with the appropriate API parameters. Vertex AI data connectors help ingest data from enterprise and third-party applications like ServiceNow, Hadoop, and Salesforce, connecting generative applications to commonly-used enterprise systems.

Answer 8

What it is: Genkit is an open-source TypeScript/JavaScript and Go framework designed by Firebase to simplify the development, deployment, and monitoring of production-ready AI applications. Why it matters: With the Vertex AI plugin for Genkit, developers can now take advantage of Google models like Gemini and Imagen 2, as well as text embeddings. Additionally Vertex Eval Service is baked into the Genkit local development experience along with OpenTelemetry tracing.

Answer 9

What it is: LlamaIndex on Vertex AI simplifies building your own search engine for retrieval-augmented generation (RAG), from data ingestion and transformation to embedding, indexing, retrieval, and generation. Why it matters: Vertex AI customers can leverage Google’s models and AI-optimized infrastructure alongside LlamaIndex’s simple, flexible, open-source data framework, to connect custom data sources to generative models.

Answer 10

The revolutionary nature of generative AI requires a platform that offers privacy, security, control, and compliance capabilities organizations can rely on. Google Cloud is committed to helping our customers leverage the full potential of generative AI with privacy, security, and compliance capabilities. Our goal is to build trust by protecting systems, enabling transparency, and offering flexible, always-available infrastructure, all while grounding efforts in our AI principles.

Answer 11

What it is: With Dynamic Shared Quota, we offer increasing the quota limits for a model (online serving) to the maximum allowed per region. This way we limit the number of queries per second (QPS) that customers can run by the shared capacity of all the queries running on a Servo station (multi-region), instead of limiting a customer’s QPS by a quota. Dynamic Shared Quota is only applicable to Pay-as-you-go Online Serving. For customers that require a consistent or more predictable service level, including SLAs, we offer Provision Throughput. Why it matters: By dynamically distributing on-demand capacity among all queries being processed for Pay-as-you-go customers, Google Cloud has eliminated the need to submit quota increase requests (QIRs). Customers can still set a self-imposed quota called a consumer quota override to control cost and prevent budget overruns.

Answer 12

What it is: Provisioned throughput lets customers responsibly scale their usage of Google’s first-party models, like 1.5 Flash, providing assurances for both capacity and price. Why it matters: This Vertex AI feature brings predictability and reliability to customer production workloads, giving them the assurance required to scale gen AI workloads aggressively. We have also made it easier than ever for customers to set up PT via a Self Service flow. Customers can now estimate their needs and purchase Provisioned Throughput for Google’s 1P foundation models via the console, bringing the E2E experience down from weeks to minutes for pre-approved orders subject to available capacity and removing the need for manual order forms.

Answer 13

What it is: We have data residency for data stored at-rest guarantees in 23 countries (13 of which were added in 2024), with additional guarantees around limiting related ML processing to the US and EU. We are also working on expanding our ML processing commitments to eight more countries, starting with four countries in 2024. Why it matters: Customers, especially those from regulated industries, demand control over where their data is stored and processed when using generative AI capabilities.