GCP AI Blog Flashcards
Updates to Gemini Models
What it is: The newly updated versions of the Gemini 1.5 Pro and Flash models, both GA, deliver quality improvements in math, long-context understanding, and vision.
Why it matters: Our objective is to bring you the best models suited for enterprise use cases, by pushing the boundaries across performance, latency, and costs. From a latency standpoint, the new version of Gemini 1.5 Flash is nearly 2.5x faster than GPT-4o mini.
Get started: Access Gemini 1.5 Pro and Flash in the Google Cloud console.
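Beyond the console, the updated models can be called through the Vertex AI Python SDK. The sketch below is a minimal, hedged example: the project ID and region are placeholders, and the SDK import is kept inside the function so the snippet stays importable without google-cloud-aiplatform installed.

```python
# Minimal sketch: calling an updated Gemini 1.5 model through the
# Vertex AI Python SDK (google-cloud-aiplatform). "your-project-id"
# and "us-central1" are placeholders; "gemini-1.5-flash-002" and
# "gemini-1.5-pro-002" are the model IDs for these updated versions.

def ask_gemini(prompt: str, model_id: str = "gemini-1.5-flash-002") -> str:
    """Send one text prompt to Gemini on Vertex AI and return the reply."""
    # SDK imports are local so this sketch can be loaded without the
    # SDK (or GCP credentials) present.
    import vertexai
    from vertexai.generative_models import GenerativeModel

    vertexai.init(project="your-project-id", location="us-central1")
    return GenerativeModel(model_id).generate_content(prompt).text

# Example (requires GCP credentials, so not executed here):
# print(ask_gemini("Summarize the latest Vertex AI updates in one line."))
```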
Reduced Gemini 1.5 Pro Pricing
What it is: We are reducing the cost of Gemini 1.5 Pro by 50% across both input and output tokens, effective October 7, 2024, on Vertex AI.
Why it matters: We are committed to making AI accessible for every enterprise. In August, we improved Gemini 1.5 Flash to reduce costs by up to 80% (see below). These world-class models can be coupled with capabilities like context caching to further reduce the cost and latency of your long context queries. Using Batch API instead of standard requests can further optimize costs for latency-insensitive tasks.
Get started: Visit the pricing page to learn more.
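Context caching, mentioned above, can be sketched as follows: cache a large shared prefix (for example, a long document) once, then run follow-up queries against the cached content. This is a hedged example; the module path shown is the preview API (vertexai.preview.caching), and names may differ in later SDK versions.

```python
# Hedged sketch of context caching on Vertex AI: pay for the long,
# reused prefix once, then issue cheaper queries against it.
import datetime

def query_with_cache(big_document: str, question: str) -> str:
    # SDK imports are local so this sketch can be loaded without the
    # SDK (or GCP credentials) present.
    import vertexai
    from vertexai.preview import caching
    from vertexai.preview.generative_models import GenerativeModel

    vertexai.init(project="your-project-id", location="us-central1")
    cache = caching.CachedContent.create(
        model_name="gemini-1.5-pro-002",
        contents=[big_document],          # the expensive, reused prefix
        ttl=datetime.timedelta(hours=1),  # how long the cache lives
    )
    model = GenerativeModel.from_cached_content(cached_content=cache)
    return model.generate_content(question).text
```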
Updates to Imagen 3
What it is: Imagen 3 is Google’s latest image generation model, delivering outstanding image quality, multi-language support, built-in safety features like Google DeepMind’s SynthID digital watermarking, and support for multiple aspect ratios.
Why it matters: There are several improvements over Imagen 2 — including over 40% faster generation for rapid prototyping and iteration; better prompt understanding and instruction-following; photo-realistic generations, including of groups of people; and greater control over text rendering within an image.
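Generating an image with Imagen 3 through the Vertex AI SDK looks roughly like the sketch below. The model ID "imagen-3.0-generate-001" and the parameter names are assumptions to verify against current documentation.

```python
# Hedged sketch: generating one image with Imagen 3 on Vertex AI.
IMAGEN_MODEL_ID = "imagen-3.0-generate-001"  # assumed Imagen 3 model ID

def generate_image(prompt: str, out_path: str = "image.png") -> str:
    # SDK imports are local so this sketch can be loaded without the
    # SDK (or GCP credentials) present.
    import vertexai
    from vertexai.preview.vision_models import ImageGenerationModel

    vertexai.init(project="your-project-id", location="us-central1")
    model = ImageGenerationModel.from_pretrained(IMAGEN_MODEL_ID)
    images = model.generate_images(
        prompt=prompt,
        number_of_images=1,
        aspect_ratio="16:9",  # one of the supported aspect ratios
    )
    images[0].save(location=out_path)
    return out_path
```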
Controlled generation is now GA
What it is: Controlled generation lets customers define Gemini model outputs according to specific formats or schemas.
Why it matters: Most models cannot guarantee the format and syntax of their outputs, even with specified instructions. Vertex AI controlled generation lets customers choose the desired output format via pre-built options like JSON and ENUM, or by defining custom formats.
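In practice, controlled generation means attaching a response schema to the request so every reply parses the same way. The schema below is an ordinary OpenAPI-style dict (the illustrative "recipe" fields are our own); the generate call is a hedged sketch using the SDK's GenerationConfig.

```python
# Sketch of controlled generation: constrain Gemini's output to JSON
# matching a schema. Uppercase type names follow the Vertex AI schema
# convention; the recipe fields are illustrative.
RECIPE_SCHEMA = {
    "type": "ARRAY",
    "items": {
        "type": "OBJECT",
        "properties": {
            "name": {"type": "STRING"},
            "rating": {"type": "INTEGER"},
        },
        "required": ["name"],
    },
}

def extract_recipes(prompt: str) -> str:
    # SDK imports are local so this sketch can be loaded without the
    # SDK (or GCP credentials) present.
    import vertexai
    from vertexai.generative_models import GenerationConfig, GenerativeModel

    vertexai.init(project="your-project-id", location="us-central1")
    model = GenerativeModel("gemini-1.5-pro-002")
    response = model.generate_content(
        prompt,
        generation_config=GenerationConfig(
            response_mime_type="application/json",
            response_schema=RECIPE_SCHEMA,
        ),
    )
    return response.text  # JSON text conforming to RECIPE_SCHEMA
```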
Batch API
What it is: Batch API (currently in preview) is a super-efficient way to send large numbers of non-latency-sensitive text prompt requests, supporting use cases such as classification and sentiment analysis, data extraction, and description generation.
Why it matters: It helps speed up developer workflows and reduces costs by enabling multiple prompts to be sent to models in a single request.
Supervised fine tuning (SFT) for Gemini 1.5 Flash and Pro
What it is: SFT for Gemini 1.5 Flash and Pro is now generally available. SFT adapts model behavior with a labeled dataset, adjusting the model’s weights to minimize the difference between its predictions and the actual labels.
Why it matters: SFT allows you to tune the model to be more precise for your enterprise task. It’s particularly effective for domain-specific applications where the language or content significantly differs from the data the large model was originally trained on.
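Preparing an SFT run comes down to a labeled JSONL dataset plus a tuning job. Each line below pairs a user turn with the reply the tuned model should produce; the field names follow the Gemini chat "contents" format and the sft.train call is a hedged sketch, both to be checked against the tuning docs.

```python
# Sketch: build a supervised fine-tuning dataset and launch a tuning job.
import json

def make_sft_example(user_text: str, model_text: str) -> dict:
    """One labeled training example: the user prompt and the desired reply."""
    return {
        "contents": [
            {"role": "user", "parts": [{"text": user_text}]},
            {"role": "model", "parts": [{"text": model_text}]},
        ]
    }

def write_dataset(examples: list[dict], path: str) -> None:
    """Write the examples as JSONL, one example per line."""
    with open(path, "w") as f:
        for example in examples:
            f.write(json.dumps(example) + "\n")

def start_tuning(train_uri: str):
    # SDK imports are local so this sketch can be loaded without the
    # SDK (or GCP credentials) present.
    import vertexai
    from vertexai.tuning import sft

    vertexai.init(project="your-project-id", location="us-central1")
    return sft.train(
        source_model="gemini-1.5-flash-002",
        train_dataset=train_uri,  # e.g. "gs://my-bucket/train.jsonl"
    )
```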
Distillation techniques in Vertex AI
What it is: Train smaller, specialized models that inherit the knowledge of the larger Gemini model, achieving comparable performance with the flexibility of self-hosting your custom model on Vertex AI.
Why it matters: Deploying large language models can be a resource-intensive challenge. With distillation techniques in Vertex AI, you can leverage the power of those large models while keeping your deployments lean and efficient.
Prompt Optimizer, now in preview
What it is: Based on Google Research’s publication on automatic prompt optimization (APO) methods, Prompt Optimizer adapts your prompts using the optimal instructions and examples to elicit the best performance from your chosen model.
Why it matters: Vertex AI’s Prompt Optimizer helps you avoid the tedious trial-and-error of prompt engineering. Plus, our prompt strategies guide helps you make the models more verbose and conversational.
Prompt Management SDK
What it is: Vertex AI’s Prompt Management SDK allows users to retrieve and organize prompts. It lets you version prompts, restore old prompts, and generate suggestions to improve performance.
Why it matters: This makes it easier for you to get the best performance from gen AI models at scale, and to iterate more quickly from experimentation to production.
Multimodal function calling
What it is: Function calling is a built-in feature of the Gemini API that translates natural language into structured data and back.
Why it matters: Now, with multimodal function calling, your agents can also execute functions where your user can provide images, along with text, to help the model pick the right function and function parameters to call.
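The flow can be sketched as follows: declare a function with an OpenAPI-style parameter schema, pass an image alongside the text prompt, and read back the function call the model chose. The "find_product" function and its fields are illustrative names of our own, not part of the API.

```python
# Sketch of multimodal function calling: the user supplies an image plus
# text, and Gemini selects a declared function and fills its parameters.
# The function name and schema below are illustrative.
FIND_PRODUCT_PARAMS = {
    "type": "object",
    "properties": {
        "product_name": {
            "type": "string",
            "description": "Product visible in the image",
        },
        "color": {"type": "string"},
    },
    "required": ["product_name"],
}

def pick_function(image_uri: str, question: str):
    # SDK imports are local so this sketch can be loaded without the
    # SDK (or GCP credentials) present.
    import vertexai
    from vertexai.generative_models import (
        FunctionDeclaration,
        GenerativeModel,
        Part,
        Tool,
    )

    vertexai.init(project="your-project-id", location="us-central1")
    find_product = FunctionDeclaration(
        name="find_product",  # illustrative function name
        description="Look up a product from a photo.",
        parameters=FIND_PRODUCT_PARAMS,
    )
    model = GenerativeModel("gemini-1.5-pro-002")
    response = model.generate_content(
        [Part.from_uri(image_uri, mime_type="image/jpeg"), question],
        tools=[Tool(function_declarations=[find_product])],
    )
    # The response names the chosen function and its arguments, which
    # your agent then executes.
    return response.candidates[0].function_calls
```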
Org policy for models in Model Garden
What it is: You can now govern the models available in Model Garden. This includes limiting access to gen AI features, restricting availability to specific vetted models, or disabling tuning and other advanced capabilities. Org policies can be applied to all models in Model Garden, as well as to those imported from Hugging Face through Model Garden. These policies can be set on an organization, folder, or project resource to enforce the constraint on that resource and any child resources.
Why it matters: The ability to control access to the models made available through Model Garden has been a top priority for many customers looking for governance and control tooling. Org policies allow granular access control of models and enhance Google Cloud’s enterprise readiness in an age of many models from many providers.
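Applying such a policy is a matter of writing a policy file and setting it at the desired resource level with `gcloud org-policies set-policy policy.yaml`. The fragment below is a hedged sketch of a boolean-style policy; CONSTRAINT_NAME is a placeholder — take the real Model Garden constraint name from the org policy documentation.

```yaml
# policy.yaml -- hedged sketch; CONSTRAINT_NAME is a placeholder, and
# 123456789 stands in for your organization ID.
name: organizations/123456789/policies/CONSTRAINT_NAME
spec:
  rules:
    - enforce: true
```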
Best models from Google and the industry
We’re committed to providing the best models for enterprises to use: Vertex AI Model Garden provides access to 150+ models from Google, partners, and the open community, so customers can select the right model for their price, performance, and latency requirements.
No matter which foundation model you use, it comes with enterprise-ready tooling and integration with our end-to-end platform.
Gemini 1.5 Flash is GA
What it is: Gemini 1.5 Flash combines low latency, highly competitive pricing, and our 1 million-token context window.
Why it matters: Gemini 1.5 Flash is an excellent option for a wide variety of use cases at scale, from retail chat agents, to document processing, to research agents that can synthesize entire repositories.
Gemini 1.5 Pro, GA with 2-million-token context capabilities
What it is: Now available with an industry-leading context window of up to 2 million tokens, Gemini 1.5 Pro is equipped to unlock unique multimodal use cases that no other model can handle.
Why it matters: Processing just six minutes of video requires over 100,000 tokens and large code bases can exceed 1 million tokens — so whether the use case involves finding bugs across countless lines of code, locating the right information across libraries of research, or analyzing hours of audio or video, Gemini 1.5 Pro’s expanded context window is helping organizations break new ground.
More Languages for Gemini
What it is: We’re enabling Gemini 1.5 Flash and Gemini 1.5 Pro to understand and respond in 100+ languages.
Why it matters: We’re making it easier for our global community to prompt and receive responses in their native languages.