GCP AI Blog Flashcards

1
Q

Updates to Gemini Models

A

What it is: The newly updated versions of Gemini 1.5 Pro and Flash models, both GA, deliver quality improvements in math, long context understanding, and vision.

Why it matters: Our objective is to bring you the best models suited for enterprise use cases, by pushing the boundaries across performance, latency, and costs. From a latency standpoint, the new version of Gemini 1.5 Flash is nearly 2.5x faster than GPT-4o mini.

Get started: Access Gemini 1.5 Pro and Flash in the Google Cloud console.

2
Q

Reduced Gemini 1.5 Pro Pricing

A

What it is: We are reducing costs of Gemini 1.5 Pro by 50% across both input and output tokens, effective on Vertex AI on October 7, 2024.

Why it matters: We are committed to making AI accessible for every enterprise. In August, we improved Gemini 1.5 Flash to reduce costs by up to 80% (see below). These world-class models can be coupled with capabilities like context caching to further reduce the cost and latency of your long-context queries. Using the Batch API instead of standard requests can also optimize costs for latency-insensitive tasks.

Get started: Visit the pricing page to learn more.

3
Q

Updates to Imagen 3

A

What it is: Google’s latest image generation model, delivering outstanding image quality, multi-language support, built-in safety features like Google DeepMind’s SynthID digital watermarking, and support for multiple aspect ratios.

Why it matters: There are several improvements over Imagen 2 — including over 40% faster generation for rapid prototyping and iteration; better prompt understanding and instruction-following; photo-realistic generations, including of groups of people; and greater control over text rendering within an image.

4
Q

Controlled generation is now GA

A

What it is: Controlled generation lets customers define Gemini model outputs according to specific formats or schemas.

Why it matters: Most models cannot guarantee the format and syntax of their outputs, even with specified instructions. Vertex AI controlled generation lets customers choose the desired output format via pre-built options like JSON and ENUM, or by defining custom formats.
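To make the idea concrete, here is a toy Python sketch: a JSON-Schema-style format like the one below is what you supply for structured output, and the small validator shows what "guaranteeing format and syntax" means in practice. The schema contents and the validator are illustrative, not the Vertex AI SDK surface.

```python
# Illustration of controlled generation: the model's decoded output must
# conform to a caller-supplied schema. The schema mirrors the JSON-Schema
# style used for structured output; the validator is a toy, not SDK code.

RECIPE_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "rating": {"type": "integer"},
    },
    "required": ["name", "rating"],
}

PY_TYPES = {"string": str, "integer": int}

def conforms(output, schema):
    """Check a decoded model output against a minimal subset of JSON Schema."""
    if not isinstance(output, dict):
        return False
    if any(key not in output for key in schema.get("required", [])):
        return False
    for key, spec in schema["properties"].items():
        if key in output and not isinstance(output[key], PY_TYPES[spec["type"]]):
            return False
    return True

print(conforms({"name": "flan", "rating": 5}, RECIPE_SCHEMA))    # True
print(conforms({"name": "flan", "rating": "5"}, RECIPE_SCHEMA))  # False: wrong type
```

With controlled generation, the service enforces this kind of constraint during decoding, so downstream parsers never see a malformed response.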

5
Q

Batch API

A

What it is: Batch API (currently in preview) is a highly efficient way to send large numbers of non-latency-sensitive text prompt requests, supporting use cases such as classification and sentiment analysis, data extraction, and description generation.

Why it matters: It helps speed up developer workflows and reduces costs by enabling multiple prompts to be sent to models in a single request.
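The mechanics can be sketched in a few lines of Python: many prompts are packaged into a single JSONL payload, one request per line, and submitted as one job. The field names below are illustrative, not the exact Vertex AI batch request schema.

```python
import json

# Toy sketch of batching: package many non-latency-sensitive prompts into
# one JSONL payload, one JSON request per line, submitted as a single job.

prompts = [
    "Classify the sentiment: 'The service was wonderful.'",
    "Classify the sentiment: 'I waited an hour for a cold meal.'",
]

def to_batch_jsonl(prompts):
    """Serialize each prompt as one JSON request per line."""
    return "\n".join(
        json.dumps({"request": {"contents": [{"role": "user",
                                              "parts": [{"text": p}]}]}})
        for p in prompts
    )

batch_payload = to_batch_jsonl(prompts)
print(len(batch_payload.splitlines()))  # 2 — one line per request
```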

6
Q

Supervised fine-tuning (SFT) for Gemini 1.5 Flash and Pro

A

What it is: SFT for Gemini 1.5 Flash and Pro is now generally available. SFT adapts model behavior with a labeled dataset, adjusting the model’s weights to minimize the difference between its predictions and the actual labels.

Why it matters: SFT allows you to tune the model to be more precise for your enterprise task. It’s particularly effective for domain-specific applications where the language or content significantly differs from the data the large model was originally trained on.
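The core objective — adjusting weights to minimize the gap between predictions and labels — can be illustrated with a toy example. A single logistic neuron stands in for the model here; this is only a sketch of the training principle, not how Gemini tuning is implemented.

```python
import math

# Toy illustration of the SFT objective: gradient descent nudges weights
# so the prediction moves toward the label on each labeled example.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sft_step(w, b, x, y, lr=0.5):
    """One gradient-descent step on binary cross-entropy for one example."""
    p = sigmoid(w * x + b)
    grad = p - y                     # dLoss/dlogit for cross-entropy
    return w - lr * grad * x, b - lr * grad

w, b = 0.0, 0.0
for _ in range(200):                 # repeatedly fit the labeled example
    w, b = sft_step(w, b, x=1.0, y=1.0)

print(sigmoid(w + b) > 0.95)         # True: prediction now matches the label
```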

7
Q

Distillation techniques in Vertex AI

A

What it is: Train smaller, specialized models that inherit the knowledge of the larger Gemini model, achieving comparable performance with the flexibility of self-hosting your custom model on Vertex AI.

Why it matters: Deploying large language models can be a resource-intensive challenge. With distillation techniques in Vertex AI, you can leverage the power of those large models while keeping your deployments lean and efficient.
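The principle behind distillation can be shown in a few lines: a small "student" is trained to match the teacher's soft output distribution, with KL divergence measuring how far apart the two are. This pure-Python example illustrates the objective only; it is not the Vertex AI distillation pipeline.

```python
import math

# Toy sketch of distillation: the student inherits knowledge by matching
# the teacher's soft label distribution, driving KL divergence down.

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q): how poorly the student distribution q matches the teacher p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = softmax([3.0, 1.0, 0.2])          # teacher's soft labels
student_start = softmax([0.0, 0.0, 0.0])    # untrained student: uniform guesses
student_end = softmax([2.9, 1.1, 0.3])      # after training toward the teacher

print(kl_divergence(teacher, student_end) < kl_divergence(teacher, student_start))  # True
```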

8
Q

Prompt Optimizer, now in preview

A

What it is: Based on Google Research’s publication on automatic prompt optimization (APO) methods, Prompt Optimizer adapts your prompts using the optimal instructions and examples to elicit the best performance from your chosen model.

Why it matters: Vertex AI’s Prompt Optimizer helps you avoid the tedious trial-and-error of prompt engineering. Plus, our prompt strategies guide helps you make the models more verbose and conversational.

9
Q

Prompt Management SDK

A

What it is: Vertex AI’s Prompt Management SDK allows users to retrieve and organize prompts. It lets you version prompts, restore old prompts, and generate suggestions to improve performance.

Why it matters: This makes it easier for you to get the best performance from gen AI models at scale, and to iterate more quickly from experimentation to production.
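Prompt versioning is the key mechanic here. A toy in-memory store shows the shape of it — every save keeps prior revisions so an old prompt can be restored. This illustrates the idea only and is not the actual SDK interface.

```python
# Toy sketch of prompt versioning: each save keeps prior revisions,
# so any older prompt can be fetched back and restored.

class PromptStore:
    def __init__(self):
        self._versions = {}

    def save(self, name, text):
        """Store a new revision; returns its 1-based version number."""
        history = self._versions.setdefault(name, [])
        history.append(text)
        return len(history)

    def get(self, name, version=None):
        """Fetch the latest revision, or a specific version to restore it."""
        history = self._versions[name]
        return history[-1] if version is None else history[version - 1]

store = PromptStore()
store.save("summarize", "Summarize this document.")
store.save("summarize", "Summarize this document in three bullet points.")
print(store.get("summarize", version=1))  # restores the original prompt
```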

10
Q

Multimodal function calling

A

What it is: Function calling is a built-in feature of the Gemini API that translates natural language into structured data and back.

Why it matters: Now, with multimodal function calling, your agents can also execute functions where your user can provide images, along with text, to help the model pick the right function and function parameters to call.
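To sketch the flow: you declare a function schema, the model returns a structured call (its arguments possibly inferred from an image plus text), and your code dispatches it. The declaration mirrors the JSON shape used for function declarations, but the function name, catalog, and dispatcher here are hypothetical.

```python
# Sketch of the function-calling loop: a declared schema plus a dispatcher
# that executes the structured call the model returns. Names are hypothetical.

GET_PRODUCT_INFO = {
    "name": "get_product_info",
    "description": "Look up a product the user asked about (or showed in an image).",
    "parameters": {
        "type": "object",
        "properties": {"product_name": {"type": "string"}},
        "required": ["product_name"],
    },
}

PRICES = {"stand mixer": 299}  # hypothetical product catalog

def handle_function_call(call):
    """Route the model's structured call to real application code."""
    if call["name"] == "get_product_info":
        product = call["args"]["product_name"]
        return {"product": product, "price_usd": PRICES.get(product)}
    raise ValueError("unknown function: " + call["name"])

# With multimodal function calling, the model can fill in these args from
# an image of the product plus the user's text.
result = handle_function_call({"name": "get_product_info",
                               "args": {"product_name": "stand mixer"}})
print(result)
```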

11
Q

Org policy for models in Model Garden

A

What it is: You can now govern the models available in Model Garden. This includes limiting access to gen AI features, restricting access to specific vetted models, or controlling tuning and other advanced capabilities. Org policies can be applied to all models in Model Garden, as well as those imported from Hugging Face through Model Garden. These policies can be set on an organization, folder, or project resource to enforce the constraint on that resource and any child resources.

Why it matters: The ability to control access to the models made available through the Model Garden has been a top priority for many customers looking for governance and control tooling. Org Policies allows granular access control of models and enhances Google Cloud’s enterprise readiness in an age of many models from many providers.

12
Q

Best models from Google and the industry

A

We’re committed to providing the best models for enterprises to use. Vertex AI Model Garden provides access to 150+ models from Google, partners, and the open community, so customers can select the right model for their price, performance, and latency requirements.

No matter which foundation model you use, it comes with enterprise-ready tooling and integration with our end-to-end platform.

13
Q

Gemini 1.5 Flash is GA

A

What it is: Gemini 1.5 Flash combines low latency, highly competitive pricing, and our 1 million-token context window.

Why it matters: Gemini 1.5 Flash is an excellent option for a wide variety of use cases at scale, from retail chat agents, to document processing, to research agents that can synthesize entire repositories.

14
Q

Gemini 1.5 Pro, GA with 2-million-token context capabilities

A

What it is: Now available with an industry-leading context window of up to 2 million tokens, Gemini 1.5 Pro is equipped to unlock unique multimodal use cases that no other model can handle.

Why it matters: Processing just six minutes of video requires over 100,000 tokens and large code bases can exceed 1 million tokens — so whether the use case involves finding bugs across countless lines of code, locating the right information across libraries of research, or analyzing hours of audio or video, Gemini 1.5 Pro’s expanded context window is helping organizations break new ground.

15
Q

More Languages for Gemini

A

What it is: We’re enabling Gemini 1.5 Flash and Gemini 1.5 Pro to understand and respond in 100+ languages.

Why it matters: We’re making it easier for our global community to prompt and receive responses in their native languages.

16
Q

Gemma 2

A

What it is: The next generation in Google’s family of open models built to give developers and researchers the ability to share and commercialize their innovations, using the same technologies used to create Gemini.

Why it matters: Available in both 9-billion (9B) and 27-billion (27B) parameter sizes, Gemma 2 is much more powerful and efficient than the first generation, with significant safety advancements built in.

17
Q

Meta’s Llama 3.1

A

What it is: Llama 3.1 models are now available on Vertex AI as a pay-as-you-go API, including 405B and 70B, with 8B coming in early September.

Why it matters: 405B is the largest openly available foundation model to date. The new 8B and 70B versions also excel at understanding language nuances, grasping context, and performing complex tasks such as translation and dialogue generation. You can access the new models in just a few clicks using Model-as-a-Service, without any setup or infrastructure hassles.

18
Q

Mistral AI’s latest models

A

What it is: We added Mistral Large 2, Mistral Nemo, and Codestral (Google Cloud is the first hyperscaler to introduce Codestral).

Why it matters: Mistral Large 2 is Mistral AI’s flagship model, offering their best performance and versatility to date, and Mistral Nemo is a 12B model that delivers exceptional performance at a fraction of the cost. Codestral is Mistral AI’s first open-weight generative AI model explicitly designed for code generation tasks. You can access the new models in just a few clicks using Model-as-a-Service, without any setup or infrastructure hassles.

19
Q

Jamba 1.5 Model Family from AI21 Labs

A

What it is: Jamba 1.5 Model Family — AI21 Labs’ new family of open models — is in public preview on Vertex AI Model Garden, including:

Jamba 1.5 Mini: AI21’s most efficient and lightweight model, engineered for speed and efficiency in tasks including customer support, document summarization, and text generation.

Jamba 1.5 Large: AI21’s most advanced and largest model that can handle advanced reasoning tasks — such as financial analysis — with exceptional speed and efficiency.

Why it matters: AI21’s new models join over 150 models already available on Vertex AI Model Garden, further expanding your choice and flexibility to choose the best models for your needs and budget, and to keep pace with the continued rapid pace of innovation.

20
Q

Anthropic’s Claude 3.5 Sonnet

A

What it is: We recently added Anthropic’s newly released model, Claude 3.5 Sonnet, to Vertex AI. This expands the set of Anthropic models we offer, which includes Claude 3 Opus, Claude 3 Sonnet, and Claude 3 Haiku. You can access the new models in just a few clicks using Model-as-a-Service, without any setup or infrastructure hassles.

Why it matters: We’re committed to empowering customer choice and innovation through our curated collection of first-party, open, and third-party models available on Vertex AI.

21
Q

End-to-end model building platform with choice at every level

A

Vertex AI Model Builder enables you to build or customize your own models, with all the capabilities you need to move from prototype to production.

22
Q

Lower cost with context caching for both Gemini 1.5 Pro and Flash

A

What it is: Context caching is a technique that involves storing previous parts of a conversation or interaction (the “context”) in memory so that the model can refer back to it when generating new responses.

Why it matters: As context length increases, getting responses for long-context applications can be expensive and slow, making it difficult to deploy to production. Vertex AI context caching helps customers reduce input costs by 75 percent by leveraging cached data for frequently used context. Today, Google is the only provider to offer a context caching API.

23
Q

New model monitoring capabilities

A

What it is: The new Vertex AI Model Monitoring includes:

Support for models hosted outside of Vertex AI (e.g. GKE, Cloud Run, even multi-cloud & hybrid-cloud)

Unified monitoring job management for both online and batch prediction

Simplified configuration and metrics visualization attached to the model, not the endpoint

Why it matters: Vertex AI’s new model monitoring features provide a more flexible, extensible, and consistent monitoring solution for models deployed on any serving infrastructure (even outside of Vertex AI, e.g. Google Kubernetes Engine, Cloud Run, Google Compute Engine and more).

24
Q

Ray on Vertex AI is GA

A

What it is: Ray provides a comprehensive and easy-to-use Python distributed framework. With Ray, you configure a scalable cluster of computational resources and utilize a collection of domain-specific libraries to efficiently distribute common AI/ML tasks like training, serving, and tuning.

Why it matters: This integration empowers AI developers to effortlessly scale their AI workloads on Vertex AI’s versatile infrastructure, which unlocks the full potential of machine learning, data processing, and distributed computing.

25
Q

Prompt Management

A

What it is: Vertex AI Prompt Management, now in preview, provides a library of prompts for use among teams, including versioning, the option to restore old prompts, and AI-generated suggestions to improve prompt performance.

Why it matters: This feature makes it easier for organizations to get the best performance from gen AI models at scale, and to iterate more quickly from experimentation to production. Customers can compare prompt iterations side by side to assess how small changes impact outputs, and the service offers features like notes and tagging to boost collaboration.

26
Q

Gen AI Evaluation Services

A

What it is: We now support Gen AI Evaluation in GA to help users evaluate any generative model or application. While leaderboards and reports offer insights into overall model performance, they don’t reveal how a model handles your specific needs. The Gen AI Evaluation Service helps you define your own evaluation criteria, ensuring a clear understanding of how well generative AI models and applications align with your unique use case. Users can define their own evaluation metrics and also access our templates for metrics with various criteria (e.g. text quality, instruction following, fluency).

Why it matters: Evaluation is important at every step of your gen AI development process including model selection, prompt engineering, and model customization. Evaluating gen AI is integrated within Vertex AI to help you launch and reuse evaluations as needed.
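What "define your own evaluation criteria" looks like in code: any scoring function over (instruction, response) pairs can serve as a custom metric. The deliberately simple instruction-following check below is written for illustration; it is not one of the service's built-in metric templates.

```python
# Sketch of a user-defined evaluation metric: score how well a response
# follows the keywords an instruction requires. Illustrative only.

def instruction_following_score(instruction, response):
    """Fraction of required keywords from the instruction found in the response."""
    required = [w.strip() for w in instruction.split("must mention")[-1].split(",")]
    hits = sum(1 for w in required if w.lower() in response.lower())
    return hits / len(required)

score = instruction_following_score(
    "Write a tagline that must mention price, speed",
    "Blazing speed at half the price.",
)
print(score)  # 1.0 — both required keywords appear
```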

27
Q

Develop and deploy agents faster, grounded in your enterprise truth

A

Vertex AI Agent Builder allows you to easily and quickly build and customize AI agents, for any skill level. A core component of Vertex AI Agent Builder is Vertex AI Search, enabling you to ground the models in your data or the web.

28
Q

Grounding with high-fidelity mode

A

What it is: High-fidelity mode is powered by a version of Gemini 1.5 Flash that’s been fine-tuned to use only customer-provided content to generate answers, ensuring high levels of factuality in responses.

Why it matters: In data-intensive industries like financial services, healthcare, and insurance, generative AI use cases often require the generated response to be sourced from only the provided context, not the model’s world knowledge. Grounding with high-fidelity, announced in experimental preview, is purpose-built to support such grounding use cases, including summarization across multiple documents, data extraction against a set corpus of financial data, or processing across a predefined set of documents.

29
Q

Expanding Vector Search to support hybrid search

A

What it is: Vector Search, the ultra-high-performance vector database powering Vertex AI Search, DIY RAG, and other embedding use cases at global scale, now offers hybrid search in public preview.

Why it matters: Embeddings are numerical representations that capture semantic relationships across complex data (text, images, etc.). Embeddings power multiple use cases, including recommendation systems, ad serving, and semantic search for RAG. Hybrid search combines vector-based and keyword-based search techniques to ensure the most relevant and accurate responses for users.
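The blending step can be sketched simply: a semantic score from embedding similarity is combined with a keyword-overlap score. Real hybrid retrieval combines dense and sparse rankings far more carefully; the two-dimensional embeddings and weighting below are purely illustrative.

```python
import math

# Toy sketch of hybrid search: blend a vector (semantic) score with a
# keyword-overlap score into one ranking signal.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def keyword_score(query, doc):
    """Fraction of query terms that appear verbatim in the document."""
    terms = set(query.lower().split())
    return len(terms & set(doc.lower().split())) / len(terms)

def hybrid_score(q_vec, d_vec, query, doc, alpha=0.5):
    """alpha trades off semantic similarity against exact keyword matches."""
    return alpha * cosine(q_vec, d_vec) + (1 - alpha) * keyword_score(query, doc)

score = hybrid_score([1.0, 0.0], [0.9, 0.1], "return policy", "our return policy page")
print(0.0 < score <= 1.0)  # True
```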

30
Q

LangChain on Vertex

A

What it is: An agent development SDK and container runtime for LangChain. With LangChain on Vertex AI you can select the model you want to work with, define tools to access external APIs, structure the interface between the user and the system components in an orchestration framework, and deploy the framework to a managed runtime.

Why it matters: LangChain on Vertex AI simplifies and speeds up deployment while being secure, private and scalable.

31
Q

Vertex AI extensions, function calling, and data connectors

A

What it is:

Vertex AI extensions are pre-built reusable modules to connect a foundation model to a specific API or tool. For example, our new code interpreter extension enables models to execute tasks that entail running Python code, such as data analysis, data visualization, and mathematical operations.

Vertex AI function calling enables a user to describe a set of functions or APIs and have Gemini models intelligently select, for a given query, the right API or function to call, along with the appropriate API parameters.

Vertex AI data connectors help ingest data from enterprise and third-party applications like ServiceNow, Hadoop, and Salesforce, connecting generative applications to commonly-used enterprise systems.

32
Q

Firebase Genkit

A

What it is: Genkit is an open-source TypeScript/JavaScript and Go framework designed by Firebase to simplify the development, deployment, and monitoring of production-ready AI applications.

Why it matters: With the Vertex AI plugin for Genkit, developers can now take advantage of Google models like Gemini and Imagen 2, as well as text embeddings. Additionally, Vertex Eval Service is baked into the Genkit local development experience, along with OpenTelemetry tracing.

33
Q

LlamaIndex on Vertex AI

A

What it is: LlamaIndex on Vertex AI simplifies building your own search engine for retrieval-augmented generation (RAG), from data ingestion and transformation to embedding, indexing, retrieval, and generation.

Why it matters: Vertex AI customers can leverage Google’s models and AI-optimized infrastructure alongside LlamaIndex’s simple, flexible, open-source data framework, to connect custom data sources to generative models.

34
Q

Built on a foundation of scale & enterprise readiness

A

The revolutionary nature of generative AI requires a platform that offers privacy, security, control, and compliance capabilities organizations can rely on. Google Cloud is committed to helping our customers leverage the full potential of generative AI with privacy, security, and compliance capabilities. Our goal is to build trust by protecting systems, enabling transparency, and offering flexible, always-available infrastructure, all while grounding efforts in our AI principles.

35
Q

Dynamic Shared Quota

A

What it is: With Dynamic Shared Quota, quota limits for a model’s online serving are raised to the maximum allowed per region. Instead of capping each customer’s queries per second (QPS) with an individual quota, QPS is limited by the shared capacity of all queries running in a given multi-region serving pool. Dynamic Shared Quota applies only to pay-as-you-go online serving. For customers that require a consistent or more predictable service level, including SLAs, we offer Provisioned Throughput.

Why it matters: By dynamically distributing on-demand capacity among all queries being processed for pay-as-you-go customers, Google Cloud has eliminated the need to submit quota increase requests (QIRs). Customers can still set a self-imposed quota, called a consumer quota override, to control cost and prevent budget overruns.

36
Q

Provisioned Throughput is GA

A

What it is: Provisioned Throughput lets customers responsibly scale their usage of Google’s first-party models, like Gemini 1.5 Flash, providing assurances for both capacity and price.

Why it matters: This Vertex AI feature brings predictability and reliability to customer production workloads, giving them the assurance required to scale gen AI workloads aggressively. We have also made it easier than ever for customers to set up Provisioned Throughput via a self-service flow. Customers can now estimate their needs and purchase Provisioned Throughput for Google’s first-party foundation models via the console, bringing the end-to-end experience down from weeks to minutes for pre-approved orders (subject to available capacity) and removing the need for manual order forms.

37
Q

Data residency guarantees for data stored at rest in more countries

A

What it is: We now offer data residency guarantees for data stored at rest in 23 countries (13 of which were added in 2024), with additional guarantees around limiting related ML processing to the US and EU. We are also working on expanding our ML processing commitments to eight more countries, starting with four countries in 2024.

Why it matters: Customers, especially those from regulated industries, demand control over where their data is stored and processed when using generative AI capabilities.
