September 15, 2024
Your ultimate guide to the latest in generative AI on Vertex AI

Google Vertex AI has continued to innovate throughout 2024, shipping a series of new features and updates, including lower pricing for Gemini 1.5 Flash, expanded language support for Gemini, and the latest Llama 3.1 and Mistral AI models on Vertex AI. These updates aim to make powerful models easier to use and to apply across a wide range of real-world scenarios. In addition, Vertex AI has introduced new model monitoring capabilities and tools for building and customizing models, further strengthening the platform's functionality and flexibility.

💻 **Lower pricing for Gemini 1.5 Flash**: Google reduced the input and output costs of Gemini 1.5 Flash, making it a more cost-effective model choice. Gemini 1.5 Flash is highly capable, with a 1-million-token context window and multimodal inputs that suit a wide variety of scenarios.

📲 **Expanded language support for Gemini**: Gemini 1.5 Flash and Gemini 1.5 Pro now support 100+ languages, making the models more accessible to users worldwide.

🔍 **The latest Llama 3.1 and Mistral AI models on Vertex AI**: Meta's Llama 3.1 models and Mistral AI's latest models, including Mistral Large 2, Nemo, and Codestral, are now available on Vertex AI as pay-as-you-go APIs.

📈 **New model monitoring capabilities**: Vertex AI's new model monitoring features provide a more flexible and consistent solution, supporting models deployed on any serving infrastructure, including Google Kubernetes Engine, Cloud Run, Google Compute Engine, and other platforms.

🏳 **Tools for building and customizing models**: Vertex AI Model Builder lets users build or customize their own models, with all the capabilities needed to move from prototype to production.

📷 **Prompt Management**: Vertex AI Prompt Management provides a library for managing prompts, including versioning, restoring old prompts, and AI-generated prompt suggestions to help users optimize model performance.

📡 **Evaluation Services**: Vertex AI offers Rapid Evaluation to help users assess model performance, with metrics across various dimensions (e.g., similarity, instruction following, fluency) and metric bundles for specific tasks (e.g., text generation quality).

🎉 **Ray on Vertex AI**: Ray is a Python distributed framework that lets users easily configure scalable clusters of compute resources and use domain-specific libraries to efficiently distribute common AI/ML tasks such as training, serving, and tuning.

📄 **Batch API**: The Vertex AI Batch API provides an efficient way to send large numbers of non-latency-sensitive text prompt requests, supporting use cases such as classification and sentiment analysis, data extraction, and description generation.

🔁 **Controlled generation**: Vertex AI controlled generation lets users define Gemini model outputs according to specific formats or schemas.

📢 **Context caching**: Vertex AI context caching helps users significantly reduce input costs by leveraging cached data, lowering the cost and latency of long-context applications.

🔊 **Vertex AI Model Garden**: Vertex AI Model Garden offers 150+ models from Google, partners, and the open-source community, letting users choose the right model based on price, performance, and latency considerations.

📖 **Gemini 1.5 Pro and Gemini 1.5 Flash**: Gemini 1.5 Pro offers a 2-million-token context window, while Gemini 1.5 Flash delivers low latency and competitive pricing. Both models are well suited to large-scale applications such as retail chatbots, document processing, and research agents.

📰 **Imagen 3**: Google's latest image generation model, Imagen 3, delivers outstanding image quality, multi-language support, built-in safety features, and support for multiple aspect ratios.

💡 **Gemma 2**: The next generation of Google's family of open models, available in 9-billion- and 27-billion-parameter sizes, more powerful and efficient than the previous generation, with significant safety improvements.

📜 **Anthropic's Claude 3.5 Sonnet**: Anthropic's latest model, Claude 3.5 Sonnet, is now available on Vertex AI alongside Claude 3 Opus and Claude 3 Haiku, giving users more choice.

📱 **Other capabilities**: Vertex AI also offers additional features, such as tools for building and customizing models, model monitoring, prompt management, and evaluation services.

The world of generative AI is evolving at a pace that's nothing short of mind-blowing. It feels like just yesterday we were marveling at AI-generated images, and now we're having full-fledged conversations with AI chatbots that can write code, craft poetry, and even serve our customers (check out our list of 101 real-world gen AI use cases from the world's leading organizations).

The pace of innovation can be hard to keep up with — in 2023 we introduced over 500 new features in Vertex AI, and we're not slowing down this year. We put this blog together to help you keep track of the biggest announcements and make sense of what they mean for your business. We'll keep it updated as new announcements come out, so bookmark this link and check out our new video series below, which also covers new announcements.

Catch up on the latest announcements

We recently announced several updates to make Gemini, Meta and Mistral models more accessible, followed shortly by announcing the Jamba 1.5 Model Family from AI21 Labs.

Lower pricing for Gemini 1.5 Flash

  • What it is: We've updated Gemini 1.5 Flash to reduce the input costs by up to ~85% and output costs by up to ~80%, starting August 12th, 2024. 

  • Why it matters: This is a big price drop on Gemini Flash, a world-class model with a 1 million-token context window and multi-modal inputs. Plus, coupled with capabilities like context caching, you can significantly reduce the cost and latency of your long-context queries. Using Batch API instead of standard requests can further optimize costs for latency-insensitive tasks.

  • Get started: View pricing to learn more and try out Gemini 1.5 Flash today; a minimal sketch follows below.
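
To make that concrete, here is a minimal sketch of calling Gemini 1.5 Flash through the Vertex AI Python SDK; the project ID and region are placeholders, and the exact model version string may differ from the one shown.

```python
# Minimal sketch: calling Gemini 1.5 Flash through the Vertex AI Python SDK.
# Assumes `pip install google-cloud-aiplatform` and application-default
# credentials; the project ID and region below are placeholders.
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="your-project-id", location="us-central1")

model = GenerativeModel("gemini-1.5-flash-001")
response = model.generate_content(
    "Summarize the benefits of context caching in two sentences."
)
print(response.text)
```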

More Languages for Gemini

  • What it is: We're enabling Gemini 1.5 Flash and Gemini 1.5 Pro to understand and respond in 100+ languages. 

  • Why it matters: We’re making it easier for our global community to prompt and receive responses in their native languages.

  • Get started: View documentation to learn more.

Meta’s Llama 3.1 

  • What it is: Llama 3.1 models are now available on Vertex AI as a pay-as-you-go API, including 405B, 70B, and 8B (coming in early September).

  • Why it matters: 405B is the largest openly available foundation model to date. 8B and 70B are also new versions that excel at understanding language nuances, grasping context, and performing complex tasks such as translation and dialogue generation. You can access the new models in just a few clicks using Model-as-a-Service, without any setup or infrastructure hassles. 

  • Get started: To access Llama 3.1, visit Model Garden.

Mistral AI’s latest models

  • What it is: We added Mistral Large 2, Nemo and Codestral (Google Cloud is the first hyperscaler to introduce Codestral). 

  • Why it matters: Mistral Large 2 is Mistral AI's flagship model, offering their best performance and versatility to date, and Mistral Nemo is a 12B model that delivers exceptional performance at a fraction of the cost. Codestral is Mistral AI's first open-weight generative AI model explicitly designed for code generation tasks. You can access the new models in just a few clicks using Model-as-a-Service, without any setup or infrastructure hassles.

  • Get started: To access the Mistral AI models, visit Model Garden (Codestral, Large 2, Nemo) or check out the documentation.

Jamba 1.5 Model Family from AI21 Labs

  • What it is: Jamba 1.5 Model Family — AI21 Labs' new family of open models — is in public preview on Vertex AI Model Garden, including:

    • Jamba 1.5 Mini: AI21’s most efficient and lightweight model, engineered for speed and efficiency in tasks including customer support, document summarization, and text generation.

    • Jamba 1.5 Large: AI21’s most advanced and largest model that can handle advanced reasoning tasks — such as financial analysis — with exceptional speed and efficiency. 

  • Why it matters: AI21’s new models join over 150 models already available on Vertex AI Model Garden, further expanding your choice and flexibility to select the best models for your needs and budget, and to keep up with the rapid pace of innovation.

  • Get started: Select the Jamba 1.5 Mini or Jamba 1.5 Large model tile in Vertex AI Model Garden. 

Previous Announcements

Best models from Google and the industry

We’re committed to providing the best models for enterprises to use: Vertex AI Model Garden provides access to 150+ models from Google, partners, and the open community, so customers can select the right model for their price, performance, and latency requirements.

No matter which foundation model you use, it comes with enterprise-ready tooling and integration with our end-to-end platform.

Gemini 1.5 Flash is GA

  • What it is: Gemini 1.5 Flash combines low latency, highly competitive pricing, and our 1 million-token context window.

  • Why it matters: Gemini 1.5 Flash is an excellent option for a wide variety of use cases at scale, from retail chat agents, to document processing, to research agents that can synthesize entire repositories.

  • Get started: Click here to get started now with Gemini 1.5 Flash on Vertex AI. 

Gemini 1.5 Pro, GA with 2-million-token context capabilities

  • What it is: Now available with an industry-leading context window of up to 2 million tokens, Gemini 1.5 Pro is equipped to unlock unique multimodal use cases that no other model can handle.

  • Why it matters: Processing just six minutes of video requires over 100,000 tokens and large code bases can exceed 1 million tokens — so whether the use case involves finding bugs across countless lines of code, locating the right information across libraries of research, or analyzing hours of audio or video, Gemini 1.5 Pro’s expanded context window is helping organizations break new ground. 

  • Get started: Click here to get started now; a long-context sketch follows below.
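
As an illustration of the long context window, the sketch below sends a video from Cloud Storage to Gemini 1.5 Pro along with a question; the GCS URI and model version are assumed placeholders.

```python
# Sketch: a long-context, multimodal request to Gemini 1.5 Pro.
# Assumes the file at the gs:// URI is readable by the caller; the URI,
# project, and model version are illustrative placeholders.
import vertexai
from vertexai.generative_models import GenerativeModel, Part

vertexai.init(project="your-project-id", location="us-central1")

model = GenerativeModel("gemini-1.5-pro-001")
video = Part.from_uri("gs://your-bucket/recording.mp4", mime_type="video/mp4")
response = model.generate_content(
    [video, "List the key decisions made in this meeting."]
)
print(response.text)
```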

Imagen 3 is GA

  • What it is: Google’s latest image generation model, delivering outstanding image quality, multi-language support, built-in safety features like Google DeepMind’s SynthID digital watermarking, and support for multiple aspect ratios.

  • Why it matters: There are several improvements over Imagen 2 — including over 40% faster generation for rapid prototyping and iteration; better prompt understanding and instruction-following; photo-realistic generations, including of groups of people; and greater control over text rendering within an image. 

  • Get started: Apply for access to Imagen 3 on Vertex AI; a sketch follows below.
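
Once access is granted, image generation might look like the sketch below; the Imagen 3 model identifier is an assumption, so check Model Garden for the current one.

```python
# Sketch: generating an image with Imagen on Vertex AI. The model ID is an
# assumption (verify the current identifier in Model Garden); access may
# require approval, as noted above.
import vertexai
from vertexai.preview.vision_models import ImageGenerationModel

vertexai.init(project="your-project-id", location="us-central1")

model = ImageGenerationModel.from_pretrained("imagen-3.0-generate-001")  # assumed ID
images = model.generate_images(
    prompt="A photorealistic red bicycle leaning against a brick wall at sunset",
    number_of_images=1,
    aspect_ratio="16:9",
)
images[0].save("bicycle.png")
```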

Gemma 2

  • What it is: The next generation in Google’s family of open models built to give developers and researchers the ability to share and commercialize their innovations, using the same technologies used to create Gemini. 

  • Why it matters: Available in both 9-billion (9B) and 27-billion (27B) parameter sizes, Gemma 2 is much more powerful and efficient than the first generation, with significant safety advancements built in. 

  • Get started: Access Gemma 2 on Vertex AI here.

Anthropic’s Claude 3.5 Sonnet

  • What it is: We recently added Anthropic’s newly released model, Claude 3.5 Sonnet, to Vertex AI. This expands the set of Anthropic models we offer, including Claude 3 Opus, Claude 3 Sonnet, and Claude 3 Haiku. You can access the new models in just a few clicks using Model-as-a-Service, without any setup or infrastructure hassles.

  • Why it matters: We're committed to empowering customer choice and innovation through our curated collection of first-party, open, and third-party models available on Vertex AI. 

  • Get started: Begin experimenting with or deploying Claude 3.5 Sonnet in production on Vertex AI; a sketch follows below.
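
Claude models on Vertex AI are callable through Anthropic's own Python SDK; the sketch below assumes `pip install "anthropic[vertex]"`, and the model ID and region are placeholders to verify in Model Garden.

```python
# Sketch: calling Claude 3.5 Sonnet on Vertex AI via Anthropic's Python SDK.
# The model ID string and region are assumptions; confirm them in Model Garden.
from anthropic import AnthropicVertex

client = AnthropicVertex(project_id="your-project-id", region="us-east5")
message = client.messages.create(
    model="claude-3-5-sonnet@20240620",  # assumed model ID
    max_tokens=256,
    messages=[{"role": "user", "content": "Explain RAG in one paragraph."}],
)
print(message.content[0].text)
```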

End-to-end model building platform with choice at every level

Vertex AI Model Builder enables you to build or customize your own models, with all the capabilities you need to move from prototype to production.

Lower cost with context caching for both Gemini 1.5 Pro and Flash

  • What it is: Context caching is a technique that stores previous parts of a conversation or interaction (the "context") in memory so that the model can refer back to it when generating new responses.

  • Why it matters: As context length increases, it can be expensive and slow to get responses for long-context applications, making it difficult to deploy to production. Vertex AI context caching helps customers significantly reduce input costs, by 75 percent, by leveraging cached data for frequently used context. Today, Google is the only provider to offer a context caching API.

  • Get started: Learn more in documentation; a sketch of the caching flow follows below.
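
A sketch of the flow, assuming the preview `vertexai.preview.caching` module; the document URI, model version, and TTL are illustrative.

```python
# Sketch of the preview context-caching flow in the Vertex AI Python SDK.
# Module paths follow the preview API and may change; the GCS URI, project,
# and model version are placeholders.
import datetime
import vertexai
from vertexai.preview import caching
from vertexai.preview.generative_models import GenerativeModel, Part

vertexai.init(project="your-project-id", location="us-central1")

# Cache a large, frequently reused context (e.g. a long PDF) once...
cached = caching.CachedContent.create(
    model_name="gemini-1.5-pro-001",
    contents=[Part.from_uri("gs://your-bucket/contract.pdf",
                            mime_type="application/pdf")],
    ttl=datetime.timedelta(hours=1),
)

# ...then issue many cheaper queries against the cached tokens.
model = GenerativeModel.from_cached_content(cached_content=cached)
print(model.generate_content("List the termination clauses.").text)
```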

Controlled generation

  • What it is: Controlled generation lets customers define Gemini model outputs according to specific formats or schemas. 

  • Why it matters: Most models cannot guarantee the format and syntax of their outputs, even with specified instructions. Vertex AI controlled generation lets customers choose the desired output format via pre-built options like YAML and XML, or by defining custom formats. 

  • Get started: Visit documentation to learn more; a schema sketch follows below.
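
For example, a response can be constrained to a JSON schema via the SDK's `response_schema` field; the schema and model version below are illustrative.

```python
# Sketch: constraining Gemini output to a JSON schema via controlled generation.
# Assumes a model version that supports `response_schema`; project and model
# IDs are placeholders.
import vertexai
from vertexai.generative_models import GenerationConfig, GenerativeModel

vertexai.init(project="your-project-id", location="us-central1")

schema = {
    "type": "object",
    "properties": {
        "product": {"type": "string"},
        "sentiment": {"type": "string",
                      "enum": ["positive", "negative", "neutral"]},
    },
    "required": ["product", "sentiment"],
}

model = GenerativeModel("gemini-1.5-pro-001")
response = model.generate_content(
    "Review: The camera is great but the battery dies fast.",
    generation_config=GenerationConfig(
        response_mime_type="application/json",
        response_schema=schema,
    ),
)
print(response.text)  # JSON conforming to the schema
```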

Batch API

  • What it is: Batch API is a highly efficient way to send large numbers of non-latency-sensitive text prompt requests, supporting use cases such as classification and sentiment analysis, data extraction, and description generation.

  • Why it matters: It helps speed up developer workflows and reduces costs by enabling multiple prompts to be sent to models in a single request.

  • Get started: View documentation to get started; a sketch of a batch job follows below.
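
A sketch of submitting a batch job, assuming the `vertexai.batch_prediction` module and a JSONL file of requests in Cloud Storage (BigQuery sources are also supported); all URIs and the model version are placeholders.

```python
# Sketch: submit a Gemini batch prediction job over a JSONL file of requests
# in Cloud Storage, then poll until it finishes. Module path and URIs are
# assumptions/placeholders.
import time
import vertexai
from vertexai.batch_prediction import BatchPredictionJob

vertexai.init(project="your-project-id", location="us-central1")

job = BatchPredictionJob.submit(
    source_model="gemini-1.5-flash-001",
    input_dataset="gs://your-bucket/prompts.jsonl",
    output_uri_prefix="gs://your-bucket/batch-output/",
)
# Poll until the job ends, then read results from the output location.
while not job.has_ended:
    time.sleep(30)
    job.refresh()
print(job.state, job.output_location)
```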

New model monitoring capabilities

  • What it is: The new Vertex AI Model Monitoring includes:

    • Support for models hosted outside of Vertex AI (e.g. GKE, Cloud Run, even multi-cloud & hybrid-cloud)

    • Unified monitoring job management for both online and batch prediction

    • Simplified configuration and metrics visualization attached to the model, not the endpoint

  • Why it matters: Vertex AI’s new model monitoring features provide a more flexible, extensible, and consistent monitoring solution for models deployed on any serving infrastructure (even outside of Vertex AI, e.g. Google Kubernetes Engine, Cloud Run, Google Compute Engine and more).

  • Get started: Learn more in this blog.

Ray on Vertex AI is GA

  • What it is: Ray provides a comprehensive and easy-to-use Python distributed framework. With Ray, you configure a scalable cluster of computational resources and utilize a collection of domain-specific libraries to efficiently distribute common AI/ML tasks like training, serving, and tuning. 

  • Why it matters: This integration empowers AI developers to effortlessly scale their AI workloads on Vertex AI's versatile infrastructure, which unlocks the full potential of machine learning, data processing, and distributed computing.

  • Get started: Read the blog to learn more; a sketch of connecting to a cluster follows below.
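
A sketch of the developer experience, assuming a cluster has already been created and that the `vertex_ray` client from `google-cloud-aiplatform[ray]` is installed; the cluster resource name is a placeholder.

```python
# Sketch: connect to an existing Ray cluster on Vertex AI and fan out work.
# Assumes `pip install "google-cloud-aiplatform[ray]"` and a pre-created
# cluster; the persistent resource name below is a placeholder.
import ray
import vertex_ray  # noqa: F401 — registers the vertex_ray:// connection scheme

ray.init(
    "vertex_ray://projects/your-project/locations/us-central1/"
    "persistentResources/my-cluster"
)

@ray.remote
def square(x: int) -> int:
    return x * x

# Tasks execute on the remote Vertex AI cluster, not the local machine.
print(ray.get([square.remote(i) for i in range(8)]))
```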

Prompt Management

  • What it is: Vertex AI Prompt Management, now in preview, provides a library of prompts for use among teams, including versioning, the option to restore old prompts, and AI-generated suggestions to improve prompt performance. 

  • Why it matters: This feature makes it easier for organizations to get the best performance from gen AI models at scale, and to iterate more quickly from experimentation to production. Customers can compare prompt iterations side by side to assess how small changes impact outputs, and the service offers features like notes and tagging to boost collaboration. 

  • Get started: Visit documentation to learn more.

Evaluation Services 

  • What it is: We now support Rapid Evaluation in preview to help users evaluate model performance when iterating on the best prompt design. Users can access metrics for various dimensions (e.g., similarity, instruction following, fluency) and bundles for specific tasks (e.g., text generation quality). We also launched RAG and grounded generation evaluation metrics for summarization and question answering (e.g., groundedness, answer_quality, coherence). For side-by-side comparative evaluation, AutoSxS is now generally available; it helps teams compare the performance of two models, including explanations for why one model outperforms another and certainty scores that help users understand the accuracy of an evaluation.

  • Why it matters: Evaluation tools in Vertex AI help customers compare models for a specific set of tasks in order to get the best performance. 

  • Get started: Learn more in documentation; a sketch of a rapid evaluation run follows below.
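
A sketch of a rapid evaluation run, assuming the preview `vertexai.preview.evaluation` module; the dataset, metric names, and experiment name are illustrative.

```python
# Sketch: score generated responses on built-in metrics while iterating on a
# prompt. Module path and metric names follow the preview SDK (assumptions);
# the dataset and experiment name are illustrative.
import pandas as pd
import vertexai
from vertexai.generative_models import GenerativeModel
from vertexai.preview.evaluation import EvalTask

vertexai.init(project="your-project-id", location="us-central1")

dataset = pd.DataFrame({
    "prompt": ["Summarize: The meeting covered Q3 revenue and hiring plans."],
})
task = EvalTask(
    dataset=dataset,
    metrics=["fluency", "coherence"],  # built-in pointwise metrics
    experiment="prompt-iteration-1",
)
result = task.evaluate(model=GenerativeModel("gemini-1.5-flash-001"))
print(result.summary_metrics)
```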

Develop and deploy agents faster, grounded in your enterprise truth

Vertex AI Agent Builder allows you to easily and quickly build and customize AI agents, for any skill level. A core component of Vertex AI Agent Builder is Vertex AI Search, which enables you to ground the models in your data or the web.

Grounding at Vertex

You have many options for Grounding and RAG at Vertex. These capabilities address some of the most significant hurdles limiting the adoption of generative AI in the enterprise: the fact that models do not know information outside their training data, and the tendency of foundation models to “hallucinate,” or generate convincing yet factually inaccurate information. Retrieval Augmented Generation (RAG), a technique developed to mitigate these challenges, first “retrieves” facts about a question, then provides those facts to the model before it “generates” an answer – this is what we mean by grounding. Getting relevant facts quickly to augment a model's knowledge is ultimately a search problem.  

Read more at this blog post.

Grounding with Google Search is GA

  • What it is: When customers select Grounding with Google Search for their Gemini model, Gemini will use Google Search, and generate an output that is grounded with the relevant internet search results. Grounding with Google Search also offers dynamic retrieval, a new capability to help customers balance quality with cost efficiency by intelligently selecting when to use Google Search results and when to use the model’s training data. 

  • Why it matters: Grounding with Google Search is simple to use and makes the world’s knowledge available to Gemini. Dynamic retrieval saves you money and saves your users time by grounding only when needed.

  • Get started: Read documentation to learn more about how to get started; a grounding sketch follows below.
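
A sketch of enabling the grounding tool on a request; project, region, and model version are placeholders.

```python
# Sketch: grounding a Gemini response in Google Search results.
# Project, region, and model version are placeholders.
import vertexai
from vertexai.generative_models import GenerativeModel, Tool, grounding

vertexai.init(project="your-project-id", location="us-central1")

search_tool = Tool.from_google_search_retrieval(grounding.GoogleSearchRetrieval())
model = GenerativeModel("gemini-1.5-flash-001")
response = model.generate_content(
    "What were the major announcements at the most recent Google I/O?",
    tools=[search_tool],
)
print(response.text)  # grounding metadata with source links is also returned
```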

Grounding with third-party datasets

  • What it is: Vertex AI will offer a new service that lets customers ground their models and AI agents with specialized third-party data. We are working with providers like Moody's, MSCI, Thomson Reuters and Zoominfo to enable access to their datasets.

  • Why it matters: These capabilities will help customers build AI agents and applications that offer more accurate and helpful responses. 

  • Get started: Coming soon, contact sales to learn more. 

Grounding with high-fidelity mode 

  • What it is: High-fidelity mode is powered by a version of Gemini 1.5 Flash that’s been fine-tuned to use only customer-provided content to generate answers, ensuring high levels of factuality in its responses.

  • Why it matters: In data-intensive industries like financial services, healthcare, and insurance, generative AI use cases often require the generated response to be sourced from only the provided context, not the model’s world knowledge. Grounding with high-fidelity, announced in experimental preview, is purpose-built to support such grounding use cases, including summarization across multiple documents, data extraction against a set corpus of financial data, or processing across a predefined set of documents.

  • Get started: Contact sales to learn more. 

Expanding Vector Search to support hybrid search

  • What it is: Vector Search, the ultra-high-performance vector database powering Vertex AI Search, DIY RAG, and other embedding use cases at global scale, now offers hybrid search in Public Preview.

  • Why it matters: Embeddings are numerical representations that capture semantic relationships across complex data (text, images, etc.). Embeddings power multiple use cases, including recommendation systems, ad serving, and semantic search for RAG. Hybrid search combines vector-based and keyword-based search techniques to ensure the most relevant and accurate responses for users. 

  • Get started: Visit documentation to learn more about Vector Search; a hybrid query sketch follows below.
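
A sketch of a hybrid query, assuming the preview `HybridQuery` class in the aiplatform SDK; the import path, resource names, and toy embedding values are all assumptions to verify against the documentation.

```python
# Sketch: a hybrid (dense + sparse) query against a deployed Vector Search
# index. The HybridQuery import path and the endpoint/index IDs are
# assumptions based on the public preview; embeddings here are toy values.
from google.cloud import aiplatform
from google.cloud.aiplatform.matching_engine.matching_engine_index_endpoint import (
    HybridQuery,
)

aiplatform.init(project="your-project-id", location="us-central1")

endpoint = aiplatform.MatchingEngineIndexEndpoint(
    "projects/your-project/locations/us-central1/indexEndpoints/your-endpoint"
)
query = HybridQuery(
    dense_embedding=[0.1, 0.2, 0.3],       # from your embedding model
    sparse_embedding_values=[0.5, 0.9],    # e.g. keyword/TF-IDF weights
    sparse_embedding_dimensions=[10, 42],  # vocabulary indices
    rrf_ranking_alpha=0.5,                 # blend of dense vs. sparse ranking
)
neighbors = endpoint.find_neighbors(
    deployed_index_id="my_deployed_index",
    queries=[query],
    num_neighbors=5,
)
print(neighbors)
```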

LangChain on Vertex

  • What it is: An agent development SDK and container runtime for LangChain. With LangChain on Vertex AI you can select the model you want to work with, define tools to access external APIs, structure the interface between the user and the system components in an orchestration framework, and deploy the framework to a managed runtime.

  • Why it matters: LangChain on Vertex AI simplifies and speeds up deployment while being secure, private and scalable. 

  • Get started: Visit documentation to learn more; an agent sketch follows below.
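
A sketch of defining and testing an agent locally before deploying it, assuming the preview `vertexai.preview.reasoning_engines` module; the tool function is a stub, and the bucket and model names are placeholders.

```python
# Sketch: build a LangChain agent with a toy tool, test it locally, then
# deploy it to the managed runtime. Module path and class names follow the
# preview SDK (assumptions); project, bucket, and model IDs are placeholders.
import vertexai
from vertexai.preview import reasoning_engines

vertexai.init(project="your-project-id", location="us-central1",
              staging_bucket="gs://your-staging-bucket")

def get_exchange_rate(currency_from: str, currency_to: str) -> str:
    """Toy tool, stubbed for illustration; a real tool would call an API."""
    return f"1 {currency_from} = 0.92 {currency_to}"

agent = reasoning_engines.LangchainAgent(
    model="gemini-1.5-pro-001",
    tools=[get_exchange_rate],
)
print(agent.query(input="What is the USD to EUR rate?"))  # local test

# Deploy the same agent to a managed, scalable runtime:
remote_agent = reasoning_engines.ReasoningEngine.create(
    agent,
    requirements=["google-cloud-aiplatform[langchain,reasoningengine]"],
)
```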

Vertex AI extensions, function calling and data connectors

  • What it is: 

    • Vertex AI extensions are pre-built reusable modules to connect a foundation model to a specific API or tool. For example, our new code interpreter extension enables models to execute tasks that entail running Python code, such as data analysis, data visualization, and mathematical operations. 

    • Vertex AI function calling enables a user to describe a set of functions or APIs and have Gemini models intelligently select, for a given query, the right API or function to call, along with the appropriate API parameters.

    • Vertex AI data connectors help ingest data from enterprise and third-party applications like ServiceNow, Hadoop, and Salesforce, connecting generative applications to commonly-used enterprise systems.

  • Why it matters: With these capabilities, Vertex AI Agent Builder makes it easy to augment grounding outputs and take action on your user’s behalf. 

  • Get started: Visit documentation to learn more about Vertex AI extensions, function calling, and data connectors; a function calling sketch follows below.
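
As an example of function calling, the sketch below declares a hypothetical weather function and lets Gemini decide when to call it; your application executes the function and returns the result to the model.

```python
# Sketch: declare a hypothetical weather function so Gemini can choose to
# call it with structured arguments. Project, region, and model IDs are
# placeholders; `get_current_weather` is illustrative, not a real API.
import vertexai
from vertexai.generative_models import FunctionDeclaration, GenerativeModel, Tool

vertexai.init(project="your-project-id", location="us-central1")

get_weather = FunctionDeclaration(
    name="get_current_weather",
    description="Get the current weather for a city",
    parameters={
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
)
model = GenerativeModel(
    "gemini-1.5-pro-001",
    tools=[Tool(function_declarations=[get_weather])],
)
response = model.generate_content("What's the weather in Paris right now?")
# Instead of free text, the model returns a structured call for your app to run:
print(response.candidates[0].content.parts[0].function_call)
```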

Firebase Genkit 

  • What it is: Genkit is an open-source TypeScript/JavaScript and Go framework designed by Firebase to simplify the development, deployment, and monitoring of production-ready AI applications.

  • Why it matters: With the Vertex AI plugin for Genkit, developers can now take advantage of Google models like Gemini and Imagen 2, as well as text embeddings. Additionally, the Vertex Eval Service is baked into the Genkit local development experience, along with OpenTelemetry tracing.

  • Get started: Learn more in documentation.

LlamaIndex on Vertex AI

  • What it is: LlamaIndex on Vertex AI simplifies building your own search engine for retrieval-augmented generation (RAG), from data ingestion and transformation to embedding, indexing, retrieval, and generation.

  • Why it matters: Vertex AI customers can leverage Google’s models and AI-optimized infrastructure alongside LlamaIndex’s simple, flexible, open-source data framework, to connect custom data sources to generative models. 

  • Get started: Visit documentation to learn more; a sketch follows below.
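
A sketch of the flow, assuming the preview `vertexai.preview.rag` module; function names and parameters follow the preview SDK and may change, and the corpus contents are placeholders.

```python
# Sketch: create a RAG corpus, import documents, and run a retrieval query.
# Module path and function signatures follow the preview SDK (assumptions);
# the bucket path is a placeholder.
import vertexai
from vertexai.preview import rag

vertexai.init(project="your-project-id", location="us-central1")

corpus = rag.create_corpus(display_name="product-docs")
rag.import_files(corpus.name, ["gs://your-bucket/docs/"])

response = rag.retrieval_query(
    rag_resources=[rag.RagResource(rag_corpus=corpus.name)],
    text="How do I rotate an API key?",
    similarity_top_k=5,
)
print(response)
```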

Built on a foundation of scale & enterprise readiness

The revolutionary nature of generative AI requires a platform that offers privacy, security, control, and compliance capabilities organizations can rely on. Google Cloud is committed to helping our customers leverage the full potential of generative AI with privacy, security, and compliance capabilities. Our goal is to build trust by protecting systems, enabling transparency, and offering flexible, always-available infrastructure, all while grounding efforts in our AI principles.

Dynamic Shared Quota

  • What it is: With Dynamic Shared Quota, we raise the quota limits for a model (online serving) to the maximum allowed per region. Instead of capping a customer’s queries per second (QPS) with a fixed quota, QPS is bounded by the shared capacity of all queries running on a Servo station (multi-region). Dynamic Shared Quota applies only to pay-as-you-go online serving. For customers that require a consistent or more predictable service level, including SLAs, we offer Provisioned Throughput.

  • Why it matters: By dynamically distributing on-demand capacity among all queries being processed for Pay-as-you-go customers, Google Cloud has eliminated the need to submit quota increase requests (QIRs). Customers can still set a self-imposed quota called a consumer quota override to control cost and prevent budget overruns.

  • Get started: Learn more in documentation.

Provisioned Throughput is GA 

  • What it is: Provisioned Throughput lets customers responsibly scale their usage of Google’s first-party models, like Gemini 1.5 Flash, providing assurances for both capacity and price.

  • Why it matters: This Vertex AI feature brings predictability and reliability to customer production workloads, giving them the assurance required to scale gen AI workloads aggressively. We have also made it easier than ever for customers to set up Provisioned Throughput via a self-service flow. Customers can now estimate their needs and purchase Provisioned Throughput for Google’s first-party foundation models via the console, bringing the end-to-end experience down from weeks to minutes for pre-approved orders (subject to available capacity) and removing the need for manual order forms.

  • Get started: Follow these steps to purchase a Provisioned Throughput subscription.

Data residency guarantees for data stored at rest in more countries

  • What it is: We offer data residency guarantees for data stored at rest in 23 countries (13 of which were added in 2024), with additional guarantees limiting related ML processing to the US and EU. We are also working on expanding our ML processing commitments to eight more countries, starting with four countries in 2024.

  • Why it matters: Customers, especially those from regulated industries, demand control over where their data is stored and processed when using generative AI capabilities. 

  • Get started: Learn more here.

To keep up with all of the latest releases, don’t forget to check our Vertex AI release notes.

All of these enhancements are a direct response to what you, our customers, have been asking for. We believe an enterprise AI platform is key to success in production and our goal is to not just build the best platform, but to provide an AI ecosystem that makes enterprise-scale AI accessible.

To learn about how Vertex AI can help you, contact us for a free consultation.
