TechCrunch News, December 5, 2024
AWS brings prompt routing and caching to its Bedrock LLM service

To lower the cost of using large language models (LLMs), AWS announced two new features for its Bedrock LLM hosting service at its re:invent conference: a caching service and intelligent prompt routing. The caching service stores and reuses the results of already-processed queries, cutting redundant computation to reduce cost and speed up responses. Intelligent prompt routing automatically routes each query to a different model based on its complexity, balancing performance against cost. AWS also launched a Bedrock model marketplace that gives users a wider selection of specialized models. These new features aim to help businesses use LLMs more effectively, lowering cost and improving efficiency.

🤔 **Caching cuts cost and latency:** AWS Bedrock's new caching service stores the results of already-processed queries to avoid redundant computation. According to AWS, this can reduce cost by up to 90%, and it also cuts response latency significantly, by up to 85%. Adobe, which tested prompt caching on Bedrock, saw a 72% reduction in response time.

🚀 **Intelligent prompt routing balances cost and performance:** Bedrock's intelligent prompt routing automatically routes each query to a different model based on its complexity. The system uses a small language model to predict how each model will perform on a given query and routes the request accordingly, striking the best balance between performance and cost.

🏪 **Bedrock model marketplace expands model choice:** AWS launched a Bedrock model marketplace offering roughly 100 emerging and specialized models, letting users pick whichever fits their needs. Note that users must provision and manage the infrastructure for these models themselves.

🤝 **AWS partners with model providers:** AWS works with many of the large model providers while also supporting hundreds of specialized models, covering a wide range of user needs.

💡 **What's next:** AWS plans to expand the intelligent prompt routing system with more customization options and to keep growing the Bedrock model marketplace.

As businesses move from trying out generative AI in limited prototypes to putting it into production, they are becoming increasingly price conscious. Using large language models isn’t cheap, after all. One way to reduce cost is to go back to an old concept: caching. Another is to route simpler queries to smaller, more cost-efficient models. At its re:invent conference in Las Vegas, AWS today announced both of these features for its Bedrock LLM hosting service.

Let’s talk about the caching service first. “Say there is a document, and multiple people are asking questions on the same document. Every single time you’re paying,” Atul Deo, the director of product for Bedrock, told me. “And these context windows are getting longer and longer. For example, with Nova, we’re going to have 300k [tokens of] context and 2 million [tokens of] context. I think by next year, it could even go much higher.”

Caching essentially ensures that you don’t have to pay for the model to do repetitive work and reprocess the same (or substantially similar) queries over and over again. According to AWS, this can reduce cost by up to 90%, and a welcome byproduct is that the latency for getting an answer back from the model drops significantly as well (by up to 85%, AWS says). Adobe, which tested prompt caching for some of its generative AI applications on Bedrock, saw a 72% reduction in response time.
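Deo’s shared-document scenario maps naturally onto Bedrock’s Converse API. The sketch below is a minimal illustration rather than AWS’s reference code: the model ID and file name are placeholders, and it assumes prompt caching is enabled for the chosen model. The cachePoint content block marks the prefix Bedrock should reuse across calls.

```python
# A minimal sketch of prompt caching via the Bedrock Converse API.
# The model ID and document file are placeholders; the cachePoint
# content block marks the shared prefix Bedrock can reuse, so repeat
# questions over the same document only pay for the new tokens.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

MODEL_ID = "anthropic.claude-3-5-sonnet-20241022-v2:0"  # placeholder

with open("contract.txt") as f:  # the long, shared context
    document = f.read()

def ask(question: str) -> str:
    response = client.converse(
        modelId=MODEL_ID,
        messages=[{
            "role": "user",
            "content": [
                {"text": document},
                # Everything above this marker is cached after the
                # first call and reused on subsequent ones.
                {"cachePoint": {"type": "default"}},
                {"text": question},
            ],
        }],
    )
    return response["output"]["message"]["content"][0]["text"]

# Multiple users asking about the same document hit the cached prefix.
print(ask("Summarize the termination clause."))
print(ask("What is the notice period?"))
```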

The other major new feature is intelligent prompt routing for Bedrock. With this, Bedrock can automatically route prompts to different models in the same model family to help businesses strike the right balance between performance and cost. The system automatically predicts (using a small language model) how each model will perform for a given query and then routes the request accordingly.

“Sometimes, my query could be very simple. Do I really need to send that query to the most capable model, which is extremely expensive and slow? Probably not. So basically, you want to create this notion of ‘Hey, at run time, based on the incoming prompt, send the right query to the right model,’” Deo explained.
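In practice, the router is addressed like any other model. The sketch below is a minimal illustration under a few assumptions: the account ID and router name in the ARN are placeholders, and the trace field is how the Converse response is expected to report which model actually handled the request.

```python
# A minimal sketch of intelligent prompt routing: instead of a concrete
# model ID, modelId is a prompt router ARN, and Bedrock picks a model
# within the family per request. The account ID and router name below
# are placeholders.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

ROUTER_ARN = (
    "arn:aws:bedrock:us-east-1:123456789012:"
    "default-prompt-router/anthropic.claude:1"
)

response = client.converse(
    modelId=ROUTER_ARN,  # the router, not a specific model
    messages=[{"role": "user", "content": [{"text": "What is 2 + 2?"}]}],
)

print(response["output"]["message"]["content"][0]["text"])
# The trace is expected to show which family member served the request.
print(response.get("trace", {}).get("promptRouter", {}).get("invokedModelId"))
```

A simple arithmetic question like this should land on the smallest, cheapest model in the family, while a long analytical prompt would be routed to the most capable one.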

LLM routing isn’t a new concept, of course. Startups like Martian and a number of open source projects also tackle this, but AWS would likely argue that what differentiates its offering is that the router can intelligently direct queries without a lot of human input. But it’s also limited, in that it can only route queries to models in the same model family. In the long run, though, Deo told me, the team plans to expand this system and give users more customizability.

Lastly, AWS is also launching a new marketplace for Bedrock. The idea here, Deo said, is that while Amazon is partnering with many of the larger model providers, there are now hundreds of specialized models that may only have a few dedicated users. Since those customers are asking the company to support these, AWS is launching a marketplace for these models, where the only major difference is that users will have to provision and manage the capacity of their infrastructure themselves — something that Bedrock typically handles automatically. In total, AWS will offer about 100 of these emerging and specialized models, with more to come.
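For illustration, here is a hedged sketch of what invoking such a marketplace model might look like, assuming it has already been deployed onto self-managed capacity; the SageMaker endpoint ARN below is a placeholder for that deployment, standing in where a model ID would normally go.

```python
# A hedged sketch of calling a Bedrock Marketplace model. Marketplace
# models run on capacity the user provisions and manages; once deployed,
# the resulting endpoint ARN (a placeholder here) is passed where a
# model ID would normally go.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Placeholder ARN for a marketplace model deployed in your account.
ENDPOINT_ARN = (
    "arn:aws:sagemaker:us-east-1:123456789012:"
    "endpoint/my-specialized-model"
)

response = client.converse(
    modelId=ENDPOINT_ARN,
    messages=[{"role": "user", "content": [{"text": "Hello!"}]}],
)
print(response["output"]["message"]["content"][0]["text"])
```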
