Latent Space, February 6
LLM Gateway: The One Decision That Removes 100 AI Engineering Decisions

 

This article explores the challenges AI engineers face when building applications on large language models (LLMs): simple in the short term, complex in the long term. At first, engineers may build a GPT wrapper with the OpenAI SDK/API, but as models like Claude Sonnet, DeepSeek, Amazon Nova, and Gemini 2 appear, model routing becomes complicated. Tools like LangChain and LlamaIndex can simplify integration, but require learning their specific abstractions. Observability, guardrails, and key management for team collaboration all add further complexity to AI engineering. The article argues that the key to solving these problems is adopting an LLM gateway.

🔑 AI engineering may start out simple, e.g. a GPT wrapper built on the OpenAI SDK/API, but as more competitive models appear, such as Claude Sonnet, DeepSeek, Amazon Nova, and Gemini 2, model routing grows complex, with API compatibility and cost-effectiveness to weigh.

🛠️ Tools such as LangChain and LlamaIndex can simplify LLM integration, but they come with their own abstractions and limitations: engineers must learn those abstractions, and the tools may not cover every need. Adding observability (logging) helps track down bugs, understand user behavior, and control costs, but it also brings extra overhead and complexity.

🛡️ AI engineering also needs guardrails, including preventing inappropriate content and validating output quality. Even plain JSON output may require tools like Instructor for validation and correction. Key management within a team is another challenge: you want to avoid sharing keys or minting keys en masse, to preserve security and control.

🌐 The author advocates adopting an LLM gateway to resolve this complexity: a gateway centralizes model routing, observability, guardrails, and key management, simplifying AI engineering and avoiding reinventing the wheel.

Sponsorships and the Agents Engineering track for AIE Summit NYC are sold out. The speakers and schedule are up. Last call for the AI Leadership track for CTOs/VPs of AI!


Dear AI Engineer,

We get it. You want to keep things simple. At first you built your GPT wrapper with the openai SDK/API, then shipped it, got a million users. Awesome!

Model Routing…

Then Claude Sonnet ships with incredible coding ability and vibes. DeepSeek comes along with incredibly cheap reasoning. Amazon Nova comes out 1000x cheaper than GPT4. Gemini 2 launches with the highest intelligence-for-cost ratio in the world. Their APIs don’t quite match.

Okay, no problem, keep it simple. Cursor is the fastest-growing SaaS in the history of SaaS; with a few prompts you've got an abstract ModelProvider interface that unifies all the things, if you just add the right environment variables.
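A stripped-down version of that hand-rolled abstraction might look like the sketch below. The class and model names are illustrative, not from any particular SDK, and the vendor calls are stubbed out:

```python
import os
from typing import Protocol


class ModelProvider(Protocol):
    """The interface everyone hand-rolls to unify N vendor SDKs."""

    def complete(self, prompt: str) -> str: ...


class OpenAIProvider:
    def __init__(self) -> None:
        # Real code would construct the openai client here.
        self.api_key = os.environ.get("OPENAI_API_KEY", "")

    def complete(self, prompt: str) -> str:
        return f"[openai] {prompt}"  # stubbed vendor call


class AnthropicProvider:
    def __init__(self) -> None:
        self.api_key = os.environ.get("ANTHROPIC_API_KEY", "")

    def complete(self, prompt: str) -> str:
        return f"[anthropic] {prompt}"  # stubbed vendor call


# Model name -> provider: the routing table you now maintain by hand.
PROVIDERS: dict[str, ModelProvider] = {
    "gpt-4o": OpenAIProvider(),
    "claude-sonnet": AnthropicProvider(),
}


def route(model: str, prompt: str) -> str:
    return PROVIDERS[model].complete(prompt)
```

Every new vendor means another class, another environment variable, and another entry in that table.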

Look familiar? Or maybe not, you’re smarter than this, you use LangChain or LlamaIndex or Pydantic AI and they’ve done it for you! Just pip install and RTFM.

Of course, you have to learn their specific abstractions, and they don’t support everything you want, but that’s okay, it’s open source, better than nothing, keep it simple.

… and Observability…

Then you wanted to add some logging. Just to track down bugs and p99 latency, understand your user behavior, get a handle on costs, any number of reasons. No problem, you kept it simple. You already use Datadog, Mixpanel, Honeycomb et al. OpenTelemetry already added Semantic Conventions for GenAI. Log a few events, you’re good. Maybe use “AI native” players like HumanLoop, LangSmith, or Braintrust:

That’s at least 7 LOC of overhead on every single LLM call. This is a problem as old as observability, with a standard library of solutions: decorator approaches can sweep it under the rug, you can monkey-patch the SDK, and auto-instrumentation approaches exist.

At scale, the invoices for logging everything start being real. It’s a lot of money to pay for a wall of data that just sits there with nobody looking at it. Maybe you add a little if random.random() < sample_rate: in there. I hope you remembered to figure out how to log with retries and agent loops and tool calls and streaming and pair with human feedback and anomaly detection and…
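Pulling the decorator and sampling ideas together, a minimal sketch: the `LOGS` list stands in for a real observability backend, and the model call is stubbed.

```python
import functools
import random
import time

LOGS: list[dict] = []  # stand-in for Datadog / Honeycomb / LangSmith / etc.


def traced(sample_rate: float = 1.0):
    """Hide the per-call logging boilerplate behind a decorator,
    with sampling so the invoices stay sane."""

    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            if random.random() < sample_rate:
                LOGS.append({
                    "fn": fn.__name__,
                    "latency_ms": (time.perf_counter() - start) * 1000,
                    "output_chars": len(str(result)),
                })
            return result

        return inner

    return wrap


@traced(sample_rate=1.0)  # log every call; drop this at scale
def llm_call(prompt: str) -> str:
    return prompt.upper()  # stubbed model call
```

Note what this sketch does not handle: retries, agent loops, tool calls, streaming, human feedback. Each of those punches a hole in the decorator.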

… and Guardrails…

Speaking of retries: everyone knows that AI Engineering involves building reliable systems atop non-deterministic LLMs. This is both simpler and bigger than Safety and Security, though those are important: there’s Guardrails AI and NeMo-Guardrails for the general problem, but even JSON/structured output needs Instructor (and we have found that naively-retried structured output can often outperform constrained-decoding structured output like the ones supported by OpenAI¹).
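The naive-retry pattern is simple enough to sketch in plain Python, no library required; `generate` here is a hypothetical stand-in for the actual model call, and the validator is specific to a single-title use case:

```python
import json


def parse_title(raw: str) -> str:
    """Accept only a JSON object with a single-line string `title`."""
    title = json.loads(raw)["title"]
    if not isinstance(title, str) or "\n" in title:
        raise ValueError("expected one single-line title string")
    return title


def title_with_retries(generate, prompt: str, max_retries: int = 3) -> str:
    """Naive retry: re-ask the model until its output validates."""
    last_err = None
    for attempt in range(max_retries):
        try:
            return parse_title(generate(prompt, attempt))
        except (ValueError, KeyError, json.JSONDecodeError) as err:
            last_err = err  # real code would feed the error back into the prompt
    raise RuntimeError(f"no valid output after {max_retries} tries: {last_err}")
```

Instructor packages this loop up (with the validation error fed back to the model on each retry), but the shape is the same.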

We’re not just talking about removing PII or checking for embarrassing content, but also plain simple validations on output quality: for an auto-titler module I have had models generate three titles in one string when I just wanted one, and for formatted markdown summary generation I’ve had models duplicate my desired **bolding** to cause ****unnecessary bolding****.

Sure, keep it simple, write some regex, or use a library.
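The “write some regex” version of those two validations, as a sketch (these are ad-hoc checks for the specific failure modes above, not a general guardrail):

```python
import re


def fix_over_bolding(text: str) -> str:
    """Collapse runs of 3+ asterisks (****bold**** -> **bold**)."""
    return re.sub(r"\*{3,}", "**", text)


def looks_like_one_title(text: str) -> bool:
    """Reject outputs that smuggle several titles into one string."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    return len(lines) == 1 and not re.match(r"\s*\d+[.)]", lines[0])
```

Each check is five minutes of work. The problem is that every module grows its own pile of them.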

… and 100 other simple decisions…

When working on a team of N people using M models, you have a few options:

There’s a long tail of little problems, each of which is simple to solve on its own…

… but all together it is madness to reinvent the same 100 wheels that everyone has in every app and every framework.

Short Term Simplicity, Long Term Complexity

I haven’t been very subtle about where we’re going here. I think all these decisions can be solved at origin by adopting the obvious software bundle that has emerged: The LLM Gateway, which we mentioned in the Humanloop writeup and developed in the Braintrust writeup:
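Most gateways speak the OpenAI wire format, so the client side shrinks to one request shape regardless of the underlying model, while routing, logging, guardrails, and per-user key mapping happen server-side. A standard-library-only sketch, with a placeholder gateway URL:

```python
import json
import urllib.request

GATEWAY_URL = "https://llm-gateway.internal/v1"  # placeholder address


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """One OpenAI-shaped request for any model; the gateway fans out
    to the right vendor and logs/guards the call centrally."""
    return urllib.request.Request(
        f"{GATEWAY_URL}/chat/completions",
        data=json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",  # one key per engineer
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Swapping `claude-sonnet` for `gpt-4o` is now a string change; the 100 decisions above move behind the gateway.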
