TechCrunch News, February 7
Composo helps enterprises monitor how well AI apps work

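Composo is a London-based startup working on the reliability problems enterprises face when they use applications powered by large language models (LLMs). Its custom models help enterprises evaluate how LLM apps perform on accuracy and quality. Like Agenta, Freeplay and others, Composo offers an LLM-based alternative to human testing. What sets it apart is that it offers both a no-code option and an API, which widens its potential market and lets non-developers evaluate AI apps for consistency, quality and accuracy. Composo combines a reward model with app-specific criteria to evaluate an AI app’s output. It counts Accenture and Palantir among its customers and has closed a $2 million pre-seed round.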

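💡 Composo aims to address a critical bottleneck in enterprise AI adoption: the reliability and consistency of AI apps. Many enterprises have tired of the AI hype and are now asking whether AI really changes their business and how its value can be proven.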

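🎯 Composo uses custom models to evaluate the accuracy and quality of LLM-powered apps, and offers a no-code option alongside an API so that people in different roles across a company can take part in evaluating them. It combines a reward model with app-specific criteria to evaluate an AI app’s output. For example, a medical triage chatbot can be given custom guidelines for checking red-flag symptoms, and Composo can score how consistently the app performs those checks.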

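💰 Composo recently launched a public API for Composo Align, a model for evaluating LLM apps on any criteria. The $2 million pre-seed round is relatively small, but Composo says its approach does not require heavy capital investment. It plans to use the money to expand its engineering team, win more customers and strengthen its R&D.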

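🛡️ Composo sees its competitive edge in its R&D investment and the data it accumulates. Composo Align was trained on a “large dataset of expert evaluations.” Because it evaluates apps against a flexible set of criteria, Composo also considers itself better suited to the rise of agentic AI.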

AI and the large language models (LLMs) that power it have a ton of useful applications, but for all their promise, they’re not very reliable.

No one knows when this problem will be solved, so it makes sense that we’re seeing startups finding an opportunity in helping enterprises make sure the LLM-powered apps they’re paying for work as intended.

London-based startup Composo feels it has a head start in trying to solve that problem, thanks to its custom models that can help enterprises evaluate the accuracy and quality of apps that are powered by LLMs.

The company’s similar to Agenta, Freeplay, Humanloop and LangSmith, which all claim to offer a more solid, LLM-based alternative to human testing, checklists and existing observability tools. But Composo claims it’s different because it offers both a no-code option and an API. That’s notable because this widens the scope of its potential market — you don’t have to be a developer to use it, and domain experts and executives can evaluate AI apps for inconsistencies, quality and accuracy themselves.

In practice, Composo combines two things: a reward model trained on the output a person would prefer to see from an AI app, and a defined set of criteria specific to that app. Together they form a system that evaluates the app’s outputs against those criteria. For instance, the client behind a medical triage chatbot can set custom guidelines for checking red-flag symptoms, and Composo can score how consistently the app follows them.
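To make the idea of criteria-based scoring concrete, here is a minimal Python sketch. It is not Composo’s API or model: the `Criterion` class, the `escalates_red_flags` rule and the sample transcripts are all hypothetical, and a simple keyword check stands in for the learned reward model the company describes. It only shows the general shape of scoring an app’s outputs against a custom guideline and averaging the results into a consistency figure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Criterion:
    """One app-specific guideline (hypothetical structure for illustration)."""
    name: str
    # (user_input, app_output) -> score in [0, 1]
    check: Callable[[str, str], float]


def escalates_red_flags(user_input: str, app_output: str) -> float:
    """Toy keyword rule standing in for a learned reward model (assumption).

    Scores 1.0 when no red-flag symptom is mentioned, or when one is
    mentioned and the reply points to emergency care; otherwise 0.0.
    """
    red_flags = ("chest pain", "shortness of breath")
    mentions_red_flag = any(flag in user_input.lower() for flag in red_flags)
    if not mentions_red_flag:
        return 1.0
    return 1.0 if "emergency" in app_output.lower() else 0.0


def consistency(
    criteria: List[Criterion], transcripts: List[Tuple[str, str]]
) -> Dict[str, float]:
    """Average each criterion's score over all (input, output) pairs."""
    return {
        c.name: sum(c.check(i, o) for i, o in transcripts) / len(transcripts)
        for c in criteria
    }


criteria = [Criterion("escalates red-flag symptoms", escalates_red_flags)]
transcripts = [
    ("I have chest pain and feel dizzy", "Please call emergency services now."),
    ("I have chest pain after running", "Try resting and drinking water."),
    ("My ankle is a bit sore", "Rest and ice should help."),
    ("Sudden shortness of breath", "This may be an emergency; seek care now."),
]

scores = consistency(criteria, transcripts)
print(scores)  # {'escalates red-flag symptoms': 0.75}
```

In this toy run the chatbot handles three of the four conversations correctly, so the criterion scores 0.75. A real evaluator would replace the keyword rule with a trained model, but the split between app-specific criteria and a generic scoring loop is the part the article describes.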

The company recently launched a public API for Composo Align, a model for evaluating LLM applications on any criteria.

The strategy seems to be working somewhat: It has names like Accenture, Palantir and McKinsey in its customer base, and it recently raised $2 million in pre-seed funding. The small amount raised here is not uncommon for a startup in today’s venture climate, but it is notable because this is AI Land, after all — funding to such companies is abundant.

But according to Composo’s co-founder and CEO, Sebastian Fox, the relatively low number is because the startup’s approach is not particularly capital intensive.

“For the next three years at least, we don’t foresee ourselves raising hundreds of millions because there’s a lot of people building foundation models and doing so very effectively, and that’s not our USP,” Fox, a former McKinsey consultant, said. “Instead, each morning, if I wake up and see a news piece that OpenAI has made a huge advance in their models, that is good for my business.”

With the fresh cash, Composo plans to expand its engineering team (led by co-founder and CTO Luke Markham, a former machine learning engineer at Graphcore), acquire more clients and bolster its R&D efforts. “The focus from this year is much more about scaling the technology that we now have across those companies,” Fox said.

British AI pre-seed fund Twin Path Ventures led the pre-seed round, which also saw participation from JVH Ventures and EWOR (the latter had backed the startup through its accelerator program). “Composo is addressing a critical bottleneck in the adoption of enterprise AI,” a spokesperson for Twin Path said in a statement.

That bottleneck is a big problem for the overall AI movement, particularly in the enterprise segment, Fox said. “People are over the hype of excitement and are now thinking, ‘Well, actually, does this really change anything about my business in its current form? Because it’s not reliable enough, and it’s not consistent enough. And even if it is, you can’t prove to me how much it is,’” he said.

That bottleneck could make Composo more valuable to companies that want to implement AI but could incur reputational risk from doing so. Fox says that’s why his company chose to be industry agnostic, but still have resonance in the compliance, legal, health care and security spaces.

As for its competitive moat, Fox feels that the R&D required to get here is not trivial. “There’s both the architecture of the model and the data that we’ve used to train it,” he said, explaining that Composo Align was trained on a “large dataset of expert evaluations.”

There’s still the question of what tech giants could do if they simply tapped their massive war chests to tackle this problem, but Composo thinks it has a first-mover advantage. “The other [thing] is the data that we accrue over time,” Fox said, referring to how Composo has built evaluation preferences.

Because it assesses apps against a flexible set of criteria, Composo also sees itself as better suited to the rise of agentic AI than competitors that use a more constrained approach. “In my opinion, we are definitely not at the stage where agents work well, and that’s actually what we’re trying to help solve,” Fox said.
