TechCrunch News, January 23
Coval evaluates AI voice and chat agents like self-driving cars

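Coval is an AI agent evaluation platform founded by Brooke Hopkins, a former tech lead at Waymo. It uses simulation testing to assess the performance of AI voice agents and chatbots. The platform can run thousands of simulations at once, such as having an agent make a restaurant reservation or answer a customer service question. Coval evaluates agents on a general set of metrics, and companies can also define custom evaluation criteria and use Coval to continuously check for regressions. Users can take the resulting data and insights to their end customers to show how their agents are performing. Coval has raised a $3.3 million seed round, which it will use to expand its engineering team and work toward product-market fit.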

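💡 Coval builds simulated environments for AI voice and chat agents to test and evaluate how they perform tasks, much as Hopkins tested self-driving cars at Waymo.

📊 Beyond general-purpose metrics, Coval lets companies define their own evaluation criteria so they can continuously monitor performance and catch regressions.

🤝 Coval can help enterprises choose the right vendor and show their customers how AI agents actually perform, building trust in the technology.

🚀 Coval grew quickly after Y Combinator's Summer 2024 batch, with demand surging over the past two months, a sign of strong market need for AI agent evaluation tools.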

What do AI voice agents and self-driving cars have in common? Their performance can be evaluated in the same way, argues Brooke Hopkins, a former tech lead at Waymo. Coval, Hopkins’ new startup, looks to do just that.

“When I left Waymo, I realized a lot of these problems that we had at Waymo were exactly what the rest of the AI industry was facing,” Hopkins told TechCrunch. “But everyone was saying that this is a new paradigm, we’re having to come up with testing practices from first principles and that basically we all have to recreate everything. And I looked at that and said, wait, we’ve spent the last 10 years in self-driving figuring out how to do this.”

In 2024, she decided to launch Coval, a platform that builds simulations to test and evaluate how AI voice and chat agents perform tasks, in the same way Hopkins tested self-driving cars at Waymo. Coval can run thousands of simulations simultaneously, such as having an agent make a restaurant reservation or respond to a customer service question asked in an indirect way.

Coval’s tech evaluates agents on a general set of metrics, but companies can also customize what they are looking for and use Coval to continuously evaluate for regressions. Users can also take this data, and the insights they glean from it, to their end customers, either as a demo or as a monitoring tool that shows the agent is working as intended.
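
The article does not describe how Coval works under the hood. As a rough illustration of the general workflow described in the two paragraphs above (run many simulated conversations, score them on general and custom metrics, and compare a new agent version against a baseline to catch regressions), here is a minimal Python sketch. Everything in it is hypothetical: the `Transcript` model, the stub agents, the metric names, and the 5% tolerance are assumptions made for illustration, not Coval's API or its actual method.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from statistics import mean
from typing import Callable

@dataclass
class Transcript:
    """One simulated conversation (hypothetical data model)."""
    scenario: str
    reply: str
    task_completed: bool
    latency_ms: float

# Hypothetical agent interface: request -> (reply, task completed?, latency in ms).
Agent = Callable[[str], tuple[str, bool, float]]
Metric = Callable[[Transcript], float]

def simulate(agent: Agent, scenarios: list[str], workers: int = 8) -> list[Transcript]:
    """Run every scenario against the agent, several at a time in a thread pool."""
    def run(scenario: str) -> Transcript:
        reply, done, latency = agent(scenario)
        return Transcript(scenario, reply, done, latency)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run, scenarios))

# General metrics applied to every agent (illustrative choices).
GENERAL: dict[str, Metric] = {
    "success_rate": lambda t: 1.0 if t.task_completed else 0.0,
    "latency_ms": lambda t: t.latency_ms,
}
# A hypothetical company-specific criterion: the agent should confirm details back.
CUSTOM: dict[str, Metric] = {
    "confirms_details": lambda t: 1.0 if "confirm" in t.reply.lower() else 0.0,
}
LOWER_IS_BETTER = {"latency_ms"}

def score(runs: list[Transcript], metrics: dict[str, Metric]) -> dict[str, float]:
    """Average each metric over all simulated runs."""
    return {name: mean(fn(t) for t in runs) for name, fn in metrics.items()}

def find_regressions(baseline: dict[str, float], candidate: dict[str, float],
                     tolerance: float = 0.05) -> list[str]:
    """Names of metrics that got worse than the baseline by more than `tolerance` (relative)."""
    worse = []
    for name, old in baseline.items():
        change = (candidate[name] - old) / old if old else candidate[name] - old
        if name in LOWER_IS_BETTER:
            change = -change  # for latency, an increase is a regression
        if change < -tolerance:
            worse.append(name)
    return worse

# Usage with two stub agents standing in for an old and a new agent version.
scenarios = [
    "Book a table for two at 7pm on Friday",
    "My package was supposed to arrive Monday and, well, it's Wednesday",
]

def agent_v1(request: str) -> tuple[str, bool, float]:
    return (f"I can confirm that for you: {request}", True, 100.0)

def agent_v2(request: str) -> tuple[str, bool, float]:
    return ("Done.", True, 130.0)  # slower, and no longer confirms details

metrics = {**GENERAL, **CUSTOM}
baseline = score(simulate(agent_v1, scenarios), metrics)
candidate = score(simulate(agent_v2, scenarios), metrics)
print(find_regressions(baseline, candidate))  # ['latency_ms', 'confirms_details']
```

A relative tolerance per metric is just one simple way to flag regressions; a real evaluation suite would likely run far more scenarios and might use per-metric thresholds or statistical tests instead.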

“One of the biggest blockers to agents being adopted by enterprises is them feeling confident that this isn’t just a demo with smoke and mirrors,” Hopkins said. “Choosing between vendors is a really complicated task for these executives because it’s just very hard to know what you even ask or how do you even prove that these agents are doing what you expect. And so this gives our companies the ability to really show that and demonstrate it.”

Hopkins developed the idea behind Coval during Y Combinator’s Summer 2024 batch before launching the product publicly in October 2024. She said demand has been strong and has become explosive over the last two months, with customers asking how quickly they can get their agents evaluated.

The San Francisco-based startup is now announcing a $3.3 million seed round led by MaC Venture Capital with participation from Y Combinator and General Catalyst. The startup will use the capital to build out its engineering team and work to achieve product-market fit. Hopkins added that the company will also be working toward enabling its users to evaluate other types of AI agents, like web-based agents, in the future.

Coval arrives as both momentum and hype around AI agents appear to be at an all-time high. Enterprise tech leaders like Marc Benioff have been praising (and marketing) the technology, saying Salesforce will deploy more than a billion of its AI agents by next year. OpenAI is rumored to be releasing its take on an AI agent very soon.

Numerous startups are building in the space, too. More than 100 startups building AI agents went through Y Combinator’s three 2024 cohorts alone, and some AI agent startups have landed sizable venture rounds. One, /dev/agents, raised a $55 million seed round at a $500 million valuation in November 2024, less than a year after it was founded.

This momentum means more companies will likely be looking for help evaluating their agents, too. Hopkins said Coval has a good shot at standing out from the pack because, unlike the inevitable new entrants, it has a head start.

“I think where we really stand out is I’ve been working in this space for half a decade and I’ve built these systems over and over,” she said. “We’ve built multiple iterations and we’ve seen how they fail and how they scale and we’re building the same concepts into Coval and all of those learnings.”
