TechCrunch News · January 21
OpenAI’s agent tool may be nearing release

OpenAI may be about to release Operator, an AI tool that can control a user's computer and take actions on their behalf. The tool is reported to handle tasks autonomously, with a release expected in January. A software engineer has uncovered supporting evidence, and OpenAI's website already contains references to it. The article also covers the tool's performance, comparisons with other systems, and its safety testing.

🎯 OpenAI may release Operator, a tool that can control a computer

📈 Operator's performance varies across tasks and benchmarks

🔒 Operator's safety testing and how it has been assessed

💪 OpenAI's competition with other companies in the AI agent space

OpenAI may be close to releasing an AI tool that can take control of your PC and perform actions on your behalf.

Tibor Blaho, a software engineer with a reputation for accurately leaking upcoming AI products, claims to have uncovered evidence of OpenAI’s long-rumored Operator tool. Publications including Bloomberg have previously reported on Operator, which is said to be an “agentic” system capable of autonomously handling tasks like writing code and booking travel.

According to The Information, OpenAI is targeting January as Operator’s release month. Code uncovered by Blaho this weekend adds credence to that reporting.

OpenAI’s ChatGPT client for macOS has gained options, hidden for now, to define shortcuts to “Toggle Operator” and “Force Quit Operator,” per Blaho. And OpenAI has added references to Operator on its website, Blaho said — albeit references that aren’t yet publicly visible.

According to Blaho, OpenAI’s site also contains not-yet-public tables comparing the performance of Operator to other computer-using AI systems. The tables may well be placeholders. But if the numbers are accurate, they suggest that Operator isn’t 100% reliable, depending on the task.

On OSWorld, a benchmark that tries to mimic a real computer environment, “OpenAI Computer Use Agent (CUA)” — possibly the AI model powering Operator — scores 38.1%, ahead of Anthropic’s computer-controlling model but well short of the 72.4% humans score. OpenAI CUA surpasses human performance on WebVoyager, which evaluates an AI’s ability to navigate and interact with websites. But the model falls short of human-level scores on another web-based benchmark, WebArena, according to the leaked benchmarks.

Operator also struggles with tasks a human could perform easily, if the leak is to be believed. In a test that asked Operator to sign up with a cloud provider and launch a virtual machine, it succeeded only 60% of the time. Tasked with creating a Bitcoin wallet, it succeeded only 10% of the time.

OpenAI’s imminent entry into the AI agent space comes as rivals including the aforementioned Anthropic, Google, and others make plays for the nascent segment. AI agents may be risky and speculative, but tech giants are already touting them as the next big thing in AI. According to analytics firm Markets and Markets, the market for AI agents could be worth $47.1 billion by 2030.

Agents today are rather primitive. But some experts have raised concerns about their safety, should the technology rapidly improve.

One of the leaked charts shows Operator performing well on selected safety evaluations, including tests that try to get the system to perform “illicit activities” and search for “sensitive personal data.” Reportedly, safety testing is among the reasons for Operator’s long development cycle. In a recent X post, OpenAI co-founder Wojciech Zaremba criticized Anthropic for releasing an agent he claims lacks safety mitigations.

“I can only imagine the negative reactions if OpenAI made a similar release,” Zaremba wrote.

It’s worth noting that OpenAI has been criticized by AI researchers, including ex-staff, for allegedly de-emphasizing safety work in favor of quickly productizing its technology.
