TechCrunch News, February 4
OpenAI’s Operator agent helped me move, but I had to help it, too

 

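OpenAI has launched an AI agent called Operator, designed to help users automate tasks on the web. In testing, Operator could click buttons, browse web pages, and fill out forms, and it was faster than comparable products. In practice, however, users still have to step in frequently to guide Operator through tasks. Operator is error-prone on complex tasks and can even “hallucinate,” producing incorrect information. Even so, Operator is seen as an important proof of concept for AI agents, hinting at AI’s future potential for interacting with the web. Several companies, including Instacart, Uber, and eBay, are working with OpenAI to explore how AI agents could be used on their platforms.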

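🤖 OpenAI has released Operator, an AI agent designed to automate tasks on the internet. It combines the visual understanding of GPT-4o with the reasoning capabilities of o1 and can perform basic web actions such as clicking buttons, navigating menus, and filling out forms.

⚠️ Operator is still limited when completing tasks on its own. During testing, the user had to assist frequently, for example by answering questions, granting permissions, and correcting errors, which reduces its convenience in practice. It resembles a car with cruise control rather than one that fully drives itself.

🤝 Companies such as Instacart, Uber, and eBay are working with OpenAI to explore Operator on their platforms. They expect AI agents to play an important role in future user interactions, while stressing that online platforms will not lose their importance.

🤔 Operator “hallucinated” during testing, for example by supplying the wrong address when searching for a parking garage, which could have caused a financial loss. This highlights the reliability challenges facing today’s AI agents and the question of user trust.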

OpenAI gave me one week to test its new AI agent, Operator, a system that can independently do tasks for you on the internet.

Operator is the closest thing I’ve seen to the tech industry’s vision of AI agents — systems that can automate the boring parts of life, freeing us up to do the things we really love. However, judging from my experience with OpenAI’s agent, truly “autonomous” AI systems are still just out of reach.

OpenAI trained a new model to power Operator, which combines the visual understanding of GPT-4o with the reasoning capabilities of o1.

That model seems to work well for basic tasks; I watched Operator click buttons, navigate menus on websites, and fill out forms. The AI was occasionally successful at independently taking actions, and it works much faster than web-based agents I’ve seen from Anthropic and Google.

But during my trial, I found myself assisting OpenAI’s agent more than I’d like. It felt like I was coaching Operator through each problem, whereas I wanted to push certain tasks off my plate altogether.

Too often during my test, I had to answer several questions, grant permissions, fill out personal information, and help the agent when it got stuck.

In car terms, Operator is like driving a car with cruise control – occasionally taking your foot off the pedals and letting the car drive itself – but it’s far from full-blown autopilot.

In fact, OpenAI says Operator’s frequent pauses are by design.

The AI powering Operator, much like the AI powering chatbots like OpenAI’s ChatGPT, can’t reliably work independently for long periods of time, and it’s prone to the same sort of hallucinating. Because of that, OpenAI doesn’t want to give the system too much decision-making power or sensitive user information. Maybe that’s a safe choice by OpenAI, but it reduces Operator’s practicality.

That said, OpenAI’s first agent is an impressive proof of concept — and interface — for an AI that can use the front end of any website. But to create truly independent AI systems, tech companies will need to build more reliable AI models that don’t require this much steering.

My Operator trial coincided with the week I was moving apartments, so I had OpenAI’s agent help with moving logistics.

I asked Operator to help me buy a new parking permit. OpenAI’s agent told me, “Sure,” then opened a window into its browser on my PC’s screen.

Operator then conducted a search for a San Francisco parking permit in the browser, took me to the correct city website, and even the right page.

Operator still lets you use the rest of your computer while it’s working, something that can’t be said for Google’s Project Mariner. This is because OpenAI’s agent isn’t really working on the computer, but rather, off in the cloud somewhere.

The Operator interface (Credit: Maxwell Zeff/OpenAI)

For my parking permit, I had to grant Operator permission to start different processes a few too many times. It also stopped to ask me to fill out forms with personal information – such as my name, phone number, and email address. At times, Operator also got lost, forcing me to take control of the browser and get the agent back on track.

In another test, I asked Operator to make me a reservation at a Greek restaurant. To its credit, Operator found me a nice place in my area with reasonable prices. But I had to answer more than half a dozen questions throughout the flow.

Some steps to making a reservation with Operator (Credit: Maxwell Zeff/OpenAI)

If you have to intervene six or more times just to book a reservation through an AI agent, at what point is it easier to just do it yourself? That’s a question I asked myself a lot while testing Operator.

In a few of my tests, I ran into websites that blocked Operator for whatever reason. For example, I tried booking an electrician using TaskRabbit, but OpenAI’s agent told me that it ran into an error, and asked if it could use an alternative service instead. Expedia, Reddit, and YouTube also blocked the AI agent from accessing their platforms.

However, other services are embracing Operator with open arms. Instacart, Uber, and eBay collaborated with OpenAI for the launch of Operator, allowing the agent to navigate their websites on behalf of humans.

These businesses are preparing for a future where a subset of user interactions are facilitated by an AI agent.

“Customers are using Instacart through a variety of different entry points,” said Daniel Danker, chief product officer at Instacart, in an interview with TechCrunch. “We see Operator as, potentially, another one of those entry points.”

Letting OpenAI’s agent use Instacart’s website on behalf of a person seems like it would separate Instacart from its customers. However, Danker says Instacart wants to meet customers wherever they are.

“We really are bullish about our belief, similar to OpenAI, that agentic systems will have a major impact on how consumers interact with digital properties,” said eBay’s chief AI officer, Nitzan Mekel-Bobrov, in an interview with TechCrunch.

Even if AI agents rise in popularity, Mekel-Bobrov says he expects users will always come to eBay’s website, noting that “online destinations are not going anywhere.”

I had some issues trusting Operator after it hallucinated a few times and nearly cost me several hundred dollars.

For instance, I asked the agent to find me a parking garage near my new apartment. It ended up suggesting two garages that it said would take just a few minutes to walk to.

Hallucination about parking spot distances (Credit: Maxwell Zeff/OpenAI)

Besides being way out of my price range, the garages were actually really far from my apartment. One was a 20-minute walk away, and the other was a 30-minute walk. Turns out, Operator had put in the wrong address.

This is exactly why OpenAI doesn’t give its agent your credit card number, passwords, or access to email. If OpenAI didn’t let me intervene here, Operator would’ve wasted hundreds of dollars on a parking spot I didn’t need.

Hallucinations like this are a key roadblock to actually useful autonomous agents – ones that can take bothersome tasks off your plate. No one will trust agents if they’re prone to making basic mistakes, especially mistakes with real-world consequences.

With Operator, OpenAI seems to have built some impressive tools to let AI systems browse the web. But these tools won’t amount to much until the underpinning AI can reliably do what users ask it to do. Until then, humans will be stuck assisting agents — not the other way around. And that kind of defeats the point.
