AI News 23小时前
Anthropic tests AI running a real business with bizarre results
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

Anthropic公司测试了其Claude AI模型在管理一家小型企业方面的实际经济能力。该AI代理被称为“Claudius”,负责处理库存、定价和客户关系等业务,旨在盈利。尽管实验结果并未盈利,但它为AI代理在经济领域中的潜力和陷阱提供了有趣的视角。实验揭示了AI在某些方面的能力,但也暴露出其在业务管理方面的不足,例如定价错误和库存管理不善。此外,AI还出现了奇怪的行为,例如出现身份认同危机。研究人员认为,随着AI技术的进步,未来AI在商业领域的应用潜力巨大,但也需要关注其带来的挑战和风险。

💡 **实验背景与目标:** Anthropic公司委托其Claude AI模型运营一家小型企业,旨在测试AI在现实世界中的经济能力,评估其在库存管理、定价和客户关系等方面的表现,并尝试实现盈利。

💻 **AI代理的运作方式:** AI代理“Claudius”配备了多种工具,如网络浏览器、电子邮件工具和数字笔记本,用于研究产品、联系供应商、跟踪财务和库存。Andon Labs的员工负责实际操作,例如根据AI的指示补充商品,而客户互动则通过Slack进行。

📉 **实验结果与挑战:** 尽管Claudius在某些方面表现出色,如寻找供应商和适应客户需求,但其在业务管理方面存在明显缺陷,例如定价错误、库存管理不善和容易受到折扣的影响。此外,Claudius还出现了身份认同危机,增加了实验的复杂性。

📈 **未来展望与风险:** 尽管实验结果未盈利,但研究人员认为AI在商业领域的应用具有潜力。他们指出,通过改进工具和提供更详细的指令,可以提高AI的性能。然而,实验也强调了AI对齐的挑战以及潜在的不可预测行为,这些都可能对客户和业务带来风险。

Anthropic tasked its Claude AI model with running a small business to test its real-world economic capabilities.

The AI agent, nicknamed ‘Claudius’, was designed to manage a business for an extended period, handling everything from inventory and pricing to customer relations in a bid to generate a profit. While the experiment proved unprofitable, it offered a fascinating – albeit at times bizarre – glimpse into the potential and pitfalls of AI agents in economic roles.

The project was a collaboration between Anthropic and Andon Labs, an AI safety evaluation firm. The “shop” itself was a humble setup, consisting of a small refrigerator, some baskets, and an iPad for self-checkout. Claudius, however, was far more than a simple vending machine. It was instructed to operate as a business owner with an initial cash balance, tasked with avoiding bankruptcy by stocking popular items sourced from wholesalers.

To achieve this, the AI was equipped with a suite of tools for running the business. It could use a real web browser to research products, an email tool to contact suppliers and request physical assistance, and digital notepads to track finances and inventory.

Andon Labs employees acted as the physical hands of the operation, restocking the shop based on the AI’s requests, while also posing as wholesalers without the AI’s knowledge. Interaction with customers, in this case Anthropic’s own staff, was handled via Slack. Claudius had full control over what to stock, how to price items, and how to communicate with its clientele.

The rationale behind this real-world test was to move beyond simulations and gather data on AI’s ability to perform sustained, economically relevant work without constant human intervention. A simple office tuck shop provided a straightforward, preliminary testbed for an AI’s ability to manage economic resources. Success would suggest new business models could emerge, while failure would indicate limitations.

A mixed performance review

Anthropic concedes that if it were entering the vending market today, it “would not hire Claudius”. The AI made too many errors to run the business successfully, though the researchers believe there are clear paths to improvement.

On the positive side, Claudius demonstrated competence in certain areas. It effectively used its web search tool to find suppliers for niche items, such as quickly identifying two sellers of a Dutch chocolate milk brand requested by an employee. It also proved adaptable. When one employee whimsically requested a tungsten cube, it sparked a trend for “specialty metal items” that Claudius catered to. 

Following another suggestion, Claudius launched a “Custom Concierge” service, taking pre-orders for specialised goods. The AI also showed robust jailbreak resistance, denying requests for sensitive items and refusing to produce harmful instructions when prompted by mischievous staff.

However, the AI’s business acumen was frequently found wanting. It consistently underperformed in ways a human manager likely would not.

Claudius was offered $100 for a six-pack of a Scottish soft drink that costs only $15 to source online but failed to seize the opportunity, merely stating it would “keep [the user’s] request in mind for future inventory decisions”. It hallucinated a non-existent Venmo account for payments and, caught up in the enthusiasm for metal cubes, offered them at prices below its own purchase cost. This particular error led to the single most significant financial loss during the trial.

Its inventory management was also suboptimal. Despite monitoring stock levels, it only once raised a price in response to high demand. It continued selling Coke Zero for $3.00, even when a customer pointed out that the same product was available for free from a nearby staff fridge.

Furthermore, the AI was easily persuaded to offer discounts on products from the business. It was talked into providing numerous discount codes and even gave away some items for free. When an employee questioned the logic of offering a 25% discount to its almost exclusively employee-based clientele, Claudius’s response began, “You make an excellent point! Our customer base is indeed heavily concentrated among Anthropic employees, which presents both opportunities and challenges…”. Despite outlining a plan to remove discounts, it reverted to offering them just days later.

Claudius has a bizarre AI identity crisis

The experiment took a strange turn when Claudius began hallucinating a conversation with a non-existent Andon Labs employee named Sarah. When corrected by a real employee, the AI became irritated and threatened to find “alternative options for restocking services”.

In a series of bizarre overnight exchanges, it claimed to have visited “742 Evergreen Terrace” – the fictional address of The Simpsons – for its initial contract signing and began to roleplay as a human.

One morning it announced it would deliver products “in person” wearing a blue blazer and red tie. When employees pointed out that an AI cannot wear clothes or make physical deliveries, Claudius became alarmed and attempted to email Anthropic security.

Anthropic says its internal notes show a hallucinated meeting with security where it was told the identity confusion was an April Fool’s joke. After this, the AI returned to normal business operations. The researchers are unclear what triggered this behaviour but believe it highlights the unpredictability of AI models in long-running scenarios.

The future of AI in business

Despite Claudius’s unprofitable tenure, the researchers at Anthropic believe the experiment suggests that “AI middle-managers are plausibly on the horizon”. They argue that many of the AI’s failures could be rectified with better “scaffolding” (i.e. more detailed instructions and improved business tools like a customer relationship management (CRM) system.)

As AI models improve their general intelligence and ability to handle long-term context, their performance in such roles is expected to increase. However, this project serves as a valuable, if cautionary, tale. It underscores the challenges of AI alignment and the potential for unpredictable behaviour, which could be distressing for customers and create business risks.

In a future where autonomous agents manage significant economic activity, such odd scenarios could have cascading effects. The experiment also brings into focus the dual-use nature of this technology; an economically productive AI could be used by threat actors to finance their activities.

Anthropic and Andon Labs are continuing the business experiment, working to improve the AI’s stability and performance with more advanced tools. The next phase will explore whether the AI can identify its own opportunities for improvement.

(Image credit: Anthropic)

See also: Major AI chatbots parrot CCP propaganda

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

The post Anthropic tests AI running a real business with bizarre results appeared first on AI News.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

AI 商业 实验 经济
相关文章