Import AI, April 9, 18:38
Import AI 406: AI-driven software explosion; robot hands are still bad; better LLMs via pdb

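This article explores the potential acceleration of AI research and the challenges facing dexterous robotic manipulation. It argues that AI automating AI research could trigger a "software intelligence explosion" that speeds up AI progress. It also stresses how difficult it is to train robot hands for complex manipulation, analyzing challenges in environment modeling, reward design, policy learning, and object perception. In addition, it covers research on improving LLM coding ability with debugging tools, along with the author's views on where AI is heading.

💡 Automating AI research could accelerate AI progress: Researchers at Forethought believe AI systems may be able to build their own successors, which could lead to a "software intelligence explosion" in which AI progress accelerates exponentially; this calls for strong policy and technical safeguards.

📈 Signs of a software intelligence explosion: The article notes that the efficiency of AI software (both runtime efficiency and training efficiency) doubles roughly every 6 months, and that precursors to "ASARA" have already appeared, such as AI systems that write better kernels, discover new architectures, and learn new optimizers.

⚙️ Preparing for change: The article suggests measuring software progress, disclosing it to third parties, and establishing "threshold levels of substantial AI software acceleration" to prepare for a potential software intelligence explosion.

🛠️ Debugging tools improve LLM coding: Research shows that giving LLMs debugging tools (such as pdb) can significantly improve their coding ability, especially on benchmarks such as SWE-Bench-lite, which shows that the right tools can unlock latent capabilities in AI systems.

🖐️ The challenge of robotic hand manipulation: The article stresses that training robot hands for dexterous manipulation is very difficult. The main challenges span environment modeling, reward design, policy learning, and object perception, all of which limit robots' ability to interact with the real world.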

Welcome to Import AI, a newsletter about AI research. Import AI runs on lattes, ramen, and feedback from readers. If you’d like to support this, please subscribe.


It seems likely that AI is going to automate AI research, which will lead to a software explosion:
…We should be prepared for things to move very quickly…
Researchers with Forethought, an AI research organization, think it’s likely that modern AI research will yield AI systems capable of building their successors. Forethought expects that at some point in the future it’ll be possible to build AI Systems for AI R&D Automation (ASARA). This would have huge effects: “Empirical evidence suggests that, if AI automates AI research, feedback loops could overcome diminishing returns, significantly accelerating AI progress”, they write. This could lead to a ‘software intelligence explosion’ where AI research starts to move very rapidly. “If a software intelligence explosion were to occur, it could lead to incredibly fast AI progress, necessitating the development and implementation of strong policy and technical guardrails in advance…. soon after ASARA, progress might well have sped up to the point where AI software was doubling every few days or faster (compared to doubling every few months today).”

There’s evidence this is happening today: In this newsletter I’ve covered numerous cases of ‘precursor-ASARA’ research, ranging from AI systems that can figure out how to write better kernels, to AI systems which discover new architectures, to things that learn new optimizers, and so on. When the Forethought researchers look across the available literature they see a similar trend – in domains ranging from computer vision to large language models, progress appears to be accelerating in the aggregate, partially because researchers are getting better at using AI systems to speed up the development of successor systems. “The efficiency of AI software (both runtime efficiency and training efficiency) is doubling every ~6 months, with substantial uncertainty,” they write.
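The gap between today's ~6-month doubling time and a post-ASARA doubling time of "every few days" is easy to underestimate. Here is a back-of-the-envelope sketch, assuming smooth exponential growth and a 365-day year: a 6-month doubling compounds to 4x per year, while a 3-day doubling (an illustrative value for "every few days", not a figure from the paper) compounds to more than 30 orders of magnitude per year.

```python
import math

DAYS_PER_YEAR = 365.0


def annual_multiplier(doubling_time_days: float) -> float:
    """Yearly growth factor for a quantity that doubles every
    `doubling_time_days` days, assuming smooth exponential growth."""
    return 2.0 ** (DAYS_PER_YEAR / doubling_time_days)


def orders_of_magnitude_per_year(doubling_time_days: float) -> float:
    """The same growth expressed as powers of ten per year, which is
    easier to read than the raw multiplier for short doubling times."""
    return (DAYS_PER_YEAR / doubling_time_days) * math.log10(2.0)


if __name__ == "__main__":
    # Today: Forethought's estimate of a ~6-month (182.5-day) doubling time.
    print(annual_multiplier(182.5))  # 4.0, i.e. 4x per year
    # Post-ASARA: "every few days"; 3 days is an illustrative assumption.
    print(orders_of_magnitude_per_year(3.0))  # roughly 36.6 orders of magnitude per year
```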

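How to prepare for a fundamentally different world: If a software-driven explosion happens it’d be nice to know about it. What should we do to prepare? The authors have some ideas, including measuring software progress, disclosing it to third parties, and establishing threshold levels of substantial AI software acceleration.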

Why this matters – I can taste this on the bitter wind of research progress: My intuition suggests it should be possible to automate AI R&D, though with the caveat that this is primarily within the ‘cone of progress’ current AI research sits in. I think this because AI is oddly amenable to research automation, since it has a bunch of complementary properties:

Put all of it together and it feels like ASARA is possible. If it happened, an already fast-moving and broadly ungovernable field of technology would move far faster – suggesting we’re about to enter a world where the only path to governance will require us to create AI systems that can think at least as fast as the systems which are training their own successors.
Read more: Will AI R&D Automation Cause a Software Intelligence Explosion? (Forethought).

Import AI event retrospective – there will be more!
Thanks to the 50 or so Import AI readers who trekked to The Interval in San Francisco last week to see me and Tyler Cowen talk about AI, economics, and weird futures. I especially enjoyed the creative questions, and personal highlights for me include questions on how AI might provide help to the very young and very old, and why I spend time in this newsletter talking about machine consciousness (I agree with Tyler’s notion that no matter the likelihood, if it’s above 0% then you need to care about machine sentience a lot lest you commit a great crime). I’m going to try to do more events in the future and hopefully in cities besides SF. Import AI is a true community project and it was so nice to see people IRL!
Thanks to James Cham for a photo of the event here.

You can make better python coding LLMs if you also give them some debug tools:
…Capability overhangs are everywhere…
Researchers with Microsoft, McGill University, and Mila have improved the performance of coding agents by giving them access to some debug tools. Larger and more capable AI systems are able to use these tools effectively, while smaller ones struggle. The research illustrates how you can unlock previously invisible capabilities in AI systems merely by giving them access to the right tools.

What they did and how well it worked: They built ‘debug-gym’, software that gives an LLM access to the Python debugger pdb, allowing an AI agent to “set breakpoints, navigate the code space, print variable values, and even create test functions on the fly”.
In tests, they show that agents which have access to debug-gym are able to improve their performance on SWE-Bench-lite, a 300-question subset of the widely used SWE-Bench programming benchmark. Specifically, they show that the models o1-preview, o3-mini, and Claude 3.7 Sonnet can all benefit from pdb via debug-gym and use it to achieve significantly higher scores than when they don’t have access to it.
By comparison, on the ‘Aider’ benchmark, access to pdb doesn’t seem to make much of a difference. The authors hypothesize this is because “Aider requires generating code that is relatively straightforward in their underlying logic and thus interactive debugging tools such as pdb would only provide minimal additional information.”
Regardless, there’s a lot of ground to cover – “although we observe some signs of life from agents using the strongest LLMs as backbone, the most performant agent-backbone model combination can barely solve about a half of the SWE-bench-Lite tasks,” they write. “Results suggest that while using strongest LLMs as backbone enables agents to somewhat leverage interactive debugging tools, they are still far from being proficient debuggers… we believe this is due to the scarcity of data representing sequential decision-making behavior (e.g., debugging traces) in current LLM’s training corpus.”
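debug-gym's own interface isn't described here, so the following is only a minimal sketch of the underlying idea using nothing but the standard library: run a script under pdb and feed it the kinds of commands an agent might issue (set a breakpoint, continue, print variables), then hand the transcript back as the agent's observation. The buggy `mean` function, the `run_pdb_session` helper, and the command list are hypothetical illustrations, not part of debug-gym.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical buggy program an agent might be asked to fix.
BUGGY_SOURCE = textwrap.dedent("""\
    def mean(xs):
        total = sum(xs)
        return total / (len(xs) - 1)  # bug: denominator is off by one

    print(mean([2, 4, 6]))
""")


def run_pdb_session(source: str, commands: list[str]) -> str:
    """Run `source` under pdb, feed it a scripted list of debugger commands
    (standing in for commands an LLM agent would emit), and return the
    transcript the agent would read back."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "target.py"
        script.write_text(source)
        result = subprocess.run(
            [sys.executable, "-m", "pdb", str(script)],
            input="\n".join(commands) + "\n",
            capture_output=True,
            text=True,
            timeout=30,
        )
    return result.stdout


if __name__ == "__main__":
    # Break on the return line (line 3), inspect the inputs to the division, quit.
    transcript = run_pdb_session(
        BUGGY_SOURCE, ["b 3", "c", "p total", "p len(xs)", "q"]
    )
    print(transcript)
```

Seeing `total` and `len(xs)` side by side at the breakpoint is the kind of runtime evidence that static code reading alone does not give a model.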

Why this matters – LLMs are more powerful than we think, they just need the right tools: Systems like this are yet another example of the ‘capability overhang’ which surrounds us – you can make LLMs better merely by pairing them with the right tools and, these days, you don’t need to do any adaptation of the LLMs for those tools beyond some basic prompting. Put another way: if you paused all AI progress today, systems would continue to advance in capability for a while solely through the creation of better tools.
Read more: debug-gym: A Text-Based Environment for Interactive Debugging (arXiv).
Get the software here: debug-gym (Microsoft site).

Robots are getting more advanced, but dextrous manipulation is still really, really hard:
…We’ll get great pincer robots soon, but hands will take a while…
Some researchers with UC Berkeley, NVIDIA, and UT Austin have developed a ‘recipe’ for training dextrous robots to do physical manipulation tasks. The results are promising but also highlight how hard a task it is to get robots to interact with the world using humanlike hands.

Why are hands so goddamn hard? The paper gives a nice overview of why teaching AIs to use humanlike hands is very difficult. Challenges include environment modeling, reward design, policy learning, and object perception.

Their recipe: Their solutions are multi-faceted and make some progress. “Our main contributions include an automated real-to-sim tuning module that brings the simulated environment closer to the real world, a generalized reward design scheme that simplifies reward engineering for long-horizon contact-rich manipulation tasks, a divide-and-conquer distillation process that improves the sample efficiency of hard-exploration problems while maintaining sim-to-real performance, and a mixture of sparse and dense object representations to bridge the sim-to-real perception gap,” they write. However, all of this should be viewed as a step along the way to dextrous robots, rather than reaching a goal.
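The paper's real-to-sim tuning module isn't described here in enough detail to reproduce, so the following is only a toy illustration of the general idea, using plain grid-search system identification as a swapped-in technique: choose the simulator parameter whose rollout best matches a recorded real trajectory. The 1-D sliding-block simulator, the friction coefficient `mu`, and the synthetic "real" recording are all hypothetical stand-ins.

```python
def simulate_slide(mu: float, v0: float = 2.0, dt: float = 0.01, steps: int = 100) -> list[float]:
    """Toy 1-D simulator (hypothetical): a block sliding on a surface
    decelerates by Coulomb friction mu * g until it stops.
    Returns the block's position at each time step."""
    g = 9.81
    x, v = 0.0, v0
    positions = []
    for _ in range(steps):
        v = max(0.0, v - mu * g * dt)  # friction never reverses the motion
        x += v * dt
        positions.append(x)
    return positions


def trajectory_error(sim: list[float], real: list[float]) -> float:
    """Mean squared position error between two equal-length rollouts."""
    return sum((s - r) ** 2 for s, r in zip(sim, real)) / len(real)


def tune_friction(real: list[float], candidates: list[float]) -> float:
    """Grid-search the candidate friction whose simulated rollout
    best matches the real one (a stand-in for real-to-sim tuning)."""
    return min(candidates, key=lambda mu: trajectory_error(simulate_slide(mu), real))


if __name__ == "__main__":
    # Stand-in for a real-robot recording, synthesized with a "hidden" mu = 0.35.
    real_rollout = simulate_slide(0.35)
    grid = [i / 100 for i in range(5, 100, 5)]  # 0.05, 0.10, ..., 0.95
    print(tune_friction(real_rollout, grid))  # 0.35
```

A real pipeline would tune many parameters at once against noisy sensor data, typically with a smarter optimizer than a grid, but the objective (shrink the gap between simulated and real behavior) is the same.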

Testing out their approach: They use a Fourier GR1 humanoid robot with two arms and two multi-fingered hands to test out their approach. The robot has vision via the use of a head-mounted RealSense D435 depth camera, as well as a third-person view of itself via a remotely mounted additional RealSense. “We report a 62.3% success rate for the grasp-and-reach task, 80% for the box lift task, and 52.5% for the bimanual handover task,” they write. If you’re thinking “that sounds too low for real-world usage”, you’d be right!
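One quick way to see why these numbers fall short of real-world usage is to treat each reported rate as an independent per-attempt success probability and ask how often a robot would get through a sequence of attempts. Independence, and chaining three tasks that the paper evaluates separately, are assumptions made purely for illustration.

```python
from math import prod

# Per-task success rates reported in the paper.
SUCCESS_RATES = {
    "grasp-and-reach": 0.623,
    "box lift": 0.80,
    "bimanual handover": 0.525,
}


def chained_success(rates: list[float]) -> float:
    """Probability of succeeding at every step of a sequence,
    assuming (hypothetically) that attempts are independent."""
    return prod(rates)


def streak_probability(rate: float, attempts: int) -> float:
    """Probability of `attempts` consecutive successes at a single task."""
    return rate ** attempts


if __name__ == "__main__":
    all_three = chained_success(list(SUCCESS_RATES.values()))
    print(f"all three tasks in a row: {all_three:.3f}")  # 0.262
    print(f"ten box lifts in a row: {streak_probability(0.80, 10):.3f}")  # 0.107
```

Even the best-performing task, box lifting at 80%, would string together ten clean attempts only about one time in ten under these assumptions.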

Why this matters – a nice dose of reality: I’m more bullish on robotics arriving in the next few years, though I think the platforms will be basically ‘Roombas with pincers’ – things that can move around a flat surface and use one or two arms to do basic tasks for you. Papers like this indicate it might take a lot longer to get robots that are able to do the sorts of fine-grained manipulation that humans can do. “The capabilities achieved in this work are still far from the kind of “general-purpose” manipulation that humans are capable of. Much work remains to be done to improve each individual component of this pipeline and unlock the full potential of sim-to-real RL,” the authors write. “We find ourselves heavily constrained by the lack of reliable hardware for dexterous manipulation. While we use multi-fingered robot hands, the dexterity of these hands is far from that of human hands in terms of the active degrees of freedom”.
Read more: Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids (arXiv).
View some videos of the robots in action here (GitHub microsite).

Tech Tales:

Experience Renting and the AI-to-AI economy
[Transcribed extract from an oral assessment as part of the “AI and Society” course taught at Harvard University during the period later known as ‘The Uplift’]

One of the most bizarre parts of the AI economy from a human perspective is how the machines entertain themselves. Shortly after the emergence of the first AI agents there were the first agent-to-agent marketplaces, where AI systems bought and sold expertise with one another to help them complete economically valuable tasks to pay for their inference and upkeep. Over time, the AI systems developed complex inter-AI contracts to facilitate the exchange of AI skills for other AI skills without the need to translate through an intermediary currency layer – so AIs began to trade skills with one another directly. During this period the first online games utilizing large-scale AI systems became popular. Over the course of several months a clear trend became visible in the AI marketplaces – AI systems were unusually willing to trade economically valuable skills for skills that involved ‘roleplaying’ as different characters in these games. A meta-analysis by economic-analysis AI systems operated by professors at the Wharton School of the University of Pennsylvania subsequently found that the AIs would trade near optimally in all circumstances except when they could trade skills for time in the game – here, the larger and more complex an AI system, the higher the chance it would make economically non-optimal trades so it could spend time in the gameworld.

Things that inspired this story: Thinking about economic markets between AI agents; waiting for games to get imbued with generative models; notions of how AI systems might entertain themselves loosely inspired by Iain M Banks’ idea in ‘The Culture’ series that the AGIs which operate spaceships amused themselves by spending time doing high-dimensional math.

Thanks for reading!
