MarkTechPost@AI · April 30, 14:40
Can Coding Agents Improve Themselves? Researchers from University of Bristol and iGent AI Propose SICA (Self-Improving Coding Agent) that Iteratively Enhances Its Own Code and Performance

Researchers from the University of Bristol and iGent AI have proposed SICA (Self-Improving Coding Agent), a novel agent architecture designed to iteratively improve its own performance by modifying its underlying code. SICA unifies the roles of executing tasks, evaluating performance, identifying shortcomings, and updating its own implementation, creating a continuous self-improvement loop that requires no external intervention. Experiments show that SICA achieves significant performance gains on code-related benchmarks, especially in domains such as software engineering. The framework offers a concrete path toward autonomous improvement of agentic systems.

💡SICA is a novel agent architecture whose core idea is to let the agent iteratively improve its performance by modifying its own code, achieving self-driven continuous improvement with no external intervention.

🛠️SICA is built on a minimal, extensible base agent equipped with tools to manipulate its codebase, navigate directories, execute shell commands, and invoke sub-agents. Its architecture follows an evaluate-select-revise loop.

📈SICA was evaluated on several code-related benchmarks, including SWE Bench Verified and LiveCodeBench. The results show that, across iterations, the agent made significant gains in accuracy, execution latency, and resource efficiency. For example, accuracy on SWE Bench Verified rose from 17% to 53%, and file-editing performance improved from 82% to 94%.

🛡️SICA introduces an asynchronous oversight mechanism that uses an LLM thread to monitor the agent, keep it within the scope of its task, and halt execution when it stalls or goes off course, ensuring safety and controllability.

The development of agentic systems—LLMs embedded within scaffolds capable of tool use and autonomous decision-making—has made significant progress. Yet, most implementations today rely on fixed, hand-crafted orchestration strategies. These designs are inherently constrained, limiting the agent’s adaptability to new tasks and environments. As models grow in capability, the rigidity of their execution frameworks becomes a bottleneck, especially in domains such as software engineering where the task complexity and variability demand a more flexible system.

In response, researchers from the University of Bristol and iGent AI have introduced SICA (Self-Improving Coding Agent)—a novel agent architecture designed to iteratively enhance its own performance by modifying its underlying code. Unlike prior methods, such as ADAS, which split responsibilities between a meta-agent and a target-agent, SICA unifies these roles. The same agent that performs the task is also responsible for evaluating past performance, identifying shortcomings, and updating its own implementation. This integration allows for a continuous loop of self-directed improvement without external intervention.

Architecture and Mechanism of Self-Improvement

SICA is built upon a minimal, extensible base agent equipped with tools to manipulate its codebase, navigate directories, execute shell commands, and invoke sub-agents. Its architecture follows a loop: evaluate, select, revise. At each iteration, the agent benchmarks its own performance on predefined tasks, stores results, and selects the most effective prior version to serve as the basis for further improvement.
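The evaluate-select-revise loop described above can be sketched in a few lines of Python. Everything here is illustrative, not taken from the paper: the toy scoring function, the dictionary-based agent representation, and the function names are all placeholders for the real self-modification machinery.

```python
def evaluate(agent):
    """Toy benchmark: score rises with the number of tools (a stand-in
    for running the agent on predefined tasks and measuring results)."""
    return min(1.0, 0.2 + 0.1 * len(agent["tools"]))

def select(archive):
    """Pick the highest-scoring prior version as the basis for revision."""
    return max(archive, key=lambda entry: entry["score"])

def revise(agent):
    """Toy self-edit: in SICA the agent rewrites its own code; here we
    just append a new hypothetical tool name."""
    return {"tools": agent["tools"] + [f"tool_{len(agent['tools'])}"]}

def improvement_loop(base_agent, iterations=5):
    # The archive keeps every prior version together with its benchmark
    # score, so the loop can always branch from the best one so far.
    archive = [{"agent": base_agent, "score": evaluate(base_agent)}]
    for _ in range(iterations):
        best = select(archive)                 # select best prior version
        candidate = revise(best["agent"])      # revise (self-modification)
        archive.append({"agent": candidate,
                        "score": evaluate(candidate)})  # evaluate
    return archive
```

The key design point the sketch captures is that no separate meta-agent exists: the same loop that runs the agent also produces and scores its revisions.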

The agent evaluates performance using a utility function that combines accuracy, time, and cost metrics.

This structure allows the agent to conduct controlled experiments on its own design and deploy updates that demonstrably improve outcomes.
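A utility function of this kind might look like the toy version below. The weights, budgets, and normalization are hypothetical choices for illustration; the paper's actual formula may differ.

```python
def utility(accuracy, seconds, dollars,
            w_acc=1.0, w_time=0.1, w_cost=0.1,
            time_budget=300.0, cost_budget=1.0):
    """Illustrative utility: reward accuracy, penalize time and cost
    normalized against (hypothetical) per-task budgets, capped at 1."""
    return (w_acc * accuracy
            - w_time * min(seconds / time_budget, 1.0)
            - w_cost * min(dollars / cost_budget, 1.0))
```

Because accuracy, latency, and cost are folded into one scalar, two agent versions can be compared directly, which is what makes the automatic selection step in the loop possible.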

Empirical Evaluation

The researchers evaluated SICA on several code-related benchmarks, including a subset of SWE Bench Verified, LiveCodeBench, and synthetic tasks focused on file editing and symbol location. Results indicate measurable gains across iterations. For instance, accuracy on SWE Bench Verified increased from 17% to 53%, and file editing performance improved from 82% to 94%.

These improvements were not limited to benchmark scores. The agent also optimized execution latency and resource efficiency, reducing average cost and time per task. Notably, improvements were not the result of weight updates to the underlying LLM but were achieved through changes in tool orchestration, file management strategies, and problem decomposition heuristics.

However, gains were less pronounced on reasoning-dominant tasks such as AIME and GPQA. In these cases, the performance of the base LLM (e.g., o3-mini) already approached the task ceiling, limiting the marginal benefit of additional scaffolding. Moreover, introducing certain tool-based reasoning steps appeared to disrupt rather than enhance the performance of pretrained reasoning models, suggesting a need for more integrated co-training between agent logic and model behavior.

Conclusion

The SICA framework illustrates a concrete path toward autonomous improvement in agent systems. By consolidating execution and self-editing within a single agent, the system avoids many pitfalls of manual design and enables iterative refinement driven by empirical feedback. The results show that this approach is viable, particularly in domains with long-horizon, tool-mediated tasks such as software engineering.

While there are clear boundaries to the effectiveness of scaffold-only improvements—especially for tasks dominated by pure reasoning—the research establishes a foundation for future work in hybrid optimization, where both the model and the agent design evolve jointly. SICA also introduces practical considerations for safety and observability in self-improving systems, using LLM-based overseers and structured execution traces to ensure transparency and control.
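The oversight idea can be caricatured as a watchdog thread. In SICA the overseer is itself an LLM reading structured execution traces, so the simple no-progress timer below is only a stand-in for that judgment; the class and its parameters are hypothetical.

```python
import threading
import time

class Overseer:
    """Toy asynchronous overseer: halts the agent when no progress has
    been reported within a timeout (a crude proxy for an LLM overseer
    detecting a stalled or off-track agent)."""

    def __init__(self, timeout=2.0, poll=0.1):
        self.timeout = timeout
        self.poll = poll
        self.last_progress = time.monotonic()
        self.cancelled = threading.Event()   # agent checks this flag
        self._thread = threading.Thread(target=self._watch, daemon=True)

    def start(self):
        self._thread.start()

    def report_progress(self):
        """Called by the agent whenever it completes a meaningful step."""
        self.last_progress = time.monotonic()

    def _watch(self):
        while not self.cancelled.is_set():
            if time.monotonic() - self.last_progress > self.timeout:
                self.cancelled.set()         # stop a stalled agent
                return
            time.sleep(self.poll)
```

Running the monitor on its own thread mirrors the asynchronous design described in the paper: oversight does not block the agent's main loop, but can interrupt it at any point.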



