MarkTechPost@AI, July 19, 07:40
EG-CFG: Enhancing Code Generation with Real-Time Execution Feedback

EG-CFG is a new code generation method proposed by researchers at Tel Aviv University in Israel. Its core innovation is bringing real-time execution feedback into the generation process: rather than relying on static code examples as traditional models do, EG-CFG evaluates the execution behavior of partial code while it is being written, steering the model toward functionally correct output. The method combines beam search, which produces multiple candidate continuations, with abstract syntax tree (AST) parsing for syntax checking; valid candidates are then executed on test cases to gather detailed runtime information. That feedback is folded into the model's prompt to guide subsequent predictions, enabling stepwise refinement and error correction. EG-CFG performs strongly on benchmarks such as MBPP and HumanEval, even surpassing large closed-source models such as GPT-4 and Claude, a significant advance for AI code generation.

🌟 EG-CFG reinvents code generation with real-time execution feedback: mirroring how human programmers test as they write, the method evaluates the execution of partial code during generation rather than testing and fixing only after the full program is produced. This continuous feedback loop markedly improves the correctness and functionality of the generated code.

🚀 Beam search combined with AST parsing: EG-CFG uses beam search to generate several candidate code completions in parallel and checks each with abstract syntax tree (AST) parsing. Only fragments that pass the syntax check are executed, yielding key runtime data such as variable states and error messages (a minimal sketch of this gate-and-execute step appears after these highlights).

💡 How execution feedback is integrated: the collected runtime feedback is injected directly into the model's prompt to guide its next prediction. This lets the model correct its output step by step until the code passes all test cases, substantially raising both the efficiency and the accuracy of producing functional code.

🏆 Strong benchmark results: EG-CFG performs well across standard code generation benchmarks including MBPP, HumanEval, and CodeContests. On HumanEval in particular, the method with DeepSeek V3 solves 90.1% of tasks correctly, surpassing GPT-4 (85.5%) and Claude 2 (83.2%). Even a smaller 1.3B-parameter model improves substantially under EG-CFG guidance.

⚙️ Simulating the human debugging process: the core of EG-CFG is its ability to emulate how human programmers debug, guiding the language model with structured, actionable feedback. This not only raises the quality of generated code but also supports multiple agents working in parallel, further improving overall efficiency.
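To make the gate-and-execute step concrete, here is a minimal Python sketch, not the authors' implementation: the function name and feedback strings are illustrative, but it shows how a candidate completion can be syntax-checked with AST parsing and then run against a test case to collect the kind of runtime feedback EG-CFG injects into the prompt.

```python
# Minimal sketch (illustrative, not the paper's code): syntax-gate a
# candidate completion with AST parsing, then execute it against a test
# case to collect runtime feedback of the kind EG-CFG feeds back in.
import ast
import traceback

def runtime_feedback(candidate_code: str, test_case: str) -> str:
    """Return a short feedback string for one candidate solution."""
    # 1. Syntax gate: only syntactically valid candidates get executed.
    try:
        ast.parse(candidate_code)
    except SyntaxError as e:
        return f"SYNTAX ERROR: {e.msg} (line {e.lineno})"

    # 2. Run the candidate plus the test case in an isolated namespace.
    namespace: dict = {}
    try:
        exec(candidate_code, namespace)   # define the solution
        exec(test_case, namespace)        # e.g. "assert add(2, 3) == 5"
        return "PASSED"
    except AssertionError:
        return "FAILED: test assertion did not hold"
    except Exception:
        return "RUNTIME ERROR: " + traceback.format_exc(limit=1)

# Example usage with a hypothetical MBPP-style task (the bug is deliberate):
print(runtime_feedback("def add(a, b):\n    return a - b",
                       "assert add(2, 3) == 5"))
```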

LLMs have made impressive strides in generating code for various programming tasks. However, they mostly rely on recognizing patterns from static code examples rather than understanding how the code behaves during execution. This often leads to programs that look correct but fail when run. While recent methods introduce iterative refinement and self-debugging, they typically operate in separate steps: generate, test, then revise. Unlike human programmers, who constantly run fragments of code and adjust based on real-time output, these models cannot integrate execution feedback continuously, limiting their effectiveness in producing truly functional code.

The Role of Program Synthesis and Prompting in Code Generation

Program synthesis has long been used to evaluate LLMs on code generation benchmarks such as MBPP, HumanEval, and CodeContests, testing models on a range of coding challenges. While prompting strategies such as few-shot and Chain-of-Thought have improved performance, newer methods now incorporate feedback loops that utilize tools or execution results to refine outputs. Some frameworks even assign tasks to multiple LLM agents, each tackling different aspects of the problem. However, most approaches still rely on simple decoding methods. Unlike traditional strategies, newer guidance techniques such as classifier-free guidance (CFG) offer a more dynamic approach, but they have not yet been widely applied with real-time execution feedback.
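For context, classifier-free guidance mixes two predictive distributions for the same next token. Below is a minimal sketch of the common logit-interpolation form, with `gamma` as an assumed name for the guidance strength; EG-CFG adapts this idea so that the "conditioned" prompt carries execution feedback while the "unconditioned" prompt does not.

```python
# A common classifier-free guidance (CFG) formulation, sketched in NumPy.
# In EG-CFG's setting, "conditioned" logits would come from a prompt that
# includes execution feedback; "unconditioned" logits from one without it.
import numpy as np

def cfg_logits(logits_uncond: np.ndarray,
               logits_cond: np.ndarray,
               gamma: float) -> np.ndarray:
    # gamma = 0 ignores the feedback entirely; larger gamma pulls
    # sampling toward tokens favored by the feedback-conditioned prompt.
    return logits_uncond + gamma * (logits_cond - logits_uncond)
```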

Introducing EG-CFG: Execution-Guided Code Generation from Tel Aviv University

Researchers at Tel Aviv University have introduced EG-CFG, a new method for code generation that actively utilizes execution feedback during the generation process, a practice common among human programmers. Instead of waiting until the end, EG-CFG evaluates partial code as it is being written, guiding the model toward correct and executable outputs. It uses beam search to generate multiple code options, runs them, and integrates runtime outcomes to influence the next steps. This real-time feedback loop significantly boosts performance across standard benchmarks such as MBPP, HumanEval, and CodeContests, even surpassing closed-source models, while also enabling efficient parallel reasoning and dynamic exploration.

How EG-CFG Works: Real-Time Feedback Meets Beam Search and AST Parsing

The EG-CFG method improves code generation by guiding language models using real-time execution feedback during inference. For a given programming task, it generates partial code solutions and explores multiple continuations using beam search. These continuations are checked for syntax using AST parsing, and only valid ones are executed on test cases to gather detailed runtime traces, including variable states and errors. This feedback is then injected into the model’s prompt to inform future predictions. A guidance mechanism interpolates between the model’s standard output and feedback-informed suggestions, helping the model refine its solution step by step until it passes all test cases.
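Putting the pieces together, the following high-level sketch shows one way the loop described above could be organized. It is an illustration under stated assumptions, not the paper's code: `model.beam_continuations`, `model.next_line`, and `build_feedback_prompt` are hypothetical helpers standing in for the paper's components, and `runtime_feedback` refers to the earlier gate-and-execute sketch.

```python
# High-level sketch of an EG-CFG-style inference loop (illustrative only).
# `model.beam_continuations`, `model.next_line`, and `build_feedback_prompt`
# are hypothetical helpers; `runtime_feedback` is the earlier sketch.

def eg_cfg_generate(model, task_prompt: str, test_cases: list[str],
                    beam_width: int = 4, max_rounds: int = 16) -> str:
    solution = ""
    for _ in range(max_rounds):
        # 1. Beam search proposes several continuations of the partial code.
        candidates = model.beam_continuations(task_prompt + solution,
                                              num_beams=beam_width)
        # 2. Each candidate is syntax-checked and executed on the tests;
        #    runtime_feedback returns trace/error strings per test case.
        feedback = {c: [runtime_feedback(solution + c, t) for t in test_cases]
                    for c in candidates}
        for cand, results in feedback.items():
            if all(r == "PASSED" for r in results):
                return solution + cand  # done: passes every test case
        # 3. Feedback is injected into the prompt, and CFG-style guidance
        #    (see the logit-mixing sketch above) steers the next step.
        task_prompt = build_feedback_prompt(task_prompt, feedback)
        solution += model.next_line(task_prompt + solution)
    return solution
```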

Benchmark Results: EG-CFG Outperforms GPT-4 and Claude on HumanEval and MBPP-ET

The EG-CFG method was tested using two versions of DeepSeek models: a 1.3B parameter model run locally and the larger V3-0324 model accessed through an API. It was evaluated on five code benchmarks: MBPP, HumanEval, CodeContests, MBPP-ET, and HumanEval-ET. On HumanEval, EG-CFG with DeepSeek V3 solved 90.1% of the tasks correctly, outperforming GPT-4 (85.5%) and Claude 2 (83.2%). On MBPP-ET, it achieved 81.4% accuracy, a new state-of-the-art result on that benchmark. Notably, the smaller 1.3B model also showed strong gains, improving from 46.3% to 61.7% on HumanEval when guided with EG-CFG. An ablation study confirmed the importance of components like dynamic feedback and beam search in driving these results.

Conclusion: EG-CFG Simulates Human Debugging to Advance Code Generation

In conclusion, the EG-CFG method introduces a new way to generate code using language models by incorporating real-time execution feedback during generation. Unlike traditional approaches that rely on static patterns, EG-CFG simulates how human programmers test and refine code. It uses beam search to explore possible code completions, tests them with real inputs, and then guides generation based on the results. This happens line by line, ensuring feedback is both structured and actionable. The method also supports multiple agents working in parallel, boosting efficiency. EG-CFG achieves top accuracy across standard benchmarks, showing strong results even on complex coding tasks and with smaller models.


Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.

The post EG-CFG: Enhancing Code Generation with Real-Time Execution Feedback appeared first on MarkTechPost.
