cs.AI updates on arXiv.org 07月31日 12:48
Toward Automated Validation of Language Model Synthesized Test Cases using Semantic Entropy
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

本文介绍了一种名为VALTEST的框架,利用语义熵自动验证LLM生成的测试用例,提高测试用例的有效性,并提升代码生成性能。

arXiv:2411.08254v2 Announce Type: replace-cross Abstract: Modern Large Language Model (LLM)-based programming agents often rely on test execution feedback to refine their generated code. These tests are synthetically generated by LLMs. However, LLMs may produce invalid or hallucinated test cases, which can mislead feedback loops and degrade the performance of agents in refining and improving code. This paper introduces VALTEST, a novel framework that leverages semantic entropy to automatically validate test cases generated by LLMs. Analyzing the semantic structure of test cases and computing entropy-based uncertainty measures, VALTEST trains a machine learning model to classify test cases as valid or invalid and filters out invalid test cases. Experiments on multiple benchmark datasets and various LLMs show that VALTEST not only boosts test validity by up to 29% but also improves code generation performance, as evidenced by significant increases in pass@1 scores. Our extensive experiments also reveal that semantic entropy is a reliable indicator to distinguish between valid and invalid test cases, which provides a robust solution for improving the correctness of LLM-generated test cases used in software testing and code generation.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

语义熵 LLM 测试用例 代码生成 性能提升
相关文章