MarkTechPost@AI · August 11, 2024
TestART: Achieving 78.55% Pass Rate and 90.96% Coverage with a Co-Evolutionary Approach to LLM-Based Unit Test Generation and Repair

TestART is a new approach that uses large language models (LLMs) to generate unit tests, combining automated generation with iterative repair to overcome the limitations of existing methods. TestART employs template-based repair techniques and a prompt injection mechanism to steer the model toward high-quality test cases. On the Defects4J benchmark, TestART achieved a 78.55% pass rate and 90.96% coverage, significantly outperforming other methods.

😊 TestART combines large language models (LLMs) with a co-evolutionary repair process to overcome the limitations of existing methods and improve the efficiency and reliability of automated unit test generation. TestART first generates initial unit test cases with the ChatGPT-3.5 model, then applies a repair process that resolves common problems such as compilation errors, runtime errors, and assertion errors. The repair process uses fixed templates to correct mistakes typically found in LLM-generated tests. Repaired test cases are recompiled and executed, and coverage information is extracted to provide feedback for further refinement. This iterative process continues until the test cases reach the desired quality bar, with the focus on higher coverage and accuracy.

😄 TestART was evaluated in extensive experiments on the Defects4J benchmark, which contains 8,192 focal methods drawn from five Java projects. The results show that TestART significantly outperforms existing methods, including EvoSuite and ChatUniTest. Specifically, TestART's generated test cases achieved a 78.55% pass rate, roughly 18% higher than both the ChatGPT-4.0 model and the ChatGPT-3.5-based ChatUniTest. On the focal methods whose tests passed, TestART reached 90.96% line coverage, exceeding EvoSuite by 3.4%. These results indicate that TestART can produce high-quality unit tests by effectively exploiting LLM capabilities while compensating for their inherent weaknesses.

😉 By addressing the limitations of existing LLM-based methods, TestART achieves a higher pass rate and better coverage, making it a valuable tool for developers who want to ensure the reliability and quality of their code. The research, carried out by a team from Nanjing University and Huawei Cloud Computing Technologies Co., Ltd., demonstrates the potential of combining LLMs with a co-evolutionary repair process to produce more effective and reliable unit tests. With a 78.55% pass rate and 90.96% coverage, TestART sets a new standard for automated unit test generation.

Unit testing aims to identify and resolve bugs at the earliest stages by testing individual components or units of code. This process ensures software reliability and quality before the final product is delivered. Traditional methods of unit test generation, such as search-based, constraint-based, and random-based techniques, have been utilized to automate the creation of unit tests. These methods aim to maximize the coverage of software components, thereby minimizing the chances of undetected bugs. However, the manual creation and maintenance of unit tests are time-consuming and labor-intensive, necessitating the development of automated solutions.

The primary challenge in automated unit test generation lies in the limitations of existing methods. Large Language Models (LLMs), such as ChatGPT, have shown significant potential in generating unit tests. However, these models often fall short due to their inability to create valid test cases consistently. Common issues include compilation errors caused by insufficient context, runtime errors resulting from inadequate feedback mechanisms, and repetitive loops during self-repair attempts, which hinder the models from producing high-quality test cases. These limitations highlight the need for more robust and reliable methods to leverage LLMs’ strengths while addressing their inherent weaknesses.

Existing automated unit test generation tools, including those based on search-based software testing (SBST) and LLMs, offer various approaches to tackle these challenges. SBST tools like EvoSuite employ evolutionary algorithms to create test cases that aim to improve code coverage. However, the tests generated by these tools often differ significantly from human-written tests, making them difficult to read, understand, and modify. LLM-based methods, while more aligned with human-like reasoning, still struggle with issues such as invalid context handling and low pass rates. These shortcomings leave room for a more effective solution.

Researchers from Nanjing University and Huawei Cloud Computing Technologies Co., Ltd. have introduced a novel approach called TestART. This method enhances LLM-based unit test generation through a co-evolutionary process integrating automated generation with iterative repair. TestART is designed to overcome the limitations of LLMs by incorporating template-based repair techniques and prompt injection mechanisms. These innovations guide the model’s subsequent generation processes, helping to avoid repetition and enhance the overall quality of the generated test cases.
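As a rough illustration of the prompt-injection idea, the Python sketch below assembles a regeneration prompt that carries forward the previous failing test, its error message, and the lines it left uncovered. The function name and prompt wording are illustrative assumptions, not TestART's published prompt format.

```python
def build_regeneration_prompt(focal_method: str, previous_test: str,
                              error_message: str, uncovered_lines: list[int]) -> str:
    """Hypothetical prompt-injection step: feed the last attempt and its
    failure/coverage feedback back to the LLM so it does not repeat itself."""
    return (
        "Generate a JUnit test for the following focal method:\n"
        f"{focal_method}\n\n"
        "Your previous attempt was:\n"
        f"{previous_test}\n\n"
        f"It failed with: {error_message}\n"
        f"Lines still uncovered: {uncovered_lines}\n"
        "Produce a corrected test that compiles, passes, and covers the missing lines."
    )
```

The key design point is that each new generation round sees the repaired test and its feedback rather than only the focal method, which is what keeps successive attempts from cycling through the same mistakes.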

TestART operates by first generating initial unit test cases using the ChatGPT-3.5 model. These initial test cases are then subjected to a rigorous repair process that addresses common issues such as compilation errors, runtime failures, and assertion errors. The repair process employs fixed templates tailored to correct the mistakes typically produced by LLM-generated tests. Once repaired, the test cases are recompiled and executed, with coverage information being extracted to provide feedback for further refinement. This iterative process continues until the test cases meet the desired quality standards, focusing on achieving higher coverage and accuracy.
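To make the template idea concrete, here is a minimal, purely illustrative repair rule in Python: it patches one common defect in LLM-generated Java tests (a missing JUnit import) by textual transformation before the test is recompiled. TestART's actual repair templates are not detailed in this article, so the rule and its name are assumptions.

```python
import re

def add_missing_junit_import(test_source: str) -> str:
    """Illustrative fixed repair template: if the generated test uses @Test but
    never imports org.junit.Test, insert the import right after the package line."""
    if "@Test" in test_source and "import org.junit.Test;" not in test_source:
        match = re.search(r"^package .*?;\n", test_source, flags=re.MULTILINE)
        insert_at = match.end() if match else 0
        return (test_source[:insert_at]
                + "import org.junit.Test;\n"
                + test_source[insert_at:])
    return test_source

# Example: a generated test that would fail to compile without the import.
broken = ("package demo;\n\n"
          "public class CalcTest {\n"
          "    @Test\n"
          "    public void testAdd() { assert 1 + 1 == 2; }\n"
          "}\n")
print(add_missing_junit_import(broken))
```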

The effectiveness of TestART has been demonstrated through extensive experiments conducted on the widely adopted Defects4J benchmark, which includes 8192 focal methods extracted from five Java projects. The results of these experiments show that TestART significantly outperforms existing methods, including EvoSuite and ChatUniTest. Specifically, TestART achieved a pass rate of 78.55% for the generated test cases, approximately 18% higher than the pass rates of both the ChatGPT-4.0 model and the ChatUniTest method based on ChatGPT-3.5. TestART achieved an impressive line coverage rate of 90.96% on the focal methods that passed the test, exceeding EvoSuite by 3.4%. These results underscore TestART’s superior ability to produce high-quality unit test cases by effectively harnessing the power of LLMs while addressing their inherent flaws.

In conclusion, TestART, by addressing the limitations of existing LLM-based methods, achieves higher pass rates and better coverage, making it a valuable tool for software developers seeking to ensure the reliability and quality of their code. The research conducted by the team from Nanjing University and Huawei Cloud Computing Technologies Co., Ltd. demonstrates the potential of combining LLMs with co-evolutionary repair processes to produce more effective and reliable unit tests. With a pass rate of 78.55% and a coverage rate of 90.96%, TestART sets a new standard for automated unit test generation.


Check out the Paper. All credit for this research goes to the researchers of this project.



