MarkTechPost@AI, December 23, 2024
Meet LLMSA: A Compositional Neuro-Symbolic Approach for Compilation-Free, Customizable Static Analysis with Reduced Hallucinations

 

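LLMSA is a new neuro-symbolic framework designed to overcome the bottlenecks of traditional static analysis. It works without compilation and supports full customization. LLMSA uses a Datalog-oriented policy language to decompose complex analysis tasks into smaller sub-problems, and it combines deterministic parsing with neural reasoning to mitigate hallucinations in language models. Through lazy evaluation, incremental processing, and parallel execution, the framework improves efficiency, and it outperforms existing tools on tasks such as alias analysis, program slicing, and bug detection. LLMSA brings a transformative approach to static analysis in software development.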

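🛠️ The LLMSA framework takes a neuro-symbolic approach that combines symbolic constructors with neural components. Symbolic constructors deterministically parse abstract syntax trees (ASTs) to extract syntactic features, while neural components use large language models (LLMs) to reason about semantic relations, which enables precise analysis.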

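💡 LLMSA uses a Datalog-style policy language that lets users define analysis tasks intuitively by breaking them down into concrete rules to check. This simplifies the customization of complex analysis tasks and lets users adapt the analysis policy to their own needs.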

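⚡ The LLMSA framework optimizes performance through lazy evaluation, incremental processing, and parallel execution. Lazy evaluation runs neural operations only when necessary, incremental processing avoids redundant computation across iterations, and parallel execution lets independent rules run at the same time, which significantly improves computational efficiency.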

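🎯 On tasks such as alias analysis, program slicing, and bug detection, LLMSA achieves higher precision and recall than existing tools. It identifies 55 of 70 taint vulnerabilities in the TaintBench dataset, with recall 37.66% higher than an industrial-grade tool. This demonstrates LLMSA's effectiveness and versatility across static analysis tasks.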

Static analysis is an integral part of software development because it supports bug finding, program optimization, and debugging. Traditional approaches have two major drawbacks. First, methods that depend on compilation fail whenever the code is incomplete or rapidly changing. Second, customizing an analysis requires intimate knowledge of compiler internals and intermediate representations (IRs), which many developers lack. These issues keep static analysis tools from being widely used in real-world scenarios.

Existing static analysis tools, such as FlowDroid and Infer, use IRs to detect issues in programs. However, they rely on compilation, which limits their usability on rapidly changing or incomplete codebases. They also offer little support for tailoring analysis tasks to specific users' needs, since customization requires deep knowledge of compiler infrastructure. Query-based systems such as CodeQL try to ease these constraints, but they have a steep learning curve because of their intricate domain-specific languages and extensive APIs. These deficiencies limit the efficiency and uptake of such tools across programming contexts.

Researchers from Purdue University, Hong Kong University of Science and Technology, and Nanjing University have designed LLMSA, a neuro-symbolic framework that aims to remove these bottlenecks by working without compilation and supporting full customization. LLMSA uses a Datalog-oriented policy language to decompose complex analysis tasks into smaller, more tractable sub-problems. It mitigates language-model hallucinations by combining deterministic parsing for syntactic attributes with neural reasoning for semantic ones. Efficiency comes from lazy evaluation, in which neural calls are postponed until needed, and from incremental and parallel processing, which reduce redundant work and make better use of computational resources. This design positions LLMSA as a versatile and robust alternative to conventional static analysis techniques.
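To make the decomposition idea concrete, here is a minimal Python sketch of one Datalog-style rule that mixes cheap symbolic relations with an expensive neural one and evaluates the neural part lazily. It is an illustration under stated assumptions, not LLMSA's actual policy language or implementation: the rule, the relation names (`source`, `sink`, `flows`), the `CALLS` facts, and the toy answer returned by `flows` are all hypothetical, and a real system would prompt an LLM where the stub is.

```python
from functools import lru_cache

# Hypothetical facts that a symbolic constructor might extract from a parse
# tree: (enclosing function, line number, callee name).
CALLS = [
    ("handle", 3, "readInput"),
    ("handle", 7, "execQuery"),
    ("render", 12, "readInput"),
    ("audit", 20, "logMessage"),
]

NEURAL_QUERIES = []  # records every simulated LLM query


def source(call):
    """Symbolic relation: the call reads untrusted input (assumed name)."""
    return call[2] == "readInput"


def sink(call):
    """Symbolic relation: the call is a sensitive operation (assumed name)."""
    return call[2] == "execQuery"


@lru_cache(maxsize=None)
def flows(src, dst):
    """Stand-in for a neural relation; a real system would query an LLM."""
    NEURAL_QUERIES.append((src, dst))
    return src[1] < dst[1]  # toy answer: an earlier line reaches a later one


def taint_bugs():
    """bug(S, K) :- source(S), sink(K), same_function(S, K), flows(S, K)."""
    for s in filter(source, CALLS):
        for k in filter(sink, CALLS):
            # `and` short-circuits, so the neural relation is only evaluated
            # for pairs that already satisfy every symbolic condition.
            if s[0] == k[0] and flows(s, k):
                yield (s, k)


print(list(taint_bugs()))
print("neural queries issued:", len(NEURAL_QUERIES))
```

Ordering the symbolic conditions first means the simulated LLM is asked about a single candidate pair here, rather than every pair of calls, and the cache keeps repeated evaluations from issuing the same query twice.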

The proposed framework combines symbolic and neural elements to meet these objectives. Symbolic constructors parse abstract syntax trees (ASTs) deterministically to obtain syntactic characteristics, while neural components apply large language models (LLMs) to reason about semantic relationships. The restricted Datalog-style policy language lets users sketch tasks intuitively and break them into precise rules for inspection. Lazy evaluation reduces computational cost by performing neural operations only when necessary, whereas incremental processing avoids redundant calculations in iterative processes. Parallel execution lets independent rules run concurrently and greatly improves performance. The framework has been evaluated on Java programs for tasks such as alias analysis, program slicing, and bug detection, demonstrating its versatility and scalability.
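The incremental and parallel ideas can be sketched in a few lines of Python. This is a generic illustration, assuming a toy dependence graph over statements `s1` to `s4`; the rule functions and helper names are hypothetical and do not come from LLMSA. Independent rules are submitted to a thread pool, and a semi-naive fixpoint (a standard Datalog evaluation strategy, used here as a stand-in for incremental processing) joins only the facts derived in the previous round instead of recomputing everything.

```python
from concurrent.futures import ThreadPoolExecutor


def derive_data_deps():
    """Hypothetical rule: (a, b) means statement b uses a value defined at a."""
    return {("s1", "s2"), ("s2", "s3")}


def derive_control_deps():
    """Hypothetical rule: (a, b) means statement b is controlled by a."""
    return {("s3", "s4")}


def run_independent_rules(rules):
    """Evaluate rules that do not depend on each other concurrently."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(rule) for rule in rules]
        return [future.result() for future in futures]


def transitive_closure(edges):
    """Semi-naive fixpoint: each round extends only the newly derived facts."""
    closure = set(edges)
    delta = set(edges)
    while delta:
        derived = {(a, d) for (a, b) in delta for (c, d) in edges if b == c}
        delta = derived - closure  # keep only facts not seen before
        closure |= delta
    return closure


def backward_slice(criterion, deps):
    """Statements that the slicing criterion transitively depends on."""
    return {src for (src, dst) in deps if dst == criterion}


data_deps, control_deps = run_independent_rules(
    [derive_data_deps, derive_control_deps]
)
all_deps = transitive_closure(data_deps | control_deps)
print(sorted(backward_slice("s4", all_deps)))
```

Threads suit this sketch because a neural rule would spend most of its time waiting on a model response; whether LLMSA schedules rules this way is not stated in the article.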

LLMSA performed well across a variety of static analysis tasks. It achieved 72.37% precision and 85.94% recall for alias analysis, and 91.50% precision and 84.61% recall for program slicing. For bug detection, it reached an average precision of 82.77% and an average recall of 85.00%, outperforming dedicated tools such as NS-Slicer and Pinpoint by a fair margin in F1 score. In addition, it identified 55 of the 70 taint vulnerabilities in the TaintBench dataset, with recall exceeding that of an industrial-grade tool by 37.66% and a significant improvement in F1 score. LLMSA also achieved up to a 3.79× improvement in computational efficiency over alternative designs, showing that it can handle varied analysis tasks both efficiently and accurately.
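For readers who want to relate the precision and recall figures to the F1 scores mentioned above, the following Python sketch computes F1 as the harmonic mean of precision and recall. The inputs are the numbers reported in this article; the resulting F1 values are derived here for illustration only and may differ from the averaged F1 scores in the paper, since the F1 of averaged precision and recall is not the same as an average of per-case F1 scores.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported in the article, as fractions.
REPORTED = {
    "alias analysis": (0.7237, 0.8594),
    "program slicing": (0.9150, 0.8461),
    "bug detection": (0.8277, 0.8500),
}

for task, (precision, recall) in REPORTED.items():
    print(f"{task}: F1 = {f1_score(precision, recall):.4f}")

# 55 of the 70 TaintBench vulnerabilities were identified.
taintbench_recall = 55 / 70
print(f"TaintBench recall = {taintbench_recall:.4f}")
```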

This research presents LLMSA as a transformative approach to static analysis that overcomes the challenges of compilation dependency and limited customization. Its neuro-symbolic design, together with a well-defined policy language, delivers strong performance, scalability, and flexibility across different analysis tasks. This effectiveness and versatility make LLMSA a valuable resource that brings advanced static analysis within easier reach for software development.




The post Meet LLMSA: A Compositional Neuro-Symbolic Approach for Compilation-Free, Customizable Static Analysis with Reduced Hallucinations appeared first on MarkTechPost.
