MarkTechPost@AI · November 25, 2024
CMU Researchers Propose XGrammar: An Open-Source Library for Efficient, Flexible, and Portable Structured Generation

With the rise of large language models (LLMs), structured generation has become increasingly important. LLMs can generate human-like text, but they are also asked to produce outputs that follow strict formats such as JSON and SQL. Applications like code generation, robotic control, and structured querying depend on these capabilities. Ensuring that outputs conform to a specific structure without sacrificing speed or efficiency, however, remains a major challenge. XGrammar, an open-source library, tackles this by splitting tokens into context-independent and context-dependent classes and leveraging GPU acceleration, achieving up to a 100x speedup in CFG processing, an 80x improvement in end-to-end structured output generation, and a sharp reduction in memory usage.

🤔 **Token categorization cuts computation**: XGrammar divides tokens into context-independent and context-dependent classes, prevalidating the context-independent ones to reduce runtime checks and significantly lower computational overhead.

🚀 **Memory efficiency**: An adaptive token mask cache reduces memory usage to 0.2% of the original requirement, improving scalability and lowering resource consumption.

⚡️ **Major performance gains**: Up to a 100x speedup in CFG processing and an 80x improvement in structured output generation.

🌐 **Cross-platform deployment**: XGrammar supports multiple platforms, including client-side browsers, enabling use on portable devices such as smartphones.

🔗 **LLM framework integration**: XGrammar integrates seamlessly with popular LLMs such as Llama 3.1, ensuring compatibility and ease of use for developers.

The field of structured generation has become important with the rise of LLMs. These models, capable of generating human-like text, are now tasked with producing outputs that follow rigid formats such as JSON, SQL, and other domain-specific languages. Applications like code generation, robotic control, and structured querying depend heavily on these capabilities. However, ensuring that outputs conform to specific structures without compromising speed or efficiency remains a significant challenge. Structured outputs allow for seamless downstream processing, but the complexity of achieving these results necessitates innovative solutions.

Despite advancements in LLMs, structured output generation continues to be plagued by inefficiencies. One major challenge is managing the computational demands of adhering to grammatical constraints during output generation. Traditional methods like context-free grammar (CFG) interpretation require processing each possible token in the model’s vocabulary, which can exceed 128,000 tokens. Moreover, maintaining stack states to track recursive grammar rules adds to runtime delays. As a result, existing systems often experience high latency and increased resource usage, making them unsuitable for real-time or large-scale applications.

Current tools for structured generation use constrained decoding to ensure outputs conform to predefined rules. These approaches filter out invalid tokens by setting their probabilities to zero at each decoding step. While effective, constrained decoding is often inefficient because every token must be evaluated against the full stack state, and the recursive nature of CFGs further complicates runtime processing. These challenges have limited the scalability and practicality of existing systems, particularly when handling complex structures or large vocabularies.
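The per-step masking described above can be sketched in a few lines. Everything here is illustrative — the toy vocabulary, the `valid_next` grammar check, and the function names are stand-ins, not XGrammar's actual interface:

```python
import math

# Toy vocabulary; real LLM vocabularies can exceed 128,000 tokens.
VOCAB = ['{', '}', '"', 'a', ':']

def valid_next(prefix):
    # Hypothetical stand-in for a real CFG matcher: right after '{',
    # this toy JSON-like grammar only allows '"' (a key) or '}' (close).
    if prefix.endswith('{'):
        return {'"', '}'}
    return set(VOCAB)

def mask_logits(logits, prefix):
    # Constrained decoding: invalid tokens get probability zero,
    # implemented here by setting their logits to -inf before sampling.
    allowed = valid_next(prefix)
    return [logit if tok in allowed else -math.inf
            for tok, logit in zip(VOCAB, logits)]

masked = mask_logits([1.0, 0.5, 0.2, 2.0, 0.3], '{')
# Greedy pick: 'a' had the highest raw logit but is grammar-invalid,
# so the argmax falls to a grammar-valid token instead.
best = VOCAB[max(range(len(VOCAB)), key=lambda i: masked[i])]
```

Note that this naive version re-evaluates `valid_next` against the entire vocabulary on every step — exactly the per-token cost that XGrammar's design amortizes away.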

Researchers from Carnegie Mellon University, NVIDIA, Shanghai Jiao Tong University, and the University of California, Berkeley developed XGrammar, a structured generation engine that addresses these limitations. XGrammar introduces a novel approach by dividing tokens into two categories: context-independent tokens that can be prevalidated and context-dependent tokens that require runtime evaluation. This separation significantly reduces the computational burden during output generation. The system also incorporates a co-designed grammar and inference engine, allowing grammar computations to overlap with GPU-based LLM operations and thereby minimizing overhead.
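A minimal sketch of that two-way split, under toy assumptions (the `always_valid` predicate and all names here are illustrative, not XGrammar's real interface): tokens whose validity never depends on the grammar state are checked once before generation starts, so the per-step loop only consults the grammar for the small context-dependent remainder.

```python
def always_valid(token):
    # Hypothetical predicate: in this toy grammar, whitespace tokens are
    # valid in every grammar state, so they are context-independent.
    return token.isspace()

def precompute_split(vocab):
    # One-time pass over the vocabulary, done before generation begins.
    ctx_independent = {t for t in vocab if always_valid(t)}
    ctx_dependent = [t for t in vocab if t not in ctx_independent]
    return ctx_independent, ctx_dependent

def runtime_mask(vocab, ctx_independent, state_allows):
    # Per-step mask: prevalidated tokens are admitted with no work;
    # only the rest consult the current grammar state.
    return [t in ctx_independent or state_allows(t) for t in vocab]

vocab = [' ', '\n', '{', '}', '"k"']
indep, dep = precompute_split(vocab)
# Toy grammar state in which only '}' is a legal context-dependent token.
mask = runtime_mask(vocab, indep, lambda t: t == '}')
```

In the paper's reported setting the prevalidated class covers over 99% of tokens, which is why moving its checks out of the decoding loop pays off so heavily.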

XGrammar’s technical implementation includes several key innovations. It uses a byte-level pushdown automaton to process CFGs efficiently, enabling it to handle irregular token boundaries and nested structures. The adaptive token mask cache precomputes and stores validity for context-independent tokens, covering over 99% of tokens in most cases. Context-dependent tokens, representing less than 1% of the total, are processed using a persistent execution stack that allows for rapid branching and rollback operations. XGrammar’s preprocessing phase overlaps with the LLM’s initial prompt processing, ensuring near-zero latency for structured generation.
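The byte-level pushdown automaton can be illustrated with a minimal balanced-brace matcher — a toy in the spirit of XGrammar's matcher, not its implementation:

```python
class BracePDA:
    """Toy byte-level pushdown automaton for a balanced-brace grammar."""

    def __init__(self):
        self.stack = []  # pushdown stack tracking unmatched open braces

    def accepts_byte(self, b):
        # Consume one byte; return False if it would break grammar validity.
        if b == ord('{'):
            self.stack.append(b)
            return True
        if b == ord('}'):
            if not self.stack:
                return False  # unmatched close brace
            self.stack.pop()
            return True
        return True  # other bytes are unconstrained in this toy grammar

    def snapshot(self):
        # A persistent-stack engine would share structure between branches
        # instead of copying; a plain copy is enough to show the idea of
        # branching and rollback.
        return list(self.stack)

pda = BracePDA()
ok = all(pda.accepts_byte(b) for b in b'{a{b}c}')
balanced = ok and not pda.stack  # empty stack => fully balanced input
```

Working at the byte level lets the automaton handle tokens whose boundaries do not line up with grammar symbols: each candidate token is just a byte sequence fed through `accepts_byte`.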

Performance evaluations reveal the significant advantages of XGrammar. For JSON grammar tasks, the system achieves a token mask generation time of less than 40 microseconds, delivering up to a 100x speedup compared to traditional methods. Integrated with the Llama 3.1 model, XGrammar enables an 80x improvement in end-to-end structured output generation on the NVIDIA H100 GPU. Moreover, memory optimization techniques reduce storage requirements to just 0.2% of the original size, from 160 MB to 0.46 MB. These results demonstrate XGrammar’s ability to handle large-scale tasks with unprecedented efficiency.

The researchers’ efforts have several key takeaways:

- Token categorization: splitting the vocabulary into context-independent tokens, prevalidated ahead of time, and context-dependent tokens, checked at runtime, sharply reduces per-step computation.
- Memory efficiency: the adaptive token mask cache cuts memory usage to 0.2% of the original requirement, from 160 MB to 0.46 MB.
- Performance: up to 100x faster CFG processing and an 80x end-to-end speedup with Llama 3.1 on an NVIDIA H100 GPU.
- Portability: XGrammar runs across platforms, including client-side browsers on smartphones and other portable devices.

In conclusion, XGrammar represents a transformative step in structured generation for large language models. By addressing inefficiencies in traditional CFG processing and constrained decoding, it offers a scalable, high-performance solution for generating structured outputs. Its techniques, including token categorization, memory optimization, and cross-platform support, make it a valuable tool for advancing AI applications. With speedups of up to 100x and reduced latency, XGrammar sets a new standard for structured generation, enabling LLMs to meet the demands of modern AI systems.


Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.


