MarkTechPost@AI, 13 hours ago
Inception Labs Introduces Mercury: A Diffusion-Based Language Model for Ultra-Fast Code Generation

This article introduces Mercury, a diffusion-based LLM from Inception Labs optimized for code generation. Unlike traditional autoregressive models, Mercury generates tokens in parallel, substantially improving computational efficiency and throughput. Independent evaluations show Mercury Coder Mini exceeding 1,100 tokens per second, far ahead of existing models. Mercury Coder Small strikes a strong balance between speed and accuracy, performing well on benchmarks such as HumanEval and MultiPL-E, with particular advantages in interactive and real-time coding scenarios.

🚀 Traditional autoregressive models face a speed bottleneck in code generation, limiting their use in real-time interactive settings. By adopting a diffusion approach, Mercury generates tokens in parallel, markedly improving processing speed and efficiency.

💡 Mercury Coder Mini achieves a throughput of over 1,100 tokens per second, far surpassing existing autoregressive models. Mercury Coder Small balances speed and accuracy, performing strongly on benchmarks such as HumanEval and MultiPL-E.

💻 The Mercury models excel on the HumanEval and MultiPL-E benchmarks; on fill-in-the-middle tasks in particular, Mercury Coder Small outperforms Codestral 2501. In human evaluations on the Copilot Arena platform, Mercury Coder Mini ranked second in user preference, with a latency of only 25 milliseconds.

✨ The Mercury models use a diffusion process that iteratively refines initial random noise into coherent data. This lets the model adjust many tokens simultaneously, enabling parallelism, and it remains compatible with existing prompting methods, ensuring seamless integration into established development workflows.

Generative AI and Its Challenges in Autoregressive Code Generation

The field of generative artificial intelligence has significantly impacted software development by automating various coding tasks, ranging from simple auto-completions to complex software solutions. However, traditional language models predominantly employ autoregressive methods, predicting one token at a time, which leads to inherent bottlenecks and latency issues. Particularly for coding applications, the slow sequential generation limits efficiency, posing challenges in real-time interactive environments or scenarios demanding immediate responses. Although existing speed-optimized models, such as GPT-4o and Claude 3.5 Haiku, have shown somewhat improved performance, the fundamental constraint of token-by-token generation persists, necessitating a shift toward alternative modeling approaches capable of parallel generation and substantial latency reduction.
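The token-by-token dependency described above can be made concrete with a toy sketch (pure Python; `next_token` is a hypothetical stand-in for a real model's forward pass). Each new token requires a full pass conditioned on everything generated so far, so the loop is inherently sequential and cannot be parallelized across output positions:

```python
# Toy illustration of the autoregressive bottleneck: every token depends on
# all previous tokens, so one forward pass is needed per generated token.
def next_token(context):
    # Stand-in for a model's forward pass; here it just returns a counter.
    return len(context)

def generate_autoregressive(prompt_tokens, n_new):
    tokens = list(prompt_tokens)
    for _ in range(n_new):          # one sequential step per token
        tokens.append(next_token(tokens))
    return tokens

print(generate_autoregressive([101, 102], 4))  # each step waits on the last
```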

Current State of AI-Based Coding Assistants and Their Speed Limitations

Currently, the mainstream AI-based coding assistants rely heavily on autoregressive transformer architectures. Notable models in this domain, such as GPT-4o Mini, Claude 3.5 Haiku, Gemini 2.0 Flash Lite, and Codestral, deliver impressive results across standard coding benchmarks. Yet, their sequential nature remains a limiting factor in terms of speed. Autoregressive models typically achieve throughput around 50 to 200 tokens per second on contemporary GPU hardware. These models, although highly accurate, encounter significant limitations when handling high-demand, interactive, or latency-sensitive coding tasks.
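To put that gap in perspective, a quick back-of-envelope calculation helps. The throughput figures are taken from this article; the 500-token completion length is an illustrative assumption, not a measured workload:

```python
# Wall-clock time to emit a 500-token completion at various throughputs.
completion_tokens = 500
throughputs = {
    "autoregressive, low end": 50,     # tokens/second
    "autoregressive, high end": 200,
    "Mercury Coder Small": 737,
    "Mercury Coder Mini": 1109,
}
for name, tps in throughputs.items():
    print(f"{name}: {completion_tokens / tps:.2f} s")
```

At the low end of autoregressive throughput, the completion takes around ten seconds; at Mercury Coder Mini's cited rate, it takes well under half a second, which is the difference between a blocking wait and an interactive response.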

Introduction of Mercury: A Diffusion-Based LLM for High-Performance Coding

Researchers at Inception Labs introduced Mercury, a groundbreaking diffusion-based large language model (LLM) family specifically optimized for coding applications. Mercury Coder, the first model set within this family, comprises two distinct variants: Mercury Coder Mini and Mercury Coder Small. These diffusion models uniquely combine transformer-based architectures with parallel token generation, significantly enhancing computational efficiency and overall throughput. According to independent evaluations conducted by Artificial Analysis, Mercury Coder models achieved exceptional performance benchmarks. The Mercury Coder Mini reached a throughput of 1,109 tokens per second, much faster than baseline autoregressive models. Mercury Coder Small demonstrated a similarly impressive throughput of 737 tokens per second, offering an excellent balance between speed and coding accuracy.

Diffusion Mechanism Behind Mercury’s Parallel Token Generation

The Mercury models leverage diffusion processes where outputs are iteratively refined from initial random noise into coherent data. Unlike conventional models that sequentially predict tokens, Mercury models simultaneously refine multiple tokens at each iteration, greatly optimizing GPU utilization. During training, Mercury models employed datasets comprising trillions of tokens sourced from extensive web crawls, synthetic data, and proprietary repositories. The diffusion training protocol involves a forward process of progressively adding noise to clean data and a reverse process that iteratively denoises this noisy data. Specifically, Mercury utilizes a denoising diffusion loss, which enables the simultaneous adjustment of tokens and enhances parallelization. Also, Mercury models incorporate prompting methods commonly used in existing autoregressive models, including zero-shot and few-shot learning, ensuring seamless integration into established coding workflows.
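The forward/reverse process described above can be illustrated with a toy masked-denoising loop. This is a conceptual sketch only: Mercury's actual architecture and denoising schedule are not public, `denoise_step` stands in for a transformer pass, and characters stand in for tokens. The point it demonstrates is that each reverse step commits many positions in parallel, so the number of steps grows far more slowly than the sequence length:

```python
import math
import random

MASK = "<mask>"

def denoise_step(tokens, target, frac=0.5, rng=random):
    # Toy stand-in for one reverse-diffusion pass: a real model would predict
    # values for masked positions; here we simply reveal the known target.
    masked = [i for i, t in enumerate(tokens) if t == MASK]
    k = max(1, math.ceil(frac * len(masked)))  # commit many positions at once
    for i in rng.sample(masked, k):
        tokens[i] = target[i]
    return tokens

def generate_diffusion(target, frac=0.5):
    tokens, steps = [MASK] * len(target), 0
    while MASK in tokens:
        tokens = denoise_step(tokens, target, frac)
        steps += 1
    return tokens, steps

out, steps = generate_diffusion(list("def add(a, b): return a + b"))
print(f"{len(out)} tokens recovered in {steps} parallel denoising steps")
```

Because roughly half of the remaining masked positions are resolved per step, the step count scales logarithmically with sequence length, whereas an autoregressive decoder needs one step per token.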

Benchmark Accuracy: Mercury Models Excel Across Standard Coding Tasks

On benchmark tests, Mercury Coder Small achieved 90.0% accuracy on the HumanEval test, a standard Python coding benchmark, and 76.2% on MultiPL-E, a multi-language benchmark covering languages such as C++, Java, JavaScript, PHP, Bash, and TypeScript. Mercury Coder Mini similarly demonstrated robust performance, with 88.0% on HumanEval and 74.1% on MultiPL-E. Notably, on fill-in-the-middle coding tasks, essential for auto-completion and interactive coding, Mercury Coder Small outperformed prominent models with an average accuracy of 84.8%, surpassing even specialized speed-optimized models like Codestral 2501, which attained 82.5%. Moreover, in real-world human evaluations conducted via the Copilot Arena platform, Mercury Coder Mini was ranked second overall in user preference, outperforming well-established models like GPT-4o Mini and Gemini 1.5 Flash, and exhibited the lowest average latency of only 25 milliseconds.

Mercury Coder Small also performs consistently across individual languages on the MultiPL-E benchmark, achieving 82.0% accuracy in C++, 80.1% in Java, 83.9% in JavaScript, 78.3% in PHP, 50.1% in Bash, and 82.6% in TypeScript.

Key Takeaways: High Throughput, Accuracy, and Workflow Compatibility

Throughput: Mercury Coder Mini reaches 1,109 tokens per second and Mercury Coder Small 737 tokens per second, several times faster than comparable autoregressive models.

Accuracy: Mercury Coder Small scores 90.0% on HumanEval and 76.2% on MultiPL-E, and leads fill-in-the-middle tasks with 84.8% average accuracy, ahead of Codestral 2501.

Workflow compatibility: Mercury supports standard prompting methods, including zero-shot and few-shot learning, so it integrates into existing coding workflows without changes.

Check out the Paper, API and Chat. All credit for this research goes to the researchers of this project.

