MarkTechPost@AI 03月09日 08:28
Inception Unveils Mercury: The First Commercial-Scale Diffusion Large Language Model

Inception Labs has launched Mercury, the first commercial-scale diffusion large language models (dLLMs), promising a paradigm shift in speed, cost-efficiency, and intelligence for text and code generation tasks. The Mercury series achieves unprecedented speed, with throughput exceeding 1,000 tokens per second on commodity NVIDIA H100 GPUs, rivaling custom hardware such as Groq's, and runs 5-10x faster than today's leading autoregressive models. Mercury Coder excels at code generation, surpassing 1,000 tokens per second and outperforming models such as GPT-4o Mini and Claude 3.5 Haiku on standard coding benchmarks. Mercury dLLMs seamlessly support RAG, tool integration, and agent-based workflows, and are suited to enterprise environments, API integration, and on-premise deployment.

🚀 **Speed and efficiency breakthrough**: The Mercury series of diffusion large language models achieves unprecedented speed, with throughput exceeding 1,000 tokens per second on NVIDIA H100 GPUs, rivaling custom hardware and running 5-10x faster than existing autoregressive models.

💡 **Advantages of the diffusion approach**: Unlike autoregressive LLMs, which generate tokens one at a time, diffusion models use a "coarse-to-fine" generation process that updates tokens in parallel, markedly improving reasoning, error correction, and overall coherence, and upending the traditional approach to text generation.

💻 **Mercury Coder's standout performance**: Mercury Coder is optimized for coding applications, generating code at more than 1,000 tokens per second. On standard coding benchmarks it matches high-performing models such as GPT-4o Mini and Claude 3.5 Haiku, and it placed near the top of Copilot Arena, outperforming GPT-4o Mini and Gemini-1.5-Flash while running roughly 4x faster.

🔄 **Broad applicability and integration**: Mercury dLLMs work as seamless drop-in replacements for traditional autoregressive LLMs, supporting Retrieval-Augmented Generation (RAG), tool integration, and agent-based workflows across enterprise environments, API integration, and on-premise deployment.

The landscape of generative AI and LLMs has experienced a remarkable leap forward with the launch of Mercury by the cutting-edge startup Inception Labs. Introducing the first commercial-scale diffusion large language models (dLLMs), Inception Labs promises a paradigm shift in speed, cost-efficiency, and intelligence for text and code generation tasks.

Mercury: Setting New Benchmarks in AI Speed and Efficiency

Inception’s Mercury series of diffusion large language models introduces unprecedented performance, operating at speeds previously unachievable with traditional LLM architectures. Mercury achieves remarkable throughput—over 1000 tokens per second on commodity NVIDIA H100 GPUs—a performance that was formerly exclusive to custom-designed hardware like Groq, Cerebras, and SambaNova. This translates to an astonishing 5-10x speed increase compared to current leading autoregressive models.
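To put the reported throughput in perspective, a little arithmetic shows how the 5-10x claim plays out for a long response. The 150 tokens/second figure below is an illustrative assumption for a typical autoregressive model, not a number from the article:

```python
def generation_time(tokens, tokens_per_second):
    """Wall-clock seconds to emit `tokens` at a given throughput."""
    return tokens / tokens_per_second

# A 2,000-token response at the reported rate vs. an assumed AR baseline:
mercury = generation_time(2000, 1000)  # 2.0 s at 1,000+ tokens/s
autoreg = generation_time(2000, 150)   # ~13.3 s at an assumed 150 tokens/s
print(round(autoreg / mercury, 1))     # within the claimed 5-10x range
```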

Diffusion Models: The Future of Text Generation

Traditional autoregressive LLMs generate text sequentially, token-by-token, causing significant latency and computational costs, especially in extensive reasoning and error-correction tasks. Diffusion models, however, leverage a unique “coarse-to-fine” generation process. Unlike autoregressive models restricted by sequential generation, diffusion models iteratively refine outputs from noisy approximations, enabling parallel token updates. This method significantly enhances reasoning, error correction, and overall coherence of the generated content.
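The coarse-to-fine process can be sketched with a toy mask-based refinement loop. This is purely illustrative and not Inception's actual algorithm: the "model" here simply reveals a fixed target sequence, and the confidence schedule is invented, but the control flow shows how all positions are updated in parallel each step rather than one token at a time:

```python
import random

random.seed(0)

MASK = "<mask>"
TARGET = ["def", "add", "(", "a", ",", "b", ")", ":", "return", "a+b"]

def denoise_step(seq, confidence):
    """One parallel refinement step: every masked position whose (toy)
    model confidence clears the threshold is filled in simultaneously."""
    out = list(seq)
    for i, tok in enumerate(seq):
        if tok == MASK and random.random() < confidence:
            out[i] = TARGET[i]  # stand-in for the model's prediction
    return out

def generate(length, steps=8):
    seq = [MASK] * length  # start from pure "noise": all positions masked
    for t in range(steps):
        # confidence grows as denoising proceeds (coarse -> fine)
        seq = denoise_step(seq, confidence=(t + 1) / steps)
        if MASK not in seq:
            break
    return seq

print(" ".join(generate(len(TARGET))))
```

Because every masked position is a candidate for update on every step, latency scales with the number of refinement steps rather than with sequence length, which is the source of the parallelism advantage described above.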

While diffusion approaches have proven revolutionary in image, audio, and video generation—powering applications like Midjourney and Sora—their application in discrete data domains such as text and code was largely unexplored until Inception’s breakthrough.

Mercury Coder: High-Speed, High-Quality Code Generation

Inception’s flagship product, Mercury Coder, is optimized specifically for coding applications. Developers now have access to a high-quality, rapid-response model capable of generating code at more than 1000 tokens per second, a dramatic improvement over existing speed-focused models.

On standard coding benchmarks, Mercury Coder doesn’t just match but often surpasses the performance of other high-performing models such as GPT-4o Mini and Claude 3.5 Haiku. Moreover, Mercury Coder Mini secured a top-ranking position on Copilot Arena, tying for second place and outperforming established models like GPT-4o Mini and Gemini-1.5-Flash. Even more impressively, Mercury accomplishes this while maintaining approximately 4x faster speeds than GPT-4o Mini.

Versatility and Integration

Mercury dLLMs function seamlessly as drop-in replacements for traditional autoregressive LLMs. They effortlessly support use-cases including Retrieval-Augmented Generation (RAG), tool integration, and agent-based workflows. The diffusion model’s parallel refinement allows multiple tokens to be updated simultaneously, ensuring swift and accurate generation suitable for enterprise environments, API integration, and on-premise deployments.
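"Drop-in replacement" means the client side looks the same as for any autoregressive backend. The sketch below assumes a chat-completions-style HTTP API; the endpoint URL, model name, and key are hypothetical placeholders, not documented Inception API details:

```python
import json
import urllib.request

API_URL = "https://api.example.com/v1/chat/completions"  # hypothetical endpoint

def build_request(prompt, model="mercury-coder", api_key="YOUR_KEY"):
    """Build the same chat-completions payload an autoregressive backend
    would receive; the dLLM slots in without client-side changes."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

# Sending the request is identical to any other chat-completions backend:
# with urllib.request.urlopen(build_request("Write a bubble sort")) as r:
#     print(json.load(r)["choices"][0]["message"]["content"])
```

Because only the model name (and possibly the base URL) changes, existing RAG pipelines and agent frameworks that speak this style of API need no structural modification.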

Built by AI Innovators

Inception’s technology is underpinned by foundational research at Stanford, UCLA and Cornell from its pioneering founders, recognized for their crucial contributions to the evolution of generative AI. Their combined expertise includes the original development of image-based diffusion models and innovations such as Direct Preference Optimization, Flash Attention, and Decision Transformers—techniques widely acknowledged for their transformative impact on modern AI.

Inception’s introduction of Mercury marks a pivotal moment for enterprise AI, unlocking previously impossible performance levels, accuracy, and cost-efficiency.


Check out the Playground and Technical details. All credit for this research goes to the researchers of this project.


