MarkTechPost@AI · July 3
DeepSeek R1T2 Chimera: 200% Faster Than R1-0528 With Improved Reasoning and Compact Output

TNG Technology Consulting has introduced DeepSeek-TNG R1T2 Chimera, an innovative Assembly-of-Experts (AoE) model that combines intelligence and speed through a distinctive model-merging strategy. Built from three high-performing parent models, R1T2 shows how large-scale expert-layer interpolation can improve the efficiency of large language models (LLMs). The model runs more than 20% faster than R1 and more than twice as fast as R1-0528, while performing strongly on advanced benchmarks. R1T2's open weights are released on Hugging Face, encouraging community experimentation and further development.

💡 DeepSeek-TNG R1T2 Chimera is an Assembly-of-Experts (AoE) model that combines intelligence and speed through a model-merging strategy.

🚀 By merging expert layers from three parent models (R1-0528, R1, and V3-0324), R1T2 enables linear-time construction of new models that inherit capabilities from multiple parents.

⏱️ R1T2 runs more than 20% faster than R1 and more than twice as fast as R1-0528, largely thanks to its reduced output token length and selective expert-tensor integration.

🧠 R1T2 significantly outperforms R1 on advanced benchmarks such as GPQA Diamond and AIME-2024/2025, while preserving R1's reasoning ability.

💬 Feedback from the Reddit community praises R1T2's responsiveness, token efficiency, and balance between speed and coherence, and notes that it avoids hallucinations more consistently, making it better suited as an LLM backend for production environments.

🔓 R1T2's weights are openly released on Hugging Face under the MIT license, encouraging community experimentation, including downstream fine-tuning and reinforcement learning.

TNG Technology Consulting has unveiled DeepSeek-TNG R1T2 Chimera, a new Assembly-of-Experts (AoE) model that blends intelligence and speed through an innovative model merging strategy. Built from three high-performing parent models—R1-0528, R1, and V3-0324—R1T2 demonstrates how expert-layer interpolation at scale can unlock new efficiencies in large language models (LLMs).

Assembly-of-Experts: Efficient Model Composition at Scale

Traditional LLM training and fine-tuning require massive compute resources. TNG addresses this with its Assembly-of-Experts (AoE) approach, merging large-scale Mixture-of-Experts (MoE) models at the weight tensor level without retraining. This strategy enables linear-time construction of new models that inherit capabilities from multiple parents. R1T2’s architecture combines expert tensors from R1 with the base of V3-0324 and selectively includes improvements from R1-0528, optimizing the tradeoff between inference cost and reasoning quality.
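To make the idea concrete, here is a minimal sketch of AoE-style merging at the weight-tensor level: routed expert tensors are linearly interpolated toward a reasoning parent, while all other tensors stay as in the base checkpoint. The parameter-name pattern, the linear merge rule, and the toy state dicts are illustrative assumptions, not TNG's actual recipe.

```python
# Hedged sketch of Assembly-of-Experts-style merging on toy state dicts.
# Parameter names and the linear merge rule are assumptions for illustration.
import torch

def aoe_merge(base: dict, reasoning: dict, lam: float,
              expert_marker: str = ".mlp.experts.") -> dict:
    """Interpolate routed expert tensors toward `reasoning`; keep the rest from `base`."""
    merged = {}
    for name, tensor in base.items():
        if expert_marker in name:
            merged[name] = (1.0 - lam) * tensor + lam * reasoning[name]
        else:
            merged[name] = tensor.clone()
    return merged

# Toy checkpoints standing in for full-scale parent state dicts.
base = {
    "layers.0.self_attn.q_proj.weight": torch.randn(4, 4),
    "layers.0.mlp.experts.0.w1.weight": torch.randn(4, 4),
}
reasoning = {k: torch.randn_like(v) for k, v in base.items()}
child = aoe_merge(base, reasoning, lam=0.6)
print({name: t.shape for name, t in child.items()})
```

Because the merge is a pure tensor operation, a child model is produced in time linear in the number of parameters, with no gradient-based training involved.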

Speed Gains and Intelligence Tradeoffs

In benchmark comparisons, R1T2 is over 20% faster than R1 and more than twice as fast as R1-0528. These performance gains are largely attributed to its reduced output token length and selective expert tensor integration. While it falls slightly short of R1-0528 in raw intelligence, it significantly outperforms R1 across high-level benchmarks like GPQA Diamond and AIME-2024/2025.
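Since the reported speedup comes largely from shorter answers rather than faster per-token decoding, one simple way to sanity-check the claim locally is to count generated tokens for the same prompts under each model. A minimal sketch, assuming a DeepSeek tokenizer from Hugging Face (repo id assumed) and placeholder response strings:

```python
# Sketch: compare output compactness by counting tokens in saved responses.
# The tokenizer repo id is an assumption; the responses are placeholders.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1")  # assumed repo id

responses = {
    "R1-0528": "<think> ...long chain of thought... </think> Final answer: 42.",
    "R1T2":    "<think> ...shorter reasoning... </think> Final answer: 42.",
}
for model, text in responses.items():
    n_tokens = len(tok(text)["input_ids"])
    print(f"{model}: {n_tokens} output tokens")
```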

Moreover, the model retains the <think> reasoning traces, which emerge only when R1’s contribution to the merge crosses a specific threshold. This behavioral consistency is vital for applications requiring step-by-step chain-of-thought reasoning.

Emergent Properties in the Parameter Space

R1T2 confirms findings from the accompanying research paper that model merging can yield viable models throughout the interpolation space. Interestingly, intelligence properties change gradually, but behavioral markers (like consistent use of the <think> token) emerge abruptly near a 50% R1 weight ratio. This indicates that certain traits reside in distinct subspaces of the LLM weight landscape.

By merging only the routed expert tensors and leaving other components (e.g., attention and shared MLPs) from V3-0324 intact, R1T2 maintains a high reasoning score while avoiding verbosity. This design leads to what TNG calls “think-token consistency,” a behavioral trait where reasoning is not only accurate but also concise.
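The article does not spell out how think-token consistency is scored, but the property is easy to check mechanically: a completion should open with exactly one <think>…</think> block, followed by the answer, with no stray reasoning tags later. A small illustrative checker (an assumption for clarity, not TNG's metric):

```python
# Illustrative check for "think-token consistency": one leading <think>...</think>
# block, no stray think tags afterward, plus a rough measure of trace verbosity.
import re

def think_consistency(completion: str) -> dict:
    m = re.match(r"\s*<think>(.*?)</think>(.*)", completion, flags=re.DOTALL)
    if m is None:
        return {"consistent": False}
    reasoning, answer = m.group(1).strip(), m.group(2).strip()
    return {
        "consistent": "<think>" not in answer,
        "reasoning_words": len(reasoning.split()),
        "answer_words": len(answer.split()),
    }

print(think_consistency("<think>2 + 2 = 4.</think> The answer is 4."))
```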

Reddit Community Feedback

Early discussions from the Reddit LocalLLaMA community highlight practical impressions of R1T2. Users praise the model’s responsiveness, token efficiency, and balance between speed and coherence. One user noted, “It’s the first time a Chimera model feels like a real upgrade in both speed and quality.” Another pointed out that it performs better in math-heavy contexts compared to previous R1 variants.

A few Redditors also observed that R1T2 exhibits a more grounded persona, avoiding hallucinations more consistently than R1 or V3-based models. Such emergent traits are particularly relevant for developers seeking stable LLM backends for production environments.

Open-Weights and Availability

R1T2 is publicly available under the MIT License on Hugging Face: DeepSeek-TNG R1T2 Chimera. The release encourages community experimentation, including downstream fine-tuning and reinforcement learning. According to TNG, internal deployments via the Chutes serverless inference platform are already processing close to 5 billion tokens daily.
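For readers who want to try the released checkpoint, the snippet below shows the usual Hugging Face loading pattern. The repo id and generation settings are assumptions based on the release announcement, and a 671B-parameter MoE realistically requires a multi-GPU serving stack (e.g., vLLM or SGLang) rather than a single-process load; treat this as a sketch only.

```python
# Hedged sketch: loading the open weights with transformers.
# Repo id assumed from the announcement; a model of this size needs many GPUs.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tngtech/DeepSeek-TNG-R1T2-Chimera"  # assumed repo id
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
)

messages = [{"role": "user", "content": "Explain the AoE merging idea in two sentences."}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs.to(model.device), max_new_tokens=512)
print(tok.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```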

Conclusion

DeepSeek-TNG R1T2 Chimera showcases the potential of Assembly-of-Experts construction to generate performant, efficient LLMs without the need for gradient-based training. By strategically combining the reasoning capabilities of R1, the token-efficient design of V3-0324, and enhancements from R1-0528, R1T2 establishes a new standard for balanced model design. Its open-weight release under the MIT license ensures accessibility, making it a strong candidate for developers looking for fast, capable, and customizable large language models.

With model merging proving viable even at the 671B-parameter scale, TNG’s R1T2 may serve as a blueprint for future experiments in parameter space interpolation, enabling more modular and interpretable LLM development.


Check out the Paper and Open Weights on Hugging Face. All credit for this research goes to the researchers of this project.

