MarkTechPost@AI · 2 days ago, 04:40
Tencent Open Sources Hunyuan-A13B: A 13B Active Parameter MoE Model with Dual-Mode Reasoning and 256K Context

Tencent has released Hunyuan-A13B, an open-source large language model built on a sparse Mixture-of-Experts (MoE) architecture with 80 billion total parameters, of which only 13 billion are activated at inference time, balancing performance against computational cost. The model supports Grouped Query Attention (GQA) and a 256K context length, and provides a dual-mode fast/slow reasoning framework. Hunyuan-A13B performs strongly on agentic benchmarks such as BFCL-v3, τ-Bench, C3-Bench, and ComplexFuncBench, outperforming larger models especially in tool-calling and long-context scenarios. It is open-sourced on Hugging Face and GitHub for both research and production use.

💡 Hunyuan-A13B uses a sparse Mixture-of-Experts (MoE) architecture with 80B total parameters, activating only 13B at inference for an efficient balance between performance and compute cost. The model comprises 1 shared expert and 64 non-shared experts, with 8 experts activated per forward pass, and was trained via a 20T-token pretraining phase followed by fast annealing and long-context adaptation.

🧠 Hunyuan-A13B offers dual-mode Chain-of-Thought (CoT) capability: a low-latency fast-thinking mode and a more elaborate slow-thinking mode. Users switch between them with a simple tag system (/no think and /think) to match the complexity of the task.

🛠️ Post-training combines multi-stage supervised fine-tuning (SFT) with reinforcement learning (RL), using outcome-based rewards and tool-specific feedback, including sandbox execution environments for code and rule-based checks for agents. During agent training, the team synthesized diverse tool-use scenarios, generating over 20,000 format combinations that strengthen Hunyuan-A13B's ability to execute real-world workflows.

🏆 Hunyuan-A13B posts strong results across NLP benchmarks, matching or exceeding larger models on MATH, CMATH, and GPQA. On logical reasoning it outperforms Qwen3-A22B and DeepSeek R1. On coding it reaches 83.9 on MBPP and 69.3 on MultiPL-E, and it leads on the agentic benchmarks BFCL-v3 and ComplexFuncBench.

🚀 Hunyuan-A13B excels at long-context understanding, scoring 87.7 on PenguinScrolls and sustaining high performance on RULER even at 64K–128K context lengths, outperforming larger models such as Qwen3-A22B and DeepSeek R1.

⚙️ Hunyuan-A13B integrates fully with popular inference frameworks such as vLLM, SGLang, and TensorRT-LLM, supporting precision formats including W16A16, W8A8, and KV Cache FP8, along with Auto Prefix Caching and Chunk Prefill. At a batch size of 32 (2048 input, 14336 output tokens) it reaches up to 1981.99 tokens/sec throughput, making it practical for real-time applications.

Tencent’s Hunyuan team has introduced Hunyuan-A13B, a new open-source large language model built on a sparse Mixture-of-Experts (MoE) architecture. While the model consists of 80 billion total parameters, only 13 billion are active during inference, offering a highly efficient balance between performance and computational cost. It supports Grouped Query Attention (GQA), 256K context length, and a dual-mode reasoning framework that toggles between fast and slow thinking.

Designed for efficient deployment and robust reasoning, Hunyuan-A13B achieves top-tier performance across agentic benchmarks including BFCL-v3, τ-Bench, C3-Bench, and ComplexFuncBench, often outperforming larger models in tool-calling and long-context scenarios.

Architecture: Sparse MoE with 13B Active Parameters

At its core, Hunyuan-A13B follows a fine-grained MoE design comprising 1 shared expert and 64 non-shared experts, with 8 experts activated per forward pass. This architecture, backed by scaling experiments, ensures performance consistency while keeping inference costs low. The model includes 32 layers, uses SwiGLU activations, a vocabulary size of 128K, and integrates GQA for enhanced memory efficiency during long-context inference.
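To make the routing concrete, here is a minimal PyTorch sketch of the pattern described above: one always-on shared expert plus a top-8 softmax router over 64 specialized experts. The layer dimensions and expert MLP shapes are illustrative placeholders, not Hunyuan-A13B's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Illustrative sketch of the routing described above: 1 shared expert
    that sees every token, plus top-8 of 64 routed experts per token.
    Hidden sizes are placeholders, not Hunyuan-A13B's real dimensions."""
    def __init__(self, d_model=1024, d_ff=4096, n_experts=64, top_k=8):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        make_expert = lambda: nn.Sequential(
            nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
        self.shared_expert = make_expert()
        self.experts = nn.ModuleList(make_expert() for _ in range(n_experts))

    def forward(self, x):                 # x: (tokens, d_model)
        out = self.shared_expert(x)       # shared expert: always active
        gates = F.softmax(self.router(x), dim=-1)
        weights, idx = gates.topk(self.top_k, dim=-1)  # 8 experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)
        for k in range(self.top_k):       # dispatch tokens to chosen experts
            for e in idx[:, k].unique():
                mask = idx[:, k] == e
                out[mask] += weights[mask, k, None] * self.experts[int(e)](x[mask])
        return out
```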

The model’s MoE setup is paired with an optimized training curriculum: a 20T-token pretraining phase, followed by fast annealing and long-context adaptation. This last phase scales the context window first to 32K and then to 256K tokens using NTK-aware positional encoding, ensuring stable performance at large sequence lengths.
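The article does not give Hunyuan's exact scaling constants, but the commonly published NTK-aware trick rescales RoPE's frequency base by the context-extension ratio so low frequencies cover the longer window. A hedged sketch of what a 32K-to-256K extension looks like under that formula (not the team's actual implementation):

```python
import torch

def ntk_scaled_rope_freqs(dim, base=10000.0, orig_ctx=32_768, target_ctx=262_144):
    """NTK-aware RoPE base rescaling (the commonly published variant;
    Hunyuan-A13B's exact constants are an assumption, not from the article)."""
    scale = target_ctx / orig_ctx                 # 32K -> 256K gives 8x
    ntk_base = base * scale ** (dim / (dim - 2))  # stretched frequency base
    inv_freq = 1.0 / (ntk_base ** (torch.arange(0, dim, 2).float() / dim))
    return inv_freq
```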

Dual-Mode Reasoning: Fast and Slow Thinking

A standout feature of Hunyuan-A13B is its dual-mode Chain-of-Thought (CoT) capability. It supports both a low-latency fast-thinking mode for routine queries and a more elaborate slow-thinking mode for multi-step reasoning. These modes are controlled through a simple tag system: /no think for fast inference and /think for reflective reasoning. This flexibility allows users to adapt computational cost to task complexity.
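As a minimal sketch of how a caller might drive this tag system (the tags are from the article; their exact placement within Hunyuan's chat template is an assumption, so the model card is authoritative):

```python
def build_prompt(user_query: str, slow_thinking: bool) -> str:
    """Prepend the mode tag described above. Exact placement within
    Hunyuan's chat template is an assumption; check the model card."""
    tag = "/think" if slow_thinking else "/no think"
    return f"{tag} {user_query}"

# Routine lookup: take the cheap fast path.
print(build_prompt("What is the capital of France?", slow_thinking=False))
# Multi-step reasoning: pay for the longer chain of thought.
print(build_prompt("Plan a three-step migration for a sharded database.",
                   slow_thinking=True))
```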

Post-Training: Reinforcement Learning with Task-Specific Reward Models

The post-training pipeline of Hunyuan-A13B includes multi-stage supervised fine-tuning (SFT) and reinforcement learning (RL) across both reasoning-specific and general tasks. The RL stages incorporate outcome-based rewards and tool-specific feedback, including sandbox execution environments for code and rule-based checks for agents.
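As an illustration of what an outcome-based code reward might look like, here is a simplified sketch in the spirit of the sandboxed execution checks described; a production sandbox would need far stronger isolation than a bare subprocess.

```python
import os
import subprocess
import sys
import tempfile

def code_outcome_reward(generated_code: str, test_code: str, timeout=5) -> float:
    """Binary outcome reward: 1.0 if the candidate program passes its tests
    inside a subprocess, else 0.0. Illustrative only; a real RL sandbox
    needs much stronger isolation (containers, syscall filtering, etc.)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "candidate.py")
        with open(path, "w") as f:
            f.write(generated_code + "\n" + test_code)
        try:
            result = subprocess.run([sys.executable, path],
                                    capture_output=True, timeout=timeout)
            return 1.0 if result.returncode == 0 else 0.0
        except subprocess.TimeoutExpired:
            return 0.0  # hung or looping code earns no reward
```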

In the agent training phase, the team synthesized diverse tool-use scenarios with planner, checker, and tool roles, generating over 20,000 format combinations. This reinforced Hunyuan-A13B’s ability to execute real-world workflows such as spreadsheet processing, information search, and structured reasoning.
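The article does not publish the actual training schema, but one synthesized episode with the planner, checker, and tool roles it names might be serialized along these lines (all field names and the tool itself are hypothetical):

```python
# Hypothetical serialization of one synthesized tool-use episode using the
# planner / checker / tool roles named above. Field names are illustrative;
# Hunyuan-A13B's real training format is not published in this article.
episode = {
    "query": "Sum the 'revenue' column in q3.xlsx",
    "turns": [
        {"role": "planner", "content": "Read the sheet, then aggregate."},
        {"role": "tool", "name": "spreadsheet.read",
         "arguments": {"file": "q3.xlsx", "column": "revenue"}},
        {"role": "tool_result", "content": "[1200, 950, 1310]"},
        {"role": "checker", "content": "Values parsed correctly; sum them."},
        {"role": "assistant", "content": "Total Q3 revenue: 3460."},
    ],
}
```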

Evaluation: State-of-the-Art Agentic Performance

Hunyuan-A13B shows strong benchmark results across diverse NLP tasks: it matches or exceeds larger models on MATH, CMATH, and GPQA; outperforms Qwen3-A22B and DeepSeek R1 on logical reasoning; reaches 83.9 on MBPP and 69.3 on MultiPL-E for coding; and leads on the agentic benchmarks BFCL-v3 and ComplexFuncBench.

Long-context comprehension is another highlight. On PenguinScrolls, it scores 87.7—just shy of Gemini 2.5 Pro. On RULER, it sustains high performance (73.9) even at 64K–128K context, outperforming larger models like Qwen3-A22B and DeepSeek R1 in context resilience.

Inference Optimization and Deployment

Hunyuan-A13B is fully integrated with popular inference frameworks like vLLM, SGLang, and TensorRT-LLM. It supports precision formats such as W16A16, W8A8, and KV Cache FP8, along with features like Auto Prefix Caching and Chunk Prefill. It achieves up to 1981.99 tokens/sec throughput on a 32-batch input (2048 input, 14336 output length), making it practical for real-time applications.
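A hedged vLLM serving sketch consistent with the features listed above; the Hugging Face repo id and flag values are assumptions to verify against the model card.

```python
from vllm import LLM, SamplingParams

# Sketch of serving via vLLM, which the article says is supported.
# Repo id and settings are assumptions; consult the model card.
llm = LLM(
    model="tencent/Hunyuan-A13B-Instruct",  # assumed Hugging Face repo id
    trust_remote_code=True,                 # may be required for custom code
    max_model_len=262_144,                  # 256K context per the article
    kv_cache_dtype="fp8",                   # KV Cache FP8, as listed above
    enable_prefix_caching=True,             # auto prefix caching
)
params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate(["/think Summarize the MoE design of this model."], params)
print(outputs[0].outputs[0].text)
```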

Open Source and Industry Relevance

Available on Hugging Face and GitHub, Hunyuan-A13B is released with permissive open-source licensing. It’s engineered for efficient research and production use, especially in latency-sensitive environments and long-context tasks.

By combining MoE scalability, agentic reasoning, and open-source accessibility, Tencent’s Hunyuan-A13B offers a compelling alternative to heavyweight LLMs, enabling broader experimentation and deployment without sacrificing capability.


Check out the Paper. All credit for this research goes to the researchers of this project.

