MarkTechPost@AI
Moonshot AI Releases Kimi K2: A Trillion-Parameter MoE Model Focused on Long Context, Code, Reasoning, and Agentic Behavior

Kimi K2, released by Moonshot AI in July 2025, is an open-source Mixture-of-Experts (MoE) model purpose-built for agentic workflows. The model has 1 trillion total parameters with 32 billion activated per token, and was trained on 15.5 trillion tokens with the custom MuonClip optimizer, achieving stable training at an unprecedented scale. Unlike traditional chatbots, Kimi K2 focuses on action rather than pure reasoning: it can autonomously decompose tasks, execute tool sequences, write and debug code, analyze data, and orchestrate workflows, all with minimal human intervention.

🤖 Kimi K2 is purpose-built for agentic workflows; unlike traditional chatbots, it emphasizes action over pure reasoning.

⚙️ Kimi K2 has 1 trillion total parameters with 32 billion activated per token and was trained on 15.5 trillion tokens. Its MoE architecture contains 384 experts, routes each token to 8 active experts, and adds one shared expert for global context.

🛠️ Kimi K2 excels at code execution, data analysis, web application development, and tool orchestration. It can autonomously complete multi-step tasks involving orchestration of 17 or more tools, and it outperforms Claude Sonnet 4 and GPT-4.1 on benchmarks such as SWE-bench, agentic tasks, and LiveCodeBench.

💰 Kimi K2 offers a significant cost advantage: $0.60 per million input tokens and $2.50 per million output tokens, far below Claude and Gemini, making it a more affordable option for developers and enterprises.

Kimi K2, launched by Moonshot AI in July 2025, is a purpose-built, open-source Mixture-of-Experts (MoE) model—1 trillion total parameters, with 32 billion active parameters per token. It’s trained using the custom MuonClip optimizer on 15.5 trillion tokens, achieving stable training at this unprecedented scale without the typical instabilities seen in ultra-large models.

Unlike traditional chatbots, K2 is architected specifically for agentic workflows. It features native Model Context Protocol (MCP) support and was trained on simulated multi-step tool interactions, enabling it to autonomously decompose tasks, execute tool sequences, write and debug code, analyze data, and orchestrate workflows—all with minimal human oversight.
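The article does not show K2's tool-calling interface directly; the following is a minimal sketch of the kind of agentic loop described above, assuming an OpenAI-compatible chat-completions endpoint serving Kimi-K2-Instruct. The base URL, API key, model id, and the run_python tool are illustrative placeholders, not a confirmed part of the release.

```python
# Minimal sketch of an agentic tool-use loop against an OpenAI-compatible
# endpoint. Endpoint URL, key, model id, and the run_python tool are placeholders.
import contextlib
import io
import json

from openai import OpenAI

client = OpenAI(base_url="https://your-k2-endpoint/v1", api_key="YOUR_KEY")

def run_python(code: str) -> str:
    """Toy executor; a real agent would sandbox this instead of calling exec()."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})
    return buf.getvalue() or "(no output)"

tools = [{
    "type": "function",
    "function": {
        "name": "run_python",
        "description": "Execute a Python snippet and return its stdout.",
        "parameters": {
            "type": "object",
            "properties": {"code": {"type": "string"}},
            "required": ["code"],
        },
    },
}]

messages = [{"role": "user", "content": "Compute the first 10 Fibonacci numbers."}]

while True:
    resp = client.chat.completions.create(
        model="kimi-k2-instruct",            # placeholder model id
        messages=messages,
        tools=tools,
    )
    msg = resp.choices[0].message
    if not msg.tool_calls:                   # no more actions: print the final answer
        print(msg.content)
        break
    messages.append(msg)                     # keep the assistant turn with its tool calls
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": run_python(args["code"]),
        })
```

The loop keeps feeding tool results back until the model stops requesting calls, which is the basic shape of the multi-step execution the article describes.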

Why Agentic over Conversational?

While advanced models like GPT-4 and Claude Sonnet 4 excel at language reasoning, Kimi K2 moves from reasoning to action. It doesn’t just respond—it executes. The core shift lies in enabling real-world workflows: executing code, analyzing data, building web applications, and orchestrating tools end to end.

K2’s training incorporated millions of synthetic dialogues, each rated by an LLM-based evaluator. These dialogues simulate realistic tool-use scenarios, giving K2 a practical edge in tool selection and multi-step execution.
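As a rough illustration of that data pipeline, the sketch below mimics generate-then-judge filtering. The generate_dialogue and judge functions are stand-ins for model calls and the threshold is invented, so treat it as the shape of the process rather than Moonshot's actual recipe.

```python
# Rough sketch of generate-then-judge filtering: sample candidate tool-use
# dialogues, score each with an LLM-based evaluator, keep only the best ones.
# generate_dialogue and judge are placeholders; the threshold is invented.
import random

def generate_dialogue(scenario: str) -> str:
    """Placeholder for sampling a multi-step tool-use dialogue from a model."""
    return f"[simulated dialogue for: {scenario}]"

def judge(dialogue: str) -> float:
    """Placeholder for an LLM evaluator returning a quality score in [0, 1]."""
    return random.random()

scenarios = ["debug a failing unit test", "query a weather API", "clean a CSV file"]
KEEP_THRESHOLD = 0.8                     # invented cutoff

training_set = [
    d
    for scenario in scenarios
    for d in (generate_dialogue(scenario) for _ in range(4))   # several samples each
    if judge(d) >= KEEP_THRESHOLD                              # keep only high-rated dialogues
]
print(f"kept {len(training_set)} of {len(scenarios) * 4} candidate dialogues")
```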

Architecture and Training Innovations

K2’s technical design demonstrates several novel elements:

- A sparse Mixture-of-Experts architecture with 384 experts, of which 8 are activated per token, plus one shared expert for global context (a toy routing sketch follows below).
- The custom MuonClip optimizer, which kept training stable across 15.5 trillion tokens at trillion-parameter scale.
- Native Model Context Protocol (MCP) support and training on simulated multi-step tool interactions to ground agentic behavior.
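To make the routing pattern concrete, here is a toy PyTorch layer that mirrors the reported expert configuration (384 routed experts, top-8 routing, one shared expert). The hidden sizes and per-expert MLP are illustrative and much smaller than the real model; this is not Kimi K2's actual implementation.

```python
# Toy sparse-MoE layer mirroring the expert configuration reported for Kimi K2:
# 384 routed experts, 8 activated per token, plus one always-on shared expert.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=128, n_experts=384, top_k=8):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        # The shared expert processes every token regardless of routing.
        self.shared_expert = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):                         # x: (num_tokens, d_model)
        scores = self.router(x)                   # (num_tokens, n_experts)
        top_scores, top_idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(top_scores, dim=-1)   # normalize over the 8 chosen experts
        out = self.shared_expert(x)
        for slot in range(self.top_k):            # looped for clarity, not speed
            idx = top_idx[:, slot]
            w = weights[:, slot].unsqueeze(-1)
            for e in idx.unique():
                mask = idx == e
                out[mask] = out[mask] + w[mask] * self.experts[int(e)](x[mask])
        return out

tokens = torch.randn(16, 64)                      # 16 token embeddings
print(ToyMoELayer()(tokens).shape)                # torch.Size([16, 64])
```

Only 8 of the 384 expert MLPs run for any given token, which is how the model keeps 32B active parameters out of 1T total.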

The model comes in two variants: Kimi-K2-Base, the foundational model ideal for fine-tuning and building customized solutions; and Kimi-K2-Instruct, the post-trained version optimized for immediate use in general-purpose chat and tool-using agentic tasks. Instruct is reflex-grade—optimized for fast, low-latency interaction rather than long-form deliberation. On benchmarks, Kimi K2 outperforms Claude Sonnet 4 and GPT-4.1 in coding and agentic reasoning, with 71.6% on SWE-bench, 65.8% on agentic tasks, and 53.7% on LiveCodeBench.
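For readers who want to try the checkpoints, a minimal transformers sketch follows. The repository ids mirror the release naming but should be verified on Hugging Face, and a 1T-parameter MoE realistically needs a distributed serving stack rather than a single-GPU script, so treat this as illustrative only.

```python
# Minimal sketch of pulling one of the two released variants with transformers.
# Repo ids follow the release naming and should be verified on Hugging Face;
# real deployments shard the model across many devices.
from transformers import AutoModelForCausalLM, AutoTokenizer

REPO = "moonshotai/Kimi-K2-Instruct"   # or "moonshotai/Kimi-K2-Base" for fine-tuning

tokenizer = AutoTokenizer.from_pretrained(REPO, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    REPO,
    torch_dtype="auto",
    device_map="auto",                 # placeholder; real deployments shard across nodes
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Write a script that renames files by creation date."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
print(tokenizer.decode(model.generate(inputs, max_new_tokens=256)[0], skip_special_tokens=True))
```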

Performance Benchmarks

Kimi K2 not only matches but often surpasses closed-source models on key benchmarks:

| Benchmark | Kimi K2 | GPT‑4.1 | Claude Sonnet 4 |
|---|---|---|---|
| SWE-bench Verified | 71.6% | 54.6% | ~72.7% |
| Agentic Coding (Tau2) | 65.8% | 45.2% | ~61% |
| LiveCodeBench v6 (Pass@1) | 53.7% | 44.7% | – |
| MATH-500 | 97.4% | 92.4% | – |
| MMLU | 89.5% | ~90.4% | ~92.9% |

Its performance in agentic benchmarks like Tau2 and LiveCodeBench demonstrates its superior capacity to handle multi-step, real-world coding tasks—outperforming many proprietary models.

Cost Efficiency

Perhaps the most disruptive element is pricing: $0.60 per million input tokens and $2.50 per million output tokens.

Kimi K2 is roughly 5x cheaper than Claude or Gemini while offering equal or better performance on several metrics. The cost advantage, combined with open access and support for local deployment, positions K2 as an economically viable alternative for developers, enterprises, and research teams.
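Using the quoted prices, a quick back-of-the-envelope script shows what a sizable agent run might cost; the workload numbers are made-up examples.

```python
# Back-of-the-envelope cost check using the prices quoted above
# ($0.60 per million input tokens, $2.50 per million output tokens).
INPUT_PRICE_PER_TOKEN = 0.60 / 1_000_000    # USD
OUTPUT_PRICE_PER_TOKEN = 2.50 / 1_000_000   # USD

def kimi_k2_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens * INPUT_PRICE_PER_TOKEN + output_tokens * OUTPUT_PRICE_PER_TOKEN

# An agent run that reads 200k tokens of context and writes 20k tokens:
print(f"${kimi_k2_cost(200_000, 20_000):.2f}")   # -> $0.17
```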

Strategic Shift: From Thinking to Acting

Kimi K2 marks a pivotal moment in AI’s evolution—from thinking agents to acting systems. With native tool-use capabilities and built-in support for multi-agent protocols, it goes far beyond static chat interfaces. It is capable of triggering workflows, making decisions, executing API calls, and delivering tangible outputs autonomously.

Moreover, its release comes at a time when most such capabilities are either locked behind expensive APIs or limited to research labs. K2, by contrast, is openly available, inexpensive to run, and deployable locally.

Broader Implications

- Will agentic architecture become the norm? K2’s strong performance on tool-use tasks could push proprietary players to rethink their architectures.
- Can open-source efforts from Asia compete at global scale? With K2, Moonshot AI joins others like DeepSeek in showing that top-tier performance doesn’t have to originate from Silicon Valley.
- What’s next in the agentic evolution? Future models may combine video, robotics, and embodied reasoning to further expand the scope of what agentic AI can accomplish.

Conclusion

Kimi K2 isn’t just a bigger model—it’s a blueprint for what comes after the reasoning race: execution-first AI. By combining trillion-parameter scale, low inference costs, and deeply integrated agentic capabilities, Kimi K2 opens the door for AI systems that do more than generate—they build, act, and solve autonomously.

Check out the models on Hugging Face and the GitHub page. All credit for this research goes to the researchers of this project.

