MarkTechPost@AI · May 10, 04:45
ServiceNow AI Released Apriel-Nemotron-15b-Thinker: A Compact Yet Powerful Reasoning Model Optimized for Enterprise-Scale Deployment and Efficiency

ServiceNow AI has released Apriel-Nemotron-15b-Thinker, a model with only 15 billion parameters whose performance nonetheless rivals that of much larger models. It stands out for its memory footprint and token efficiency: it needs roughly half the memory of QWQ-32b and cuts token consumption by about 40%, directly lowering enterprise inference costs. Apriel-Nemotron-15b-Thinker was trained in three stages, continual pre-training, supervised fine-tuning, and guided reinforcement preference optimization, and it performs strongly on mathematical reasoning, programming challenges, and enterprise tasks, opening new options for deploying high-performance reasoning models on practical enterprise hardware.

🧠 With 15 billion parameters, Apriel-Nemotron-15b-Thinker is far smaller than large models such as QWQ-32b and EXAONE-Deep-32b, yet it matches their performance, showing that compact models can deliver high performance.

🚀 The model follows a three-stage training recipe: continual pre-training (CPT) on more than 100 billion tokens, supervised fine-tuning (SFT) on 200,000 high-quality demonstrations, and finally GRPO (Guided Reinforcement Preference Optimization) to align the model's outputs with expected results.

💡 Apriel-Nemotron-15b-Thinker performs strongly on enterprise tasks such as MBPP, BFCL, and Enterprise RAG, and also rivals much larger models on academic benchmarks such as GPQA and MATH-500.

💰 In production settings the model uses about 40% fewer tokens than QWQ-32b, significantly lowering inference costs, while its memory footprint is roughly 50% smaller, making deployment on enterprise hardware much easier.

🏢 Apriel-Nemotron-15b-Thinker is designed for enterprise applications such as business automation, coding assistants, and logical assistants, favoring practical deployment over lab-only performance.

AI models today are expected to handle complex tasks such as solving mathematical problems, interpreting logical statements, and assisting with enterprise decision-making. Building such models demands the integration of mathematical reasoning, scientific understanding, and advanced pattern recognition. As the demand for intelligent agents in real-time applications, like coding assistants and business automation tools, continues to grow, there is a pressing need for models that combine strong performance with efficient memory and token usage, making them viable for deployment in practical hardware environments.

A central challenge in AI development is the resource intensity of large-scale reasoning models. Despite their strong capabilities, these models often require significant memory and compute, limiting their real-world applicability. This creates a gap between what advanced models can achieve and what users can realistically deploy. Even well-resourced enterprises may find it unsustainable to run models that demand dozens of gigabytes of memory or carry high inference costs. The issue is not just about building smarter models, but about ensuring they are efficient and deployable on real-world platforms. High-performing models such as QWQ-32b, o1-mini, and EXAONE-Deep-32b excel at mathematical reasoning and academic benchmarks. However, their dependence on high-end GPUs and high token consumption limits their use in production settings. These models highlight the ongoing trade-off in AI deployment: achieving high accuracy at the cost of scalability and efficiency.

Addressing this gap, researchers at ServiceNow introduced Apriel-Nemotron-15b-Thinker. This model consists of 15 billion parameters, a relatively modest size compared to its high-performing counterparts, yet it demonstrates performance on par with models almost twice its size. The primary advantage lies in its memory footprint and token efficiency. While delivering competitive results, it requires nearly half the memory of QWQ‑32b and EXAONE‑Deep‑32b. This directly contributes to improved operational efficiency in enterprise environments, making it feasible to integrate high-performance reasoning models into real-world applications without large-scale infrastructure upgrades.

The development of Apriel-Nemotron-15b-Thinker followed a structured three-stage training approach, with each stage designed to enhance a specific aspect of the model’s reasoning capabilities. In the initial phase, termed Continual Pre-training (CPT), the model was exposed to over 100 billion tokens. These tokens were not generic text but carefully selected examples from domains that require deep reasoning: mathematical logic, programming challenges, scientific literature, and logical deduction tasks. This exposure provided the foundational reasoning capabilities that distinguish the model from others. The second stage involved Supervised Fine-Tuning (SFT) on 200,000 high-quality demonstrations. These examples further calibrated the model’s responses to reasoning challenges, improving performance on tasks that require accuracy and attention to detail. The final tuning stage, GRPO (Guided Reinforcement Preference Optimization), refined the model’s outputs by optimizing alignment with expected results across key tasks. This pipeline ensures the model is intelligent, precise, structured, and scalable.
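ServiceNow has not published the training code for this pipeline, but the SFT and GRPO stages map onto widely used open-source tooling. The sketch below illustrates those two stages with Hugging Face TRL; the base checkpoint name, dataset contents, hyperparameters, and the choice of TRL itself (whose GRPOTrainer implements Group Relative Policy Optimization, used here as a stand-in for the Guided Reinforcement Preference Optimization described above) are assumptions for illustration, not the team's actual recipe.

```python
# Hypothetical sketch of stages 2 and 3 (SFT, then GRPO) using Hugging Face TRL.
# The base model id, data, and hyperparameters are placeholders, not ServiceNow's recipe.
from datasets import Dataset
from trl import SFTConfig, SFTTrainer, GRPOConfig, GRPOTrainer

base_model = "your-org/your-15b-base-checkpoint"  # placeholder; CPT stage assumed already done

# Stage 2: supervised fine-tuning on curated reasoning demonstrations.
sft_data = Dataset.from_list([
    {"prompt": "Prove that the sum of two even numbers is even.",
     "completion": "Let a = 2m and b = 2n; then a + b = 2(m + n), which is even."},
])
sft_trainer = SFTTrainer(
    model=base_model,
    train_dataset=sft_data,
    args=SFTConfig(output_dir="sft-checkpoint", max_steps=10),
)
sft_trainer.train()
sft_trainer.save_model("sft-checkpoint")

# Stage 3: preference optimization. TRL's GRPOTrainer scores sampled completions
# with reward functions; this toy reward just checks for the expected conclusion.
def correctness_reward(completions, **kwargs):
    return [1.0 if "2(m + n)" in c else 0.0 for c in completions]

grpo_trainer = GRPOTrainer(
    model="sft-checkpoint",
    reward_funcs=correctness_reward,
    train_dataset=Dataset.from_list(
        [{"prompt": "Prove that the sum of two even numbers is even."}]
    ),
    args=GRPOConfig(output_dir="grpo-checkpoint", max_steps=10),
)
grpo_trainer.train()
```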

In enterprise-specific tasks such as MBPP, BFCL, Enterprise RAG, MT Bench, MixEval, IFEval, and Multi-Challenge, the model delivered competitive or superior performance compared to larger models. In terms of production efficiency, it consumed 40% fewer tokens than QWQ-32b, significantly lowering inference costs. From a memory standpoint, it achieved all this with approximately 50% of the memory needed by QWQ-32b and EXAONE-Deep-32b, indicating a substantial improvement in deployment feasibility. Even on academic benchmarks such as AIME-24, AIME-25, AMC-23, MATH-500, and GPQA, the model held its own, often equaling or surpassing the performance of larger models while being significantly lighter in computational demand.
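As a quick sanity check on these figures, the rough arithmetic below estimates weight memory at bf16 precision and the relative token cost. It counts weights only, ignoring KV cache, activations, and quantization, so it is illustrative rather than official sizing guidance.

```python
# Back-of-envelope check of the memory and token-efficiency claims.
# Weights-only estimate at bf16 (2 bytes per parameter); ignores KV cache and activations.
BYTES_PER_PARAM_BF16 = 2

def weight_memory_gib(num_params: float) -> float:
    """Approximate memory needed just to hold the weights, in GiB."""
    return num_params * BYTES_PER_PARAM_BF16 / 1024**3

apriel = weight_memory_gib(15e9)   # ~27.9 GiB
qwq = weight_memory_gib(32e9)      # ~59.6 GiB
print(f"Apriel-15B weights: {apriel:.1f} GiB, QWQ-32b weights: {qwq:.1f} GiB")
print(f"Memory ratio: {apriel / qwq:.0%}")  # ~47%, in line with the ~50% claim

# 40% fewer output tokens at the same per-token price -> ~60% of the token spend.
relative_token_cost = 1.0 - 0.40
print(f"Relative token cost vs. QWQ-32b: {relative_token_cost:.0%}")
```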

Several Key Takeaways from the Research on Apriel-Nemotron-15b-Thinker:

- Apriel-Nemotron-15b-Thinker packs 15 billion parameters yet matches the reasoning performance of models roughly twice its size, such as QWQ-32b and EXAONE-Deep-32b.
- Training followed a three-stage pipeline: continual pre-training on over 100 billion reasoning-focused tokens, supervised fine-tuning on 200,000 high-quality demonstrations, and GRPO-based preference optimization.
- The model is competitive or superior on enterprise benchmarks (MBPP, BFCL, Enterprise RAG, MT Bench, MixEval, IFEval, Multi-Challenge) and holds its own on academic benchmarks (AIME-24/25, AMC-23, MATH-500, GPQA).
- In production it consumes about 40% fewer tokens than QWQ-32b and needs roughly half the memory, making deployment on enterprise hardware practical.

Check out the model on Hugging Face.
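For readers who want to try the checkpoint, the snippet below shows a minimal way to load and query it with the transformers library. The repository id and chat-template usage are assumptions based on the announcement; consult the model card on Hugging Face for the exact prompt format and recommended generation settings.

```python
# Minimal sketch: load the released checkpoint and ask it a reasoning question.
# The repo id below is assumed from the announcement; verify it on Hugging Face.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ServiceNow-AI/Apriel-Nemotron-15b-Thinker"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user",
             "content": "A train covers 120 km in 90 minutes. What is its average speed in km/h?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```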
