MarkTechPost@AI · 1 hour ago
NVIDIA Releases Llama Nemotron Nano 4B: An Efficient Open Reasoning Model Optimized for Edge AI and Scientific Tasks

NVIDIA has launched Llama Nemotron Nano 4B, an open reasoning model designed for scientific tasks, programming, symbolic math, function calling, and instruction following. The model delivers strong performance and efficiency while being compact enough to deploy on edge devices. Despite having only 4 billion parameters, internal benchmarks show it achieves higher accuracy and up to 50% greater throughput than comparable open models in the 8-billion-parameter class. This makes Llama Nemotron Nano 4B a practical foundation for deploying language-based AI agents in resource-constrained environments.

🧠 Llama Nemotron Nano 4B builds on the Llama 3.1 architecture with a dense, decoder-only transformer design. The model is optimized for strong performance on reasoning-intensive workloads while keeping its parameter count light.

🛠️ The model was post-trained with multi-stage supervised fine-tuning on curated datasets for mathematics, coding, reasoning tasks, and function calling. Beyond traditional supervised learning, Nemotron Nano 4B also underwent reinforcement learning optimization with Reward-aware Preference Optimization (RPO), intended to improve its usefulness in chat-based and instruction-following settings.

🚀 Nemotron Nano 4B performs strongly on both single-turn and multi-turn reasoning tasks, with 50% higher inference throughput than comparable 8-billion-parameter open models. It supports a context window of up to 128,000 tokens, which is especially useful for tasks involving long documents, nested function calls, or multi-hop reasoning chains.

💡 A core strength of Nemotron Nano 4B is its focus on edge deployment. The model has been explicitly tested and optimized to run efficiently on NVIDIA Jetson platforms and NVIDIA RTX GPUs, enabling real-time reasoning on low-power embedded devices, including robotics systems, autonomous edge agents, and local developer workstations.

⚖️ The model is released under the NVIDIA Open Model License, which permits commercial use. It is available through Hugging Face, with all relevant model weights, configuration files, and tokenizer artifacts openly accessible. The license structure aligns with NVIDIA's broader strategy of supporting a developer ecosystem around its open models.

NVIDIA has released Llama Nemotron Nano 4B, an open-source reasoning model designed to deliver strong performance and efficiency across scientific tasks, programming, symbolic math, function calling, and instruction following, while being compact enough for edge deployment. With just 4 billion parameters, it achieves higher accuracy and up to 50% greater throughput than comparable open models in the 8-billion-parameter class, according to internal benchmarks.

The model is positioned as a practical foundation for deploying language-based AI agents in resource-constrained environments. By focusing on inference efficiency, Llama Nemotron Nano 4B addresses a growing demand for compact models capable of supporting hybrid reasoning and instruction-following tasks outside traditional cloud settings.

Model Architecture and Training Stack

Nemotron Nano 4B builds upon the Llama 3.1 architecture and shares lineage with NVIDIA’s earlier “Minitron” family. The architecture follows a dense, decoder-only transformer design. The model has been optimized for performance in reasoning-intensive workloads while maintaining a lightweight parameter count.
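For readers who want to confirm these architectural details against the source of truth, the published configuration can be read directly from the Hugging Face repo. A minimal sketch using the standard transformers API; the field names are generic Llama-family config keys, and the printed values come from the repo itself rather than this article:

```python
# Inspect the published config to verify the dense, decoder-only Llama-style
# design. Requires `pip install transformers` and network access to Hugging Face.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("nvidia/Llama-3.1-Nemotron-Nano-4B-v1.1")

print(config.model_type)               # architecture family (Llama-style)
print(config.num_hidden_layers)        # transformer depth
print(config.hidden_size)              # model width
print(config.max_position_embeddings)  # maximum supported context length
```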

The post-training stack for the model includes multi-stage supervised fine-tuning on curated datasets for mathematics, coding, reasoning tasks, and function calling. In addition to traditional supervised learning, Nemotron Nano 4B has undergone reinforcement learning optimization using Reward-aware Preference Optimization (RPO), a method intended to enhance the model’s utility in chat-based and instruction-following environments.

This combination of instruction tuning and reward modeling helps align the model’s outputs more closely with user intent, particularly in multi-turn reasoning scenarios. The training approach reflects NVIDIA’s emphasis on aligning smaller models to practical usage tasks that traditionally require significantly larger parameter sizes.
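NVIDIA has not published the exact RPO formulation alongside this release, but the core idea it has described elsewhere is that, unlike plain DPO, the loss accounts for how much better the chosen response is, not just which one won. A minimal sketch of that idea in PyTorch, with an assumed squared-error distance and illustrative hyperparameters, not NVIDIA's implementation:

```python
import torch
import torch.nn.functional as F

def rpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps,
             reward_chosen, reward_rejected, beta=0.1, eta=1.0):
    """Sketch of a reward-aware preference loss.

    Plain DPO only uses *which* response was preferred. This variant instead
    regresses the policy's implicit reward gap toward the scaled gap reported
    by a reward model, so "barely better" and "much better" pairs pull the
    policy by different amounts. beta, eta, and the MSE distance are
    illustrative assumptions, not NVIDIA's published choices.
    """
    # Implicit reward gap under the policy (the same quantity DPO optimizes).
    implicit_gap = beta * ((policy_chosen_logps - ref_chosen_logps)
                           - (policy_rejected_logps - ref_rejected_logps))
    # Target gap taken from an external reward model's scores.
    target_gap = eta * (reward_chosen - reward_rejected)
    # Match the two gaps; squared error is one simple distance choice.
    return F.mse_loss(implicit_gap, target_gap)
```

The practical effect is that preference pairs with a large reward gap move the policy harder than near-ties, which a purely binary preference loss would treat identically.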

Performance Benchmarks

Despite its compact footprint, Nemotron Nano 4B exhibits robust performance in both single-turn and multi-turn reasoning tasks. According to NVIDIA, it provides 50% higher inference throughput compared to similar open-weight models within the 8B parameter range. The model supports a context window of up to 128,000 tokens, which is particularly useful for tasks involving long documents, nested function calls, or multi-hop reasoning chains.

While NVIDIA has not disclosed full benchmark tables in the Hugging Face documentation, the model reportedly outperforms other open alternatives in benchmarks across math, code generation, and function calling precision. Its throughput advantage suggests it can serve as a viable default for developers targeting efficient inference pipelines with moderately complex workloads.
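As a concrete starting point, the model can be run with the standard transformers generation API. A sketch, assuming a GPU with enough memory for a 4B model in bfloat16; the system prompt shown follows the reasoning on/off switch described on the model card, and both it and the sampling settings should be verified there:

```python
# Minimal inference sketch with Hugging Face transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Llama-3.1-Nemotron-Nano-4B-v1.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The model card documents a system prompt that toggles reasoning traces;
# "detailed thinking on" is the reported switch (verify against the card).
messages = [
    {"role": "system", "content": "detailed thinking on"},
    {"role": "user", "content": "If f(x) = 3x^2 - 2x, what is f'(2)?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512,
                         temperature=0.6, do_sample=True)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```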

Edge-Ready Deployment

One of the core differentiators of Nemotron Nano 4B is its focus on edge deployment. The model has been explicitly tested and optimized to run efficiently on NVIDIA Jetson platforms and NVIDIA RTX GPUs. This enables real-time reasoning capabilities on low-power embedded devices, including robotics systems, autonomous edge agents, or local developer workstations.

For enterprises and research teams concerned with privacy and deployment control, the ability to run advanced reasoning models locally—without relying on cloud inference APIs—can provide both cost savings and greater flexibility.
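For devices where even a bfloat16 4B model is too large, one generic option is 4-bit quantization through the transformers bitsandbytes integration. This is a sketch of that path only; NVIDIA's own optimized Jetson and RTX deployments would more plausibly go through TensorRT-LLM or a similar runtime, which is an assumption here:

```python
# Sketch: 4-bit quantized load for low-VRAM edge devices, using the generic
# bitsandbytes path rather than NVIDIA's own deployment stack.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
)

model_id = "nvidia/Llama-3.1-Nemotron-Nano-4B-v1.1"
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=quant_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
# A 4B model in 4-bit weights needs roughly 2-3 GB for parameters, putting it
# within reach of Jetson-class memory budgets (rough estimate, not a vendor figure).
```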

Licensing and Access

The model is released under the NVIDIA Open Model License, which permits commercial usage. It is available through Hugging Face at huggingface.co/nvidia/Llama-3.1-Nemotron-Nano-4B-v1.1, with all relevant model weights, configuration files, and tokenizer artifacts openly accessible. The license structure aligns with NVIDIA’s broader strategy of supporting developer ecosystems around its open models.
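Pulling down all of those artifacts for offline or air-gapped use is a single huggingface_hub call. A sketch; any gated-access or license-acknowledgment step on the model page, if present, would need to be completed first:

```python
# Download the full set of published artifacts (weights, config, tokenizer).
# Requires `pip install huggingface_hub`.
from huggingface_hub import snapshot_download

local_dir = snapshot_download("nvidia/Llama-3.1-Nemotron-Nano-4B-v1.1")
print(local_dir)  # local cache path with safetensors, config.json, tokenizer files
```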

Conclusion

Nemotron Nano 4B represents NVIDIA’s continued investment in bringing scalable, practical AI models to a broader development audience—especially those targeting edge or cost-sensitive deployment scenarios. While the field continues to see rapid progress in ultra-large models, compact and efficient models like Nemotron Nano 4B provide a counterbalance, enabling deployment flexibility without compromising too heavily on performance.


