MarkTechPost@AI · 18 hours ago
NVIDIA AI Releases GraspGen: A Diffusion-Based Framework for 6-DOF Grasping in Robotics

NVIDIA's GraspGen is a novel diffusion-model-based grasp generation framework aimed at long-standing challenges in robotic grasping. Trained in simulation on large-scale synthetic data, it sidesteps the high cost and poor generalization of real-world data collection. GraspGen pairs a novel Diffusion Transformer architecture with an "on-generator" training recipe, markedly improving the accuracy and robustness of predicted grasp poses, and it delivers strong performance across multiple gripper types, complex scenes, and sim-to-real transfer experiments. Beyond making robotic automation and manipulation more efficient, the release advances the broader robotics community through an open-source dataset and codebase.

💡 At GraspGen's core is large-scale synthetic grasp generation: more than 53 million grasps simulated in NVIDIA Isaac Sim, paired with Objaverse's massive library of object models. This removes the traditional dependence on expensive real-world data and yields better generalization and scalability.

🚀 The framework adopts an innovative Diffusion Transformer architecture, using a PointTransformerV3 encoder to process 3D point cloud data and predicting grasp poses through an iterative diffusion process. Compared with traditional methods that rely on PointNet++ or contact-point representations, GraspGen improves both grasp quality and computational efficiency. It also introduces "on-generator" training, which lets the discriminator learn the model's own latent failure modes and thus filter out invalid grasps more effectively.

🌐 GraspGen demonstrates strong multi-gripper adaptability and environmental robustness: it has been applied successfully to parallel-jaw and suction grippers, with an extension to multi-finger hands planned. The method performs well on partial point clouds (e.g., single-view observations) and in complex cluttered scenes (e.g., the FetchBench benchmark), and, despite being trained purely in simulation, it transfers zero-shot to real robot platforms, maintaining high success rates even under noisy visual input.

📈 In performance tests, GraspGen significantly outperforms existing SOTA methods on the FetchBench benchmark, improving grasp success rates by nearly 17%. In real-robot experiments on a UR10, it reached an 81.3% success rate, 28% higher than M2T2, while precisely identifying the target object and avoiding spurious grasps on surrounding clutter. NVIDIA has also open-sourced the GraspGen dataset and code to accelerate robotics R&D.

🌟 The release of GraspGen marks a major step forward for 6-DOF robotic grasping. By integrating simulation, learning, and modular robotics components into a powerful plug-and-play solution, it lays a solid foundation for reliable, generalizable real-world grasping and pushes general-purpose robotic manipulation forward.

Robotic grasping is a cornerstone task for automation and manipulation, critical in domains spanning from industrial picking to service and humanoid robotics. Despite decades of research, achieving robust, general-purpose 6-degree-of-freedom (6-DOF) grasping remains a challenging open problem. Recently, NVIDIA unveiled GraspGen, a novel diffusion-based grasp generation framework that promises to bring state-of-the-art (SOTA) performance with unprecedented flexibility, scalability, and real-world reliability.

The Grasping Challenge and Motivation

Accurate and reliable grasp generation in 3D space—where grasp poses must be expressed in terms of position and orientation—requires algorithms that can generalize across unknown objects, diverse gripper types, and challenging environmental conditions including partial observations and clutter. Classical model-based grasp planners depend heavily on precise object pose estimation or multi-view scans, making them impractical for in-the-wild settings. Data-driven learning approaches show promise, but current methods tend to struggle with generalization and scalability, especially when shifting to new grippers or real-world cluttered environments.

Another limitation of many existing grasping systems is their dependency on large amounts of costly real-world data collection or domain-specific tuning. Collecting and annotating real grasp datasets is expensive and does not easily transfer between gripper types or scene complexities.

Key Idea: Large-Scale Simulation and Diffusion Model Generative Grasping

NVIDIA’s GraspGen pivots away from expensive real-world data collection towards leveraging large-scale synthetic data generation in simulation—particularly utilizing the vast diversity of object meshes from the Objaverse dataset (over 8,000 objects) and simulated gripper interactions (over 53 million grasps generated).

GraspGen formulates grasp generation as a denoising diffusion probabilistic model (DDPM) operating on the SE(3) pose space (comprising 3D rotations and translations). Diffusion models, well-established in image generation, iteratively refine random noise samples towards realistic grasp poses conditioned on an object-centric point cloud representation. This generative modeling approach naturally captures the multi-modal distribution of valid grasps on complex objects, enabling spatial diversity critical for handling clutter and task constraints.
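To make the mechanics concrete, here is a minimal, hypothetical PyTorch sketch of DDPM ancestral sampling over grasp poses. Everything below is illustrative, not GraspGen's actual implementation: `NoisePredictor` stands in for the Diffusion Transformer, the 128-dimensional `cloud_emb` stands in for a PointTransformerV3 point-cloud embedding, and poses are parameterized as a 3D translation plus a 6D rotation representation purely for simplicity.

```python
import torch

# Illustrative stand-in for GraspGen's Diffusion Transformer: any network that
# maps (noisy poses, timestep, point-cloud embedding) -> predicted noise.
class NoisePredictor(torch.nn.Module):
    def __init__(self, pose_dim=9, emb_dim=128):  # 3D translation + 6D rotation
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(pose_dim + 1 + emb_dim, 256),
            torch.nn.ReLU(),
            torch.nn.Linear(256, pose_dim),
        )

    def forward(self, poses, t, cloud_emb):
        t_feat = t.float().unsqueeze(-1)                 # (N, 1) timestep feature
        return self.net(torch.cat([poses, t_feat, cloud_emb], dim=-1))

@torch.no_grad()
def sample_grasps(model, cloud_emb, n_grasps=64, n_steps=100):
    """DDPM ancestral sampling: refine pure noise into grasp pose candidates."""
    betas = torch.linspace(1e-4, 0.02, n_steps)          # linear noise schedule
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)

    x = torch.randn(n_grasps, 9)                         # start from pure noise
    emb = cloud_emb.expand(n_grasps, -1)                 # shared object embedding
    for t in reversed(range(n_steps)):
        eps = model(x, torch.full((n_grasps,), t), emb)
        mean = (x - betas[t] / torch.sqrt(1 - alpha_bar[t]) * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x  # rows: [tx, ty, tz, 6D rotation]; orthonormalize rotations afterwards

# Usage: 64 diverse grasp candidates conditioned on one (random) object embedding.
grasps = sample_grasps(NoisePredictor(), torch.randn(1, 128))
```

Because sampling starts from independent noise vectors, each run yields a diverse batch of candidates, which is exactly what multi-modal grasp distributions on complex objects require.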

Architecting GraspGen: Diffusion Transformer and On-Generator Training

GraspGen's generator is a Diffusion Transformer that conditions on 3D point clouds encoded with PointTransformerV3, iteratively denoising random samples into grasp poses. Compared with prior methods built on PointNet++ encoders or contact-point representations, this design improves both grasp quality and computational efficiency. The second key ingredient is an "on-generator" training recipe for the grasp discriminator: instead of scoring only offline dataset grasps, the discriminator is trained on grasps sampled from the diffusion model itself, so it learns the generator's characteristic failure modes and filters out invalid grasps far more effectively.
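As a hedged illustration of the on-generator idea (reusing the `sample_grasps` sketch above; the `discriminator` interface and the `success_fn` labeling function are assumptions for this sketch, not GraspGen's actual API), a single discriminator update might look like:

```python
import torch

def on_generator_step(generator, discriminator, cloud_emb, optimizer, success_fn):
    """One sketched discriminator update on grasps drawn from the generator
    itself, so the scorer learns the generator's own failure modes."""
    # 1. Sample candidates from the current diffusion generator, not the dataset.
    grasps = sample_grasps(generator, cloud_emb, n_grasps=64)

    # 2. Label each sampled grasp, e.g. via physics simulation or distance to
    #    the nearest ground-truth grasp (success_fn is a stand-in for either).
    labels = success_fn(grasps)                            # (N,) floats in {0, 1}

    # 3. Train the scorer to separate good from bad on-generator samples.
    logits = discriminator(grasps, cloud_emb.expand(grasps.shape[0], -1))
    loss = torch.nn.functional.binary_cross_entropy_with_logits(
        logits.squeeze(-1), labels)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The contrast with conventional training is entirely in step 1: a scorer trained only on dataset grasps never sees the generator's characteristic mistakes, so it filters them poorly at inference time.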

Multi-Embodiment Grasping and Environmental Flexibility

GraspGen is demonstrated across three gripper types:

- the Franka Panda parallel-jaw gripper,
- the Robotiq-2F-140 industrial parallel-jaw gripper, and
- a suction gripper, with an extension to multi-finger hands planned as future work.

Crucially, the framework generalizes to:

- partial observations, such as single-view depth point clouds;
- complex cluttered scenes, as stressed by the FetchBench benchmark;
- zero-shot sim-to-real transfer, sustaining high success rates on real robots despite noisy visual input.

Benchmarking and Performance

On the FetchBench grasping benchmark, GraspGen significantly outperforms prior state-of-the-art methods, improving grasp success rates by nearly 17%. In real-world experiments on a UR10 robot, it achieved an 81.3% grasp success rate, 28% higher than M2T2, while reliably targeting the intended object rather than generating spurious grasps on surrounding clutter.

Dataset Release and Open Source

NVIDIA released the GraspGen dataset publicly to foster community progress. It consists of approximately 53 million simulated grasps across 8,515 object meshes licensed under permissive Creative Commons policies. The dataset was generated using NVIDIA Isaac Sim with detailed physics-based grasp success labeling, including shaking tests for stability.
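As a purely hypothetical sketch of how one might iterate over such a dataset (the actual on-disk schema is defined in the GraspGen repository and may differ entirely; the file name and field names below are invented for illustration):

```python
import json
import numpy as np

def load_grasp_records(path):
    """Yield (object_id, 4x4 grasp pose, success flag) from an assumed JSON
    layout mapping object IDs to pose arrays and physics-tested labels."""
    with open(path) as f:
        data = json.load(f)
    for obj_id, entry in data.items():
        poses = np.asarray(entry["grasp_poses"])   # (N, 4, 4) homogeneous poses
        labels = np.asarray(entry["success"])      # (N,) bools from sim testing
        for pose, ok in zip(poses, labels):
            yield obj_id, pose, bool(ok)

# Example: keep only grasps that survived the shake-test stability labeling.
stable = [(obj, pose) for obj, pose, ok in load_grasp_records("grasps.json") if ok]
```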

Alongside the dataset, the GraspGen codebase and pretrained models are available under open-source licenses at https://github.com/NVlabs/GraspGen, with additional project material at https://graspgen.github.io/.

Conclusion

GraspGen represents a major advance in 6-DOF robotic grasping, introducing a diffusion-based generative framework that outperforms prior methods while scaling across multiple grippers, scene complexities, and observability conditions. Its novel on-generator training recipe for grasp scoring decisively improves filtering of model errors, leading to dramatic gains in grasp success and task-level performance both in simulation and on real robots.

By publicly releasing both code and a massive synthetic grasp dataset, NVIDIA empowers the robotics community to further develop and apply these innovations. The GraspGen framework consolidates simulation, learning, and modular robotics components into a turnkey solution, advancing the vision of reliable, real-world robotic grasping as a broadly applicable foundational building block in general-purpose robotic manipulation.


Check out the Paper, Project page, and GitHub repository. All credit for this research goes to the researchers of this project.

