MarkTechPost@AI · 18 hours ago
NVIDIA AI Releases GraspGen: A Diffusion-Based Framework for 6-DOF Grasping in Robotics

NVIDIA's GraspGen is a novel diffusion-model-based grasp generation framework aimed at long-standing challenges in robotic grasping. Trained in simulation on large-scale synthetic data, it sidesteps the high cost and poor generalization of real-world data collection. GraspGen pairs a novel Diffusion Transformer architecture with an "on-generator" training recipe, markedly improving the accuracy and robustness of predicted grasp poses, and it delivers strong performance across multiple gripper types, complex scenes, and sim-to-real transfer experiments. Beyond making robotic automation and manipulation more efficient, the release advances the broader robotics community through an open-source dataset and codebase.

💡 At GraspGen's core is large-scale synthetic grasp generation: more than 53 million grasps simulated in NVIDIA Isaac Sim, paired with Objaverse's massive library of object models. This removes the traditional dependence on expensive real-world data and yields better generalization and scalability.

🚀 The framework adopts an innovative Diffusion Transformer architecture, using a PointTransformerV3 encoder to process 3D point cloud data and predicting grasp poses through an iterative diffusion process. Compared with traditional methods that rely on PointNet++ or contact-point representations, GraspGen improves both grasp quality and computational efficiency. It also introduces "on-generator" training, which lets the discriminator learn the model's own latent failure modes and thus filter out invalid grasps more effectively.

🌐 GraspGen demonstrates strong multi-gripper adaptability and environmental robustness: it has been applied successfully to parallel-jaw and suction grippers, with an extension to multi-finger hands planned. The method performs well on partial point clouds (e.g., single-view observations) and in complex cluttered scenes (e.g., the FetchBench benchmark), and, despite being trained purely in simulation, it transfers zero-shot to real robot platforms, maintaining high success rates even under noisy visual input.

📈 In performance tests, GraspGen significantly outperforms existing SOTA methods on the FetchBench benchmark, improving grasp success rates by nearly 17%. In real-robot experiments on a UR10, it reached an 81.3% success rate, 28% higher than M2T2, while precisely identifying the target object and avoiding spurious grasps on surrounding clutter. NVIDIA has also open-sourced the GraspGen dataset and code to accelerate robotics R&D.

🌟 The release of GraspGen marks a major step forward for 6-DOF robotic grasping. By integrating simulation, learning, and modular robotics components into a powerful plug-and-play solution, it lays a solid foundation for reliable, generalizable real-world grasping and pushes general-purpose robotic manipulation forward.

Robotic grasping is a cornerstone task for automation and manipulation, critical in domains spanning from industrial picking to service and humanoid robotics. Despite decades of research, achieving robust, general-purpose 6-degree-of-freedom (6-DOF) grasping remains a challenging open problem. Recently, NVIDIA unveiled GraspGen, a novel diffusion-based grasp generation framework that promises to bring state-of-the-art (SOTA) performance with unprecedented flexibility, scalability, and real-world reliability.

The Grasping Challenge and Motivation

Accurate and reliable grasp generation in 3D space—where grasp poses must be expressed in terms of position and orientation—requires algorithms that can generalize across unknown objects, diverse gripper types, and challenging environmental conditions including partial observations and clutter. Classical model-based grasp planners depend heavily on precise object pose estimation or multi-view scans, making them impractical for in-the-wild settings. Data-driven learning approaches show promise, but current methods tend to struggle with generalization and scalability, especially when shifting to new grippers or real-world cluttered environments.

Another limitation of many existing grasping systems is their dependency on large amounts of costly real-world data collection or domain-specific tuning. Collecting and annotating real grasp datasets is expensive and does not easily transfer between gripper types or scene complexities.

Key Idea: Large-Scale Simulation and Diffusion Model Generative Grasping

NVIDIA’s GraspGen pivots away from expensive real-world data collection towards leveraging large-scale synthetic data generation in simulation—particularly utilizing the vast diversity of object meshes from the Objaverse dataset (over 8,000 objects) and simulated gripper interactions (over 53 million grasps generated).

GraspGen formulates grasp generation as a denoising diffusion probabilistic model (DDPM) operating on the SE(3) pose space (comprising 3D rotations and translations). Diffusion models, well-established in image generation, iteratively refine random noise samples towards realistic grasp poses conditioned on an object-centric point cloud representation. This generative modeling approach naturally captures the multi-modal distribution of valid grasps on complex objects, enabling spatial diversity critical for handling clutter and task constraints.
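To make the mechanics concrete, here is a minimal, hypothetical PyTorch sketch of DDPM ancestral sampling over grasp poses. Everything below is illustrative, not GraspGen's actual implementation: `NoisePredictor` stands in for the Diffusion Transformer, the 128-dimensional `cloud_emb` stands in for a PointTransformerV3 point-cloud embedding, and poses are parameterized as a 3D translation plus a 6D rotation representation purely for simplicity.

```python
import torch

# Illustrative stand-in for GraspGen's Diffusion Transformer: any network that
# maps (noisy poses, timestep, point-cloud embedding) -> predicted noise.
class NoisePredictor(torch.nn.Module):
    def __init__(self, pose_dim=9, emb_dim=128):  # 3D translation + 6D rotation
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(pose_dim + 1 + emb_dim, 256),
            torch.nn.ReLU(),
            torch.nn.Linear(256, pose_dim),
        )

    def forward(self, poses, t, cloud_emb):
        t_feat = t.float().unsqueeze(-1)                 # (N, 1) timestep feature
        return self.net(torch.cat([poses, t_feat, cloud_emb], dim=-1))

@torch.no_grad()
def sample_grasps(model, cloud_emb, n_grasps=64, n_steps=100):
    """DDPM ancestral sampling: refine pure noise into grasp pose candidates."""
    betas = torch.linspace(1e-4, 0.02, n_steps)          # linear noise schedule
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)

    x = torch.randn(n_grasps, 9)                         # start from pure noise
    emb = cloud_emb.expand(n_grasps, -1)                 # shared object embedding
    for t in reversed(range(n_steps)):
        eps = model(x, torch.full((n_grasps,), t), emb)
        mean = (x - betas[t] / torch.sqrt(1 - alpha_bar[t]) * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x  # rows: [tx, ty, tz, 6D rotation]; orthonormalize rotations afterwards

# Usage: 64 diverse grasp candidates conditioned on one (random) object embedding.
grasps = sample_grasps(NoisePredictor(), torch.randn(1, 128))
```

Because sampling starts from independent noise vectors, each run yields a diverse batch of candidates, which is exactly what multi-modal grasp distributions on complex objects require.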

Architecting GraspGen: Diffusion Transformer and On-Generator Training

GraspGen's generator is a Diffusion Transformer that conditions on 3D point clouds encoded with PointTransformerV3, iteratively denoising random samples into grasp poses. Compared with prior methods built on PointNet++ encoders or contact-point representations, this design improves both grasp quality and computational efficiency. The second key ingredient is an "on-generator" training recipe for the grasp discriminator: instead of scoring only offline dataset grasps, the discriminator is trained on grasps sampled from the diffusion model itself, so it learns the generator's characteristic failure modes and filters out invalid grasps far more effectively.
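As a hedged illustration of the on-generator idea (reusing the `sample_grasps` sketch above; the `discriminator` interface and the `success_fn` labeling function are assumptions for this sketch, not GraspGen's actual API), a single discriminator update might look like:

```python
import torch

def on_generator_step(generator, discriminator, cloud_emb, optimizer, success_fn):
    """One sketched discriminator update on grasps drawn from the generator
    itself, so the scorer learns the generator's own failure modes."""
    # 1. Sample candidates from the current diffusion generator, not the dataset.
    grasps = sample_grasps(generator, cloud_emb, n_grasps=64)

    # 2. Label each sampled grasp, e.g. via physics simulation or distance to
    #    the nearest ground-truth grasp (success_fn is a stand-in for either).
    labels = success_fn(grasps)                            # (N,) floats in {0, 1}

    # 3. Train the scorer to separate good from bad on-generator samples.
    logits = discriminator(grasps, cloud_emb.expand(grasps.shape[0], -1))
    loss = torch.nn.functional.binary_cross_entropy_with_logits(
        logits.squeeze(-1), labels)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The contrast with conventional training is entirely in step 1: a scorer trained only on dataset grasps never sees the generator's characteristic mistakes, so it filters them poorly at inference time.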

Multi-Embodiment Grasping and Environmental Flexibility

GraspGen is demonstrated across three gripper types:

- the Franka Panda parallel-jaw gripper,
- the Robotiq-2F-140 industrial parallel-jaw gripper, and
- a suction gripper, with an extension to multi-finger hands planned as future work.

Crucially, the framework generalizes to:

- partial observations, such as single-view depth point clouds;
- complex cluttered scenes, as stressed by the FetchBench benchmark;
- zero-shot sim-to-real transfer, sustaining high success rates on real robots despite noisy visual input.

Benchmarking and Performance

On the FetchBench grasping benchmark, GraspGen significantly outperforms prior state-of-the-art methods, improving grasp success rates by nearly 17%. In real-world experiments on a UR10 robot, it achieved an 81.3% grasp success rate, 28% higher than M2T2, while reliably targeting the intended object rather than generating spurious grasps on surrounding clutter.

Dataset Release and Open Source

NVIDIA released the GraspGen dataset publicly to foster community progress. It consists of approximately 53 million simulated grasps across 8,515 object meshes licensed under permissive Creative Commons policies. The dataset was generated using NVIDIA Isaac Sim with detailed physics-based grasp success labeling, including shaking tests for stability.
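As a purely hypothetical sketch of how one might iterate over such a dataset (the actual on-disk schema is defined in the GraspGen repository and may differ entirely; the file name and field names below are invented for illustration):

```python
import json
import numpy as np

def load_grasp_records(path):
    """Yield (object_id, 4x4 grasp pose, success flag) from an assumed JSON
    layout mapping object IDs to pose arrays and physics-tested labels."""
    with open(path) as f:
        data = json.load(f)
    for obj_id, entry in data.items():
        poses = np.asarray(entry["grasp_poses"])   # (N, 4, 4) homogeneous poses
        labels = np.asarray(entry["success"])      # (N,) bools from sim testing
        for pose, ok in zip(poses, labels):
            yield obj_id, pose, bool(ok)

# Example: keep only grasps that survived the shake-test stability labeling.
stable = [(obj, pose) for obj, pose, ok in load_grasp_records("grasps.json") if ok]
```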

Alongside the dataset, the GraspGen codebase and pretrained models are available under open-source licenses at https://github.com/NVlabs/GraspGen, with additional project material at https://graspgen.github.io/.

Conclusion

GraspGen represents a major advance in 6-DOF robotic grasping, introducing a diffusion-based generative framework that outperforms prior methods while scaling across multiple grippers, scene complexities, and observability conditions. Its novel on-generator training recipe for grasp scoring decisively improves filtering of model errors, leading to dramatic gains in grasp success and task-level performance both in simulation and on real robots.

By publicly releasing both code and a massive synthetic grasp dataset, NVIDIA empowers the robotics community to further develop and apply these innovations. The GraspGen framework consolidates simulation, learning, and modular robotics components into a turnkey solution, advancing the vision of reliable, real-world robotic grasping as a broadly applicable foundational building block in general-purpose robotic manipulation.


Check out the Paper, Project page, and GitHub repository. All credit for this research goes to the researchers of this project.

