MarkTechPost@AI · April 8
Sensor-Invariant Tactile Representation for Zero-Shot Transfer Across Vision-Based Tactile Sensors

This article introduces the Sensor-Invariant Tactile Representation (SITR), a tactile representation framework that enables zero-shot transfer across different vision-based tactile sensors. By learning sensor-invariant features, SITR addresses the difficulty of transferring models between tactile sensors that differ in design and manufacturing. The researchers use calibration images, supervised contrastive learning, and a large-scale synthetic dataset so that the model generalizes across sensors without retraining or fine-tuning. SITR marks a notable step forward for robotic manipulation and tactile research.

👁️ Vision-based tactile sensors are essential for intelligent systems, but differences between sensors lead to poor transferability. The SITR framework aims to solve this problem and enable zero-shot transfer.

⚙️ SITR's core innovations: using calibration images to characterize each sensor; applying supervised contrastive learning to emphasize the geometric structure of tactile data; and building a large-scale synthetic dataset of 1 million samples.

💻 SITR's training procedure: the sensor background is subtracted from the input images; the images are linearly projected into tokens; and training is guided by a pixel-wise normal-map reconstruction loss and a contrastive loss.

🏆 SITR outperforms existing models on both object classification and pose estimation. In cross-sensor tests it shows strong generalization, overcoming the difficulty traditional models have when applied across sensors.

🚀 SITR is an important step toward a unified approach to tactile sensing; it advances robotic manipulation and tactile research by removing a key barrier to adopting these promising sensor technologies.

Tactile sensing is a crucial modality for intelligent systems to perceive and interact with the physical world. The GelSight sensor and its variants have emerged as influential tactile technologies, providing detailed information about contact surfaces by transforming tactile data into visual images. However, vision-based tactile sensing transfers poorly between sensors: even minor variations in optical design or manufacturing produce substantial discrepancies in sensor output, so machine learning models trained on one sensor perform poorly when applied to others.

Computer vision models have been widely applied to vision-based tactile images because of their inherently visual nature. Researchers have adapted representation learning methods from the vision community, with contrastive learning being a popular choice for developing tactile and visual-tactile representations for specific tasks. Auto-encoding approaches have also been explored, with some researchers using the Masked Auto-Encoder (MAE) to learn tactile representations. Other methods build general-purpose multimodal representations by combining multiple tactile datasets within LLM frameworks, encoding sensor types as tokens. Despite these efforts, current methods often require large datasets, treat sensor types as fixed categories, and lack the flexibility to generalize to unseen sensors.

Researchers from the University of Illinois Urbana-Champaign proposed Sensor-Invariant Tactile Representations (SITR), a tactile representation that transfers across various vision-based tactile sensors in a zero-shot manner. It is based on the premise that achieving sensor transferability requires learning effective sensor-invariant representations through exposure to diverse sensor variations. SITR rests on three core innovations: easy-to-acquire calibration images that characterize individual sensors for a transformer encoder, supervised contrastive learning that emphasizes the geometric aspects of tactile data across multiple sensors, and a large-scale synthetic dataset containing 1M examples across 100 sensor configurations.
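To make the input structure concrete, below is a minimal PyTorch-style sketch of an encoder that tokenizes a tactile image together with per-sensor calibration images and processes all tokens with a transformer. The module and hyperparameter names (`SITREncoderSketch`, `num_calib`, patch size, depth) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SITREncoderSketch(nn.Module):
    """Illustrative encoder: tactile and calibration images are patch-tokenized
    and jointly encoded by a transformer. Shapes/hyperparameters are assumed."""
    def __init__(self, img_size=224, patch=16, dim=768, depth=12, heads=12, num_calib=4):
        super().__init__()
        self.patchify = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        n_patches = (img_size // patch) ** 2
        # Positional embeddings cover the class token, the tactile-image tokens,
        # and the tokens from each calibration image.
        self.pos = nn.Parameter(torch.zeros(1, 1 + (1 + num_calib) * n_patches, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def tokenize(self, imgs):                              # imgs: (B, 3, H, W)
        return self.patchify(imgs).flatten(2).transpose(1, 2)  # (B, N, dim)

    def forward(self, tactile, calib):
        # tactile: (B, 3, H, W); calib: (B, K, 3, H, W) with K == num_calib,
        # both already background-subtracted. Calibration tokens only need to
        # be computed once per sensor in practice.
        B, K = calib.shape[:2]
        calib_tok = self.tokenize(calib.flatten(0, 1)).reshape(B, -1, self.pos.shape[-1])
        tact_tok = self.tokenize(tactile)
        cls = self.cls_token.expand(B, -1, -1)
        x = torch.cat([cls, tact_tok, calib_tok], dim=1) + self.pos
        x = self.encoder(x)
        # class token for contrastive learning, tactile patch tokens for reconstruction
        return x[:, 0], x[:, 1:1 + tact_tok.shape[1]]
```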

The researchers use the tactile image and a set of calibration images for the sensor as inputs to the network. The sensor background is subtracted from all input images to isolate the pixel-wise color changes. Following the Vision Transformer (ViT), these images are linearly projected into tokens, with calibration images requiring tokenization only once per sensor. Two supervision signals guide training: a pixel-wise normal-map reconstruction loss on the output patch tokens and a contrastive loss on the class token. During pre-training, a lightweight decoder reconstructs the contact surface as a normal map from the encoder's output. Moreover, SITR employs Supervised Contrastive Learning (SCL), which extends traditional contrastive approaches by using label information to define similarity.
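The sketch below shows one way these two supervision signals could be combined, using a standard supervised contrastive loss over class tokens and an MSE reconstruction loss on predicted normal maps. The helper names (`supcon_loss`, `training_losses`, `decoder`), the label convention for positives, and the equal loss weighting are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def supcon_loss(features, labels, temperature=0.07):
    """Supervised contrastive loss over class tokens.
    features: (B, D) embeddings; labels: (B,), where tactile images of the same
    contact geometry captured by different sensors share a label (assumed)."""
    z = F.normalize(features, dim=1)
    sim = z @ z.t() / temperature                          # (B, B) similarity logits
    B = z.shape[0]
    self_mask = torch.eye(B, dtype=torch.bool, device=z.device)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    # softmax denominator over all other samples (exclude self)
    sim = sim.masked_fill(self_mask, float('-inf'))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # average log-probability of positives for each anchor that has positives
    pos_counts = pos_mask.sum(1).clamp(min=1)
    loss = -(log_prob.masked_fill(~pos_mask, 0.0)).sum(1) / pos_counts
    return loss[pos_mask.sum(1) > 0].mean()

def training_losses(patch_tokens, cls_token, normal_map_gt, labels, decoder):
    """Combine the two supervision signals (equal weighting assumed)."""
    pred_normals = decoder(patch_tokens)                   # lightweight decoder -> (B, 3, H, W)
    recon = F.mse_loss(pred_normals, normal_map_gt)        # pixel-wise normal-map reconstruction
    contrast = supcon_loss(cls_token, labels)              # contrastive loss on class token
    return recon + contrast
```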

In object classification tests on the researchers' real-world dataset, SITR outperforms all baseline models when transferred across different sensors. While most models perform well in no-transfer settings, they fail to generalize when tested on distinct sensors, which shows SITR's ability to capture meaningful, sensor-invariant features that remain robust despite changes in the sensor domain. In pose estimation tasks, where the goal is to estimate 3-DoF position changes from initial and final tactile images, SITR reduces the Root Mean Square Error by approximately 50% compared to baselines. Unlike the classification results, ImageNet pre-training only marginally improves pose estimation performance, suggesting that features learned from natural images may not transfer effectively to tactile domains for precise regression tasks.
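The article does not detail the evaluation code, but a plausible cross-sensor protocol under these assumptions is to fit a lightweight head on frozen features from one sensor and evaluate on another with no fine-tuning; the sketch below illustrates this for classification accuracy and 3-DoF pose RMSE. All function and variable names here are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge

def zero_shot_classification(feat_train_A, y_train_A, feat_test_B, y_test_B):
    """Fit a linear classifier on frozen features from sensor A,
    evaluate on sensor B without any fine-tuning (assumed protocol)."""
    clf = LogisticRegression(max_iter=1000).fit(feat_train_A, y_train_A)
    return clf.score(feat_test_B, y_test_B)               # cross-sensor accuracy

def zero_shot_pose_rmse(feat_pairs_A, pose_A, feat_pairs_B, pose_B):
    """3-DoF pose regression: predict position change from concatenated features
    of the initial and final tactile images; report RMSE on the unseen sensor."""
    reg = Ridge().fit(feat_pairs_A, pose_A)                # pose_A: (N, 3) displacements
    err = reg.predict(feat_pairs_B) - pose_B
    return float(np.sqrt(np.mean(err ** 2)))
```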

In this paper, the researchers introduced SITR, a tactile representation framework that transfers across various vision-based tactile sensors in a zero-shot manner. They constructed large-scale, sensor-aligned datasets from synthetic and real-world data and developed a method to train SITR to capture dense, sensor-invariant features. SITR represents a step toward a unified approach to tactile sensing, in which models generalize seamlessly across different sensor types without retraining or fine-tuning. This has the potential to accelerate advances in robotic manipulation and tactile research by removing a key barrier to the adoption of these promising sensor technologies.


Check out the Paper and Code. All credit for this research goes to the researchers of this project.



