MarkTechPost@AI October 5, 2024
XR-Objects: A New Open-Source Augmented Reality Prototype that Transforms Physical Objects into Interactive Digital Portals Using Real-Time Object Segmentation and Multimodal Large Language Models

XR-Objects is a new open-source augmented reality prototype that uses real-time object segmentation and multimodal large language models to turn physical objects into interactive digital portals. It introduces Augmented Object Intelligence, achieving seamless integration of real and virtual content with multiple advantages, and has received positive feedback in a user study.

🌐 XR-Objects achieves seamless integration of real and virtual content, representing a paradigm shift that enables context-appropriate digital interactions, such as extracting digital information from analog objects to make interactions more meaningful.

🎯 The project adopts object-centric interaction, in contrast to the application-centric approach of Google Lens. Interactions are anchored directly to objects in the user's environment and further improved by a world-space UI, avoiding the hassle of navigating apps and manually selecting objects.

📋 The framework behind XR-Objects comprises object detection, localization and anchoring, coupling with a multimodal large language model, and action execution. Object detection uses Google's MediaPipe library to generate 2D bounding boxes; the model is currently trained on the COCO dataset and recognizes around 80 object categories.

📊 Google conducted a user study comparing XR-Objects with Gemini. XR-Objects won clearly on task time and form factor on the HMD, while phone form-factor preferences were split between the chatbot and XR-Objects; users gave it positive feedback overall.

Advancements in Extended Reality (XR) have allowed for the fusion of real-world entities with the virtual world. However, despite the innumerable sensors, plethora of cameras, and computationally expensive computer vision techniques, this integration poses a few critical questions. 1) Does this blend truly capture the essence of real-world objects, or does it merely treat them as a backdrop? 2) If we continue along this path at this velocity, will the result be feasibly accessible to the masses anytime soon? Seen stand-alone, without machine learning interventions, the future of XR looks hazy: A) current endeavors transport surrounding objects into XR, but the integration is superficial and lacks meaningful interaction; B) the technological constraints required to experience even the XR described in (A) dampen mass enthusiasm. When AI and its many fascinating applications, such as real-time unsupervised segmentation and generative content creation, come into the picture, solid ground is set for XR to achieve a future of seamless integration.

A team of researchers at Google recently unveiled XR-Objects, which in their own words makes interacting with the physical world as natural as "right-clicking a digital file to open its context menu, but applied to physical objects." The paper introduces Augmented Object Intelligence (AOI), which employs AI to extract digital information from analog objects, a task previously considered strenuous. AOI represents a paradigm shift toward seamless integration of real and virtual content and gives users the freedom to perform context-appropriate digital interactions. The Google researchers combined AR advances in spatial understanding via SLAM with object detection and segmentation, integrated with a Multimodal Large Language Model (MLLM).

XR-Objects offers object-centric interaction, in contrast to the application-centric approach of Google Lens. Here, interactions are anchored directly to objects within the user's environment, further improved by a world-space UI, which saves users the hassle of navigating through applications and manually selecting objects. To ensure aesthetic appeal and avoid clutter, digital information is presented in semi-transparent bubbles that serve as subtle, minimalist prompts.

The framework that achieves this state of the art in XR is straightforward. The four-fold strategy is: A) object detection; B) localization and anchoring of objects; C) coupling each object with an MLLM; and D) action execution. The Google MediaPipe library, which essentially uses a mobile-optimized CNN, handles the first task and generates the 2D bounding boxes that initiate AR anchoring and localization. This CNN is currently trained on the COCO dataset, which covers around 80 object categories. Depth maps are then used for AR localization, and an object proxy template containing the object's context menu is instantiated. Finally, an MLLM (PaLI) is coupled with each object, with the cropped bounding box from step A serving as the visual prompt. This is what lets the system identify "Superior Dark Soy Sauce" specifically, rather than just an ordinary bottle in your kitchen (see the sketch below).
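To make the pipeline concrete, here is a minimal sketch of steps A and C in Python. The MediaPipe Tasks object-detector API is real, but the model file name, the still-image setup, and the query_mllm helper are illustrative assumptions: the actual prototype runs on-device with ARCore anchoring and PaLI, neither of which is reproduced here.

```python
# Minimal sketch: detect objects in a frame (step A), then crop each
# bounding box and use it as the visual prompt to an MLLM (step C).
import mediapipe as mp
from mediapipe.tasks import python as mp_python
from mediapipe.tasks.python import vision
from PIL import Image

# Step A: a mobile-optimized CNN trained on COCO (~80 categories).
# The .tflite asset name is an assumption; any MediaPipe detector model works.
detector = vision.ObjectDetector.create_from_options(
    vision.ObjectDetectorOptions(
        base_options=mp_python.BaseOptions(
            model_asset_path="efficientdet_lite0.tflite"),
        score_threshold=0.5,
    )
)

frame = mp.Image.create_from_file("camera_frame.jpg")
detections = detector.detect(frame).detections

def query_mllm(image_crop: Image.Image, prompt: str) -> str:
    """Hypothetical stand-in for the PaLI call described in the paper."""
    raise NotImplementedError("wire this to your multimodal LLM endpoint")

# Step C: crop each 2D bounding box so the MLLM sees the exact object.
full_image = Image.open("camera_frame.jpg")
for det in detections:
    box = det.bounding_box  # origin_x, origin_y, width, height (pixels)
    crop = full_image.crop((box.origin_x, box.origin_y,
                            box.origin_x + box.width,
                            box.origin_y + box.height))
    label = det.categories[0].category_name  # coarse COCO class, e.g. "bottle"
    # The crop is what lets the MLLM name the specific product, not the class.
    answer = query_mllm(crop, f"What exactly is this {label}? Be specific.")
```

The design point worth noting is that the coarse COCO label only bootstraps detection and anchoring; the fine-grained identification ("Superior Dark Soy Sauce" versus "bottle") comes entirely from the MLLM reading the cropped pixels.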

Google performed a user study comparing XR-Objects against Gemini, and the results were no surprise given the context above. XR-Objects achieved clear wins in task time and form factor on the HMD; on the phone, form-factor preferences were split between the chatbot and XR-Objects. The HALIE survey results for the chatbot and XR-Objects were similar. Participants also gave appreciative feedback on how helpful and efficient XR-Objects was, along with suggestions to improve its ergonomic feasibility.

This new AOI paradigm is promising and should grow as LLM capabilities accelerate. It will be interesting to see whether its competitor Meta, which has made massive strides in segmentation and LLMs, develops solutions to supersede XR-Objects and take XR to a new zenith.


Check out the Paper and Details. All credit for this research goes to the researchers of this project.
