MarkTechPost@AI, July 9, 2024
VCHAR: A Novel AI Framework that Treats the Outputs of Atomic Activities as a Distribution Over Specified Intervals

VCHAR is a new AI framework designed to tackle Complex Human Activity Recognition (CHAR) in smart environments. It takes a generative approach that treats the outputs of atomic activities as distributions over specified intervals, removing the need for precise labeling. The framework uses generative methods to deliver intelligible, video-based explanations of complex activity classifications, so even users without machine learning expertise can understand them.

🤔 VCHAR adopts a variance-driven strategy that uses the Kullback-Leibler divergence to approximate the distribution of atomic activity outputs within specified time intervals. This approach makes it possible to identify decisive atomic activities without filtering out transient states or irrelevant data, improving complex-activity detection even when detailed atomic-activity labels are unavailable.

👀 VCHAR introduces a novel generative decoder framework that converts sensor-based model outputs into an integrated visual-domain representation, including visualizations of complex and atomic activities along with the relevant sensor information. The framework uses a language model (LM) agent to organize diverse data sources and a vision-language model (VLM) to generate comprehensive visual outputs. The authors also propose a pretrained "sensor-based foundation model" and a "one-shot tuning strategy" with masked guidance to enable rapid adaptation to specific scenarios.

📈 Experimental results on three publicly available datasets show that VCHAR performs on par with traditional methods while significantly improving the interpretability and usability of CHAR systems. Integrating a language model (LM) with a vision-language model (VLM) makes it possible to synthesize comprehensive, coherent visual narratives of the detected activities and sensor information. This not only builds understanding of and trust in the system's outputs but also makes findings easier to communicate to stakeholders without a technical background.

🚀 The VCHAR framework effectively addresses the challenges of CHAR by eliminating the need for precise labeling and providing easy-to-understand visual representations of complex activities. This approach improves recognition accuracy and makes the insights accessible to non-experts, bridging the gap between raw sensor data and actionable information. Its adaptability, achieved through pre-training and one-shot tuning, makes it a promising solution for real-world smart-environment applications that require accurate, contextually relevant activity recognition and description.

Complex Human Activity Recognition (CHAR) in ubiquitous computing, particularly in smart environments, presents significant challenges due to the labor-intensive and error-prone process of labeling datasets with precise temporal information of atomic activities. This task becomes impractical in real-world scenarios where accurate and detailed labeling is scarce. The need for effective CHAR methods that do not rely on meticulous labeling is crucial for advancing applications in healthcare, elderly care, surveillance, and emergency response.

Traditional CHAR methods typically require detailed labeling of atomic activities within specific time intervals to train models effectively. These methods often involve segmenting data to improve accuracy, which is labor-intensive and prone to inaccuracies. In practice, many datasets only indicate the types of activities occurring within specific collection intervals without precise temporal or sequential labeling, leading to combinatorial complexity and potential errors in labeling.
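To make this labeling gap concrete, the hypothetical snippet below contrasts the dense, precisely timed annotations that traditional CHAR training expects with the interval-level labels most real datasets actually provide. The activity names and field names are illustrative, not drawn from any specific dataset.

```python
# Dense (precise) labeling -- every atomic activity gets an exact time span.
# This is what traditional CHAR methods assume, and what is costly and
# error-prone to produce:
dense_labels = [
    {"activity": "open_fridge",  "start_s": 0.0, "end_s": 2.5},
    {"activity": "pour_water",   "start_s": 2.5, "end_s": 6.0},
    {"activity": "close_fridge", "start_s": 6.0, "end_s": 7.5},
]

# Interval-level (weak) labeling -- what many real datasets actually offer:
# only which activities occurred somewhere within a collection window,
# with no temporal ordering or boundaries.
weak_label = {
    "interval_s": (0.0, 7.5),
    "atomic_activities": {"open_fridge", "pour_water", "close_fridge"},
    "complex_activity": "prepare_drink",
}
```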

To address these issues, a team of researchers from Rutgers University proposes the Variance-Driven Complex Human Activity Recognition (VCHAR) framework. VCHAR takes a generative approach that treats atomic activity outputs as distributions over specified intervals, eliminating the need for precise labeling. It also uses generative methods to provide intelligible, video-based explanations of complex activity classifications, making the system accessible to users without prior machine learning expertise.

The VCHAR framework employs a variance-driven approach that utilizes the Kullback-Leibler divergence to approximate the distribution of atomic activity outputs within specific time intervals. This method allows for the recognition of decisive atomic activities without the need to eliminate transient states or irrelevant data. By doing so, VCHAR enhances the detection rates of complex activities even when detailed labeling of atomic activities is absent.
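As one way to read this, the sketch below implements an interval-level KL objective in the spirit described above: per-timestep atomic-activity probabilities are averaged into a single distribution over the interval and compared against a label-implied target distribution. The function name, tensor shapes, and target construction are our assumptions, not the authors' code, and the paper's variance-driven weighting is not reproduced here.

```python
import torch

def interval_kl_loss(frame_logits: torch.Tensor,
                     target_dist: torch.Tensor,
                     eps: float = 1e-8) -> torch.Tensor:
    """Interval-level KL objective (a minimal sketch, not the authors' code).

    frame_logits: (T, C) per-timestep logits from an atomic-activity head.
    target_dist:  (C,) label-implied distribution over atomic classes for
                  the whole interval.
    """
    # Average per-timestep class probabilities into one empirical
    # distribution over the interval.
    interval_dist = torch.softmax(frame_logits, dim=-1).mean(dim=0)  # (C,)

    # KL(target || predicted): penalizes the aggregated outputs for drifting
    # from the label-implied distribution, with no per-activity timestamps
    # required; eps guards the logarithms against zero probabilities.
    return torch.sum(target_dist *
                     (torch.log(target_dist + eps) - torch.log(interval_dist + eps)))
```

Here `target_dist` could, for instance, put uniform mass on the atomic activities named in a weak interval label and zero elsewhere, though the paper's exact construction may differ.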

Furthermore, VCHAR introduces a novel generative decoder framework that transforms sensor-based model outputs into integrated visual domain representations. This includes visualizations of complex and atomic activities along with relevant sensor information. The framework uses a Language Model (LM) agent to organize diverse data sources and a Vision-Language Model (VLM) to generate comprehensive visual outputs. The authors also propose a pretrained “sensor-based foundation model” and a “one-shot tuning strategy” with masked guidance to facilitate rapid adaptation to specific scenarios. Experimental results on three publicly available datasets show that VCHAR performs competitively with traditional methods while significantly enhancing the interpretability and usability of CHAR systems.
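The hypothetical sketch below shows how such a decoder pipeline might be wired together. Every class and method name here is ours, chosen to mirror the components the paper describes (sensor-based foundation model, LM agent, VLM), not the actual VCHAR API.

```python
class VCHARDecoderPipeline:
    """Illustrative wiring of the generative decoder described above."""

    def __init__(self, sensor_model, lm_agent, vlm_generator):
        self.sensor_model = sensor_model  # pretrained sensor-based foundation model
        self.lm_agent = lm_agent          # LM agent that organizes diverse data sources
        self.vlm = vlm_generator          # vision-language model producing visual output

    def explain(self, sensor_window):
        # 1. Predict atomic-activity distributions and the complex activity
        #    from the raw sensor window.
        atomic_dist, complex_label = self.sensor_model(sensor_window)

        # 2. The LM agent turns heterogeneous inputs (predictions plus
        #    sensor metadata) into a structured generation prompt.
        prompt = self.lm_agent.compose(
            complex_activity=complex_label,
            atomic_distribution=atomic_dist,
            sensor_info=sensor_window.metadata,
        )

        # 3. The VLM renders a video-based explanation that a non-expert
        #    can inspect.
        return self.vlm.generate(prompt)
```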

The integration of a Language Model (LM) and a Vision-Language Model (VLM) allows for the synthesis of comprehensive, coherent visual narratives that represent the detected activities and sensor information. This capability not only aids in better understanding and trust in the system’s outputs but also enhances the ability to communicate findings to stakeholders who may not have a technical background.

The VCHAR framework effectively addresses the challenges of CHAR by eliminating the need for precise labeling and providing intelligible visual representations of complex activities. This innovative approach improves the accuracy of activity recognition and makes the insights accessible to non-experts, bridging the gap between raw sensor data and actionable information. The framework’s adaptability, achieved through pre-training and one-shot tuning, makes it a promising solution for real-world smart environment applications that require accurate and contextually relevant activity recognition and description.


Check out the Paper. All credit for this research goes to the researchers of this project.
