Seeing Beyond Frames: Zero-Shot Pedestrian Intention Prediction with Raw Temporal Video and Multimodal Cues

cs.AI updates on arXiv.org, July 30, 12:46

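This article introduces BF-PIP, a zero-shot pedestrian intention prediction method built on Gemini 2.5 Pro. It infers pedestrian crossing intentions directly from short continuous video clips and structured JAAD metadata, with no additional training. It reaches 73% prediction accuracy, outperforming a GPT-4V baseline by 18%, and suggests a new direction for agile perception modules in intelligent transportation systems.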

arXiv:2507.21161v1 Announce Type: cross Abstract: Pedestrian intention prediction is essential for autonomous driving in complex urban environments. Conventional approaches depend on supervised learning over frame sequences and require extensive retraining to adapt to new scenarios. Here, we introduce BF-PIP (Beyond Frames Pedestrian Intention Prediction), a zero-shot approach built upon Gemini 2.5 Pro. It infers crossing intentions directly from short, continuous video clips enriched with structured JAAD metadata. In contrast to GPT-4V based methods that operate on discrete frames, BF-PIP processes uninterrupted temporal clips. It also incorporates bounding-box annotations and ego-vehicle speed via specialized multimodal prompts. Without any additional training, BF-PIP achieves 73% prediction accuracy, outperforming a GPT-4V baseline by 18%. These findings illustrate that combining temporal video inputs with contextual cues enhances spatiotemporal perception and improves intent inference under ambiguous conditions. This approach paves the way for agile, retraining-free perception modules in intelligent transportation systems.
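To make the idea of pairing a video clip with structured cues more concrete, here is a minimal Python sketch of how bounding-box annotations and ego-vehicle speed might be folded into a text prompt, and how free-text answers might be scored. It is not the authors' prompt or pipeline: the `FrameAnnotation` fields, the prompt wording, the answer vocabulary, and the helper names are all hypothetical, and the call to the multimodal model itself is deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class FrameAnnotation:
    """Hypothetical per-frame record; not the actual JAAD schema."""
    frame_index: int
    bbox: Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels
    ego_speed: str                   # free-form speed description (assumed)


def build_prompt(annotations: Sequence[FrameAnnotation], clip_seconds: float) -> str:
    """Assemble the text part of a multimodal prompt (wording is illustrative)."""
    lines = [
        f"You are given a continuous driving video clip ({clip_seconds:.1f} s) "
        "recorded from the ego vehicle.",
        "Per-frame annotations for the target pedestrian:",
    ]
    for a in annotations:
        x1, y1, x2, y2 = a.bbox
        lines.append(
            f"- frame {a.frame_index}: bbox=({x1}, {y1}, {x2}, {y2}), "
            f"ego_speed={a.ego_speed}"
        )
    lines.append(
        "Will the pedestrian cross in front of the vehicle? "
        'Answer with exactly "crossing" or "not crossing".'
    )
    return "\n".join(lines)


def parse_answer(text: str) -> Optional[bool]:
    """Map a free-text reply to True (crossing), False (not crossing), or None."""
    normalized = " ".join(text.lower().split())
    # Check the negative form first, because it contains the positive one.
    if "not crossing" in normalized or "not-crossing" in normalized:
        return False
    if "crossing" in normalized:
        return True
    return None


def accuracy(predictions: Sequence[Optional[bool]], labels: Sequence[bool]) -> float:
    """Fraction of correct predictions; unparsable replies (None) count as wrong."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal length")
    correct = sum(1 for p, y in zip(predictions, labels) if p is not None and p == y)
    return correct / len(labels)


if __name__ == "__main__":
    demo: List[FrameAnnotation] = [
        FrameAnnotation(0, (410, 220, 455, 330), "decelerating"),
        FrameAnnotation(15, (402, 218, 450, 334), "stopped"),
    ]
    print(build_prompt(demo, clip_seconds=0.5))
```

In a zero-shot setting the resulting string would be sent together with the raw clip to the model, and the reply passed through `parse_answer`. Counting unparsable replies as errors is one conservative choice among several; the abstract does not say how such cases were handled.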


Related tags

Pedestrian intention prediction, zero-shot learning, intelligent transportation systems, video analysis, Gemini 2.5 Pro
