Vision-Language Models_Fishai

Trending

Articles related to "Vision-Language Models"

[Paper Walkthrough] OmniDrive-NVIDIA-CVPR 2025

Juejin AI 2025-08-02T09:55:12.000000Z

DHCP: Detecting Hallucinations by Cross-modal Attention Pattern in Large Vision-Language Models

cs.AI updates on arXiv.org 2025-08-01T04:08:31.000000Z

Vision-Language Fusion for Real-Time Autonomous Driving: Goal-Centered Cross-Attention of Camera, HD-Map, & Waypoints

cs.AI updates on arXiv.org 2025-08-01T04:08:28.000000Z

CHECK-MAT: Checking Hand-Written Mathematical Answers for the Russian Unified State Exam

cs.AI updates on arXiv.org 2025-08-01T04:08:26.000000Z

ART: Adaptive Relation Tuning for Generalized Relation Prediction

cs.AI updates on arXiv.org 2025-08-01T04:08:25.000000Z

Augmented Vision-Language Models: A Systematic Review

cs.AI updates on arXiv.org 2025-08-01T04:08:23.000000Z

Application of Vision-Language Model to Pedestrians Behavior and Scene Understanding in Autonomous Driving

cs.AI updates on arXiv.org 2025-07-31T04:48:25.000000Z

StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification

cs.AI updates on arXiv.org 2025-07-31T04:48:24.000000Z

Exploring the Frontier of Vision-Language Models: A Survey of Current Methodologies and Future Directions

cs.AI updates on arXiv.org 2025-07-31T04:48:22.000000Z

Visual Language Models as Zero-Shot Deepfake Detectors

cs.AI updates on arXiv.org 2025-07-31T04:48:08.000000Z

Apple Researchers Introduce FastVLM: Achieving State-of-the-Art Resolution-Latency-Accuracy Trade-off in Vision Language Models

MarkTechPost@AI 2025-07-30T07:21:37.000000Z

Self-Aware Safety Augmentation: Leveraging Internal Semantic Understanding to Enhance Safety in Vision-Language Models

cs.AI updates on arXiv.org 2025-07-30T04:11:58.000000Z

SafeDriveRAG: Towards Safe Autonomous Driving with Knowledge Graph-based Retrieval-Augmented Generation

cs.AI updates on arXiv.org 2025-07-30T04:11:56.000000Z

CopyJudge: Automated Copyright Infringement Identification and Mitigation in Text-to-Image Diffusion Models

cs.AI updates on arXiv.org 2025-07-29T04:22:46.000000Z

Cognitive Chain-of-Thought: Structured Multimodal Reasoning about Social Situations

cs.AI updates on arXiv.org 2025-07-29T04:22:22.000000Z

LRR-Bench: Left, Right or Rotate? Vision-Language models Still Struggle With Spatial Understanding Tasks

cs.AI updates on arXiv.org 2025-07-29T04:22:18.000000Z

Trust the Model: Compact VLMs as In-Context Judges for Image-Text Data Quality

cs.AI updates on arXiv.org 2025-07-29T04:22:17.000000Z

Multi-Agent Interactive Question Generation Framework for Long Document Understanding

cs.AI updates on arXiv.org 2025-07-29T04:22:17.000000Z

Multi-Stage Verification-Centric Framework for Mitigating Hallucination in Multi-Modal RAG

cs.AI updates on arXiv.org 2025-07-29T04:22:16.000000Z

TAPS : Frustratingly Simple Test Time Active Learning for VLMs

cs.AI updates on arXiv.org 2025-07-29T04:22:13.000000Z

Copyright © 2019 FISHAI. All Rights Reserved