Articles related to "Vision-Language Models"
[Paper Read-Through] OmniDrive-NVIDIA-CVPR 2025
Juejin (Artificial Intelligence)
2025-08-02T09:55:12.000000Z
DHCP: Detecting Hallucinations by Cross-modal Attention Pattern in Large Vision-Language Models
cs.AI updates on arXiv.org
2025-08-01T04:08:31.000000Z
Vision-Language Fusion for Real-Time Autonomous Driving: Goal-Centered Cross-Attention of Camera, HD-Map, & Waypoints
cs.AI updates on arXiv.org
2025-08-01T04:08:28.000000Z
CHECK-MAT: Checking Hand-Written Mathematical Answers for the Russian Unified State Exam
cs.AI updates on arXiv.org
2025-08-01T04:08:26.000000Z
ART: Adaptive Relation Tuning for Generalized Relation Prediction
cs.AI updates on arXiv.org
2025-08-01T04:08:25.000000Z
Augmented Vision-Language Models: A Systematic Review
cs.AI updates on arXiv.org
2025-08-01T04:08:23.000000Z
Application of Vision-Language Model to Pedestrians Behavior and Scene Understanding in Autonomous Driving
cs.AI updates on arXiv.org
2025-07-31T04:48:25.000000Z
StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification
cs.AI updates on arXiv.org
2025-07-31T04:48:24.000000Z
Exploring the Frontier of Vision-Language Models: A Survey of Current Methodologies and Future Directions
cs.AI updates on arXiv.org
2025-07-31T04:48:22.000000Z
Visual Language Models as Zero-Shot Deepfake Detectors
cs.AI updates on arXiv.org
2025-07-31T04:48:08.000000Z
Apple Researchers Introduce FastVLM: Achieving State-of-the-Art Resolution-Latency-Accuracy Trade-off in Vision Language Models
MarkTechPost@AI
2025-07-30T07:21:37.000000Z
Self-Aware Safety Augmentation: Leveraging Internal Semantic Understanding to Enhance Safety in Vision-Language Models
cs.AI updates on arXiv.org
2025-07-30T04:11:58.000000Z
SafeDriveRAG: Towards Safe Autonomous Driving with Knowledge Graph-based Retrieval-Augmented Generation
cs.AI updates on arXiv.org
2025-07-30T04:11:56.000000Z
CopyJudge: Automated Copyright Infringement Identification and Mitigation in Text-to-Image Diffusion Models
cs.AI updates on arXiv.org
2025-07-29T04:22:46.000000Z
Cognitive Chain-of-Thought: Structured Multimodal Reasoning about Social Situations
cs.AI updates on arXiv.org
2025-07-29T04:22:22.000000Z
LRR-Bench: Left, Right or Rotate? Vision-Language models Still Struggle With Spatial Understanding Tasks
cs.AI updates on arXiv.org
2025-07-29T04:22:18.000000Z
Trust the Model: Compact VLMs as In-Context Judges for Image-Text Data Quality
cs.AI updates on arXiv.org
2025-07-29T04:22:17.000000Z
Multi-Agent Interactive Question Generation Framework for Long Document Understanding
cs.AI updates on arXiv.org
2025-07-29T04:22:17.000000Z
Multi-Stage Verification-Centric Framework for Mitigating Hallucination in Multi-Modal RAG
cs.AI updates on arXiv.org
2025-07-29T04:22:16.000000Z
TAPS: Frustratingly Simple Test Time Active Learning for VLMs
cs.AI updates on arXiv.org
2025-07-29T04:22:13.000000Z