MLLM_Fishai

Articles related to "MLLM"

The Effect of Compression Techniques on Large Multimodal Language Models in the Medical Domain

cs.AI updates on arXiv.org 2025-07-30T04:12:05.000000Z

A Versatile Pathology Co-pilot via Reasoning Enhanced Multimodal Large Language Model

cs.AI updates on arXiv.org 2025-07-24T05:31:18.000000Z

Constructing Ophthalmic MLLM for Positioning-diagnosis Collaboration Through Clinical Cognitive Chain Reasoning

cs.AI updates on arXiv.org 2025-07-24T05:30:58.000000Z

Self-Contradiction as Self-Improvement: Mitigating the Generation-Understanding Gap in MLLMs

cs.AI updates on arXiv.org 2025-07-23T04:03:30.000000Z

On Pre-training of Multimodal Language Models Customized for Chart Understanding

cs.AI updates on arXiv.org 2025-07-21T04:06:48.000000Z

ICCV 2025 | One image is all you need: multimodal instruction data synthesis, where you just supply the image and leave the rest to Oasis

机器之心 2025-07-18T07:52:44.000000Z

LaSM: Layer-wise Scaling Mechanism for Defending Pop-up Attack on GUI Agents

cs.AI updates on arXiv.org 2025-07-16T04:28:49.000000Z

A Training-Free, Task-Agnostic Framework for Enhancing MLLM Performance on High-Resolution Images

cs.AI updates on arXiv.org 2025-07-15T04:24:36.000000Z

Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI

cs.AI updates on arXiv.org 2025-07-15T04:24:25.000000Z

Improving MLLM's Document Image Machine Translation via Synchronously Self-reviewing Its OCR Proficiency

cs.AI updates on arXiv.org 2025-07-14T04:08:33.000000Z

DLaVA: Document Language and Vision Assistant for Answer Localization with Enhanced Interpretability and Trustworthiness

cs.AI updates on arXiv.org 2025-07-11T04:04:27.000000Z

Robust Multimodal Large Language Models Against Modality Conflict

cs.AI updates on arXiv.org 2025-07-11T04:04:02.000000Z

Iterative Zoom-In: Temporal Interval Exploration for Long Video Understanding

cs.AI updates on arXiv.org 2025-07-08T05:53:47.000000Z

Multimodal LLM Integrated Semantic Communications for 6G Immersive Experiences

cs.AI updates on arXiv.org 2025-07-08T04:33:59.000000Z

ICML 2025 | Parrot: Getting AI to speak multiple languages natively through multilingual visual instruction tuning

阿里技术 2025-06-23T05:06:56.000000Z

Multimodal chain-of-thought with just 20 samples! UCSC's major open-source release: draw boxes while thinking

新智元 2025-06-18T13:18:15.000000Z

Community contribution | StepFun (阶跃星辰) open-sources the image editing model Step1X-Edit: an "image editing master" that anyone can use!

Hugging Face 2025-06-12T02:32:47.000000Z

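ACL 2025 | Multi-dimensional grading, awakened intellect: opening the cognitive door to evaluating picture-based writing by multimodal large language models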

PaperWeekly 2025-06-11T09:17:56.000000Z

CVPR 2025: With a 73% human agreement rate, Video-Bench achieves precise video quality…

36氪 - Technology Channel 2025-06-03T11:44:12.000000Z
