"MLLM" related articles
The Effect of Compression Techniques on Large Multimodal Language Models in the Medical Domain
cs.AI updates on arXiv.org
2025-07-30T04:12:05.000000Z
A Versatile Pathology Co-pilot via Reasoning Enhanced Multimodal Large Language Model
cs.AI updates on arXiv.org
2025-07-24T05:31:18.000000Z
Constructing Ophthalmic MLLM for Positioning-diagnosis Collaboration Through Clinical Cognitive Chain Reasoning
cs.AI updates on arXiv.org
2025-07-24T05:30:58.000000Z
Self-Contradiction as Self-Improvement: Mitigating the Generation-Understanding Gap in MLLMs
cs.AI updates on arXiv.org
2025-07-23T04:03:30.000000Z
On Pre-training of Multimodal Language Models Customized for Chart Understanding
cs.AI updates on arXiv.org
2025-07-21T04:06:48.000000Z
ICCV 2025 | One image is all you need: multimodal instruction data synthesis. Just supply the image and leave the rest to Oasis
机器之心
2025-07-18T07:52:44.000000Z
LaSM: Layer-wise Scaling Mechanism for Defending Pop-up Attack on GUI Agents
cs.AI updates on arXiv.org
2025-07-16T04:28:49.000000Z
A Training-Free, Task-Agnostic Framework for Enhancing MLLM Performance on High-Resolution Images
cs.AI updates on arXiv.org
2025-07-15T04:24:36.000000Z
Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI
cs.AI updates on arXiv.org
2025-07-15T04:24:25.000000Z
Improving MLLM's Document Image Machine Translation via Synchronously Self-reviewing Its OCR Proficiency
cs.AI updates on arXiv.org
2025-07-14T04:08:33.000000Z
DLaVA: Document Language and Vision Assistant for Answer Localization with Enhanced Interpretability and Trustworthiness
cs.AI updates on arXiv.org
2025-07-11T04:04:27.000000Z
Robust Multimodal Large Language Models Against Modality Conflict
cs.AI updates on arXiv.org
2025-07-11T04:04:02.000000Z
Iterative Zoom-In: Temporal Interval Exploration for Long Video Understanding
cs.AI updates on arXiv.org
2025-07-08T05:53:47.000000Z
Multimodal LLM Integrated Semantic Communications for 6G Immersive Experiences
cs.AI updates on arXiv.org
2025-07-08T04:33:59.000000Z
ICML 2025 | Parrot: Teaching AI to speak multiple languages idiomatically through multilingual visual instruction tuning
阿里技术
2025-06-23T05:06:56.000000Z
ICML 2025 | Parrot: Teaching AI to speak multiple languages idiomatically through multilingual visual instruction tuning
阿里技术
2025-06-23T04:51:26.000000Z
Multimodal chain-of-thought with just 20 samples! UCSC's major open-source release: thinking while drawing bounding boxes
新智元
2025-06-18T13:18:15.000000Z
Community contribution | StepFun open-sources the image editing model Step1X-Edit: an “image editing master” that anyone can use!
Hugging Face
2025-06-12T02:32:47.000000Z
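ACL 2025 | Multi-dimensional grading, awakened intelligence: opening the cognitive door to evaluating multimodal large models on picture-based writing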
PaperWeekly
2025-06-11T09:17:56.000000Z
CVPR 2025: 73% human agreement rate, Video-Bench achieves precise video quality…
36氪 - Technology Channel
2025-06-03T11:44:12.000000Z