MLLMs_Fishai

Trending

Articles related to "MLLMs"

From Generator to Embedder: Harnessing Innate Abilities of Multimodal LLMs via Building Zero-Shot Discriminative Embedding Model

cs.AI updates on arXiv.org 2025-08-05T11:28:49.000000Z

EH-Benchmark Ophthalmic Hallucination Benchmark and Agent-Driven Top-Down Traceable Reasoning Workflow

cs.AI updates on arXiv.org 2025-08-05T11:10:26.000000Z

Multimodal Large Language Models for End-to-End Affective Computing: Benchmarking and Boosting with Generative Knowledge Prompting

cs.AI updates on arXiv.org 2025-08-05T11:10:21.000000Z

FairReason: Balancing Reasoning and Social Bias in MLLMs

cs.AI updates on arXiv.org 2025-08-01T04:08:13.000000Z

HiProbe-VAD: Video Anomaly Detection via Hidden States Probing in Tuning-Free Multimodal LLMs

cs.AI updates on arXiv.org 2025-07-24T05:31:20.000000Z

Pixels, Patterns, but No Poetry: To See The World like Humans

cs.AI updates on arXiv.org 2025-07-24T05:31:03.000000Z

CCL-XCoT: An Efficient Cross-Lingual Knowledge Transfer Method for Mitigating Hallucination Generation

cs.AI updates on arXiv.org 2025-07-22T04:34:45.000000Z

Exposing and Mitigating Calibration Biases and Demographic Unfairness in MLLM Few-Shot In-Context Learning for Medical Image Classification

cs.AI updates on arXiv.org 2025-07-22T04:34:26.000000Z

Automating Steering for Safe Multimodal Large Language Models

cs.AI updates on arXiv.org 2025-07-18T04:13:55.000000Z

Let's Think in Two Steps: Mitigating Agreement Bias in MLLMs with Self-Grounded Verification

cs.AI updates on arXiv.org 2025-07-17T04:14:12.000000Z

Warehouse Spatial Question Answering with LLM Agent

cs.AI updates on arXiv.org 2025-07-16T04:28:56.000000Z

Prompt4Trust: A Reinforcement Learning Prompt Augmentation Framework for Clinically-Aligned Confidence Calibration in Multimodal Large Language Models

cs.AI updates on arXiv.org 2025-07-15T04:24:38.000000Z

PyVision: Agentic Vision with Dynamic Tooling

cs.AI updates on arXiv.org 2025-07-11T04:04:21.000000Z

Investigating Redundancy in Multimodal Large Language Models with Multiple Vision Encoders

cs.AI updates on arXiv.org 2025-07-08T05:54:14.000000Z

Enhancing Sports Strategy with Video Analytics and Data Mining: Assessing the effectiveness of Multimodal LLMs in tennis video analysis

cs.AI updates on arXiv.org 2025-07-08T04:34:01.000000Z

HV-MMBench: Benchmarking MLLMs for Human-Centric Video Understanding

cs.AI updates on arXiv.org 2025-07-08T04:33:50.000000Z

PathCoT: Chain-of-Thought Prompting for Zero-shot Pathology Visual Reasoning

cs.AI updates on arXiv.org 2025-07-03T04:07:17.000000Z

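Chinese Academy of Sciences scientists confirm for the first time that large language models can "understand" things the way humans do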

IT之家 (IT Home) 2025-06-11T01:38:31.000000Z

ICML 2025 Spotlight | Do multimodal large models have a blind spot? The EMMA benchmark takes an in-depth look at multimodal reasoning

机器之心 (Synced) 2025-05-20T06:50:21.000000Z

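GPT-4o falls behind Qwen, and no model earns a passing score! A joint team from UC Berkeley, HKU, and others proposes a new multimodal benchmark for multi-view understanding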

智源社区 (BAAI Community) 2025-05-16T05:03:38.000000Z

Copyright © 2019 FISHAI. All Rights Reserved