Visual Question Answering_Fishai

Articles related to "Visual Question Answering"

Enhancing Spatial Reasoning through Visual and Textual Thinking

cs.AI updates on arXiv.org 2025-07-29T04:22:25.000000Z

Ask and Remember: A Questions-Only Replay Strategy for Continual Visual Question Answering

cs.AI updates on arXiv.org 2025-07-29T04:21:41.000000Z

Multimodal AI for Gastrointestinal Diagnostics: Tackling VQA in MEDVQA-GI 2025

cs.AI updates on arXiv.org 2025-07-22T04:44:34.000000Z

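Multimodal models learn to "search on demand": 30% fewer searches and higher accuracy! New research from ByteDance & NTU optimizes the search strategy of multimodal models

BAAI Community (智源社区) 2025-07-09T11:53:50.000000Z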

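Evaluating the factual correctness of large multimodal models: o1 is the strongest, models are generally overconfident, and they perform best on modern architecture, engineering technology, and science

BAAI Community (智源社区) 2025-02-24T07:37:16.000000Z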

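Evaluating the factual correctness of large multimodal models: o1 is the strongest, models are generally overconfident, and they perform best on modern architecture, engineering technology, and science

QbitAI (量子位) 2025-02-24T01:13:50.000000Z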

Advancing Large Multimodal Models: DocHaystack, InfoHaystack, and the Vision-Centric Retrieval-Augmented Generation Framework

MarkTechPost@AI 2024-12-07T01:34:51.000000Z

Peking University, Tsinghua University, and others propose LLaVA-o1: the o1 of vision-language models has arrived!

PaperWeekly 2024-11-23T11:41:42.000000Z

Fine-tune multimodal models for vision and text use cases on Amazon SageMaker JumpStart

AWS Machine Learning Blog 2024-11-15T17:03:17.000000Z

Generalized Visual Language Models

Lil'Log 2024-11-09T05:43:41.000000Z

ChatGPT self-study guide: a roundup of invaluable reference books

BAAI Community (智源社区) 2024-07-30T07:07:01.000000Z

Visual Haystacks Benchmark: The First “Visual-Centric” Needle-In-A-Haystack (NIAH) Benchmark to Assess LMMs’ Capability in Long-Context Visual Retrieval and Reasoning

MarkTechPost@AI 2024-07-24T07:19:20.000000Z

Google DeepMind Unveils PaliGemma: A Versatile 3B Vision-Language Model VLM with Large-Scale Ambitions

MarkTechPost@AI 2024-07-12T11:16:31.000000Z

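Large multimodal models can understand an image and still answer incorrectly: BAAI and several partner institutions release a robustness benchmark for multimodal models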

PaperAgent 2024-07-04T14:06:28.000000Z

Robust Visual Reasoning with Adriana Kovashka - #463

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence) 2024-05-12T03:02:26.000000Z

Copyright © 2019 FISHAI. All Rights Reserved