MarkTechPost@AI · 17 hours ago
ETH and Stanford Researchers Introduce MIRIAD: A 5.8M Pair Dataset to Improve LLM Accuracy in Medical AI

The MIRIAD dataset, developed by researchers at ETH Zurich, Stanford University, and the Mayo Clinic, contains 5.8 million high-quality medical question-answer pairs and aims to improve the accuracy of large language models (LLMs) in the medical domain. The dataset structures medical knowledge through a semi-automated pipeline and grounds it in peer-reviewed literature. Compared with traditional unstructured datasets, MIRIAD significantly improves LLM accuracy on complex medical question-answering tasks and strengthens hallucination detection. In addition, the MIRIAD-Atlas visualization tool lets users explore and interact with the data, laying a foundation for trustworthy AI in healthcare.

💡 The main challenge facing medical LLMs is their tendency to generate inaccurate medical information. To address this, the researchers developed the MIRIAD dataset, which aims to improve LLM reliability and accuracy through structured, retrievable medical knowledge.

🔍 The MIRIAD dataset comprises 5.8 million high-quality medical question-answer pairs, each carefully constructed and grounded in peer-reviewed literature. The construction process includes filtering articles from the S2ORC corpus, generating question-answer pairs with LLMs, and enforcing quality through rule-based filtering and expert review.

📈 When MIRIAD is used for RAG (retrieval-augmented generation), LLM accuracy on medical tasks improves by up to 6.7%, and hallucination detection improves by 22.5% to 37%. MIRIAD-Atlas, a companion visualization tool, offers interactive exploration across 56 medical fields, further strengthening the reliability of medical AI.

Challenges of LLMs in Medical Decision-Making: Addressing Hallucinations via Knowledge Retrieval

LLMs are set to revolutionize healthcare through intelligent decision support and adaptable chat-based assistants. However, a major challenge is their tendency to produce factually incorrect medical information. To address this, a common solution is RAG, where external medical knowledge is broken into smaller text pieces that LLMs can retrieve and use during generation. While promising, current RAG methods depend on unstructured medical content that is often noisy, unfiltered, and difficult for LLMs to interpret effectively. There is a clear need for better organization and presentation of medical knowledge to ensure LLMs can use it more reliably and accurately.
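The retrieve-then-generate loop described above can be sketched in a few lines. This is an illustrative toy, not MIRIAD's actual pipeline: scoring here is plain token overlap, whereas production RAG systems use dense embeddings and vector indexes.

```python
# Minimal RAG sketch: split external medical text into chunks,
# retrieve the chunks most relevant to the query, and prepend
# them to the prompt the LLM sees at generation time.

def chunk(text: str, max_words: int = 30) -> list[str]:
    """Split a document into fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

def score(query: str, passage: str) -> float:
    """Fraction of query tokens that also appear in the passage."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / max(len(q), 1)

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the top-k chunks across the whole corpus."""
    chunks = [c for doc in corpus for c in chunk(doc)]
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

corpus = [
    "Metformin is a first-line therapy for type 2 diabetes.",
    "Statins lower LDL cholesterol and reduce cardiovascular risk.",
]
context = retrieve("first-line therapy for type 2 diabetes", corpus)
prompt = "Context:\n" + "\n".join(context) + "\nQuestion: ..."
```

The quality of the final answer depends directly on what the retriever surfaces, which is why the structure and cleanliness of the underlying corpus matter so much.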

Limitations of Current RAG Approaches in Healthcare AI

Though LLMs perform impressively across general language tasks, they often fall short in domains requiring up-to-date and precise knowledge, such as medicine. RAG offers a cost-effective alternative to expensive fine-tuning by grounding models in external literature. Yet, many current RAG systems rely on general-purpose text embeddings and standard vector databases, which are not optimized for medical content. Unlike in general domains, the medical field lacks large, high-quality datasets pairing medical questions with relevant answers. Existing datasets, such as PubMedQA or MedQA, are either too small, overly structured (e.g., multiple-choice), or lack the kind of open-ended, real-world responses needed to build strong medical retrieval systems.

MIRIAD Dataset: Structuring Medical QA with Peer-Reviewed Grounding

Researchers from ETH Zurich, Stanford, the Mayo Clinic, and other institutions have developed MIRIAD, a large-scale dataset comprising over 5.8 million high-quality medical instruction-response pairs. Each pair is carefully rephrased and grounded in peer-reviewed literature through a semi-automated process involving LLMs, filters, and expert review. Unlike prior unstructured datasets, MIRIAD offers structured, retrievable medical knowledge, boosting LLM accuracy on complex medical QA tasks by up to 6.7% and improving hallucination detection by 22.5–37%. They also launched MIRIAD-Atlas, a visual tool encompassing 56 medical fields, which enables users to explore and interact with this rich resource, thereby enhancing trustworthy AI in healthcare.
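A "structured, retrievable" pair in this sense carries its grounding with it. The sketch below shows one plausible record schema; the field names and example values are assumptions for illustration, not MIRIAD's actual schema.

```python
# Illustrative record for a MIRIAD-style structured QA pair.
# Each pair keeps a pointer to the peer-reviewed passage it was
# generated from, so retrieved answers stay traceable to sources.

from dataclasses import dataclass

@dataclass(frozen=True)
class QAPair:
    question: str
    answer: str
    paper_id: str    # identifier of the source article (hypothetical format)
    passage: str     # the peer-reviewed text the pair is grounded in
    specialty: str   # one of the medical fields surfaced in MIRIAD-Atlas

pair = QAPair(
    question="What is the first-line pharmacologic therapy for type 2 diabetes?",
    answer="Metformin, unless contraindicated.",
    paper_id="example-article-001",
    passage="Metformin remains the recommended initial agent ...",
    specialty="Endocrinology",
)
```

Keeping question, answer, and source passage in one record is what lets a retriever return self-contained, citable units instead of arbitrary text fragments.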

Data Pipeline: Filtering and Structuring Medical Literature Using LLMs and Classifiers

To build MIRIAD, researchers filtered 894,000 medical articles from the S2ORC corpus and broke them into clean, sentence-based passages, excluding overly long or noisy content. They used LLMs with structured prompts to generate over 10 million question-answer pairs, later refining this to 5.8 million through rule-based filtering. A custom-trained classifier, based on GPT-4 labels, helped further narrow it down to 4.4 million high-quality pairs. Human medical experts also validated a sample for accuracy, relevance, and grounding. Finally, they created MIRIAD-Atlas, an interactive 2D map of the dataset, using embedding and dimensionality reduction to cluster related content by topic and discipline.
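The flavor of the rule-based filtering step can be sketched as follows. The specific rules and thresholds below are assumptions for illustration; the paper's pipeline also used a GPT-4-label-trained classifier and expert review on top of rules like these.

```python
# Sketch of rule-based filters for pruning LLM-generated QA pairs:
# drop answers that are too short or too long, deduplicate repeated
# questions, and reject pairs whose answer shares too little
# vocabulary with the source passage to count as grounded.

def keep(question: str, answer: str, passage: str,
         min_words: int = 5, max_words: int = 200,
         min_overlap: float = 0.3) -> bool:
    n = len(answer.split())
    if not (min_words <= n <= max_words):
        return False
    a = set(answer.lower().split())
    p = set(passage.lower().split())
    return len(a & p) / max(len(a), 1) >= min_overlap

def filter_pairs(pairs: list[tuple[str, str, str]]) -> list[tuple[str, str, str]]:
    seen, kept = set(), []
    for q, a, passage in pairs:
        key = q.strip().lower()
        if key in seen:          # deduplicate on normalized question text
            continue
        seen.add(key)
        if keep(q, a, passage):
            kept.append((q, a, passage))
    return kept
```

Cheap rules like these remove the bulk of low-quality generations before the more expensive classifier and human-review stages run.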

Performance Gains: Enhancing QA Accuracy and Hallucination Detection Using MIRIAD

The MIRIAD dataset significantly improves the performance of large language models on medical tasks. When used in RAG, models achieved up to 6.7% higher accuracy compared to using unstructured data, even with the same amount of retrieved content. MIRIAD also enhanced the ability of models to detect medical hallucinations, with F1 score improvements ranging from 22.5% to 37%. Additionally, training retriever models on MIRIAD resulted in improved retrieval quality. The dataset’s structure, grounded in verified literature, enables more precise and reliable access to information, supporting a wide range of downstream medical applications.
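Hallucination detection here is a binary classification task (1 = the model's statement is unsupported), and the 22.5-37% gains are measured in F1, which balances precision and recall. A toy computation (the labels below are made up purely to show the metric):

```python
# F1 score for a binary hallucination detector, from scratch.

def f1(y_true: list[int], y_pred: list[int]) -> float:
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

labels   = [1, 1, 0, 0, 1, 0]   # ground truth: which statements are hallucinated
no_rag   = [0, 1, 1, 0, 0, 0]   # detector without retrieved context
with_rag = [1, 1, 0, 0, 1, 1]   # detector grounded in retrieved passages

gain = f1(labels, with_rag) - f1(labels, no_rag)
```

F1 is the right summary here because hallucinations are rare relative to correct statements, so raw accuracy would reward a detector that never flags anything.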

Conclusion: MIRIAD and MIRIAD-Atlas as a Foundation for Trustworthy Medical AI

In conclusion, MIRIAD is a large, structured dataset comprising 5.8 million medical question-answer pairs, grounded in peer-reviewed literature, and built to support a range of medical AI applications. It includes an interactive atlas for easy exploration and incorporates rigorous quality control through automated filters, LLM assessments, and expert reviews. Unlike previous unstructured corpora, MIRIAD improves retrieval accuracy in medical question answering and can help identify hallucinations in language models. While not yet exhaustive, it lays a strong foundation for future datasets. Continued improvements could enable more accurate, user-involved retrieval and better integration with clinical tools and medical AI systems.


Check out the Paper, GitHub Page and Dataset on Hugging Face. All credit for this research goes to the researchers of this project.


