MarkTechPost@AI · April 12
Google AI Introduce the Articulate Medical Intelligence Explorer (AMIE): A Large Language Model Optimized for Diagnostic Reasoning, and Evaluate its Ability to Generate a Differential Diagnosis

AMIE, developed by Google, is a large language model designed for clinical diagnostic reasoning that supports differential diagnosis (DDx) through natural-language interaction. The study shows that AMIE excels at generating DDx, outperforming unassisted clinicians working independently, and that clinicians assisted by AMIE produced diagnoses that were significantly more accurate and comprehensive. AMIE's strengths lie in its interactive interface and its ability to augment clinicians' reasoning, showing particular promise on complex cases. Despite its limitations, AMIE opens new possibilities for medical diagnosis and underscores the importance of carefully integrating LLMs into clinical practice.

🧠 At its core, AMIE is a large language model optimized for clinical diagnostic reasoning, designed to assist physicians with differential diagnosis (DDx) through natural-language interaction.

✅ In a study involving 20 clinicians and 302 complex real-world cases, AMIE's standalone performance exceeded that of unassisted clinicians, and clinicians assisted by AMIE produced DDx lists that were significantly more accurate and comprehensive.

📈 AMIE performed strongly at generating DDx, with its lists rated highly for quality, appropriateness, and comprehensiveness. AMIE's DDx included the correct diagnosis in 54% of cases, its top-10 accuracy was 59%, and it ranked the correct diagnosis first in 29% of cases.

🗣️ AMIE's conversational interface proved intuitive and efficient, and clinicians reported increased confidence in their DDx lists after using it. Although AMIE cannot yet access images and tabular data, its potential for educational support and diagnostic assistance, especially in complex or resource-limited settings, is promising.

Developing an accurate differential diagnosis (DDx) is a fundamental part of medical care, typically achieved through a step-by-step process that integrates patient history, physical exams, and diagnostic tests. With the rise of LLMs, there’s growing potential to support and automate parts of this diagnostic journey using interactive, AI-powered tools. Unlike traditional AI systems focusing on producing a single diagnosis, real-world clinical reasoning involves continuously updating and evaluating multiple diagnostic possibilities as more patient data becomes available. Although deep learning has successfully generated DDx across fields like radiology, ophthalmology, and dermatology, these models generally lack the interactive, conversational capabilities needed to engage effectively with clinicians.

The advent of LLMs offers a new avenue for building tools that can support DDx through natural language interaction. These models, including general-purpose ones like GPT-4 and medical-specific ones like Med-PaLM 2, have shown high performance on multiple-choice and standardized medical exams. While these benchmarks initially assess a model’s medical knowledge, they don’t reflect its usefulness in real clinical settings or its ability to assist physicians during complex cases. Although some recent studies have tested LLMs on challenging case reports, there’s still a limited understanding of how these models might enhance clinician decision-making or improve patient care through real-time collaboration.

Researchers at Google introduced AMIE, a large language model tailored for clinical diagnostic reasoning, to evaluate its effectiveness in assisting with DDx. AMIE’s standalone performance outperformed unaided clinicians in a study involving 20 clinicians and 302 complex real-world medical cases. When integrated into an interactive interface, clinicians using AMIE alongside traditional tools produced significantly more accurate and comprehensive DDx lists than those using standard resources alone. AMIE not only improved diagnostic accuracy but also enhanced clinicians’ reasoning abilities. Its performance also surpassed GPT-4 in automated evaluations, showing promise for real-world clinical applications and broader access to expert-level support.

AMIE, a language model fine-tuned for medical tasks, demonstrated strong performance in generating DDx. Its lists were rated highly for quality, appropriateness, and comprehensiveness. In 54% of cases, AMIE’s DDx included the correct diagnosis, outperforming unassisted clinicians significantly. It achieved a top-10 accuracy of 59%, with the proper diagnosis ranked first in 29% of cases. Clinicians assisted by AMIE also improved their diagnostic accuracy compared to using search tools or working alone. Despite being new to the AMIE interface, clinicians used it similarly to traditional search methods, showing its practical usability.
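The top-10 and top-1 figures above are instances of the standard top-k accuracy metric over ranked diagnosis lists: a case counts as a hit if the correct diagnosis appears anywhere in the first k entries. A minimal sketch of that computation, using made-up toy cases (the function name and data are illustrative, not from the study):

```python
def top_k_accuracy(ddx_lists, ground_truths, k):
    """Fraction of cases whose ranked DDx list contains the
    correct diagnosis within the first k entries."""
    hits = sum(
        truth in ddx[:k]
        for ddx, truth in zip(ddx_lists, ground_truths)
    )
    return hits / len(ground_truths)

# Toy example: three cases, each with a ranked candidate list.
ddx_lists = [
    ["pulmonary embolism", "pneumonia", "pericarditis"],
    ["migraine", "tension headache", "sinusitis"],
    ["appendicitis", "ovarian torsion", "diverticulitis"],
]
ground_truths = ["pneumonia", "cluster headache", "appendicitis"]

print(top_k_accuracy(ddx_lists, ground_truths, 1))  # 1/3: only case 3 ranks the truth first
print(top_k_accuracy(ddx_lists, ground_truths, 3))  # 2/3: case 2's truth is absent entirely
```

By construction top-k accuracy is non-decreasing in k, which is why the article can report a 54% hit rate alongside a higher 59% top-10 figure on a broader evaluation set.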

In a comparative analysis between AMIE and GPT-4 using a subset of 70 NEJM CPC cases, direct human evaluation comparisons were limited due to different sets of raters. Instead, an automated metric that was shown to align reasonably with human judgment was used. While GPT-4 marginally outperformed AMIE in top-1 accuracy (though not statistically significant), AMIE demonstrated superior top-n accuracy for n > 1, with notable gains for n > 2. This suggests that AMIE generated more comprehensive and appropriate DDx, a crucial aspect in real-world clinical reasoning. Additionally, AMIE outperformed board-certified physicians in standalone DDx tasks and significantly improved clinician performance as an assistive tool, yielding higher top-n accuracy, DDx quality, and comprehensiveness than traditional search-based assistance.

Beyond raw performance, AMIE's conversational interface was intuitive and efficient, with clinicians reporting increased confidence in their DDx lists after its use. While limitations exist (such as AMIE's lack of access to the images and tabular data in clinician materials, and the artificial nature of CPC-style case presentations), the model's potential for educational support and diagnostic assistance is promising, particularly in complex or resource-limited settings. Nonetheless, the study emphasizes the need for careful integration of LLMs into clinical workflows, with attention to trust calibration, the model's expression of uncertainty, and the potential for anchoring bias and hallucinations. Future work should rigorously evaluate the real-world applicability, fairness, and long-term impact of AI-assisted diagnosis.



