MarkTechPost@AI, 14:05, the day before yesterday
Meet BioReason: The World’s First Reasoning Model in Biology that Enables AI to Reason about Genomics like a Biology Expert

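BIOREASON is a pioneering AI system that combines a DNA foundation model with a large language model (LLM) to enable in-depth, interpretable reasoning over genomic data. It analyzes raw genomic sequences and uses the LLM to reason about them, producing clear, biologically relevant insights. BIOREASON performs strongly on DNA variant interpretation and biological reasoning. On tasks such as predicting disease outcomes, it outperforms models that use only a DNA model or only an LLM. Its reasoning paths are transparent and biologically grounded, which helps scientists better understand disease mechanisms and pose new research questions. That makes it a powerful tool for advancing precision medicine and genomics research.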

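🧬 BIOREASON is an AI system that combines a DNA foundation model with a large language model. It is designed to address the lack of interpretable reasoning in AI for genomics research.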

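💡 By pairing a DNA foundation model with an LLM, BIOREASON can analyze raw genomic sequences and generate clear, biologically relevant insights. The DNA foundation model extracts embeddings from the genomic sequence. These are combined with the text query to form a unified input to the LLM.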

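🔬 In evaluations of DNA variant interpretation and biological reasoning, BIOREASON outperformed DNA-only and LLM-only models at predicting disease outcomes. In one case, a PFN1 mutation linked to ALS, BIOREASON correctly predicted the disease. It also generated a 10-step explanation tracing the variant's effects on actin dynamics and motor neuron degeneration.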

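🚀 BIOREASON's strength is that it does more than make accurate predictions. It also provides transparent, biologically grounded reasoning paths that help scientists understand disease mechanisms and formulate new research questions. The researchers trained BIOREASON with supervised fine-tuning and reinforcement learning. It reached up to 97% accuracy on KEGG-based disease pathway prediction.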

A major hurdle in using AI for genomics is the lack of interpretable, step-by-step reasoning over complex DNA data. DNA foundation models excel at learning rich sequence patterns for tasks such as variant prediction and gene regulation. However, they often operate as black boxes and offer limited insight into the underlying biological mechanisms. Large language models, meanwhile, show impressive reasoning skills across many domains but are not designed to handle raw genomic sequences. This gap between strong DNA representation and deep biological reasoning keeps AI from reaching expert-level understanding. It also limits AI's potential to drive scientific discovery through meaningful, hypothesis-driven explanations.

DNA foundation models have made significant progress by learning rich representations directly from genomic sequences, and they perform strongly across a range of biological tasks. Models such as Evo2, which capture long-range sequence context, illustrate this potential, but their lack of interpretability limits deeper biological insight. Large language models, by contrast, excel at reasoning over biomedical text but typically do not operate directly on raw genomic data. Systems such as GeneGPT and TxGemma are early attempts to bridge this gap. Existing genomic benchmarks measure task performance but do little to evaluate reasoning or hypothesis generation.

Researchers from the Vector Institute, University Health Network (UHN), Arc Institute, Cohere, the University of California, San Francisco, and Google DeepMind have introduced BIOREASON, a pioneering AI system that unites a DNA foundation model with an LLM. This integration lets BIOREASON analyze raw genomic sequences while applying LLM-based reasoning to produce clear, biologically grounded insights. It is trained with supervised fine-tuning and reinforcement learning. BIOREASON achieves a performance gain of 15% or more over DNA-only and LLM-only baselines and reaches up to 97% accuracy on KEGG-based disease pathway prediction. Its interpretable, step-by-step outputs advance biological understanding and support hypothesis generation.
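
The reinforcement-learning stage is named in the article as Group Relative Policy Optimization (GRPO). Its core idea fits in a few lines. You sample several answers to the same prompt, score each one, and normalize each reward against the group's mean and standard deviation, so no separate value model is needed. The sketch below assumes a simple exact-match reward on the predicted disease name. The `exact_match_reward` helper is hypothetical, and the reward actually used for BIOREASON is not described here. The sketch also uses the population standard deviation, while some implementations use the sample standard deviation instead.

```python
"""Group-relative advantages in the style of GRPO (a minimal sketch).

Assumption: each sampled completion receives a scalar reward, here 1.0 if its
final answer matches the reference label and 0.0 otherwise. This is a
hypothetical reward, not the one used to train BIOREASON.
"""
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of completions for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so answers
    better than their siblings get positive advantages and worse ones negative.
    """
    if not rewards:
        raise ValueError("need at least one reward")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps avoids division by zero
    return [(r - mu) / (sigma + eps) for r in rewards]


def exact_match_reward(prediction: str, label: str) -> float:
    """Hypothetical reward: 1.0 when the predicted disease name matches the label."""
    return 1.0 if prediction.strip().lower() == label.strip().lower() else 0.0


if __name__ == "__main__":
    # Four sampled answers for one variant prompt; two of them are correct.
    answers = ["ALS", "Parkinson disease", "als", "Huntington disease"]
    rewards = [exact_match_reward(a, "ALS") for a in answers]
    print(rewards)  # [1.0, 0.0, 1.0, 0.0]
    print([round(a, 3) for a in group_relative_advantages(rewards)])  # [1.0, -1.0, 1.0, -1.0]
```

If every answer in a group earns the same reward, all advantages are zero, so that group contributes no learning signal.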

The BIOREASON model is a multimodal framework that supports deep, interpretable biological reasoning by combining genomic sequences with natural-language queries. A DNA foundation model extracts rich, contextual embeddings from raw DNA input. These embeddings are combined with the tokenized text query to form a unified input for an LLM, specifically Qwen3. The system is trained to generate step-by-step explanations of biological processes. A learnable projection layer maps the DNA embeddings into the LLM's embedding space, and positional encoding is added to the combined input. Reinforcement learning with Group Relative Policy Optimization (GRPO) further refines its reasoning.
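
The fusion step described above can be illustrated with a toy, dependency-free sketch. DNA embeddings pass through a learnable linear projection into the LLM's embedding width and are concatenated with the embedded text tokens. Each position in the result then gets an index for the LLM's positional encoding. Everything here is an assumption for illustration, not BIOREASON's code: the `LinearProjection` and `build_llm_input` names, the single linear layer, the toy dimensions, and placing the DNA vectors before the text. Real systems operate on tensors. Qwen3 uses rotary position embeddings, which are applied inside attention from position indices like the ones returned here.

```python
"""Toy sketch: fusing projected DNA embeddings with text-token embeddings.

Illustrative assumptions only (hypothetical names, one linear layer, DNA
vectors placed before the text). Plain Python lists stand in for tensors.
"""
import random

Vector = list[float]


class LinearProjection:
    """Learnable map from the DNA-embedding space (d_in) to the LLM space (d_out)."""

    def __init__(self, d_in: int, d_out: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # In training these weights would be learned; here they are small random values.
        self.weight = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        self.bias = [0.0] * d_out

    def __call__(self, x: Vector) -> Vector:
        # y = W x + b, computed row by row.
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weight, self.bias)]


def build_llm_input(dna_embeddings: list[Vector], text_embeddings: list[Vector],
                    projection: LinearProjection) -> tuple[list[Vector], list[int]]:
    """Project DNA embeddings, concatenate them with text embeddings, and return
    the unified sequence together with position ids for positional encoding."""
    projected = [projection(v) for v in dna_embeddings]
    sequence = projected + text_embeddings
    position_ids = list(range(len(sequence)))
    return sequence, position_ids


if __name__ == "__main__":
    d_dna, d_llm = 8, 16                          # toy sizes, not the real model widths
    dna = [[0.1] * d_dna for _ in range(5)]       # 5 DNA embedding vectors
    text = [[0.0] * d_llm for _ in range(3)]      # 3 already-embedded text tokens
    seq, pos = build_llm_input(dna, text, LinearProjection(d_dna, d_llm))
    print(len(seq), len(seq[0]), pos)             # 8 16 [0, 1, 2, 3, 4, 5, 6, 7]
```

The projection is the piece that lets two independently pretrained models share one input sequence. It only needs to map the DNA encoder's output width onto the LLM's hidden size.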

The researchers evaluated BIOREASON on three datasets focused on DNA variant interpretation and biological reasoning. It outperformed both DNA-only and LLM-only models at predicting disease outcomes from genomic variants. The best-performing configuration, which combined Evo2 with Qwen3-4B, achieved high accuracy and F1 scores across all tasks. One notable case study involved a PFN1 mutation linked to ALS. BIOREASON correctly predicted the disease and generated a 10-step explanation tracing the variant's impact on actin dynamics and motor neuron degeneration. The case shows that its value lies in accurate predictions and also in transparent, biologically grounded reasoning paths.
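
The accuracy and F1 scores mentioned above can be computed from extracted predictions along the lines of the sketch below. It assumes one gold disease label and one predicted label per example. It uses macro-averaged F1, the unweighted mean of per-label F1 scores, because the averaging scheme behind the reported scores is not stated in this article. The labels in the usage example are toy values, not results from the paper.

```python
"""Accuracy and macro-averaged F1 for disease-label predictions (a sketch).

Assumption: one gold label and one predicted label per example; macro
averaging is chosen for illustration only.
"""
from collections import Counter


def _check(gold: list[str], pred: list[str]) -> None:
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")


def accuracy(gold: list[str], pred: list[str]) -> float:
    """Fraction of examples whose predicted label equals the gold label."""
    _check(gold, pred)
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold: list[str], pred: list[str]) -> float:
    """Mean over labels of F1 = 2*TP / (2*TP + FP + FN)."""
    _check(gold, pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[g] += 1  # missed the true label g
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy labels for illustration only.
    gold = ["ALS", "ALS", "PD", "HD"]
    pred = ["ALS", "PD", "PD", "HD"]
    print(accuracy(gold, pred))            # 0.75
    print(round(macro_f1(gold, pred), 4))  # 0.7778
```

With macro averaging, each disease label counts equally. Rare diseases therefore affect the score as much as common ones, which is why F1 is often reported alongside accuracy.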

In conclusion, BIOREASON combines DNA encoders with large language models to enable detailed, interpretable reasoning over genomic data. Unlike conventional models, it both makes accurate predictions and explains the biological logic behind them in step-by-step outputs. This helps scientists better understand disease mechanisms and generate new research questions. BIOREASON still faces challenges, including high computational cost and limited uncertainty quantification. Future work aims to improve scalability and to incorporate additional biological data such as RNA and protein information. It also aims to apply the system to broader tasks, including genome-wide association studies (GWAS). Overall, BIOREASON shows promise for advancing precision medicine and genomic research.



The post Meet BioReason: The World’s First Reasoning Model in Biology that Enables AI to Reason about Genomics like a Biology Expert appeared first on MarkTechPost.