MarkTechPost@AI March 28, 10:45
Google AI Released TxGemma: A Series of 2B, 9B, and 27B LLMs for Multiple Therapeutic Tasks for Drug Development, Fine-Tunable with Transformers

 

Google AI has introduced TxGemma, a series of language models for a wide range of therapeutic tasks in drug development. Traditional drug discovery is costly and difficult; TxGemma aims to address this by integrating diverse datasets, offering a range of strengths and practical applications.

💊 TxGemma is a series of generalist language models that supports a wide range of therapeutic tasks in drug development.

🎯 It integrates diverse datasets, spanning multiple stages of the therapeutic development pipeline.

📈 TxGemma-Predict performs strongly across many datasets, and its fine-tuning approach improves predictive accuracy.

💬 TxGemma-Chat offers interactive conversation, supporting in-depth scientific analysis and discussion.

🚀 Agentic-Tx dynamically orchestrates complex therapeutic queries, combining multiple strengths.

Developing therapeutics continues to be an inherently costly and challenging endeavor, characterized by high failure rates and prolonged development timelines. The traditional drug discovery process necessitates extensive experimental validations from initial target identification to late-stage clinical trials, consuming substantial resources and time. Computational methodologies, particularly machine learning and predictive modeling, have emerged as pivotal tools to streamline this process. However, existing computational models are typically highly specialized, limiting their effectiveness in addressing diverse therapeutic tasks and offering limited interactive reasoning capabilities required for scientific inquiry and analysis.

To address these limitations, Google AI has introduced TxGemma, a collection of generalist large language models (LLMs) designed explicitly to facilitate various therapeutic tasks in drug development. TxGemma distinguishes itself by integrating diverse datasets, encompassing small molecules, proteins, nucleic acids, diseases, and cell lines, which allows it to span multiple stages within the therapeutic development pipeline. TxGemma models, available with 2 billion (2B), 9 billion (9B), and 27 billion (27B) parameters, are fine-tuned from the Gemma-2 architecture using comprehensive therapeutic datasets. Additionally, the suite includes TxGemma-Chat, an interactive conversational model variant that enables scientists to engage in detailed discussions and mechanistic interpretations of predictive outcomes, fostering transparency in model utilization.
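Since the models are released as fine-tunable checkpoints on Hugging Face, prompting one via the Transformers library might look roughly like the sketch below. The model id, prompt template, and generation settings are illustrative assumptions, not confirmed details; consult the official model cards before use.

```python
# Illustrative sketch: prompting a TxGemma predictive checkpoint via
# Hugging Face Transformers. MODEL_ID and the prompt wording are
# assumptions for illustration only.
MODEL_ID = "google/txgemma-2b-predict"  # hypothetical checkpoint name

def build_prompt(instruction: str, drug_smiles: str) -> str:
    """Wrap a therapeutic question and a SMILES string into one
    instruction-style prompt (one task per prompt)."""
    return f"Instructions: {instruction}\nDrug SMILES: {drug_smiles}\nAnswer:"

def predict(prompt: str, max_new_tokens: int = 8) -> str:
    # Heavy dependency imported lazily so the prompt helper above stays
    # usable without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)

if __name__ == "__main__":
    prompt = build_prompt(
        "Predict whether the molecule crosses the blood-brain barrier.",
        "CC(=O)OC1=CC=CC=C1C(=O)O",  # aspirin
    )
    print(predict(prompt))  # requires downloading the checkpoint
```

The same loading pattern would apply to the chat variants; only the checkpoint name and prompting style would change.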

From a technical standpoint, TxGemma capitalizes on the extensive Therapeutic Data Commons (TDC), a curated dataset containing over 15 million datapoints across 66 therapeutically relevant datasets. TxGemma-Predict, the predictive variant of the model suite, demonstrates significant performance across these datasets, matching or exceeding the performance of both generalist and specialist models currently employed in therapeutic modeling. Notably, the fine-tuning approach employed in TxGemma optimizes predictive accuracy with substantially fewer training samples, providing a crucial advantage in domains where data scarcity is prevalent. Further extending its capabilities, Agentic-Tx, powered by Gemini 2.0, dynamically orchestrates complex therapeutic queries by combining predictive insights from TxGemma-Predict and interactive discussions from TxGemma-Chat with external domain-specific tools.
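The orchestration idea behind Agentic-Tx can be illustrated with a toy dispatcher. Everything below is invented for illustration: the real system uses Gemini 2.0 to plan multi-step tool calls, not a keyword heuristic, and the tool functions here are stubs standing in for TxGemma-Predict and TxGemma-Chat.

```python
# Toy sketch of agentic orchestration: a controller routes a therapeutic
# query to a predictive tool or a conversational tool. The registry and
# routing rule are invented for illustration only.
from typing import Callable, Dict

def predict_tool(query: str) -> str:
    # Stand-in for TxGemma-Predict: would return a model prediction.
    return f"[predict] score for: {query}"

def chat_tool(query: str) -> str:
    # Stand-in for TxGemma-Chat: would return a mechanistic explanation.
    return f"[chat] explanation for: {query}"

TOOLS: Dict[str, Callable[[str], str]] = {
    "predict": predict_tool,
    "chat": chat_tool,
}

def route(query: str) -> str:
    """Crude keyword routing; a real orchestrator plans and chains
    multiple tool calls, possibly mixing in external domain tools."""
    wants_prediction = any(k in query.lower()
                           for k in ("predict", "score", "toxicity"))
    return TOOLS["predict" if wants_prediction else "chat"](query)
```

The point of the sketch is the separation of concerns: predictive calls and conversational reasoning live behind a uniform tool interface that an orchestrating model can compose.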

Empirical evaluations underscore TxGemma’s capability. Across 66 tasks curated by the TDC, TxGemma-Predict consistently achieved performance comparable to or exceeding existing state-of-the-art models. Specifically, TxGemma’s predictive models surpassed state-of-the-art generalist models in 45 tasks and specialized models in 26 tasks, with notable efficiency in clinical trial adverse event predictions. On challenging benchmarks such as ChemBench and Humanity’s Last Exam, Agentic-Tx demonstrated clear advantages over previous leading models, enhancing accuracy by approximately 5.6% and 17.9%, respectively. Moreover, the conversational capabilities embedded in TxGemma-Chat provided essential interactive reasoning to support in-depth scientific analyses and discussions.

TxGemma’s practical utility is particularly evident in adverse event prediction during clinical trials, an essential aspect of therapeutic safety evaluation. TxGemma-27B-Predict demonstrated robust predictive performance while utilizing significantly fewer training samples compared to conventional models, illustrating enhanced data efficiency and reliability. Moreover, computational performance assessments indicate that the inference speed of TxGemma supports practical real-time applications, such as virtual screening, with the largest variant (27B parameters) capable of efficiently processing large sample volumes daily when deployed on scalable infrastructure.
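The batched virtual-screening loop the article alludes to might look like the following sketch. The scoring function is a stub (a real implementation would run batched TxGemma-Predict inference), and the batch size and hit threshold are illustrative choices; actual daily throughput depends entirely on hardware.

```python
# Sketch of a batched virtual-screening loop over a molecule library.
# score_batch is a stub standing in for real model inference.
from typing import Iterable, List, Tuple

def score_batch(smiles_batch: List[str]) -> List[float]:
    # Stub scorer: replace with a batched TxGemma-Predict call.
    return [min(1.0, len(s) / 50.0) for s in smiles_batch]

def screen(library: Iterable[str], batch_size: int = 256,
           threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Score molecules in fixed-size batches and keep high-scoring hits."""
    hits: List[Tuple[str, float]] = []
    batch: List[str] = []
    for smiles in library:
        batch.append(smiles)
        if len(batch) == batch_size:
            scores = score_batch(batch)
            hits += [(s, p) for s, p in zip(batch, scores) if p >= threshold]
            batch = []
    if batch:  # flush the final partial batch
        scores = score_batch(batch)
        hits += [(s, p) for s, p in zip(batch, scores) if p >= threshold]
    return hits
```

Batching is the design choice that matters here: it amortizes per-call model overhead, which is what makes screening large libraries on scalable infrastructure feasible.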

In summary, the introduction of TxGemma by Google AI represents a methodical advancement in computational therapeutic research, combining predictive efficacy, interactive reasoning, and improved data efficiency. By making TxGemma publicly accessible, Google enables further validation and adaptation on diverse, proprietary datasets, thereby promoting broader applicability and reproducibility in therapeutic research. With sophisticated conversational functionality via TxGemma-Chat and complex workflow integration through Agentic-Tx, the suite provides researchers with advanced computational tools capable of significantly enhancing decision-making processes in therapeutic development.


Check out the Paper and Models on Hugging Face. All credit for this research goes to the researchers of this project.


