MarkTechPost@AI, July 11, 2024
Advances in Chemical Representations and Artificial Intelligence AI: Transforming Drug Discovery

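This article examines the role chemical representations play in AI-driven drug discovery, focusing on machine-readable representations such as molecular graphs and their use when machine learning models are applied to cheminformatics and drug discovery. It stresses that the choice of representation depends on the task at hand, and describes how modern computational techniques are changing drug discovery by improving data handling and analysis.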

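😊 **Molecular graph representations**: A molecular graph is a common machine-readable representation that maps atoms to nodes and bonds to edges, giving the molecule an explicit structure. It can be visualized with a range of software, and its nodes and edges are usually encoded as matrices: an adjacency matrix for connectivity, a node feature matrix for atom identity, and an edge feature matrix for bond identity. Graph traversal algorithms keep the node ordering consistent, which matters for generating reliable representations. The graph form can also carry 3D information, which gives it an advantage over linear notations in some respects.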

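😄 **Connection tables and MDL file formats**: Connection tables (Ctabs) and the MDL (now BIOVIA) file formats are central to molecular graph representation. A Ctab consists of counts, atom, bond, atom list, Stext, and properties blocks, and describes a molecular structure by listing the details of its atoms and bonds. Hydrogens are usually left implicit, which reduces file size. The MDL formats built on Ctabs include Molfiles for single molecules and extend to SD, RXN, RD, and RG files for additional data and reactions. These formats are widely used for compact, systematic storage and exchange of chemical information across cheminformatics applications.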

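😉 **Contemporary notations: SMILES and InChI**: SMILES, developed in 1988, is an intuitive and popular notation for encoding molecular structures. It assigns numbers to atoms and traverses the molecular graph by depth-first search, so the same molecule can have several valid SMILES strings. Canonicalization selects a unique one. SMILES can encode stereochemistry and other complex features, but it struggles with organometallic compounds and ionic salts. The International Chemical Identifier (InChI), introduced in 2006, is a standard, open-source canonical notation organized in layers that describe a molecule in increasing detail. InChIKeys are unique, searchable, hashed versions of InChIs that make chemical information easier to look up.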

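🥳 **Summary of chemical representations**: Chemical representations cover a range of methods for modeling molecules, reactions, and macromolecules. Structural keys such as MACCS and CATS encode the presence of specific chemical groups. Hashed fingerprints such as Daylight and ECFP use hash functions to represent molecular patterns. Reactions are described with formats such as Reaction SMILES, RInChI, and CGR. Macromolecules, including proteins and peptides, use sequence-based notations and structures from repositories such as the PDB. Together these methods support analysis and prediction in cheminformatics and drug discovery.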

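😎 **Graphical representations of molecules and macromolecules**: Graphical representations of molecules are important for visualization and analysis and include 2D depictions and 3D models. 2D depictions show skeletal structures, often following standardized IUPAC guidelines, but layout and rendering remain challenging. Tools such as RDKit and CDK have improved 2D visualization. For macromolecules, depictions focus on polymer or peptide structures, with tools such as the Pfizer Macromolecule Editor aiding visualization. 3D depictions, produced with software such as Avogadro and PyMOL, include ball-and-stick, cartoon, and van der Waals models and support work on docking, protein-ligand interactions, and reaction mechanisms.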

Advances in Chemical Representations and AI in Drug Discovery:

The past century’s technological advancements, especially the computer revolution and high-throughput screening in drug discovery, have necessitated the development of molecular representations readable by computers and understandable across scientific disciplines. Initially, molecules were depicted as structure diagrams with bonds and atoms, but computational processing required more sophisticated representations. Various chemical notations have been developed to encode molecular structures, with early examples like the empirical formula, which provides atomic composition but not connectivity or geometry. The advent of computers facilitated rapid digital storage and modification of chemical data, leading to the development of machine-readable notations and algorithms for 2D and 3D visualization. Modern representations, especially those developed since the 1970s, support small molecules, macromolecules, and chemical reactions, enhancing the efficiency and scalability of cheminformatics.

Applications of AI in Drug Discovery:

In AI-driven drug discovery, chemical representations play a crucial role. Molecular graphs, the most common machine-readable representation, and various other notations are used to encode structural information for computational analysis. The review highlights the importance of these representations in AI applications and gives examples where AI techniques, such as ML models, are applied to cheminformatics and drug discovery. It is intended as a guide for researchers and students in chemistry, bioinformatics, and computer science, and it emphasizes that the choice of representation depends on the specific task. While not exhaustive, the review directs readers to further literature on AI applications in cheminformatics and shows how modern computational techniques are changing drug discovery by improving data handling and analysis.

Introduction to Molecular Graph Representations:

Understanding molecular graphs is essential for grasping chemical representations used in drug discovery. A molecular graph maps atoms to nodes and bonds to edges, representing molecules in a structured way. Formally defined as a tuple of nodes (atoms) and edges (bonds), these graphs can be visualized using various software. Nodes and edges are often encoded into matrices: an adjacency matrix for connectivity, a node features matrix for atom identity, and an edge features matrix for bond identity. Graph traversal algorithms ensure consistent node ordering, which is crucial for generating reliable representations. This flexibility allows encoding 3D information, offering advantages over linear notations.
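As a minimal sketch of the three matrices described above, the snippet below encodes the heavy atoms of ethanol (C-C-O) in plain Python. The element vocabulary, the one-hot node encoding, and the use of bond order as the only edge feature are illustrative assumptions, not choices prescribed by the review.

```python
# Toy molecular graph for ethanol, heavy atoms only: C-C-O.
atoms = ["C", "C", "O"]            # node i is atom i
bonds = [(0, 1, 1), (1, 2, 1)]     # (atom i, atom j, bond order)
elements = ["C", "N", "O"]         # hypothetical one-hot vocabulary

n = len(atoms)

# Adjacency matrix: 1 where two atoms are bonded, 0 elsewhere.
adjacency = [[0] * n for _ in range(n)]
# Edge feature matrix: here just the bond order of each bonded pair.
edge_features = [[0] * n for _ in range(n)]
for i, j, order in bonds:
    adjacency[i][j] = adjacency[j][i] = 1
    edge_features[i][j] = edge_features[j][i] = order

# Node feature matrix: one one-hot row per atom over the element vocabulary.
node_features = [[int(sym == el) for el in elements] for sym in atoms]

for name, matrix in [
    ("adjacency", adjacency),
    ("node features", node_features),
    ("edge features", edge_features),
]:
    print(name, matrix)
```

Listing the atoms in a different order would permute the rows and columns of all three matrices, which is why a consistent node ordering matters.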

Connection Tables and MDL File Formats:

Connection tables (Ctabs) and MDL (now BIOVIA) file formats are crucial in molecular graph representation. A Ctab consists of counts, atom, bond, atom list, Stext, and properties blocks, and describes a molecular structure by specifying the details of its atoms and bonds. Hydrogens are usually left implicit, which reduces file size. MDL formats, built on Ctabs, include Molfiles for single molecules and extend to SD, RXN, RD, and RG files for additional data and reactions. These formats are widely used for compact, systematic chemical information storage and transfer, supporting diverse cheminformatics applications.
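To make the counts, atom, and bond blocks concrete, here is a rough sketch that reads a hand-written V2000 Molfile for ethanol (heavy atoms only) using the format's fixed-width columns. The Molfile text, the coordinates, and the function name `parse_molfile` are made up for illustration; the sketch ignores the atom list, Stext, and properties blocks and is not a substitute for a real toolkit parser.

```python
# Hand-written V2000 Molfile for ethanol; coordinates are illustrative.
MOLFILE = "\n".join([
    "ethanol (heavy atoms only)",   # header line 1: molecule name
    "  hand-written sketch",        # header line 2: program / timestamp
    "",                             # header line 3: comment
    "  3  2  0  0  0  0  0  0  0  0999 V2000",   # counts line
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    2.2500    1.3000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",                 # bond: atom 1, atom 2, single
    "  2  3  1  0",                 # bond: atom 2, atom 3, single
    "M  END",
])


def parse_molfile(text):
    """Return (atoms, bonds) from the counts, atom, and bond blocks only."""
    lines = text.split("\n")
    counts = lines[3]
    # Counts line: columns 1-3 hold the atom count, columns 4-6 the bond count.
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])

    atoms = []
    for line in lines[4:4 + n_atoms]:
        # Atom line: three 10-character coordinates, a space, a 3-character symbol.
        x, y, z = (float(line[k:k + 10]) for k in (0, 10, 20))
        atoms.append((line[31:34].strip(), x, y, z))

    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        # Bond line: first atom, second atom, bond type (atom indices are 1-based).
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return atoms, bonds


atoms, bonds = parse_molfile(MOLFILE)
print(atoms)
print(bonds)
```

Note that no hydrogen atoms appear in the atom block; they are implied by the valences of the listed atoms.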

Contemporary Notations: SMILES and InChI:

SMILES, developed in 1988, is an intuitive and popular notation for encoding molecular structures. It assigns numbers to atoms and traverses the molecular graph using depth-first search, so the same molecule can be written as several different strings. A unique SMILES can be designated through canonicalization. SMILES can encode stereochemistry and other complex structures but struggles with organometallic compounds and ionic salts. The International Chemical Identifier (InChI), introduced in 2006, provides a standard, open-source canonical notation with multiple layers for detailed molecular representation. InChIKeys offer unique, searchable, hashed versions of InChIs, making chemical information easier to look up.
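The point that one molecule has several valid SMILES strings can be illustrated with a toy depth-first writer. The sketch below handles only acyclic molecules with single bonds, and its "canonical" choice (shortest string, then alphabetical) is a stand-in invented for this example; real canonicalization algorithms rank atoms by graph invariants and are considerably more involved.

```python
# Ethanol heavy-atom graph: C-C-O.
symbols = ["C", "C", "O"]
neighbors = {0: [1], 1: [0, 2], 2: [1]}


def dfs_smiles(start):
    """Write a SMILES-like string by depth-first traversal from `start`.

    Toy version: acyclic graphs and single bonds only.
    """
    visited = set()

    def visit(i):
        visited.add(i)
        branches = []
        for j in neighbors[i]:
            if j not in visited:
                branches.append(visit(j))
        # All branches but the last go in parentheses; the last continues the chain.
        side = "".join("(" + b + ")" for b in branches[:-1])
        main = branches[-1] if branches else ""
        return symbols[i] + side + main

    return visit(start)


# One string per starting atom: same molecule, different spellings.
variants = [dfs_smiles(s) for s in range(len(symbols))]
# Toy tie-break standing in for canonicalization (not the real algorithm).
toy_canonical = min(variants, key=lambda s: (len(s), s))

print(variants)        # ['CCO', 'C(C)O', 'OCC']
print(toy_canonical)   # CCO
```

All three strings describe ethanol; fixing a deterministic rule for picking one of them is what lets a SMILES string serve as a lookup key.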

Summary of Chemical Representations:

Chemical representations encompass various methods to model molecules, reactions, and macromolecules. Structural keys like MACCS and CATS encode the presence of specific chemical groups. Hashed fingerprints like Daylight and ECFP use hash functions to represent molecular patterns. Reactions are described using formats like Reaction SMILES, RInChI, and CGR. Macromolecules, including proteins and peptides, use sequence-based notations and structures from repositories like the PDB. These methods support analysis and prediction in cheminformatics and drug discovery.
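The idea behind a hashed fingerprint can be sketched in a few lines: enumerate small patterns in the molecular graph, hash each one to a bit position, and compare the resulting bit sets. The example below uses linear atom paths of up to three atoms, SHA-256 as the hash, and 1024 bits; these are illustrative assumptions and do not reproduce the Daylight or ECFP algorithms. The function names are hypothetical.

```python
import hashlib


def paths(symbols, neighbors, max_atoms=3):
    """Collect the labels of all simple paths with up to `max_atoms` atoms."""
    found = set()

    def extend(path):
        labels = [symbols[i] for i in path]
        # A path and its reverse are the same pattern; keep one spelling.
        found.add(min("-".join(labels), "-".join(reversed(labels))))
        if len(path) < max_atoms:
            for j in neighbors[path[-1]]:
                if j not in path:
                    extend(path + [j])

    for start in range(len(symbols)):
        extend([start])
    return found


def hashed_fingerprint(symbols, neighbors, n_bits=1024):
    """Map each path pattern to one bit position via a hash function."""
    bits = set()
    for p in paths(symbols, neighbors):
        digest = hashlib.sha256(p.encode("utf-8")).hexdigest()
        bits.add(int(digest, 16) % n_bits)
    return bits


def tanimoto(a, b):
    """Tanimoto similarity of two sets of 'on' bits."""
    return len(a & b) / len(a | b)


# Heavy-atom graphs: ethanol (C-C-O) and propane (C-C-C).
ethanol = (["C", "C", "O"], {0: [1], 1: [0, 2], 2: [1]})
propane = (["C", "C", "C"], {0: [1], 1: [0, 2], 2: [1]})

fp_ethanol = hashed_fingerprint(*ethanol)
fp_propane = hashed_fingerprint(*propane)
print(sorted(paths(*ethanol)))
print(tanimoto(fp_ethanol, fp_propane))
```

Unlike a structural key, where each bit corresponds to a predefined substructure, the bits here have no fixed meaning, and two different patterns can collide on the same bit.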

Graphical Representations for Molecules and Macromolecules:

Graphical representations of molecules, crucial for visualization and analysis, include 2D depictions and 3D models. 2D depictions show skeletal structures, often following standardized IUPAC guidelines, but layout and rendering remain challenging. Tools like RDKit and CDK have improved 2D visualizations. For macromolecules, depictions focus on polymer or peptide structures, with tools like the Pfizer Macromolecule Editor aiding visualization. 3D depictions, using software such as Avogadro and PyMOL, include ball-and-stick, cartoon, and van der Waals models, and support studies of docking, protein-ligand interactions, and reaction mechanisms.


Check out the Paper 1 and Paper 2. All credit for this research goes to the researchers of this project.

The post Advances in Chemical Representations and Artificial Intelligence AI: Transforming Drug Discovery appeared first on MarkTechPost.
