MarkTechPost@AI · 10 hours ago
A Coding Implementation for Creating, Annotating, and Visualizing Complex Biological Knowledge Graphs Using PyBEL

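This article shows how to use the PyBEL ecosystem to build and analyze rich biological knowledge graphs in Google Colab. It first installs the required packages: PyBEL, NetworkX, Matplotlib, Seaborn, and Pandas. It then demonstrates how to define proteins, processes, and modifications with the PyBEL DSL. Next, it builds a pathway related to Alzheimer's disease, showing how to encode causal relationships, protein-protein interactions, and phosphorylation events. It also introduces advanced network analysis, including centrality measures, node classification, and subgraph extraction, along with extraction of citation and evidence data. The result is a fully annotated BEL graph for downstream visualization and enrichment analysis, which lays a foundation for interactive exploration of biological knowledge.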

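🧬 Install PyBEL and its dependencies: install PyBEL, NetworkX, Matplotlib, Seaborn, and Pandas in Google Colab so that every library needed for the analysis is available.

🧠 Create and initialize the BEL graph: define proteins and processes with the PyBEL DSL, then add causal relationships, protein modifications, and associations to build a network that captures key molecular interactions, such as those among APP, Abeta, and tau.

📊 Run advanced network analysis: compute degree, betweenness, and closeness centrality to quantify each node's importance in the graph and identify key nodes that may drive disease mechanisms.

🔬 Classify biological entities: group nodes by function (for example, protein or biological process) and count them to understand the composition of the network.

📚 Analyze literature evidence: extract the citation identifier and evidence string from each edge to evaluate how well the graph is grounded in published research, and summarize total and unique citation counts to assess the breadth of the supporting literature.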

In this tutorial, we explore how to leverage the PyBEL ecosystem to construct and analyze rich biological knowledge graphs directly within Google Colab. We begin by installing all necessary packages, including PyBEL, NetworkX, Matplotlib, Seaborn, and Pandas. We then demonstrate how to define proteins, processes, and modifications using the PyBEL DSL. From there, we guide you through the creation of an Alzheimer’s disease-related pathway, showcasing how to encode causal relationships, protein–protein interactions, and phosphorylation events. Alongside graph construction, we introduce advanced network analyses, including centrality measures, node classification, and subgraph extraction, as well as techniques for extracting citation and evidence data. By the end of this section, you will have a fully annotated BEL graph ready for downstream visualization and enrichment analyses, laying a solid foundation for interactive biological knowledge exploration.

!pip install pybel pybel-tools networkx matplotlib seaborn pandas -q
import pybel
import pybel.dsl as dsl
from pybel import BELGraph
from pybel.io import to_pickle, from_pickle
import networkx as nx
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from collections import Counter
import warnings
warnings.filterwarnings('ignore')
print("PyBEL Advanced Tutorial: Biological Expression Language Ecosystem")
print("=" * 65)

We begin by installing PyBEL and its dependencies directly in Colab, ensuring that the supporting libraries (NetworkX, Matplotlib, Seaborn, and Pandas) are available for our analysis. Once installed, we import the core modules and suppress warnings to keep our notebook clean and focused on the results.
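Before moving on, it can help to confirm that the install step actually succeeded. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and is not part of PyBEL or the original notebook.

```python
from importlib import metadata

# Distribution names exactly as passed to pip in the install cell.
PACKAGES = ["pybel", "pybel-tools", "networkx", "matplotlib", "seaborn", "pandas"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name:12s} {version or 'NOT INSTALLED'}")
```

Running this in a fresh cell right after the `pip install` line shows at a glance whether any package failed to install.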

print("\n1. Building a Biological Knowledge Graph")
print("-" * 40)
graph = BELGraph(
    name="Alzheimer's Disease Pathway",
    version="1.0.0",
    description="Example pathway showing protein interactions in AD",
    authors="PyBEL Tutorial"
)
app = dsl.Protein(name="APP", namespace="HGNC")
abeta = dsl.Protein(name="Abeta", namespace="CHEBI")
tau = dsl.Protein(name="MAPT", namespace="HGNC")
gsk3b = dsl.Protein(name="GSK3B", namespace="HGNC")
inflammation = dsl.BiologicalProcess(name="inflammatory response", namespace="GO")
apoptosis = dsl.BiologicalProcess(name="apoptotic process", namespace="GO")
graph.add_increases(app, abeta, citation="PMID:12345678", evidence="APP cleavage produces Abeta")
graph.add_increases(abeta, inflammation, citation="PMID:87654321", evidence="Abeta triggers neuroinflammation")
tau_phosphorylated = dsl.Protein(name="MAPT", namespace="HGNC",
                                 variants=[dsl.ProteinModification("Ph")])
graph.add_increases(gsk3b, tau_phosphorylated, citation="PMID:11111111", evidence="GSK3B phosphorylates tau")
graph.add_increases(tau_phosphorylated, apoptosis, citation="PMID:22222222", evidence="Hyperphosphorylated tau causes cell death")
graph.add_increases(inflammation, apoptosis, citation="PMID:33333333", evidence="Inflammation promotes apoptosis")
graph.add_association(abeta, tau, citation="PMID:44444444", evidence="Abeta and tau interact synergistically")
print(f"Created BEL graph with {graph.number_of_nodes()} nodes and {graph.number_of_edges()} edges")

We initialize a BELGraph with metadata for an Alzheimer’s disease pathway and define proteins and processes using the PyBEL DSL. By adding causal relationships, protein modifications, and associations, we construct a comprehensive network that captures key molecular interactions.
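To make the encoded statements easier to read, the sketch below renders the same six relationships as BEL-style text. It is a hand-written formatter over plain tuples, assumed here purely for illustration; it does not call PyBEL's own serializer, and the names `bel_term` and `bel_statement` are hypothetical.

```python
# Each node is a hand-written tuple: (BEL function, namespace, name, modification).
APP = ("p", "HGNC", "APP", None)
ABETA = ("p", "CHEBI", "Abeta", None)
TAU = ("p", "HGNC", "MAPT", None)
TAU_PH = ("p", "HGNC", "MAPT", "Ph")
GSK3B = ("p", "HGNC", "GSK3B", None)
INFLAMMATION = ("bp", "GO", "inflammatory response", None)
APOPTOSIS = ("bp", "GO", "apoptotic process", None)

# The six statements added to the graph in the tutorial, in the same order.
STATEMENTS = [
    (APP, "increases", ABETA),
    (ABETA, "increases", INFLAMMATION),
    (GSK3B, "increases", TAU_PH),
    (TAU_PH, "increases", APOPTOSIS),
    (INFLAMMATION, "increases", APOPTOSIS),
    (ABETA, "association", TAU),
]


def bel_term(node):
    """Render one node tuple as a BEL term such as p(HGNC:MAPT, pmod(Ph))."""
    func, namespace, name, pmod = node
    # Names with characters beyond letters, digits, and underscores are quoted.
    label = name if name.replace("_", "").isalnum() else f'"{name}"'
    term = f"{namespace}:{label}"
    if pmod is not None:
        term += f", pmod({pmod})"
    return f"{func}({term})"


def bel_statement(subject, relation, obj):
    """Join two rendered terms with a relation keyword."""
    return f"{bel_term(subject)} {relation} {bel_term(obj)}"


for subject, relation, obj in STATEMENTS:
    print(bel_statement(subject, relation, obj))
```

Reading the pathway in this form makes it clear that phosphorylated tau is a distinct node from unmodified MAPT, which matters for the node counts and centralities computed later.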

print("\n2. Advanced Network Analysis")
print("-" * 30)
degree_centrality = nx.degree_centrality(graph)
betweenness_centrality = nx.betweenness_centrality(graph)
closeness_centrality = nx.closeness_centrality(graph)
most_central = max(degree_centrality, key=degree_centrality.get)
print(f"Most connected node: {most_central}")
print(f"Degree centrality: {degree_centrality[most_central]:.3f}")

We compute degree, betweenness, and closeness centralities to quantify each node’s importance within the graph. By identifying the most connected nodes, we gain insight into potential hubs that may drive disease mechanisms.
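To see what the degree centrality number means, here is a small sketch that recomputes it by hand on a plain edge list. It follows the normalization NetworkX uses (in-degree plus out-degree, divided by n - 1). The edge list contains only the six statements written in the tutorial, with short string labels chosen here for readability; the actual PyBEL graph may hold additional edges that the library adds itself, so its values can differ.

```python
from collections import Counter

# The six statements from the tutorial as (source, target) pairs; labels are illustrative.
EDGES = [
    ("APP", "Abeta"),
    ("Abeta", "inflammatory response"),
    ("GSK3B", "MAPT-Ph"),
    ("MAPT-Ph", "apoptotic process"),
    ("inflammatory response", "apoptotic process"),
    ("Abeta", "MAPT"),
]


def degree_centrality(edges):
    """Total degree (in + out) of each node divided by the number of other nodes."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    n = len(degree)
    if n <= 1:
        # Mirror the convention of returning 1 for a trivial graph.
        return {node: 1.0 for node in degree}
    return {node: count / (n - 1) for node, count in degree.items()}


centrality = degree_centrality(EDGES)
hub = max(centrality, key=centrality.get)
print(f"Most connected node: {hub} ({centrality[hub]:.3f})")
```

In this simplified edge list Abeta touches three of the six edges, so it comes out as the hub with a centrality of 3 / 6.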

print("\n3. Biological Entity Classification")
print("-" * 35)
node_types = Counter()
for node in graph.nodes():
    node_types[node.function] += 1
print("Node distribution:")
for func, count in node_types.items():
    print(f"  {func}: {count}")

We classify each node by its function, such as Protein or BiologicalProcess, and tally their counts. This breakdown helps us understand the composition of our network at a glance.

print("\n4. Pathway Analysis")
print("-" * 20)
proteins = [node for node in graph.nodes() if node.function == 'Protein']
processes = [node for node in graph.nodes() if node.function == 'BiologicalProcess']
print(f"Proteins in pathway: {len(proteins)}")
print(f"Biological processes: {len(processes)}")
edge_types = Counter()
for u, v, data in graph.edges(data=True):
    edge_types[data.get('relation')] += 1
print("\nRelationship types:")
for rel, count in edge_types.items():
    print(f"  {rel}: {count}")

We separate all proteins and processes to measure the pathway’s scope and complexity. Counting the different relationship types further reveals which interactions, like increases or associations, dominate our model.

print("\n5. Literature Evidence Analysis")
print("-" * 32)
citations = []
evidences = []
for _, _, data in graph.edges(data=True):
    if 'citation' in data:
        citations.append(data['citation'])
    if 'evidence' in data:
        evidences.append(data['evidence'])
print(f"Total citations: {len(citations)}")
print(f"Unique citations: {len(set(citations))}")
print(f"Evidence statements: {len(evidences)}")

We extract citation identifiers and evidence strings from each edge to evaluate our graph’s grounding in published research. Summarizing total and unique citations allows us to assess the breadth of supporting literature.
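Counting unique citations with `set()` requires hashable values. If the citation stored on an edge turns out to be a mapping rather than a plain string (an assumption here; inspect your own edge data to see what your PyBEL version stores), one option is to reduce each citation to a tuple first. The sketch below works on plain dictionaries, and the key names `namespace` and `identifier` as well as the helpers `citation_key` and `summarize` are illustrative assumptions.

```python
# Edge annotations copied from the tutorial's six statements (the PMIDs are its placeholders).
EDGE_DATA = [
    {"citation": "PMID:12345678", "evidence": "APP cleavage produces Abeta"},
    {"citation": "PMID:87654321", "evidence": "Abeta triggers neuroinflammation"},
    {"citation": "PMID:11111111", "evidence": "GSK3B phosphorylates tau"},
    {"citation": "PMID:22222222", "evidence": "Hyperphosphorylated tau causes cell death"},
    {"citation": "PMID:33333333", "evidence": "Inflammation promotes apoptosis"},
    {"citation": "PMID:44444444", "evidence": "Abeta and tau interact synergistically"},
]


def citation_key(citation):
    """Reduce a citation (string or mapping) to a hashable (prefix, identifier) tuple."""
    if isinstance(citation, dict):
        # Assumed mapping shape: {"namespace": ..., "identifier": ...}.
        return (str(citation.get("namespace", "")), str(citation.get("identifier", "")))
    prefix, _, identifier = str(citation).partition(":")
    return (prefix, identifier) if identifier else ("", prefix)


def summarize(edge_data):
    """Count total citations, unique citations, and evidence strings."""
    keys = [citation_key(d["citation"]) for d in edge_data if "citation" in d]
    evidences = [d["evidence"] for d in edge_data if "evidence" in d]
    return {"total": len(keys), "unique": len(set(keys)), "evidence": len(evidences)}


print(summarize(EDGE_DATA))
```

Because both forms collapse to the same tuple, a citation written as `"PMID:1"` and one written as a mapping with the same prefix and identifier are counted once.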

print("\n6. Subgraph Analysis")
print("-" * 22)
inflammation_nodes = [inflammation]
inflammation_neighbors = list(graph.predecessors(inflammation)) + list(graph.successors(inflammation))
inflammation_subgraph = graph.subgraph(inflammation_nodes + inflammation_neighbors)
print(f"Inflammation subgraph: {inflammation_subgraph.number_of_nodes()} nodes, {inflammation_subgraph.number_of_edges()} edges")

We isolate the inflammation subgraph by collecting its direct neighbors, yielding a focused view of inflammatory crosstalk. This targeted subnetwork highlights how inflammation interfaces with other disease processes.
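The neighborhood extraction described above can be sketched on a plain edge list. This is an illustrative stand-in, not PyBEL code: the hypothetical helper `neighborhood_subgraph` keeps a center node, its direct predecessors and successors, and every edge whose two endpoints both survive (an induced subgraph). The edge list again contains only the six statements written in the tutorial.

```python
# The six statements from the tutorial as (source, target) pairs; labels are illustrative.
EDGES = [
    ("APP", "Abeta"),
    ("Abeta", "inflammatory response"),
    ("GSK3B", "MAPT-Ph"),
    ("MAPT-Ph", "apoptotic process"),
    ("inflammatory response", "apoptotic process"),
    ("Abeta", "MAPT"),
]


def neighborhood_subgraph(edges, center):
    """Return (nodes, edges) of the subgraph induced by a node and its direct neighbors."""
    predecessors = {source for source, target in edges if target == center}
    successors = {target for source, target in edges if source == center}
    keep = predecessors | successors | {center}
    # An induced subgraph keeps every edge with both endpoints in the kept node set.
    induced = [(source, target) for source, target in edges if source in keep and target in keep]
    return keep, induced


nodes, sub_edges = neighborhood_subgraph(EDGES, "inflammatory response")
print(f"Inflammation subgraph: {len(nodes)} nodes, {len(sub_edges)} edges")
for source, target in sub_edges:
    print(f"  {source} -> {target}")
```

Note that an induced subgraph would also retain any edge running directly between two neighbors, even if that edge does not touch the center node.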

print("\n7. Advanced Graph Querying")
print("-" * 28)
try:
    paths = list(nx.all_simple_paths(graph, app, apoptosis, cutoff=3))
    print(f"Paths from APP to apoptosis: {len(paths)}")
    if paths:
        # all_simple_paths does not order results by length, so take the minimum explicitly
        print(f"Shortest path length: {min(len(path) for path in paths) - 1}")
except nx.NetworkXNoPath:
    print("No paths found between APP and apoptosis")
apoptosis_inducers = list(graph.predecessors(apoptosis))
print(f"Factors that increase apoptosis: {len(apoptosis_inducers)}")

We enumerate simple paths between APP and apoptosis to explore mechanistic routes and identify key intermediates. Listing all predecessors of apoptosis also shows us which factors may trigger cell death.
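For readers curious how simple-path enumeration with a cutoff works, the following is a minimal depth-first sketch over a plain adjacency dictionary. It is not NetworkX's implementation; the function name `simple_paths` and the string labels are assumptions for illustration, and the adjacency holds only the six statements written in the tutorial.

```python
# Directed adjacency built from the tutorial's six statements; labels are illustrative.
ADJACENCY = {
    "APP": ["Abeta"],
    "Abeta": ["inflammatory response", "MAPT"],
    "GSK3B": ["MAPT-Ph"],
    "MAPT-Ph": ["apoptotic process"],
    "inflammatory response": ["apoptotic process"],
}


def simple_paths(adjacency, source, target, cutoff):
    """Yield each path from source to target that has at most `cutoff` edges
    and never visits the same node twice."""
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == target:
            yield path
            continue
        if len(path) - 1 >= cutoff:
            # Extending this path would exceed the allowed number of edges.
            continue
        for neighbor in adjacency.get(node, ()):
            if neighbor not in path:
                stack.append((neighbor, path + [neighbor]))


paths = list(simple_paths(ADJACENCY, "APP", "apoptotic process", cutoff=3))
print(f"Paths from APP to apoptosis: {len(paths)}")
if paths:
    # Paths are not produced in order of length, so compute the minimum explicitly.
    print(f"Shortest path length: {min(len(p) for p in paths) - 1}")
```

With this edge list the only route within three steps runs through Abeta and the inflammatory response; lowering the cutoff to 2 yields no paths at all.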

print("\n8. Data Export and Visualization")
print("-" * 35)
adj_matrix = nx.adjacency_matrix(graph)
node_labels = [str(node) for node in graph.nodes()]
plt.figure(figsize=(12, 8))
plt.subplot(2, 2, 1)
pos = nx.spring_layout(graph, k=2, iterations=50)
nx.draw(graph, pos, with_labels=False, node_color='lightblue',
        node_size=1000, font_size=8, font_weight='bold')
plt.title("BEL Network Graph")
plt.subplot(2, 2, 2)
centralities = list(degree_centrality.values())
plt.hist(centralities, bins=10, alpha=0.7, color='green')
plt.title("Degree Centrality Distribution")
plt.xlabel("Centrality")
plt.ylabel("Frequency")
plt.subplot(2, 2, 3)
functions = list(node_types.keys())
counts = list(node_types.values())
plt.pie(counts, labels=functions, autopct='%1.1f%%', startangle=90)
plt.title("Node Type Distribution")
plt.subplot(2, 2, 4)
relations = list(edge_types.keys())
rel_counts = list(edge_types.values())
plt.bar(relations, rel_counts, color='orange', alpha=0.7)
plt.title("Relationship Types")
plt.xlabel("Relation")
plt.ylabel("Count")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

We prepare adjacency matrices and node labels for downstream use and generate a multi-panel figure showing the network structure, centrality distributions, node-type proportions, and edge-type counts. These visualizations bring our BEL graph to life, supporting a deeper biological interpretation.
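As a rough picture of what an adjacency matrix paired with node labels looks like, here is a pure-Python sketch. Rows are sources, columns are targets, and each cell counts the edges between that pair, so parallel edges would add up. The helper `adjacency_matrix` below is a hypothetical stand-in and not the NetworkX function of the same name, which returns a SciPy sparse structure.

```python
# The six statements from the tutorial as (source, target) pairs; labels are illustrative.
EDGES = [
    ("APP", "Abeta"),
    ("Abeta", "inflammatory response"),
    ("GSK3B", "MAPT-Ph"),
    ("MAPT-Ph", "apoptotic process"),
    ("inflammatory response", "apoptotic process"),
    ("Abeta", "MAPT"),
]


def adjacency_matrix(edges):
    """Return (labels, matrix) where matrix[i][j] counts edges from labels[i] to labels[j]."""
    labels = []
    for source, target in edges:
        for node in (source, target):
            if node not in labels:
                labels.append(node)  # keep first-seen order so rows match the labels
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for source, target in edges:
        matrix[index[source]][index[target]] += 1
    return labels, matrix


labels, matrix = adjacency_matrix(EDGES)
for label, row in zip(labels, matrix):
    print(f"{label:24s} {row}")
```

Keeping the label list alongside the matrix is what lets a row or column index be traced back to a specific biological entity in later steps.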

In this tutorial, we have demonstrated the power and flexibility of PyBEL for modeling complex biological systems. We showed how to construct a curated, white-box graph of Alzheimer’s disease interactions, perform network-level analyses to identify key hub nodes, and extract biologically meaningful subgraphs for focused study. We also covered essential practices for literature evidence mining and prepared data structures for visualization. As a next step, we encourage you to extend this framework to your own pathways by integrating additional omics data, running enrichment tests, or coupling the graph with machine-learning workflows.



The post A Coding Implementation for Creating, Annotating, and Visualizing Complex Biological Knowledge Graphs Using PyBEL appeared first on MarkTechPost.
