EnterpriseAI, August 21, 2024
OpenFold Revolutionizes Protein Modeling with AI and Supercomputing Power

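OpenFold is a new open-source software tool that uses supercomputers and AI to predict protein structures. Building on AlphaFold2, developed by DeepMind, it can help scientists better understand the misfolded proteins associated with neurodegenerative diseases and develop new medicines. Meta AI used OpenFold in building an atlas of more than 600 million proteins from bacteria, viruses, and other microorganisms that had not yet been characterized.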

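😊 OpenFold builds on DeepMind's AlphaFold2 and aims to help scientists understand misfolded proteins linked to neurodegenerative diseases and develop new medicines. Large language models (LLMs), which process vast amounts of data to generate new and meaningful insights, are described as a core component of this kind of AI-based research.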

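🤩 The ability to interact with AI in natural language has made these systems more accessible and easier to use. Bouatta notes that living organisms are also organized in a language: the four bases of DNA, namely adenine, cytosine, guanine, and thymine.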

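🤔 The OpenFold project was initiated by Dr. Nazim Bouatta, a senior research fellow at Harvard Medical School, and his colleague Mohammed AlQuraishi (formerly at Harvard, now at Columbia University), with support from several other researchers at Harvard and Columbia. The project eventually grew into the OpenFold Consortium, a non-profit AI research and development consortium that develops free and open-source software tools for biology and drug discovery.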

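🤩 One of the earliest applications of OpenFold was by Meta AI, part of Meta (formerly Facebook). Meta AI used OpenFold together with a "protein language model" to launch an atlas of more than 600 million proteins from bacteria, viruses, and other microorganisms that had not yet been characterized.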

Proteins, life’s building blocks, perform a wide range of functions based on their unique shapes. The molecules fold into specific forms and shapes that define their roles, from catalyzing biochemical reactions to providing structural support and enabling cellular communication.

Predicting a protein's structure is challenging because of the complexity of its folds and shapes. Even slight variations in folding can significantly alter a protein's function.

To address this complexity, researchers have developed a new open-source software tool called OpenFold that leverages the power of supercomputers and AI to predict protein structures. This can help scientists gain a deeper understanding of misfolded proteins associated with neurodegenerative diseases, such as Parkinson’s and Alzheimer’s disease, and develop new medicines.

OpenFold, which was announced in a study published in Nature Methods, builds on the success of AlphaFold2, an AI program developed by DeepMind that predicts the structure of, and interactions between, biological molecules with unprecedented accuracy.

AlphaFold2 is being used by over two million researchers for protein predictions in various fields, including drug discovery and medical treatments. While AlphaFold2 offers exceptional accuracy, it is limited by its lack of accessible code and data for training new models. 

This restricts its application to new tasks, such as protein-ligand complex structure prediction, and makes it harder to understand how the model learns or to assess its capacity to generalize to unseen regions of fold space.

The research for OpenFold was initiated by Dr. Nazim Bouatta, a senior research fellow at Harvard Medical School, and his colleague Mohammed AlQuraishi, formerly at Harvard but now at Columbia University. The project was supported by several other researchers from Harvard and Columbia. 

The project eventually grew into the OpenFold Consortium, a non-profit AI research and development consortium developing free and open-source software tools for biology and drug discovery.

A core component of AI-based research is large language models (LLMs), which can process vast amounts of data to generate new and meaningful insights. The ability to use natural language to interact with AI has greatly enhanced accessibility and usability, allowing users to communicate with these systems more intuitively and effectively. 

One of the earliest applications of OpenFold was by Meta AI, part of Meta (formerly known as Facebook). Meta AI recently used OpenFold together with a ‘protein language model’ to launch an atlas featuring over 600 million proteins from bacteria, viruses, and other microorganisms that had not yet been characterized.

Bouatta explained that living organisms are also organized in a language, referring to the four bases of DNA: adenine, cytosine, guanine, and thymine. "This is the language that nature picked to build these sophisticated living organisms."

He further elaborated that proteins have a second layer of language, represented by the 20 amino acids that make up all proteins in the human body and determine their functions. While genome sequencing has gathered extensive data on these biological “letters”, a crucial missing piece has been a “dictionary” that can translate this data into predicted shapes.
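The two layers of "language" described above can be illustrated with a short Python sketch. It uses the standard genetic code to turn a DNA string (four letters) into a protein string (20 letters). The function name `translate` and the example sequence are made up for illustration; this is not code from OpenFold.

```python
from itertools import product

# The four DNA "letters", in the conventional order used for codon tables.
BASES = "TCAG"

# Standard genetic code: one amino-acid letter per codon, with codons ordered
# TTT, TTC, TTA, TTG, TCT, ... ('*' marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence into one-letter amino-acid codes.

    Reads the sequence three bases at a time and stops at the first stop codon.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Illustrative input only: ATG GCC AAA TGA
print(translate("ATGGCCAAATGA"))  # MAK
```

Going from DNA to the amino-acid string is a simple table lookup. The hard part, and the "dictionary" Bouatta refers to, is going from that string to a three-dimensional shape.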

“Machine learning allows us to take a string of letters, the amino acids that describe any kind of protein that you can think of, run a sophisticated algorithm, and return an exquisite three-dimensional structure that is close to what we get using experiments. The OpenFold algorithm is very sophisticated and uses new developments that we're familiar with from ChatGPT and others,” said Bouatta. 
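As a rough sketch of what "taking a string of letters" means in practice, the Python snippet below converts an amino-acid sequence into integer tokens and a one-hot matrix, a common first step before a sequence is fed to a machine learning model. The alphabet ordering and the names `tokenize` and `one_hot` are assumptions chosen for illustration; real structure predictors such as OpenFold use considerably richer inputs, and this is not their actual preprocessing code.

```python
# The 20 standard amino acids as one-letter codes (alphabetical order is an
# arbitrary choice made for this sketch).
AMINO_ACID_ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
TOKEN_ID = {aa: i for i, aa in enumerate(AMINO_ACID_ALPHABET)}


def tokenize(sequence: str) -> list[int]:
    """Map each amino-acid letter to an integer index in the alphabet."""
    return [TOKEN_ID[aa] for aa in sequence.upper()]


def one_hot(sequence: str) -> list[list[int]]:
    """Encode a sequence as a (length x 20) matrix with a single 1 per row."""
    matrix = []
    for token in tokenize(sequence):
        row = [0] * len(AMINO_ACID_ALPHABET)
        row[token] = 1
        matrix.append(row)
    return matrix


# Illustrative three-residue sequence: methionine, alanine, lysine.
print(tokenize("MAK"))  # [10, 0, 8]
```

A model then maps this numeric representation to predicted 3D coordinates for each residue, which is the step that requires large-scale training on supercomputers.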

The research was supported by the Flatiron Institute, OpenBioML, Stability AI, the Texas Advanced Computing Center (TACC), and NVIDIA, all of which provided the resources needed for the experiments described in the paper.

TACC provided the OpenFold team with access to its Lonestar6 and Frontera supercomputers, enabling large-scale machine learning and AI deployments that significantly accelerated the team's research.

Supercomputers, combined with AI, have transformed biological research by enabling the accurate and efficient prediction of protein structures. While these tools shouldn't replace lab experiments, they do significantly enhance the speed and precision of research. According to Bouatta, supercomputers are the “microscope of the modern era for biology and drug discovery” and they have immense potential to help us understand life and cure diseases.
