MIT News - Artificial intelligence 01月03日
A new computational model can predict antibody structures more accurately
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

麻省理工学院的研究人员开发了一种计算技术,利用大型语言模型更准确地预测抗体结构,尤其是在抗体的高变区。这项技术通过训练模型识别高变区序列的结构模式,并结合抗体与抗原的结合数据,提高了预测的准确性。该模型名为AbMap,能帮助研究人员快速筛选出具有治疗潜力的抗体,例如针对SARS-CoV-2病毒的抗体。此外,该技术还可用于分析个体抗体库,深入了解人体对疾病的不同免疫反应,并为药物开发节省时间和成本。研究表明,考虑结构信息后,不同个体间的抗体重叠度远高于仅考虑序列的结果。

🧬 针对抗体高变区建模: 传统AI模型在预测抗体结构,尤其是在高变区方面存在局限性。麻省理工研究人员开发的新技术,重点关注抗体高变区的建模,通过训练模型识别特定序列的结构模式,提高了预测的准确性。

🧪 AbMap模型:研究人员创建了两个模块,它们基于现有的蛋白质语言模型,并使用蛋白质数据库中的抗体结构和抗体与抗原的结合数据进行训练。由此产生的AbMap模型能够根据抗体的氨基酸序列预测其结构和结合强度。

🔬 实验验证:研究人员使用AbMap模型预测了能有效中和SARS-CoV-2病毒刺突蛋白的抗体结构,并生成了数百万种变体。实验结果表明,该模型识别出的抗体具有更强的结合能力,其中82%的抗体结合强度优于原始抗体。

📊 抗体库分析:该技术还可用于分析个体抗体库,研究不同个体对疾病的免疫反应差异。研究发现,考虑结构信息后,不同个体间的抗体重叠度远高于仅考虑序列的结果,这有助于更深入理解免疫反应的机制。

By adapting artificial intelligence models known as large language models, researchers have made great progress in their ability to predict a protein’s structure from its sequence. However, this approach hasn’t been as successful for antibodies, in part because of the hypervariability seen in this type of protein.

To overcome that limitation, MIT researchers have developed a computational technique that allows large language models to predict antibody structures more accurately. Their work could enable researchers to sift through millions of possible antibodies to identify those that could be used to treat SARS-CoV-2 and other infectious diseases.

“Our method allows us to scale, whereas others do not, to the point where we can actually find a few needles in the haystack,” says Bonnie Berger, the Simons Professor of Mathematics, the head of the Computation and Biology group in MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), and one of the senior authors of the new study. “If we could help to stop drug companies from going into clinical trials with the wrong thing, it would really save a lot of money.”

The technique, which focuses on modeling the hypervariable regions of antibodies, also holds potential for analyzing entire antibody repertoires from individual people. This could be useful for studying the immune response of people who are super responders to diseases such as HIV, to help figure out why their antibodies fend off the virus so effectively.

Bryan Bryson, an associate professor of biological engineering at MIT and a member of the Ragon Institute of MGH, MIT, and Harvard, is also a senior author of the paper, which appears this week in the Proceedings of the National Academy of Sciences. Rohit Singh, a former CSAIL research scientist who is now an assistant professor of biostatistics and bioinformatics and cell biology at Duke University, and Chiho Im ’22 are the lead authors of the paper. Researchers from Sanofi and ETH Zurich also contributed to the research.

Modeling hypervariability

Proteins consist of long chains of amino acids, which can fold into an enormous number of possible structures. In recent years, predicting these structures has become much easier to do, using artificial intelligence programs such as AlphaFold. Many of these programs, such as ESMFold and OmegaFold, are based on large language models, which were originally developed to analyze vast amounts of text, allowing them to learn to predict the next word in a sequence. This same approach can work for protein sequences — by learning which protein structures are most likely to be formed from different patterns of amino acids.

However, this technique doesn’t always work on antibodies, especially on a segment of the antibody known as the hypervariable region. Antibodies usually have a Y-shaped structure, and these hypervariable regions are located in the tips of the Y, where they detect and bind to foreign proteins, also known as antigens. The bottom part of the Y provides structural support and helps antibodies to interact with immune cells.

Hypervariable regions vary in length but usually contain fewer than 40 amino acids. It has been estimated that the human immune system can produce up to 1 quintillion different antibodies by changing the sequence of these amino acids, helping to ensure that the body can respond to a huge variety of potential antigens. Those sequences aren’t evolutionarily constrained the same way that other protein sequences are, so it’s difficult for large language models to learn to predict their structures accurately.

“Part of the reason why language models can predict protein structure well is that evolution constrains these sequences in ways in which the model can decipher what those constraints would have meant,” Singh says. “It’s similar to learning the rules of grammar by looking at the context of words in a sentence, allowing you to figure out what it means.”

To model those hypervariable regions, the researchers created two modules that build on existing protein language models. One of these modules was trained on hypervariable sequences from about 3,000 antibody structures found in the Protein Data Bank (PDB), allowing it to learn which sequences tend to generate similar structures. The other module was trained on data that correlates about 3,700 antibody sequences to how strongly they bind three different antigens.

The resulting computational model, known as AbMap, can predict antibody structures and binding strength based on their amino acid sequences. To demonstrate the usefulness of this model, the researchers used it to predict antibody structures that would strongly neutralize the spike protein of the SARS-CoV-2 virus.

The researchers started with a set of antibodies that had been predicted to bind to this target, then generated millions of variants by changing the hypervariable regions. Their model was able to identify antibody structures that would be the most successful, much more accurately than traditional protein-structure models based on large language models.

Then, the researchers took the additional step of clustering the antibodies into groups that had similar structures. They chose antibodies from each of these clusters to test experimentally, working with researchers at Sanofi. Those experiments found that 82 percent of these antibodies had better binding strength than the original antibodies that went into the model.

Identifying a variety of good candidates early in the development process could help drug companies avoid spending a lot of money on testing candidates that end up failing later on, the researchers say.

“They don’t want to put all their eggs in one basket,” Singh says. “They don’t want to say, I’m going to take this one antibody and take it through preclinical trials, and then it turns out to be toxic. They would rather have a set of good possibilities and move all of them through, so that they have some choices if one goes wrong.”

Comparing antibodies

Using this technique, researchers could also try to answer some longstanding questions about why different people respond to infection differently. For example, why do some people develop much more severe forms of Covid, and why do some people who are exposed to HIV never become infected?

Scientists have been trying to answer those questions by performing single-cell RNA sequencing of immune cells from individuals and comparing them — a process known as antibody repertoire analysis. Previous work has shown that antibody repertoires from two different people may overlap as little as 10 percent. However, sequencing doesn’t offer as comprehensive a picture of antibody performance as structural information, because two antibodies that have different sequences may have similar structures and functions.

The new model can help to solve that problem by quickly generating structures for all of the antibodies found in an individual. In this study, the researchers showed that when structure is taken into account, there is much more overlap between individuals than the 10 percent seen in sequence comparisons. They now plan to further investigate how these structures may contribute to the body’s overall immune response against a particular pathogen.

“This is where a language model fits in very beautifully because it has the scalability of sequence-based analysis, but it approaches the accuracy of structure-based analysis,” Singh says.

The research was funded by Sanofi and the Abdul Latif Jameel Clinic for Machine Learning in Health. 

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

大型语言模型 抗体结构预测 AbMap 免疫反应 药物开发
相关文章