MarkTechPost@AI 01月07日
This AI paper from the Beijing Institute of Technology and Harvard Unveils TXpredict for Predicting Microbial Transcriptomes
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

北京理工大学和哈佛大学的研究人员提出了TXpredict,一种利用注释基因组序列进行转录组预测的创新框架。该框架通过预训练的蛋白质语言模型ESM2提取蛋白质嵌入的预测特征,并融入进化原则。TXpredict克服了传统方法的局限性,如可扩展性、通用性和计算效率问题,并引入了条件特异性基因表达预测等新功能。该模型在22种细菌和10种古细菌的转录组数据上进行了训练,并使用留一基因组交叉验证方法来保证其鲁棒性。实验结果表明,TXpredict在各种微生物物种的转录组预测中具有高精度和可扩展性,为微生物基因组学研究带来了重大进展。

🧬TXpredict利用预训练的蛋白质语言模型ESM2,从蛋白质嵌入中提取预测特征,并结合进化原则,实现了对转录组的准确预测。

⏱️ 该框架计算效率高,在标准硬件上仅需22分钟即可完成微生物基因组的转录组预测,解决了传统方法耗时、成本高的问题。

🔬 TXpredict在多种微生物物种中展现出高精度,如对B. hinzii、B. thetaiotaomicron和C. beijerinckii的预测Spearman相关系数分别达到0.64、0.62和0.62,验证了其有效性。

📊 该模型不仅能预测基因表达,还能在4.6k个实验条件下预测条件特异性转录组,平均相关系数达到0.52,揭示了动态调控模式。

🌍 TXpredict的预测范围扩展到276个属的900个额外基因组,涵盖了大量之前未表征的物种,为微生物研究提供了宝贵资源。

Predicting transcriptomes directly from genome sequences is a significant challenge in microbial genomics, particularly for the numerous sequenced microbes that remain unculturable or require complex experimental protocols like RNA-seq. The gap between genomic information and functional understanding leaves us without knowledge of the microbial adaptive processes, survival mechanisms, and gene regulation functions. This must be addressed to make better studies of microbial ecosystems, analysis of non-model organisms, and synthetic biology applications better.

Present techniques of transcriptome profiling are mainly experimental approaches such as RNA sequencing, which is time-consuming, expensive, and usually unsuitable for microorganisms with special growth requirements or those that survive under extreme environments. Computational models on UTRs or long DNA sequences are only partially useful since they can not be easily generalized to all taxonomic groups. Moreover, these methods fail to consider evolutionary constraints relevant to protein synthesis, making them even less useful in predicting transcriptomes of non-model and novel microbial species.

Researchers from the Beijing Institute of Technology and Harvard University propose TXpredict, a transformative framework for transcriptome prediction that utilizes annotated genome sequences. Leveraging a pre-trained protein language model (ESM2) extracts predictive features from protein embeddings while incorporating evolutionary principles. This innovation surmounts limitations on scalability, generalizability, and computational efficiency yet introduces new capabilities such as condition-specific gene expression predictions. Due to its capability to analyze the diversity of microbial taxa, including unculturable species, TXpredict is a significant advancement in microbial genomics.

TXpredict is based on transcriptome data for 22 bacterial and 10 archaeal species, featuring 11.5 million gene expression measurements. The model uses a transformer encoder architecture with multi-head self-attention to capture complex sequence relationships. Inputs include protein embeddings from ESM2 and basic sequence statistics. Model training utilized leave-one-genome-out cross-validation for robust generalization. Condition-specific predictions were also enabled by incorporating 5′ UTR sequences. The framework is computationally efficient, completing transcriptome prediction for a microbial genome within 22 minutes on standard hardware.

TXpredict proved to be very accurate and scalable in the context of transcriptome prediction. It achieved a mean Spearman correlation coefficient of 0.53 for bacterial organisms and 0.42 for archaea and showed significant results for specific species such as B. hinzii (0.64), B. thetaiotaomicron (0.62), and C. beijerinckii (0.62). The predictions were extended to 900 additional genomes representing 276 genera and 3.11 million genes, which covered a large number of previously uncharacterized taxa. In the context of condition-specific transcriptomes, the model showed an average correlation of 0.52 over 4.6k experimental conditions, thereby capturing dynamic regulatory patterns. These results indicate that the framework is capable of giving precise predictions across a wide range of microbial species while keeping computational efficiency in check.

TXpredict addresses critical challenges in microbial genomics by bridging the gap between genome sequences and transcriptome predictions. This method, with the integration of protein embeddings, evolutionary constraints, and features specific to different conditions, presents a scalable, precise, and effective solution for various microbial taxa. This strategy not only yields valuable insights into gene regulation and adaptation but also possesses the potential to enhance synthetic biology and ecological research. Notwithstanding certain limitations, including dependence on pre-existing RNA-seq datasets and the exclusion of non-coding RNA components, TXpredict establishes a foundational framework for innovative applications in the field of microbial research.


Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. Don’t Forget to join our 60k+ ML SubReddit.

FREE UPCOMING AI WEBINAR (JAN 15, 2025): Boost LLM Accuracy with Synthetic Data and Evaluation IntelligenceJoin this webinar to gain actionable insights into boosting LLM model performance and accuracy while safeguarding data privacy.

The post This AI paper from the Beijing Institute of Technology and Harvard Unveils TXpredict for Predicting Microbial Transcriptomes appeared first on MarkTechPost.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

TXpredict 转录组预测 微生物基因组学 蛋白质语言模型 基因表达
相关文章