LessWrong, March 24, 07:02
Tabula Bio: towards a future free of disease (& looking for collaborators)

 

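Tabula Bio aims to address the fact that current genetic models cannot fully explain heritable disease. The post notes that although twin studies show high heritability, existing genetic models have limited explanatory power, which holds back personalized genetic medicine. Tabula Bio lays out a three-part thesis built around machine learning, and argues that a suitable architecture should use unlabeled genomic data, incorporate priors from human expert research, and integrate epigenetic data. The post stresses the importance of novel ML architectures for disease prediction and introduces the team and its initial investors.

🧬 Complexity of heritable disease: Many of the most devastating human diseases are heritable, but current genetic models cannot fully explain that heritability. For example, schizophrenia is about 80% heritable, yet existing models explain only about 9% of the variance.

💡 Machine learning is key: The authors argue that the complexity of the human genome forces a reliance on statistical modeling. Machine learning has already overtaken expert systems in several domains, and they expect the same trend to extend to biology.

🔬 Data and architecture challenges: Because the human genome is huge and labeled biobank datasets are limited, complex models overfit easily. Novel ML architectures are therefore needed: ones that use unlabeled genomic data, incorporate priors from human expert research, and integrate epigenetic data to build more accurate disease prediction models.

🚀 Tabula Bio's approach: Tabula Bio plans to use machine learning to close the gap between heritability and disease prediction. The post also introduces the team and investors.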

Published on March 23, 2025 4:30 PM GMT

Many of the most devastating human diseases are heritable. Whether an individual develops schizophrenia, obesity, diabetes, or autism depends more on their genes (and epigenome) than on any other factor. However, current genetic models do not explain all of this heritability. Twin studies show that schizophrenia, for example, is 80% heritable (over a broad cohort of Americans), but our best genetic model explains only ~9% of variance in cases. To be fair, we selected a dramatic example here (models for other diseases perform far better). Still, the gap between heritability and prediction stands in the way of personalized genetic medicine.

We are launching Tabula Bio to close this gap. We have a three-part thesis on how to approach this. 

1. The path forward is machine learning. The human genome is staggeringly complex. In the 20 years since the Human Genome Project, much progress has been made, but we are still far short of a mechanistic, bottom-up model that would allow anything like disease prediction. Instead, we have to rely on statistical modeling. And statistical methods are winning over expert systems across domains. Expert-system chess AIs have fallen to less-opinionated ML, syntax-aware NLP models were left in the dust by LLMs, and more recently constraint-based robotics is being replaced by pixel-to-control machine learning. We are betting on the same trend extending to biology.

2. The core problem is limited data and large genomes. The human genome contains 3 billion base pairs, while labeled biobank datasets (genomes and disease diagnoses) number in the hundreds of thousands. Complex models thus hopelessly overfit and fail to generalize. Additionally, the human genome is highly repetitive and much of it likely has no relation to phenotype. Because of these problems, we can’t simply train high-parameter black-box models (or treat DNA like language in a language model).

3. Given 1 and 2, novel ML architectures will be required. This is consistent with other breakthroughs in AI. Different problems require different inductive biases. An ML architecture for disease prediction should:
    - Make use of unlabeled genomics data (human and non-human) as well as homogeneous biobank data. Much genetic coding is conserved across species. Ignoring this is leaving data on the table.
    - Include priors from human expert research. It would be wonderful to start from a blank slate (we named the company Tabula for a reason). But, for example, we know that DNA is a 3D molecule and that the distance (in 3D space) between genes and regulatory sequences matters. An architecture needs to start from this premise.
    - Include epigenetic data. We know it’s part of the story (maybe a large part).
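The overfitting concern raised above (far more genomic positions than labeled individuals) can be illustrated with a toy simulation. This is only a sketch under assumed numbers, not a description of Tabula Bio's method: the sample sizes, allele frequency, effect sizes, and the `simulate` and `r_squared` helpers are all hypothetical, and a plain least-squares fit stands in for a "high-parameter model".

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Fraction of variance in y_true explained by y_pred."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


def simulate(n_train=50, n_test=1000, n_variants=500, n_causal=5, seed=0):
    """Fit an unregularized linear model with many more variants than people."""
    rng = np.random.default_rng(seed)
    n_total = n_train + n_test

    # Toy genotypes: 0, 1, or 2 copies of an allele at each variant (assumed frequency 0.3).
    genotypes = rng.binomial(2, 0.3, size=(n_total, n_variants)).astype(float)
    genotypes -= genotypes.mean(axis=0)  # center each variant

    # Only the first n_causal variants influence the toy phenotype; the rest are irrelevant.
    effects = np.zeros(n_variants)
    effects[:n_causal] = 1.0
    phenotype = genotypes @ effects + rng.normal(0.0, 1.0, size=n_total)

    x_train, y_train = genotypes[:n_train], phenotype[:n_train]
    x_test, y_test = genotypes[n_train:], phenotype[n_train:]

    # Minimum-norm least squares: with more variants than individuals,
    # the model has enough freedom to fit the training phenotypes exactly.
    weights, *_ = np.linalg.lstsq(x_train, y_train, rcond=None)

    return (
        r_squared(y_train, x_train @ weights),
        r_squared(y_test, x_test @ weights),
    )


train_r2, test_r2 = simulate()
print(f"train R^2 = {train_r2:.3f}, test R^2 = {test_r2:.3f}")
```

In this setup the model reproduces the training phenotypes almost perfectly yet explains little variance in held-out individuals, which is the failure mode the thesis attributes to naive high-parameter models. Real genomes have a far more extreme ratio of positions to labeled samples than this toy example.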

We’re interested in probabilistic programming as a method to build such a model.  

This is an ambitious idea, and we’re just getting started. But if we look into the future, to a world where humans have closed the heritability gap and personalized genetic medicine has eradicated great swaths of disease, it’s hard to imagine we did not get there via an effort like this.   

Our team is currently Ammon Bartram, who previously cofounded Triplebyte and Socialcam, and Michael Poon, who has spent the past several years working on polygenic screening, was an early employee at Twitch, and studied CS at MIT. We’re honored to have Michael Seibel and Emmett Shear among our initial investors.

Please reach out to us at team@tabulabio.com to help us harden our hypotheses or collaborate on this mission! We're especially interested in talking with ML engineers interested in genomics and bioinformaticians interested in ML.


