Multispecies Metagenomic Calibration

Published on June 25, 2025 2:50 AM GMT

Cross-posted from my NAO Notebook.

This is something I wrote internally in late 2022. Sharing it now with light edits, additional context, and updated links after the idea came up at the Microbiology of the Built Environment conference I'm attending this week.

Metagenomic sequencing data is fundamentally relative: each observation is a fraction of all the observations in a sample. If you want to make quantitative observations, however, like understanding whether there's been an increase in the number of people with some infection, you need to calibrate these observations. For example, there could be variation between samples due to:

- Changes in how many humans are contributing to a sample.
- Has it been raining? (Especially in areas with combined sewers, but also a factor with nominally separated sewers.)
- What have people been eating lately?
- What temperature has it been?
- How concentrated is the sample?
- etc.

If you're trying to understand growth patterns, all of this is noise; can we reverse this variation? I'm using "calibration" to refer to this process of going from raw per-sample pathogen read counts to estimates of how much of each pathogen was originally shed into sewage.

The simplest option is not to do any calibration, and just consider raw relative abundance: counts relative to the total number of reads in the sample. For example, this is what Marc Johnson and Dave O'Connor are doing.
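As a minimal sketch of what raw relative abundance means here: the function name, sample labels, and read counts below are hypothetical, chosen only for illustration.

```python
def relative_abundance(pathogen_reads: int, total_reads: int) -> float:
    """Fraction of all reads in a sample that were assigned to the pathogen."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return pathogen_reads / total_reads


# Hypothetical samples: (pathogen reads, total reads in the sample).
samples = {
    "day_1": (120, 1_000_000_000),
    "day_2": (300, 1_000_000_000),
}

abundances = {
    day: relative_abundance(pathogen, total)
    for day, (pathogen, total) in samples.items()
}
```

Note that nothing in this number distinguishes "more infected people" from "more human material in the sample".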

It seems like you ought to be able to do better if you normalize by the number of reads matching some other species humans excrete. It's common to use PMMoV for this: peppers are commonly infected with PMMoV, people eat peppers, people excrete PMMoV. All else being equal, the amount of PMMoV in a sample should be proportional to the human contribution to the sample. This is especially common in PCR work, where you take a PCR measurement of your target, and then present it relative to a PCR measurement of PMMoV. For example, this is what WastewaterSCAN does.
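The single-reference version of this idea can be sketched as a simple ratio. The function name and the read counts are hypothetical; the two samples are constructed so that the human-derived share of reads differs by a factor of two while the pathogen-to-PMMoV ratio stays the same.

```python
def normalized_by_reference(target_reads: int, reference_reads: int) -> float:
    """Target reads expressed relative to a reference species such as PMMoV."""
    if reference_reads <= 0:
        raise ValueError("reference_reads must be positive")
    return target_reads / reference_reads


# Hypothetical counts. Sample B has half the human contribution of sample A,
# so both the pathogen and PMMoV reads are halved.
ratio_a = normalized_by_reference(target_reads=200, reference_reads=40_000)
ratio_b = normalized_by_reference(target_reads=100, reference_reads=20_000)
```

Raw relative abundance would report sample B at half of sample A; the normalized ratios agree, which is the behavior you want if shedding has not actually changed.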

Because the NAO is doing very deep metagenomic sequencing, around 1B read pairs (300 Gbp) per sample, we ought to be able to calibrate against many species at once. PMMoV is commonly excreted, but so are other tobamoviruses, crAssphage, other human gut bacteriophages, human gut bacteria, etc. We pick up thousands of other species, and should be able to combine those measurements to get a much less noisy measurement of the human contribution to a sample.
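One way this combination could work, purely as an illustrative sketch and not something the NAO has implemented: a median-of-ratios estimate, a technique borrowed from count normalization in other sequencing contexts. Each reference species votes on how much each sample deviates from that species' geometric mean across samples, and the per-sample median of those votes is taken as the relative human contribution. The function name, species list, and counts are all assumptions, and the sketch assumes samples sequenced to comparable depth (otherwise pass relative abundances instead of raw counts).

```python
import math
from statistics import median


def human_contribution_factors(counts: dict[str, list[float]]) -> list[float]:
    """Estimate one relative scaling factor per sample from many reference species.

    counts maps species name -> abundance in each sample (same sample order).
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios: list[list[float]] = [[] for _ in range(n_samples)]
    for species, per_sample in counts.items():
        if len(per_sample) != n_samples:
            raise ValueError(f"{species} has the wrong number of samples")
        if any(c <= 0 for c in per_sample):
            continue  # the log is undefined for zeros, so skip this species
        log_geo_mean = sum(math.log(c) for c in per_sample) / n_samples
        for i, c in enumerate(per_sample):
            log_ratios[i].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no species is present in every sample")
    # The median makes the estimate robust to a few species that behave oddly.
    return [math.exp(median(r)) for r in log_ratios]


# Hypothetical counts for two samples; the third species moves the "wrong" way.
reference_counts = {
    "PMMoV": [40_000, 20_000],
    "crAssphage": [90_000, 45_000],
    "other_tobamovirus": [1_000, 2_000],
}
factors = human_contribution_factors(reference_counts)

# Calibrated pathogen signal: reads divided by the estimated human contribution.
target_reads = [200, 100]
calibrated = [t / f for t, f in zip(target_reads, factors)]
```

In this toy example the two well-behaved species outvote the outlier, so the estimated human contribution of the first sample is twice that of the second and the calibrated pathogen signal comes out equal. With thousands of species, the hope is that the same kind of voting averages out species-specific noise such as diet.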

This isn't something the NAO has been able to look into yet, but I still think it's quite promising.

