少点错误 2024年09月24日
Predicting Influenza Abundance in Wastewater Metagenomic Sequencing Data
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

探讨废水宏基因组测序在病原体检测中的应用,分析所需测序深度及相关因素,结合流感数据进行研究。

🦠废水宏基因组测序用于病原体检测,RAi(1%)这一统计量用于衡量若过去一周 sewer shed 中 1%的人感染某种病原体,测序读数中该病原体的占比。该研究基于公共测序数据,受新冠影响,部分数据存在局限性。

🌡️2023 - 2024 流感季感染人数显著增加,加州和密苏里州的情况有所不同。密苏里大学和加州大学尔湾分校合作进行废水样本测序,分析了不同地区流感 A 和 B 的情况。

📊使用 v2 宏基因组测序流程分析测序数据,得到人类病毒的相对丰度估计。将每周阳性测试转换为每周发病率,通过贝叶斯模型估计 RAi(1%),不同病毒和测序协议的结果有所差异。

📈针对研究结果,计划更新宏基因组生物监测模拟器,更好地捕捉不确定性,使预测更具方差,对新型病原体检测方法的实用性影响不大。

Published on September 23, 2024 5:25 PM GMT

Thanks to our collaborators at MU and UCI for generating the sequencing data we analyze in this post, to the Grimm et al. (2023) authors for the analysis this builds on, and to Marc Johnson, Mike McLaren, Dan Rice, Will Bradshaw, Simon Grimm, and Evan Fields for feedback on the draft.

One key question in understanding the performance of metagenomic wastewater sequencing for pathogen detection is how deeply you need to sequence to see a pathogen spreading through the population. This depends on many things, but in our preprint Inferring the sensitivity of wastewater metagenomic sequencing for pathogen early detection (Grimm et al. 2023) we introduced a summary statistic RAi(1%): for a given pathogen and sequencing approach, what fraction of sequencing reads will be from the pathogen if 1% of people in the sewershed became infected in the past week?

The analysis in our preprint was based only on public sequencing data, and unfortunately most of it was generated from samples collected during the COVID-19 pandemic, which heavily suppressed most infectious diseases. For influenza in particular, CDC writes that "activity was unusually low throughout the 2020-2021 season", and correspondingly we didn't identify any sequencing reads from flu. This meant we were only able to use the data to generate an upper bound for RAi(1%). [1]

With society returning to pre-pandemic interaction levels, however, the 2023-2024 flu season did see a significant number of infections. This is visible in the following chart, showing CDC's count of positive tests for influenza in California and Missouri:

Two things to note about this chart:

We're interested in Missouri and California in particular because we've been collaborating with Marc Johnson and Clayton Rushford at the University of Missouri (MU) and Jason Rothman, then in Katrine Whiteson's lab at the University of California, Irvine (UCI) now a professor at the University of California, Riverside. They processed and sequenced wastewater influent, with the MU group sampling in central Missouri and the UCI group sequenced in southern California. The two groups each used their own metagenomic sequencing protocol to generate a combined 102B read pairs from samples collected between December 2023 and August 2024, which covers much of the 2023-2024 flu season. We limit our analysis here to the 79B read pairs from samples collected before May 2024, however, for three reasons:

    We suspect people are much less likely to get tested for influenza outside core flu season, making conclusions drawn from test rates less reliable.

    We see occasional influenza A reads in May and beyond, but positive tests fall to zero. This causes a problem when we try to correlate reads with tests because simultaneously observing non-zero reads and zero tests leads our model to overestimate sensitivity.

    Outside of core flu season the proportion of influenza in wastewater that is from animal sources is likely higher.

We analyzed the sequencing data with our v2 metagenomic sequencing pipeline, which uses BBDuk to quickly identify potential human viral reads, then validates them with Kraken2 and Bowtie2. This gives us per-sample estimates of relative abundance [2] for each human virus of interest. Here is the relative abundance we see in the sequencing data plotted against per-capita positive tests reported to CDC:

As you'd expect, we see more influenza in the wastewater at times when there are more positive tests, though the correlation isn't great. Because statewide positive tests are only a rough proxy for incidence in the specific monitored sewersheds it would be tempting to assume the inaccuracy of the proxy explains all of the difference, but it's likely that much of the variability here is noise in the metagenomic sequencing process.

One thing to notice is that CDC observed many more samples testing positive for flu A relative to flu B in California (where UCI collected samples), while in Missouri (where MU collected samples) they saw an approximately even split. This means the MU data can give us a good view of both Flu A and Flu B, while the UCI data is mostly useful for understanding Flu A sensitivity.

To get a more quantitative sense of the relationship between incidence and relative abundance we applied the methods from our preprint to this new data to estimate RAi(1%), the relative abundance we'd expect to see at 1% weekly incidence. We convert weekly positive tests to weekly incidence via an underreporting factor we derive in our preprint from CDC's flu burden estimates, and then compare the incidence with relative abundance using a Bayesian model. The model gives a posterior probability distribution for the RAi(1%) for each virus and sequencing protocol. Here's the Rothman et al. (2021) estimate for SARS-CoV-2 from our preprint, along with estimates for influenza A and B that we generate here for the unpublished MU and UCI data:

For example, our median estimate (vertical bars) of the influenza A RAi(1%) with MU sequencing (MU Flu A) is 1.4e-8. This is 23% of our preprint's median estimate of 6.0e-8 for SARS-CoV-2 with Rothman et al (2021) sequencing. Influenza B with MU sequencing has a similar median but more uncertainty, due a combination of fewer observations and worse correlation. The UCI data shows substantially lower relative abundance for influenza A (5% of the preprint's 6e-8), and lower again for influenza B (1% of the preprint's 6e-8).

In response to this work we plan to update our metagenomic biosurveillance simulator in two ways:

Overall we expect these changes to make our projections higher variance and somewhat less optimistic, but not to have a large impact on whether this approach to novel pathogen detection is practical. As we ramp up our sequencing going into the 2024-2025 infectious disease season we expect to be able to reduce this uncertainty further.


  1. When we don’t observe any reads for a virus, our method can rule out veryhigh—but not very low—RAi(1%). This upper bound decreases when there is highincidence in the community and when we have more total reads. ↩︎

  2. We use this term to mean the number of high-quality unique read pairsassigned to a virus, as a fraction of all raw demultiplexed pairs produced bythe sequencer ↩︎



Discuss

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

废水测序 病原体检测 流感 RAi(1%) 生物监测
相关文章