Metagenomic source tracking after microbiota transplant therapy
Abstract
Reliable assessment of the engraftment of donor microbial communities and individual strains is an essential component of characterizing the pharmacokinetics of microbiota transplant therapies (MTTs). Recent methods for measuring donor engraftment use whole-genome sequencing and reference databases or metagenome-assembled genomes (MAGs) to track individual bacterial strains, but they cannot disambiguate DNA that matches both donor and patient microbiota. Here, we describe a new, cost-efficient analytic pipeline, MAGEnTa, which compares post-MTT samples to a database composed of MAGs derived directly from donor and pre-treatment metagenomic data, without relying on an external database. The pipeline uses Bayesian statistics to determine the likely sources of ambiguous reads that align to both the donor and pre-treatment samples. MAGEnTa recovers engrafted strains with minimal type II error in a simulated dataset and is robust to shallow sequencing depths in a downsampled dataset. Applying MAGEnTa to a dataset from a recent MTT clinical trial for ulcerative colitis, we found the results to be consistent with 16S rRNA gene SourceTracker analysis but with added MAG-level specificity. MAGEnTa is a powerful tool for studying community and strain engraftment dynamics in the development of MTT-based treatments, and it can be integrated into frameworks for functional and taxonomic analysis.
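To illustrate the general idea of Bayesian assignment of ambiguous reads, the following is a minimal sketch and not the MAGEnTa implementation. It assumes that the prior for each source is estimated from the counts of reads that map uniquely to donor or pre-treatment MAGs, and that an optional per-read likelihood (for example, derived from alignment identity) is available for each source. The function name, its parameters, and the pseudocount are hypothetical and are introduced only for this example.

```python
def donor_posterior(n_donor_unique, n_recipient_unique,
                    lik_donor=1.0, lik_recipient=1.0, pseudocount=1.0):
    """Posterior probability that an ambiguous read originates from the donor.

    Illustrative assumption: priors come from uniquely mapped read counts,
    smoothed with a pseudocount so that neither source has zero prior.
    """
    total = n_donor_unique + n_recipient_unique + 2 * pseudocount
    prior_donor = (n_donor_unique + pseudocount) / total
    prior_recipient = 1.0 - prior_donor

    # Bayes' rule over the two candidate sources.
    numerator = prior_donor * lik_donor
    denominator = numerator + prior_recipient * lik_recipient
    return numerator / denominator


if __name__ == "__main__":
    # Equal unique support and equal likelihoods give no preference.
    print(donor_posterior(5, 5))
    # Unique reads seen only in the donor shift the posterior toward the donor.
    print(donor_posterior(8, 0))
```

Under these assumptions, a read that aligns equally well to both sources is attributed in proportion to the unambiguous evidence for each source, while a read with a better alignment to one source is weighted toward it.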