Metagenomics Data Production on the Human Microbiome Project (HMP) 

Investigators at The Genome Center at Washington University faced enormous data production and analysis challenges when working with short-read NGS data on the NIH-funded Human Microbiome Project (HMP). An integrated pipeline of efficient analysis tools for filtering, alignment, and search addressed the data production problem of this large metagenomics research project.

 
Metagenomics pipeline - HMP data production
 
The RTG metagenomics pipeline is shown here. In summary, RTG mapf removes contaminating human reads from an original HMP microbiome sample. RTG mapx, similar to NCBI BLASTX, performs translated search of protein databases to analyze the functional and metabolic potential of the microbial community. RTG mapx can be tuned to run 100 to 1000 times faster than BLASTX, with 99.95% of the reported proteins being exactly the same at 70% sequence similarity or higher. RTG phylogeny enables community vs. community analysis. RTG map aligns reads to bacterial genomes, and RTG coverage then enables further analysis of species distribution.
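
To make the idea of translated search more concrete, the sketch below shows the six-frame translation step that BLASTX-style tools conceptually perform before comparing a nucleotide read against a protein database. It is an illustration only and is not RTG's or NCBI's implementation; the function names (reverse_complement, translate, six_frame_translations) and the example read are hypothetical, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(read):
    """Translate a read in three forward and three reverse-strand frames."""
    read = read.upper()
    frames = {}
    for strand, seq in (("+", read), ("-", reverse_complement(read))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    # Hypothetical 9-base read, used only to show the output shape.
    for frame, peptide in six_frame_translations("ATGGCCTAA").items():
        print(frame, peptide)
```

Each of the six peptides would then be searched against the protein database. Production tools combine this step with indexing and similarity scoring, which are omitted here.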