Human Microbiome Analysis

Metagenomics approaches in medicine are poised to deliver information of high diagnostic value for infectious diseases and many chronic health conditions. Interest is growing in high-throughput sequence data from new systems like the Illumina HiSeq, which promise extremely efficient sequence production that reduces the cost of large-scale projects. However, the computational challenges of metagenomics projects are among the most daunting in genomic science.

 

Expand human microbiome research with Illumina sequence data

 

An integrated metagenomics pipeline enables human microbiome research with Illumina sequence data.

 

RTG has been at the forefront of developing tools for the many analytical tasks involved in accurately assessing complex microbial environments within the body. For metagenomics, we adapted our core read mapping and alignment technology to support metabolic functional analysis, species frequency estimation and k-mer analysis. The following table summarizes the components of the pipeline.

| RTG Algorithms | Metagenomics Analysis | RTG Innovation |
|---|---|---|
| Map | Map Illumina and/or 454 reads to reference genomes for species search | Novel index structure; high accuracy alignment |
| Species | Estimate frequency of bacterial species in a sample | Statistical analysis of alignment data in large nucleotide databases |
| Mapf | Remove contaminant reads from a sequence dataset | High sensitivity filtering, process simplification |
| Mapx | Query translated nucleotide sequence data against protein databases | 1000X acceleration versus standard tools |
| Similarity | Produce a similarity matrix and nearest neighbor tree | Novel technique for large-scale molecular comparisons |
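
To make the idea of a sample-to-sample similarity matrix concrete, here is a minimal Python sketch. It is not RTG's method, which the table describes only as a novel technique; it assumes a plain Jaccard comparison of k-mer sets as a stand-in. The function names, the choice of k=4 and the toy reads are all hypothetical and exist only for illustration.

```python
from itertools import combinations


def kmer_set(sequences, k):
    """Collect the distinct k-mers that occur in a list of reads."""
    kmers = set()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmers.add(seq[i:i + k])
    return kmers


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 1.0 when both are empty."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def similarity_matrix(samples, k=4):
    """Return (names, matrix) of pairwise Jaccard similarities between samples."""
    names = sorted(samples)
    profiles = {name: kmer_set(samples[name], k) for name in names}
    n = len(names)
    # Start from all ones so the diagonal (self-similarity) is already set.
    matrix = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        score = jaccard(profiles[names[i]], profiles[names[j]])
        matrix[i][j] = matrix[j][i] = score
    return names, matrix


# Hypothetical toy reads, for illustration only.
samples = {
    "gut": ["ACGTACGTAC", "TTGACCA"],
    "oral": ["ACGTACGTTT", "TTGACCA"],
    "skin": ["GGGGCCCCGG"],
}
names, matrix = similarity_matrix(samples, k=4)
for name, row in zip(names, matrix):
    print(name, [round(v, 2) for v in row])
```

A matrix like this could then be converted to distances (for example, 1 minus similarity) and passed to a standard clustering routine to build a nearest neighbor tree. Real metagenomic datasets would need much larger k and a memory-efficient k-mer index rather than in-memory Python sets.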

We produce thousands of metagenomics data sets in a few months, and it would take a decade or more to analyze these on a large cluster with BLASTX and other existing software. Real Time Genomics proved to us that RTG software was a reliable, accurate alternative to these programs that could reduce processing time by two orders of magnitude. This allows us to perform analysis on a comparable time frame as data production.

- George Weinstock, Associate Director