Metagenomics - High-throughput Metagenomic Analysis
Metagenomics experiments that use high-throughput Illumina sequencing, whether for human microbiome analysis or environmental genomics, require analysis software with fast translated nucleotide search, k-mer analysis, and bacterial species estimation. A poster on high-throughput metagenomics is also available.
Figure: Mapx attains orders-of-magnitude speedup over BLASTX on translated nucleotide searches.
20M Illumina reads (100 bp) were queried against the KEGG database (1.6M entries). RTG mapx found 95% of the top hits reported by BLASTX above 70% sequence identity.
Translated Nucleotide Search
The RTG mapx command queries translated nucleotide sequences from Illumina-type reads against protein databases such as nr or KEGG and computes the statistical significance of the resulting matches. The mapx alignment is semi-global (a short read matched across a large database subject), in contrast to the local alignment approach that BLASTX uses to find short regions of high similarity. The mapx data structure and the heuristics employed during the search phase reduce the enormous computational requirements of translated searches. This acceleration over BLASTX lets investigators run computationally intensive search jobs on modest hardware, and it makes metabolic profiling and functional annotation by mapx search practical. Another large-scale task that a mapx-based pipeline can perform is the identification of metabolic pathways present in metagenomic communities. In a test with 100 bp Illumina reads, mapx returned search results identical to those of BLASTX down to 60% homology, at speeds 100-1000X faster.
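To make the term "translated nucleotide search" concrete, the sketch below shows the six-frame translation step that any such search needs before a read can be compared with protein sequences. It is a minimal illustration that assumes the standard genetic code; it is not RTG's mapx implementation, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, listed in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; codons with unknown bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(read):
    """Hypothetical helper: translate a read in all six reading frames."""
    read = read.upper()
    frames = {}
    for strand, seq in (("+", read), ("-", reverse_complement(read))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


frames = six_frame_translations("ATGGCCTAA")
print(frames["+1"])  # forward strand, first frame
```

A 100 bp read yields six peptides of roughly 33 residues each, so a translated search does about six times the comparison work of a direct protein search for every read, which is why search-phase heuristics matter at this scale.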
Molecular Phylogenetics
RTG's similarity tool uses a proprietary hash table data structure for sequence composition-based molecular phylogenetics. The technique has several immediate applications in diagnostics and metagenomics. Read datasets (or smaller, representative samples) from a collection of metagenomics samples can be rapidly analyzed to detect sample relatedness and to identify groupings based solely on sequence content, without bias from existing sequence databases. Newly discovered community relationships, which highlight species or functional relatedness, can be explored further with additional tools from the RTG metagenomics analysis pipeline. The RTG similarity command uses a novel technique for k-mer analysis that produces a similarity matrix and a nearest neighbor tree from high-throughput sequence data. These data can be plotted as trees or clustergrams and are useful for molecular phylogenetics analyses.
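The idea of comparing samples by k-mer content alone can be sketched with a simple stand-in. The example below builds a pairwise similarity matrix from the Jaccard index of each sample's k-mer set. The Jaccard measure, the choice of k, and all function names are assumptions made for illustration; RTG's similarity command relies on its own proprietary technique, which is not reproduced here.

```python
def kmer_set(reads, k):
    """Collect the distinct k-mers found in a list of reads."""
    kmers = set()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    return kmers


def jaccard(a, b):
    """Jaccard index of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_matrix(samples, k=4):
    """Hypothetical helper: pairwise k-mer similarity between samples.

    `samples` maps a sample name to its list of reads. Returns the sorted
    sample names and a square matrix in the same order.
    """
    names = sorted(samples)
    sets = {name: kmer_set(samples[name], k) for name in names}
    matrix = [[jaccard(sets[a], sets[b]) for b in names] for a in names]
    return names, matrix


samples = {
    "gut_1": ["ACGTACGT"],
    "gut_2": ["ACGTACGT"],
    "soil_1": ["TTTTTTTT"],
}
names, matrix = similarity_matrix(samples, k=4)
for name, row in zip(names, matrix):
    print(name, row)
```

A matrix like this is the usual input to hierarchical clustering or neighbor-joining, which is how a tree or clustergram can be drawn without consulting any reference database.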
Reference Genome Database Mapping
Designed as a foundational method for short read mapping, RTG's map algorithm is well suited to metagenomics data processing and analysis. In most investigations, metagenomics samples are first characterized by mapping read datasets to nucleotide sequence databases, typically microbial and viral genome collections that contain thousands of sequences. The RTG aligner provides maximal sensitivity and high mismatch tolerance, which enables discrimination of closely related species and detection of remote homologs. The alignment score or percent identity threshold can be adjusted to the desired cutoff before species presence or absence in a sample is determined. RTG's mapping algorithm optionally reports ambiguously mapped reads, which is important when a high level of sequence similarity is found across microbial phyla.
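The effect of a percent identity cutoff on mapped reads can be illustrated with a short post-processing sketch. The record layout and function names below are hypothetical and do not correspond to RTG's output format or options; the example only shows how a threshold decides which alignments count toward species presence.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    """Hypothetical alignment record, not an RTG output format."""
    read_id: str
    reference_id: str
    matches: int          # aligned positions where read and reference agree
    aligned_length: int   # total aligned positions, including mismatches


def percent_identity(aln):
    """Percentage of aligned positions that match."""
    if aln.aligned_length == 0:
        return 0.0
    return 100.0 * aln.matches / aln.aligned_length


def filter_by_identity(alignments, min_identity):
    """Keep only alignments at or above the chosen identity cutoff."""
    return [a for a in alignments if percent_identity(a) >= min_identity]


alignments = [
    Alignment("read1", "genomeA", 95, 100),
    Alignment("read2", "genomeA", 80, 100),
    Alignment("read3", "genomeB", 99, 100),
]
kept = filter_by_identity(alignments, min_identity=90.0)
print(sorted({a.reference_id for a in kept}))  # references still supported
```

Raising the cutoff favors discrimination between closely related species, while lowering it favors detection of more distant relatives.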
Species Frequency and Composition
Two main tasks of metagenomics are to identify the spectrum of microbial species present in a complex community and to quantify the abundance of each species. RTG's species command, working from the baseline alignment data produced by reference genome mapping, reports breadth and depth of coverage per genome. Unique to RTG, the species frequency estimates are reported both with respect to a static database and as the fraction of DNA that each species contributes to the sample.
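Breadth and depth of coverage have simple definitions that a short sketch can make explicit: breadth is the fraction of genome positions covered by at least one read, and depth is the average number of reads over each position. The code below is a naive illustration under those definitions, with hypothetical function names and 0-based half-open intervals; it is not the estimator used by RTG's species command.

```python
def coverage_stats(genome_length, intervals):
    """Return (breadth, mean depth) for one genome.

    `intervals` holds (start, end) alignment positions, 0-based and
    half-open, already restricted to this genome.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    depth = [0] * genome_length
    for start, end in intervals:
        for pos in range(max(start, 0), min(end, genome_length)):
            depth[pos] += 1
    covered = sum(1 for d in depth if d > 0)
    return covered / genome_length, sum(depth) / genome_length


def dna_fractions(mapped_bases):
    """Naive share of mapped bases per species (assumed proxy for DNA fraction)."""
    total = sum(mapped_bases.values())
    if total == 0:
        return {name: 0.0 for name in mapped_bases}
    return {name: count / total for name, count in mapped_bases.items()}


breadth, mean_depth = coverage_stats(10, [(0, 5), (3, 8)])
print(breadth, mean_depth)
print(dna_fractions({"species_a": 30, "species_b": 10}))
```

Breadth helps separate a genome that is truly present (reads spread across it) from one that attracts reads only at a few conserved regions, which a depth figure alone cannot show.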
Contaminant Filter
Before downstream analysis, an experimental protocol may require removal of reads from the query dataset that map to one or more identifiable species. Built around the RTG aligner, the mapf command filters metagenomics sample reads by mapping them against a reference template and produces a cleaned dataset for follow-on analysis. Tunable, high-sensitivity mapping ensures nearly complete removal of contaminants.
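The filtering step itself reduces to a set difference once the mapping is done: any read that aligned to the contaminant reference is dropped and the rest are kept. The sketch below assumes the set of mapped read IDs has already been obtained from an aligner; the function name and data layout are hypothetical and are not part of mapf.

```python
def remove_contaminants(reads, contaminant_read_ids):
    """Return the reads whose IDs did not map to the contaminant reference.

    `reads` maps read ID to sequence; `contaminant_read_ids` is the set of
    IDs that an aligner (not shown) mapped to the contaminant template.
    """
    return {
        read_id: seq
        for read_id, seq in reads.items()
        if read_id not in contaminant_read_ids
    }


reads = {"read1": "ACGT", "read2": "GGCC", "read3": "TTAA"}
cleaned = remove_contaminants(reads, contaminant_read_ids={"read2"})
print(sorted(cleaned))
```

The sensitivity of the upstream mapping determines how complete the removal is, since a contaminant read that fails to map stays in the cleaned dataset.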






