Illumina Sequence Analysis
RTG Investigator sequence analysis software enables sequence alignment and genome analysis with sequence data from Illumina HiSeq and GAII instruments. RTG Investigator supports single-end and paired-end read data from Illumina systems for read lengths from 30 bp to 150 bp and beyond. A unique feature of RTG Investigator’s mapping technology is that it performs mapping, gapped alignment and read pair mating in a single pass. While RTG Investigator’s core mapping engine can process reads from different platforms, it has been highly tuned to map and align the fixed-length reads produced by Illumina sequencers at speeds unsurpassed by any other mapping technology. As a result, mapping exercises that may take days with other mapping tools can be completed in hours with RTG Investigator.
Fast sensitive mapping technology
RTG's unique hash table index structure outperforms even the fastest algorithms built on trie structures (BWA and Bowtie). When tested with real data from the 1000 Genomes Project, the RTG map command ran 6.7x faster than BWA when tuned to map 93% of 108 bp reads accurately.
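To illustrate the general idea behind hash-based read mapping, here is a minimal sketch of a k-mer hash index with position voting. It is a toy model of the technique and not RTG's actual index structure; the function names `build_index` and `candidate_positions`, the reference string and the choice of k are all assumptions made for this example.

```python
from collections import defaultdict


def build_index(reference: str, k: int) -> dict:
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def candidate_positions(read: str, index: dict, k: int) -> dict:
    """Count how many k-mers of the read support each implied read start."""
    votes = defaultdict(int)
    for offset in range(len(read) - k + 1):
        # A k-mer hit at reference position `pos` implies the read starts
        # at `pos - offset` on the reference.
        for pos in index.get(read[offset:offset + k], ()):
            votes[pos - offset] += 1
    return dict(votes)


# Hypothetical toy data: a short reference and a read cut from position 8.
reference = "ACGTTGCAAGGCTTACGATCGGATCCTA"
index = build_index(reference, k=5)
read = reference[8:20]
votes = candidate_positions(read, index, k=5)
best_start = max(votes, key=votes.get)
print(best_start, votes[best_start])
```

Each lookup is a constant-time dictionary access, which is the property that makes hash indexes attractive for mapping. A production mapper would follow the voting step with gapped alignment at the best candidate positions.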
RTG’s mapping engine has higher tolerance to indels, errors and mutations, producing more accurate data for downstream analysis. In tests with simulated 100bp reads over a typical range of error rates, RTG correctly mapped over 99% of reads at 0.5% and 1.0%. At 2.0% error the mapping percentage was still over 97%, and at 5.0% error, over 94%. This sensitivity can be used in novel studies with emerging sequencing platforms or cross-species mapping.
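For a sense of what these error rates mean per read: at a 2.0% per-base error rate, a 100 bp read carries two errors on average, and at 5.0% it carries five. The sketch below shows one simple way such reads could be simulated. It models substitution errors only (no indels) and is not the procedure used in the tests described above; `add_substitution_errors` and `mismatches` are hypothetical helper names.

```python
import random


def add_substitution_errors(read: str, error_rate: float, rng: random.Random) -> str:
    """Replace each base with a different base with probability error_rate."""
    out = []
    for base in read:
        if rng.random() < error_rate:
            # Pick one of the three other bases so the position always changes.
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


rng = random.Random(42)  # fixed seed so repeated runs give the same reads
read = "".join(rng.choice("ACGT") for _ in range(100))
for rate in (0.005, 0.01, 0.02, 0.05):
    noisy = add_substitution_errors(read, rate, rng)
    print(f"{rate:.1%} error rate: {mismatches(read, noisy)} mismatches in {len(read)} bases")
```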
Variant detection analysis
RTG integrates multiple elements of a comprehensive variant calling pipeline, including paired end alignment, quality recalibration steps, and local realignments, into two functions: map and snp. Fewer steps make for a shorter learning curve and 10x faster execution time. With RTG, investigators quickly uncover more variants, and increase confidence in those that are shared.
Metagenomic analysis
An extensive set of integrated analysis functions from RTG enables rigorous metagenomic analysis with high-throughput sequence data from Illumina sequencing systems. Investigators can combine accurate contaminant filtering, translated protein search, similarity clustering, and species estimation functions in one pipeline that is fast enough to process large volumes of data.
For this pipeline, the core RTG mapping engine has been wrapped and adapted to provide a contaminant filter tool, and a translated protein search tool.
Contaminant filter
The contaminant filter, called mapf, is used to remove reads that map to an identifiable species. High-sensitivity mapping ensures nearly complete removal of contaminants. Reads are mapped against a contaminant reference sequence, and two datasets are produced as outputs: the filtered reads and the contaminant reads. This adaptation of the RTG mapper provides a one-step solution to contaminant filtering and is less complex than the multi-step process required when using other tools.
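The two-output behavior described above can be pictured as a simple partition of the input reads. The following sketch assumes the mapping step has already produced the set of read IDs that hit the contaminant reference; `partition_reads` and the sample data are hypothetical and do not reflect mapf's actual interface or file formats.

```python
def partition_reads(reads: dict, contaminant_hits: set) -> tuple:
    """Split reads into (filtered, contaminant) by whether their ID mapped."""
    filtered = {}
    contaminant = {}
    for read_id, sequence in reads.items():
        if read_id in contaminant_hits:
            contaminant[read_id] = sequence
        else:
            filtered[read_id] = sequence
    return filtered, contaminant


# Hypothetical input: three reads, one of which mapped to the contaminant reference.
reads = {"r1": "ACGTACGT", "r2": "TTGGCCAA", "r3": "GATTACAG"}
contaminant_hits = {"r2"}
filtered, contaminant = partition_reads(reads, contaminant_hits)
print(sorted(filtered), sorted(contaminant))
```

Every input read lands in exactly one of the two outputs, so the filtered set can be passed directly to downstream analysis.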
Translated nucleotide search
The protein search tool, called mapx, performs translated nucleotide sequence searches against a protein database. It has been tuned for high-throughput processing of fixed-length Illumina reads, achieving speedups of 100x to 1000x compared to BLASTX. The output from mapx has been modeled on BLASTX’s output format, meaning that mapx can easily be dropped in as a replacement.
The RTG mapx command allows investigators to use the latest instruments, such as the Illumina HiSeq, to analyze and identify active metabolic pathways present in metagenomic communities. In a test with 100 bp Illumina reads, mapx ran 1000 times faster than BLASTX and produced identical matches down to 60% homology. This increased throughput allows investigators to process large quantities of data in a short period of time, giving them more results on which to perform detailed statistical analysis.
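A translated search compares a nucleotide read against protein sequences by first translating the read in all six reading frames (three offsets on each strand). The sketch below shows that translation step using the standard genetic code. It is a generic illustration of the concept, not mapx's implementation; the function names are chosen for this example.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translations(read: str) -> dict:
    """Translate a read in frames +1..+3 and -1..-3."""
    read = read.upper()
    frames = {}
    for strand, seq in (("+", read), ("-", reverse_complement(read))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for frame, protein in six_frame_translations("ATGGCCTAA").items():
    print(frame, protein)
```

Each of the six translated frames can then be compared against the protein database, which is why a translated search does roughly six times the comparison work of a direct protein search per read.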
A typical application of the combined use of mapf and mapx is to process a metagenomic sample in the form of a lane of Illumina HiSeq 100 bp paired-end reads. The first step is to filter out contaminant reads using RTG mapf. This contaminant filtering removes unwanted donor sequences that may interfere with protein hit counts. The filtered reads are then processed with mapx to produce information about the distribution, abundance and homology of protein hits from the metagenomic sample.
- George Weinstock, Associate Director

