RTG Investigator

RTG Investigator bioinformatics software applies highly sensitive sequence alignment to deliver the most accurate results in downstream variant and metagenomic analysis. Its plug-and-play bioinformatics tools enable researchers to deploy efficient, effective analysis pipelines in just days.

Fast, comprehensive pipelines allow quick comparison of data from multiple sources, reducing false positives and increasing confidence in your results.

Sensitive Sequence Alignment

Accurately align more short read sequences against a reference

Accurate Variant Analysis

Identify more true positives and reduce lab validation costs

Fast Protein Search

Fast search enables metabolic profiling with metagenomics data

Unparalleled speed and sensitivity for variant detection and metagenomic analysis

Product Value

Streamlined Analysis Pipelines

Start work immediately

Integrated pipelines for variant detection and metagenomics jump-start research with high-throughput sequence data.

Sensitive Sequence Alignment

Indel-tolerant mapping

Industry-leading sensitivity and precision, coupled with flexible tuning, enable highly accurate downstream analysis.

Fast Sequence Search

See results and iterate

Highly efficient algorithms make RTG the fastest tool for processing sequence data, leaving time for iteration and comparison.

Complete Genomics

Get more from your data

The ability to map and align read data on in-house systems gives additional insight and extends the value of an investment in Complete Genomics data.

Easy-to-integrate Functions

Free up staff to innovate

Consistent command parameters, standard file formats, and extensive filtering and reporting options simplify pipeline operations.

Simple, Stable Deployment

Run anywhere

The tested, documented product scales from a single workstation to large Linux or Windows compute clusters.

Specifications:

SEQUENCE ALIGNMENT

Read Mapping (map command)

Features
  • High-speed gapped alignment algorithm
  • Integrated search, pairing and alignment operations
  • Read indexing with word and step settings
  • Optional limits on repetitive sequence frequency
  • Base quality recalibration file for variant calling
Performance
  • Throughput capability of >10M reads/hr/core
  • Handles single-end and paired-end read data
  • Multi-core utilization
  • Search settings for indel number and gap length
  • Find distant homologies in cross-species mapping
Input
  • FASTA or FASTQ files
  • No reference size restriction
Output
  • SAM format (sorted for variant calling, indexed for region-specific search)
  • Optional reporting of ambiguously mapped reads
  • Statistics on mapping performance
  • Optional files for unmapped reads
Reporting
  • All hits, top equals, top random
  • Optional scoring of Ns as match/mismatch
  • Settings for alignment score thresholds
  • Settings for insert size window
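
The word and step settings above control how reads are indexed for the initial search. The sketch below illustrates the general idea of a sparse word (k-mer) index under stated assumptions: words are sampled from each read every `step` bases, every reference position is scanned, and words that occur more than `max_repeat` times in the reference are skipped as repetitive. The function names and the repeat rule are hypothetical and are not RTG's implementation or parameters.

```python
from collections import defaultdict


def index_reads(reads, word, step):
    """Build a word index over the reads, sampling one word every `step` bases."""
    index = defaultdict(list)
    for read_id, seq in enumerate(reads):
        for offset in range(0, len(seq) - word + 1, step):
            index[seq[offset:offset + word]].append((read_id, offset))
    return index


def scan_reference(reference, index, word, max_repeat):
    """Scan every reference position and collect implied read start positions.

    Words occurring more than `max_repeat` times in the reference are skipped
    (a hypothetical stand-in for a repetitive-sequence frequency limit).
    """
    ref_positions = defaultdict(list)
    for pos in range(len(reference) - word + 1):
        w = reference[pos:pos + word]
        if w in index:
            ref_positions[w].append(pos)
    candidates = defaultdict(set)
    for w, positions in ref_positions.items():
        if len(positions) > max_repeat:
            continue  # too repetitive to be informative
        for read_id, offset in index[w]:
            for pos in positions:
                # A read word at `offset` matching at `pos` implies the read starts at pos - offset.
                candidates[read_id].add(pos - offset)
    return dict(candidates)
```

For example, `scan_reference("ACGTACGTTTGCA", index_reads(["GTTTGC"], 3, 2), 3, max_repeat=10)` returns `{0: {6}}`. In a real mapper, candidate positions like these would then be passed to a gapped alignment step.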

Protein Database Search (mapx command)

  • Performs translated nucleotide sequence searches against protein databases
  • Allows sensitivity tuning for gap and mismatch tolerance
  • Accepts single-end read data
  • Multi-core utilization
  • Reports results in a format modeled after BLASTX
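
Translated search starts by converting each nucleotide read into protein sequence. A minimal sketch, assuming the standard genetic code and reads made of A/C/G/T/N characters, translates all six reading frames (three forward, three on the reverse complement); codons containing N become `X`. The helper names are hypothetical, and the protein alignment and BLASTX-style reporting performed by mapx are outside this sketch.

```python
BASES = "TCAG"
# Standard genetic code, listed in TCAG order for first, second and third codon positions.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; codons containing N or other symbols become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(read):
    """Return translations of the three forward and three reverse-complement frames."""
    rc = reverse_complement(read)
    frames = {}
    for shift in range(3):
        frames[f"+{shift + 1}"] = translate(read[shift:])
        frames[f"-{shift + 1}"] = translate(rc[shift:])
    return frames
```

Each frame's protein string would then be aligned against the protein database with gap and mismatch tolerances.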

Contaminant Filtering (mapf command)

  • Maps a read dataset against a contaminant template
  • Produces filtered and contaminant read datasets as output
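
To illustrate the split into filtered and contaminant datasets, the sketch below flags a read as contaminant when it shares at least `min_shared` distinct k-mers with the template on either strand. This exact k-mer test is a deliberately simplified stand-in for real mapping against the template, and `filter_contaminants`, `k` and `min_shared` are hypothetical names.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    """Distinct k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def filter_contaminants(reads, contaminant, k, min_shared):
    """Split reads into (filtered, contaminated) lists by k-mers shared with the template."""
    # Include both strands of the template so reverse-strand reads are caught too.
    template_words = kmers(contaminant, k) | kmers(reverse_complement(contaminant), k)
    filtered, contaminated = [], []
    for read in reads:
        shared = len(kmers(read, k) & template_words)
        (contaminated if shared >= min_shared else filtered).append(read)
    return filtered, contaminated
```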

Supported Data Types

  • Imports data in FASTA or FASTQ format
  • Interprets Solexa-scaled quality scores during input formatting
  • SE and PE reads from Illumina Genome Analyzer II systems, read lengths of 35 bp and higher
  • SE and PE reads from Ion PGM systems, fixed and variable read lengths
  • SE and PE reads from 454 Titanium systems, read lengths of 35 bp and higher
  • PE reads from Complete Genomics (cgmap command)
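
Solexa-scaled quality scores are not Phred scores: Solexa defines Q = -10 log10(p / (1 - p)) for error probability p, while Phred defines Q = -10 log10(p). The two agree closely at high quality but diverge at low values. One standard conversion is Q_phred = 10 log10(10^(Q_solexa / 10) + 1), sketched below for quality strings assumed to use an ASCII offset of 64; the function names are hypothetical and RTG's internal handling is not shown.

```python
import math


def solexa_to_phred(q_solexa):
    """Convert one Solexa-scaled quality value to the nearest integer Phred value."""
    return round(10 * math.log10(10 ** (q_solexa / 10) + 1))


def decode_solexa_qualities(quality_line, offset=64):
    """Decode a FASTQ quality string (assumed ASCII offset 64) into Phred scores."""
    return [solexa_to_phred(ord(ch) - offset) for ch in quality_line]
```

For example, a Solexa score of 0 (character `@`) becomes Phred 3, while a Solexa score of 10 (character `J`) stays at Phred 10.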

Data Management Utilities

  • Stores sequences in compressed SDF directories if required, for efficient access to parsed sequence data
  • Provides SDF and SAM datafile statistics
  • Commands can process portions of an SDF, enabling parallel processing
  • Input, temporary and output file locations can be specified separately, for better I/O utilization
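
Processing portions of an SDF in parallel requires dividing its sequences into ranges. A minimal sketch, assuming sequences are addressed by consecutive integer ids, splits them into contiguous, near-equal `[start, end)` ranges that separate jobs could each process. The `partition` function is hypothetical and does not reflect RTG's actual command-line options.

```python
def partition(num_sequences, num_chunks):
    """Split ids 0..num_sequences-1 into num_chunks contiguous [start, end) ranges."""
    if num_chunks < 1:
        raise ValueError("num_chunks must be at least 1")
    base, extra = divmod(num_sequences, num_chunks)
    ranges, start = [], 0
    for i in range(num_chunks):
        # The first `extra` chunks take one additional sequence each.
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges
```

For example, `partition(10, 3)` gives `[(0, 4), (4, 7), (7, 10)]`; each range could be handed to a separate process or cluster node.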

Command Line Interface (CLI)

  • Executes functions as commands launched from a single executable
  • Uses consistent and memorable parameter structure
  • Produces reports as ASCII files in TSV format

VARIANT ANALYSIS

Variant Calling (snp command)

Features
  • Pedigree-aware family variant calling (mother, father, and any number of sons and daughters)
  • Somatic variant calling (tumor, matched normal)
  • End-to-end handling of sex chromosomes
  • Haploid calling
  • Optimized Bayesian technique for SNP and genotype calling
  • Integrated realignment algorithm for indels
  • Automated base quality recalibration
  • Genomic ambiguity threshold
  • Discovery and delineation of complex genomic loci
Performance
  • Calls SNPs, MNPs and indels (up to 10 bp)
  • Multi-platform aware SNP calling (combines data from multiple sequencing platforms)
  • Set coverage, alignment score and mating thresholds
Input
  • Alignment data in SAM or BAM format
Output
  • VCF file format
  • Statistics for coverage, error correction and base counts
  • Computed posterior scores
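
The Bayesian genotype calling listed above can be illustrated with a textbook single-site model rather than RTG's optimized method. The sketch assumes a diploid sample, a symmetric per-base error model derived from Phred qualities, and hypothetical prior weights that favor the reference genotype; it multiplies each genotype's prior by the likelihood of the observed bases and normalizes to posteriors. Pedigree, somatic, and indel handling are not modeled, and all names are hypothetical.

```python
import itertools
import math


def base_likelihood(observed, allele, phred):
    """P(observed base | true allele) under a simple symmetric error model."""
    err = 10 ** (-phred / 10)
    return 1 - err if observed == allele else err / 3


def genotype_posteriors(pileup, ref, het_weight=1e-3, hom_alt_weight=5e-4):
    """Posterior probability of each diploid genotype at one site.

    pileup: list of (base, phred_quality) pairs from reads covering the site.
    Prior weights (hypothetical): 1 for the reference homozygote, het_weight for
    one non-reference allele, hom_alt_weight for two. Normalization makes the
    absolute scale of the weights irrelevant.
    """
    log_scores = {}
    for a, b in itertools.combinations_with_replacement("ACGT", 2):
        non_ref = (a != ref) + (b != ref)
        prior = (1.0, het_weight, hom_alt_weight)[non_ref]
        # Each read is drawn from either chromosome with probability 0.5.
        log_lik = sum(
            math.log(0.5 * base_likelihood(base, a, q) + 0.5 * base_likelihood(base, b, q))
            for base, q in pileup
        )
        log_scores[a + b] = math.log(prior) + log_lik
    top = max(log_scores.values())  # subtract the maximum to avoid underflow
    weights = {g: math.exp(s - top) for g, s in log_scores.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}
```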

Coverage Depth (coverage command)

  • Calculates read depth across a reference genome with smoothing options
  • Reports output in industry-standard BED format
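
Read depth can be computed from aligned read intervals with a difference array, smoothed with a moving average, and written as BED-style lines. The sketch assumes 0-based, half-open read intervals on a single sequence, a centered window of `2 * radius + 1` bases, and a fourth column holding the (smoothed) depth; RTG's actual smoothing method and column layout are not shown here, and all function names are hypothetical.

```python
def depth_profile(intervals, length):
    """Per-base depth from (start, end) intervals, 0-based half-open, within [0, length]."""
    diff = [0] * (length + 1)
    for start, end in intervals:
        diff[start] += 1
        diff[end] -= 1
    depth, running = [], 0
    for i in range(length):
        running += diff[i]
        depth.append(running)
    return depth


def smooth(depth, radius):
    """Centered moving average over 2*radius+1 bases, truncated at the edges."""
    out = []
    for i in range(len(depth)):
        lo, hi = max(0, i - radius), min(len(depth), i + radius + 1)
        out.append(sum(depth[lo:hi]) / (hi - lo))
    return out


def to_bed(chrom, values, decimals=2):
    """Merge runs of equal (rounded) values into chrom/start/end/value lines."""
    lines, start = [], 0
    rounded = [round(v, decimals) for v in values]
    for i in range(1, len(rounded) + 1):
        if i == len(rounded) or rounded[i] != rounded[start]:
            lines.append(f"{chrom}\t{start}\t{i}\t{rounded[start]}")
            start = i
    return lines
```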

Copy Number Variation Analysis (cnv command)

  • Identifies and reports the copy number variation ratio between two genomes
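
A copy number ratio between two genomes is commonly estimated by comparing binned coverage. The sketch below sums per-base depth in fixed windows, normalizes each sample by its total coverage (which assumes most of the genome is copy-neutral), and reports the test-to-control ratio per window, with `None` where the control has no coverage. Window size, normalization, and names are assumptions, not RTG's cnv algorithm.

```python
def window_sums(depth, window):
    """Sum per-base depth in consecutive windows (the last window may be shorter)."""
    return [sum(depth[i:i + window]) for i in range(0, len(depth), window)]


def cnv_ratios(test_depth, control_depth, window):
    """Per-window ratio of normalized test coverage to normalized control coverage."""
    test = window_sums(test_depth, window)
    ctrl = window_sums(control_depth, window)
    test_total, ctrl_total = sum(test), sum(ctrl)
    ratios = []
    for t, c in zip(test, ctrl):  # assumes both depth lists cover the same positions
        if c == 0 or test_total == 0:
            ratios.append(None)  # ratio undefined without coverage
        else:
            ratios.append((t / test_total) / (c / ctrl_total))
    return ratios
```

A ratio near 1 suggests equal copy number; in a diploid background, values near 1.5 or 0.5 would be consistent with a single-copy gain or loss.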

METAGENOMICS ANALYSIS

Species Frequency (species command)

  • Estimates species frequency with Bayesian statistics
  • Aligns ambiguous and unambiguous short read data against a reference database and reports a frequency distribution
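
Ambiguous reads are what make frequency estimation non-trivial: a read that matches several species must be shared among them. The sketch below uses expectation-maximization, a simpler maximum-likelihood technique swapped in for illustration rather than RTG's Bayesian method: each ambiguous read is split among its candidate species in proportion to the current estimates, and the estimates are recomputed from those fractional counts. It assumes the mapping step already produced a list of candidate species per read, ignores genome length, and uses hypothetical names.

```python
def estimate_frequencies(read_hits, species, iterations=100):
    """read_hits: one list of candidate species names per read; returns species -> frequency."""
    freq = {s: 1 / len(species) for s in species}  # start from a uniform estimate
    for _ in range(iterations):
        counts = {s: 0.0 for s in species}
        for hits in read_hits:
            total = sum(freq[s] for s in hits)
            if total == 0:
                continue  # read only matches species whose estimate has dropped to zero
            for s in hits:
                counts[s] += freq[s] / total  # fractional assignment of this read
        assigned = sum(counts.values())
        if assigned == 0:
            break  # nothing could be assigned; keep the current estimates
        freq = {s: c / assigned for s, c in counts.items()}
    return freq
```

With 6 reads unique to species A, 2 unique to B, and 4 matching both, the estimate converges to 0.75 for A and 0.25 for B: the ambiguous reads are split 3:1, following the unambiguous evidence rather than 50:50.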

Similarity Relationship (similarity command)

  • Executes an all-by-all comparison of short read sequences
  • Examines k-mer word frequencies and word intersections
  • Reports a similarity matrix that shows any relationships between reads
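
The k-mer comparison above can be sketched as follows, assuming each input is a named set of reads (for example, one sample). Each set is reduced to a k-mer frequency profile, and each matrix cell counts the k-mers two profiles share, taking the smaller frequency for each shared word. The names and the choice of shared-count score are hypothetical; RTG's similarity scoring may differ.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Frequency of every k-mer across a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def similarity_matrix(samples, k):
    """All-by-all matrix of shared k-mer counts; `samples` maps name -> list of reads."""
    names = sorted(samples)
    profiles = {n: kmer_counts(samples[n], k) for n in names}
    # Counter & Counter keeps the minimum count of each shared k-mer.
    matrix = [[sum((profiles[a] & profiles[b]).values()) for b in names] for a in names]
    return names, matrix
```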

DATA CENTER DEPLOYMENT

Operating Environments

  • Designed for high throughput applications
  • Efficient utilization of CPU, memory, I/O and datafile sizes
  • Java JAR file option runs in UNIX, Linux, and Windows HPC environments; tested on a Java 1.6 JVM running on RHEL / CentOS 5.3 (64-bit) Linux
  • Automated installation
  • License key allows flexible use on any node or CPU core
  • Logs run-time exception conditions
  • Alerts technical support of critical exceptions through online TalkBack system
  • Compresses and recognizes compressed pipeline files by default
Documentation
  • Command Line Help
  • Operations Manual
  • Quickstart Tutorial with example dataset