Optimized Variant Analysis
A paper published this month in Nature Biotechnology reports that applying a set of filters, including a consensus filter, can yield a 290-fold improvement in variant calling accuracy. Brian Hilbush and John Cleary of Real Time Genomics collaborated with VIB scientists on the paper, which is freely available online (Optimized filtering reduces the error rate in detecting genomic variants by short read sequencing).
This paper shows the critical value of next-generation analytics that employ error reduction techniques. Specifically, it shows how a dual analysis pipeline that includes RTG Investigator sequence analysis software improves the quality of reported results in DNA sequence analysis and reduces costly, time-consuming lab validation steps. For applying consensus filtering to Complete Genomics analysis, RTG Investigator is the only software available for independent mapping and variant analysis.
When searching for somatic mutations that may appear at a rate of one in a million base pairs, the noise from machine errors in sequence data (1-2% of base pairs) drives the need for better variant detection approaches. This paper shows that investigators can confidently use a range of filters to quiet the noise and dramatically reduce the number of putative variants that require lab validation. Applied to cancer genome sequencing, consensus filtering can be expected to address the challenges of driver mutation discovery and of patient identification for stratified clinical trials.
Included in the set of filter options is a consensus filter, described as the intersection of genomic regions mapped and called by two unique variant analysis pipelines. This paper reports results from the intersection of RTG Investigator and Complete Genomics analysis pipelines in the consensus filter. To prove the efficiency of the filter strategy, the authors applied the technique in three experimental examples that used whole genomes: 1) monozygotic twins, 2) a cancer tumor/normal pair, and 3) a HapMap subject.
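To make the idea of a consensus filter concrete, here is a minimal Python sketch of intersecting the calls from two pipelines. It is an illustration only, not the method as implemented in the paper or in RTG Investigator: the Variant record, the consensus_filter function, and the toy calls are all hypothetical, and the sketch assumes a variant counts as agreed only when both pipelines report the same chromosome, position, reference allele, and alternate allele. The paper describes the intersection in terms of genomic regions mapped and called by both pipelines, which this exact-match version simplifies.

```python
from typing import NamedTuple, Set


class Variant(NamedTuple):
    """Hypothetical minimal variant record (not a real file format)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def consensus_filter(calls_a: Set[Variant], calls_b: Set[Variant]) -> Set[Variant]:
    """Keep only the variants reported identically by both pipelines."""
    return calls_a & calls_b


# Toy calls, invented for illustration only.
pipeline_a = {
    Variant("chr1", 1000, "A", "G"),
    Variant("chr1", 2500, "C", "T"),  # called by pipeline A only
    Variant("chr2", 300, "G", "A"),
}
pipeline_b = {
    Variant("chr1", 1000, "A", "G"),
    Variant("chr2", 300, "G", "A"),
    Variant("chr3", 42, "T", "C"),    # called by pipeline B only
}

consensus = consensus_filter(pipeline_a, pipeline_b)
for variant in sorted(consensus):
    print(variant)
```

Calls made by only one pipeline are dropped, which is how the filter trades some sensitivity for fewer false positives sent to lab validation.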
Filtering steps were evaluated independently and then applied in combination. For each example, the optimal filter combination was identified as the one with the best Matthews correlation coefficient (MCC) value. MCC measures the quality of the results obtained, balancing true and false positives against true and false negatives. Essentially, it identifies the filter combination that delivers the strongest signal with the least noise. Consensus filtering was included in the filter combination with the best MCC for the tumor-replicate comparison.
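For readers unfamiliar with MCC, the sketch below computes it from the standard definition, MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), and picks the combination with the highest value. The function name mcc, the combination labels, and all of the counts are made up for illustration; they are not results from the paper. Returning 0.0 when the denominator is zero is a common convention and is an assumption here.

```python
import math


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention for the undefined case
    return (tp * tn - fp * fn) / denom


# Hypothetical (TP, FP, TN, FN) counts per filter combination.
# These numbers are invented and do not come from the paper.
toy_counts = {
    "combo_1": (90, 900, 99000, 10),
    "combo_2": (85, 100, 99800, 15),
    "combo_3": (80, 5, 99895, 20),
}

scores = {name: mcc(*counts) for name, counts in toy_counts.items()}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: MCC = {score:.3f}")
print("best combination:", best)
```

In this toy data the third combination finds slightly fewer true variants than the others but removes almost all false positives, so it scores highest. MCC ranges from -1 (total disagreement) to +1 (perfect prediction), with 0 meaning no better than chance.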
The collaboration behind the consensus filter option grew out of VIB's participation in the RTG Early Access Program from 2009 to 2010. VIB sought to validate the sequence data they were getting from Complete Genomics, and found utility in the intersection of mapping and variant results between the two analysis pipelines. They found areas of high concordance in their results, which gave them confidence in the data and proved useful in limiting downstream lab validation costs.
Others have identified the benefits of complementary results in NGS variant detection. Notably, investigators at the Stanford-based Snyder Lab reported on a performance comparison of whole-genome sequencing platforms. They compared Complete Genomics and Illumina data, and reported 88% concordance in the intersection of results from the analysis pipelines of the two technologies.
Unlike the optimized filtering approach, however, that method requires additional sequencing of each genome. By contrast, the consensus filter approach allows an investigator to extend an investment in Complete Genomics (CG) data. Rather than generate new sequence data with a different instrument, they can reanalyze their first data set for a fraction of the cost (10-20% of the original cost of genome sequencing and analysis).
