DNA Data Deluge
Are researchers really buried in genomic data? Does it really cost more to analyze a genome than to produce the sequence for it? A recent article by Andrew Pollack, biotechnology editor for the NY Times, titled "DNA Sequencing Caught in Deluge of Data," suggests an exploding bioinformatics market that specifically addresses the DNA data deluge. Advances in bioinformatics are necessary because, according to David Haussler of UCSC, "Data handling is now the bottleneck. It costs more to analyze a genome than to sequence a genome."
Isaac Ro, an analyst at Goldman Sachs, was quoted in the same article saying that the "field of bioinformatics for genetic analysis will be one of the biggest areas of disruptive innovation in life science tools over the next few years." At Real Time Genomics, we agree with this assessment. But rather than repackage existing tools, we approach the challenge differently: we meet the need for clinical-grade pipelines with effective, efficient algorithms developed in-house. The result is RTG Investigator, a comprehensive set of tools and functions for variant and metagenomic analysis. Its underlying sequence search technology delivers highly sensitive sequence alignment without sacrificing speed, and the resulting analysis pipelines are extraordinarily accurate.
Though the popular press may be sounding the alarm now, the DNA data deluge is a well-understood problem among industry insiders. This post offers a sense of what we were learning twelve months ago. Those lessons inspired us to integrate sequence analysis components in RTG Investigator so that researchers and product developers in the life sciences can get high-quality results from human genomic investigations quickly.
The NY Times article cites a worldwide sequencing capacity of 13 quadrillion DNA bases per year (roughly 150,000 human genomes). Researchers have already submitted over 700 petabytes of data to federal research archives. Although the National Center for Biotechnology Information (NCBI) threatened to shut down the archives for cost reasons in early 2011, researchers argued strongly for keeping the data online. Today, the NCBI Sequence Read Archive (SRA) remains fully available.
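Converting bases into genomes implies that each genome is sequenced many times over. The quick sketch below, which assumes a haploid human genome of about 3.1 billion bases (our assumption, not a figure from the article), shows that the two numbers correspond to an average depth of roughly 28x, in line with the ~30x coverage commonly used for whole-genome sequencing.

```python
# Back-of-the-envelope check of how the article's two figures relate.
# Assumption (ours, not the article's): a haploid human genome of ~3.1 billion bases.
HUMAN_GENOME_BASES = 3.1e9


def implied_coverage(total_bases: float, genomes: float,
                     genome_size: float = HUMAN_GENOME_BASES) -> float:
    """Average sequencing depth per genome implied by total output and genome count."""
    bases_per_genome = total_bases / genomes
    return bases_per_genome / genome_size


annual_bases = 13e15         # 13 quadrillion bases per year (from the article)
genomes_per_year = 150_000   # roughly 150,000 human genomes (from the article)

per_genome = annual_bases / genomes_per_year
print(f"{per_genome:.3g} bases per genome")  # → 8.67e+10 bases per genome
print(f"~{implied_coverage(annual_bases, genomes_per_year):.0f}x coverage")  # → ~28x coverage
```

Put differently, every genome counted in that figure represents tens of billions of raw bases that must be aligned and analyzed.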
Eric Green, Director of the National Human Genome Research Institute, saw this coming. This past February, Green released a strategic plan for the next phase of genomic research that called for advances in five research domains. To support this research, the plan called for investment in new technologies and the development of new analytical methods, software tools, and a robust computational infrastructure.
In a video clip from the first Current Topics in Genome Analysis (CTGA) lecture, Green says specifically, "The rate limiting fact to deal with in genomics now, is not generating data, it's analyzing data." He goes on to say, "a small sequencing center facility completely needs a large bioinformatic analysis pipeline to deal with the data coming off." To paraphrase his point in the lecture, the bottleneck lies in computation, algorithms, and staff.
A few months before the Green lecture, Elaine Mardis of The Genome Institute at Washington University in St. Louis wrote about the problem in "The $1,000 genome, the $100,000 analysis?", published in Genome Medicine. Mardis lists the expertise required to ‘solve’ a clinical case (molecular and computational biologists, geneticists, pathologists and physicians with exquisite knowledge of the disease and of treatment modalities, research nurses, genetic counselors, and IT and systems support specialists) and asks, "How will this dream team be assembled?" Mardis suggests that the community must "emphasize the development of clinical-grade interpretational analysis pipelines to perform much of the initial discovery from datasets derived from massively parallel sequencing."
Taken together, these calls to action have motivated our team of computer science experts to deliver high-quality bioinformatics software that is uniquely easy to implement for advanced sequence analysis applications with next-generation sequencing data. Download RTG Investigator free for individual use, or contact us directly for an evaluation.
