Pre-formatted Metagenomic and Human reference datasets
December 21, 2014
Reference datasets can be created from existing FASTA files using RTG commands alone, but making full use of RTG functionality may require additional configuration files. For example, the metagenomic species abundance tools work best when provided with species taxonomy information, and the sex-aware capabilities of the human variant calling pipeline require a reference configuration that specifies the sex chromosomes and the locations of the pseudoautosomal regions (PARs).
To get our customers up and running more smoothly, we have provided several commonly used datasets in a pre-formatted form. The following datasets are available:
Human reference genomes:
- 1000g_v37_phase2.sdf.zip (996.8 MB) (chromosomes named as "1", "2", etc.)
- hg19.sdf.zip (984.4 MB) (chromosomes named as "chr1", "chr2", etc.)
- GRCh38.sdf.zip (999.2 MB) (chromosomes named as "chr1", "chr2", etc. This is the "no alt analysis set")
- GRCh38_hs38d1.sdf.zip (1001.2 MB) (chromosomes named as "chr1", "chr2", etc. This is the "no alt analysis set" plus decoys)
Metagenomics support databases:
The "Pipeline Commands" section of the user manual has instructions on how to use these databases, as well as how you can create and use your own databases.
Once downloaded and unzipped, you can run the rtg sdfstats command on the dataset to view summary information about its contents.
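The unzip-then-inspect step can also be scripted. The following is a minimal Python sketch, not part of the RTG distribution: the helper names unpack_sdf, sdfstats_command and describe_sdf are hypothetical, and it assumes the rtg executable is on your PATH and that you know the name of the SDF directory the archive unpacks to.

```python
import shutil
import subprocess
import zipfile
from pathlib import Path


def unpack_sdf(zip_path, dest_dir):
    """Extract a downloaded .sdf.zip archive into dest_dir and return that directory."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    return dest


def sdfstats_command(sdf_dir, rtg="rtg"):
    """Build the argument list for `rtg sdfstats <sdf_dir>`."""
    return [rtg, "sdfstats", str(sdf_dir)]


def describe_sdf(sdf_dir, rtg="rtg"):
    """Run sdfstats on an SDF directory and return its text output.

    Returns None when the rtg executable cannot be found on PATH.
    """
    if shutil.which(rtg) is None:
        return None
    result = subprocess.run(
        sdfstats_command(sdf_dir, rtg),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if rtg reports a failure
    )
    return result.stdout
```

For example, after downloading hg19.sdf.zip you might call unpack_sdf("hg19.sdf.zip", "references") and then print(describe_sdf("references/hg19.sdf")). The directory name hg19.sdf is an assumption about the archive layout, so check what the archive actually unpacks to before relying on it.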