In short, GATB-Core takes a set of reads as input (Fasta or Fastq files) and builds a De Bruijn graph representing these reads. The graph is saved in an HDF5 file. A tool (see GATB-TOOLS) then loads it into memory to perform various genome analysis tasks.
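To make the idea of representing reads as a De Bruijn graph concrete, here is a minimal Python sketch. It is an illustration only and not GATB-Core's implementation: the function names `kmer_set` and `successors` are hypothetical, and the sketch assumes a node-centric graph where nodes are the distinct k-mers of the reads and an edge links two k-mers that overlap by k-1 characters. It ignores reverse complements, abundance filtering, and HDF5 storage.

```python
def kmer_set(reads, k):
    """Collect the distinct k-mers (substrings of length k) of all reads.

    In this simplified model, these k-mers are the nodes of the graph.
    """
    return {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}


def successors(kmer, nodes):
    """Return the nodes reachable from `kmer` by one edge.

    An edge exists when the last k-1 characters of `kmer`, extended by one
    nucleotide, form another k-mer that is present in the node set. Edges
    are therefore implicit: they are derived from the nodes on demand.
    """
    suffix = kmer[1:]
    return [suffix + base for base in "ACGT" if suffix + base in nodes]


# Two toy reads (hypothetical data) and a small k for readability.
reads = ["ACGTAC", "CGTACG"]
nodes = kmer_set(reads, 3)

print(sorted(nodes))             # ['ACG', 'CGT', 'GTA', 'TAC']
print(successors("ACG", nodes))  # ['CGT']
print(successors("TAC", nodes))  # ['ACG']
```

Because edges are deduced from the set of k-mers, only the nodes need to be stored, which is one reason this kind of representation can stay small in memory.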
A distinctive feature of GATB-Core is the De Bruijn graph structure it uses to represent reads. Historically, this structure comes from the Minia software. Read more about GATB-Core concepts here. GATB-Core offers state-of-the-art performance:
 | Bacterial dataset | Whole human dataset | Large (meta-)genome (10-20 Gbp) |
Graph construction time | few minutes | 6 hours | 1-3 days |
Memory usage | < 1 GB | < 10 GB | < 100 GB |