GATB Global Architecture

The heart of GATB is GATB-Core: a high-performance C++ library with a low memory footprint.

GATB-Core natively provides the following operations:

Read handling:
  • FASTA/FASTQ parsing
  • Parallel iteration over sequences
  • K-mer counting
  • Minimizer computation for k-mers, and partitioning of datasets by minimizer
  • Bloom filter of k-mers
  • Hash table of k-mers
  • Minimal perfect hash function of k-mers
  • Representations of arbitrarily large k-mers

de Bruijn graph:
  • Graph construction
  • Graph traversal operations (contigs, unitigs)
  • Graph simplifications for assembly (tip removal, bulge removal)
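
To make two of the read-handling operations above concrete, here is a minimal sketch of k-mer counting and minimizer computation. It uses only the C++ standard library and is not the GATB-Core API: the function names (reverse_complement, canonical, count_kmers, minimizer) are hypothetical, and it assumes plain strings over A, C, G, T with lexicographic ordering of minimizers. GATB-Core's own implementation is designed for much larger datasets and will differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Reverse complement of a DNA string over the alphabet {A, C, G, T}.
std::string reverse_complement(const std::string& s) {
    std::string rc(s.rbegin(), s.rend());
    for (char& c : rc) {
        switch (c) {
            case 'A': c = 'T'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            case 'T': c = 'A'; break;
            default: break;  // other characters (e.g. N) are left unchanged
        }
    }
    return rc;
}

// Canonical form: the lexicographically smaller of a k-mer and its
// reverse complement, so both strands map to the same key.
std::string canonical(const std::string& kmer) {
    std::string rc = reverse_complement(kmer);
    return std::min(kmer, rc);
}

// Count the canonical k-mers of a single read (hypothetical helper).
std::map<std::string, std::size_t> count_kmers(const std::string& read,
                                               std::size_t k) {
    assert(k > 0);
    std::map<std::string, std::size_t> counts;
    for (std::size_t i = 0; i + k <= read.size(); ++i) {
        ++counts[canonical(read.substr(i, k))];
    }
    return counts;
}

// Minimizer: the lexicographically smallest m-mer inside a k-mer.
// Assumption: plain lexicographic order; other orderings are possible.
std::string minimizer(const std::string& kmer, std::size_t m) {
    assert(m > 0 && m <= kmer.size());
    std::string best = kmer.substr(0, m);
    for (std::size_t i = 1; i + m <= kmer.size(); ++i) {
        std::string candidate = kmer.substr(i, m);
        if (candidate < best) best = candidate;
    }
    return best;
}
```

For example, count_kmers("ACGTACGT", 4) yields three distinct canonical k-mers, and minimizer("GATTACA", 3) selects "ACA". K-mers that share a minimizer can be written to the same partition, which is the idea behind partitioning a dataset by minimizer.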

The GATB-Core library is the base layer on which tools and pipelines for analyzing NGS data are built:

  1. GATB-Core: a C++ library holding all the services needed to develop software dedicated to NGS datasets.
  2. GATB-Tools: ready-to-use NGS analysis software built mainly upon the GATB-Core library: k-mer counter, contiger, scaffolder, variant detection, etc.
  3. GATB-Pipelines: a set of NGS pipelines that link together tools from the previous layer.

Since GATB-Core is a software library, its audience is mainly developers interested in creating software for custom NGS data analysis tasks. Example uses include assembly tools, de novo variant detection, read error correction, and read compression.

From a developer's point of view, the GATB-Core library provides APIs for creating, loading, and traversing de Bruijn graphs, counting k-mers, etc. The APIs are intended to be simple to use and to make it easy to develop new software.
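
As a rough illustration of what creating and traversing a de Bruijn graph involves, the sketch below builds a node set from reads and extends a simple path to the right. It is not the GATB-Core API: it uses only the C++ standard library, all names (KmerSet, build_nodes, successors, predecessors, extend_right) are hypothetical, and it assumes a single-stranded graph with no reverse complements and no abundance filtering.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Nodes of the graph: the distinct k-mers seen in the reads.
using KmerSet = std::set<std::string>;

KmerSet build_nodes(const std::vector<std::string>& reads, std::size_t k) {
    assert(k >= 2);
    KmerSet nodes;
    for (const std::string& read : reads) {
        for (std::size_t i = 0; i + k <= read.size(); ++i) {
            nodes.insert(read.substr(i, k));
        }
    }
    return nodes;
}

// Edges are implicit: u -> v exists when the last k-1 letters of u equal
// the first k-1 letters of v and v is a node.
std::vector<std::string> successors(const KmerSet& nodes,
                                    const std::string& kmer) {
    std::vector<std::string> out;
    for (char c : std::string("ACGT")) {
        std::string next = kmer.substr(1) + c;
        if (nodes.count(next)) out.push_back(next);
    }
    return out;
}

std::vector<std::string> predecessors(const KmerSet& nodes,
                                      const std::string& kmer) {
    std::vector<std::string> out;
    for (char c : std::string("ACGT")) {
        std::string prev = c + kmer.substr(0, kmer.size() - 1);
        if (nodes.count(prev)) out.push_back(prev);
    }
    return out;
}

// Extend a path to the right while it is unambiguous: the current node has
// exactly one successor, and that successor has exactly one predecessor.
std::string extend_right(const KmerSet& nodes, const std::string& start) {
    std::string path = start;
    std::string current = start;
    std::set<std::string> seen{start};
    while (true) {
        std::vector<std::string> succ = successors(nodes, current);
        if (succ.size() != 1) break;                         // dead end or branch
        if (predecessors(nodes, succ[0]).size() != 1) break; // paths merge here
        if (!seen.insert(succ[0]).second) break;             // stop on a cycle
        current = succ[0];
        path.push_back(current.back());
    }
    return path;
}
```

With the single read "ACGTT" and k = 3, extending from "ACG" recovers "ACGTT". Adding the read "ACGTA" creates a branch after "CGT", so the extension stops at "ACGT". A full unitig would also be extended to the left in the same way.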

GATB-Core is available through an open-source C++ API. Wrappers for other languages, such as Python and Java, are planned (as of March 6, 2017, a Python wrapper is being tested experimentally).

You can download GATB-Core as an archive containing the library and the header files, or clone the Git repository from GitHub if you need the source code.

The best way to learn how to use GATB-Core as a developer is to start with the tutorials, then turn to the reference documentation for further details.
