GATB Global Architecture

The heart of GATB is GATB-Core, a high-performance C++ library with a low memory footprint.

GATB-Core natively provides the following operations:

Reads handling:
  • FASTA/FASTQ parsing and writing
  • Support for plain-text and gzipped FASTA/FASTQ files
  • Parallel iteration of sequences
K-mer:
  • K-mer counting
  • Minimizer computation of k-mers, partitioning of datasets by minimizers
  • Bloom filter of k-mers
  • Hash table of k-mers
  • Minimal perfect hash function of k-mers
  • Representation of arbitrarily large k-mers
de Bruijn graph:
  • graph construction
  • graph traversal operations (contigs, unitigs)
  • graph simplifications for assembly (tip removal, bulge removal)
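
To make the k-mer operations listed above more concrete, here is a minimal sketch of canonical k-mer counting and minimizer computation. It is not GATB-Core's API: it uses only the C++ standard library, the function names (reverse_complement, canonical, count_canonical_kmers, minimizer) are hypothetical, and k-mers are kept as plain strings for readability, whereas a memory-efficient library would typically use a compact encoding. The minimizer is assumed here to be the lexicographically smallest m-mer of a k-mer; other orderings are possible.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Reverse complement of a DNA string over {A, C, G, T}.
// Any other character (for example N) is mapped to 'N'.
std::string reverse_complement(const std::string& seq) {
    std::string rc(seq.rbegin(), seq.rend());
    for (char& c : rc) {
        switch (c) {
            case 'A': c = 'T'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            case 'T': c = 'A'; break;
            default:  c = 'N'; break;
        }
    }
    return rc;
}

// Canonical form of a k-mer: the lexicographically smaller of the k-mer
// and its reverse complement, so both strands map to the same key.
std::string canonical(const std::string& kmer) {
    const std::string rc = reverse_complement(kmer);
    return std::min(kmer, rc);
}

// Count every canonical k-mer occurring in a single read.
std::unordered_map<std::string, std::size_t>
count_canonical_kmers(const std::string& read, std::size_t k) {
    std::unordered_map<std::string, std::size_t> counts;
    if (k == 0 || read.size() < k) return counts;
    for (std::size_t i = 0; i + k <= read.size(); ++i) {
        ++counts[canonical(read.substr(i, k))];
    }
    return counts;
}

// Minimizer of a k-mer: here, its lexicographically smallest substring
// of length m. Returns an empty string if m is 0 or larger than the k-mer.
std::string minimizer(const std::string& kmer, std::size_t m) {
    if (m == 0 || kmer.size() < m) return std::string();
    std::string best = kmer.substr(0, m);
    for (std::size_t i = 1; i + m <= kmer.size(); ++i) {
        const std::string candidate = kmer.substr(i, m);
        if (candidate < best) best = candidate;
    }
    return best;
}
```

Grouping k-mers by their minimizer is what makes partitioning possible: consecutive k-mers of a read often share the same minimizer, so they tend to land in the same partition and each partition can be counted independently.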

Other optimized data structures

In addition to the de Bruijn graph, GATB-Core provides several other data structures that can be useful for general-purpose development. These are:

The GATB-Core library serves as the base layer on which tools and pipelines for analyzing NGS data are developed:

  1. GATB-CORE: a C++ library holding all the services needed for developing software dedicated to NGS datasets.
  2. GATB-TOOLS: ready-to-use NGS analysis software mainly built upon the GATB-Core library: k-mer counter, contiger, scaffolder, variant detection, etc.
  3. GATB-PIPELINES: a set of NGS pipelines that link together tools from the previous layer.

Since GATB-Core is a software library, its main audience is developers interested in creating software for custom NGS data analysis tasks. Example uses include assembly tools, de novo variant detection, read error correction, and read compression.

From a developer's point of view, the GATB-Core library provides APIs for creating, loading, and traversing de Bruijn graphs, counting k-mers, etc. These APIs are intended to be simple to use and to make it easy to develop new software.
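
As an illustration of what creating and traversing a de Bruijn graph involves, the sketch below builds a node-centric graph from reads and extracts the unitig (maximal non-branching path) that contains a given k-mer. It is a conceptual sketch, not GATB-Core's API: it uses only the C++ standard library, the names (KmerSet, build_graph, successors, predecessors, unitig_from) are hypothetical, and it deliberately ignores reverse complements and abundance filtering to stay short.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// The graph is just the set of k-mers (nodes); an edge u -> v exists
// implicitly when the last k-1 characters of u equal the first k-1 of v.
using KmerSet = std::set<std::string>;

// Collect every k-mer of every read as a node.
KmerSet build_graph(const std::vector<std::string>& reads, std::size_t k) {
    KmerSet nodes;
    if (k == 0) return nodes;
    for (const std::string& read : reads) {
        for (std::size_t i = 0; i + k <= read.size(); ++i) {
            nodes.insert(read.substr(i, k));
        }
    }
    return nodes;
}

// Nodes reachable by dropping the first base and appending one base.
// Assumes kmer is non-empty.
std::vector<std::string> successors(const KmerSet& g, const std::string& kmer) {
    std::vector<std::string> out;
    for (char c : std::string("ACGT")) {
        const std::string next = kmer.substr(1) + c;
        if (g.count(next)) out.push_back(next);
    }
    return out;
}

// Nodes that reach kmer by dropping their first base and appending one.
// Assumes kmer is non-empty.
std::vector<std::string> predecessors(const KmerSet& g, const std::string& kmer) {
    std::vector<std::string> out;
    for (char c : std::string("ACGT")) {
        const std::string prev = c + kmer.substr(0, kmer.size() - 1);
        if (g.count(prev)) out.push_back(prev);
    }
    return out;
}

// Extend from a start node in both directions while the path does not
// branch, and return the spelled-out sequence of the resulting unitig.
std::string unitig_from(const KmerSet& g, const std::string& start) {
    std::string unitig = start;
    std::set<std::string> visited{start};  // guards against cycles

    std::string cur = start;
    while (true) {  // extend to the right
        const std::vector<std::string> succ = successors(g, cur);
        if (succ.size() != 1) break;
        if (predecessors(g, succ[0]).size() != 1) break;
        if (!visited.insert(succ[0]).second) break;
        cur = succ[0];
        unitig.push_back(cur.back());
    }

    cur = start;
    while (true) {  // extend to the left
        const std::vector<std::string> pred = predecessors(g, cur);
        if (pred.size() != 1) break;
        if (successors(g, pred[0]).size() != 1) break;
        if (!visited.insert(pred[0]).second) break;
        cur = pred[0];
        unitig.insert(unitig.begin(), cur.front());
    }
    return unitig;
}
```

Storing only the node set and deriving edges on demand keeps the structure small, at the cost of up to four membership queries per neighbor lookup; this is why fast set-membership structures matter for this kind of graph.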

GATB-Core is available through an open-source C++ API. A wrapper for Python 3 (pyGATB) is also available.

You can download GATB-Core as an archive containing the library and the header files, or clone the Git repository from GitHub if you need the source code.

The best way to learn how to use GATB-Core as a developer is to start with the tutorials, then turn to the reference documentation for further details.
