Leon

Leon is a software tool for compressing Next Generation Sequencing (NGS) data in FASTA or FASTQ format.
The method does not require a reference genome. Instead, a reference is built de novo from the set of reads as a probabilistic de Bruijn graph. Leon uses the disk-streaming k-mer counting algorithm from the GATB library and inserts the solid k-mers (those abundant enough in the reads) into a Bloom filter. Each read is then encoded as a path in this graph: only an anchoring k-mer and a list of bifurcations are stored, the latter indicating which branch to follow wherever several are possible.
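
The encoding scheme described above can be illustrated with a toy sketch in Python. This is not Leon's implementation, which is written in C++ on top of GATB. The `BloomFilter` class, the function names, the toy values of `K` and `MIN_ABUNDANCE`, the in-memory k-mer counting, and the choice to anchor on the first k-mer and extend only to the right are all assumptions made for illustration. Reads containing a k-mer that is absent from the graph are simply rejected here.

```python
import hashlib
from collections import Counter

K = 5              # toy k-mer length (assumption; real tools use a larger k)
MIN_ABUNDANCE = 2  # toy threshold: k-mers seen at least twice count as "solid"


class BloomFilter:
    """Tiny illustrative Bloom filter: no false negatives, rare false positives."""

    def __init__(self, n_bits=1 << 16, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item):
        # Derive n_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item):
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))


def build_graph(reads, k=K, min_abundance=MIN_ABUNDANCE):
    """Count k-mers in memory and insert the solid ones into a Bloom filter."""
    counts = Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))
    graph = BloomFilter()
    for kmer, n in counts.items():
        if n >= min_abundance:
            graph.add(kmer)
    return graph


def successors(kmer, graph):
    """Bases b such that the next k-mer (kmer[1:] + b) is reported present."""
    return [b for b in "ACGT" if kmer[1:] + b in graph]


def encode(read, graph, k=K):
    """Return (anchor k-mer, list of branch choices, read length)."""
    anchor, current, bifurcations = read[:k], read[:k], []
    for base in read[k:]:
        options = successors(current, graph)
        if base not in options:
            raise ValueError("read leaves the graph; not handled in this sketch")
        if len(options) > 1:
            # Several paths are possible: record which one the read takes.
            bifurcations.append(base)
        current = current[1:] + base
    return anchor, bifurcations, len(read)


def decode(anchor, bifurcations, length, graph):
    """Rebuild the read by walking the graph from the anchor."""
    read, current, choices = anchor, anchor, iter(bifurcations)
    while len(read) < length:
        options = successors(current, graph)
        # A unique successor is followed implicitly; otherwise use a stored choice.
        base = options[0] if len(options) == 1 else next(choices)
        read += base
        current = current[1:] + base
    return read


reads = ["ATCGGATTACA", "ATCGGATTACA", "ATCGGATGCCA", "ATCGGATGCCA"]
graph = build_graph(reads)
encoded = [encode(r, graph) for r in reads]
decoded = [decode(*e, graph) for e in encoded]
```

In this toy input the two read variants diverge after the shared k-mer `CGGAT`, so that is where a branch choice would normally be recorded; the remaining bases are recovered by following the unique successor. Because the encoder and decoder query the same filter, a Bloom filter false positive only adds a spurious bifurcation and does not break the round trip.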

Reference

G. Benoit, C. Lemaitre, D. Lavenier, E. Drezen, T. Dayris, R. Uricaru, G. Rizk (2015). Reference-free compression of high throughput sequencing data with a probabilistic de Bruijn graph. BMC Bioinformatics, 16:288.

Ready-to-use executable

Leon is available as a binary for immediate use on Linux and MacOSX platforms, with the following requirements:

  • MacOS-X 10.9 or above (Intel 64-bit processors). Download
  • Linux running on Intel or AMD 64-bit processors (kernel 2.6.32 or above, GLIBCXX_3.4.13 or above). Download

For all other platforms or configurations, or if the above binaries fail to run on your computer, download the source code and compile it.

Source Code

The LEON tool is written entirely in C++. Download

Documentation

Documentation and some typical results are available here.

License

LEON binaries and source code are covered by the Affero GPL version 3 license.

ChangeLog

version 1.0.0 April 16, 2015

  • bug fixes.

version 0.3.0 March 19, 2015

  • added quality compression, and other optimizations.

version 0.2.1 Dec 18, 2014

  • bug fixes

version 0.2 Oct 31, 2014

  • major performance improvement (about 2 times faster)

version 0.1.2 Aug 10, 2014

  • initial public release
