Leon is a software tool for compressing Next Generation Sequencing data. It can compress files in FASTA or FASTQ format.
The method does not require a reference genome. Instead, a reference is built de novo from the set of reads as a probabilistic de Bruijn graph. Leon uses the disk-streaming k-mer counting algorithm contained in the GATB library and inserts the solid k-mers (those whose abundance exceeds a threshold) into a Bloom filter. Each read is then encoded as a path in this graph: only an anchoring k-mer and a list of bifurcations are stored, the latter indicating which branch to follow wherever several are possible.
G. Benoit, C. Lemaitre, D. Lavenier, E. Drezen, T. Dayris, R. Uricaru, G. Rizk (2015). Reference-free compression of high throughput sequencing data with a probabilistic de Bruijn graph. BMC Bioinformatics, 16:288.
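The read encoding described above can be illustrated with a small Python sketch. It is not Leon's implementation, and every name in it (BloomFilter, build_graph, successors, encode_read, decode_read) is hypothetical. It makes several simplifying assumptions: every k-mer of every read is treated as solid, the anchor is always the first k-mer of the read, sequencing errors are not handled, and the read length is stored alongside the anchor. Bloom filter false positives only add extra branches, which the encoder and decoder see identically, so decoding stays lossless.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, n_bits=1 << 16, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8 + 1)

    def _positions(self, item):
        # One salted hash per hash function (an illustrative choice).
        for seed in range(self.n_hashes):
            digest = hashlib.blake2b(item.encode(), salt=bytes([seed])).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item):
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))


def build_graph(reads, k):
    """Insert every k-mer of every read (assumption: all k-mers are solid)."""
    bf = BloomFilter()
    for read in reads:
        for i in range(len(read) - k + 1):
            bf.add(read[i:i + k])
    return bf


def successors(bf, kmer):
    """Nucleotides that extend kmer to a k-mer present in the filter."""
    return [nt for nt in "ACGT" if kmer[1:] + nt in bf]


def encode_read(bf, read, k):
    """Return (anchor k-mer, read length, nucleotides chosen at bifurcations)."""
    anchor = read[:k]
    bifurcations = []
    kmer = anchor
    for nt in read[k:]:
        if len(successors(bf, kmer)) > 1:  # several paths: record the choice
            bifurcations.append(nt)
        kmer = kmer[1:] + nt
    return anchor, len(read), bifurcations


def decode_read(bf, anchor, length, bifurcations):
    """Rebuild the read by walking the graph from the anchor."""
    read = anchor
    kmer = anchor
    choices = iter(bifurcations)
    while len(read) < length:
        nxt = successors(bf, kmer)
        nt = nxt[0] if len(nxt) == 1 else next(choices)
        read += nt
        kmer = kmer[1:] + nt
    return read


reads = ["ACGTACGGTCA", "ACGTACGATCA"]
graph = build_graph(reads, k=4)
for r in reads:
    anchor, length, bifs = encode_read(graph, r, k=4)
    assert decode_read(graph, anchor, length, bifs) == r
```

Wherever the graph offers a single successor, nothing is stored for that position. This is where the space saving comes from in this simplified model.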
Leon is available as a binary for immediate use on Linux and MacOSX platforms, with the following requirements:
- MacOS-X 10.9 or above (Intel 64bit processors)
- Linux running on Intel or AMD 64bit processors (kernel 2.6.32 or above, GLIBCXX_3.4.13 or above)
For all other platforms or configurations, or if the binaries above fail to run on your computer, you should download the source code and compile it.
The LEON tool is written entirely in C++.
Documentation and some typical results are available here.
LEON binaries and source code are covered by the Affero GPL version 3 license.
version 1.0.0 April 16, 2015
- bug fixes.
version 0.3.0 March 19, 2015
- added quality compression and other optimizations
version 0.2.1 Dec 18, 2014
- bug fixes
version 0.2 Oct 31, 2014
- major performance improvement (about 2 times faster)
version 0.1.2 Aug 10, 2014
- initial public release