Leon is a software tool for compressing Next Generation Sequencing data. It accepts input in FASTA or FASTQ format.
The method does not require a reference genome. Instead, a reference is built de novo from the set of reads, in the form of a probabilistic de Bruijn graph. Leon uses the disk-streaming k-mer counting algorithm from the GATB library and inserts the solid k-mers into a Bloom filter. Each read is then encoded as a path in this graph, storing only an anchoring k-mer and a list of bifurcations that indicate which path to follow wherever the graph offers several.
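The encoding idea can be illustrated with a toy sketch in Python. This is not Leon's implementation (Leon is written in C++ on top of GATB); every name below (`BloomFilter`, `build_graph`, `encode`, `decode`, `K`, `MIN_COUNT`) is hypothetical, and several simplifications are assumed: k-mers are counted in memory rather than by disk streaming, the anchor is always the first k-mer of the read, and bases that leave a unique path are stored in a separate `errors` table so that decoding stays lossless.

```python
import hashlib
from collections import Counter

K = 5          # k-mer length (toy value, assumed for illustration)
MIN_COUNT = 2  # abundance threshold above which a k-mer is "solid" (assumed)


class BloomFilter:
    """Minimal Bloom filter: may report false positives, never false negatives."""

    def __init__(self, n_bits=1 << 16, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8 + 1)

    def _positions(self, item):
        # Derive n_hashes bit positions by salting one hash function.
        for seed in range(self.n_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=bytes([seed])
            ).digest()
            yield int.from_bytes(digest, "big") % self.n_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item):
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))


def build_graph(reads, k=K, min_count=MIN_COUNT):
    """Count k-mers in memory and insert the solid ones into a Bloom filter."""
    counts = Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))
    bloom = BloomFilter()
    for kmer, count in counts.items():
        if count >= min_count:
            bloom.add(kmer)
    return bloom


def successors(bloom, kmer):
    """Bases that extend kmer to a neighbouring k-mer present in the graph."""
    return [b for b in "ACGT" if kmer[1:] + b in bloom]


def encode(read, bloom, k=K):
    """Return (anchor, length, bifurcation bases, off-path bases by position)."""
    anchor = read[:k]
    bifurcations = []  # choices made where the path is not unique
    errors = {}        # position -> base where the read leaves a unique path
    kmer = anchor
    for pos in range(k, len(read)):
        base = read[pos]
        nxt = successors(bloom, kmer)
        if len(nxt) == 1:
            if nxt[0] != base:
                errors[pos] = base
        else:
            bifurcations.append(base)
        kmer = kmer[1:] + base
    return anchor, len(read), "".join(bifurcations), errors


def decode(anchor, length, bifurcations, errors, bloom, k=K):
    """Rebuild the read by walking the graph from the anchor."""
    read = anchor
    choices = iter(bifurcations)
    kmer = anchor
    while len(read) < length:
        pos = len(read)
        nxt = successors(bloom, kmer)
        if pos in errors:
            base = errors[pos]
        elif len(nxt) == 1:
            base = nxt[0]      # unique path: nothing was stored
        else:
            base = next(choices)
        read += base
        kmer = kmer[1:] + base
    return read
```

In this sketch, `decode(*encode(read, bloom), bloom)` returns the original read as long as both sides use the same Bloom filter. A base costs nothing to store wherever the graph offers exactly one continuation and the read follows it, which is where the compression comes from.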
Reference
G. Benoit, C. Lemaitre, D. Lavenier, E. Drezen, T. Dayris, R. Uricaru, G. Rizk. (2015) Reference-free compression of high throughput sequencing data with a probabilistic de Bruijn graph. BMC Bioinformatics, 16:288.
Ready-to-use executable
Leon is available as a binary for immediate use on Linux and Mac OS X platforms, with the following requirements:
- Mac OS X 10.9 or above (Intel 64-bit processors): Download
- Linux running on Intel or AMD 64-bit processors (kernel 2.6.32 or above, GLIBCXX_3.4.13 or above): Download
For all other platforms or configurations, or if the binaries above fail to run on your computer, download the source code and compile it.
Source Code
The LEON tool is written entirely in C++. Download
Documentation
Documentation and some typical results are available here.
License
LEON binaries and source code are covered by the Affero GPL version 3 license.
ChangeLog
version 1.0.0 April 16, 2015
- bug fixes.
version 0.3.0 March 19, 2015
- added quality compression, and other optimizations.
version 0.2.1 Dec 18, 2014
- bug fixes
version 0.2 Oct 31, 2014
- major performance improvement (about 2 times faster)
version 0.1.2 Aug 10, 2014
- initial public release