Our new read compression software, Leon, has been published in BMC Bioinformatics: http://www.biomedcentral.com/1471-2105/16/288.
Leon is a software tool for compressing Next Generation Sequencing data. It accepts reads in FASTA or FASTQ format. The method does not require a reference genome; instead, a reference is built de novo from the set of reads, in the form of a probabilistic de Bruijn graph. Leon is based on the GATB library. It uses the disk-streaming k-mer counting algorithm provided by GATB and inserts the solid k-mers (those whose abundance exceeds a threshold) into a Bloom filter. Each read is then encoded as a path in this graph: only an anchoring k-mer is stored, together with a list of bifurcations indicating which branch to follow wherever several are possible.
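The encoding idea described above can be illustrated with a minimal Python sketch. This is not Leon's implementation and does not use the GATB API: the function names (`solid_kmers`, `encode_read`, `decode_read`) are hypothetical, an exact Python set stands in for the Bloom filter, the anchor is assumed to be the first k-mer of the read, and positions where the read disagrees with the only branch of the graph are kept in a separate table so that decoding stays lossless.

```python
from collections import Counter

def solid_kmers(reads, k, min_abundance):
    """Return the set of k-mers seen at least min_abundance times.
    Assumption: an exact set replaces the Bloom filter for simplicity."""
    counts = Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))
    return {kmer for kmer, c in counts.items() if c >= min_abundance}

def successors(kmer, graph):
    """Bases that extend kmer to another k-mer present in the graph."""
    return [b for b in "ACGT" if kmer[1:] + b in graph]

def encode_read(read, graph, k):
    """Encode a read as (anchor, length, bifurcations, mismatches).
    Returns None when the first k-mer is not solid (no anchor in this sketch)."""
    anchor = read[:k]
    if anchor not in graph:
        return None
    bifurcations, mismatches = [], {}
    kmer = anchor
    for i in range(k, len(read)):
        base = read[i]
        nexts = successors(kmer, graph)
        if len(nexts) == 1:
            if nexts[0] != base:
                mismatches[i] = base      # read leaves the only branch
        else:
            bifurcations.append(base)     # zero or several branches: store choice
        kmer = kmer[1:] + base
    return anchor, len(read), bifurcations, mismatches

def decode_read(anchor, length, bifurcations, mismatches, graph):
    """Rebuild the read by walking the graph from the anchor."""
    read, kmer = anchor, anchor
    choices = iter(bifurcations)
    for i in range(len(anchor), length):
        if i in mismatches:
            base = mismatches[i]
        else:
            nexts = successors(kmer, graph)
            base = nexts[0] if len(nexts) == 1 else next(choices)
        read += base
        kmer = kmer[1:] + base
    return read

# Toy data set: the third read carries one difference from the other two.
reads = ["ACGTACGGATT", "ACGTACGGATT", "ACGTACCGATT"]
K = 4
graph = solid_kmers(reads, K, min_abundance=2)

for r in reads:
    encoded = encode_read(r, graph, K)
    if encoded is not None:
        assert decode_read(*encoded, graph) == r
```

In this toy setting, a read that follows an unambiguous path costs only its anchor and its length; extra symbols are stored only at branching nodes or where the read departs from the graph, which is where the compression gain comes from.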
Reference-free compression of high throughput sequencing data with a probabilistic de Bruijn graph. G. Benoit, C. Lemaitre, D. Lavenier, E. Drezen, T. Dayris, R. Uricaru, G. Rizk. BMC Bioinformatics 2015 16:288.
Sources (A-GPL), binaries, documentation and news are available on the Leon web page.