SNP detection between 2 mouse strains

Discovering SNPs between two inbred mouse strains has been tested using the discoSNP strategy. Unlike many other approaches, discoSNP does not require a reference genome. The method is based solely on analyzing a de Bruijn graph built from the raw sequencing datasets. In this experiment, two datasets of 100 bp reads, totaling 2.88 billion reads, were used as input to discoSNP. The software directly identified 1,794,515 SNPs, approximately the same number of SNPs as found by Wong et al. in their study [1].
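To illustrate the idea of reference-free detection, here is a minimal Python sketch of how an isolated SNP shows up as a "bubble" in a de Bruijn graph: two paths that diverge at one nucleotide and merge again after k-1 shared nucleotides. This is a toy illustration and not discoSNP's implementation. The function names (`build_kmers`, `find_snp_bubbles`), the tiny k value and the example sequences are all assumptions made for the example, and the sketch ignores reverse complements, sequencing errors and branching inside a bubble.

```python
from itertools import combinations

def build_kmers(reads, k):
    """Return the set of all k-mers in the reads (the nodes of the graph)."""
    return {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}

def find_snp_bubbles(reads, k):
    """Find simple bubbles: two paths differing at exactly one nucleotide.

    Returns a set of (left_context, allele_1, allele_2, right_context),
    where both contexts have length k-1. Hypothetical helper, toy logic only.
    """
    graph = build_kmers(reads, k)
    bubbles = set()
    for u in graph:
        # A node with two or more successors is a candidate bubble opening.
        successors = [c for c in "ACGT" if u[1:] + c in graph]
        for a, b in combinations(successors, 2):
            upper, lower = u[1:] + a, u[1:] + b
            right = ""
            # Extend both paths in lockstep with the same nucleotide.
            for _ in range(k - 1):
                shared = [c for c in "ACGT"
                          if upper[1:] + c in graph and lower[1:] + c in graph]
                if len(shared) != 1:
                    break  # dead end or branching: not a simple bubble
                upper = upper[1:] + shared[0]
                lower = lower[1:] + shared[0]
                right += shared[0]
            else:
                # k-1 shared extensions succeeded: the two paths have merged.
                bubbles.add((u[1:], a, b, right))
    return bubbles

# Two made-up sequences standing in for reads from two strains (A/C SNP).
strain_a = "GATTACAGATCCGGTAA"
strain_b = "GATTACAGCTCCGGTAA"
print(find_snp_bubbles([strain_a, strain_b], k=5))
# prints {('ACAG', 'A', 'C', 'TCCG')}
```

No reference sequence appears anywhere in the sketch: the variant is recovered purely from the k-mers shared and not shared between the two inputs.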

discoSNP runs in two steps: (1) detection of putative SNPs from the read datasets; (2) filtering and ranking based on coverage and base quality. Thanks to the use of the GATB-core library, the first step is able to handle very large datasets of billions of reads with a reasonable amount of memory. The processing of the mouse datasets required less than 6 GB of RAM. In comparison, the NIKS, KissSNP and Bubbleparse tools exceeded the memory limit on a server with 512 GB of RAM.
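The second step can be pictured with a small Python sketch that counts how many reads of each dataset support each allele, drops weakly covered candidates, and ranks the rest. This is an illustration under stated assumptions, not discoSNP's actual filter: the names `allele_coverage` and `rank_candidates`, the `min_coverage` threshold and the scoring rule (difference in allele frequency between datasets) are hypothetical, and base quality is left out entirely.

```python
def allele_coverage(reads, path):
    """Count reads containing the full allele path (exact substring match)."""
    return sum(path in read for read in reads)

def rank_candidates(candidates, datasets, min_coverage=2):
    """Filter candidate SNPs by coverage and rank them.

    candidates: iterable of (left_context, allele_1, allele_2, right_context)
    datasets:   list of read lists, one per sample
    The score is an assumed, illustrative measure: the spread of allele_1
    frequency across datasets (1.0 means the alleles separate the samples).
    """
    ranked = []
    for left, a, b, right in candidates:
        cov_a = [allele_coverage(d, left + a + right) for d in datasets]
        cov_b = [allele_coverage(d, left + b + right) for d in datasets]
        # Discard candidates where either allele is poorly supported overall.
        if sum(cov_a) < min_coverage or sum(cov_b) < min_coverage:
            continue
        freqs = [ca / (ca + cb) for ca, cb in zip(cov_a, cov_b) if ca + cb > 0]
        score = max(freqs) - min(freqs) if freqs else 0.0
        ranked.append((score, (left, a, b, right), cov_a, cov_b))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked

# Made-up reads for two samples and two candidates (one unsupported).
sample_1 = ["GGACAGATCCGTT"] * 3
sample_2 = ["GGACAGCTCCGTT"] * 3
candidates = [("ACAG", "A", "C", "TCCG"), ("TTTT", "G", "T", "AAAA")]
for score, snp, cov_a, cov_b in rank_candidates(candidates, [sample_1, sample_2]):
    print(score, snp, cov_a, cov_b)
```

A production filter would also have to handle reverse-complemented reads, partial overlaps between reads and paths, and per-base quality scores, all of which this sketch omits.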

References

  1. Wong, K., Bumpstead, S., Van Der Weyden, L., Reinholdt, L. G., Wilming, L. G., Adams, D. J., and Keane, T. M. (August, 2012) Sequencing and characterization of the FVB/NJ mouse genome. Genome biology, 13(8), R72.
  2. Uricaru R., Rizk G., Lacroix V., Quillery E., Plantard O., Chikhi R., Lemaitre C., Peterlongo P. (2015) Reference-free detection of isolated SNPs. Nucl. Acids Res. 43(2):e11.
