The widespread use of Next Generation Sequencing (NGS) techniques and their application to non-model organisms have provided biologists with large amounts of genetic data and the ability to address problems that were intractable just a few years ago.
However, the frequent lack of reference genomes for non-model organisms often creates difficulties for researchers trying to answer specific genetic questions, such as finding candidate genes or looking for intra- and inter-population variation.
For instance, mining SNPs in NGS datasets of anonymous pooled individuals that cannot be compared to a reference is still not a simple task.
This work describes 4Pipe4, an NGS data analysis pipeline optimized for SNP mining in such datasets, particularly Roche 454 data.
To assess its efficiency, a dataset of anonymous pooled individuals of Quercus suber (cork oak), a species without an available reference genome, was analysed with 4Pipe4. A subset of tens of SNPs detected by the pipeline was then randomly selected and genotyped on an array for validation.
The results of the genotyping array were used to: a) provide insights into population structure and gene flow patterns and b) perform an association study with environmental factors such as temperature, precipitation or drought periods.
Combining 454 sequencing and a genotyping array with the 4Pipe4 pipeline proved to be a very efficient and cost-effective way to obtain validated and mapped SNPs from orthologous regions for population genomics studies.