Documentation

 

SNPbox is a software tool for large-scale, standardized primer design. Primer conditions are chosen so that PCR amplifications are uniform, which facilitates the use of high-throughput genetic platforms. Repeat regions are taken into account during primer selection.

basic strategy

SNPbox automates primer design for a number of well-defined genomic sequences, or objects. These objects are the starting points for defining targets, for which Primer3 designs primers within a frame f of 70 bp 5’ and 3’ of the target. The default target length is 450 bp, but this can be changed if required. When an object is shorter than the optimal target length, it is first extended symmetrically in the 5’ and 3’ directions until the optimal length is reached (indicated as e). When the object is longer than the optimal target length, multiple overlapping targets are defined.
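The symmetric extension and the primer frames described above can be sketched in a few lines of Python. This is only an illustration, not SNPbox's own code: the function names, the 0-based half-open coordinates and the clamping at the sequence ends are assumptions made for the example.

```python
OPTIMAL_TARGET = 450  # default target length in bp (from the text)
FRAME = 70            # primer frame f on each side of the target, in bp


def extend_object(start, end, seq_len, optimal=OPTIMAL_TARGET):
    """Symmetrically extend a short object [start, end) to the optimal length.

    Objects that already reach the optimal length are returned unchanged.
    Clamping at the sequence boundaries is an assumption of this sketch.
    """
    length = end - start
    if length >= optimal:
        return start, end
    missing = optimal - length
    left = missing // 2
    right = missing - left
    return max(0, start - left), min(seq_len, end + right)


def primer_frames(t_start, t_end, seq_len, frame=FRAME):
    """Return the 5' and 3' windows flanking a target where primers may lie."""
    five_prime = (max(0, t_start - frame), t_start)
    three_prime = (t_end, min(seq_len, t_end + frame))
    return five_prime, three_prime


target = extend_object(1000, 1100, seq_len=10000)
frames = primer_frames(*target, seq_len=10000)
```

Here a 100 bp object is padded by 175 bp on each side to reach 450 bp, and the two 70 bp frames sit directly outside the resulting target.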


When SNPbox encounters repeat sequences while selecting suitable target sequences, it can include them in the target depending on their size, nature and distance to the object. Only repeats that are shorter than 300 bp and belong to the interspersed repeat class are included. Polymorphic repeats are excluded from a target sequence, since they often result in problematic sequence reads. In addition, SNPbox never designs primers within repeat sequences.
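As a rough illustration of these rules, the sketch below encodes the size and class criteria and the ban on primers inside repeats. The `Repeat` record and its class labels are hypothetical, and the distance-to-object criterion is left out because the text gives no threshold for it.

```python
from dataclasses import dataclass

MAX_REPEAT_LEN = 300  # included repeats must be shorter than this (from the text)


@dataclass
class Repeat:
    start: int         # 0-based, inclusive (assumed convention)
    end: int           # exclusive
    repeat_class: str  # hypothetical labels, e.g. "interspersed", "microsatellite"

    @property
    def length(self):
        return self.end - self.start


def may_include(repeat):
    """A repeat may be part of a target only if it is short and interspersed."""
    return repeat.repeat_class == "interspersed" and repeat.length < MAX_REPEAT_LEN


def primer_allowed(p_start, p_end, repeats):
    """A primer [p_start, p_end) must not overlap any repeat at all."""
    return all(p_end <= r.start or p_start >= r.end for r in repeats)
```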

3 modules

SNPbox holds 3 modules for automated primer design: a SNP module, an exon module and a saturation module. Primer design always relates to a genomic sequence that, upon first use, is masked for repeats using RepeatMasker and an adapted version of Sputnik. This adapted version detects microsatellite repeats and single base stretches, and its output is arranged in 3 classes: repeats with fewer than 8 repeat units, repeats with 8 or more repeat units, and single base stretches of at least 8 identical bases.
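
The three output classes can be expressed as a small classification function. This is a sketch under assumptions: the adapted Sputnik's real output format is not described here, so the input (a repeat unit and its copy number) and the class names are hypothetical.

```python
def classify_simple_repeat(unit, copies):
    """Assign a simple repeat to one of the 3 classes described in the text.

    unit   -- the repeated motif, e.g. "CA" or "A" (hypothetical input form)
    copies -- the number of consecutive repeat units
    Returns None for single base runs shorter than 8 bases, which the text
    does not list as a reported class.
    """
    if len(unit) == 1:
        return "single_base_stretch" if copies >= 8 else None
    if copies < 8:
        return "short_repeat"  # fewer than 8 repeat units
    return "long_repeat"       # 8 or more repeat units
```
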
The SNP module allows primer design for the validation of public SNPs. In order to map public SNPs on a given sequence, the genomic DNA is aligned to a database containing the HGVbase SNPs. SNPs found within a region of 300 bp are joined into one object.
In the exon module, coding sequences are identified within a genomic sequence by aligning cDNA and/or EST sequences using the program Spidey. In the object definition, exons are symmetrically extended by 50 bp on both sides to include the branch point and the splice sites. When an excluded repeat sequence lies near an exon, the exon is extended by only 25 bp or not at all. In this module, objects within a region of 250 bp are joined into one object.
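
Both the SNP module (300 bp) and the exon module (250 bp) join nearby objects. One possible reading, assumed here, is that consecutive objects are merged when the gap between them does not exceed the given distance; the text does not spell out the exact criterion, and the function name is made up for this sketch.

```python
def join_objects(objects, max_gap):
    """Merge (start, end) objects whose gap to the previous one is <= max_gap.

    Coordinates are half-open intervals; the gap-based criterion is an
    assumption, not a documented SNPbox rule.
    """
    merged = []
    for start, end in sorted(objects):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous object: grow it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(m) for m in merged]


snp_objects = join_objects([(100, 101), (350, 351), (900, 901)], max_gap=300)
exon_objects = join_objects([(1000, 1200), (1400, 1500), (2000, 2100)], max_gap=250)
```

In the first call the two SNPs at positions 100 and 350 end up in one object, while the SNP at position 900 stays on its own.
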
In the saturation module, the objects are the parts of the genomic sequence between the excluded repeats. Targets are defined with a default overlap of 35 bp. Since the length of an object is not necessarily a multiple of the optimal target length, SNPbox aims to design targets that approach the optimal target length as closely as possible.
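
One way to tile an object with overlapping targets of near-optimal length is sketched below. It is not necessarily how SNPbox does it: the rounding strategy and the function name are assumptions. The idea is to pick the number of targets whose implied length is closest to the optimum and then stretch or shrink all targets equally.

```python
import math


def tile_object(start, end, optimal=450, overlap=35):
    """Cover [start, end) with targets that overlap by `overlap` bp.

    The number of targets n is chosen so that the common target length
    stays close to `optimal`; the last target is clipped at `end`.
    """
    length = end - start
    step = optimal - overlap
    n = max(1, round((length - overlap) / step))
    # Length each target needs so that n overlapping targets cover the object.
    target_len = math.ceil((length + (n - 1) * overlap) / n)
    targets = []
    for i in range(n):
        t_start = start + i * (target_len - overlap)
        t_end = min(end, t_start + target_len)
        targets.append((t_start, t_end))
    return targets
```

For a 1280 bp object this gives three targets of exactly 450 bp; for a 1000 bp object it gives two targets of about 518 bp rather than three much shorter ones.
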

output

The output of SNPbox consists of an HTML page with a graphical representation of the annotated genomic sequence and hyperlinks to a variety of files, allowing easy inspection of the data. A tab-delimited file contains the primer sequences, their genomic positions and the PCR amplification conditions. The GC content is also calculated per 50 bp frame of the amplicon, which yields an average, a minimal and a maximal GC content.
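
The GC summary can be illustrated with a short Python sketch. It assumes non-overlapping 50 bp frames, a shorter final frame when the amplicon length is not a multiple of 50, and an average taken over the frame values; SNPbox itself may handle these details differently, and the function name is made up for the example.

```python
def gc_profile(amplicon, frame=50):
    """Return mean, minimal and maximal GC percentage over fixed frames."""
    seq = amplicon.upper()
    if not seq:
        raise ValueError("amplicon sequence is empty")
    values = []
    for i in range(0, len(seq), frame):
        window = seq[i:i + frame]
        gc = sum(base in "GC" for base in window)
        values.append(100.0 * gc / len(window))
    return {
        "mean": sum(values) / len(values),
        "min": min(values),
        "max": max(values),
    }


profile = gc_profile("GC" * 25 + "AT" * 25)
```

For this 100 bp toy amplicon the first frame is entirely GC and the second entirely AT, so the minimum and maximum differ sharply even though the average looks moderate.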