Documentation
SNPbox is a software tool for large-scale, standardized primer design. Primer conditions are chosen so that PCR amplifications are uniform, which facilitates the use of high-throughput genetic platforms. Repeat regions are taken into account during primer selection.
basic strategy
SNPbox automates primer design for a number of well-defined genomic sequences, or objects. These objects are the starting points for defining targets, for which Primer3 designs primers within a frame f of 70 bp 5’ and 3’ of the target. The default target length is 450 bp but can be changed if required. If an object is shorter than the optimal target length, it is first extended symmetrically 5’ and 3’ until the optimal length is reached (indicated as e). If the object is longer than the optimal target length, multiple overlapping targets are defined.
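The symmetric extension and the primer frames described above can be sketched as follows. This is a minimal illustration, not SNPbox's actual code: the function names, the 0-based half-open coordinates and the choice to give an odd leftover base to the 3’ side are all assumptions, and clamping at the ends of the genomic sequence is not handled.

```python
def extend_to_target(obj_start, obj_end, optimal=450):
    """Symmetrically extend an object shorter than the optimal target length.

    Coordinates are 0-based, half-open (an assumption of this sketch).
    """
    length = obj_end - obj_start
    if length >= optimal:
        # Longer objects are split into overlapping targets instead.
        return obj_start, obj_end
    extra = optimal - length
    left = extra // 2          # bases added on the 5' side
    right = extra - left       # an odd leftover base goes 3' (assumption)
    return obj_start - left, obj_end + right


def primer_frames(target_start, target_end, frame=70):
    """Return the 5' and 3' frames in which primers may be placed."""
    five_prime = (target_start - frame, target_start)
    three_prime = (target_end, target_end + frame)
    return five_prime, three_prime


target = extend_to_target(1000, 1100)   # a 100 bp object
frames = primer_frames(*target)
```

With the defaults, a 100 bp object gains 175 bp on each side, and the two primer frames are the 70 bp immediately flanking the resulting 450 bp target.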
When SNPbox encounters repeat sequences while selecting suitable target sequences, it can include them in the target depending on their size, nature and distance to the object. Included repeats must be shorter than 300 bp and belong to the interspersed repeat class. Polymorphic repeats are excluded from a target sequence because they often result in problematic sequence reads. SNPbox also never designs primers within repeat sequences.
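The repeat rules above can be expressed as two small predicates. This is a sketch under stated assumptions: the `Repeat` record, the `kind` labels and both function names are hypothetical, and the distance-to-object criterion is left out because the text gives no threshold for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Repeat:
    start: int   # 0-based, half-open coordinates (assumption)
    end: int
    kind: str    # hypothetical labels, e.g. "interspersed" or "polymorphic"


def may_include_in_target(rep, max_len=300):
    """A repeat may sit inside a target only if it is an interspersed
    repeat shorter than max_len bp."""
    return rep.kind == "interspersed" and (rep.end - rep.start) < max_len


def primer_allowed(p_start, p_end, repeats):
    """A primer is rejected as soon as it overlaps any repeat."""
    return not any(p_start < r.end and r.start < p_end for r in repeats)


short_repeat = Repeat(500, 700, "interspersed")
ca_repeat = Repeat(900, 940, "polymorphic")
```

Here `short_repeat` could be kept inside a target, `ca_repeat` could not, and a primer touching either one would be rejected.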
3 modules
SNPbox holds 3 modules for automated primer design: a SNP module, an exon
module and a saturation module. Primer design is always related to a genomic
sequence that, upon first use, is masked for repeats using RepeatMasker and an
adapted version of Sputnik. This adapted version is used to detect
microsatellite repeats and single-base stretches, and its output is arranged
in 3 classes: repeats with fewer than 8 repeat units, repeats with 8 or more
repeat units, and single-base stretches of at least 8 identical bases.
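The three output classes can be illustrated with a short sketch. It does not reproduce Sputnik or its adaptation: the class labels and function names are hypothetical, and the regular expression simply finds runs of identical bases in plain Python.

```python
import re


def classify_repeat(unit, count):
    """Assign a repeat (unit sequence plus number of units) to a class.

    Returns None for single-base runs shorter than 8, which fall outside
    the three classes in this sketch (an assumption).
    """
    if len(unit) == 1:
        return "single_base_stretch" if count >= 8 else None
    return "lt8_units" if count < 8 else "ge8_units"


def find_single_base_stretches(seq, min_len=8):
    """Return (start, end, base) for every run of >= min_len identical bases."""
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.end(), m.group(1))
            for m in pattern.finditer(seq.upper())]


example = "ACGT" + "A" * 10 + "CGT"
stretches = find_single_base_stretches(example)
```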
The SNP module allows primer design for the validation of public SNPs. In order
to map public SNPs on a given sequence, the genomic DNA is aligned to a database
containing the HGVbase SNPs. SNPs found within a region of 300 bp are joined
into one object.
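Joining nearby SNPs into one object can be sketched as below. The text does not say exactly how the 300 bp region is measured, so this example assumes that a SNP joins the current object when it lies less than 300 bp from that object's first SNP. The function name is hypothetical.

```python
def group_snps(positions, region=300):
    """Group SNP positions into objects spanning less than `region` bp.

    Returns one (first_position, last_position) pair per object.
    """
    groups = []
    for pos in sorted(positions):
        if groups and pos - groups[-1][0] < region:
            groups[-1].append(pos)     # still inside the current region
        else:
            groups.append([pos])       # start a new object
    return [(g[0], g[-1]) for g in groups]


objects = group_snps([100, 250, 390, 450, 1200])
```

In this example the SNPs at 100, 250 and 390 form one object, while 450 starts a new one because it lies 350 bp from the first SNP.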
In the exon module, coding sequences are identified within a genomic sequence
by aligning cDNA and/or EST sequences using the program Spidey. In the object
definition, exons are symmetrically extended by 50 bp on both sides to include
the branch point and the splice sites. If an excluded repeat sequence lies
close to an exon, the exon is extended by only 25 bp or not at all. In this
module, objects within a region of 250 bp are joined into one object.
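One way to read the exon extension rule is to try flanks of 50, 25 and 0 bp on each side and keep the largest one that does not run into an excluded repeat. That reading, the independent treatment of the two sides and the function names are assumptions of this sketch. Coordinates are 0-based and half-open.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def flank_size(edge, direction, excluded, choices=(50, 25, 0)):
    """Largest flank from `choices` that avoids every excluded repeat.

    direction is -1 for the 5' side of the exon and +1 for the 3' side.
    """
    for size in choices:
        if size == 0:
            return 0
        if direction < 0:
            lo, hi = edge - size, edge
        else:
            lo, hi = edge, edge + size
        if not any(overlaps(lo, hi, r0, r1) for r0, r1 in excluded):
            return size
    return 0


def extend_exon(start, end, excluded):
    """Extend an exon on both sides, each side judged independently."""
    left = flank_size(start, -1, excluded)
    right = flank_size(end, +1, excluded)
    return start - left, end + right


obj = extend_exon(1000, 1200, excluded=[(960, 970)])
```

In the example, a repeat 30 bp upstream of the exon blocks the 50 bp flank but leaves room for 25 bp, while the downstream side gets the full 50 bp.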
In the saturation module, the objects are the parts of the genomic sequence
between the excluded repeats. Targets are defined with a default overlap of
35 bp. Since the length of an object is not necessarily a multiple of the
optimal target length, SNPbox aims to design targets that approach the optimal
target length as closely as possible.
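The tiling of an object into overlapping targets can be sketched as follows. This is one simple way to get target lengths close to the optimum, not necessarily the method SNPbox uses: the number of targets is chosen so that the resulting target length is nearest to 450 bp, and the targets are then made as equal in size as possible. The function name and the handling of objects shorter than the optimum are assumptions.

```python
import math


def tile_object(start, end, optimal=450, overlap=35):
    """Split [start, end) into targets that overlap by `overlap` bp."""
    length = end - start
    if length <= optimal:
        return [(start, end)]      # single target, no tiling needed (assumption)

    # n targets of length t cover n*t - (n-1)*overlap bases, so
    # t = (length - overlap) / n + overlap. Pick the n whose t is
    # closest to the optimal length.
    ratio = (length - overlap) / (optimal - overlap)
    candidates = {max(1, math.floor(ratio)), max(1, math.ceil(ratio))}
    n = min(candidates,
            key=lambda k: abs((length - overlap) / k + overlap - optimal))

    total = length + (n - 1) * overlap   # overlapping bases counted twice
    base, extra = divmod(total, n)
    targets = []
    pos = start
    for i in range(n):
        size = base + (1 if i < extra else 0)
        targets.append((pos, pos + size))
        pos = pos + size - overlap       # next target starts inside this one
    return targets


targets = tile_object(0, 1000)
```

For a 1000 bp object, two targets of about 518 bp are closer to the 450 bp optimum than three of about 357 bp, so the sketch returns two.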
output
The output of SNPbox consists of an HTML page with a graphical representation
of the annotated genomic sequence and hyperlinks to a variety of files,
allowing easy inspection of the data. A tab-delimited file contains the primer
sequences, genomic positions and PCR amplification conditions. In addition, GC
content is calculated per 50 bp frame of the amplicon, which yields an average,
a minimal and a maximal GC content.
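The GC summary can be sketched as below. The text does not state whether the 50 bp frames overlap, so this example assumes consecutive, non-overlapping frames, ignores an incomplete final frame and averages the per-frame values. The function name is hypothetical.

```python
def gc_summary(amplicon, frame=50):
    """Return mean, min and max GC percentage over consecutive frames."""
    seq = amplicon.upper()
    if not seq:
        raise ValueError("empty amplicon")
    # Full frames only; an amplicon shorter than one frame is used whole.
    frames = [seq[i:i + frame]
              for i in range(0, len(seq) - frame + 1, frame)] or [seq]
    values = [100.0 * sum(base in "GC" for base in f) / len(f)
              for f in frames]
    return {
        "mean": sum(values) / len(values),
        "min": min(values),
        "max": max(values),
    }


summary = gc_summary("G" * 50 + "A" * 50)
```

An amplicon with a GC-only first frame and an AT-only second frame gives a mean of 50%, a minimum of 0% and a maximum of 100%, which shows why the extremes are reported alongside the average.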