General Information/IS Identification
The families in ISfinder are defined by an initial manual BLAST analysis, often followed by iterative BLAST analyses in which the primary transposase (Tpase) sequence of representative elements is used as a query in a BLASTP[1] search of microbial genomes. Potential full-length Tpases are retained, and the one with the lowest score is then used as the query in a second BLASTP search. This is repeated until no new potential candidates are detected. The retained sequences are then aligned with the ClustalW multiple alignment algorithm[2], and the alignment is displayed in the Jalview alignment editor[3] for assessment. The corresponding DNA, together with 1000 base pairs upstream and downstream, is then extracted and examined manually for terminal inverted repeats (IRs) or other typical features such as secondary structures and flanking direct repeats (DRs). This, together with a comparison of the DNA extremities of the various elements, allows both ends of each collected element to be identified. Where more than one copy of an IS is identified, BLASTN can be used to define the IS ends. Where only a single copy is found, the ends can often be defined by identifying an empty site (the corresponding locus in a related genome that lacks the IS) and comparing it with the IS-containing sequence.
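The manual search for IRs and DRs in the extracted region can be illustrated with a minimal Python sketch. It assumes exact matches only and fixed, illustrative lengths for the IRs (`ir_len`) and DRs (`dr_len`); the helper names `revcomp`, `extract_with_flanks` and `find_ir_dr_candidates` are hypothetical. Real IS ends often carry imperfect IRs and DRs of element-specific length, so this is a toy screen for candidate boundaries, not the ISfinder procedure itself.

```python
# Complement table for upper- and lower-case nucleotides.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_with_flanks(genome: str, start: int, end: int, flank: int = 1000):
    """Return genome[start:end] plus up to `flank` bp on each side
    (clipped at the sequence ends) and the window's offset in `genome`."""
    lo = max(0, start - flank)
    hi = min(len(genome), end + flank)
    return genome[lo:hi], lo


def find_ir_dr_candidates(window: str, ir_len: int = 12, dr_len: int = 4):
    """List (left, right) boundaries in `window` such that
    window[left:left + ir_len] is the exact reverse complement of
    window[right - ir_len:right] (terminal IRs) and the dr_len bases just
    outside each end are identical (flanking DRs)."""
    # Index every possible right end by the reverse complement of the
    # ir_len bases immediately inside it.
    right_ends = {}
    for right in range(ir_len, len(window) - dr_len + 1):
        inner = window[right - ir_len:right]
        right_ends.setdefault(revcomp(inner), []).append(right)

    hits = []
    for left in range(dr_len, len(window) - ir_len + 1):
        for right in right_ends.get(window[left:left + ir_len], []):
            if right - left < 2 * ir_len:
                continue  # the two IRs must not overlap
            if window[left - dr_len:left] == window[right:right + dr_len]:
                hits.append((left, right))
    return hits
```

Hit coordinates are relative to the extracted window; adding the offset returned by `extract_with_flanks` maps them back onto the genome. Each hit is only a candidate pair of ends, to be weighed against the extremities of related elements as described above.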
In a second step, we use the Markov Cluster Algorithm (MCL) (http://micans.org/mcl/)[4] to weigh the relationships between clusters of ISs and to validate the prior ISfinder classification of ISs into families and subgroups. This is explained in detail in [5]; the classification is based on the parameters used in the MCL (Fig.1.5.1) together with characteristics such as the specificity of target site duplications, the detailed sequence of the ends and the genetic organization. It should be understood that the distinction between families and subgroups may evolve as the number of ISs in the database increases. Several semi-automatic IS annotation pipelines are now available, and the interested reader is directed to three of them: ISsaga[6], which is now integrated into the ISfinder platform[7], ISScan[8] and Oasis[9]. At present, de novo prediction of ISs (without reference to known elements) is not efficient, and all these pipelines rely on the ISfinder database to function. Although all three permit identification of IS fragments as well as full-length ISs, a certain level of manual assessment remains essential.
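For readers unfamiliar with MCL, the following is a minimal textbook version in Python with NumPy. It alternates expansion (raising the column-stochastic matrix to a power) and inflation (element-wise power followed by column normalization) until the matrix stops changing. The input is assumed to be a symmetric matrix of non-negative pairwise similarity scores between ISs (for example, derived from Tpase BLASTP scores); the function name `mcl` and its default parameter values are illustrative assumptions, not the settings used for ISfinder, which are described in [5].

```python
import numpy as np


def mcl(similarity, expansion=2, inflation=2.0, max_iter=100, tol=1e-6):
    """Cluster nodes of a similarity graph with the Markov Cluster Algorithm.

    similarity: symmetric (n x n) array of non-negative scores, 0 = no edge.
    Returns a sorted list of clusters, each a tuple of node indices.
    """
    m = np.asarray(similarity, dtype=float)
    n = m.shape[0]
    # Add self-loops so every column has non-zero mass, then make the
    # matrix column-stochastic (each column sums to 1).
    m = m + np.eye(n)
    m = m / m.sum(axis=0)

    for _ in range(max_iter):
        previous = m
        # Expansion: random walks of length `expansion` spread flow.
        m = np.linalg.matrix_power(m, expansion)
        # Inflation: strengthen strong flows, weaken weak ones, renormalize.
        m = m ** inflation
        m = m / m.sum(axis=0)
        if np.allclose(m, previous, atol=tol):
            break

    # After convergence, each non-empty row belongs to an attractor; the
    # columns that still send flow to it form one cluster.
    clusters = set()
    for row in m:
        members = np.flatnonzero(row > tol)
        if members.size:
            clusters.add(tuple(members.tolist()))
    return sorted(clusters)


# Two groups of ISs with no similarity between the groups.
sim = [[0, 1, 1, 0, 0],
       [1, 0, 1, 0, 0],
       [1, 1, 0, 0, 0],
       [0, 0, 0, 0, 1],
       [0, 0, 0, 1, 0]]
print(mcl(sim))  # [(0, 1, 2), (3, 4)]
```

The inflation parameter controls granularity: higher values generally split the graph into more, smaller clusters, which is one reason the resulting groupings are checked against biological features such as the ends and target site duplications rather than taken at face value.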
Bibliography