General Information/IS Identification

From TnPedia

Revision as of 13:36, 30 April 2020

Fig.1.5.1. Results of Markov Cluster (MCL) analysis. Each circle represents an individual IS transposase amino acid sequence. a) Inflation factor 1.2, score >30 (links with scores below 30 were removed). IS1 family: blue circles. IS1595 family: green circles. b) An inflation factor of 2 increases stringency and splits each major group into subgroups. The IS1 family gives rise to a separate ISMhu11 group (magenta). The IS1595 family separates into four groups: IS1595 (green), IS1016 (yellow), ISPna2 (blue-green) and ISH4 (light blue). c) Progressively removing the weakest links between groups (inflation factor 2, score >140) further divides the IS1595 family into four additional groups (IS1595, ISSod1, ISNwi1 and ISNha5).


ISfinder families are defined by an initial manual BLAST analysis, often followed by iterative BLAST analyses in which the primary transposase (Tpase) sequence of representative elements is used as the query in a BLASTP[1] search of microbial genomes. Potential full-length Tpases are retained, and the one with the lowest score is then used as the query in a second BLASTP search. This is repeated until no new potential candidates are detected. The collected sequences are then aligned with the ClustalW multiple alignment algorithm[2], and the results are displayed in the Jalview alignment editor[3] for assessment.
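The iterative search loop described above can be sketched as follows. This is an illustrative sketch only, not the ISfinder procedure itself: `search` is a hypothetical callable standing in for a BLASTP run, `is_full_length` is a hypothetical predicate for "potential full-length Tpase", and `kmer_search` is a crude toy scorer (shared 3-mers, not an alignment) used only so the example runs. It also assumes "lowest score" means the weakest new full-length hit of the latest round.

```python
from typing import Callable, Dict, List, Set, Tuple

Hit = Tuple[str, float]  # (sequence id, score)


def iterative_search(seed_id: str,
                     search: Callable[[str], List[Hit]],
                     is_full_length: Callable[[str], bool],
                     max_rounds: int = 50) -> Set[str]:
    """Repeat the search from the weakest new full-length hit until nothing new appears."""
    collected = {seed_id}
    query = seed_id
    for _ in range(max_rounds):
        new_hits = [(sid, score) for sid, score in search(query)
                    if sid not in collected and is_full_length(sid)]
        if not new_hits:  # no new potential candidates: stop
            break
        collected.update(sid for sid, _ in new_hits)
        # Next query: the lowest-scoring (most divergent) new full-length hit.
        query = min(new_hits, key=lambda hit: hit[1])[0]
    return collected


def kmer_search(db: Dict[str, str], k: int = 3,
                min_score: float = 1) -> Callable[[str], List[Hit]]:
    """Hypothetical toy stand-in for BLASTP: score = number of shared k-mers."""
    def kmers(seq: str) -> Set[str]:
        return {seq[i:i + k] for i in range(len(seq) - k + 1)}

    def search(query_id: str) -> List[Hit]:
        q = kmers(db[query_id])
        hits = [(sid, float(len(q & kmers(seq)))) for sid, seq in db.items()]
        return [hit for hit in hits if hit[1] >= min_score]

    return search


# Toy "genome" database: hitC is only reachable from the seed via hitB.
db = {"seed": "AAAABBBB", "hitB": "BBBBCCCC", "hitC": "CCCCDDDD",
      "frag": "DDDD", "other": "EEEEFFFF"}
found = iterative_search("seed", kmer_search(db), lambda sid: len(db[sid]) >= 8)
print(sorted(found))  # ['hitB', 'hitC', 'seed']
```

A single search from the seed finds only `hitB`; the second round, seeded with `hitB`, reaches `hitC`, and the short fragment `frag` is discarded by the full-length filter, which ends the loop.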

The corresponding DNA, together with 1000 base pairs upstream and downstream, is then extracted and examined manually for IRs and other typical features such as secondary structures and flanking DRs. This, together with a comparison of the DNA extremities of the various elements, allows both ends of each collected element to be identified. Where more than one copy of an IS is present, BLASTN can be used to define the IS ends. Where only a single copy is found, the ends can often be defined by identifying an empty site (the equivalent sequence lacking the IS) and comparing it with the occupied site.
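The end-definition steps above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the manual ISfinder procedure: all function names are hypothetical, coordinates are 0-based and end-exclusive, and the empty-site comparison uses exact prefix/suffix matching rather than an alignment. Because chance matches between the IS ends and the flanks can shift the boundaries by a few base pairs, results like these still need manual checking.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def extract_with_flanks(genome: str, start: int, end: int, flank: int = 1000) -> str:
    """genome[start:end] plus up to `flank` bp on each side, clamped to the sequence."""
    return genome[max(0, start - flank):min(len(genome), end + flank)]


def terminal_ir_length(element: str, max_mismatches: int = 0) -> int:
    """Length of the inverted repeat shared by the left end and the reverse-complemented right end."""
    rc = reverse_complement(element)
    best, mismatches = 0, 0
    for i in range(len(element) // 2):
        if element[i] == rc[i]:
            best = i + 1
        else:
            mismatches += 1
            if mismatches > max_mismatches:
                break
    return best


def ends_from_empty_site(occupied: str, empty: str):
    """Compare an IS-containing locus with its empty site; return (IS sequence, DR)."""
    limit = min(len(occupied), len(empty))
    p = 0  # length of the shared left flank (includes one DR copy)
    while p < limit and occupied[p] == empty[p]:
        p += 1
    s = 0  # length of the shared right flank (includes the other DR copy)
    while s < limit and occupied[-1 - s] == empty[-1 - s]:
        s += 1
    # The target site appears once in the empty site but is counted in both flanks.
    dr_len = max(0, p + s - len(empty))
    return occupied[p:len(occupied) - s], occupied[p - dr_len:p]


left, site, right = "GGGTTTGGGT", "ACGTA", "TTAAGGCCTT"  # toy flanks and target site
toy_is = "GGTAC" + "TTTTTT" + "GTACC"                    # toy IS with 5-bp terminal IRs
empty = left + site + right
occupied = left + site + toy_is + site + right          # insertion duplicates the site
insert, dr = ends_from_empty_site(occupied, empty)
print(insert, dr, terminal_ir_length(insert))  # GGTACTTTTTTGTACC ACGTA 5
```

The empty-site comparison recovers both the IS boundaries and the 5-bp target site duplication; `terminal_ir_length` then confirms that the recovered ends carry inverted repeats.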

In a second step, we use the Markov Cluster Algorithm (MCL) (http://micans.org/mcl/)[4] to weigh the relationships between clusters of ISs and to validate the existing ISfinder classification of ISs into families and subgroups. This is explained in detail in [5]; the classification rests on the parameters used in the MCL analysis (Fig.1.5.1) together with characteristics such as the specificity of target site duplications, the detailed sequence of the ends and the genetic organization.
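The clustering step can be illustrated with a minimal, pure-Python sketch of the standard MCL iteration (expansion, inflation, column normalization). It is not the `mcl` program from micans.org: the function name `mcl_clusters`, the self-loop weighting (each node's strongest link), the `min_score` cut that mimics the score thresholds in Fig.1.5.1, and the convergence tolerances are all assumptions made for illustration.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def _normalize_columns(m: Matrix) -> Matrix:
    """Scale each column to sum to 1 (column-stochastic matrix), in place."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                m[i][j] /= total
    return m


def _matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl_clusters(scores: Sequence[Sequence[float]], inflation: float = 2.0,
                 min_score: float = 0.0, max_iter: int = 100,
                 tol: float = 1e-9, eps: float = 1e-6) -> List[List[int]]:
    n = len(scores)
    # Drop links weaker than min_score (assumed analogue of the score cut-off).
    m = [[float(scores[i][j]) if i != j and scores[i][j] >= min_score else 0.0
          for j in range(n)] for i in range(n)]
    # Self-loops weighted like each node's strongest remaining link (assumption).
    for j in range(n):
        m[j][j] = max((m[i][j] for i in range(n)), default=0.0) or 1.0
    _normalize_columns(m)
    for _ in range(max_iter):
        expanded = _matmul(m, m)  # expansion: spread flow along paths of length 2
        # Inflation: raise entries to a power, then renormalize; higher values sharpen clusters.
        inflated = _normalize_columns([[v ** inflation for v in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows that still hold flow are attractors; the columns they hold form a cluster.
    clusters = {frozenset(j for j in range(n) if m[i][j] > eps)
                for i in range(n) if any(v > eps for v in m[i])}
    return sorted(sorted(c) for c in clusters)


# Two tightly linked groups (score 100) joined by one weak link (score 20).
barbell = [[0.0] * 6 for _ in range(6)]
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    barbell[a][b] = barbell[b][a] = 100.0
barbell[2][3] = barbell[3][2] = 20.0
print(mcl_clusters(barbell, inflation=2.0, min_score=30))  # [[0, 1, 2], [3, 4, 5]]
```

Raising `inflation` or `min_score` plays the role described in the figure legend: stronger inflation sharpens the flow within groups, and a higher score cut removes weak inter-group links outright.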

It should be understood that the distinction between families and subgroups may evolve as the number of ISs in the database grows. Several semi-automatic IS annotation pipelines are now available, and the interested reader is directed to three of them: ISsaga[6], which is now integrated into the ISfinder platform[7], ISScan[8] and Oasis[9]. At present, de novo prediction of ISs is not efficient, and all of these pipelines rely on the ISfinder database. Although all three pipelines can identify IS fragments as well as full-length ISs, a degree of manual assessment remains essential.



Bibliography

  1. <pubmed>2231712</pubmed>
  2. <pubmed>7984417</pubmed>
  3. <pubmed>14960472</pubmed>
  4. <pubmed>11917018</pubmed>
  5. <pubmed>19286454</pubmed>
  6. <pubmed>21443786</pubmed>
  7. <pubmed>16381877</pubmed>
  8. <pubmed>17686783</pubmed>
  9. <pubmed>22904081</pubmed>