General Information/IS Identification
The families in ISfinder are defined using an initial manual BLAST analysis, often followed by iterative BLAST analyses in which the primary transposase sequence of a representative element is used as a query in a BLASTP[1] search of microbial genomes. Potential full-length Tpases are retained, and the one with the lowest score is then used as a query in a second BLASTP search. This is continued until no new potential candidates are detected. The retained sequences are then aligned with the ClustalW multiple alignment algorithm[2], and the results are displayed in the Jalview alignment editor[3] for assessment.
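The iterative search described above can be sketched as a simple loop. This is a minimal illustration, not the ISfinder implementation: the `search` callback is a hypothetical stand-in for a BLASTP run that returns full-length Tpase candidates with their scores, the `min_score` threshold is an assumed parameter, and the choice to pick the lowest-scoring hit among each round's new candidates is one reading of the procedure. All IDs and scores in the toy data are invented.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: a search function maps a query ID to (hit ID, score)
# pairs. In real use it would wrap a BLASTP run and keep full-length Tpases only.
SearchFn = Callable[[str], List[Tuple[str, float]]]


def iterative_search(seed: str, search: SearchFn, min_score: float = 0.0) -> List[str]:
    """Collect candidates, re-querying with the lowest-scoring new hit each round."""
    collected: Dict[str, float] = {seed: float("inf")}
    query = seed
    while True:
        # Keep only hits that pass the threshold and have not been seen before.
        new_hits = {
            hit: score
            for hit, score in search(query)
            if score >= min_score and hit not in collected
        }
        if not new_hits:
            break  # no new potential candidates were detected: stop
        collected.update(new_hits)
        # Assumption: "lowest score" is taken among this round's new hits.
        query = min(new_hits, key=new_hits.get)
    return list(collected)


# Toy data standing in for BLASTP results (IDs and scores are invented).
TOY_HITS = {
    "TpaseA": [("TpaseB", 250.0), ("TpaseC", 90.0)],
    "TpaseC": [("TpaseA", 90.0), ("TpaseD", 120.0)],
    "TpaseD": [("TpaseC", 120.0)],
}


def toy_search(query: str) -> List[Tuple[str, float]]:
    return TOY_HITS.get(query, [])


candidates = iterative_search("TpaseA", toy_search, min_score=50.0)
print(candidates)
```

Because the most distant retained hit becomes the next query, each round reaches further from the seed, and the loop ends once a round returns nothing new.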
The corresponding DNA, together with 1000 base pairs upstream and downstream, is then extracted and examined manually for inverted repeats (IRs) and other typical features such as secondary structures and flanking Direct Repeats (DRs). This, together with a comparison of the DNA extremities of various elements, allows identification of both ends of the collected elements. In cases where more than a single IS copy is identified, BLASTN can be used to define the IS ends. Where only a single copy is found, the ends can often be defined by identifying empty sites and comparing the copy with them.
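The extraction step and a first look at the element ends can be illustrated with a short sketch. The function names, the 0-based end-exclusive coordinates, and the restriction to perfect terminal inverted repeats are assumptions made for this example; real IRs are often imperfect, which is one reason the text above relies on manual examination.

```python
def extract_with_flanks(genome: str, start: int, end: int, flank: int = 1000) -> str:
    """Return genome[start:end] plus up to `flank` bases on each side.

    Coordinates are 0-based and end-exclusive (an assumption of this sketch);
    the window is clamped at the sequence boundaries.
    """
    left = max(0, start - flank)
    right = min(len(genome), end + flank)
    return genome[left:right]


_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def terminal_ir_length(element: str, max_len: int = 50) -> int:
    """Length of the perfect inverted repeat shared by the two ends of `element`."""
    rc = reverse_complement(element)
    limit = min(max_len, len(element) // 2)
    n = 0
    # The left end matches the reverse complement of the right end
    # for as long as the terminal repeat is perfect.
    while n < limit and element[n].upper() == rc[n].upper():
        n += 1
    return n


# Invented toy element: 6 bp perfect IRs (GGTCTG ... CAGACC) around a short core.
element = "GGTCTG" + "ATATTTGCA" + "CAGACC"
genome = "AAAAA" + element + "TTTTT"

window = extract_with_flanks(genome, 5, 5 + len(element), flank=3)
print(window)
print(terminal_ir_length(element))
```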
In a second step, we use the Markov Cluster Algorithm (MCL) (http://micans.org/mcl/)[4][5] to weigh the relationships between clusters of ISs and to validate prior ISfinder classification of ISs into families and subgroups. This is explained in detail in [6] and is based on the parameters used in the MCL (Fig.5.1) in addition to characteristics such as the specificity of target site duplications, the detailed sequence of the ends, and the genetic organization.
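For readers unfamiliar with MCL, the core of the algorithm is an alternation of two matrix operations on a column-stochastic similarity matrix: expansion (matrix squaring) and inflation (element-wise powering followed by renormalization). The following pure-Python sketch is illustrative only. It is not the `mcl` program itself, and the inflation value, tolerance, self-loop weight, and toy graph are assumptions rather than the parameters used for ISfinder.

```python
from typing import List, Set

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def expand(m: Matrix) -> Matrix:
    """Expansion step: square the matrix (two-step random walks)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def inflate(m: Matrix, r: float) -> Matrix:
    """Inflation step: raise entries to the power r, then renormalize columns."""
    return normalize_columns([[x ** r for x in row] for row in m])


def mcl(similarity: Matrix, inflation: float = 2.0, max_iter: int = 50,
        tol: float = 1e-9) -> List[Set[int]]:
    n = len(similarity)
    # Add self-loops, as is customary for MCL, then make columns stochastic.
    m = normalize_columns(
        [[similarity[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    )
    for _ in range(max_iter):
        nxt = inflate(expand(m), inflation)
        delta = max(abs(nxt[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = nxt
        if delta < tol:
            break
    # Each non-empty row of the converged matrix lists the members of one cluster.
    clusters: List[Set[int]] = []
    for row in m:
        members = {j for j, x in enumerate(row) if x > 1e-6}
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Toy similarity graph (invented): nodes 0-2 are mutually similar, as are 3-4.
toy = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
clusters = mcl(toy)
print(clusters)
```

In MCL, a larger inflation value generally yields finer clusters, which is why the choice of parameters matters when comparing families with subgroups.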
It should be understood that the distinction between families and subgroups can evolve as the number of ISs in the database increases. Several semi-automatic IS annotation pipelines are now available. The interested reader is directed to three of these: ISsaga[7], which is now integrated into the ISfinder platform[8], ISScan[9] and Oasis[10]. At present, de novo prediction of ISs is not efficient, and these pipelines all rely on the ISfinder database to function. While all three pipelines permit identification of IS fragments as well as full-length ISs, a certain level of manual assessment is essential.
Contents
- 1 The logic behind the TEs nomenclature and naming attribution
- 2 Bioinformatics approaches for TE identification and annotation in prokaryotes genomes
- 3 Eukaryotic TE nomenclature and naming attribution
- 4 Bibliography
The logic behind the TEs nomenclature and naming attribution
"Why should researchers contemplate naming a newly identified TE?"
"Mainly to name it, so the discoverer and other researchers can specifically refer to it.
This is becoming increasingly important as our understanding about the influence of TEs upon their hosts becomes more apparent"[11].
Prokaryotic TE nomenclature and naming attribution
Transposable elements were discovered by Barbara McClintock during experiments conducted on maize in 1944. However, her discovery met with a less than enthusiastic reception from the genetics community [12] (for a full detailed history, please see Evelyn Fox Keller's acclaimed biography: A Feeling for the Organism, 10th Anniversary Edition: The Life and Work of Barbara McClintock).
Only some decades later was her discovery brought back to life, after the Szybalski group discovered bacterial insertion sequences in the early 1970s [12][13]. After that discovery, the picture of transposable elements started to change.
A committee assembled during the meeting on DNA Insertions at Cold Spring Harbor in 1976 proposed a set of rules to be used for the nomenclature of Transposable Elements [14]. The first attempt to create a concise nomenclature system for Transposable Elements was made at Stanford University by Campbell and colleagues in 1979 [15]. They classified prokaryotic elements into simple IS elements, more complex Tn transposons, and self-replicating episomes. In addition, definitions and nomenclature rules for these three classes of prokaryotic TEs were specified [15] (Table 1). ISs and transposons were named separately, with IS and Tn as the respective prefixes, followed by a sequential number in italics, such as IS1, IS2 and Tn1, Tn2, etc. [16].
TE nomenclature rules defined in 1979
The allocation of numbers and database administration was carried out by the late Dr Esther Lederberg from Stanford University Medical School, CA, USA [17]. Lists for the registry of Tn number allocations were subsequently published [18]. The allocation of Tn numbers stopped with the retirement of Dr Lederberg, and gradually a variety of rules were adopted for naming newly discovered transposons [17]. At the same time, new types of transposable elements, such as the mobilizable and conjugative transposons, were being discovered [17]. Additionally, interactions between different elements, including transposition and/or recombination events, led to novel chimeric transposons. These developments exacerbated the nomenclature problem [17]. Subsequent nomenclature systems have become complicated, with different systems being adopted for related elements by different research groups [17]. Therefore, Roberts and colleagues in 2008 proposed a new version of the early nomenclature system, which did not include non-autonomous elements (such as integron cassettes and MITEs) [17] (Table 2).
Revised Nomenclature for Transposable Genetic Elements proposed in 2008
a; not all reported elements have been shown to be mobile
Following the classification system of Roberts et al. [19], the ISfinder team (Mick Chandler and Patricia Siguier) and Jacques Mahillon proposed an enhancement to this classification system. It now includes terms and definitions for MICs (Mobile Insertion Cassettes), which are composed of passenger genes but no transposase, and MITEs (Miniature Inverted repeat Transposable Elements), which are short IS-related elements with no internal open reading frame but which, like most ISs, transposons and MICs, include IS-like extremities. It is important to note that a single IS might in principle be represented in all four forms (Table 3).
IS classification system proposed in 2008
Insertion Sequences nomenclature and naming attribution
Nowadays the IS nomenclature and naming attribution is controlled by the ISfinder database team [20]. They adopted a nomenclature similar to that of restriction enzymes [20]. Although this is not perfect, since some ISs are found in different species or even in different genera, the system is viable and has the advantage of indicating the host species rather than confronting the reader with a long series of numbers in the names, as in the original nomenclature system [21][20]. The original Campbell classification system assigned blocks of IS numbers to individual scientists, groups or institutions[21]. The ISfinder database includes a listing of these original IS numbers together with the last known address of the attribution [20] (See ISfinder page). Moreover, ISfinder includes an online form for registering (requesting a name for) new ISs [20] (See ISfinder form). Several journals now suggest that authors register their new IS elements with ISfinder before publication[20]. Since a unique name can be attributed, this avoids some of the confusion in the literature[20].
Tn nomenclature and naming attribution
The Tn nomenclature and naming attribution is controlled by "The Transposon Registry" database [22]. The Transposon Registry is a nomenclature system for the assignment of Tn numbers for bacterial and archaeal autonomous TEs, including unit transposons, composite transposons, conjugative transposons (CTns)/Integrative Conjugative Elements (ICEs), mobilizable transposons (MTns)/Integrative Mobilizable Elements (IMEs) and mobile genomic islands[22]. It excludes ISs, which are managed by the ISfinder database; other TEs such as introns and inteins, for which other databases already exist; and non-autonomous TEs such as integron cassettes and MITEs [22].
Bioinformatics approaches for TE identification and annotation in prokaryotes genomes
Most genome annotation pipelines developed to date are dedicated to the prediction and characterization of coding regions and their putative products, signal peptides, pseudogenes, and noncoding RNAs. Surprisingly, annotation of ubiquitous genomic features such as Mobile Genetic Elements (MGEs) is generally not fully addressed and/or is neglected in most of these pipelines, and thus MGEs are seldom well-characterized and annotated [23].
To begin an analysis of the MGE content of a given genome, it is first necessary to identify and annotate ordinary genome features such as coding sequences (CDSs), rRNAs, and tRNAs [23]. For MGE detection and annotation, it is essential to use a variety of approaches, including additional specialized pipelines and software that are not normally implemented in a regular annotation pipeline [23]. MGE annotation is therefore not a simple task, and it requires different approaches depending on the level of analysis. A recent review summarizes the currently available software for TE annotation in prokaryotic genomes (reviewed in [23]).
Eukaryotic TE nomenclature and naming attribution
TEs have been found in virtually all eukaryotic species investigated so far and display extreme diversity, with thousands or even tens of thousands of different TE families[24]. In 1989, Finnegan proposed the first TE classification system, which distinguished two classes by their transposition intermediate: RNA (Class I or retrotransposons) or DNA (Class II or DNA transposons)[24][25]. The transposition mechanism of Class I is commonly called ‘copy-and-paste’, and that of Class II, ‘cut-and-paste’ [25]. Later, in 2007, Wicker and colleagues proposed a common TE classification system that can be easily handled by non-specialists during genome and TE annotation procedures[26]. This system provided a consensus between the various conflicting classification and naming systems in use at the time[24]. A key component of this system is a naming convention: a three-letter code with each letter respectively denoting class, order and superfamily; the family (or subfamily) name; the sequence (database accession number) in which the element was found; and the ‘running number’, which defines the individual insertion in the accession. The unified system is also intended to facilitate comparative and evolutionary studies on TEs from different species [26].
Nowadays, the Wicker TE classification system is well recognized by the scientific community and is used by TE annotation pipelines.
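The four components of the naming convention can be illustrated with a small parser. This is a sketch under stated assumptions: the text above does not spell out the separators or the letter codes, so the underscore and hyphen layout (as in the example name `RLC_Angela_AA123456-1`), the use of `R` and `D` for Class I and Class II, and all function and class names are assumptions of this example rather than a definitive specification.

```python
from dataclasses import dataclass

# Assumption: first letter R = Class I (retrotransposon), D = Class II (DNA transposon).
TE_CLASSES = {"R": "Class I (retrotransposon)", "D": "Class II (DNA transposon)"}


@dataclass(frozen=True)
class TEName:
    code: str            # three letters: class, order, superfamily
    family: str          # family (or subfamily) name
    accession: str       # database accession in which the element was found
    running_number: int  # individual insertion within that accession

    def __str__(self) -> str:
        return f"{self.code}_{self.family}_{self.accession}-{self.running_number}"


def parse_te_name(name: str) -> TEName:
    """Split a name of the assumed form CODE_Family_Accession-N into its parts."""
    # Split on the first two underscores only, so accessions such as
    # NC_000913 survive; family names containing "_" are not supported here.
    code, family, rest = name.split("_", 2)
    accession, _, number = rest.rpartition("-")
    if len(code) != 3 or not family or not accession or not number.isdigit():
        raise ValueError(f"not a recognizable TE name: {name!r}")
    return TEName(code.upper(), family, accession, int(number))


def te_class(te: TEName) -> str:
    return TE_CLASSES.get(te.code[0], "unknown class")


parsed = parse_te_name("RLC_Angela_AA123456-1")
print(parsed.code, parsed.family, parsed.accession, parsed.running_number)
print(te_class(parsed))
```

Keeping the accession and running number separate from the classification code means that two insertions of the same family in one sequence differ only in the final number.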
Bibliography
- [1] PubMed:2231712
- [2] PubMed:7984417
- [3] PubMed:14960472
- [4] Van Dongen S. A cluster algorithm for graphs. Technical Report INS-R0010, National Research Institute for Mathematics and Computer Science in the Netherlands, Amsterdam. 2000.
- [5] PubMed:11917018
- [6] PubMed:19286454
- [7] PubMed:21443786
- [8] PubMed:16381877
- [9] PubMed:17686783
- [10] PubMed:22904081
- [11] Tansirichaiya S, Rahman MA, Roberts AP. The Transposon Registry. Mob DNA. 2019;10:40. PubMed:31624505
- [12]
- [13] PubMed:4567155
- [14]
- [15] Campbell A, Berg DE, Botstein D, Lederberg EM, Novick RP, Starlinger P, Szybalski W. Nomenclature of transposable elements in prokaryotes. Gene. 1979 Mar;5(3):197-206. PubMed:467979
- [16]
- [17]
- [18] PubMed:3036649
- [19] Roberts AP, Chandler M, Courvalin P, Guédon G, Mullany P, Pembroke T, Rood JI, Smith CJ, Summers AO, Tsuda M, Berg DE. Revised nomenclature for transposable genetic elements. Plasmid. 2008 Nov;60(3):167-73. PubMed:18778731
- [20]
- [21]
- [22] Tansirichaiya S, Rahman MA, Roberts AP. The Transposon Registry. Mob DNA. 2019;10:40. PubMed:31624505
- [23]
- [24]
- [25] Finnegan DJ. Eukaryotic transposable elements and genome evolution. Trends Genet. 1989 Apr;5(4):103-7. PubMed:2543105
- [26]