Difference between revisions of "IS Families/IS200-IS605 family"

From TnPedia

Revision as of 18:35, 2 June 2020

Introduction to IS200/IS605 family

The IS200/IS605 family members transpose using obligatory single strand (ss) DNA intermediates[1] by a mechanism called “peel and paste”. They differ fundamentally in organization from classical IS. They have sub-terminal palindromic structures rather than terminal IRs (Fig. IS200.1) and insert 3’ to specific AT-rich tetra- or penta-nucleotides without duplicating the target site.

Fig. IS200.1. Genetic organization. Left (LE) and right (RE) ends carrying the subterminal hairpin (HP) are presented as red and blue boxes, respectively. Left and right cleavage sites (CL and CR) are presented as black and blue boxes respectively, where the black box also represents element-specific tetra-/pentanucleotide target site (TS). The cleavage positions are indicated by small vertical arrows. Gray arrows: tnpA and tnpB open reading frames (orfs); (i) IS200 group with tnpA alone; (ii) to (iv) IS605 group with tnpA and tnpB in different configurations; (v) IS1341 group with tnpB alone.

The transposase, TnpA, is a member of the HUH enzyme superfamily (relaxases, Rep proteins of RCR plasmids/ss phages, bacterial and eukaryotic transposases of IS91/ISCR and Helitrons)[2][3] (Fig. IS200.2), all of which catalyze cleavage and rejoining of ssDNA substrates.

Fig. IS200.2. The IS200/IS605 family transposases are “minimal” and the smallest transposases presently known. They include the HUH and Y motifs and use Y as the attacking nucleophile to generate 5’ phosphotyrosine covalent intermediates. HUH transposases from other transposon families include additional domains.

IS200, the founding member (Fig. IS200.3), was identified 30 years ago in Salmonella typhimurium[4], but there has been renewed interest in these elements since the identification of the IS605 group in Helicobacter pylori[5][6][7]. Studies of two elements of this group, IS608 from H. pylori and ISDra2 from the radiation-resistant Deinococcus radiodurans, have provided a detailed picture of their mobility[8][9][10][11][12][13][14].

Fig. IS200.3. Top: IS200 Secondary structures in LE (red) and RE (blue), promoter (pL), ribosome binding site (RBS), and tnpA start and stop codons (AUG and UAA) are indicated. (i) DNA top strand with perfect palindromes at LE and RE in red and blue, interior stem-loop in black, (ii) RNA stem-loop structure in transcript originated from pL. Bottom: tnpA transcription originates at about nt 40 but promoter elements are not defined; the ‘left end’ contains two internal inverted repeats (opposing arrows), one of which acts as a transcription terminator (nts 12–34). The second, (nts 69–138) in the 5’UTR of the tnpA mRNA sequesters the Shine-Dalgarno sequence. IS200 in Salmonella also expresses a 90 nt sRNA (asRNA, art200, or STnc490) perfectly complementary to the 5’UTR and the first three codons of tnpA. The transcription start site and 3’ end for art200 in Salmonella (derived from RNA-Seq experiments) are shown but promoter elements were not previously defined.

Distribution and Organization

The family is widely distributed in prokaryotes, with more than 153 distinct members (89 distributed over 45 genera and 61 species of eubacteria, and 64 from archaea). It is divided into three major groups based on the presence or absence and on the configuration of two genes: the transposase gene tnpA, which is sufficient to promote IS mobility in vivo and in vitro, and tnpB, of unknown function, which is not required for transposition activity (Fig. IS200.1). These groups are: IS200, IS605 and IS1341. TnpB is also present in another IS family, IS607, which uses a serine recombinase as a transposase. In the phylogeny of this group of IS (Fig. IS200.4), both tnpB and tnpA of bacterial or archaeal origin are intercalated, suggesting some degree of horizontal transfer between these two groups of organisms[15].
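The grouping rule described above depends only on which of the two genes an element carries. As a minimal sketch (the function name `classify_group` and the "unclassified" fallback are illustrative assumptions, not part of any annotation tool), it can be written as:

```python
def classify_group(has_tnpA: bool, has_tnpB: bool) -> str:
    """Assign an IS200/IS605 family element to a group from its gene content.

    Hypothetical helper for illustration only: real annotation also relies on
    end structures and sequence similarity, which are not modelled here.
    """
    if has_tnpA and has_tnpB:
        return "IS605"    # tnpA and tnpB, in any relative configuration
    if has_tnpA:
        return "IS200"    # tnpA alone
    if has_tnpB:
        return "IS1341"   # tnpB alone
    return "unclassified"  # neither gene present (assumed fallback label)


print(classify_group(True, True))
```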

Fig. IS200.4. (A) Phylogeny-based on tnpB of the IS200/IS605/IS607 family. (B) Phylogeny-based on tnpA of the IS607 family (serine recombinase). (C) Phylogeny-based on tnpA of the IS605 family (HUH transposase). IS608 elements are underlined, single orfB elements are indicated between brackets, and the asterisk indicates the mosaic construction of the elements of this family (see the text). The various Archaea have been color-coded as follows for clarity: Sulfolobales, red; Thermoplasmatales, magenta; halophiles, green; methanogens, blue; “other,” orange. Bacteria are indicated in black.

Isolated copies of IS200-like tnpA can be identified in both bacteria and archaea[16] but, except for a partial copy in Natronomonas pharaonis (NP4630A), are limited in the archaea to the methanogens (ISMma21, ISMba16, and ISMba18). Full-length copies of IS605-like elements are also found in bacteria and several archaea, and all have corresponding MITE (Miniature Inverted repeat Transposable Element) derivatives in their host genomes.

The IS200 group

IS200 group members encode only tnpA, and are present in gram-positive and gram-negative bacteria and certain archaea[17][18] (Fig. IS200.1 and Fig. IS200.3). Alignment of TnpA from various members shows that they are highly conserved but may carry short C-terminal tails of variable length and sequence.

They can occur in relatively high copy number (e.g. >50 copies of IS1541 in Yersinia pestis) and are among the smallest known autonomous IS, with lengths generally between 600 and 700 bp. Some members such as ISW1 (from Wolbachia sp.) or ISPrp13 (from Photobacterium profundum) are even shorter.

IS200 was initially identified as an insertion mutation in the Salmonella typhimurium histidine operon[19]. It is abundant in different Salmonella strains and has now also been identified in a variety of other enterobacteria such as Escherichia, Shigella and Yersinia.

Different enterobacterial IS200 copies have almost identical lengths of between 707 and 711 bp. Analysis of the ECOR (E. coli) and SARA (Salmonellae) collections showed that the level of sequence divergence between IS200 copies from these hosts is equivalent to that observed for chromosomally encoded genes from the same taxa[20][21]. This suggests that IS200 was present in the common ancestor of E. coli and Salmonellae.

In spite of its abundance, an enigma of IS200 behavior is its poor contribution to spontaneous mutation in its original Salmonella host: only very rare insertion events have been documented[22]. One reason for these rare insertions could be poor expression of the tnpAIS200 gene from a weak promoter, pL, identified at the left IS end (LE)[23][24] (Fig. IS200.3).

Besides the characteristic major subterminal palindromes[25], presumed binding sites of the transposase at both LE and the right end (RE) (Substrate recognition), IS200 also carries a potential supplementary interior stem-loop structure (Fig. IS200.3). These two structures play a role in regulating IS200 gene expression. The first (perfect palindrome at LE; nts 12–34) overlaps the tnpAIS200 promoter pL, can act as a bi-directional transcription terminator upstream of tnpAIS200 and terminates up to 80% of transcripts[26] (Fig. IS200.3). The second (interior stem-loop; nts 69–138) (Fig. IS200.3) can, at the RNA level, repress mRNA translation by sequestration of the Ribosome Binding Site (RBS) (Fig. IS200.3). Experimental data suggested that the stem-loop is formed in vivo and its removal by mutagenesis caused up to a 10-fold increase in protein production[27]. Recent deep sequencing analysis revealed another aspect of post-transcriptional regulation of IS200 expression: a small anti-sense RNA (asRNA) that inhibits IS200 transposase expression (Fig. IS200.3) was identified as a substrate of Hfq, an RNA chaperone involved in post-transcriptional regulation in numerous bacteria[28]. Interestingly, asRNA and Hfq independently inhibit IS200 transposase expression: knock-out of both components resulted in a synergistic increase in transposase expression. Moreover, footprint data showed that Hfq binds directly to the 5’ part of the transposase transcript and blocks access to the RBS[29].

In spite of its very low transposition activity, an increase in IS200 copy number was observed during strain storage in stab cultures[30][31]. However, the factors triggering this activity remain unknown[32]. Transient high transposase expression leading to a burst of transposition was proposed to explain the observed high IS200 (>20) copy number in various hosts and in stab cultures[33].

Although regulatory structures similar to those observed in IS200 (Fig. IS200.3) were predicted in IS1541, another member of this group with 85% identity to IS200, this element can be detected in higher copy number (>50) in Salmonella and Yersinia genomes. However, no detailed analysis of its transposition is available and, since no de novo insertions have been experimentally documented and chromosomal copies appear stable in Y. pestis[34], it remains possible that IS1541 also behaves like IS200.

However, the regulatory structures are not systematically present in other IS200 group members and understanding of control of transposase synthesis requires further study.

The IS605 group

IS605 group members are generally longer (1.6-1.8 kb) due to the presence of a second orf, tnpB, in addition to tnpA. Alignment of TnpA copies from this group indicated that, although they do not form a separate clade from the IS200 group TnpA, they generally carry the short C-terminal tail. The tnpA and tnpB orfs exhibit various configurations with respect to each other. They may be divergent (Fig. IS200.1 i top: e.g. IS605, IS606) or expressed in the same direction with tnpA upstream of tnpB. In these latter cases, the orfs may be partially overlapping (Fig. IS200.1 ii; e.g. IS608, ISDra2) or separate (Fig. IS200.1 iii; e.g. ISSCpe2, ISEfa4). tnpB is also sometimes associated with another transposase, a member of the S-transposases (e.g. IS607[35][36], see [37]). TnpB was not required for transposition of either IS608 or ISDra2.

Three related IS, IS605, IS606 and IS608 (Fig. IS200.1) have been identified in numerous strains of the gastric pathogen Helicobacter pylori[38][39]. IS605 is involved in genomic rearrangements in various H. pylori isolates[40].

The H. pylori elements transpose in E. coli at detectable frequencies in a standard "mating-out" assay using a derivative of the conjugative F plasmid as a target [41][42].

The two best characterized members of this family are IS608 and the closely related ISDra2 from Deinococcus radiodurans. Both have overlapping tnpA and tnpB genes (Fig. IS200.1 ii). Like other family members, insertion is sequence-specific: IS608 inserts in a specific orientation with its left end 3’ to the tetranucleotide TTAC both in vivo and in vitro[43] while ISDra2 inserts 3’ to the pentanucleotide TTGAT[44]. Interestingly ISDra2 transposition in its highly radiation resistant Deinococcal host is strongly induced by irradiation[45] (Single strand DNA in vivo). Their detailed transposition pathway has been deciphered by a combination of in vivo studies and in vitro biochemical and structural approaches (Mechanism of IS200/IS605 single strand DNA transposition).
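Because insertion occurs 3’ to a short, element-specific sequence on one strand only, candidate insertion points on a given strand can be listed by a simple scan. The following is a minimal sketch: the function name `insertion_points` and the example sequences are invented for illustration, and the scan ignores everything except the presence of the target sequence (for instance, whether the DNA is actually single-stranded in vivo).

```python
# Target sequences quoted in the text: TTAC for IS608, TTGAT for ISDra2.
TARGET_SITES = {"IS608": "TTAC", "ISDra2": "TTGAT"}


def insertion_points(strand: str, element: str) -> list[int]:
    """Return 0-based indices immediately 3' to each target-sequence match.

    Only the strand supplied (read 5'->3') is scanned, mirroring the
    strand-specific nature of the reaction. Overlapping matches are reported.
    """
    ts = TARGET_SITES[element]
    seq = strand.upper()
    points = []
    start = seq.find(ts)
    while start != -1:
        points.append(start + len(ts))  # the left end would insert here
        start = seq.find(ts, start + 1)
    return points


# Toy sequences (hypothetical, not taken from any genome).
print(insertion_points("GGTTACCATTACG", "IS608"))   # [6, 12]
print(insertion_points("AATTGATCC", "ISDra2"))      # [7]
```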

The IS1341 group

Elements of the third group, IS1341, are devoid of tnpA and carry only tnpB (Fig. IS200.1 v). The IS occurs in three copies in Thermophilic bacterium PS3[46]. Multiple presumed full-length elements (including tnpA and tnpB) and closely related copies have been identified in other bacteria such as Geobacillus. On the other hand, IS891 from the cyanobacterium Anabaena is present in multiple copies on the chromosome and is thought to be mobile since a copy was observed to have inserted into a plasmid introduced in the strain[47].

Another isolated tnpB-related gene, gipA, present in the Salmonella Gifsy-1 prophage may be a virulence factor since a gipA null mutation compromised Salmonella survival in a Peyer's patch assay[48]. While no mobility function has been suggested for gipA, it is indeed bordered by structures characteristic of IS200/IS605 family ends and closely related to E. coli ISEc42.

In spite of their presence in multiple copies, it is still unclear whether IS1341 group members are autonomous IS or products of IS605 group degradation and require TnpA supplied from a related IS in the same cell for transposition.

IS decay

Circumstantial evidence based on analysis of the ISfinder database suggests that IS carrying both tnpA and tnpB genes may be unstable. Thus, although members of the IS200 group are often present in high copy number in their host genomes, intact full-length IS605 group members are invariably found in low copy number (P. Siguier, unpublished) (See also TnpB). On the other hand, various truncated IS605 group derivatives appear quite frequently. These forms seem to result from successive internal deletions and retain intact LE and RE copies. Sometimes, as in the case of ISSoc3, orf inactivation appears to have occurred by successive insertion/deletion of short sequences (indels) generating frameshifts and truncated proteins. For some IS (e.g. ISCco1, ISTel2, ISCysp14, ISSoc3) degradation can be precisely reconstituted and each successive step validated by the presence of several identical copies (P. Siguier, unpublished). This suggests that the degradation process is recent and that these derivatives are likely mobilised by TnpA supplied in trans by autonomous copies in the genome.

Mechanism of IS200/IS605 single strand DNA transposition

General transposition pathway

The transposition pathway of IS200/IS605 family members is shown in Fig. IS200.5. Much of the biochemistry was elucidated using an IS608 cell-free in vitro system which recapitulates each step of the reaction. This requires purified TnpAIS608 protein, single strand IS608 DNA substrates and divalent metal ions such as Mg2+ or Mn2+[49][50][51]. Similar and complementary results were also obtained with ISDra2[52][53][54]. The reactions are not only strictly dependent on single strand (ss) DNA substrates but are also strand-specific: only the “top” strand (defined as the strand carrying target sequence, TS, 5’ to the IS; Fig. IS200.5 top) is recognized and processed whereas the “bottom” strand is refractory [55][56]. Cleavage of the top strand at the left and right cleavage sites (TS/CL and CR, note that TS is also the left cleavage site CL) (Fig. IS200.5 B) leads to excision as a circular ssDNA intermediate with abutted left and right ends (transposon joint) (Fig. IS200.5 C bottom left). This is accompanied by rejoining of the DNA originally flanking the excised strand (donor joint).

Fig. IS200.5. Top: IS608 organization. The left (LE) and right (RE) ends with a subterminal hairpin (HP) are in red and blue, left and right cleavage sites (CL/TS and CR) are represented by black and blue boxes, respectively. Bottom left: Excision. (A) TnpA activity: top strand (active strand) structures are recognized and cleaved by TnpA (vertical arrows). (B) Upon cleavage, a 5′ phosphotyrosine bond (green cylinder) is formed with LE and with the RE 3′ flank, and a 3′-OH (yellow circle) is generated on the left flank and on RE. (C) Excision of the IS608 single-strand circle intermediate with abutted LE and RE (RE–LE junction or transposon joint) accompanied by the formation of the donor joint retaining the target sequence. Bottom right: Integration. (D) Transposon circle with the transposon joint and target DNA (black) with the target site. (E) TnpA catalyzes the cleavage of the transposon joint and single-strand target. (F) Integration.

The transposon joint is then cleaved (Fig. IS200.5 E bottom right) and integrated into a single strand conserved element-specific target sequence (TS) where the left end invariably inserts 3’ to TS (Fig. IS200.5 F). This target specificity is another unusual feature of IS200/IS605 transposition. The target sequence is characteristic of the particular family member and, although it is not part of the IS, it is essential for further transposition because it is also the left end cleavage site CL of the inserted IS[57] (The Single strand Transpososome and Cleavage site recognition) and is therefore intimately involved in the transposition mechanism.

TnpA, Y1 transposases and transposition chemistry

IS200/IS605 family transposases belong to the HUH enzyme superfamily. All contain a conserved amino-acid triad composed of Histidine (H)-bulky hydrophobic residue (U)-Histidine (H)[58] providing two of three ligands required for coordination of a divalent metal ion that localizes and prepares the scissile phosphate for nucleophilic attack. HUH proteins catalyse ssDNA breakage and joining with a unique mechanism. They all catalyse DNA strand cleavage using a transitory covalent 5' phosphotyrosine enzyme-substrate intermediate and release a 3' OH group[59] (Fig. 1.41.1).

The HUH enzyme family also includes other transposases of the IS91/ISCR and Helitron families as well as proteins involved in DNA transactions essential for plasmid/virus rolling circle replication (Rep; not to be confused with the TnpAREP/REP system described in Domestication) and plasmid conjugation (Mob/relaxase) (Groups with HUH Enzymes; Fig. 1.10.1).

IS200/IS605 transposases are single-domain proteins containing a single catalytic tyrosine residue and are therefore called Y1 transposases. They use the tyrosine residue (Y127 for IS608) as a nucleophile to attack the phosphodiester link at the cleavage sites (vertical arrows in Fig. IS200.5 A and D). Since cleavages at both IS ends occur on the same strand, the polarity of the reaction implies that the enzyme forms a covalent 5’-phosphotyrosine bond with the IS at LE, producing a 3’-OH on the DNA flank, and a 5’-phosphotyrosine bond with the RE flank, producing a 3’-OH on RE itself (Fig. IS200.5 B). The released 3′-OH groups then act as nucleophiles to attack the appropriate phosphotyrosine bond, resealing the DNA backbone in one case and generating a single strand DNA transposon circle in the other (Fig. IS200.5 C). The same polarity is applied to the integration step (Fig. IS200.5 D, E and F). As an important mechanistic consequence of this chemistry, IS200/IS605 transposition occurs without loss or gain of nucleotides. In vitro, the reaction requires only TnpA and does not require host cell factors.
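The bookkeeping consequence of this chemistry (no nucleotides lost or gained) can be illustrated with a toy string model of the top strand. This is only a sketch under stated assumptions: the function names `excise` and `integrate`, the flanking sequences and the short "element" are invented for illustration; the circular intermediate is represented as a linear string opened at the RE–LE junction; and the model takes the target sequence (TS/CL) to lie immediately 5’ to the element and the element to end at its right cleavage point.

```python
def excise(donor: str, ts: str, element: str) -> tuple[str, str]:
    """Cut 3' to TS (left cleavage site) and 3' to the last nt of the element.

    Returns (donor_joint, circle). The circle is written as a linear string
    opened at the RE-LE junction (an assumption of this toy model).
    """
    i = donor.find(ts + element)
    if i == -1:
        raise ValueError("element not found 3' to its target sequence")
    left_cut = i + len(ts)                # between TS and the first nt of LE
    right_cut = left_cut + len(element)   # after the last nt of RE
    donor_joint = donor[:left_cut] + donor[right_cut:]  # TS stays in the donor
    return donor_joint, element


def integrate(target: str, ts: str, circle: str) -> str:
    """Insert the opened circle 3' to the first TS found in the target strand."""
    i = target.find(ts)
    if i == -1:
        raise ValueError("no target sequence in target strand")
    cut = i + len(ts)
    return target[:cut] + circle + target[cut:]


# Hypothetical toy sequences; only TTAC and TCAA come from the text (IS608).
TS = "TTAC"
ELEMENT = "ACGTACGTTCAA"           # invented stand-in ending with TCAA
donor = "GGG" + TS + ELEMENT + "CCC"
target = "AATTACGG"

donor_joint, circle = excise(donor, TS, ELEMENT)
product = integrate(target, TS, circle)

print(donor_joint)  # GGGTTACCCC
print(product)      # AATTACACGTACGTTCAAGG
# Nucleotide count is conserved across the whole reaction.
print(len(donor) + len(target) == len(donor_joint) + len(product))  # True
```

In this model the inserted copy again sits immediately 3’ to a TTAC, which is the condition the text gives for further rounds of transposition.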

TnpA overall structure

Crystal structures of Y1 transposases have been determined for three family members: IS608 (TnpAIS608) from Helicobacter pylori [60][61], ISDra2 (TnpAISDra2) from Deinococcus radiodurans[62] and ISC1474 from Sulfolobus solfataricus[63]. In contrast to most characterised HUH enzymes, which are usually monomeric and have two catalytic tyrosines, Y1 transposases form obligatory dimers with two active sites (Fig. IS200.6 A). The two monomers dimerize by merging their β-sheets into one large central β-sheet sandwiched between α-helices. Each catalytic site is constituted by the HUH motif from one TnpA monomer (H64 and H66 in the case of TnpAIS608) and a catalytic tyrosine residue (Y127) located in the C-terminal αD helix tail of the other monomer (Fig. IS200.6 A). This is joined to the body of the protein by a flexible loop (trans configuration, Active site assembly and Catalytic activation and Transposition cycle: the trans/cis rotational model).

Fig. IS200.6. (A) Crystallographic structure of TnpA alone. The two monomers of the TnpA dimer are colored green and orange, respectively. Positions of helix αD and catalytic residues are shown. (B) Costructure TnpA–RE HP22. HP22 is shown in blue. The extrahelical T17 and the T located in the hairpin loop are indicated in red (6). Note that in the TnpA–HP22 co-structure, binding sites for the hairpins are located on the same face of the TnpA dimer whereas the two catalytic sites are formed on the opposite surface (A, C–F).

The TnpA enzyme active sites are believed to adopt two functionally important conformations: the trans configuration described above (Fig. IS200.6 A), in which each active site is composed of the HUH motif supplied by one monomer with the tyrosine residue supplied by the other, and the cis configuration, in which both motifs are contributed by the same monomer (IS200/IS605 video 1 below; kindly supplied by O. Barabas and Fred Dyda). The trans conformation is active during cleavage where Tyrosine acts as nucleophile whereas the cis conformation is thought to function during strand transfer where the 3’OH is the attacking nucleophile (Transposition cycle: the trans/cis rotational model). Only the trans configuration of TnpAIS608 and TnpAISDra2 has yet been observed crystallographically[64][65] but the existence of the cis configuration is supported by biochemical data [66].

IS200/IS605 video 1

The Single strand Transpososome

The key machinery for transposition is the higher order protein-DNA complex, the transpososome (or synaptic complex), which contains both transposase and two IS DNA ends with or without target DNA. Transpososome formation, stability and the temporal changes in configuration which occur during the transposition cycle have been characterized for TnpAIS608 by crystallographic and biochemical approaches. Although for technical reasons it was not possible to obtain structures with both LE and RE hairpins together, co-crystal structures with either LE or RE showed that a TnpA dimer binds two subterminal DNA hairpins, suggesting that it could bind both LE and RE ends simultaneously. Binding sites for the hairpins are located on the same face of the TnpA dimer while the two catalytic sites are formed on the opposite surface (Fig. IS200.6 A and B). The hairpin forms a distorted helix anchored by base interactions at the foot (IS200/IS605 video 2 below; kindly supplied by O. Barabas and Fred Dyda).

IS200/IS605 video 2

Substrate recognition

A key feature of TnpA is that it is only active on one strand, the “top” strand. The IS608 and ISDra2 ends carry subterminal imperfect hairpins. In addition to specific sequences in the loops, the irregularities in the hairpins help the enzyme to distinguish between “top” and “bottom” strands[67][68]. The initial co-crystal structure was obtained with TnpAIS608 and a 22 nt imperfect RE hairpin (HP22) including its characteristic extrahelical T17 located mid-way along the DNA stem (Fig. IS200.6 and Fig. IS200.7). In addition to a number of backbone contacts with HP22, TnpAIS608 also shows several base-specific contacts, in particular with T10 in the loop and the extrahelical T17[69] (Fig. IS200.6 B). Exchange of T10 and neighboring T nucleotides in the loop abolished binding whereas exchange of T17 for an A significantly reduced but did not eliminate binding[70]. Similar studies with TnpAISDra2 showed that it also recognises a similarly located T in the hairpin loop of ISDra2 and that this is essential for binding[71]. Instead of an extrahelical T, ISDra2 LE and RE include a bulge caused by two mismatched nucleotides (G and T) in the hairpin stem. These unpaired nucleotides are specifically recognized and stabilized by the protein. Again, mutation of the T (to C which, in this case, eliminates the bulge to generate a GC base pair in the stem) greatly reduces binding (IS200/IS605 video 3A below; kindly supplied by O. Barabas and Fred Dyda).

Although most members of the IS605 group, which includes IS608 and ISDra2, have imperfect palindromes with extrahelical bases or bulges, some members of the IS200 group (e.g. IS200, IS1541) include perfect hairpins. Whether base-specific interactions with the loop sequence are exclusively responsible for strand-specific activity of the corresponding transposase remains to be clarified.

IS200/IS605 video 3A

Cleavage site recognition

The left (CL/TS) and right (CR) IS608 cleavage sites (TTAC| and TCAA| respectively, where | represents the point of cleavage) are located some distance from the subterminal recognition hairpins (19 nt at LE and 10 nt at RE) (Fig. IS200.7). The system is asymmetric because the two distinct cleavage sites are separated from the hairpins by linkers of different lengths and the CL/TS sequence does not form part of the IS while CR does.

Fig. IS200.7. Canonical and noncanonical base interactions in (A) left end (LE) and (B) right end (RE). LE and RE (red and blue). Cleavage sequences CL or CR (black or dark blue boxes); guide sequences GL and GR pink or light blue, respectively. Two nucleotides at the 3′ foot of HPL,R involved in triplet formation are highlighted in bold and in a black frame. LE and RE and the base pairing within HPL and HPR are shown. Insets show interactions between cleavage and guide sequences. Filled lines: canonical base interactions, dotted lines: additional noncanonical base interactions.

Structural studies revealed that the cleavage sites are recognised in a unique way which does not involve direct sequence recognition by TnpA. Instead, an internal part of the IS sequence is co-opted to recognize different cleavage sites allowing TnpA to catalyze both excision and integration of the element with a single DNA binding domain.

Internal transposon sequences, the left (GL) and right (GR) tetranucleotide guide sequences, AAAG and GAAT, located 5’ to the foot of the hairpins (Fig. IS200.7), recognize their respective cleavage sites by direct base interactions. These GL/CL and GR/CR interactions involve 3 of the 4 nt of GL and GR. They include both canonical Watson-Crick interactions and, in the case of RE, non-canonical interactions resulting in base triplets (Fig. IS200.7 and Fig. IS200.8, bases joined by regular and dotted lines respectively). In the case of LE and the transposon joint, base triplets (dotted lines) are suggested from biochemical data[72] (IS200/IS605 video 3B below; kindly supplied by O. Barabas and Fred Dyda).

Fig. IS200.8. Structure of the co-complex TnpAIS608–RE35 adapted from reference 8 showing the active site and the base pairs between CR (TCAA, dark blue) and GR (GAAT, light blue). The gray sphere is bound Mn2+. Right: Two base triplets observed in the TnpAIS608–RE35 complex.

IS200/IS605 video 3B

These interactions place the scissile phosphate precisely into the two active sites of TnpAIS608 for nucleophilic attack by the catalytic Y127. Interestingly, the base-pairing patterns responsible for cleavage site recognition are similar at LE, RE and the target site in spite of sequence differences (Fig. IS200.7, Fig. IS200.8, Fig. IS200.9). Since TS is also CL, this type of recognition not only explains the requirement for the TS located at the left end of the inserted IS (Fig. IS200.5, Fig. IS200.9) for further transposition, but also the target specificity. Upon integration, TS is presumably recognized by the GL present on the excised transposon joint. Note that the transposon joint contains only the LE guide sequence GL but not the LE cleavage site CL (Fig. IS200.5, Fig. IS200.9).

Fig. IS200.9. Target recognition: single-strand transposon joint (RE–LE junction) and target TS are presented. For simplicity, only the recognition of the target cleavage site is indicated. LE and RE are shown in red and blue. Cleavage sequences CL or CR are placed in black or dark blue boxes; guide sequences GL and GR are framed in pink and light blue, respectively. Two nucleotides at the 3′ foot of the left and right hairpin structures HPL and HPR involved in triplet formation are highlighted in bold and are in a black frame. Nucleotide sequences of LE and RE and the base pairing within HPL and HPR are shown. The inset figures describe the interactions between the cleavage sequences and guide sequences. The filled lines indicate canonical base interactions and the dotted lines indicate additional noncanonical base interactions.

Similar crystal structures were obtained with TnpAISDra2 (see also Single strand DNA in vivo) with a similar interaction network between the guide sequences and cleavage sites. The ISDra2 transpososome is structurally very similar to that of IS608 despite only 34% sequence identity between the transposases. It is important to note that the target sequence in ISDra2 is a pentanucleotide instead of a tetranucleotide as in IS608. The fifth nucleotide in the ISDra2 sequence is, however, not involved in DNA-DNA interactions but in DNA-protein interaction[73].

The potential cleavage site recognition mode (i.e. the canonical interaction network between CL,R and GL,R) is indeed well conserved throughout the family (Fig. IS200.10).

Fig. IS200.10. Multiple sequence alignment of the cleavage sites and guide sequences using Weblogo was carried out on 38, 43 and 23 members of the IS200 (i), the IS605 (ii), and IS1341 (iii) groups, respectively.

This model has been validated in vitro and in vivo by showing that it is possible to modify cleavage sites by changing corresponding guide sequences. Moreover, in the case of IS608, modifications of GL in the transposon joint generate predictable changes in insertion site specificity of the element[74].

The IS608 recognition system has also been modified to include additional sequences which assist more specific targeting of insertions[75].

Active site assembly and Catalytic activation

Comparison of crystal structures of different TnpA protein-DNA complexes[76][77][78] revealed TnpA in both active and inactive configurations. In both the free TnpAIS608 dimer and TnpAIS608-DNA complexes bound to a “minimal” HP22 hairpin (which does not include the guide sequence), the catalytic tyrosine residue (Y127) points away from the HUH motif (H64 and H66) and therefore cannot act as a nucleophile[79] (Fig. IS200.11). The enzyme is therefore in an inactive conformation. Binding to the appropriate substrate containing the 4-nucleotide guide sequence 5’ to the hairpin foot (compare Fig. IS200.11 left and right) triggers a change in TnpA configuration that permits assembly of functional active sites. A single A (A+18, Fig. IS200.7 and Fig. IS200.8) in the guide sequence present in both GL and GR does not participate in base interactions with the cleavage site. On formation of the CL(R)/GL(R) base interaction network, this single base penetrates the structure and forces the C-terminal αD helix carrying Y127 closer to the HUH motif, placing it in the correct position poised for catalysis[80] (compare Fig. IS200.11 left and right; Fig. IS200.12) (IS200/IS605 video 4 below; kindly supplied by O. Barabas and Fred Dyda). This movement also places a third amino acid (Q131 located at the C-terminal end of helix αD on the same face as Y127) in a position enabling it to function in conjunction with both H residues to complete the metal ion binding pocket. This movement is made possible by the fact that the αD helix is attached to the protein body by a flexible loop. This conformational change involving αD helix movement will be discussed below (Transposition cycle: the trans/cis rotational model).

Fig. IS200.11 The presence of the guide sequence AAAG at the foot of IPL results in the movement of helices αD and places tyrosine Y127 in the correct position with respect to the HUH to form the active site.
Fig. IS200.12. (C) Configuration of the active site in the TnpA–RE HP22 costructure. HP22 is shown in blue. Note that in A, B and C, TnpA is in the inactive conformation. The arrow shows the presumed rotation of the αD helix to activate the protein. (D) Configuration of the active site in the TnpA–LE HP26 costructure. LE HP26 is shown in red and the 5′ 4-nucleotide extension (GL) in yellow. The base A+18 has displaced Y127 to activate the protein. (Adapted from references 6 and 8.)

IS200/IS605 video 4

Transpososome assembly and stability

Excision requires assembly of a transpososome containing both LE and RE. However, it is technically difficult to generate crystallographically pure complexes of this type, and only crystal structures containing two LE or two RE were obtained. The excision transpososome was therefore initially modelled using information obtained from the IS608 LE-TnpA and RE-TnpA structures[81] (Fig. IS200.6 B; Fig. IS200.13). However, complexes containing both LE and RE have now been identified using a band shift assay and characterized biochemically[82].

Fig. IS200.13. (E) TnpA–RE35 complex. Interaction of GR-CR (in light and dark blue, respectively) positions the cleavage site within the catalytic site of the protein. (F) Modeled TnpA–LE–RE complex. LE, RE, and flanking sequences in red, blue, and black, respectively.

A TnpA co-complex with either LE or RE can be titrated by addition of increasing quantities of the other end (RE or LE) to obtain a transpososome containing both LE and RE. This can be easily detected in a gel shift assay. Such species proved to be catalytically active since they could be removed from the gel and, when incubated with the essential divalent metal ion, robust reaction products could be detected in a denaturing gel[83].

This approach was used to monitor both transpososome formation and stability using oligonucleotides carrying point mutations in GL,R and CL,R. Robust transpososome formation and cleavage activity require much of the network of GL,R and CL,R interactions observed in the crystal structures[84] (schematised in Fig. IS200.7). Although base triplets were not detected in the original LE co-crystal structure, since the LE substrate was too short[85], the biochemical data suggested that such interactions probably exist (grey dotted lines in Fig. IS200.7). For example, the two nucleotides 3’ to the foot of the LE hairpin (at positions equivalent to the triplet-forming bases in RE, Fig. IS200.7) are required for robust synaptic complex formation and cleavage[86]. This further implies that these base triplets might also be involved in target DNA capture (grey dotted lines in Fig. IS200.9).

Base changes in GL resulted in a predictable choice of target sequence[87]. However, large differences in insertion frequencies were observed. The influence of the presumed non-canonical interactions in LE would provide an explanation for this variability since these were not taken into account in the choice of LE guide sequence.

In both IS608 and ISDra2, the extra-helical bases in the hairpin stem and nucleotides in the loop are also important for transpososome formation even in a context which includes both GL,R and CL,R[88][89].

Transposition cycle: the trans/cis rotational model

Transpososome assembly is followed by two critical chemical steps: cleavage and strand transfer. These are thought to be accomplished by a series of large changes in transpososome configuration. A detailed model has been proposed for the dynamics of the IS608 transpososome during the transposition reactions[90][91] (Fig. IS200.13; IS200/IS605 video 1). As described in TnpA overall structure (above), TnpAIS608 could in principle assume two configurations: trans and cis. Switching between these two states would involve rotation of the two unconstrained flexible arms which join the αD helix to the protein body. The current model for IS608 and ISDra2 transposition proposes that cleavage occurs while the enzyme is in the trans configuration and that a trans to cis conformational change, produced by rotation of these arms, then allows strand transfer. The ground state of the IS608 and ISDra2 transpososomes obtained from crystallography is the trans configuration. LE and RE binding and cleavage occur with the enzyme in its trans configuration (Fig. IS200.13; IS200/IS605 video 1). This results in formation of the 5’ phosphotyrosine bond with LE, liberating a 3’-OH on the flanking DNA, and the 5’ phosphotyrosine bond with the RE DNA flank, liberating a 3’-OH on the RE transposon end. Rotation of the two arms would displace LE towards the sequestered 3’-OH of RE and the RE flank towards the 3’-OH of the LE flank (Fig. IS200.13; IS200/IS605 video 1) and position them so that both 3’-OH can attack the appropriate phosphodiester bond. This model is supported by several lines of indirect evidence from studies of IS608.

An initial piece of evidence concerns the length differences in the LE and RE “linker” (the distance between the hairpin foot and the cleavage site): this is only 10 nt for RE but 19 nt for LE (Fig. IS200.9). The rotation model suggests that the longer LE linker may be required to provide sufficient length to rotate the 5’ LE phosphotyrosine bond and position it close to the immobile RE 3’-OH (Fig. IS200.13; IS200/IS605 video 1). This would imply that LE linker length is critical for strand transfer. Indeed, sequential reduction in the length of the LE linker has a large effect on transposition frequency and excision in vivo. In vitro, it also had a somewhat larger effect on strand transfer than on cleavage[92], supporting the idea that the linker is important for mechanical movement. However, transpososome formation and stability were also observed to be affected with the shortest linkers. This presumably reflects steric barriers to GL(R)/CL(R) interaction and supports the notion that these interactions are important in transpososome assembly. A survey of over 100 different IS from all three groups (35 from the IS200 group; 47 from IS605 and 24 from IS1341) in the public databases has shown that the asymmetry of the IS608 ends is conserved across the entire family: the left linker is always longer than the right (15-16 nt versus 8 nt)[93] (Fig. IS200.14).

Fig. IS200.14. Linker length distribution of LE and RE from 76 (red) and 80 (blue) different IS, respectively.

The second piece of evidence comes from the behaviour of TnpAIS608 heterodimers carrying point mutations in the HuH or catalytic Y. These were expressed and assembled in vivo and purified based on two different C-terminal affinity tags (one for each monomer). This permitted heterodimers to be distinguished from homodimers. A heterodimer with a combination of mutations which enforce a trans-active TnpA site (in which the wildtype HuH motif and Y127 belong to different TnpA monomers) is proficient for cleavage but not for rejoining. In contrast a heterodimer with cis-active TnpA site (in which the wildtype HuH motif and Y127 belong to the same TnpA monomer) is proficient for rejoining but inactive in cleavage[94]. This implies that all chemical reactions involved in cleavage occur in the trans site while the chemical reactions for strand transfer occur in the cis site. This strongly supports the rotational model.

A third piece of evidence comes from studies of the flexible arm which joins helix αD to the body of the protein and which is proposed to play a pivotal role in the rotation. This flexibility may be facilitated by two glycine residues (G117 and G118). Mutation of these two residues did not affect strand cleavage but led to inhibition of strand transfer suggesting that the two residues are required for achieving a cis configuration. The importance of these G residues is reflected in their conservation throughout the family[95].

Thus, while the cis configuration has not been observed crystallographically for these elements, its existence is strongly suggested by experimental data, supporting the trans/cis rotational model (Fig. IS200.15).

Fig. IS200.15. Strand transfer and reset model of IS608 transpososome. (A) The inactive form of TnpA dimer in the absence of DNA (pale green, orange ovals and dark green and orange cylinders represent the body and the αD helices of two monomers, respectively). At the ends, dotted red and blue lines represent linkers at the left end (LE) and the right end (RE), light red and light blue boxes represent GL and GR, respectively. (B) Binding of a copy of LE and RE resulting in TnpA activation (catalytic sites in trans). (C) Cleavage of both ends forms a 5′ phosphotyrosine linkage between Y127 and LE on one αD helix (dark orange cylinders) and between Y127 and the RE flank on the other (dark green cylinders). 3′-OH groups are shown as yellow circles. Reciprocal rotation of both αD helices from trans to cis configuration is indicated by large arrows. (D) Strand transfer takes place to reconstitute the joined donor backbone (donor joint) and generate the RE–LE transposon junction in the cis configuration. (E) Release of the donor joint and transition from cis to trans configuration. (F) Reset to the trans form and target site engagement. (G) Cleavage of the RE–LE junction and target and transition from trans to cis configuration. (H) Regeneration of the left and right transposon ends.

Regulation of single strand transposition

Single strand DNA in vivo

The obligatory single-stranded nature of IS200/IS605 transposition in vitro suggests that it is limited in vivo by the availability of its ssDNA substrates inside the cells and processes that produce ssDNA may stimulate transposition. We describe below a link between the transposition of these elements and the replication fork. Moreover, in the case of ISDra2, single strand DNA produced during re-assembly of the D. radiodurans genome following irradiation results in stimulation of transposition[96][97]. Transcription or other processes leading to horizontal gene transfer such as transformation, conjugative transfer or transduction with single strand phages might also favor their mobility.

Replication fork

The replication fork modulates the transposition of many transposable elements (Tn7, IS903, IS10, IS50, Tn4430, P element)[98][99][100][101][102][103]. For IS200/IS605 family members, the replication fork, in particular the lagging strand template, is an important source of ssDNA substrates for both excision and integration. Transposition can be considered to follow a “Peel and Paste” mechanism (Fig. IS200.16) where the IS excises or is “peeled” off as a single strand circle from the lagging strand template of the donor molecule and then integrates or is “pasted” into an ss target at the replication fork.

Fig. IS200.16. Top: Excision of the single-strand circular intermediate (transposon joint) from the lagging strand template of a donor plasmid. Arrow tip: replication direction. Bottom: Integration of right end (RE)–left end (LE) transposon joint into the single-strand target at the replication fork.
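
The strand-level bookkeeping of this mechanism can be illustrated with a toy string model. The sketch below is only an illustration under simplifying assumptions: a single DNA strand is written 5' to 3' as a Python string, the IS (lowercase) is excised precisely so that the donor flanks are rejoined, and the excised element is pasted immediately 3' to a target tetranucleotide without duplicating it. The function names `peel` and `paste` and the example sequences are hypothetical and do not come from any published tool.

```python
def peel(donor: str, is_seq: str) -> tuple[str, str]:
    """Excise is_seq from a single-stranded donor written 5'->3'.

    Returns (donor_joint, excised_is). The excised IS stands for the
    single-strand circle in which RE is joined to LE.
    """
    start = donor.find(is_seq)
    if start < 0:
        raise ValueError("IS not present on this strand")
    end = start + len(is_seq)
    donor_joint = donor[:start] + donor[end:]  # flanks rejoined precisely
    return donor_joint, is_seq


def paste(target: str, is_seq: str, site: str = "TTAC") -> str:
    """Insert is_seq immediately 3' to the first copy of site in target."""
    pos = target.find(site)
    if pos < 0:
        raise ValueError("no target site on this strand")
    cut = pos + len(site)
    # The target site is not duplicated: it remains once, 5' to LE.
    return target[:cut] + is_seq + target[cut:]


# Hypothetical example: uppercase = flanking DNA, lowercase = IS.
donor = "GGATTAC" + "aaccggtt" + "CATG"
joint, element = peel(donor, "aaccggtt")
product = paste("CCTTACGG", element)
print(joint)    # GGATTACCATG
print(product)  # CCTTACaaccggttGG
```

Note that the target tetranucleotide stays in the donor joint after excision and appears only once, 5' to the element, after insertion. The model ignores hairpins, guide sequences and strand availability.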

Excision: Excision of IS608 is sensitive to the direction of replication across the element: it is more frequent when the active strand (top strand) is on the lagging strand (discontinuous) template (Fig. IS200.16 top; Fig. IS200.17) but difficult to detect when it is on the leading (continuous) strand[104]. Moreover, excision in vitro requires that both ends are in single strand form at the same time[105].

Fig. IS200.17. Orientation with respect to replication direction. The disposition of the IS608 active (top) strand with respect to replication direction is shown when the fork approaches from one direction (left) when it is part of the lagging-strand template or the other (right) when it is part of the leading strand. Okazaki fragments on the lagging strand are indicated as short lines. The direction of DNA synthesis is indicated with half arrowheads.

The length of ssDNA on the lagging-strand template depends on the initiation frequency of Okazaki fragment synthesis by the DnaG primase[106][107]. Transient inactivation of DnaG activity reduces this frequency and therefore increases the average length of ssDNA between Okazaki fragments; the IS608 excision frequency increased. Under permissive conditions for E. coli carrying a dnaGts mutation, using a plasmid-based assay with IS608 derivatives of different lengths, the excision frequency decreased strongly as IS length increased. In contrast, when DnaGts activity was reduced by growth under sub-lethal conditions, excision showed a much less pronounced length-dependence (Fig. IS200.18). This length-dependence might also contribute to the difference in copy numbers observed in the IS200 and IS605 groups (see "Distribution and Organization").

Fig. IS200.18. IS608 excision as a function of IS length (in kilobases).

Integration: IS608 integration is oriented (with its left end 3’ to a TTAC target site) and it requires an ssDNA target in vitro[108][109]. The close link between transposition and the replication fork is also illustrated by the integration bias, consistent with a preference for an ssDNA target on the lagging strand template (Fig. IS200.16 bottom). This was indeed found to be the case in E. coli for both plasmid and chromosome targets[110]. As expected, the orientation of insertions into the E. coli chromosome was correlated with the direction of replication of each replicore and was consistent with integration into the lagging strand template.
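
To make the orientation rule concrete, the following sketch lists candidate insertion points on both strands of a double-stranded sequence. It assumes only what is stated above, namely that the element inserts immediately 3' to a TTAC on whichever strand carries the site; it does not model which strand is actually single-stranded in the cell. The helper names (`reverse_complement`, `find_all`, `candidate_insertions`) and the coordinate convention are assumptions made for this illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_all(seq: str, motif: str) -> list[int]:
    """Return 0-based start positions of all (overlapping) motif hits."""
    hits = []
    i = seq.find(motif)
    while i != -1:
        hits.append(i)
        i = seq.find(motif, i + 1)
    return hits


def candidate_insertions(top: str, site: str = "TTAC") -> list[tuple[int, str]]:
    """List (cut, strand) pairs for insertion 3' to each copy of site.

    cut is the 0-based top-strand index of the base immediately to the
    right of the insertion point; strand is "+" if the site lies on the
    top strand and "-" if it lies on the bottom strand.
    """
    top = top.upper()
    # Site on the top strand: insertion follows its last base.
    plus = [(i + len(site), "+") for i in find_all(top, site)]
    # Site on the bottom strand: its 3' side lies to the left on the top strand.
    minus = [(j, "-") for j in find_all(top, reverse_complement(site))]
    return sorted(plus + minus)


print(candidate_insertions("AATTACGGGTAACC"))  # [(6, '+'), (8, '-')]
```

The strand label gives the orientation the inserted element would adopt, which is the quantity compared with replication direction in the orientation analyses.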

The orientation bias is not restricted to IS608 and ISDra2. An in silico analysis of a large number of bacterial genomes carrying copies of various family members revealed that most had a strong insertional bias consistent with the direction of replication[111] (Fig. IS200.19). Moreover, in certain cases, elements which did not follow the orientation pattern could be correlated to genomic region that had undergone inversion or displacement (Fig. IS200.20; Fig. IS200.21) suggesting that, once they occur, insertions are quite stable. It seems possible that this type of genomic archaeology based on orientation patterns could be used to complement the study of bacterial genome evolution.

Fig. IS200.19. Orientation of IS200/IS605 family members in different bacterial genomes. Overall GC skew ((G – C)/(G + C)) is indicated in blue and orange. Top: S. enterica (typhi) CT18; Middle: Y. pseudotuberculosis IP31758; Bottom: P. profundum SS9. Replication is bidirectional from a single origin (red spot). Arrows top or bottom indicate the point and orientation of insertion.
Fig. IS200.20. Orientation of IS1541 in Yersinia pestis (Microtus). Overall GC skew ((G – C)/(G + C)) is indicated in blue and orange. Replication is bidirectional from a single origin (red spot). Arrows top or bottom indicate the point and orientation of insertion. The IS orientation adheres strictly to the GC skew suggesting that there have been many chromosome rearrangements subsequent to IS insertion.
Fig. IS200.21. Comparison of S. enterica (typhi) CT18 and Ty2 genomes.
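
The GC skew used in these figures, (G – C)/(G + C), can be computed along a genome in windows. The sketch below is a minimal illustration; the window and step sizes are arbitrary choices, the function names are hypothetical, and windows with no G or C are assigned a skew of 0 by convention.

```python
def gc_skew(seq: str) -> float:
    """Return (G - C) / (G + C) for a DNA string, or 0.0 if it has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def windowed_gc_skew(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """Return (window start, skew) for each full window along seq."""
    return [
        (start, gc_skew(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


print(gc_skew("GGGC"))                      # 0.5
print(windowed_gc_skew("GGGGCCCC", 4, 4))   # [(0, 1.0), (4, -1.0)]
```

A change in the sign of the skew along the chromosome marks the switch between replicores, which is the reference against which IS orientation is compared.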

Stalled replication forks: Stalled replication forks appear to be preferential targets for IS608 insertion. In experiments using either the Tus/ter replication termination system or repressor/operator roadblock systems, replication fork arrest attracted IS608 insertions[112]. Transient blockage of a unidirectional replication fork by the Tus protein at the ter site resulted in preferential IS608 insertion into the array of target sequences behind the stalled fork on the lagging strand but not on the leading strand (Fig. IS200.22). A similar result was obtained in the E. coli chromosome using the lacI/lacO and tetR/tetO repressor/operator roadblock systems[113][114] (Fig. IS200.23). Moreover, a significant number of IS608 insertions into the E. coli chromosome were localized in the highly transcribed rrn operons. This suggests that high transcription levels might affect replication fork progression (fork arrest by collision with RNA polymerase, R-loop formation, etc.) and could account for targeting of the rrn operons. Thus, IS608 insertions can be targeted to stalled forks and this may well represent a major pathway for targeting transposition.

Fig. IS200.22. Map of insertions with ter in the permissive and non-permissive orientations. Replication from ori is from left to right; * = target sequences TTAC close to ter; horizontal arrowheads = Ternp (red) and Terp (black); vertical black arrowheads = IS608 insertions; vertical red arrowheads = multiple IS608 insertions upstream of Ternp and within Terp. (Ton-Hoang et al., 2010).
Fig. IS200.23. Top: Position of the lacO and tetO arrays in E.coli WX45 and WX51: The replication origin, ori, is shown as a red ellipse and the left and right replicores in blue and orange respectively. E.coli WX45 and WX51 contain arrays at different locations. Bottom: Insertions into E.coli WX45 and WX51; the left and right replicores have been separated for convenience. Above: a detail of the lacO array (light orange or light green rectangles) on the left replicore. Below: a detail of the tetO array (orange or green rectangles) on the right replicore. Black vertical arrows: insertions obtained in the absence of LacI (top) or TetR (bottom). Green or orange vertical arrows: insertions obtained in the presence of LacI (top) or TetR (bottom) in several independent experiments. The positions of the oligonucleotides (not to scale) used to localize the insertions are shown with half arrowheads. The kanamycin and gentamycin resistance cassettes used in the construction and insertion of the lac and tet operator arrays are also shown. * represents potential TTAC target sequences present in the region.

Genome re-assembly after irradiation in D. radiodurans

Deinococcus radiodurans, arguably the most radiation-resistant organism known, has a remarkable capacity to survive the lethal effects of DNA-damaging agents, such as ionizing radiation, UV light and desiccation. After exposure to high irradiation doses, the D. radiodurans chromosome which is present in multiple copies per cell[115][116] is shattered and degraded, but can be very rapidly reassembled in a process called ESDSA (Extended Synthesis Dependent Strand Annealing). This involves resection of the multiple dsDNA fragments to generate extensive ssDNA segments, reannealing of complementary DNA and reconstitution of the intact chromosome[117].

Mennecier et al.[118] analyzed the mutational profile in the thyA gene following irradiation. The majority of mutants were due to insertion of a single IS, ISDra2 which is present in a single copy in the genome of the laboratory D. radiodurans strain. Furthermore, using a tailored genetic system, both ISDra2 excision and insertion efficiency was found to increase significantly following host cell irradiation[119]. A PCR-based approach was used to follow irradiation-induced excision of the single genomic ISDra2 copy and re-closure of flanking sequences. Remarkably, these events are temporally closely correlated with the start of the ESDSA. The signal that triggers ISDra2 transposition is likely the production of ssDNA intermediates generated during genome reassembly. Consistent with this, the requirement of ssDNA substrates for ISDra2, as for IS608, was confirmed by in vitro studies of TnpAISDra2-catalysed cleavage and strand transfer[120].

ISDra2 excision also depends on the direction of replication and is consistent with a requirement for the active strand to be located on the lagging strand template in normally growing cells. However, this bias disappeared in irradiated D. radiodurans[121]. Since no apparent strand bias was observed in generating ssDNA during ESDSA, the lack of orientation bias in irradiated D. radiodurans suggests that ssDNA substrates are no longer limited to those rendered accessible during replication. This indicates that ssDNA sources are different in the contexts of vegetative replication and in genome reassembly.

Real-time transposition (excision) activity

The dynamics of IS608 excision from a donor site have been examined at the colony and single cell level in real time using an artificial IS608 derivative inserted between the -35 and -10 elements of a PlacIQ1 promoter[122] driving expression of the blue fluorescent protein mCerulean[123]. TnpAIS608, N-terminally tagged with the bright yellow reporter Venus[124], was supplied in trans, driven by PLTetO1, and its level was controllable over a 100-fold range. Excision rates were proportional to the transposase levels and, as expected, excision depended on the orientation of the IS derivative with respect to the direction of replication in the donor plasmid: an IS oriented with its active strand in the lagging strand template excised more frequently and at 10-fold lower TnpA levels than when inserted with its active strand in the leading strand template, demonstrating the validity of the experimental system. In this system, individual excision events appear as bright flashes of blue fluorescence. Following an initial activity in part of the population when cells are applied to a solid medium, activity decreases or ceases during “exponential” growth but increases again at a constant rate (in a sub-population) upon growth arrest in a random (Poisson distributed) way. Moreover, the events do not occur randomly in the growing colonies and tend to be excluded from the colony edges. The study underlines the heterogeneity of TE activity rates in both space and time, possibly resulting from heterogeneous TnpA levels at the individual cell level in the population. These studies are reminiscent of the early studies of Jim Shapiro on phage Mu-mediated rearrangements in growing bacterial colonies[125][126].


TnpA alone can carry out both the cleavage and joining steps in vitro. TnpB is encoded only by the IS1341 and IS605 groups and is not required for transposition of either IS608 or ISDra2 in Escherichia coli and Deinococcus radiodurans, respectively[127][128][129]. While TnpA activity has been extensively analyzed, few data are available concerning TnpB function. However, several observations suggest that TnpB might play a regulatory role in transposition of IS200/IS605 family members.

Full length TnpB is approximately 400 amino acids long. An overview of TnpB organization was obtained by comparing the entire ISfinder collection of 85 tnpB copies with the Pfam domain database (Fig. IS200.24). This revealed three major domains: an N-terminal putative helix-turn-helix, a longer and more variable central domain OrfB_IS605 with a putative DDE motif[130] and a C-terminal zinc finger (ZF) domain[131]. The highest level of conservation is found at the C-terminal end of the protein which includes a highly conserved zinc finger of the CPXCG type[132]. Half of the analyzed TnpB copies including TnpBISDra2 but not TnpBIS608 contained all three domains, while only two did not include a zinc finger.

Fig. IS200.24. Organization of TnpB protein and derivatives: putative N-terminal helix-turn-helix motif (HTH), central OrfB_IS605 domain with a putative DDE motif (Pfam) and C-terminal zinc finger motif (ZF) are shown. Numbers represent the occurrence of corresponding variants among 85 analyzed sequences: 46 carry all three domains (e.g., ISDra2), 33 lack the HTH motif (e.g., IS608), whereas others retain separate domains.
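
The counts given in the caption can be turned into proportions with a few lines of arithmetic. The snippet below uses only the three numbers stated above (85, 46 and 33); the variable names and the `percent` helper are illustrative.

```python
total = 85       # tnpB copies analyzed (from the caption)
all_three = 46   # HTH + OrfB_IS605 + ZF (e.g., ISDra2)
no_hth = 33      # lacking the HTH motif (e.g., IS608)
other = total - all_three - no_hth  # remaining variants


def percent(count: int, out_of: int) -> float:
    """Return count as a percentage of out_of, rounded to one decimal."""
    return round(100 * count / out_of, 1)


print(other)                      # 6
print(percent(all_three, total))  # 54.1
print(percent(no_hth, total))     # 38.8
print(percent(other, total))      # 7.1
```

This is consistent with the statement that about half of the analyzed copies carry all three domains.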

Pasternak et al.[133] observed that TnpB has an inhibitory effect on ISDra2 excision and insertion in its host, D. radiodurans, and on excision in E. coli, and that the integrity of its putative zinc finger motif is required for this effect. Consistent with an inhibitory role, analysis of the ISfinder database revealed that IS200/IS605 family elements exhibiting a high genomic copy number generally encode TnpA alone. Understanding the exact molecular mechanism of TnpB activity will require further study.

Although the details of TnpB activity are unknown, the protein has been identified in both prokaryotes and eukaryotes. It is carried by members of the IS607 family found both in prokaryotes and in eukaryotes and their viruses[134][135] but is dispensable for IS607 transposition in E. coli[136]. TnpB analogues, known as Fanzor1 and Fanzor2, have also been identified in diverse eukaryotic transposable elements[137]. TnpB/Fanzor proteins may function as transposition regulatory proteins in vivo. In addition, there appears to be a limited similarity of TnpB with the Cas12 and RuvC endonucleases[138].

Y1 transposase domestication

There are many examples of eukaryotic transposases whose activities have been appropriated to perform various cellular functions (see [139][140][141]). However, the very few examples of this domestication for prokaryotic enzymes concern Y1 transposases.


Recently, a new clade of Y1 transposases (TnpAREP) was found associated with REP/BIME sequences in structures called REPtrons[142][143] (Fig. IS200.7 A). In spite of their compact size, bacterial genomes carry many repetitive sequences, often important for genome function and evolution. Among them, Repetitive Extragenic Palindromic sequences (or REPs) are short DNA repeats of 20-40 bp that can form stem-loop structures preceded by a conserved tetranucleotide (GTAG or GGAG) (Fig. IS200.25). REPs are found in intergenic regions in many bacterial species, particularly in proteobacteria, at high copy number[144][145][146]. There are nearly 590 copies in Escherichia coli K12[147] (Fig. IS200.26) and up to 2200 copies in Pseudomonas sp. GM79[148]. REPs can exist as individual units but can also cluster in more complex structures called Bacterial Interspersed Mosaic Elements (BIME). These are composed of two individual REPs in inverse orientation (REP and iREP) separated by a short linker of variable length. BIME are often found in consecutive tandem copies (Fig. IS200.25). Several roles have been attributed to these sequences including genome structuring, post-transcriptional regulation and genome plasticity. REPs are known to interact with protein partners such as Integration Host Factor[149], DNA gyrase[150] and DNA polymerase I[151]. REPs also increase mRNA stability and can act as transcriptional terminators[152][153] or as targets for different IS[154][155]. It has also been suggested that REP sequences can downregulate translation of upstream genes in a manner dependent on trans-translation. This occurs only if they are within 15 nt of a termination codon. It has been suggested that REPs can stall ribosomes, leading to mRNA cleavage and induction of the trans-translation process[156].
Recombination at REP sequences has also been shown to be involved in the formation of F’ plasmid derivatives (the classic F plasmid carrying various portions of the chromosome) from Hfr strains[157] (Fig. IS200.27). However, the origin of REPs and their dissemination mechanisms are poorly understood.

Fig. IS200.25. Top: (i) Representation of two categories of REP structures in E. coli/Shigella with mismatches in the hairpin stem in orange and light blue; the violet box represents the conserved tetranucleotide GTAG. Corresponding iREP structures are in red and dark blue, where the green box represents the complementary tetranucleotide CTAC. (ii) Structure of BIME: REP and iREP separated by linkers C or D. BIME are frequently found as consecutive copies. Bottom: (iii) Examples of REPtrons from some representative E. coli strains. tnpAREP is shown in gray, the flanking genes yafL and fhiA in green and in violet, respectively. Arrows represent the direction of transcription.
Fig. IS200.26
Fig. IS200.27.

Although more complex, REPtrons are reminiscent of IS200 group members (Fig. IS200.25). However REPtrons do not appear to be mobile and, in general, a single copy of a given REPtron co-exists with numerous corresponding REP/BIME and genomes may harbor several distinct REPtrons[158][159]. It has therefore been suggested that REP/BIMEs represent a special type of non-autonomous transposable element mobilizable by TnpAREP.

In vitro analysis of REPtrons: Analysis of E. coli REPtron activity in vitro has shown that, like TnpAIS200/IS605, TnpAREP strictly requires single stranded REP/BIME DNA substrates and is strand specific: only REP can be processed, whereas iREP are refractory to cleavage[160]. Purified E. coli TnpAREP promotes ssREP cleavage (in the linker sequences either 3’ or 5’ to the REP structure) and rejoining, and this activity requires the conserved tetranucleotide GTAG and the bulge in the middle of the REP stem[161][162]. Cleavage in vitro is less specific than that of TnpAIS200/IS605 and occurs at a CT dinucleotide.

In contrast to TnpAIS608 and TnpAISDra2, E. coli TnpAREP is a monomer in solution and in the crystal structure[163]. Moreover, in the co-crystal structure, the short C-terminal tail is inserted into the active site blocking access to an ssDNA. It may therefore play a regulatory role in activity. Indeed C-terminal truncation of TnpAREP resulted in increased cleavage activity relative to the full-length protein in vitro. Biochemical and structural analysis suggested that the GTAG 5’ to the foot of the REP hairpin may play a similar role to the guide sequences GL/R in IS200/IS605. Moreover, structural data also highlighted numerous specific contacts between TnpAREP and GTAG, explaining its importance in the activity and clearly distinguishing TnpAREP from TnpAIS200/IS605, which do not directly contact the guide sequences (Cleavage site recognition). The way by which TnpAREP promotes REP/BIME proliferation through their host genomes remains to be determined.


Another role for Y1 transposases was suggested by the identification of chimeric genetic elements, IStrons, widely distributed in the genomes of Clostridium difficile[164] and the Bacillus cereus group[165][166][167]. These combine functional and structural properties of group I introns at their 5’-end with those of an IS element at their 3’-end (Fig. IS200.28). This 3' part contains an IS200/IS605 related sequence including two full length or truncated orfs, tnpA and tnpB, very similar to those found in ISDra2 (D. radiodurans) and ISCpe2 (C. perfringens). IStrons are present at several loci in the same genome, indicating that this element is mobile and may move as a complete genetic unit. All IStron copies analyzed so far are inserted 3’ to the pentanucleotide TTGAT. In vivo, all variants can be efficiently and precisely excised, signifying that components necessary for ribozyme activity are present[168]. Little is known about IStron behavior but the data suggest that the IS components could mediate the spread of the IStron while the intron component could assure splicing.

Fig. IS200.28. Organization of IStron where Intron and IS parts are indicated. P1–P8 and IGS represent characteristic features of group I Introns. LE, RE, TTGAT target site and two orf of the IS part are indicated.

In vitro oligonucleotide-based assays using purified IStron transposase confirmed that, at the DNA level, TTGAT is both the LE cleavage site in excision and the target site in integration (Caumont-Sarcos, unpublished). At the RNA level, the same sequence is probably required in the splicing reaction[169]. This would represent a novel type of intron invasion and transposition mechanism and provide a direct link between RNA and DNA worlds.

It is interesting to note that related IStrons have recently been identified which include components of the IS607 family[170][171]. These are characterised by a serine transposase together with a tnpB gene[172].


  1. <pubmed>26104715</pubmed>
  2. <pubmed>26350323</pubmed>
  3. <pubmed>23832240</pubmed>
  4. <pubmed>6313217</pubmed>
  5. <pubmed>9858724</pubmed>
  6. <pubmed>9631304</pubmed>
  7. <pubmed>11807059</pubmed>
  8. <pubmed>16209952</pubmed>
  9. <pubmed>16163392</pubmed>
  10. <pubmed>18280236</pubmed>
  11. <pubmed>18243097</pubmed>
  12. <pubmed>20090938</pubmed>
  13. <pubmed>20691900</pubmed>
  14. <pubmed>20890269</pubmed>
  15. <pubmed>17347521</pubmed>
  16. <pubmed>17347521</pubmed>
  17. <pubmed>10418150</pubmed>
  18. <pubmed>15179601</pubmed>
  19. <pubmed>6313217</pubmed>
  20. <pubmed>8253675</pubmed>
  21. <pubmed>8384142</pubmed>
  22. <pubmed>15179601</pubmed>
  23. <pubmed>9060429</pubmed>
  24. <pubmed>3009825</pubmed>
  25. <pubmed>3009825</pubmed>
  26. <pubmed>10471738</pubmed>
  27. <pubmed>10471738</pubmed>
  28. <pubmed>18725932</pubmed>
  29. <pubmed>26044710</pubmed>
  30. <pubmed>6313217</pubmed>
  31. <pubmed>6315530</pubmed>
  32. <pubmed>15179601</pubmed>
  33. <pubmed>6313217</pubmed>
  34. <pubmed>9422611</pubmed>
  35. <pubmed>10986230</pubmed>
  36. <pubmed>24195768</pubmed>
  37. <pubmed>26104715</pubmed>
  38. <pubmed>9858724</pubmed>
  39. <pubmed>11807059</pubmed>
  40. <pubmed>9789049</pubmed>
  41. <pubmed>9858724</pubmed>
  42. <pubmed>11807059</pubmed>
  43. <pubmed>11807059</pubmed>
  44. <pubmed>14676423</pubmed>
  45. <pubmed>17006450</pubmed>
  46. <pubmed>7557457</pubmed>
  47. <pubmed>2553665</pubmed>
  48. <pubmed>10913072</pubmed>
  49. <pubmed>16163392</pubmed>
  50. <pubmed>18280236</pubmed>
  51. <pubmed>18243097</pubmed>
  52. <pubmed>20090938</pubmed>
  53. <pubmed>20691900</pubmed>
  54. <pubmed>20890269</pubmed>
  55. <pubmed>16163392</pubmed>
  56. <pubmed>18280236</pubmed>
  57. <pubmed>16163392</pubmed>
  58. <pubmed>8374079</pubmed>
  59. <pubmed>23832240</pubmed>
  60. <pubmed>16209952</pubmed>
  61. <pubmed>18243097</pubmed>
  62. <pubmed>20890269</pubmed>
  63. <pubmed>16340015</pubmed>
  64. <pubmed>16209952</pubmed>
  65. <pubmed>20890269</pubmed>
  66. <pubmed>23345619</pubmed>
  67. <pubmed>16209952</pubmed>
  68. <pubmed>20890269</pubmed>
  69. <pubmed>16209952</pubmed>
  70. <pubmed>21745812</pubmed>
  71. <pubmed>20890269</pubmed>
  72. <pubmed>21745812</pubmed>
  73. <pubmed>20890269</pubmed>
  74. <pubmed>19524540</pubmed>
  75. <pubmed>29635476</pubmed>
  76. <pubmed>16209952</pubmed>
  77. <pubmed>18243097</pubmed>
  78. <pubmed>16340015</pubmed>
  79. <pubmed>16209952</pubmed>
  80. <pubmed>18243097</pubmed>
  81. <pubmed>18243097</pubmed>
  82. <pubmed>21745812</pubmed>
  83. <pubmed>21745812</pubmed>
  84. <pubmed>21745812</pubmed>
  85. <pubmed>18243097</pubmed>
  86. <pubmed>21745812</pubmed>
  87. <pubmed>19524540</pubmed>
  88. <pubmed>20890269</pubmed>
  89. <pubmed>21745812</pubmed>
  90. <pubmed>18243097</pubmed>
  91. <pubmed>23345619</pubmed>
  92. <pubmed>21745812</pubmed>
  93. <pubmed>23345619</pubmed>
  94. <pubmed>23345619</pubmed>
  95. <pubmed>23345619</pubmed>
  96. <pubmed>20090938</pubmed>
  97. <pubmed>16359337</pubmed>
  98. <pubmed>19703395</pubmed>
  99. <pubmed>9620951</pubmed>
  100. <pubmed>3000598</pubmed>
  101. <pubmed>2451025</pubmed>
  102. <pubmed>2546858</pubmed>
  103. <pubmed>21896744</pubmed>
  104. <pubmed>20691900</pubmed>
  105. <pubmed>16163392</pubmed>
  106. <pubmed>1531480</pubmed>
  107. <pubmed>1740453</pubmed>
  108. <pubmed>11807059</pubmed>
  109. <pubmed>18280236</pubmed>
  110. <pubmed>20691900</pubmed>
  111. <pubmed>20691900</pubmed>
  112. <pubmed>20691900</pubmed>
  113. <pubmed>12864855</pubmed>
  114. <pubmed>27466393</pubmed>
  115. <pubmed>649572</pubmed>
  116. <pubmed>7309705</pubmed>
  117. <pubmed>17006450</pubmed>
  118. <pubmed>16359337</pubmed>
  119. <pubmed>20090938</pubmed>
  120. <pubmed>20090938</pubmed>
  121. <pubmed>20691900</pubmed>
  122. <pubmed>27298350</pubmed>
  123. <pubmed>21479270</pubmed>
  124. <pubmed>11753368</pubmed>
  125. <pubmed>2838063</pubmed>
  126. <pubmed>2553666</pubmed>
  127. <pubmed>11807059</pubmed>
  128. <pubmed>16163392</pubmed>
  129. <pubmed>20090938</pubmed>
  130. <pubmed>11807059</pubmed>
  131. <pubmed>23461641</pubmed>
  132. <pubmed>12527760</pubmed>
  133. <pubmed>23461641</pubmed>
  134. <pubmed>23563966</pubmed>
  135. <pubmed>17109990</pubmed>
  136. <pubmed>10986230</pubmed>
  137. <pubmed>23548000</pubmed>
  138. <pubmed>28985291</pubmed>
  139. <pubmed>16937363</pubmed>
  140. <pubmed>23935529</pubmed>
  141. <pubmed>24348275</pubmed>
  142. <pubmed>20085626</pubmed>
  143. <pubmed>22199259</pubmed>
  144. <pubmed>20085626</pubmed>
  145. <pubmed>20528935</pubmed>
  146. <pubmed>23758774</pubmed>
  147. <pubmed>2092362</pubmed>
  148. <pubmed>23758774</pubmed>
  149. <pubmed>8262044</pubmed>
  150. <pubmed>9427406</pubmed>
  151. <pubmed>2197600</pubmed>
  152. <pubmed>20528935</pubmed>
  153. <pubmed>14731278</pubmed>
  154. <pubmed>26104715</pubmed>
  155. <pubmed>16563168</pubmed>
  156. <pubmed>25891074</pubmed>
  157. <pubmed>12511513</pubmed>
  158. <pubmed>20085626</pubmed>
  159. <pubmed>23758774</pubmed>
  160. <pubmed>22199259</pubmed>
  161. <pubmed>22199259</pubmed>
  162. <pubmed>22885300</pubmed>
  163. <pubmed>22885300</pubmed>
  164. <pubmed>10931294</pubmed>
  165. <pubmed>16907808</pubmed>
  166. <pubmed>18587153</pubmed>
  167. <pubmed>16030238</pubmed>
  168. <pubmed>10931294</pubmed>
  169. <pubmed>19667762</pubmed>
  170. <pubmed>16030238</pubmed>
  171. <pubmed>25324310</pubmed>
  172. <pubmed>10986230</pubmed>