IS Families/IS110 family

From TnPedia

Historical

IS110 was originally identified in 1985 in Streptomyces coelicolor A3(2) as an element present in a derivative of bacteriophage phiC31 carrying a selectable viomycin resistance gene. The phage was deleted for its attachment site and therefore unable to lysogenise its host. The presence of IS110 enabled the phage to integrate using homologous recombination with resident IS110 copies in the chromosome [1].

There are over 350 examples of IS110 family members from nearly 130 bacterial and archaeal species in the ISfinder database (May 2025) [2]. However, the Tpases of a very large number of additional elements have also been identified in various sequenced bacterial genomes. Since the ends of most of these elements have not been defined, they are not included in ISfinder.

Members such as the Mycobacterium paratuberculosis-specific IS900 and IS901 and the Coxiella burnetii IS1111 [3] are important because they can be used as highly specific markers for precise strain identification (e.g. [4][5][6][7][8][9][10][11][12][13]).

One of the earliest IS110 group members to be studied in detail was IS492 from Pseudomonas atlantica, originally identified by its effect on extracellular polysaccharide production (eps): it inactivates the gene by insertion and reactivates it by excision [14][15].

Two IS110 family subgroups and relation to the Piv invertases

The family includes two subgroups which, it has been suggested, may represent two distinct families [16][17]: IS110 and IS1111. Members of the IS1111 sub-group are distinguished from those of the IS110 group principally by the presence of small (7 to 17 bp) sub-terminal IRs (Fig.IS110.1) and, recognized more recently, the location of relatively long non-coding regions [18].

Fig. IS110.1 Organization of IS110 and IS1111 groups and their transposase. Top. Organization of IS110 and IS1111 groups. The figure shows the subterminal inverted repeats typical of IS1111 group members (blue triangles) and their distance from the IS ends. The peach-colored boxes represent the relatively long non-coding regions (NCR) located upstream of the transposase gene in the IS110 group and downstream in the IS1111 group (see: [19][20]). Bottom. Organization of the IS110 DEDD transposase. The figure shows the constellation of the 4 residues, D, E, D and D towards the N-terminal part of the protein [21][22].

Both subgroups encode a DEDD transposase and, at present, this is the only IS family known to encode this type of enzyme. DEDD transposases (see: Groups with DEDD Transposases) are related to the RuvC Holliday junction resolvase [23].

Relationship between IS110 transposases and the piv Site-Specific Recombination System.

The Tpase of the IS110 family was observed to be closely related to the Piv DNA invertase from Moraxella lacunata / M. bovis [24] and Neisseria gonorrhoeae [25][26][27] (Fig.IS110.2).

Piv catalyses inversion of a DNA segment permitting expression of a type IV pilin. Intriguingly, early studies revealed that the transposase of one IS, IS621, clustered within the Piv clade (Fig.IS110.2 A) and the IS carries ends with similarities to those of the 26 bp pilin gene inversion sequences [25] (Fig.IS110.2 B). Several piv-like genes (irg1-8 for invertase-related gene) were identified in Neisseria gonorrhoeae strain FA1090 [27]. However, none could complement either the Moraxella lacunata Piv or the IS492 transposase and inactivation of all eight genes and over-expression of one copy of each failed to show an effect on pilin variation, DNA transformation or repair.

Furthermore, analyses of DNA flanking the coding sequences supported the hypothesis that the Piv homologues are indeed transposases for two new IS110 family members, ISNgo2 and ISNgo3. ISNgo2 (irg3, 4, 5, 6 and 8) is present in multiple copies in N. gonorrhoeae while ISNgo3 (irg7 and also closely related to pivNM1) is found in single copy in N. gonorrhoeae and in duplicate copies in Neisseria meningitidis [27]. However, neither has yet been formally shown to transpose.

"Care should therefore be exercised in distinguishing between IS110 family transposases and functional piv genes."

Fig. IS110.2. Relationship between IS110/IS1111 family transposases and the Piv site-specific recombinase. Top. Piv genes: Shown in red: pivML (M34367, Moraxella lacunata ATCC17956, 969 aa); pivMB (M32345, Moraxella bovis EPP63, 969 aa); pivNG (U65994, Neisseria gonorrhoeae, 963 aa); pivNM1 (AE002505, Neisseria meningitidis MC58, 957 aa); pivNM2 (AE002525, Neisseria meningitidis MC58, 951 aa); pivNM3 (AL162754, Neisseria meningitidis Z2491, 966 aa); pivEC (AB024946, Escherichia coli plasmid pB171, 828 aa); pivAB (AF282240, Acinetobacter sp. SE19, 975 aa); pivPC (AF011334, Pectobacterium chrysanthemi, 990 aa). ISs: Shown in orange (IS110) and blue (IS1111): IS621 (NC_009800, Escherichia coli ECOR28, 1,279 bp); IS110 (Y00434, Streptomyces coelicolor, 1,558 bp); IS116 (M31716, Streptomyces clavuligerus, 1,421 bp); IS117 (X15942, Streptomyces coelicolor, 2,527 bp); IS492 (M24471, Pseudomonas atlantica, 1,202 bp); IS900 (X16293, Mycobacterium paratuberculosis, 1,451 bp); IS901 (X59272, Mycobacterium avium, 1,472 bp); IS902 (X58030, Mycobacterium avium, 1,470 bp); IS1000 (M33159, Thermus thermophilus HB8, 1,196 bp); IS1110 (Z23003, Mycobacterium avium, 1,457 bp); IS1111 (M80806, Coxiella burnetii, 1,450 bp); IS1328 (Z48244, Yersinia enterocolitica, 1,353 bp); IS1533 (M82880, Leptospira borgpetersenii, 1,464 bp); IS1547 (Y16254, Mycobacterium tuberculosis 9504, 1,346 bp); IS1594 (AF047044, Anabaena sp. PCC7120, 1,471 bp); IS1626 (AF071067, Mycobacterium avium, 1,418 bp); IS2112 (AF060871, Rhodococcus rhodochrous, 1,415 bp); IS4321 (U60777, Enterobacter aerogenes plasmid pR751, 1,347 bp); ISNme1143 (AL162755, Neisseria meningitidis Z2491, 1,143 bp); ISH2e (ISfinder: ISMtsp6, Methylobacterium sp.) (AE000092, Rhizobium sp. NGR23, 1,201 bp) (.); ISRm19 (AL603647, Sinorhizobium meliloti, 1,224 bp); ISC1190 (AE006641, Sulfolobus solfataricus P2, 1,187 bp); ISC1229 (AE006641, Sulfolobus solfataricus P2, 1,229 bp); ISC1491 (AE006641, Sulfolobus solfataricus P2, 1,488 bp); ISSt1206 (ISfinder: ISSto5) (AP000985, Sulfolobus tokodaii 7, 1,206 bp); ISSt1232 (AP000985, Sulfolobus tokodaii 7, 1,232 bp); ISSt1492 (AP000985, Sulfolobus tokodaii 7, 1,492 bp). The tree was constructed using the neighbor joining method. Scale bar is 0.1. Sequences marked with “??” are not presently available in ISfinder. Bottom. Comparison of the inversion recombination sequences of piv (invL and invR) with those of the left (LE) and right (RE) ends of IS621. The identities are shown in red. The bold CT dinucleotide at both ends was initially taken to indicate a possible 2-nucleotide DR, but has more recently been shown to be a “core” sequence involved in site-specific recombination. Data taken from Choi et al. [21].

It was pointed out that one major difference in the organization of IS110 family members and the inversion systems is that, in the piv system, the recombinase gene is located outside the invertible segment, while in the IS110 family, it is located within the IS element [23]. It is interesting that the piv genes fall within a cluster of IS elements of the IS110 group, suggesting that the piv recombination system arose from an IS110 ancestor (Fig. IS110.2, Fig.IS110.3A and Fig.IS110.3B). It has also been pointed out that the ends of IS621, an IS closely related to piv (Fig. IS110.2), bear some resemblance to the piv recombination site ([21]; Fig. IS110.2 B).

Organization

IS110 and IS1111 Subgroups Based on Transposase Sequences

Although the Tpases of the IS110 and IS1111 groups are very similar, more detailed analysis of those in the ISfinder library showed that they generally separate into two distinct groups delineating the IS110 members (orange segment in the figure) together with the related piv group containing IS621 (purple segment) from those of the IS1111 group (blue segment in the figure) (Fig.IS110.3A) and a deeply branching segment containing a mixture of both IS subgroups (green segment in the figure), an observation subsequently confirmed by Siddiquee et al., 2024 [28] using the same database. It is possible that the few IS110 elements found within the IS1111 group and the IS1111 elements within the IS110 group have been misclassified. A similar pattern was observed in a library of transposases from over 1000 family members including members of the ISfinder collection and members extracted from public databases (Fig.IS110.3B; [19][20]).

There are very few exceptions to this classification. Only 5 IS classified as IS1111 group members (small orange circle in Fig.IS110.3A) appear in the IS110 group (ISLin1, ISMba7, ISMba20, ISRta3 and ISRta4) and 6 classified as IS110 appear in the IS1111 group (ISStac1, ISSgr1, ISShwo5, ISShfr7, ISMahy12 and ISCARN52).

Clearly, in addition to the major IS110 and IS1111 subgroup division of this family, each subgroup contains additional deep branching clusters [28]. These are shown more clearly in the analysis of Durrant et al. [19][20] (Fig.IS110.3B).

Fig. IS110.3A. Transposase-based Phylogenetic Tree. All IS110/IS1111 family transposases available in ISfinder (06/2025) are shown. The blue segment indicates IS1111 group IS, the pale orange segment, IS110 group IS, the purple segment, the piv clade and the green segment indicates a clade with a mixture of both IS110 and IS1111 members as classified in ISfinder. Piv proteins are indicated by the long purple lozenges. Small blue and pale orange circles show members of the IS1111 group located in the IS110 sector and of IS110 members in the IS1111 sector. Purple lozenges show those IS observed to insert site specifically into attC integron recombination sites [29][30], the green lozenges show IS which insert site-specifically into REP (Repeated Extragenic Palindromes) sequences, the orange lozenges indicate insertions into IS3-family members specifically at the 3’ side of the codon for the second D of the DDE motif [28] and red lozenges indicate insertions into the IR of Tn21 group members of the Tn3 family [18]. These data concerning insertion specificity have been augmented by information submitted in the ISfinder database. The IS indicated by an arrow are those mentioned in the text.


Fig. IS110.3B. A phylogenetic tree based on 1,054 IS110 family recombinase sequences. The small circles indicate those family members cataloged in the ISfinder database [2]. The segments are colored as in Fig. IS110.3A: blue, IS1111 group; pale orange, IS110 group; the green segment indicates a clade with a mixture of both. Modified from Durrant et al. [19].

Length Distribution.

Members (Fig.IS110.4) vary between 1136 bp and 1558 bp, with most clustered in the 1450 bp size range. The length distribution of the IS110 group is more dispersed than that of the IS1111 group. The organization of IS110 family members is quite different from that of IS with DDE transposases: they do not contain the typical terminal IRs of the DDE IS and do not generally generate flanking target DRs on insertion. This implies that their transposition occurs using a different mechanism from that of DDE IS.

Fig. IS110.4. Length Distribution of IS110/IS1111 Family Members. All IS110/IS1111 family transposases available in ISfinder (06/2020) are shown. The number of IS in a given interval is shown at the top of each bin and the length, in base pairs, is shown at the bottom.


Direct Target Repeats, DR and the Problem of Defining the Ends

Some family members have been reported to generate small Direct Repeats (DRs) while others do not (e.g. Gómez-García et al [31] and [21]). However, in most cases where flanking DR occur, the data can be interpreted to show that one DR copy is present in the target while the second copy belongs to the IS and is transmitted via a circular transposition intermediate suggesting that integration is sequence-targeted. The fact that identification of IS110 and IS1111 ends is problematic due to the absence of terminal inverted repeats might also confound the question of the presence or absence of DR. The most conclusive way to identify the IS ends would be to compare empty and occupied sites and to determine the DNA sequence across the junction formed by the abutted IS ends of the circular DNA intermediate (see below: Transposon circles). This is rarely undertaken.
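The empty-versus-occupied site comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch under simplifying assumptions: the two sequences are taken to be identical apart from the insertion, and the function name `compare_sites` and the toy sequences are hypothetical. It shows why a short sequence repeated at the junction (a DR or a core dinucleotide) leaves the exact IS boundaries ambiguous unless the circle junction is also sequenced.

```python
def compare_sites(empty: str, occupied: str) -> dict:
    """Compare an empty target site with the same site carrying an insertion.

    Assumes (hypothetically) that both sequences cover the same locus and
    differ only by the inserted element.
    """
    empty, occupied = empty.upper(), occupied.upper()
    insert_length = len(occupied) - len(empty)
    if insert_length <= 0:
        raise ValueError("the occupied site must be longer than the empty site")

    # Length of the left flank shared by both sequences.
    p = 0
    while p < len(empty) and empty[p] == occupied[p]:
        p += 1

    # Length of the right flank shared by both sequences.
    s = 0
    while s < len(empty) and empty[-1 - s] == occupied[-1 - s]:
        s += 1

    if p + s < len(empty):
        raise ValueError("the shared flanks do not account for the whole empty site")

    # Bases matched by both flanks could belong either to the target or to the IS.
    leftmost = len(empty) - s
    candidates = [occupied[start:start + insert_length]
                  for start in range(leftmost, p + 1)]
    return {
        "insert_length": insert_length,
        "ambiguous_bases": occupied[leftmost:p],
        "candidate_inserts": candidates,
    }


if __name__ == "__main__":
    # Toy data: an AG dinucleotide is present on both sides of the insertion.
    empty = "ACGTTC" + "AG" + "CTTGCA"
    occupied = "ACGTTC" + "AG" + "TATGCCGGATAT" + "AG" + "CTTGCA"
    result = compare_sites(empty, occupied)
    print(result["ambiguous_bases"])    # AG
    print(result["candidate_inserts"])  # three equally valid boundary placements
```

In the toy example the AG dinucleotide can be assigned to either end of the element, so three boundary placements are equally compatible with the data; sequencing the junction of the circular intermediate is what would resolve the choice.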

In this light, it should be noted that many of the IS110 family in ISfinder may have incorrect ends and require readjustment.

Subterminal inverted repeats.

Partridge and Hall [18] observed that a number of IS1111 subgroup members carry sub-terminal inverted repeats (IRst) (Fig. IS110.5 Left ) of 11 to 13 bp. These were located at approximately 6-7 bp from the left and 3-4 bp from the right end and shared significant sequence identity between the different IS. As for other IS, these sequences might be expected to be recognized and bound by the transposase. IS110 group members do not carry these long IRst. However, when Durrant et al [19][20] undertook a covariance analysis of a number of IS1111 and IS110 group members, they not only observed the long IRst in the IS1111 group but also revealed very short IRst in the IS110 group (Fig. IS110.5 Right).
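As an illustration of how such subterminal inverted repeats might be located in a candidate IS sequence, the following Python sketch searches a window at the left end for the longest exact match to the reverse complement of a window at the right end. It is a simplified sketch and not the method used in the cited studies: the function names, the window size, and the minimum length are assumptions, mismatches are not tolerated, and the toy sequence is invented.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_subterminal_ir(is_seq: str, window: int = 40, min_len: int = 7):
    """Find the longest exact inverted repeat between the two end windows.

    Offsets are distances (in bp) of the repeat from the respective IS terminus.
    The IS should be longer than 2 * window so that the windows do not overlap.
    """
    seq = is_seq.upper()
    left = seq[:window]
    # Read the right end inward on the opposite strand so that an inverted
    # repeat appears as a direct match to the left window.
    right_rc = revcomp(seq[-window:])
    for k in range(min(len(left), len(right_rc)), min_len - 1, -1):
        for i in range(len(left) - k + 1):
            j = right_rc.find(left[i:i + k])
            if j != -1:
                return {"length": k, "left_offset": i,
                        "right_offset": j, "left_repeat": left[i:i + k]}
    return None


if __name__ == "__main__":
    # Toy element: a 12 bp repeat placed 6 bp from the left end and 3 bp from the right end.
    ir = "GCTAGGTCAACG"
    toy = "TTCAGC" + ir + "A" * 30 + revcomp(ir) + "CTA"
    print(find_subterminal_ir(toy))
```

A real analysis would need to allow mismatches (several of the IRst listed in this chapter contain 1 to 3 mismatches) and would benefit from comparison across related elements, as in the covariation analysis.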

Fig. IS110.5 Subterminal Inverted Repeats. Left: Long Subterminal inverted repeats identified in a number of IS1111 group members [18]. Right: Results of the covariation analysis of IS110 donor sequences identified a short subterminal IR. Target and donor sequences were analyzed using a covariation analysis in a large sequence library; target sequences showed no detectable covariation signal; donor sequences showed a prominent 3-base covariation signal corresponding to a LE ATA tri-nucleotide and an RE TAT tri-nucleotide. The features of both IS ends of IS110 and IS1111 group elements are shown using the actual sequences of IS621 (IS110) and IS1111A (IS1111) as examples. The IS is shown as a yellow box with a purple arrow indicating the transposase orf and its direction of expression. Left (LE) and right (RE) ends are pointed. Target DNA is shown in green, the core sequences involved in recombination (see later) in blue, and the subterminal inverted repeats in red [19].
Non Coding Region (NCR)

Unlike many IS families, the transposase orf does not occupy the entire IS length. Members of the IS110/IS1111 family contain a non-coding region (NCR). This was first noted for members of the IS1111 group downstream of the transposase orf [18] where it was stated that:

“It seems unlikely that such a long downstream region would be retained if it had no function”.

ISPpu9, which clusters with both IS110- and IS1111-related IS (Fig. IS110.3A and Fig. IS110.3B), was noted to include both upstream and downstream NCR [18][31].

However, there appears to be a distinction between the IS110 and IS1111 groups in this respect. For the IS110 group, the NCR is generally upstream of the tnp orf while in the IS1111 group it is located downstream ([28][19][20]). A number of examples are shown in Fig.IS110.6. Although most conform to the IS110/IS1111 pattern, several, such as IS621, ISRta3, ISHvo9, ISAzo22 and ISPpu9, exhibit both the upstream and downstream regions (Fig.IS110.6). In the case of the IS110 group member, ISPpu9, the downstream NCR is due to the presence of an ISPpu9 MITE (Fig. IS110.7A).

A small number of other exceptions have also been identified (Fig. IS110.3A). Of those IS classified as IS1111 but appearing in the IS110 group, the following have a typical IS1111 downstream NCR together with IRst: ISShwo5 (IRst 19bp 3 mismatches), ISShfr7 (IRst 13 bp 3 mismatches), ISMahy12 (IRst 11bp) while two have IS110-like upstream NCR but also IRst (ISStac1 IRst 11bp 2 mismatches; ISSgr1 IRst 12bp 1 mismatch).

Of those classified as IS110 but appearing in the IS1111 group: ISLin1 includes both upstream and downstream NCR and no apparent IRst; ISMba7 carries a long upstream NCR and a short 4 bp IRst; ISMba20 has a long upstream NCR and a 6 bp IRst; ISRta3 has upstream and downstream NCR with no obvious IRst; ISRta4 also has upstream and downstream NCR with no obvious IRst, but is long; and, finally, ISCARN52 exhibits a downstream NCR but no IRst.

It is probably worthwhile to analyze these probable exceptions in more detail to verify their ends and examine their transposases.

Fig. IS110.6. Table illustrating the position and length of Non Coding Regions. The left-hand column indicates the group to which the IS belong; column two gives the IS name; column three gives the overall IS length; column four indicates the NCR length to the left of the transposase gene; column five indicates the NCR length to the right of the transposase gene and column six shows the sequence of the internal inverted repeat where known.


NCR, ISPpu9 and MITEs: a warning

The copy of the IS110 group, ISPpu9, which was originally included in ISfinder appeared to have NCR both upstream and downstream of the transposase gene. However, more detailed analysis revealed that the downstream NCR results largely from an extension which appears to be a diverged defective ISPpu9 copy. It is not clear how frequent this type of structure may be or whether it occurs at all with other family members but should be kept in mind when undertaking large scale genomic analyses.

The extension includes a junction of the right (RE, called box B by the authors) and left (LE, called box A’) ends separated by an AG dinucleotide, the characteristic dinucleotide which flanks ISPpu9 insertions [31]. This was identified from an analysis of the Pseudomonas putida KT2440 genome, which carries seven ISPpu9 copies, each inserted site-specifically into one of the more than 900 highly conserved 35 bp REP sequences (Repeated Extragenic Palindromes) [31] (see: Circle formation and the integration of the IS110 group:ISPpu9). The insertions are flanked by the dinucleotide 5’AG 3’. Two types of ISPpu9 derivative with intact transposases (Fig. IS110.7A, i and ii) were identified: two ISPpu9 copies which we will call wildtype (wt; Fig. IS110.7A, i) and five copies of the ISPpu9 catalogued in ISfinder (Fig. IS110.7A, ii). Moreover, three copies of a third (defective) ISPpu9, devoid of the tnp gene but including both left (LE, called box A’) and right (RE, called box B’) ends, were also identified (Fig. IS110.7A, iii).

Fig. IS110.7A. ISPpu9 Types found in the Pseudomonas putida KT2440 Genome. The transposable elements are represented by yellow horizontal boxes and transposase genes by horizontal purple arrows indicating the direction of expression. The left (LE) and right (RE) ends of the ISPpu9 module are represented by grey boxes. Those of the MITE module are indicated in blue. The magenta lines bordering LE and RE represent the flanking dinucleotide AG “core” sequences. i) ISPpu9. The red panel above shows the degree of similarity of the MITE with the right end of the longer ISPpu9 derivative which includes the ISPpu9 MITE. ii) ISPpu9 including a short MITE. iii) The MITE which has also been called an “orphan”. The Black horizontal arrows show promoters identified in ISPpu9 and the MITE. [32].


These were called “orphans”. They are in fact IS110 family MITEs. The catalogued IS carries an extension on the right which includes an abutted right and left end separated by an AG dinucleotide (Fig. IS110.7A, ii). This resembles the junction expected to form in a circular transposition intermediate (see: Transposon Circles below) while the region downstream is similar to, but diverges from, the non-coding region upstream of the transposase gene (Fig. IS110.7A, i, top). These similarities and differences between the upstream NCR and the sequence of the “orphan” were pointed out by Gomez-Garcia et al [31]. It produces an RNA which the authors called Ssr9 (see Mechanism: ISPpu9 and regulation by RNA below) which was also identified in other Pseudomonas putida strains: in Pseudomonas sp KBS0802, immediately downstream of the tnp genes in five cases with one in tandem and three independent copies; in Pseudomonas putida NCTC13186, immediately downstream of six of the seven tnp copies with an additional ssr9 gene in tandem in two of these, and four independent copies, two of them in tandem, in different genomic locations. This suggested that the ISPpu9 copies could transpose independently (“detach from the tnp gene” [31]).

The LE and RE of all 7 ISPpu9 copies were identical in sequence, as were those of the accompanying MITE. However, the MITE LE differed in sequence by a single base pair and the MITE RE differed by 3 bp from their ISPpu9 counterparts (Fig. IS110.7B; [32]). In addition, the LE of both the ISPpu9 and MITE moieties carried a short inverted repeat not present in the RE or in many other IS of this family.

Fig. IS110.7B. ISPpu9 Types found in the Pseudomonas putida KT2440 Genome. The DNA sequences of LE and RE of the 7 extended ISPpu9 (Aii) are shown below the schematic maps of the ISPpu9 (top) and accompanying MITE (bottom). Bases differing between the ISPpu9 and MITE ends are highlighted in red. The flanking AG core dinucleotides are shown in white and contained in a magenta box. Short inverted repeats in ISPpu9 and MITE LE are boxed [32].


Fig. IS110.7C. ISPpu9 Types found in other Pseudomonad Genomes. Legend as for Fig. IS110.7B. Bases differing between the ISPpu9 ends highlighted in red as are those within MITE ends [32].


These studies were extended to an analysis of additional Pseudomonas strains. Using the Pseudomonas putida KT2440 ISPpu9 transposase gene, tnpISPpu9, as a query, similar genes were identified in multiple copies in nine different Pseudomonas putida strains and one strain of P. plecoglossicida. All were flanked by LE and RE copies (Fig. IS110.7C).

These analyses confirmed that the MITE was only found in strains KT2440, NCTC13186 and KBS0802 [31][32] and, since all three strains contained seven ISPpu9-MITE copies in the same genomic context, the authors concluded that the three strains evolved from a common ancestor. Moreover, since the same differences between the ISPpu9 and associated MITE LE and RE occurred in each IS copy, it is probable that the association occurred prior to amplification (transposition) of the ISPpu9 genomic copies. Minor structural variations were observed between the strains: in particular, a tandem duplication of the ISPpu9 MITE at some loci and, in one case, the acquisition of an associated MITE [32], indicating subsequent diversification in the individual strains.

Transposase Coding Sequence

The single long, relatively well conserved, transposase reading frame shows some clusters of conservation within the N- and C-terminal portions. One characteristic which distinguishes IS110 family members from all other elements whose Tpases exhibit a predicted RNase fold is that the predicted catalytic domain of their DEDD Tpases is located N-terminal to the DNA binding domain [25][22] (Fig.IS110.1). In the DDE Tpases it is generally located downstream towards the C-terminal end of the protein. The alignment shown in Fig.IS110.8A, based on 149 IS110 and 187 IS1111 group members, shows that the N-terminal catalytic domains of the IS110 and IS1111 groups share significant identities.

It had been noted that the DEDD region resembles a site-specific recombinase similar to the Piv invertase from Moraxella lacunata and Moraxella bovis [22][33]. In the absence of a suitable assay for IS492 activity at the time, the function of the DEDD residues was investigated using the Moraxella Piv inversion system where it was first shown that a mutant E59G of the DEDD motif was unable to accomplish inversion at the Piv recombination sites although it had no apparent effect on DNA binding [22]. Further mutational analysis confirmed that all conserved DEDD residues are required for Piv inversion [33]. It was also pointed out that the DEDD motif (and therefore the equivalent DEDD transposase motif) is analogous to the catalytic center of the RuvC Holliday junction resolvases.

The probable C-terminal DNA binding domains of the two groups vary somewhat from each other (Fig.IS110.8B). Those of the IS1111 group show significant conservation compared with IS110 group members, perhaps reflecting the different types of ends carried by each group. It has been pointed out that, while the C-terminal transposase ends are somewhat variable, both the IS110 and IS1111 subgroups show a conserved SG dipeptide ([28][19]). Moreover, as can be seen from Fig. IS110.8B, the shared conserved residues are not restricted to SG but are somewhat more extensive.

Fig. IS110.8A. Alignment of the N-terminal catalytic domains of 149 IS110 and 187 IS1111 group transposases. Alignments were performed with Clustal Omega using default settings and output used Jalview. Only a handful of alignments from the entire collection are shown. Conserved positions are indicated as different degrees of blue. The conserved positions and consensus sequences are shown below. Common DEDD motifs are indicated between the two panels.

Fig. IS110.8B. Alignment of the C-terminal probable DNA binding domains of 149 IS110 and 187 IS1111 group transposases. Alignments were performed with Clustal Omega using default settings and output used Jalview. Only a handful of alignments from the entire collection are shown. Conserved positions are indicated as different degrees of blue. The conserved positions and consensus sequences are shown below. The figure illustrates the high conservation of this domain in the IS1111 group.

Fig. IS110.8C. Consensus Sequence of the C-terminal domain of 153 IS110 and 200 IS1111 group transposases. Alignments were performed with Clustal Omega using default settings and output used Jalview. The analysis used the entire ISfinder IS110 family library (06/10/2025). Conserved positions are indicated below. The percentage conservation is indicated at the bottom. A sample list of aligned sequences is presented above. The conserved serine is boxed. In the case of IS1111 the serine threonine pairs are boxed.

Can Threonine Functionally Replace Serine?

An extended alignment of an updated ISfinder IS110 family library (06/10/2025) is shown in Fig. IS110.8C. This revealed that, while the IS110 group exhibited the highly conserved Serine (98% serine; 2% threonine) and G (90% glycine; 10% alanine), in the IS1111 group (as pointed out to us by Ruth Hall pers. com.), this serine is less well conserved (69% serine; 29.9% threonine). However, in cases where threonine occurs, the neighboring upstream amino acid is often serine (67% serine with no other particular preferred amino acid). In a few cases, both amino acids are threonine. This raises the interesting possibility that recombination may be catalyzed by the chemically related threonine.
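Percentages of this kind can be derived directly from an alignment column. The Python sketch below shows one way this might be done; it assumes the aligned sequences are held as equal-length strings with "-" for gaps, and the function names and the toy alignment are hypothetical rather than taken from the ISfinder data.

```python
from collections import Counter


def column_composition(aligned: list[str], column: int) -> dict[str, float]:
    """Percentage of each residue at one alignment column, ignoring gaps."""
    residues = [seq[column] for seq in aligned if seq[column] != "-"]
    total = len(residues)
    counts = Counter(residues)
    return {aa: 100.0 * n / total for aa, n in counts.most_common()}


def upstream_neighbours(aligned: list[str], column: int,
                        residue: str = "T") -> dict[str, float]:
    """Composition of the preceding column, restricted to sequences that
    carry `residue` at `column` (for example threonine replacing serine)."""
    if column < 1:
        raise ValueError("column must have an upstream neighbour")
    subset = [seq for seq in aligned if seq[column] == residue]
    return column_composition(subset, column - 1)


if __name__ == "__main__":
    # Toy alignment: column 2 holds the serine/threonine position.
    toy = ["KASGL", "KASGL", "KSTGL", "KASAL"]
    print(column_composition(toy, 2))   # {'S': 75.0, 'T': 25.0}
    print(upstream_neighbours(toy, 2))  # {'S': 100.0}
```

The second function mirrors the observation in the text that, where threonine occurs, the neighboring upstream amino acid is often serine.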

Predicted Transposase Structures of IS110 and IS1111 group Members show Identical Domain Structures

Siddiquee et al. [28] used AlphaFold to predict the structure of several IS110 family transposases including ISEc21 (IS110 group) and ISEc11 (IS1111 group). Not unexpectedly, these transposases are remarkably similar to each other and also closely correspond to the structure obtained from cryo-EM ([34]; Fig.IS110.43 and Fig. IS110.45). AlphaFold predicted the three domain structure composed of an N-terminal RuvC-fold catalytic domain carrying the DEDD amino acid cluster (Fig. IS110.8D), a C-terminal domain carrying the conserved serine (Tnp) and a coiled coil domain composed of two α-helices separated by a variable linker region. Both dimer and tetramer structures were also predicted and proved to be remarkably accurate. Fig. IS110.8D shows the AlphaFold predicted monomer structures of the IS110 and IS1111 transposases, TnpIS110 and TnpIS1111, and Fig. IS110.8E shows the overlay of these structures using the FATCAT software package, confirming that they have highly similar structures. Figures Fig.IS110.8 S1 to S9 present the predicted structures and pairwise comparisons of additional members of the IS110 and IS1111 groups. These data strongly suggest that the reaction mechanisms of both groups are quite similar and provide strong support for including both the IS110 and IS1111 groups in a single family.

Fig. IS110.8D. Predicted Structures of IS110 and IS1111 Transposases. AlphaFold prediction of IS110 (left) and IS1111 (right) transposases indicating the N-terminal RuvC domain carrying the DEDD tetrad (blue circle), the catalytic serine-carrying domain Tnp (yellow circle) and the bridging coiled coil domain (CC).

Fig. IS110.8E. Predicted Structures of IS110 and IS1111 Transposases. FATCAT superposition of both structures. Note the presence of a longer C-terminal alpha-helical tail carried by the IS1111 transposase.

Transposase activity

The close relationship between DEDD Tpases and the Piv invertases, which resolve Holliday Junction (HJ) structures during inversion [35], suggests that transposition of IS encoding DEDD Tpases may be unusual and involve HJ intermediates [36] which are resolved using a RuvC-like mechanism [37]. The presence of the conserved serine residue (Fig. IS110.8B) is consistent with a site-specific recombination mechanism. However, the serine does not appear to be 100% conserved, especially in the case of IS1111 group members (Fig. IS110.8C; see: Can Threonine Functionally Replace Serine?) where it is present in only about two thirds of the examples and is otherwise replaced by the related threonine. In some of these cases, it is possible that its role could be taken by a neighboring serine. It should be noted that the serine is conserved to a much greater extent (98%) in the IS110 group.

However, together with the difference in domain organization between the DEDD (Fig. IS110.8A) and DDE Tpases, these observations reinforce the idea that the two IS types possess entirely different transposition mechanisms.

Until 2024, few data were available concerning enzymatic activities of the putative Tpases of this family of elements and indeed the transposase had not been isolated: for example, the IS900 Tpase had only been detected by immunological methods in the Mycobacterium paratuberculosis host [38].

Subsequently, other IS110 transposases have been purified and their properties investigated. These include those of ISEc11, ISKpn4, ISPa11, ISPst6 (IS1111 group) and ISEc21 (IS110 group) [28] and IS621 [19]. Interestingly, they all co-purify with, or have high affinity to, an IS-specified RNA species (see: A Specific Guide RNA direct target choice).

Mechanism

IS110 family members generate circular double strand DNA intermediates.

The early observation that another Streptomyces coelicolor IS110 family member, IS117, occurred in a circular form which integrated in a target DNA at a frequency two orders of magnitude higher than when cloned as a "linear" copy [39] led to the idea that IS110 family transposition occurs by production of an excised double stranded circular DNA IS intermediate (Fig. IS110.9).

Henderson et al., 1989 [39] were perhaps the first to suggest that this family uses site-specific recombination to transpose. IS117, originally identified as a “mini” circle, shows a 2 to 3 base pair identity, now called the “core” sequence (from the core nucleotides involved in cleavage during site-specific recombination; see: Transposons_families/Tn3_family#Resolution), between the circle junction and its specific site of insertion into the host chromosome [39][40][41] (Fig.IS110.9). Transposition was often found to result in tandem dimer inserts, behavior which might indicate some type of rolling circle insertion mechanism such as that observed in the case of the IS91 family elements.

All family members analyzed from both the IS110 and IS1111 groups produce double strand circular transposon copies in vivo, generally detected, using PCR, as DNA “junction” fragments carrying abutted IS ends. Their nucleotide sequences have also identified a single copy of the core sequence (the short nucleotide sequence flanking an inserted IS; see: Fig. IS110.5) in many family members. These include junctions of: IS117/IS116 (IS110) (Fig. IS110.13) [39][40][41][42], IS492 (IS110) [43][44], IS1383 (IS1111) [45], ISEc11 (IS1111) [46], IS4321/IS5075 (IS1111) [17], ISPa11 (IS1111) [17], ISEc21 (IS110) (see Fig.IS110.11) and ISPpu9 (Fig. IS110.7B and C; Fig. IS110.16E) [32]. In earlier studies, circle junctions with interstitial sequences of various lengths have been reported, e.g. IS117, 3 bp, TAG [39][47]; IS492, 5 bp [43][48]; IS1383, 10 bp [45] comprising the two 5 bp flanks.

In the case of the IS110 family member ISPpu9 with its accompanying MITE (Fig. IS110.7A; Fig. IS110.23A), multiple types of circle are observed by PCR [32]: minicircles of ISPpu9 itself (carrying the transposase gene), of the ISPpu9 MITE (specifying an RNA, ssr9, alone; Fig. IS110.23A) and of the entire ISPpu9-MITE structure could be detected, indicating that all four IS ends (Fig. IS110.7A and B) are active. Following cloning and sequencing, all junction fragments carried an AG dinucleotide flanking sequence between the abutted ends.

In all cases examined, circle formation is dependent on the presence of an intact transposase gene. For IS492 at its eps site, precise excision in Pseudomonas atlantica and circle formation in E. coli require flanks of between 5 and 10 bp on both LE and RE.

More detailed requirements for both circle formation and for IS insertion have been determined for a number of family members. These include ISEc21, ISPpu9 and ISPpu10 [32] of the IS110 group and ISEc11 of the IS1111 group (below). The exact molecular mechanism of IS110 family circle formation, however, is yet to be elucidated.

Circles could be generated by a copy-out-paste-in mechanism, as adopted by members of IS families such as IS3, IS30 or IS256, or alternatively, in light of the similarities of the IS110 family transposase with site-specific recombinases, by site-specific recombination between the repeated flanks (Fig. IS110.9). In the latter case, unless there is a specific function which maintains the IS in its donor site (e.g. IS200/IS605), transposition might be expected to generate an empty donor site.

In early studies with IS117, no empty site was detected following transposition from the single chromosomal locus occupied by the IS to other sites [42]. On the other hand, IS492 was found to precisely excise from its site in the eps gene in Pseudomonas atlantica restoring eps activity.

However, since the level of excision from eps::IS492 was significantly higher than that of four additional IS492 copies at different chromosomal locations, and was correlated with a higher transcription level, it remains possible that precise excision is a special case.

Fig. IS110.9. Transposition via IS circle formation and insertion. Circle formation: The IS is indicated in red and the immediate flanking nucleotides as a magenta box. The flanks are in blue. Circularization uses the immediate flanking sequences, resulting in abutted left and right ends (grey boxes) separated by one copy of the immediate flanking sequence (core). The IS may be either retained in its donor site (left) or excised, leading to an empty donor site (right). Circle Insertion: Insertion occurs into a target site by recombination between the interstitial core and the target core sequences.
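The excision branch of this scheme can be made concrete with a small string model. The following Python sketch is illustrative only. It assumes the site-specific recombination route (recombination between the two flanking core copies) and the outcome that leaves an empty donor site. The function names, the toy IS sequence and the outer flanks are hypothetical; only the 5 bp CTTGT repeat is borrowed from the IS492 example.

```python
def excise_as_circle(donor: str, is_start: int, is_end: int, core_len: int):
    """Model recombination between the two flanking core copies.

    donor    : top-strand donor sequence, core + IS + core embedded in flanks
    is_start : index of the first IS base
    is_end   : index one past the last IS base
    core_len : length of the directly repeated flank ("core")
    Returns (circle, empty_site). The circle is written linearly as
    IS + one core copy, so the core sits between RE and LE once the
    string is read as a closed circle.
    """
    left_core = donor[is_start - core_len:is_start]
    right_core = donor[is_end:is_end + core_len]
    if core_len <= 0 or left_core != right_core:
        raise ValueError("the IS is not flanked by a direct repeat of that length")
    circle = donor[is_start:is_end] + right_core
    # The donor keeps a single core copy where the IS used to be.
    empty_site = donor[:is_start] + donor[is_end + core_len:]
    return circle, empty_site


def junction_window(circle: str, core_len: int, arm: int) -> str:
    """Sequence read across the abutted ends: RE tip + core + LE tip."""
    is_len = len(circle) - core_len
    return circle[is_len - arm:is_len] + circle[is_len:] + circle[:arm]


if __name__ == "__main__":
    toy_is = "TGACCATTAGGCAAC"  # hypothetical 15 nt "IS", not a real sequence
    donor = "GGATG" + "CTTGT" + toy_is + "CTTGT" + "ACGTA"
    circle, empty = excise_as_circle(donor, 10, 25, 5)
    print(junction_window(circle, 5, 4))
    print(empty)
```

In the alternative outcome, where the IS is retained in its donor site, the donor string would simply be left unchanged while the same circle is produced.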
Circle formation and integration of the IS110 group: ISEc21

ISEc21 was identified in 5 copies in the E. coli E2348/69 chromosome each with an identical target sequence (Iguchi and Hayashi, 2008. Direct submission to ISfinder). The target sequence was confirmed by Siddiquee et al., [28] (Fig. IS110.10 and 11) and, furthermore, shown to be a sequence including and surrounding the central D of the DDE motif of IS3 family members (e.g. ISCfr6, ISEc92, ISEc93). ISEc21 transposition has been studied in some detail [28].

The requirements for transposition activity were examined using a plasmid-cloned ISEc21 copy including ~100 bp of flanking DNA (Fig. IS110.10 top). Abutted IS ends, presumably circular transposition intermediates, were detected by PCR, and the junction sequence with the junction promoter determined (Fig. IS110.10 top). Deletion of the upstream NCR sequence (bp 20 – 150) eliminated detectable circles. In addition, insertion into a suitable target DNA (involving both circle formation and insertion) was monitored by PCR reactions at both insert junctions (Fig. IS110.10, A) and was eliminated by deletion of the NCR (Fig. IS110.10, B). However, providing the NCR in trans under control of a T7 promoter on a third plasmid restored the entire reaction (Fig. IS110.10, C). This is analyzed in more detail below (see: Analysis of ncrRNA for a Second IS110 Group Member: ISEc21).

This system was also used to investigate the target sequence requirements. Although the analysis was not systematic, it clearly demonstrated that target specificity was robust and depended on a surprisingly small number of conserved nucleotides: targets retaining 5/6 consensus nucleotides on the left and 5, or only 3, on the right still permitted IS circle formation and insertion (Fig. IS110.11). However, mutation of a single base pair of the dinucleotide CA flank prevented insertion.

Fig. IS110.10. The ISEc21 Transposition System. Donor plasmid (grey circle); transposase gene, tnpEC21 (lilac); upstream non-coding region, NCR, left and right ends, LE and RE, (yellow); flanking sequences (green); ampicillin resistance gene (red). Junction formation was monitored by PCR. Top: excision of the IS circle from the donor plasmid. Below: DNA sequence of the circle junction; -10 and -35 junction promoter components (grey boxes); the left, LE, and right, RE, ends (yellow boxes). A, B and C) target plasmid backbone (red circle). Kanamycin resistance gene (red). ISEc21-target junction formation (insertion) was monitored by PCR at both ends. A) Insertion assay with a wildtype ISEc21. B) ISEc21 without its upstream NCR. C) NCR supplied in trans [28].


Fig. IS110.11. Defining the ISEc21 Target Sequence. Top: Sequence of the target and DNA flanks at the left and right IS ends. Left (LE) and right (RE) IS ends are in yellow boxes. Sequence of PCR products containing the Left flank, LF/LE, and right flank, RE/RF, junctions compared to the target. Identity of the target (green) sequence with the LE and RE flanks is represented by “:”. Bottom: Essential Base pairs in the Target for Integration. Various “target” sequences are shown. The insertion point is indicated by a yellow box. Conserved target bases (green, upper case); adjacent bases and bases altered in the target (black lowercase). Detection of LF/LE and RF/RE junctions by PCR is shown by + or – on the right [28].
Circle formation and integration of the IS110 group: ISPpu9

In contrast to ISEc21, whose analysis used a plasmid-based system, a detailed analysis of ISPpu9 circle formation and insertion employed a system based on an IS copy located in the host chromosome [32].

One particularity of this IS is the presence of a conserved internal inverted repeat located in LE (Fig. IS110.7B and C) which has not been noted in other family members. This was thought to be important since, as shown below, it is partially conserved in the ISPpu9 target sequence (Fig. IS110.16; [49]).

A number of ISPpu9 derivatives with their flanking sequences were constructed, cloned into a mini-Tn5-carrying suicide plasmid and delivered to the chromosome of P. putida strain F1 (Fig. IS110.12). Their capacity for circle formation was assessed by PCR. Deletion of either LE or RE eliminated circle formation, as did mutation of the terminal 5 bp of RE and of the 3’ REP (Repeated Extragenic Palindrome) sequence together with the G nucleotide of the core AG dinucleotide (Fig. IS110.12 middle). Surprisingly, neither substitution of the internal IR within LE nor substitution within the right flank affected the level of IS circles.

For two mutants, a 5 bp substitution within RE and a 5 bp substitution at the tip of LE, a larger junction fragment was detected, possibly in higher quantity. This proved to be generated by recombination between one flanking AG copy and a second located next to a NotI restriction site used in cloning the IS (Fig. IS110.12 bottom).

Fig. IS110.12 ISPpu9 Circle Formation. Top: ISPpu9 insertion in the P. putida strain F1 chromosome. The sequence includes the left (LE) and right (RE) ends (called box A and box B by the authors). The flanking dinucleotides (« core ») are shown in white within a magenta box. The flanking target sequences (chromosomal REP sequences) are shown in green. NotI restriction sites used in cloning into the delivery vector are indicated along with secondary core sites shown in black within magenta boxes. A map of ISPpu9 (yellow box) with its left and right ends (grey boxes), dinucleotide core sequences (magenta lines) and transposase gene (purple horizontal arrow) is shown below. Middle: Mutant ends used in the analysis. The left end mutants used with a wildtype right end are indicated in the left box. Flanking sequences are shown in green. Flanking dinucleotides (« core ») are shown in white within a magenta box. Mutant positions are shown in red. Strike through indicates a deletion. The +/- symbols to the right indicate the level of circle production as judged by PCR of the circle junctions. The right hand box shows mutant right ends used with wildtype left ends. Bottom: Unusual circle junctions obtained with wildtype LE and mutant RE1 and wildtype RE and mutant LE2 [32].
Circle Excision and Insertion Specificity of Additional IS110 and IS1111 Group IS

A number of studies which have investigated the sequence specificity of insertion of various members of the IS110 family are summarized in the following:

IS117 was one of the earliest IS110 family members to be identified and analyzed. It has a 3 base pair core sequence.

Fig. IS110.13. IS117 (IS110) Circle Excision and Insertion. The left (LE) and right (RE) ends of the IS are indicated by horizontal blue arrows directed towards the inside of the IS. A) The empty chromosomal site in Streptomyces coelicolor is shown, with the target sequence in red (Leskiw et al., 1990). B) The result of IS117 insertion with the flanking repeat shown in red (Leskiw et al., 1990). C) The circle junction which includes a single copy of the flanking sequence shown in red (Henderson et al., 1989). D) Secondary integration sites with conserved sequences, shown in red (Smokvina and Hopwood, 1993).

Another member of the IS110 group, IS492, clearly undergoes Tpase-dependent precise excision to regenerate a functional eps gene in Pseudomonas atlantica (Fig.IS110.14 A). The inserted IS copy is flanked by 5 bp directly repeated sequences (5’-CTTGT-3’) (Fig.IS110.14 B). The circle junction carries a single copy of this sequence (Fig.IS110.14 C) as does the empty target site. This suggested that one copy is carried by the IS and is required for activity. Sequential deletion towards the IS ends (Fig.IS110.14 D) clearly showed that the pentanucleotide and/or sequences immediately upstream were required for excision. On the other hand, a sequence 5’-GTTT-3’ located upstream in those insertions analyzed (Fig.IS110.14 E) was not required for excision. It is possible that it is needed for circle integration.

Fig. IS110.14. IS492 (IS110) Excision as a Circle. The left (LE) and right (RE) ends of the IS are indicated by horizontal blue arrows directed towards the inside of the IS. A) The empty chromosomal site in Pseudomonas atlantica is shown, with the target sequence indicated in red. B) The result of IS492 insertion with the flanking repeat shown in red. C) The circle junction, which includes a single copy of the flanking sequence, shown in red.
Fig. IS110.14. IS492 (IS110) Excision as a Circle. The left (LE) and right (RE) ends of the IS are indicated by horizontal blue arrows directed towards the inside of the IS. D) The effects of deletion towards the IS ends on circle formation (Perkins-Balding et al., 1999).

Similar flanking sequences have also been identified in insertions of IS900, IS901, IS902, IS116, IS1110, and IS2112 (Fig.IS110.15). IS621 was also shown to have a flanking sequence, in this case a dinucleotide, CT [25].

Fig. IS110.15. Insertion Specificity of a Number of IS110 group Members. The left (LE) and right (RE) ends of the IS are boxed and in red. Flanking sequences at RE with total or partial identity to LE are also boxed and shown in red. The conserved sequence in the target upstream of LE is boxed, underlined, and bold. Where available, the empty target sequence is shown on the far left. The publications from which the data have been extracted are Green et al., 1989 and Doran et al., 1997 (IS900), Kunze et al., 1991 (IS901), Moss et al., 1992 (IS902), Hernandez Perez et al., 1994 (IS1110), Leskiw et al. [42] (IS116), Puyang et al., 1999 (IS1626) and Kulakov et al., 1999 (IS2112).


In the case of the IS110 family member ISPpu9 with its accompanying MITE (Fig. IS110.7A; Fig. IS110.23A), multiple types of circle have been observed[32]. In all three circular species one of the flanking “core” dinucleotides (an AG in this case; Fig. IS110.7A, B and C) was retained at the circle junction between the abutting LE and RE.

Like a number of IS110 family members (Fig. IS110.16) ISPpu9 had been observed to insert into Pseudomonas REP sequences at a specific site (Fig. IS110.16, B; [49]). Likewise, all seven P. putida KT2440 ISPpu9 copies had inserted at the same site, an observation reinforced by the upstream and downstream flanks of another 47 ISPpu9-like ISs from the Pseudomonas Genome Database.

The insertion specificity was also confirmed experimentally by conjugating a suicide plasmid carrying an ISPpu9 copy tagged with either a kanamycin (Km) or a gentamycin (Gm) resistance gene into the ISPpu9-free P. putida strain, F1, which contains over 300 intergenic REP sequences (Fig. IS110.16, E).

Fig. IS110.16. IS110/IS1111 insertion into REP sequences. Arrows indicate the insertion point. Sequences found at the left and right ends are circled in red. A) IS621 (IS110) insertion into two REP derivatives, Z1 and Z2, as defined by Bachellier et al., 1993 and 1994 (data from Choi et al. [21]). B) ISPpu9/ISPpu10 (IS110). Both strands are shown. Each IS inserts into the same position but in opposite orientations (data from Ramos-Gonzalez et al. [50] and Tobes and Pareja [49]). C) ISRm19 (IS110) (data from Tobes and Pareja [49]). D) ISPa11 (IS1111). Note that there are no sequence similarities between the left end and the sequences flanking the right end (data from Tobes and Pareja [49] and Partridge and Hall [18]).


Fig. IS110.16. IS110/IS1111 group insertion into REP sequences. E) Top: Cartoon showing the structure of the “tagged” IS copies. The resistance marker (gentamycin, Gm, or kanamycin, Km, resistance gene) is shown as a red arrow. Insertion of Km and Gm tagged ISPpu9 into P. putida strain F1 REP sequences. The target AG core dinucleotide is shown in white within a magenta box. The top sequence represents the consensus determined for the endogenous insertions observed in P. putida KT2440. Insertions of the antibiotic resistance ISPpu9 derivatives are shown below. The inverted repeat is indicated in red and corresponds to the lower part of the REP sequences shown in (B). Below: ISPpu9 LE with the AG dinucleotide and an extended inverted repeat with homology/complementarity to the target repeat [32].
Transposon Circles and insertion specificity: IS1111 group

The ends of IS1111 group members differ from those of the IS110 group by including short subterminal IRs (IRLst and IRRst). IS1383 was identified as flanking insertions into each end of the IS5 family member, IS1384 [17][45] and was also shown to generate IS circle junctions (Fig.IS110.17 A). As in most members of this group, IRLst is located further from the IS tip than is IRRst. In this case IRLst is preceded by the sequence 5’-agatgg-3’ (lower case indicates the IS end sequences upstream and downstream of IRLst and IRRst respectively). The insertions into the ends of IS1384 had occurred into a resident AG(A) sequence and excision to form the circle junction appeared to have occurred by recombination between the resident AG(A) and the terminal aga at the left end of IS1383 [45]. This is compatible with a site-specific recombination mechanism in IS1383 transposition. A similar arrangement was observed for a second IS1111 group member, ISEc11 [46], where a flanking tetranucleotide AAAT also appeared as part of the circle junction (Fig.IS110.17 B) and it has also been argued that this is compatible with a site (sequence)-specific recombination transposition mechanism [46]. However, in two additional cases from the Hall lab, IS4321/IS5075 and ISPa11, no such “micro-homologies” were detected [17] (Fig.IS110.17 C and D). It should be noted, though, that transposon circles are generated in vivo and analyzed by PCR. Since there may be a number of copies of the IS in the host genome, this might compromise the sequence of the PCR product.

Fig. IS110.17. The subterminal inverted repeats IRL and IRR are in uppercase, and the IS sequences external to these in lowercase. A) IS1383 insertion sites and circle junction (Muller et al., 2001; Lauf et al., 1999). The left end sequence similar to that flanking the right end is shown in the circle junction as lowercase bold red. B) ISEc11 insertion site and circle junction (Prosseda et al., 2006). The left end sequence similar to that flanking the right end is shown in the circle junction as lowercase bold red. C) IS4321/IS5075 insertion site and circle junction. There is no similarity between the left end and the sequences flanking the right end. D) ISPa11 insertion site and circle junction. There is no similarity between the left end and the sequences flanking the right end.

Since the number of fully studied examples of IS1111 group members is limited, it is possible that the flanking “micro-homologies” observed for IS1383 and ISEc11 are chance occurrences, that excision and insertion of IS1111 members are truly mechanistically different from those of IS110 group members, and that their division into separate families is justified. However, for present classification, both groups are included in the IS110 family in ISfinder for convenience.

Insertion specificity and target secondary structures

The particular insertion specificities of the IS110 family have been mentioned in the context of the mechanism of transposition and are often one factor in making definition of the IS ends difficult. However, one characteristic of insertion of this family of IS is that they often prefer sequences with the propensity to form secondary structures. This is consistent with the fact that the transposases are similar to RuvC, an endonuclease involved in resolving branched Holliday junctions during recombination (e.g. [51]).

For example, IS621 insertions were observed to be flanked by a CT dinucleotide [25]. On further examination this was shown to be a dinucleotide located at the foot of REP sequences in the host Escherichia coli genome (Fig.IS110.16 A). REP sequences are small Repeated Extragenic Palindromic sequences often present in many hundreds of copies in bacterial genomes and which play a variety of structural and regulatory roles [52][53][54][55][56][57][58]. Both Z1 and Z2 REP sequences [53][54][55] are used as targets and all 10 copies of IS621 in the E. coli ECO28 genome were found in this position in resident REP sequences [25].

There are at least six other examples of this type of “structural” insertion specificity (Fig.IS110.2). All 7 copies of ISPpu10 were identified in short REP sequences of Pseudomonas putida KT2440 [59][60] and a cloned ISPpu10 derivative was shown experimentally to transpose into this REP target [59] (Fig.IS110.16 B). Seven (of 7) copies of a related IS, ISPpu9, were identified in a similar REP sequence at the same position but inserted in the opposite orientation (i.e. on the opposite strand) [61] (Fig.IS110.16 B), while 4/4 examples of ISRm19 were identified in a REP sequence of Rhizobium meliloti (Fig.IS110.16 C). Similarly, ISPa11 of the IS1111 group inserts specifically into a Pseudomonas aeruginosa REP (6 examples [61] and one example from Partridge and Hall [17]) (Fig.IS110.16 D).

Two types of insertion have been described [61]. In type I, the IS inserts at the same position within the REP, whereas type II insertions occur adjacent to a REP. Most IS110 family members exhibit type I insertion patterns in all examples identified. However, one IS, ISPsy7, exhibited a type II insertion pattern but only in 6/10 examples, and a second unspecified IS from Neisseria meningitidis MC58 was also reported to exhibit a type II pattern in 3/5 cases examined [61]. It is possible that this N. meningitidis IS is the same as that described by Skaar et al. [27].

It is worth noting that few, if any, REP sequences have been identified in plasmids. Therefore, IS which target REPs can be expected to be found mainly in their host chromosomes.

At least six different members of the IS1111 subgroup (ISKpn4, ISPa21, ISPst6, ISUnCu1 = ISPa62, ISAvX1 = ISAzvi12 and ISPa25) show a preference for another type of target which can assume a structured configuration, the attC sequences of integrons [30][62]. IS which insert into attC sequences are grouped into a specific clade (Fig.IS110.2) [62]. The integron attC is central to integration of circular integron cassettes [63] and had been called “59 base pair element” [64] but can vary considerably in length [65]. Studies from the Mazel lab have shown that attC sequences can form foldback structures (Fig.IS110.18A) with imperfect matches in which extrahelical bases are involved in driving the direction of the excision and integration reactions [63][65][66][67]. Integration of IS1111 group members appears to occur at a specific position on these attC foldback sequences (Fig.IS110.18B).

Other IS of this family also appear to insert into conserved target sequences: IS1533 occurs in 84 copies in Leptospira borgpetersenii and inserts into a partially conserved sequence (ttAGACAAAA [IS1533] TATCAGagcc-gtct--aaa); ISRfsp2 from Roseiflexus sp. RS-1, present in 40 copies in the host genome, is flanked by the sequence CTCtGCGaaCGCtGCGc [ISRfsp2] CTCtGCGGtg (Fig.IS110.19), while ISMpa1 from Mycobacterium avium subsp. paratuberculosis is flanked by the consensus CCAG(N)0–1CTA [ISMpa1] GCC(N)0–6GCCG [68].

Fig. IS110.18A. IS1111 group insertion into attC sites. Top: The secondary structures shown have been functionally and structurally identified by the Mazel group (Bouvier et al., 2005; MacDonald et al., 2006; Bouvier et al., 2009). The nomenclature of the repeat sequences is that used by these authors, since this reflects their position in the folded structure. Extrahelical bases that are important in regulating the attC-attI recombination process are highlighted in green. The figure underlines the large variation in the length of attC as a result of “linker” DNA located between L’ and L’’.
Fig. IS110.18B. The position of insertion of different IS1111 group IS in a number of different attC sequences. The genes or identifier to which the particular attC sequence is attached are noted to the left of the figure. The names of the inserted IS are shown on the right. Data from Partridge and Hall [18] show the complete attC sequence. Those from Tetu and Holmes 2003 show only the left (5’) region.
Fig.IS110.19. An example of a high copy number IS110 group member, ISRfsp2 in the Roseiflexus sp. RS-1 Genome. A map localizing the IS on the sequenced genome is shown on the left. The alignment of the insertion sites is shown on the right.
Defining the ends: DNA flanks, empty targets and circle junctions

Certain family members have directly repeated DNA flanks (which have been termed “core” sequences). However, this is not universal since many others do not. In the case of IS110 group members, the flanks are often simply a dinucleotide which has been called a “core” sequence at which recombination between donor (circle junction) and target DNA occurs. Many IS110 group members possess this dinucleotide repeat (examples are shown in Fig. IS110.20A top left) and, where known, the sequence occurs at the circle junction (Fig. IS110.20A top right). However, some exhibit longer repeats and some have no obvious repeats. Members of the IS1111 group have also been identified with longer flanking repeats (examples are shown in Fig. IS110.20A bottom; Fig. IS110.20B) and with no repeats at all (Fig. IS110.20C). Examples whose sequence has been verified by identifying inserted IS copies, an empty target site and the circle junction are shown in Fig. IS110.20A, 20B and 20C.

Fig.IS110.20. DNA Flanks, Empty Target Sites and Circle Junctions. A) DNA Flanks and Circle Junctions. The figure shows examples of the DNA flanks of IS110 (top) and IS1111 (bottom) group members. The left hand side identifies the DNA repeats at the flanks (white text in a magenta box). The right hand side presents the circle junction sequences where known. Top: Examples of IS110 group members. The position of one conserved flank between the left and right ends in the circle junction is indicated (white text in a magenta box). The figure also shows examples with longer flanking repeats. Bottom: Equivalent analyses of IS1111 group members. Examples of IS1111 group members with flanking direct repeats and with no flanking repeats, and their equivalent circle junctions. The subterminal inverted repeats are shown in bold with the regions conserved between them indicated in red.
Fig.IS110.20. DNA Flanks, Empty Target Sites and Circle Junctions. B) DNA Flanks, Circle Junctions and an Empty Site of ISEc11. The empty site and integrated copy sequences of an IS1111 group member with directly repeated sequences are from Siddiquee et al. [28]. The examples of both plasmid and chromosomal copies indicated on the left are extracted from GenBank sequences.
Fig.IS110.20. DNA Flanks, Empty Target Sites and Circle Junctions. C) DNA Flanks, Circle Junctions and Empty Sites of Two Additional IS1111 Examples. Neither IS4321/IS5075 (top) nor ISPa11 (bottom) has direct flanking repeats and both appear to integrate into their targets without identifiable homology. For each, a sequence flanking the left end (with a different length) is observed at the circle junction. The examples of both plasmid (IS4321/IS5075) and chromosomal (ISPa11) copies indicated on the left are extracted from GenBank sequences.
Extensive Bioinformatic Analysis of Target Sequences

Siddiquee et al. [28] undertook an extensive analysis of the IS110 family members in ISfinder using a library of IS together with their flanking DNA extracted from public databases and ranked in order of abundance and number of independent insertions. The different IS were found to occur with a very large range of frequencies. A number were represented only once in the library while others from both IS110 and IS1111 groups were present in very high numbers, some in several thousand copies with hundreds of unique insertion events: for IS1111 group members, from 4620 copies and 1162 individual insertion sites for IS1533 to 23 copies with a single insertion event for ISXpo1; and for IS110 group members, from 9059 copies and 7061 insertion events for IS1663 to 163 copies and a single insertion event for ISSde13.

Analysis of these data using WebLogo revealed the consensus target sequences, with large differences between different IS in the strength and length of the conserved sequence (Fig. IS110.21 A and B). The results of the entire analysis can be found at https://github.com/AtaideLab/Targets/.
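The “strength and length” of a conserved target can be expressed numerically as per-position information content, the quantity that sequence logos display as stack height. The Python sketch below is a minimal illustration, not the pipeline used in the cited study. It assumes that the flanking sequences have already been aligned on the insertion point and are all the same length, it omits any small-sample correction that logo tools may apply, and the function names and the four toy sites are hypothetical.

```python
import math
from collections import Counter


def column_information(aligned_sites):
    """Information content (bits) of each column of equal-length DNA sites.

    2 bits means a fully conserved position, 0 bits means the four bases
    are equally frequent. No small-sample correction is applied.
    """
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("all sites must have the same length")
    info = []
    for i in range(length):
        counts = Counter(site[i].upper() for site in aligned_sites)
        total = sum(counts.values())
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        info.append(2.0 - entropy)
    return info


def consensus(aligned_sites, min_bits=1.0):
    """Most frequent base where the column reaches min_bits, otherwise 'n'."""
    info = column_information(aligned_sites)
    letters = []
    for i, bits in enumerate(info):
        top = Counter(site[i].upper() for site in aligned_sites).most_common(1)[0][0]
        letters.append(top if bits >= min_bits else "n")
    return "".join(letters)


if __name__ == "__main__":
    toy_sites = ["ACAG", "ACAG", "TCAG", "GCAG"]  # hypothetical aligned targets
    print(column_information(toy_sites))
    print(consensus(toy_sites))
```

Summing the per-column values over the window gives a single figure for how strongly constrained a target is, which makes IS with short, strict targets easy to distinguish from those with long, weak ones.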

Fig.IS110.21A. Analysis of insertion sites of IS110 and IS1111 group members. A) IS1111 group members. B) IS110 group members. Weblogos of Consensus Insertion target sites of a selection of IS from large scale genome database analysis. The complete analysis can be found at https://github.com/AtaideLab/Targets/.
Fig.IS110.21B. Analysis of insertion sites of IS110 and IS1111 group members. A) IS1111 group members. B) IS110 group members. Weblogos of Consensus Insertion target sites of a selection of IS from large scale genome database analysis. The complete analysis can be found at https://github.com/AtaideLab/Targets/.
Transposase expression

As for many other IS which use double strand circular intermediates, circle formation often results in the assembly of a junction promoter formed from a -35 promoter element in the right end oriented outwards and a -10 promoter element in the left end oriented inwards [69][70][71]. For the IS110 family, this was originally identified in circular forms of IS492 [43] (Fig.IS110.22), where the junction promoter was significantly stronger than the lacUV5 promoter, and has also been demonstrated for a number of others (e.g. ISEc11 and a naturally occurring derivative, ISEc11p, IS621 and ISPpu10 [32]).

A compiled list of many IS1111 group IS [17], together with in silico construction of their circle junctions, indicated that all had the capacity to generate probable promoters. Due to small variations in the distance of the subterminal IRs from the probable end of the IS, some were separated by 10 bp and some by 9 bp. A notable observation for the IS1111 group is that while the -35 promoter elements are located entirely within the right IS end, the -10 promoter element was not located entirely within the left end but was composed of sequences from both the left and right ends and was only assembled on circle formation.
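An in silico search of this kind can be sketched in a few lines of Python. The example below is a hedged illustration rather than the procedure used in the cited work: it scores hexamers against the E. coli σ70 consensus elements (TTGACA and TATAAT), accepts spacers of 16 to 18 bp, and keeps only promoters that exist solely in the circle, that is, with the -35 element wholly inside the right end and a -10 element that extends beyond it. The mismatch threshold, the function names and the toy end sequences are assumptions, and hosts such as Pseudomonas may call for different criteria.

```python
MINUS35 = "TTGACA"  # E. coli sigma70 -35 consensus
MINUS10 = "TATAAT"  # E. coli sigma70 -10 consensus


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_junction_promoters(right_end, core, left_end,
                            max_mismatch=2, spacers=(16, 17, 18)):
    """Scan an in silico circle junction (RE + core + LE) for promoters.

    Returns tuples (pos35, spacer, pos10, straddles), where positions are
    0-based in the junction string and `straddles` is True when the -10
    hexamer starts inside the right end, i.e. is built from both ends.
    """
    junction = right_end + core + left_end
    boundary = len(right_end)  # first junction position outside RE
    hits = []
    for i in range(len(junction) - 5):
        if mismatches(junction[i:i + 6], MINUS35) > max_mismatch:
            continue
        for spacer in spacers:
            j = i + 6 + spacer
            if j + 6 > len(junction):
                continue
            if mismatches(junction[j:j + 6], MINUS10) > max_mismatch:
                continue
            # Junction-dependent only: -35 within RE, -10 reaching past RE.
            if i + 6 <= boundary and j + 6 > boundary:
                hits.append((i, spacer, j, j < boundary))
    return hits


if __name__ == "__main__":
    # Hypothetical ends: the -10 hexamer is split 3 + 3 between RE and LE.
    toy_re = "CC" + "TTGACA" + "GCGCGCGCGCGCGCGCG" + "TAT"
    toy_le = "AAT" + "GGCC"
    print(find_junction_promoters(toy_re, "", toy_le, max_mismatch=0))
```

The last element of each hit separates the two arrangements described in the text: a -10 element lying in the left end alone, and one composed of sequences from both ends.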

Few of these have been examined for activity. However, not all family members appear to specify a junction promoter. For ISPpu9 (IS110) no junction promoter was predicted using the Pseudomonas-specific promoter prediction tool (https://sapphire.biw.kuleuven.be/index.php) and no junction promoter could be demonstrated using β-galactosidase translational fusions (Fig. IS110.23B). However, the ISPpu9 (IS110) transposase promoter appears to be strong and, the authors argue, this alleviates the necessity for the transient junction promoter. In the same study, the circle junction of ISPpu10 generated a robust promoter[32] (see Fig. IS110.23A and Fig. IS110.23B).

Fig. IS110.22. Transitory promoter assembly at IS1111 family circle junctions. -35 and -10 promoter elements are shown in pink and green boxes respectively, and the subterminal IRs are labeled in pale yellow. Top: IS492 (IS110) was the first of this family to be shown to create a functional promoter (Perkins-Balding et al., 1999). Below: A compiled list of IS1111 ends assembled into circle junctions (data from Partridge and Hall, 2003). Most of these have been assembled in silico, but those with published sequenced junctions are marked with a blue circle.
Transient Promoter Formation: the circle junction

It is important to note that there are some ambiguities in a number of the ends of IS110 family members documented in ISfinder due to the absence of terminal IRs. As pointed out by Siddiquee et al. [28], the most definitive method of resolving these problems would be to obtain the DNA sequence of the RE-LE IS circle junction and to compare this with an empty target site.
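The comparison suggested here can be sketched as a simple string procedure. The Python example below is illustrative only and rests on several assumptions: the occupied and empty sites are given on the same strand and share identical outer flanks, the IS tips do not happen to match the adjacent target bases (which would shift the inferred boundaries), and the function names and toy sequences are hypothetical. It infers the IS boundaries and any duplicated core from the two sites, then checks whether a sequenced circle junction agrees with that assignment.

```python
def locate_insert(occupied: str, empty: str) -> dict:
    """Infer IS boundaries and the duplicated core from two sites.

    The longest shared prefix and suffix of the occupied and empty sites
    are the flanks. Where they overlap on the empty site, the overlap is
    the directly repeated core.
    """
    if len(occupied) <= len(empty):
        raise ValueError("the occupied site must be longer than the empty site")
    p = 0
    while p < len(empty) and occupied[p] == empty[p]:
        p += 1
    s = 0
    limit = min(len(empty), len(occupied) - p)
    while s < limit and occupied[-1 - s] == empty[-1 - s]:
        s += 1
    dup = p + s - len(empty)  # bases of the empty site found on both sides
    if dup < 0:
        raise ValueError("the flanks do not account for the whole empty site")
    is_end = len(occupied) - s
    return {
        "core": occupied[p - dup:p],
        "is_seq": occupied[p:is_end],
        "is_start": p,
        "is_end": is_end,
    }


def junction_matches(junction_read: str, is_seq: str, core: str, arm: int = 4) -> bool:
    """True if the read contains RE tip + core + LE tip as predicted."""
    expected = is_seq[-arm:] + core + is_seq[:arm]
    return expected in junction_read


if __name__ == "__main__":
    empty = "GGATG" + "CTTGT" + "ACGTA"  # toy target with a CTTGT core
    occupied = "GGATG" + "CTTGT" + "TGACCATTAGGCAAC" + "CTTGT" + "ACGTA"
    site = locate_insert(occupied, empty)
    print(site)
    print(junction_matches("GGCAACCTTGTTGACTT", site["is_seq"], site["core"]))
```

When the junction read does not contain the predicted sequence, the proposed ends or core are suspect, which is the situation this comparison is meant to expose.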

ISPpu9 and its Regulation by asr9 RNA

One of the first suggestions that control of transposition of IS110 family members might involve RNA came from studies on ISPpu9 [31] (Fig. IS110.23A and IS110.23B and IS110.7A).

An analysis of transcription in Pseudomonas putida [72] led to the identification of two untranslated regions (NCR) in ISPpu9 from which two small RNAs (sRNAs) are produced: one, ssr9, is located downstream of the tnp gene (tnpISPpu9) and is expressed in the same direction from the probable defective ISPpu9 MITE-like structure (Fig. IS110.7, A); the second, asr9 (antisense sRNA of ISPpu9), is located upstream, convergent with the transposase promoter and expressed from the opposite DNA strand (Fig. IS110.23, A). Asr9 was determined to be nearly 5 times more abundant than ssr9. tnpISPpu9 transcripts were only detected at very low levels.

Fig. IS110.23. A) RNA seq on genomic ISPpu9. Top: Map of ISPpu9 (yellow horizontal box) showing the transposase gene (purple horizontal arrow) and the results of RNAseq (red). The IS ends, including those of the associated MITE on the right, are indicated by grey boxes and the promoters as black arrows. Bottom: DNA Sequence of the left and right IS regions (left and right boxes respectively). Note that the right sequence contains the entire MITE. The 5’ and 3’ REP target sequences are shown in blue boxes in lower case. Left and Right ends are indicated by grey boxes LE and RE. Inverted repeats are shown as blue arrows. The left hand box shows the probable transposase -10 promoter region, the +1 transcription start together with the transposase initiation codon are shown in red as are the probable -10 and -35 asr promoter regions and the +1 transcription start. The right hand box shows the transposase termination codon, the probable -10 region of the defunct transposase and of the ssr transcript [31]. Flanking AG “core” dinucleotides required for activity are shown in bold white and underlined within magenta colored boxes.
Fig. IS110.23. B) Plasmid LacZ Transcriptional fusions. Top. Lac sequences are included in a blue box. Promoter elements are shown in red, as is the translational start. β-galactosidase units are shown to the right. The left-hand column shows the results obtained from P. putida KT2440 and the right column, those from strain F1 [31]. Bottom. Transcriptional fusions, including various RE-LE junctions. The horizontal blue arrow represents the lacZ gene. Grey boxes represent the ISPpu9 LE and RE, the blue box shows the MITE RE. The magenta line shows the AG core dinucleotide. The transposase (tnp), asr9 and ssr9 promoters are shown as arrows. The right-hand columns show the β-galactosidase units measured from the different plasmid constructs in exponential and stationary phases [32]. NOTE that the β-galactosidase units are approximate in both Top and Bottom.

Inspection of the sequences of both asr9 (upstream) and ssr9 (downstream) indicated a significant divergence (Fig. IS110.23 and Fig. IS110.24) which presumably eliminates the asr9 promoter in the downstream ssr9 sequence although both maintained an upstream inverted repeat.

Fig. IS110.24. Sequence Differences between the ISPpu9 Left End and the Right Hand MITE. The two sequences are aligned. Red characters indicate differences. Bold characters indicate various functional nucleotides including: the probable transposase (Ptnp) -10 promoter region; the +1 transcription start (missing in the MITE); -10 ssr (missing in the ISPpu9 sequence); the ssr transcription start site (missing in the ISPpu9 sequence); the asr +1 (missing in the MITE); the -10 and -35 asr promoter (Pasr9) signals (missing in the MITE); and the transposase translation initiation codon (missing in the MITE). The LE-associated inverted repeat (present in both) and the more internal inverted repeat (missing in the MITE) are shown by blue horizontal arrows. The IS ends are shown as grey boxes [31].


Clearly, asr9 could act as an antisense RNA to control transcription/translation of the tnp gene. To investigate this, a series of plasmid-based Tnp-lacZ translational fusions was constructed (Fig. IS110.25). These included derivatives containing either the first two tnp codons (called 2 and 2+S; Fig. IS110.25, 1 and 2), eliminating the asr9 -35 promoter component, or the first 8 (Fig. IS110.25, 3, 4 and 5). The latter include the entire asr9 promoter (called 8 and 8+S; Fig. IS110.25, 3 and 5) or a copy with a mutated -35 promoter component (Fig. IS110.25, 4). The 2 and 8 tnp codon derivatives were also constructed with (Fig. IS110.25, 2 and 5) or without (Fig. IS110.25, 1 and 3) the corresponding downstream ssr9 promoter.

Propagation of these plasmids in Pseudomonas putida F1 (which is naturally devoid of ISPpu9 or associated genes) revealed that plasmids 8 and 8+S (Fig. IS110.25, 3 and 5) produced significant levels of asr9 RNA while plasmids 2 and 2+S (Fig. IS110.25, 1 and 2) did not. The plasmid with a mutated -35 promoter box (Fig. IS110.25, 4), however, continued to produce a low level of the RNA. Measurement of β-galactosidase activity from these plasmids in Pseudomonas putida F1 revealed that the activity from plasmid 2 (Fig. IS110.25, 1) was only 25% that of construct 8 (Fig. IS110.25, 3) although the levels of lac mRNA were only 70% lower, suggesting that the major effect of asr9 RNA was on translation.

The authors propose that the tnp ribosomal binding site in the mRNA is masked by the inherent secondary structure and that interaction with asr9 RNA liberates this, facilitating TnpISPpu9 translation (Fig. IS110.25 bottom). Moreover, introduction of an asr9 gene into the chromosome of Pseudomonas putida F1 further significantly increased β-galactosidase expression from plasmid 8 (Fig. IS110.25, 3). However, this expression enhancement did not occur with plasmid 2 (Fig. IS110.25, 1). The authors suggest that this could be because asr9 cannot properly hybridize with the NCR RNA of plasmid 2, possibly because the sequence between codons 2 and 8, absent from plasmid 2, might be important for asr9 activity by, for example, providing an initiation point for pairing. This was not further tested.

Additionally, the presence of ssr9 appeared to alleviate the effect of asr9 suggesting that this RNA, with partial identity to the upstream NCR (Fig. IS110.23), might be able to sequester asr9 thus reducing its activity. Such an interaction was detectable in vitro. This effect was observed in Pseudomonas putida F1 as a 27% lower β-galactosidase level from the 8+S plasmid than from the 8 plasmid and a 35% lower level in the Pseudomonas putida KT2440 host.

The notion that the NCR secondary structure is responsible for sequestering the translation initiation signals is supported by the observation that a number of mutations designed to disrupt or weaken the NCR secondary structure, and therefore unmask the ribosome binding site, resulted in a large increase in β-galactosidase expression in the absence of asr9.

Using lacZ transcriptional fusions, the activities of Pasr9 and Pssr9 were found to be about 3 fold higher than Ptnp and asr9 RNA was significantly more stable (half life >60 min) than ssr9 (half life ~3 min). The authors present experiments which lead to the conclusion that asr9 stability is due to its sequence and secondary structure rather than to interaction with ssr9 or the 5’NCR RNA.
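The difference between the two half-lives quoted above can be made concrete with a short calculation. The sketch below assumes simple first-order (exponential) decay, which is an assumption made here for illustration and not a model stated by the authors; the function name is hypothetical, and 60 min is used as a lower bound for asr9 since its half-life is reported only as >60 min.

```python
def fraction_remaining(t_min: float, half_life_min: float) -> float:
    """Fraction of an RNA left after t_min minutes, assuming first-order decay."""
    return 0.5 ** (t_min / half_life_min)


# Half-lives taken from the text; 60 min is a lower bound for asr9.
ASR9_HALF_LIFE_MIN = 60.0
SSR9_HALF_LIFE_MIN = 3.0

for t in (3, 15, 60):
    asr9 = fraction_remaining(t, ASR9_HALF_LIFE_MIN)
    ssr9 = fraction_remaining(t, SSR9_HALF_LIFE_MIN)
    print(f"t = {t:>2} min: asr9 at least {asr9:.3f}, ssr9 about {ssr9:.6f}")
```

Under this assumption, ssr9 falls to about 3% of its initial level after 15 minutes, whereas at least half of the asr9 would still be present after an hour.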

It should be noted that these studies addressed “linear” IS copies and did not involve the presumed circular intermediate (see: Transient Promoter Formation: the circle junction). Regulation of Tnp expression among other characteristics is likely to be modified in these transposition intermediate structures.

Fig. IS110.25. Tnp-lacZ Translational Fusions. The effect of the 5’ NCR, Asr and Ssr on Transposase expression measured by translational fusions to the lacZ reporter gene. The constructions are shown as cartoons on the left. The horizontal blue arrow represents the lacZ gene and the purple box shows a fusion with either the first 2 or 8 codons of the transposase. The transposase promoter, Ptnp, is included in all constructions while the ssr promoter, Pssr, is only included in constructions (2) and (5). The complete asr promoter, Pasr, with its -10 and -35 is present in constructions (3) and (5) while construction (4) carries a mutated -35. These features are shown on the aligned DNA sequences to the right together with the +1 transcriptional start for the Asr RNA (red). -10 and -35 positions are underlined and in bold. Note that the Tnp ribosome binding site (RBS-tnp) is boxed and overlaps the asr -10 promoter component. The positions in black font correspond to tnp, those in blue (boxed) to lacZ, and those in gray to extra codons introduced during cloning. Positions mutated at the -35 region of promoter Pasr9 are indicated in green. The table on the right shows the relative levels of β-galactosidase produced (-/+) and the presence (+) or absence (-) of asr- or ssr-RNA. The schemas at the bottom show how pairing of asr RNA to the 5’NCR of the tnp mRNA could unfold the hybridization loop providing access of the RBStnp to ribosomes thus facilitating tnp translation.
ISPpu10 and its Regulation by RNA

A similar analysis of ISPpu10 also from P. putida KT2440 showed that it too specified an asr RNA, asr10 (Fig. IS110.26). Moreover, as judged by transcriptional fusions to lacZ, the asr10 promoter (Fig. IS110.26, 5) was significantly stronger than that of the transposase with or without the convergent asr10 promoter (Fig. IS110.26, 3 and 4) which appeared to be significantly weaker than the ISPpu9 Ptnp. In the case of ISPpu10, the circle junction assembled a very strong promoter (Fig. IS110.26, 2)[32].

Fig. IS110.26. RNA seq on genomic ISPpu10. Top: Map of ISPpu10 (yellow horizontal box) showing the transposase gene (purple horizontal arrow) and the results of RNAseq (blue). The IS ends are indicated by grey boxes and the asr10 promoter as a black arrow. The CT dinucleotides are indicated by magenta lines. Middle: Transcriptional fusions. The horizontal blue arrow represents the lacZ gene. Grey boxes represent the ISPpu10 LE and RE. The magenta line shows the CT core dinucleotide. The transposase (tnp) and asr10 promoters are shown as arrows. 1) vector alone. 2) Circle junction. 3) transposase promoter with convergent asr10 promoter. 4) transposase promoter without the corresponding asr10 promoter. 5) asr10 promoter. The right-hand columns show the β-galactosidase units measured from the different plasmid constructs in exponential and stationary phases. NOTE that the β-galactosidase units are approximate. Bottom: ISPpu10 junction sequence. -35 and -10 promoter elements are shown in red [32].

RNA from the NCR may be Involved with Target choice and Integration

NCR RNA from IS110 group members:IS621

The involvement of an RNA from the downstream NCR in determining IS1111 group insertion specificity had been suggested [30] based on comparison of ISKpn4 and ISPa25. ISKpn4 belongs to an IS1111 subgroup targeting att sites of integron cassettes (Fig. IS110.3A). ISPa25 also targets att sites, but it belongs to an IS1111 subgroup, including IS4321 and ISPa11 (Fig. IS110.3A), whose transposases have low amino acid similarity with those of the ISKpn4 subgroup and which targets the IR of Tn21 transposons. It was noted that ISKpn4 and ISPa25 share a block of sequence similarity in the downstream non-coding region (Fig. IS110.27) and it was suggested that, as RNA, this might be responsible for target choice. More careful analysis presented here has revealed that the two IS also share blocks of similarity at the 3’ end of their transposase genes and that this results in strong amino acid conservation in the transposase itself (Fig. IS110.19). The first block of similarity carries the G..P/SG conserved residues (Fig. IS110.8B).

Fig. IS110.27.A) Sequence Patchwork of IS1111 Group Members: ISKpn4 and ISPa25. Top: Comparison of ISKpn4 and ISPa25. The IS are shown as horizontal yellow boxes and the transposase orfs as purple horizontal arrows showing the direction of expression. Regions of strong similarity are shown as blue boxes with the IS coordinates above (ISKpn4) or below (ISPa25). The coordinates of the transposase codons for ISKpn4 are indicated between the two IS. Middle: DNA sequences of the three blocks of similarity. ISKpn4 (top lines) and ISPa25 (bottom lines in each box). Identical nucleotides are shown in black text. Bottom: Protein Sequence of the C-Terminal transposase end. The blocks of similarity are shown in blue (bold) and the identities are underlined.


Fig. IS110.27.B) Alphafold predicted structures (left) and structure overlay (right) based on FATCAT superposition of both structures.


Moreover, Durrant et al [19][20] extracted and aligned a large number of examples of this family from public databases (2023) (Fig. IS110.3B), which greatly increased the number of family members in the ISfinder database. They observed that members of the IS110 family exhibit some of the longest non-coding ends (NCR or Untranslated Regions, UTR) among IS families. That this is a conserved family feature is suggested by a relatively narrow length distribution (between 230 and 290 bp).

Identification of Specific NCR from IS621 (IS110) with Strong Transposase Affinity

To further explore the mechanism involved in IS110 transposition, Durrant et al [19][20] used IS621 of the IS110 group as a model system. IS621 (Fig. IS110.2, B) was first described by Choi et al [21] and comparison of a number of resident IS621 homologues in E. coli demonstrated that they insert at the foot of a REP sequence and are flanked by a CT dinucleotide (Fig. IS110.18). IS621 has both upstream and downstream NCR sequences (Fig. IS110.6A and Fig. IS110.27A). The predicted RE-LE junction of the probable IS621 circular transposition intermediate was cloned together with the tnp upstream NCR and analyzed for RNA expression in E. coli [19][20]. A prominent RNA region of approximately 170 nts was identified which appeared to originate just downstream from the junction promoter and continue until immediately before the TnpIS621 +1 codon (Fig. IS110.28).

Fig. IS110.28. IS621, the IS Circle Junction and its Transcript. Top: Map of IS621 (yellow box) showing the transposase (purple arrow) and the left and right ends (grey arrows). Bottom: the DNA sequence (black characters) across the RE-LE junction in the IS621 circular transposition intermediate. Right (RE) and Left (LE) ends are indicated within a grey box. They are separated by the CT dinucleotide (blue) which flanks the original inserted copy [21]. The junction promoter, Pjunc, -10 and -35 components are shown within yellow boxes and the transcription start site (TSS) is shown within a red box. The RNA transcript is shown as a red dotted line and the left target guide (LTG), right target guide (RTG), left donor guide (LDG) and right donor guide (RDG) sequences are shown as red characters and underlined. The transposase start codon, ATG, is shown in red.


Using purified TnpIS621 and in vitro transcribed ncRNA, it was found, using Microscale thermophoresis (MST) to determine the equilibrium dissociation constant, that the protein showed high affinity for the RNA. This is a characteristic of guide RNAs in other systems where they co-purify with their guide endonucleases (see: IS200/IS605 family: TnpB and its Relatives).

A Consensus ncRNA Double Loop Structure for IS621 Orthologues

A consensus ncRNA (non-coding RNA) structure was then determined for over 100 IS110 orthologues using structural alignments and structural prediction software together with sequence conservation. Development of a covariance model revealed the presence of a 5’ stem-loop followed by two larger stem-loop structures each with a large internal loop (Fig. IS110.29). The first had low sequence conservation while the second was significantly more conserved.

Fig. IS110.29. Generalised Secondary RNA Structure. The consensus ncRNA secondary structure was constructed from 103 IS110 LE sequences. The predicted structure comprises a 5′ stem - loop and two large internal loops. A key is included to the right of the figure.
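Secondary structures of this kind are commonly written in dot-bracket notation, where matching parentheses mark paired bases and dots mark unpaired ones. As a minimal sketch of how such a string can be read, the following code recovers the base pairs and the unpaired stretches (hairpin loops, internal loops and linkers). The structure string used here is a hypothetical toy example, not the IS621 ncRNA, and both function names are illustrative.

```python
def parse_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Return sorted 0-based (i, j) index pairs for every matched '(' and ')'."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.append((stack.pop(), j))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {j}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def unpaired_runs(structure: str) -> list[tuple[int, int]]:
    """Return half-open (start, end) intervals of consecutive unpaired positions."""
    runs: list[tuple[int, int]] = []
    start = None
    for i, symbol in enumerate(structure):
        if symbol == ".":
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(structure)))
    return runs


# Hypothetical toy structure: a 5' hairpin followed by a stem carrying an internal loop.
TOY = "((...))..((..((...))...))"

print(parse_dot_bracket(TOY))
print(unpaired_runs(TOY))
```

In the toy string, the unpaired runs on either side of the inner stem (positions 11-12 and 20-22) together form the internal loop, the kind of region in which the guide sequences discussed below are located.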

The strong binding of the ncRNA to the Tnp protein raised the possibility that it may favor target recognition.

Extending the Consensus to Other Group Members: ncRNA Complementarity with Donor Junction and with Target

To explore this, the authors first defined the ends of a large number of IS110 elements, enabling identification of their insertion sites and reconstruction of both the target sequence and the junction of the circular form. They then performed an iterative search with the structural covariance model (CM) developed for IS621 ncRNA (Fig. IS110.29) to predict ncRNA structures in the LEs of this IS collection, generated paired alignments of the ncRNAs with their corresponding target and donor (abutted LE and RE ends) using a 50 bp window centered on the donor “CT” dinucleotide core, and undertook covariation analysis on 2,201 donor-ncRNA pairs and 5,511 target-ncRNA pairs detected by homology with IS621 [73]. This incorporated base-pairing analysis to identify stretches of these ncRNA complementary to either the top or bottom strand of the target or donor DNA. It identified possible pairings with the two internal ncRNA loops. By projecting the overall covariation pattern for the entire collection onto the model IS621 ncRNA sequence, the authors inferred that the first loop could base-pair with the target and the second with the donor junction. In each case, the 5’ side of the loop would pair with the bottom strand of the target or donor (8-9 nts) and the 3’ side with the top strand (4-6 nts) (Fig. IS110.30A) [19][20].
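The strand logic behind this base-pairing analysis can be illustrated with a small sketch. An RNA stretch pairs with the bottom strand of a DNA window when it has the same sequence as the top strand (with U for T), and with the top strand when it is the reverse complement of the top strand. The code below only searches for perfect matches; it is not the covariation method used by the authors. The DNA sequence embeds the 11-nt IS621 target quoted elsewhere in this article between hypothetical flanking bases, and the two guide sequences are illustrative choices, not the published IS621 guides.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return dna.translate(DNA_COMPLEMENT)[::-1]


def pairing_sites(guide_rna: str, top_strand: str) -> dict[str, list[int]]:
    """Find 0-based window starts on the top strand where guide_rna could pair perfectly.

    'bottom': the RNA matches the top strand, so it is complementary to the bottom strand.
    'top':    the RNA is the reverse complement of the window, so it pairs with the top strand.
    """
    guide = guide_rna.upper().replace("U", "T")
    top = top_strand.upper()
    n = len(guide)
    hits: dict[str, list[int]] = {"bottom": [], "top": []}
    for start in range(len(top) - n + 1):
        window = top[start:start + n]
        if window == guide:
            hits["bottom"].append(start)
        if reverse_complement(window) == guide:
            hits["top"].append(start)
    return hits


# Top strand: hypothetical flanks (TT, GG) around the target ATCAGGCCTAC.
TOP = "TTATCAGGCCTACGG"

print(pairing_sites("AUCAGGCCU", TOP))  # illustrative 9-nt guide
print(pairing_sites("GUAG", TOP))       # illustrative 4-nt guide
```

In this toy case the 9-nt guide covers top-strand positions 2 to 10 and the 4-nt guide covers positions 9 to 12, so both windows include the CT dinucleotide at positions 9 and 10.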

Fig. IS110.30. Covariance Analysis and Complementarity of ncRNA with Target and Donor. A) The analysis was carried out with 5,511 ncRNA–target pairs (top left) and 2,201 ncRNA–donor pairs (top right). The target (left, green) and donor (right, orange) are represented vertically. The IS621 ncRNA sequence is shown below along with dot-bracket notation secondary structure predictions together with LTG and RTG sequences in green and LDG and RDG sequences in orange. Covariation scores are colored according to strand complementarity (insert bottom left): blue, high covariation and bias toward top-strand base-pairing; red, high covariation and bias toward bottom-strand base-pairing. Regions of notable covariation signal indicating base-pairing for IS621 are boxed. An extended signal for the top strand (purple lozenges) is observed and, on the IS621 sequence, is indicated by the ribonucleotides UGC marked in red. The double strand target (left) and donor (right) sequences are included below showing the sequence of complementarity (boxed). Complementary nucleotides within covarying regions are highlighted in bold. The CT dinucleotide which occurs as a direct flanking repeat in the inserted IS [21] and at the circle junction is shown in blue.
Fig. IS110.30. Covariance Analysis and Complementarity of ncRNA with Target and Donor. B) Nucleotide conservation across the predicted ncRNA. 2,715 ncRNA orthologue sequences were identified using an iterative search with the original IS621 model. Top: Nucleotide conservation represented in WebLogo format. The various secondary structure elements are indicated mapped onto the IS621 ncRNA and delimited by vertical blue lines. Stems are indicated by horizontal colored arrows. The first loop shows low sequence conservation, while the second is much more conserved. Sequence features of the bridge RNA are highlighted for clarity. From Durrant et al [19].
An Invasion Model for Bridging Donor and Target Sequences

These strong signals of covariation and base pairing suggested that the ncRNA bridges the target sequence and the IS circle junction during transposition, and led to the “invasion” model shown in Fig. IS110.31 [19][20]. In this model both upstream and downstream loops engage and align the target and donor DNA sequences, facilitating recombination at the core by the DEDD Tnp (Fig. IS110.8.A), presumably with the aid of the conserved serine residue located in the C-terminal domain as the nucleophile (Fig. IS110.8.B). The authors underline the observation that the “core” dinucleotide is included in all 4 of the base pairings (Fig. IS110.30A). Thus there is an overlap between top- and bottom-strand pairings precisely at the core dinucleotide. This presumably plays a key role in the recombination (cleavage and strand exchange) reactions, a role which was confirmed by structural studies (below).

The covariance data also suggested that the IS621 right target guide sequence (RTG) is short and that other members of the IS110 group include longer RTG (Fig. IS110.30A; note the purple extension on the Upstream Loop, Top strand). This is indicated on the IS621 sequence by the red ribonucleotides (see also Insertion in vivo).

An Efficient in vitro Recombination Reaction: ncRNA Functions to Bridge Donor and Target.

An in vitro IS621 recombination reaction was assembled to test this idea. This was composed of an in vitro-transcribed ncRNA, the purified IS621 transposase/recombinase and short, double stranded oligonucleotides containing the target and donor sequences. The reaction mixture also included NaCl and MgCl2.

Microscale thermophoresis (MST) experiments demonstrated that the ncRNA-transposase/recombinase complex bound both donor and target DNA molecules in a sequence-specific manner. This combination of components led to the expected reciprocal DNA exchange reaction at the CT “core” site with the expected junctions as detected by appropriate PCR assays. Since the ncRNA was capable of binding both the donor IS circle junction containing abutted RE and LE as well as the target, Durrant et al [19][20] have called it a Bridge RNA (Fig. IS110.31).

Fig. IS110.31. Bridge RNA Interaction with Donor and Target. The left of the figure shows the configuration of the bridge RNA with the Target Binding Loop (TBL) which includes the left and right target guide sequences (green characters) and the Donor Binding Loop (DBL) with the left and right donor binding sequences in orange characters. Those residues which are not complementary to the donor or target sequences are shown in grey. Below (orange) and above (green) are the donor (circle junction) and target double strand DNA respectively. The “core” CT dinucleotides are marked in blue. Interaction of the TBL with the target sequence and of the DBL with the donor circle junction (right hand section) involves unwinding of these double strand DNA segments and annealing of the LTG with the left target (LT) sequence and the RTG with the right target (RT) and of the LDG with the left donor (LD) sequence and the RDG with the right donor (RD) sequence. This facilitates recombination between the two core CT dinucleotides resulting in IS integration. Redrawn from Durrant et al [19].
Testing the Model: an in vivo Plasmid-Based Integration System.

Further support for this “invasion” model was obtained from experiments designed to reprogram either donor or target sequences. The experiments used a two-plasmid system in vivo: one plasmid, pTarget, carried tnpIS621, the 50 bp target site (a REP sequence) and a flanking promoter; the other, pDonor, carried the RE-LE donor circle junction, the bridge RNA and a promoter-less gfp gene. Donor-target recombination places gfp under control of the pTarget promoter (Fig. IS110.32) and can be assayed by measuring fluorescence. This assay was used to monitor the effect of mutations in TnpIS621: alanine substitution of the conserved catalytic residues, DEDD, of the RuvC-like domain (Fig. IS110.8A) or the recombinase domain, S, (Fig. IS110.8B) abolished activity. Gfp expression was measured using a flow cytometer by scraping and resuspending colonies from a plate after co-transformation of a recipient strain with the two plasmids under standard plating conditions. In a number of cases, the plasmid sequences were also obtained to confirm the recombinant structures.

Fig. IS110.32. Gfp Activation Integration Assay. Top panel: Donor and target plasmids. Selective CmR (pTarget) and KmR (pDonor) genes are shown in red, transposase in purple with an IPTG-inducible promoter (Ptnp, blue arrow), target (a REP sequence) in dark green interrupted by the recombination point (CT dinucleotide) in blue and impinged by a synthetic promoter, Bba_R0040 (TetR-Regulated Promoter)(Px, blue arrow), promoterless Gfp gene in light green, and circle junction (donor joint) in brown with the right and left ends intersected by the core CT (blue). The bridge RNA is shown as a dotted line. Bottom: linear depiction of the plasmids and recombinant product. Upper map: Target plasmid with divergent promoters and including the target sequence and transposase gene. Middle map: donor plasmid. Lower map: recombinant plasmid produced by site- (sequence-) specific recombination at the aligned CT dinucleotide cores (blue). Gfp production is driven by the promoter Px and the nc Bridge RNA cannot be expressed because the component which is normally provided by RE is no longer available.
Reprogramming Bridge RNA

The assay was also used to determine whether the target sequences could be changed. A number of changes to the target loop sequence were made (Fig. IS110.32 and Fig. IS110.33) and tested against the wildtype target sequence and the corresponding (complementary) target sequence. The results demonstrated that changes in the ncRNA target loop sequences eliminate integration into the wildtype target sequence but result in robust integration into the corresponding modified target sequences (Fig. IS110.33). This sequence reprogramming provides convincing support for the invasion model (Fig. IS110.31). Although the junction promoter is likely to be strong (that of IS492 is stronger than placuv5; Perkins-Balding et al [74]), the authors also observed that supplying ncRNA in trans from a strong promoter can further increase the activity of ncRNA on integration (in this case for mutant T5, by almost 2 fold).

Target specificity can therefore be modified by changes in the sequence of the target binding loop sequence.

Fig. IS110.33. Integration of Target Loop Variants. The GFP mean fluorescence intensity (MFI) of E. coli after plasmid recombination using the indicated reprogrammed bridge RNA target-binding loop and target sequences (WT and T1–T7). Bold bases highlight differences relative to the WT target sequence. Mean ± s.d. of three biological replicates. None of the target binding loop mutants gave significant activity with a wildtype sequence.
Flexibility in IS621 Target Specificity.

The flexibility of target recognition was further explored [19] using a plasmid-based high throughput method. One plasmid carried the target (Fig. IS110.34, A) (together with a promoter), the bridge RNA sequence (with the wildtype donor binding loop, DBL) separated by a 12 bp barcode, a chloramphenicol resistance gene and the tnpIS621 gene driven by an inducible T7 promoter (Fig. IS110.34, B). The donor plasmid carried the wildtype LE-RE junction (Fig. IS110.34, A) (together with an Ampicillin resistance gene and a promoter-less Kanamycin resistance gene). Integration of the donor into the target would bring the inactive kanamycin resistance gene under control of the promoter from the target site and result in KmR recombinants (Fig. IS110.34, B).

Fig. IS110.34. Screening for Variation in Target Site Sequence Recognition. Top: A) The screen used a library of variable target (Rep) sequences (shown by the red N nucleotides, top left) and a wildtype donor sequence (bottom left) together with a library (right) of bridge RNAs with variable TBL sequences (red N nucleotides, top right) and a wildtype DBL (bottom right). The blue boxes of the donor and target sequences indicate the strand complementary to the TBL and DBL sequences. B) The target plasmid including the barcode; symbols are the same as those shown in Fig. IS110.32. Integration results in activation of the KmR gene.

The target and TBL were cloned as a single oligonucleotide (Fig. IS110.35). The core CT dinucleotide was retained in all cases. Non-CT (core) target and corresponding LTG and RTG positions were then varied to assess single and double mismatch tolerance at each position. For this, several oligonucleotide sets were used and cloned by the Gibson method into a vector plasmid carrying the downstream donor binding loop (Fig. IS110.35). These were designed to test: 1) different target guides with single mismatch pairs; 2) double TBL and target mismatches; 3) negative controls ensuring none of the 9 programmable positions (excluding the CT core) matched in the TBL and target; 4) additional single mismatch combinations in TBL and target; 5) how mismatches in the dinucleotide CT core of the bridge RNA sequences affected recombination efficiency.
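The size of such mismatch sets follows from simple enumeration. The sketch below lists every single and double substitution of a target sequence while keeping the CT core fixed. It is a simplification for illustration only: it varies the target against an unchanged guide, whereas the published library varied both target and TBL. It uses the 11-nt wildtype IS621 target quoted elsewhere in this article and assumes that the CT core occupies positions 8 and 9 of that sequence; the function name is hypothetical.

```python
from itertools import combinations, product

BASES = "ACGT"


def mismatch_variants(seq: str, fixed: set[int], n_mismatches: int) -> list[str]:
    """All variants of seq with exactly n_mismatches substitutions outside the fixed positions."""
    free = [i for i in range(len(seq)) if i not in fixed]
    variants: list[str] = []
    for positions in combinations(free, n_mismatches):
        # For each chosen position, the three bases that differ from the original.
        alternatives = [[b for b in BASES if b != seq[i]] for i in positions]
        for choice in product(*alternatives):
            chars = list(seq)
            for i, base in zip(positions, choice):
                chars[i] = base
            variants.append("".join(chars))
    return variants


TARGET = "ATCAGGCCTAC"  # wildtype target quoted in this article
CORE = {7, 8}           # assumed 0-based positions of the CT core

singles = mismatch_variants(TARGET, CORE, 1)
doubles = mismatch_variants(TARGET, CORE, 2)
print(len(singles), len(doubles))
```

With 9 programmable positions and 3 alternative bases at each, this gives 9 x 3 = 27 single-mismatch targets and 36 x 9 = 324 double-mismatch targets.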

The results demonstrated that: full complementarity between the target and TBL was highly preferred (both single and double base mismatches severely impacted integration); provided that complementarity was maintained, integration occurred with sequence changes at all positions in the target and TBL; and reprogramming therefore showed a large degree of flexibility over all positions.

Fig. IS110.35. Cloning of the Oligonucleotide Library. The plasmid used to clone the oligonucleotide includes the wildtype DBL, a pT7-driven transposase gene and a CmR gene. The oligonucleotide insert contains the mutant target site, two synthetic and divergent promoters, Bba_R0040 (used to drive the KmR gene in the recombinant product) and a J23119 consensus promoter (used to drive expression of the recomposed nc Bridge RNA) separated by the 12 bp barcode sequence and followed by the TBL mutant sequence.
Insertion in vivo: Reprogramming the Target site.

In vivo insertion into the E. coli genome was investigated using a conditional replication defective plasmid with a 22 bp wildtype IS621 donor sequence and a wildtype IS621 bridge RNA. Following inhibition of plasmid replication while maintaining selection for a plasmid marker, 144/173 unique insertions were identified in known Rep sequences: 96% occurred in the naturally observed target sequence (ATCAGGCCTAC) with only 2 in the sequence exactly matching the target binding loop (ATCGGGCCTAC), suggesting that the mismatch which would create an rG:dT base pair might be important; 4/10 of the most frequent integration sites may use an extended base-pairing of RTG and RT (i.e. 7 instead of 4 bp) since they are flanked by 5’-GCA-3’, which is complementary to the 5’-UGC-3’ immediately 5’ of the RTG (red ribonucleotides in Fig. IS110.30A). Indeed, many of the orthologues naturally include longer RTGs (purple lozenges in Fig. IS110.30A).
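The two sequence relationships mentioned above are easy to check directly. The following sketch, with hypothetical helper names, confirms that 5’-UGC-3’ is the RNA complementary (antiparallel) to 5’-GCA-3’ and locates the single position at which the naturally observed target differs from the sequence matching the target binding loop.

```python
def complementary_rna(dna: str) -> str:
    """RNA (5'->3') that pairs antiparallel with the given DNA strand (5'->3')."""
    pairs = {"A": "U", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(dna.upper()))


def differences(a: str, b: str) -> list[tuple[int, str, str]]:
    """1-based positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


print(complementary_rna("GCA"))                     # UGC
print(differences("ATCAGGCCTAC", "ATCGGGCCTAC"))    # [(4, 'A', 'G')]
```

The only difference is at position 4 (A in the natural target, G in the sequence matching the loop), which is the position of the rG:dT pair discussed in the text.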

Two reprogrammed bridge RNAs were designed to target two unique E. coli target sequences, each with either a 4 or a 7 nt RTG/RT base-pairing. While the most frequent insertion sites were observed to be those expected, some off-site insertions were also observed. These were greatly reduced with the extended 7 nt RTG compared to the 4 nt RTG bridge RNAs.

Reprogramming the Donor site

The fact that the IS621 donor sequence was observed to be more conserved than the target sequence (see: Fig. IS110.30B) may render it more difficult to reprogram. To examine this, a system similar to that used in reprogramming the target site was used but in which the bridge RNA was produced in cis from the donor junction sequence (Fig. IS110.36). Recombination was, again, designed to activate a KmR gene. Similar to the results of target-TBL sequence variation, donor-DBL mismatches significantly reduced activity.

Fig. IS110.36. Screening for Variation in Donor Site Sequence Recognition. A) The screen used a library of variable donor (LE-RE junction) sequences (shown by the red N nucleotides, bottom left) and a wildtype target sequence (top left) together with a library of bridge RNAs (right) with variable DBL sequences (red N nucleotides, bottom right) and a wildtype TBL (top right). The blue boxes of the donor and target sequences indicate the strand complementary to the TBL and DBL sequences. B) The target plasmid. Integration results in activation of the KmR gene.
Insertion in vivo: Reprogramming the Donor site

The insertion activity of donor sequences was determined with the Gfp assay used to examine the target sequences. A number of donor mutants and their paired DBL (Fig. IS110.37: 1-9) were combined with a target sequence (Fig. IS110.33: 5) and its paired TBL sequence. The reprogrammed donor bridge RNAs yielded between 27% and 95% of wildtype activity (Fig. IS110.37) whereas the wildtype donor performed poorly with each of the mutants. The reaction was dependent on an intact RuvC domain in the transposase.

This confirmed that, like the target loop, the donor loop sequences can be reprogrammed.

Fig. IS110.37. Integration of Donor Loop Variants. The GFP mean fluorescence intensity (MFI) of E. coli after plasmid recombination using the indicated reprogrammed bridge RNA donor-binding loop and donor sequences (WT and 1–9). Bold bases highlight differences relative to the WT donor sequence. Mean ± s.d. of three biological replicates was included in the original figure.
NCR RNA from IS110 Group Members: ISEc21.
Involvement of NCR RNA in ISEc21 Transposition

In addition to IS621, results of a detailed study of another IS110 group member, ISEc21, have shown that an RNA from the upstream NCR region is involved in interaction with the ISEc21 target DNA [28].

Small RNA was recovered associated with TnpISEc21 during purification. RNA seq of this material produced a strong but extended peak in the upstream NCR (Fig. IS110.38, a). The RNA was of three principal lengths, which mapped to the upstream NCR region: nt 1-281, 90-163 and 90-147 (Fig. IS110.38, b). The three sRNAs span a region which includes identities to the left and right halves of the target site, while the entire ISEc21 NCR region, if expressed in its entirety, would also span sequences with identity to the donor site (Fig. IS110.38, c), as has been found by Durrant et al [19] for IS621. The reason for this difference is unclear but in view of the results from their studies on IS1111 group members (in particular ISPa11; Fig. IS110.42B), it seems probable that the longer RNA is biologically relevant and, we find, carries both the target guides and the downstream donor guides (not shown). Siddiquee et al., [28] have called this sRNA seek RNA since it shows complementarity to the target.

The activities of these sRNAs in an in vivo coupled reaction involving excision and insertion of a derivative IS circle were tested in a system in which insertion could be monitored by activation of an mCherry gene (Fig. IS110.39). All constructs except RNA 90-163 gave positive results in this assay (Fig. IS110.38, b). One explanation for the absence of activity of this RNA is that the region between nt 147 and 163 may generate a structure unable to pair with the target sequence.

Fig. IS110.38. Organization of ISEc21. a) Map. ISEc21 (yellow horizontal box) with scale in base pairs above; transposase gene (lilac box) and direction of expression (arrowhead); NCR falls within the blue brackets. Above shows the results of RNA seq (red) with coordinates in bp indicated. b) Expanded map showing the sRNA species identified (blue) and their capacity to facilitate integration in the mCherry assay (Fig. IS110.39). Dotted lines are linked to the sequence of the NCR and show the left target guide (LTG) and the right target guide (RTG) sequences (green in grey boxes). Also shown are potential right donor guide (RDG) and left donor guide (LDG) sequences (orange in grey boxes). The yellow boxes represent the sequences of the LE-RE containing circle junction. c) Predicted secondary structure of the sRNA showing the position of the LTG and RTG (green).
Fig. IS110.39. mCherry Transposition Assay. Donor plasmid with a promoter-less mCherry gene (pink) flanked by LE and RE (yellow), in turn flanked by the left and right halves of a target sequence (green); the donor also contains a transposase gene (lilac) and the cloned RNA-containing ISEc21 segment (yellow) with a downstream HDV ribozyme (orange) and transcription terminator (blue). Expression is driven by a phage T7 promoter. The target plasmid (red circle) carries a target sequence (green) and a proximal T7 promoter together with a kanamycin resistance gene (red). Excision of the mCherry circle from the donor as a consequence of transposase and NCR expression and its insertion into the target plasmid should result in mCherry expression (deep pink).
Exploring Bridge RNA Secondary Structures from Other IS110 Family Members

Durrant et al [19] also undertook a short survey to determine whether other members of this family also exhibited an RNA with similar structure to the IS621 bridge RNA. A bridge RNA was predicted in nearly 86% of IS110 group members in their library using the RNA covariance models. These were largely located at the left end (see also Fig. IS110.6). Three potential IS bridge RNAs were examined for complementarity to their donor and target sites. These are shown in Fig. IS110.40.1, Fig. IS110.40.2, and Fig. IS110.40.3 and their position on the phylogenetic tree is shown in Fig. IS110.3A. Perhaps surprisingly, they include a diverse collection of secondary structures.

RNA from IS1111 Group Members.

Following the proposal that IS1111 group members might use an RNA in the downstream NCR for targeting and integration [30] (Fig. IS110.27A), the Hall group chose the IS1111 group member ISEc11 as a model but also investigated other IS1111 members: ISKpn4, ISPst6 and ISPa25 (which all target one end of certain attC integron cassette sites), ISPa11 (which targets REP sequences) and ISXne4, together with an IS110 member (ISEc21; see above). Their positions in the phylogenetic tree are shown in Fig. IS110.3A.

ISEc11, A Model IS1111 Group Member and Some Others.

ISEc11 (Fig. IS110.41) was isolated originally from an enteroinvasive E. coli (EIEC) strain and is located both on the chromosome and on a large (260-kb) F-like virulence plasmid (pINV) [75]. Southern hybridization showed that it was present in 9 EIEC strains with differences in the number and the relative location of the chromosomal copies: five East African EIEC strains carry 4 ISEc11 copies in the same position, while in the remaining four the number varies from 0 to 4. Abutted IS ends, presumably from circular transposition intermediates, were detected by PCR. The insertions shared a potential target sequence, 5’-GTNAAAANANTG-3’, and were all in the same orientation. It was proposed that insertion generated a 4 bp DR (5’-AAAT-3’).
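As an illustration of how a degenerate consensus such as 5’-GTNAAAANANTG-3’ can be used to look for candidate sites, the sketch below converts it to a regular expression and scans one strand of a sequence. This is only a hedged example: the helper names (consensus_to_regex, find_sites) and the test sequence are hypothetical, only N is treated as a degenerate symbol, and only the strand supplied is searched.

```python
import re
from typing import List

# Consensus quoted in the text above; N stands for any base.
CONSENSUS = "GTNAAAANANTG"


def consensus_to_regex(consensus: str) -> str:
    """Translate a consensus containing N into a regular expression."""
    return "".join("[ACGT]" if base == "N" else base
                   for base in consensus.upper())


def find_sites(sequence: str, consensus: str = CONSENSUS) -> List[int]:
    """Return 0-based start positions of matches on the given strand.
    A lookahead is used so that overlapping matches are also reported."""
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [m.start() for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical sequence containing one match starting at index 2.
    print(find_sites("CCGTCAAAAGATTGAA"))
```

To search both strands, the same function could be applied to the reverse complement of the sequence; a match identifies a candidate site only and does not show that insertion occurs there.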

Functional Analysis

Using a system similar to that used in analysing ISEc21 (Fig. IS110.10) with a target plasmid into which a specific target sequence is inserted and a donor plasmid carrying either a full ISEc11 copy (Fig. IS110.10, A), a copy deleted for the NCR (ΔNCR; Fig. IS110.10, B), or the ΔNCR copy with an additional plasmid which provides the NCR expressed in trans (Fig. IS110.10, C), it was demonstrated that the downstream NCR was necessary for transposition and could be supplied in trans from another plasmid. Moreover, Prosseda et al [75] had proposed a 4 bp target DR in the sequence of the circle junction. This has now been included within LE where it would contribute to the -10 promoter component. PCR was used to identify the IS circle junction (Fig. IS110.41, d) and determine its sequence, revealing the formation of the probable junction promoter. Definition of the target sequence and its use in the target plasmid (Fig. IS110.10) confirmed the expected ISEc11 LE and RE flanks in the insertion products (Fig. IS110.41, e) while mutation of the flanking sequences (Fig. IS110.41, f) inhibited both circle formation and integration.

Fig. IS110.41. A) Organization of ISEc11. a) Map. ISEc11 (yellow horizontal box) with scale in base pairs above; transposase gene (lilac box) and direction of expression (arrowhead); NCR falls within the blue brackets. Above shows the results of RNA seq (red) with coordinates in bp indicated. b) Expanded map showing the NCR RNA sequence with the left target guide (LTG) and the right target guide (RTG) sequences (green in grey boxes) and their location on the target sequence below. c) Predicted secondary structure of the sRNA showing the position of the LTG and RTG (green). d) IS circle junction. LE and RE (yellow boxes); -10 and -35 promoter components (grey boxes); Subterminal inverted repeats (red text within grey arrows). e) Sequence of the target and DNA flanks at the left and right IS ends. Left (LE) and right (RE) IS ends are in yellow boxes. Sequence of PCR products containing the Left flank, LF/LE, and right flank, RE/RF, junctions compared to the target. Identity of the target (green) sequence with the LE and RE flanks is represented by “:”. f) Transposition with altered target sequences flanking ISEc11 and in pTarget. (see Fig. IS110.30 for reference) Sequences tested are on the left with consensus target bases green and the boundaries between IS and target indicated by a yellow box [28].
Identification of IS1111 Group ncrRNA

Like that of IS621, an RNA, ncrRNA, was found to copurify with the ISEc11 transposase and its presence increased transposase yield. RNA seq revealed a peak located within the NCR downstream of the transposase gene, tnpEc11 (Fig. IS110.41, a). This yielded two principal species of ~80 and 150 nt (82-164 and 82-227; Fig. IS110.41, a) although the RNA peak was somewhat dispersed. Similar results identifying a long and a shorter sRNA were obtained with 5 additional IS1111 group members: ISKpn4 (Fig. IS110.42A), ISPa11 (Fig. IS110.35B), ISPst6 (Fig. IS110.42D), ISPa25 (Fig. IS110.42E) and ISXne4. While ISPst6 is very similar to ISKpn4 (Fig. IS110.42D and Fig. IS110.42E), with identical IRst sequences and a Tnp 86% identical and 92% similar to TnpISKpn4, ISPa25 is more distant: TnpISPa25 and TnpISKpn4 are 46% identical and 60% similar (Fig. IS110.42E). ISKpn4, ISPst6 and ISPa25 fall into the same IS clade (Fig. IS110.3A) and, interestingly, their RTG and LTG are nearly identical and identically spaced (Fig. IS110.42E), reflecting their similar target sites.

Fig. IS110.42. A) Organization of ISKpn4. a) Map. ISKpn4 (yellow horizontal box) with scale in base pairs above; transposase gene (lilac box) and direction of expression (arrowhead); NCR falls within the blue brackets. Above shows the results of RNA seq (red) with coordinates in bp from the tnp stop codon indicated. b) Expanded map showing the NCR RNA sequence with the left target guide (LTG) and the right target guide (RTG) sequences (green in grey boxes) and their location on the target sequence below. c) Predicted secondary structure of the sRNA showing the position of the LTG and RTG (green).


Fig. IS110.42. B) Organization of ISPa11. Features are indicated as in A). a) Map. b) Expanded map showing the NCR RNA sequence with the left target guide (LTG) and the right target guide (RTG) sequences (green in grey boxes) and their location on the target sequence below. Also shown are potential right donor guide (RDG) and left donor guide (LDG) sequences (orange in grey boxes). The yellow boxes represent the sequences of the LE-RE containing circle junction. c) Predicted secondary structure of the sRNA showing the position of the LTG and RTG (green).
Fig. IS110.42. C). Predicted LTG/RTG and LDG/RDG in the downstream ISPa11 NCR from Durrant et al [19].
Fig. IS110.42. D) Organization of ISPst6. Features are indicated as in A). a) Map. b) Expanded map with the NCR RNA sequence and left (LTG) and right (RTG) target guide sequences (green in grey boxes) and their location on the target sequence below. c) Alignment of ISKpn4 with ISPst6.


Fig. IS110.42. E) ISPa25 a) Map b) Expanded map with the NCR RNA sequence and left (LTG) and right (RTG) target guide sequences (green in grey boxes) and their location on the target sequence below. c) Alignment of ISPa25 and ISPst6 on ISKpn4. Identities are shown in red. d) Alignment of RTG and LTG of ISKpn4, ISPst6 and ISPa25 [28].

Additionally, Siddiquee et al., [28] identified the equivalent of LTG and RTG in the smaller, majority, RNA from all five IS1111 group IS (Fig. IS110.41; Fig. IS110.42A and Fig. IS110.42B), but the short RNA sequence did not include the donor LDG and RDG sequences. It was noted that the order of LTG and RTG within the IS1111 IS NCR RNA was inverted compared to that found for the IS110 group member ISEc21 (Fig. IS110.42A, b), an observation also made by Durrant et al ([19]; Fig. IS110.42C; Fig. IS110.43A and 43B). Since the short RNA would have affinity for the target site but not the donor site, it was called seek RNA. However, the longer RNA (not shown) also includes sequences resembling LDG and RDG.

This is illustrated in the case of ISPa11 analysed by both Siddiquee et al [28] and Durrant et al [19] but can also be seen in the other IS. Inspection of the short RNA sequence of Siddiquee et al [28] (Fig. IS110.42B, b) shows that it terminates within a potential LDG signal. Extending this RNA sequence uncovers not only an LDG but a corresponding RDG which would be present in the long RNA species (Fig. IS110.42B, b). Again, the LDG and RDG are inverted with respect to the IS110 group members. These sequences were those predicted by Durrant et al [19] (Fig. IS110.42C). A similar arrangement was also exhibited by two additional IS1111 group members, ISCARN28 and ISAzs32 ([19]; Fig. IS110.43A and 43B).

Other IS1111 Group Members.

As in the case of the IS110 group, Durrant et al [19] also undertook a short survey of members of the IS1111 group to identify RNA with similar structure to the IS621 bridge RNA. In addition to those shown in Fig. IS110.37C and Fig. IS110.38, a bridge RNA was predicted in 93% of IS1111 group members in the library using the RNA covariance models. These were largely located in the right end (see also Fig. IS110.6A).

Fig. IS110.43A. Predicted Bridge RNA from 3 IS1111 group Members. Top of the figure shows a map of the IS as a yellow horizontal box containing a purple arrow representing the transposase gene and its direction of expression. The predicted secondary structure is shown below within the blue dotted line which also indicates its location on the IS, its polarity (5’ and 3’ ends), the IS name and length in nucleotides. A code showing the meaning of the symbols is included on the right. The structure shows the left and right target guide sequences (LTG and RTG) as green ellipses and the left and right donor sequences (which interact with the RE-LE junction; LDG and RDG) as brown ellipses. These interactions are indicated in the box on the right, with the target and donor sequences appropriately color coded.
Fig. IS110.43B. Predicted Bridge RNA from 3 IS1111 group Members. Top of the figure shows a map of the IS as a yellow horizontal box containing a purple arrow representing the transposase gene and its direction of expression. The predicted secondary structure is shown below within the blue dotted line which also indicates its location on the IS, its polarity (5’ and 3’ ends), the IS name and length in nucleotides. A code showing the meaning of the symbols is included on the right. The structure shows the left and right target guide sequences (LTG and RTG) as green ellipses and the left and right donor sequences (which interact with the RE-LE junction; LDG and RDG) as brown ellipses. These interactions are indicated in the box on the right, with the target and donor sequences appropriately color coded.
Programming ISEc11 Integration.

Siddiquee et al., [28] tested whether, like the IS110 member bridge RNAs (Fig. IS110.33 and Fig. IS110.37; [19]), the IS1111 group seek RNA can be reprogrammed to recognize alternative target sites. This was explored using ISEc11 in the mCherry assay system (Fig. IS110.39). Transposition was measured by flow cytometry as the percentage of mCherry-expressing cells in the population. Two modified long seek RNAs together with the corresponding modified LE and RE flank sequences in the donor gave robust transposition (Fig. IS110.44, e and f) although their target activities were not tested with wildtype seek RNA. It is interesting to note that the short wildtype seek RNA was significantly more efficient in promoting transposition than the long wildtype seek RNA (compare Fig. IS110.44, c and d).

Fig. IS110.44. Reprogramming seekRNA. Both the LE and RE flanks and the target DNA sequences were changed concomitantly. The ISEc11 seekRNA used in the donor plasmid was the long (154 nt) species. Insertion resulted in expression of the mCherry gene carried within two ISEc11 ends from a resident T7 promoter located in the target plasmid (Fig. IS110.32B). The percentage of mCherry-expressing cells in the population was measured by flow cytometry. c) transposition with wildtype target and long seekRNA, 15%. When the portion of the target that flanks the IS on the right was altered and the corresponding changes were made in the seekRNA. d) transposition with wildtype target and short seekRNA, 42%. e) transposition to the M1 target occurred at about 23% frequency. f) transposition to the M2 target was 15%.

Use in Genome Modification

Clearly, the use of the mCherry system demonstrates that the IS110 family is capable of delivering a genetic cargo and that TnpISEc11 can be supplied in trans. Siddiquee et al., [28] extended these observations to demonstrate that the ~750 bp chloramphenicol acetyltransferase gene (CAT) can also be inserted either upstream or downstream of the tnpISEc11 gene and that the ISEc11 derivative remains transpositionally active. Additionally, Durrant et al [19] designed a GFP reporter system for the IS110 member IS621 which allowed them to demonstrate the capacity of this system to generate deletion and inversion events when donor and target are located on the same DNA molecule. The system was designed such that recombination brought the GFP gene under control of a neighboring promoter. As might be expected from other systems, such as transposon Tn3 family resolution, deletion occurs when the target and donor sites are present in the same orientation whereas inversion occurs when they are inverted with respect to one another.
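The relationship between site orientation and outcome can be sketched with a toy model. This is a deliberately simplified illustration and not the reporter design of Durrant et al: each recombination site is reduced to a single crossover position on a linear string, the bridge RNA and core dinucleotide are ignored, and the function name recombine and the example sequence are hypothetical.

```python
from typing import Optional, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def recombine(molecule: str, first_cut: int, second_cut: int,
              same_orientation: bool) -> Tuple[str, Optional[str]]:
    """Toy intramolecular recombination between two crossover points.

    Sites in the same orientation: the intervening segment is deleted
    (returned separately, as it would be released as a circle).
    Sites in inverted orientation: the intervening segment is inverted.
    """
    if not 0 <= first_cut < second_cut <= len(molecule):
        raise ValueError("cut positions must satisfy 0 <= first < second <= len")
    left = molecule[:first_cut]
    middle = molecule[first_cut:second_cut]
    right = molecule[second_cut:]
    if same_orientation:
        return left + right, middle
    return left + reverse_complement(middle) + right, None


if __name__ == "__main__":
    print(recombine("AAACCGTTT", 3, 6, same_orientation=True))   # deletion
    print(recombine("AAACCGTTT", 3, 6, same_orientation=False))  # inversion
```

In a reporter of the kind described above, either outcome could in principle place a gene next to a promoter; the sketch only captures the topological rule, not the efficiency of either reaction.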

Structural Analysis: the Synaptic Complex Involved in IS621 Circle Integration

Cryo-EM was used to explore the IS621 insertion mechanism in detail [34]. It revealed the organization of the IS621 synaptic integration complex in three different stages of the recombination pathway involved in IS insertion. The complex was assembled using full length (177 nt) purified bridge RNA (b-RNA) obtained by in vitro transcription from a T7 promoter (see Fig. IS110.30A), the double stranded RE-LE IS circle junction DNA (j-DNA or d-DNA; 44 bp), the double stranded target DNA (t-DNA; 38 bp) and purified transposase, TnpIS621, obtained using a standard expression vector. This complex was unstable but could be stabilised by introducing 6 consecutive mismatches in the top strands of d-DNA and t-DNA (positions 2–7; Fig. IS110.45A, top) in TBL and DBL. The structure was solved at 2.5 Å resolution.

It was composed of 4 TnpIS621 monomers (A-D) (Fig. IS110.45A, bottom left), both TBL and DBL segments of the b-RNA and both t- and d-DNA. The 5’ b-RNA stem loop (Fig. IS110.32) was not visible, suggesting flexibility, but its deletion reduced complex stability, implying that it may enhance b-RNA/TnpIS621 interactions. It was also suggested that two different b-RNA molecules may contribute the TBL and DBL, respectively.

Fig. IS110.45A IS621 Synaptic Integration Complex (PDB ID:8WT6). Top: t-DNA and d-DNA sequences. left (LTG) and right target guide (RTG) sequences (green in grey boxes). Right (RDG) and left donor guide (LDG) sequences (orange in grey boxes). The yellow boxes represent the sequences of the LE-RE containing circle junction. Blue letters show the core nucleotides. Lower case bold characters indicate the mismatches introduced into the sequences which lead to formation of stable complexes. Below left: synaptic complex. All 4 TnpIS621 monomers are color-coded as are the b-RNA, d-DNA and t-DNA molecules. Below right: configuration of DNA and RNA in the synaptic complex.
Fig. IS110.45B IS621 Synaptic Integration Complex (PDB ID:8WT6). Top: Structure of nucleic acids. The positions of the target (green, left) and donor (orange, right) base pairing with the bridge RNA are circled and enlarged (boxes) [19]. Middle: Schematic of the pairing model. Bottom: Simplified Cartoon of the RNA/DNA structures [34]. Bridge RNA is shown in dark blue, target DNA in green and donor DNA in brown. Left and right target and donor DNA is indicated (LT, RT, LD and RD respectively) as are the left and right Target and Donor guide sequences (LTG, RTG, LDG and RDG respectively). The active site serine 241 is shown as a yellow circle.

In addition to revealing a composite active site which positions the catalytic serine (Tnp) residues adjacent to the recombination sites in both target and donor DNA, comparison of the three structures showed: strand cleavage of target and donor DNA at the composite active sites to generate 5′-phosphoserine covalent intermediates, as found in other recombination systems such as Tn3 family transposon resolution and IS607 transposition; creation of a Holliday junction intermediate by strand exchange and rejoining using the 3’OH generated on formation of the 5′-phosphoserine covalent intermediates; and resolution by second strand cleavage.

Synaptic Complex Assembly

The synaptic complex is assembled from two dimeric TnpIS621 complexes: monomers A and B form a dimer which interacts with TBL and t-DNA while C and D constitute a dimer which interacts with the DBL and d-DNA (shown schematically in Fig. IS110.46). The two dimers contact each other via their RuvC domains. The TnpIS621 monomer is folded into three domains (Fig. IS110.46 right): a coiled-coil domain, CC, containing two α-helices; a “transposase” domain, Tnp, including the active site serine 241; and a RuvC domain carrying the DEDD motif. Protomer dimerization between TnpIS621.A and TnpIS621.B and between TnpIS621.C and TnpIS621.D is mediated by the CC domain (Fig. IS110.46 left). Similar protein structural models were predicted for both IS110 (TnpISEc21) and IS1111 (TnpISEc11) family members [28] using AlphaFold. As might be expected, TBL and t-DNA and DBL and d-DNA are base paired (Fig. IS110.43A, bottom right; Fig. IS110.43B; Fig. IS110.44) and t- and d-DNA are bent into an X configuration. Both t- and d-DNA are cleaved bordering the CT core sequences (C8–T9; Fig. IS110.43B, Fig. IS110.44) using the conserved serine (S241; Fig. IS110.8.B) as the nucleophile and forming a covalent 5’-phosphoserine bond with T10 (Fig. IS110.44). Extra-helical bases A43 and A67 in TBL and A116 and A150 in DBL together with syn conformation G nucleotides G48 and G72 in TBL and G121 and G155 in DBL (Fig. IS110.44 middle and left) are highly conserved in IS110 family members and are recognized in the same way by the Tnp domains of all 4 TnpIS621 monomers.

Opening of the t-(target) and d-(donor) DNA Duplexes

The structure also explains how the t-(target) and d-(donor) DNA duplexes are destabilized to facilitate their recognition by b-RNA: clustered tyrosine and methionine residues within the Tnp domains wedge between a number of complementary nucleotides in both duplexes (Fig. IS110.44 middle) and mutation of these amino acids reduces recombination significantly.

Fig. IS110.46. Bridge RNA Interaction with Donor and Target. Bridge RNA is shown in dark blue, target DNA in green and donor DNA in brown. Left and right target and donor DNA is indicated (LT, RT, LD and RD respectively) as are the left and right Target and Donor guide sequences (LTG, RTG, LDG and RDG respectively). The active site serine 241 is shown as a yellow circle and labelled in a colored box according to the associated Tnp monomer. Left: Model from Durrant et al [19]. The core dinucleotides are within a box. Middle: Simplified Cartoon of the RNA/DNA structures [34]. Extra helical A and syn conformation G nucleotides are shown within blue ellipses and their approximate positions indicated by red arrows. The approximate positions of the “wedge” amino acids (Y264, M265 and M268) are shown within colored ellipses corresponding to each associated monomer. Right: schematic of nucleic acid interactions observed in the structure. Red letters circled in blue indicate conserved extra-helical A and syn configured G. The boxed cartoon illustrates hydrogen bonding between the target and donor sequences.
Fig. IS110.47. TnpIS621 and the Synaptic Complex (PDB ID:8WT7). Right: Structure of monomer D. The structure shows three principal domains: the Tnp domain (yellow circle) showing the position of the catalytic serine 241; the RuvC domain (blue circle) showing the position of D11, D102 and D105; and the coiled-coil domain composed of two α-helices. Left: Arrangement of the tetramer. The nucleic acids have been removed. Each monomer in the dimer of dimers is indicated. The figure shows the formation of A/B and C/D dimers via interaction of their coiled-coil domain (CC) and the hybrid or composite A/D and B/C catalytic centers within yellow circles. The acidic residues are shown as red dots and the catalytic serine as a small yellow circle.
Composite Active Sites.

The TnpIS621.B and TnpIS621.D loops carrying S241 interact with those carrying D102 (Fig. IS110.47 right) in TnpIS621.C and TnpIS621.A to form a composite active site between the A/B and the C/D dimer (Fig. IS110.43 left). On the other hand, the S241 loops of TnpIS621.A and TnpIS621.C are disordered and the TnpIS621.B and TnpIS621.D D102 loops have a different conformation to those in TnpIS621.A and TnpIS621.C which form part of the active site.

The TnpIS621 RuvC domain is therefore unusual since it does not act independently, as do other RuvC domains (e.g. IS200/IS605 family TnpB), but functions together with the Tnp domain (i.e. S241) in the composite active site. It was suggested that this arrangement may prevent adventitious DNA cleavage occurring before synaptic complex assembly, a characteristic of a number of other systems such as phage Mu (e.g. Williams et al [76]) and Tn5/IS50 (see Protein structure and the transpososome [77]). The RuvC domains also play a central role in synaptic complex formation since the two dimers contact each other through RuvC–RuvC interactions.

Fig. IS110.48 Recombination Steps in Integration. Target (green); Donor (orange); bridge RNA (blue); mismatched bases (lowercase); S241 (yellow circle) with accompanying colored box indicating which monomer is involved; cleavage point (red triangle); co-ordinates from 1-14 are shown. The “Handshake” bases are indicated by a red box. Left: b-RNA interaction with target DNA. Top and Bottom: t-DNA and d-DNA sequences. Left (LTG) and right target guide (RTG) sequences (green in grey boxes). Right (RDG) and left donor guide (LDG) sequences (orange in grey boxes). The yellow boxes represent the sequences of the LE-RE containing circle junction. Blue letters show the core nucleotides. Lower case bold characters indicate the mismatches introduced into the sequences which lead to formation of stable complexes. Middle top: Target DNA and target loop RNA Interaction. Middle bottom: Donor DNA and donor loop RNA. First Strand Cleavage
"Hand shaking": additional secondary base pairing which facilitates first strand exchange.

This synaptic complex is, however, trapped in the pre-strand-transfer step because of the mismatched base pairs in both t-DNA and d-DNA introduced to stabilize the complex (Fig. IS110.45A top; see also Fig. IS110.30A).

Close examination of the covariation signals obtained with a large number of IS621-related IS (e.g. Fig. IS110.30A) revealed weak additional signals which implied base-pairing potential of nt 6 and 7 of target DNA with the distant RDG (nt 166) and of nt 6 and 7 of donor DNA with the distant RTG (nt 81). This was called Handshake base pairing and the sequences were named Handshake guides (HSG). It was noted that they play a role in the first strand exchange reaction. Exchange in the wildtype situation increases the potential base pairing (Fig. IS110.48 and Fig. IS110.49 A). Measurement of full recombinants in vitro with wildtype b-RNA (Fig. IS110.42A) showed that, in addition to robust recombination products, a significant proportion of cleavage products of the t- and d-DNA was generated. A series of experiments was designed to examine the effects of Handshake nucleotide complementarity on strand exchange using modified b-RNA. Generating total complementarity of RTG-target and RDG-donor duplex HSG (i.e. prior to strand transfer; Pre-HSG; Fig. IS110.49 B) strongly favoured t- and d-DNA cleavage but eliminated detectable recombination in vitro, whereas modifying the HSG sequences to generate perfect complementarity after strand transfer (Post-HSG; Fig. IS110.49 C) strongly favored DNA recombination in vitro at the expense of d-DNA cleavage products. The “handshake” dinucleotide therefore strongly influences the outcome of the reaction.

Fig. IS110.49. Modifying Target and Donor Complementarity: The Handshake Dinucleotide. Target (green); Donor (orange); bridge RNA (blue); mismatched bases (lowercase); S241 (yellow circle) with accompanying colored box indicating which monomer is involved; cleavage point (red triangle); co-ordinates from 1-14 are shown; mutated nucleotides are shown in red and new inter-strand bonds are shown in red. The “Handshake” bases are indicated by a red box which, in the case of the strand exchange, is extended to include the entire 4 nt that are transferred. A: Wildtype Sequences. Schematics of the TBL/DBL and tDNA/dDNA sequences used for cryo-EM analysis and in vitro recombination assays. B and C: pre- and post-HSB (handshake base-pairing) b-RNAs stabilize the synaptic complex in the pre- and post-strand exchange states, respectively. Mutated nucleotides in the pre- and post-HSB bRNAs and their complementary DNA nucleotides are highlighted [34].

To investigate the steps in the reaction, in addition to the synaptic complex assembled with the 6 mismatches in t- and d-DNA (Fig. IS110.48 left, top and bottom; Fig. IS110.50A), structures were resolved using both Pre-HSG b-RNA, where recombination is blocked at the pre-strand transfer step (Fig. IS110.49 B; Fig. IS110.50B), and Post-HSG b-RNA, where recombination is robust but cleavage is reduced (Fig. IS110.49 C; Fig. IS110.50C).

Fig. IS110.50A-C. Cryo-EM structure of the IS621 synaptic complex. A) PDB ID:8WT6. Synaptic Complex Stabilised by mismatches in t-and d-DNA.
Fig. IS110.50A-C. Cryo-EM structure of the IS621 synaptic complex. B) PDB ID:8WT7. Pre-HSB b-RNA structure. 1st strands of t- and d-DNA cleaved to form 5′-phosphoserine intermediates. HSGs in TBL and DBL form the expected base pairs with the t-DNA and d-DNA and impede 2nd-strand exchange.
Fig. IS110.50A-C. Cryo-EM structure of the IS621 synaptic complex. C) PDB ID:8WT8. Post-HSB b-RNA.

The cryo-EM structure of the post-HSB b-RNA synaptic complex (Fig. IS110.50C) reveals two states: a post-1st-strand-exchange state trapping the Holliday junction intermediate and a post-strand-exchange state with HJ resolution. An expansion of PDB ID:8WT7 (Fig. IS110.50B) identifies a phosphoserine DNA-Ser bond and the positioning of the DEDD cluster to co-ordinate an Mg2+ divalent cation. In one structure (Fig. IS110.51 left) the 1st strand transfer of the donor (at DBL) junction appears complete while that of the target (at TBL) is only partially rejoined, while in the other species (Fig. IS110.51 right) the 2nd strand of the donor (at DBL) junction has been cleaved and the 2nd target strand (at TBL) is only partially cleaved.

Fig. IS110.51 TBL–tDNA and DBL–dDNA post-strand exchange synaptic complexes. Target (green); Donor (orange); bridge RNA(blue); mismatched bases (lowercase); S241 (yellow circle) with accompanying colored boxes indicating which monomer is involved; cleavage (red triangle); partial 1st strand rejoining (left) and partial 2nd strand cleavage (right) (green triangles); red boxes indicate the transferred nucleotides. Left: Holliday junction intermediate state. Partial 1st strand rejoining. Right: Holliday junction resolution state. Partial 2nd strand cleavage of donor and cleavage of target [34].

These snapshots provide a detailed overall picture of the way in which the IS LE-RE junctions, formed to generate circular transposition intermediates, interact with their bridge RNAs as the donor DNA and how the bridge RNA interacts with the target. Bridge RNA clearly orchestrates the apposition of IS junction and target DNA, generating a defined structure.

The Serine/Threonine Question: A novel Ser Catalyzed Recombination Mechanism

While convincing data have been presented that the serine located in the C-terminal domain (Fig. IS110.8C) is a catalytic amino acid in the recombination reaction leading to integration [19][34], it was pointed out to us by Ruth Hall that this residue is replaced by the chemically related threonine in about 30% of the IS1111 group members in the ISfinder database (Fig. IS110.8C) (but in only 2% of IS110 group members). To our knowledge, threonine has not (yet) been implicated in the typical type of serine site-specific recombination reaction. However, the majority of this population also carries a serine directly upstream of the threonine, raising the possibility that the active site residue has been exchanged for the neighboring serine. There are a number of cases without a serine in this position and, if these are active in transposition, it would be interesting to investigate their recombination mechanism.
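The kind of survey described above (serine versus threonine at the catalytic position, with or without a serine directly upstream) can be sketched computationally. The following is a minimal illustration only: the function names, the category labels and the toy sequence fragments are hypothetical and invented for the example, and the catalytic column is assumed to be already known from a multiple alignment. It is not the procedure used to obtain the percentages quoted in the text.

```python
from collections import Counter


def classify_nucleophile(aligned_seq, catalytic_col):
    """Classify the residue at the (assumed) catalytic column of one aligned sequence."""
    residue = aligned_seq[catalytic_col]
    # Nearest non-gap residue on the N-terminal side of the catalytic column.
    upstream = next(
        (aa for aa in reversed(aligned_seq[:catalytic_col]) if aa != "-"), ""
    )
    if residue == "S":
        return "Ser"
    if residue == "T":
        if upstream == "S":
            return "Thr, Ser directly upstream"
        return "Thr, no upstream Ser"
    return "other"


def summarize(alignment, catalytic_col):
    """Count the categories over an alignment given as {name: aligned sequence}."""
    return Counter(
        classify_nucleophile(seq, catalytic_col) for seq in alignment.values()
    )


# Invented toy fragments (NOT real transposase sequences); column 4 plays
# the role of the catalytic position.
toy_alignment = {
    "toy_A": "GLDASGKR",  # Ser at the catalytic column
    "toy_B": "GLDSTGKR",  # Thr, with Ser immediately upstream
    "toy_C": "GLDATGKR",  # Thr, no neighboring Ser
    "toy_D": "GLS-TGKR",  # Thr; upstream Ser separated only by an alignment gap
}

counts = summarize(toy_alignment, catalytic_col=4)
for category, n in sorted(counts.items()):
    print(f"{category}: {n}")
```

Skipping alignment gaps when looking upstream is a design choice made here so that "directly upstream" refers to the neighboring residue in the protein itself rather than to the neighboring alignment column.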

Although IS110-family transposition uses a serine as the nucleophile, it does not proceed by a classic serine site-specific recombinase mechanism. This can be ruled out for a number of reasons: most obviously, IS110 recombination involves the bridge or seek RNA whereas serine recombinase catalysis does not; the catalytic serine is located at the N-terminal end in classic serine recombinases but at the C-terminal end in IS110 family transposases; the IS110 enzyme lacks some of the key conserved residues of the classic serine recombinases; classic serine recombinases are not associated with a DEDD constellation of amino acids as observed in IS110 transposases; the serine nucleophile attacks “in trans” rather than “in cis”; and, finally, the configuration of the active site, with a phosphoserine intermediate bond and the DEDD cluster coordinating an Mg2+ divalent metal cation, shows an entirely novel geometry for conservative site-specific recombination (Fig. IS110.52). It remains to be determined whether, in this mechanistic context, threonine can substitute for serine as the required nucleophile.

Fig. IS110.52. Cryo-EM structure of the IS621 synaptic complex. Active site (from PDB ID:8WT7) showing the covalent phosphoserine linkage and the adjacent Mg2+ coordinated by the DEDD cluster (image kindly supplied by M. Boocock).

Questions to be Answered

A DNA flank/core conundrum?

The structural and biochemical analyses [19][34] showed that bridge RNA synapses IS621 (IS110 group), aligning both donor and target dinucleotides, while the transposase cleaves the DNA and catalyzes the strand transfers. It is not difficult to imagine that different “core” lengths (e.g. IS117 and IS492 of the IS110 group or ISEc11 of the IS1111 group; Fig. IS110.20) could be similarly synapsed as, for example, in the xerC system (see Crozat et al., [78] and references therein). However, the notion of an alignment of donor and target “core” sequences in cases without directly repeated flanks (Fig. IS110.20C) suggests that the recombination reaction may be more accommodating. It is clearly not a “simple” site-specific recombination reaction such as occurs with the "classic" site-specific recombinases and invertases. To answer this question, it will be important to establish a structural model in these cases.

Mechanism Involved in the First Transposition Step: Circle Formation?

A number of important questions remain, not least the mechanism by which the IS circular intermediate is generated. Formation by site-specific recombination would be expected to regenerate the original target site. Siddiquee et al., [28] were unable to detect such uninterrupted sequences with the PCR assay used to detect ISEc11 circle intermediates. This suggests that excision does not occur by a classical double-strand site-specific recombination mechanism. It remains possible that excision occurs by single-strand recombination accompanied by a replicative step, in a copy-out-paste-in mechanism similar to that used by the IS3 family and other IS families. None of the recent studies has addressed this step of the transposition process.
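The expectation that a conservative, reciprocal excision event would leave behind an uninterrupted target site can be illustrated with a toy string model. This is a sketch under stated assumptions only: the element is assumed to be flanked by two direct copies of a short "core", all sequences are invented, and the functions `conservative_excision` and `amplicon_length` are hypothetical helpers, the latter being a very crude stand-in for a PCR across the insertion site with both primers written on the same strand. The model shows what the classical mechanism would predict and does not reproduce the assay of Siddiquee et al.

```python
def conservative_excision(flank_left, core, element, flank_right):
    """Toy reciprocal crossover between two directly repeated core copies.

    Returns the occupied site, the circle (written linearized at its
    junction) and the regenerated empty site.
    """
    occupied = flank_left + core + element + core + flank_right
    circle = element + core  # one core copy leaves with the circle
    empty_site = flank_left + core + flank_right  # one core copy stays behind
    # Conservative recombination neither gains nor loses nucleotides.
    assert len(occupied) == len(circle) + len(empty_site)
    return occupied, circle, empty_site


def amplicon_length(template, left_primer, right_primer):
    """Length of the span from left_primer to right_primer, or None if absent."""
    start = template.find(left_primer)
    if start == -1:
        return None
    end = template.find(right_primer, start + len(left_primer))
    if end == -1:
        return None
    return end + len(right_primer) - start


# Invented sequences, for illustration only.
FLANK_L, CORE, ELEMENT, FLANK_R = "AAGGTTCC", "CT", "GATTACAGATTACA", "TTGGAACC"

occupied, circle, empty_site = conservative_excision(FLANK_L, CORE, ELEMENT, FLANK_R)

print("occupied site product:", amplicon_length(occupied, FLANK_L, FLANK_R))
print("empty site product:   ", amplicon_length(empty_site, FLANK_L, FLANK_R))
print("circle as template:   ", amplicon_length(circle, FLANK_L, FLANK_R))
```

In this model the flank-to-flank product from the empty site is shorter than that from the occupied site by exactly the length of the element plus one core copy. The absence of such a short product in the experiments described above is what argues against a classical double-strand mechanism.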

Long and short: How is IS1111 NCR RNA Generated: Processing?

It should be noted that the failure of Siddiquee et al., [28] to identify full length Bridge RNAs may simply be due to the way in which the RNA species were generated: Durrant et al., [19] generated Bridge RNA directly by transcription of a cloned RE-LE junction whereas Siddiquee et al., [28] defined the RNA from co-purification with the transposase. This raises the interesting question for both the IS110 and IS1111 groups of how the RNA which co-purifies with the transposase is produced. In the case of ISPa11, no specific NCR promoter was identified by inspection and it was suggested that the small RNA is generated from a longer transcript [28], possibly from the transposase mRNA.

Processing of this kind has been demonstrated for the guide RNA of IS200/IS605 family members, where the TnpB guide endonuclease is involved (see: IS200/IS605 family: RNA Nomenclature, Processing, Structure, Diversity and mode of function). It probably also occurs in generating the upstream RNA virulence repressor of IS200, arc200, from the tnpA mRNA (Fig. IS200.74) [79].

It would be interesting to determine whether the presence of the shorter seek RNA requires transposase catalytic activity and whether “full length” Bridge RNA can be processed by the transposase.

Is there a Biological Significance to the High Level of the shorter Seek RNA species?

The observation that the shorter sRNA species is the major RNA product which purifies with the transposase of both IS1111 group members (ISEc21, ISKpn4 and ISPa11; Fig. IS110.41, 42A, 42B) and an IS110 group member (ISEc21; Fig. IS110.38), and that the longer RNA is significantly less abundant, is intriguing. A trivial explanation would be that it has a higher affinity for the transposase than bridge RNA. The short RNA was not identified by Durrant et al., [19], presumably because their approach would not necessarily have detected such species. One notion would be that, rather than being a degradation product, the small seek RNA is in some way involved in IS circularization, for example by recognizing the two flanking segments of the target sequence. Another possibility is that it acts in trans to “prime” suitable targets in the host genome for recognition by the IS circle.

Additionally, is the long RNA carrying the LDG and RDG sequences required for integration or is it involved in assuring the formation of the IS circle? Do both short and long RNA have similar affinity for the transposase?

Possibility of regulation by asr9-like anti RNA?

An important consideration is whether anti-RNAs such as asr9, found in ISPpu9 [31], are present and play a regulatory role in other IS110 family members. This, to our knowledge, has not received further attention. It should be noted that an upstream NCR (UTR) in the unrelated IS200 (see: IS200 Regulation and Salmonella Pathogenicity) is processed to become a repressor of transcription of certain Salmonella host virulence-associated genes [79]. Expression of an anti-RNA, art200, leads to RNA-anti-RNA interactions between complementary secondary structures in the NTR and degradation of transposase mRNA (including the 5’ processed NCR region). It therefore seems possible that, because of their similar organisation, IS110 family members might also be regulated in this way.

Acknowledgements

We would like to thank Anna Karls (University of Georgia) for early discussions concerning IS492 transposition, Matthew Durrant and Nicholas Perry (Arc Institute and UC Berkeley, Berkeley, USA) for providing information and figures concerning the structure and activities of Bridge RNA and for the phylogenetic tree, and Fernando Rojo (Centro Nacional de Biotecnología, CSIC, Madrid, Spain) for discussions concerning ISPpu9. We would also like to thank Ruth Hall, Sandro F. Ataide and Sally Partridge (The University of Sydney), Phoebe Rice (University of Chicago) and Martin Boocock (University of Glasgow) for providing Fig. IS110.52 and their comments and suggestions.

Bibliography

  1. Chater et al.. Physical and genetic analysis of IS110, a transposable element of Streptomyces coelicolor A3(2). Molecular & general genetics : MGG. 1985. 200. pp. 235-9. doi: 10.1007/BF00425429. PMID: 2993819.
  2. Siguier et al.. ISfinder: the reference centre for bacterial insertion sequences. Nucleic acids research. 2006. 34. pp. D32-6. doi: 10.1093/nar/gkj014. PMID: 16381877.
  3. Hoover et al.. A Coxiella burnetti repeated DNA element resembling a bacterial insertion sequence. Journal of bacteriology. 1992. 174. pp. 5540-8. doi: 10.1128/jb.174.17.5540-5548.1992. PMID: 1324903.
  4. Vary et al.. Use of highly specific DNA probes and the polymerase chain reaction to detect Mycobacterium paratuberculosis in Johne's disease. Journal of clinical microbiology. 1990. 28. pp. 933-7. doi: 10.1128/jcm.28.5.933-937.1990. PMID: 2351737.
  5. Whipple et al.. Identification of restriction fragment length polymorphisms in DNA from Mycobacterium paratuberculosis. Journal of clinical microbiology. 1990. 28. pp. 2561-4. doi: 10.1128/jcm.28.11.2561-2564.1990. PMID: 1979332.
  6. Ritacco et al.. Use of IS901 and IS1245 in RFLP typing of Mycobacterium avium complex: relatedness among serovar reference strains, human and animal isolates. The international journal of tuberculosis and lung disease : the official journal of the International Union against Tuberculosis and Lung Disease. 1998. 2. pp. 242-51. PMID: 9526198.
  7. Kunze et al.. IS901, a new member of a widespread class of atypical insertion sequences, is associated with pathogenicity in Mycobacterium avium. Molecular microbiology. 1991. 5. pp. 2265-72. doi: 10.1111/j.1365-2958.1991.tb02157.x. PMID: 1685008.
  8. Ahrens et al.. Two markers, IS901-IS902 and p40, identified by PCR and by using monoclonal antibodies in Mycobacterium avium strains. Journal of clinical microbiology. 1995. 33. pp. 1049-53. doi: 10.1128/jcm.33.5.1049-1053.1995. PMID: 7615703.
  9. Kunze et al.. Biologically distinct subtypes of Mycobacterium avium differ in possession of insertion sequence IS901. Journal of clinical microbiology. 1992. 30. pp. 2366-72. doi: 10.1128/jcm.30.9.2366-2372.1992. PMID: 1328288.
  10. Collins et al.. Use of four DNA insertion sequences to characterize strains of the Mycobacterium avium complex isolated from animals. Molecular and cellular probes. 1997. 11. pp. 373-80. doi: 10.1006/mcpr.1997.0131. PMID: 9375297.
  11. Denison et al.. IS1111 insertion sequences of Coxiella burnetii: characterization and use for repetitive element PCR-based differentiation of Coxiella burnetii isolates. BMC microbiology. 2007. 7. pp. 91. doi: 10.1186/1471-2180-7-91. PMID: 17949485.
  12. Seshadri et al.. Complete genome sequence of the Q-fever pathogen Coxiella burnetii. Proceedings of the National Academy of Sciences of the United States of America. 2003. 100. pp. 5455-60. doi: 10.1073/pnas.0931379100. PMID: 12704232.
  13. Rozental et al.. Coxiella burnetii, the agent of Q fever in Brazil: its hidden role in seronegative arthritis and the importance of molecular diagnosis based on the repetitive element IS1111 associated with the transposase gene. Memorias do Instituto Oswaldo Cruz. 2012. 107. pp. 695-7. doi: 10.1590/s0074-02762012000500021. PMID: 22850965.
  14. Bartlett et al.. Variable expression of extracellular polysaccharide in the marine bacterium Pseudomonas atlantica is controlled by genome rearrangement. Proceedings of the National Academy of Sciences of the United States of America. 1988. 85. pp. 3923-7. doi: 10.1073/pnas.85.11.3923. PMID: 16593937.
  15. Bartlett & Silverman. Nucleotide sequence of IS492, a novel insertion sequence causing variation in extracellular polysaccharide production in the marine bacterium Pseudomonas atlantica. Journal of bacteriology. 1989. 171. pp. 1763-6. doi: 10.1128/jb.171.3.1763-1766.1989. PMID: 2537827.
  16. Partridge & Hall. The IS1111 family members IS4321 and IS5075 have subterminal inverted repeats and target the terminal inverted repeats of Tn21 family transposons. Journal of bacteriology. 2003. 185. pp. 6371-84. doi: 10.1128/JB.185.21.6371-6384.2003. PMID: 14563872.
  17. Lauf et al.. Identification and characterisation of IS1383, a new insertion sequence isolated from Pseudomonas putida strain H. FEMS microbiology letters. 1999. 170. pp. 407-12. doi: 10.1111/j.1574-6968.1999.tb13401.x. PMID: 9933934.
  18. Partridge & Hall. The IS1111 family members IS4321 and IS5075 have subterminal inverted repeats and target the terminal inverted repeats of Tn21 family transposons. Journal of bacteriology. 2003. 185. pp. 6371-84. doi: 10.1128/JB.185.21.6371-6384.2003. PMID: 14563872.
  19. Durrant et al.. Bridge RNAs direct modular and programmable recombination of target and donor DNA. bioRxiv : the preprint server for biology. 2024. doi: 10.1101/2024.01.24.577089. PMID: 38328150.
  20. Durrant et al.. Bridge RNAs direct programmable recombination of target and donor DNA. Nature. 2024. 630. pp. 984-993. doi: 10.1038/s41586-024-07552-4. PMID: 38926615.
  21. Choi et al.. A novel IS element, IS621, of the IS110/IS492 family transposes to a specific site in repetitive extragenic palindromic sequences in Escherichia coli. Journal of bacteriology. 2003. 185. pp. 4891-900. doi: 10.1128/JB.185.16.4891-4900.2003. PMID: 12897009.
  22. Tobiason et al.. Conserved amino acid motifs from the novel Piv/MooV family of transposases and site-specific recombinases are required for catalysis of DNA inversion by Piv. Molecular microbiology. 2001. 39. pp. 641-51. doi: 10.1046/j.1365-2958.2001.02276.x. PMID: 11169105.
  23. Buchner et al.. Piv site-specific invertase requires a DEDD motif analogous to the catalytic center of the RuvC Holliday junction resolvases. Journal of bacteriology. 2005. 187. pp. 3431-7. doi: 10.1128/JB.187.10.3431-3437.2005. PMID: 15866929.
  24. Marrs et al.. Identification, cloning, and sequencing of piv, a new gene involved in inverting the pilin genes of Moraxella lacunata. Journal of bacteriology. 1990. 172. pp. 4370-7. doi: 10.1128/jb.172.8.4370-4377.1990. PMID: 1973927.
  25. Choi et al.. A novel IS element, IS621, of the IS110/IS492 family transposes to a specific site in repetitive extragenic palindromic sequences in Escherichia coli. Journal of bacteriology. 2003. 185. pp. 4891-900. doi: 10.1128/JB.185.16.4891-4900.2003. PMID: 12897009.
  26. Lenich & Glasgow. Amino acid sequence homology between Piv, an essential protein in site-specific DNA inversion in Moraxella lacunata, and transposases of an unusual family of insertion elements. Journal of bacteriology. 1994. 176. pp. 4160-4. doi: 10.1128/jb.176.13.4160-4164.1994. PMID: 8021196.
  27. Skaar et al.. Analysis of the Piv recombinase-related gene family of Neisseria gonorrhoeae. Journal of bacteriology. 2005. 187. pp. 1276-86. doi: 10.1128/JB.187.4.1276-1286.2005. PMID: 15687191.
  28. Siddiquee et al.. A programmable seekRNA guides target selection by IS1111 and IS110 type insertion sequences. Nature communications. 2024. 15. pp. 5235. doi: 10.1038/s41467-024-49474-9. PMID: 38898016.
  29. Tetu & Holmes. A family of insertion sequences that impacts integrons by specific targeting of gene cassette recombination sites, the IS1111-attC Group. Journal of bacteriology. 2008. 190. pp. 4959-70. doi: 10.1128/JB.00229-08. PMID: 18487340.
  30. Post & Hall. Insertion sequences in the IS1111 family that target the attC recombination sites of integron-associated gene cassettes. FEMS microbiology letters. 2009. 290. pp. 182-7. doi: 10.1111/j.1574-6968.2008.01412.x. PMID: 19025573.
  31. Gómez-García et al.. Expression of the ISPpu9 transposase of Pseudomonas putida KT2440 is regulated by two small RNAs and the secondary structure of the mRNA 5'-untranslated region. Nucleic acids research. 2021. 49. pp. 9211-9228. doi: 10.1093/nar/gkab672. PMID: 34379788.
  32. Elena Parés-Guillén, Luis Yuste, Fernando Rojo, Renata Moreno. The ISPpu9 insertion sequence of Pseudomonas putida KT2440 generates various circular intermediates enabling modular transposition. doi: https://doi.org/10.1101/2025.01.17.633520
  33. Buchner et al.. Piv site-specific invertase requires a DEDD motif analogous to the catalytic center of the RuvC Holliday junction resolvases. Journal of bacteriology. 2005. 187. pp. 3431-7. doi: 10.1128/JB.187.10.3431-3437.2005. PMID: 15866929.
  34. Hiraizumi et al.. Structural mechanism of bridge RNA-guided recombination. Nature. 2024. 630. pp. 994-1002. doi: 10.1038/s41586-024-07570-2. PMID: 38926616.
  35. Tobiason et al.. Multiple DNA binding activities of the novel site-specific recombinase, Piv, from Moraxella lacunata. The Journal of biological chemistry. 1999. 274. pp. 9698-706. doi: 10.1074/jbc.274.14.9698. PMID: 10092658.
  36. Duckett et al.. The structure of the Holliday junction, and its resolution. Cell. 1988. 55. pp. 79-89. doi: 10.1016/0092-8674(88)90011-6. PMID: 3167979.
  37. Ariyoshi et al.. Atomic structure of the RuvC resolvase: a holliday junction-specific endonuclease from E. coli. Cell. 1994. 78. pp. 1063-72. doi: 10.1016/0092-8674(94)90280-1. PMID: 7923356.
  38. Tizard et al.. p43, the protein product of the atypical insertion sequence IS900, is expressed in Mycobacterium paratuberculosis. Journal of general microbiology. 1992. 138 Pt 8. pp. 1729-36. doi: 10.1099/00221287-138-8-1729. PMID: 1326596.
  39. Henderson et al.. Structural and functional analysis of the mini-circle, a transposable element of Streptomyces coelicolor A3(2). Molecular microbiology. 1989. 3. pp. 1307-18. doi: 10.1111/j.1365-2958.1989.tb00112.x. PMID: 2575701.
  40. Henderson et al.. Transposition of IS117 (the Streptomyces coelicolor A 3 (2) mini-circle) to and from a cloned target site and into secondary chromosomal sites. Molecular & general genetics : MGG. 1990. 224. pp. 65-71. doi: 10.1007/BF00259452. PMID: 2177525.
  41. Smokvina & Hopwood. Analysis of secondary integration sites for IS117 in Streptomyces lividans and their role in the generation of chromosomal deletions. Molecular & general genetics : MGG. 1993. 239. pp. 90-6. doi: 10.1007/BF00281606. PMID: 8389980.
  42. Leskiw et al.. Discovery of an insertion sequence, IS116, from Streptomyces clavuligerus and its relatedness to other transposable elements from actinomycetes. Journal of general microbiology. 1990. 136. pp. 1251-8. doi: 10.1099/00221287-136-7-1251. PMID: 1700062.
  43. Perkins-Balding et al.. Excision of IS492 requires flanking target sequences and results in circle formation in Pseudoalteromonas atlantica. Journal of bacteriology. 1999. 181. pp. 4937-48. doi: 10.1128/JB.181.16.4937-4948.1999. PMID: 10438765.
  44. Higgins et al.. Site-specific insertion of IS492 in Pseudoalteromonas atlantica. Journal of bacteriology. 2009. 191. pp. 6408-14. doi: 10.1128/JB.00771-09. PMID: 19684137.
  45. Müller et al.. The inverted repeats of IS1384, a newly described insertion sequence from Pseudomonas putida strain H, represent the specific target for integration of IS1383. Molecular genetics and genomics : MGG. 2001. 265. pp. 1004-10. doi: 10.1007/s004380100495. PMID: 11523772.
  46. Prosseda et al.. Plasticity of the P junc promoter of ISEc11, a new insertion sequence of the IS1111 family. Journal of bacteriology. 2006. 188. pp. 4681-9. doi: 10.1128/JB.00332-06. PMID: 16788177.
  47. Smokvina et al.. Transposition of IS117, the 2.5 kb Streptomyces coelicolor A3(2) 'minicircle': roles of open reading frames and origin of tandem insertions. Molecular microbiology. 1994. 12. pp. 459-68. doi: 10.1111/j.1365-2958.1994.tb01034.x. PMID: 8065263.
  48. Higgins et al.. Chromosomal context directs high-frequency precise excision of IS492 in Pseudoalteromonas atlantica. Proceedings of the National Academy of Sciences of the United States of America. 2007. 104. pp. 1901-6. doi: 10.1073/pnas.0608633104. PMID: 17264213.
  49. Tobes & Pareja. Bacterial repetitive extragenic palindromic sequences are DNA targets for Insertion Sequence elements. BMC genomics. 2006. 7. pp. 62. doi: 10.1186/1471-2164-7-62. PMID: 16563168.
  50. Ramos-González et al.. Characterization of the Pseudomonas putida mobile genetic element ISPpu10: an occupant of repetitive extragenic palindromic sequences. Journal of bacteriology. 2006. 188. pp. 37-44. doi: 10.1128/JB.188.1.37-44.2006. PMID: 16352819.
  51. Fogg et al.. Sequence and functional-group specificity for cleavage of DNA junctions by RuvC of Escherichia coli. Biochemistry. 1999. 38. pp. 11349-58. doi: 10.1021/bi990926n. PMID: 10471285.
  52. He et al.. The IS200/IS605 Family and "Peel and Paste" Single-strand Transposition Mechanism. Microbiology spectrum. 2015. 3. doi: 10.1128/microbiolspec.MDNA3-0039-2014. PMID: 26350330.
  53. Bachellier et al.. Bacterial interspersed mosaic elements (BIMEs) are a major source of sequence polymorphism in Escherichia coli intergenic regions including specific associations with a new insertion sequence. Genetics. 1997. 145. pp. 551-62. doi: 10.1093/genetics/145.3.551. PMID: 9055066.
  54. Bachellier et al.. Bacterial interspersed mosaic elements (BIMEs) are present in the genome of Klebsiella. Molecular microbiology. 1993. 7. pp. 537-44. doi: 10.1111/j.1365-2958.1993.tb01144.x. PMID: 8459773.
  55. Bachellier et al.. Structural and functional diversity among bacterial interspersed mosaic elements (BIMEs). Molecular microbiology. 1994. 12. pp. 61-70. doi: 10.1111/j.1365-2958.1994.tb00995.x. PMID: 8057840.
  56. Bachellier et al.. Short palindromic repetitive DNA elements in enterobacteria: a survey. Research in microbiology. 1999. 150. pp. 627-39. doi: 10.1016/s0923-2508(99)00128-x. PMID: 10673002.
  57. Nunvar et al.. Identification and characterization of repetitive extragenic palindromes (REP)-associated tyrosine transposases: implications for REP evolution and dynamics in bacterial genomes. BMC genomics. 2010. 11. pp. 44. doi: 10.1186/1471-2164-11-44. PMID: 20085626.
  58. Nunvar et al.. Evolution of REP diversity: a comparative study. BMC genomics. 2013. 14. pp. 385. doi: 10.1186/1471-2164-14-385. PMID: 23758774.
  59. Ramos-González et al.. Characterization of the Pseudomonas putida mobile genetic element ISPpu10: an occupant of repetitive extragenic palindromic sequences. Journal of bacteriology. 2006. 188. pp. 37-44. doi: 10.1128/JB.188.1.37-44.2006. PMID: 16352819.
  60. Aranda-Olmedo et al.. Species-specific repetitive extragenic palindromic (REP) sequences in Pseudomonas putida. Nucleic acids research. 2002. 30. pp. 1826-33. doi: 10.1093/nar/30.8.1826. PMID: 11937637.
  61. Tobes & Pareja. Bacterial repetitive extragenic palindromic sequences are DNA targets for Insertion Sequence elements. BMC genomics. 2006. 7. pp. 62. doi: 10.1186/1471-2164-7-62. PMID: 16563168.
  62. Tetu & Holmes. A family of insertion sequences that impacts integrons by specific targeting of gene cassette recombination sites, the IS1111-attC Group. Journal of bacteriology. 2008. 190. pp. 4959-70. doi: 10.1128/JB.00229-08. PMID: 18487340.
  63. Mazel. Integrons: agents of bacterial evolution. Nature reviews. Microbiology. 2006. 4. pp. 608-20. doi: 10.1038/nrmicro1462. PMID: 16845431.
  64. Hall et al.. Site-specific insertion of genes into integrons: role of the 59-base element and determination of the recombination cross-over point. Molecular microbiology. 1991. 5. pp. 1941-59. doi: 10.1111/j.1365-2958.1991.tb00817.x. PMID: 1662753.
  65. Bouvier et al.. Structural features of single-stranded integron cassette attC sites and their role in strand selection. PLoS genetics. 2009. 5. pp. e1000632. doi: 10.1371/journal.pgen.1000632. PMID: 19730680.
  66. Cambray et al.. Integrons. Annual review of genetics. 2010. 44. pp. 141-66. doi: 10.1146/annurev-genet-102209-163504. PMID: 20707672.
  67. MacDonald et al.. Structural basis for broad DNA-specificity in integron recombination. Nature. 2006. 440. pp. 1157-62. doi: 10.1038/nature04643. PMID: 16641988.
  68. Olsen et al.. A novel IS element, ISMpa1, in Mycobacterium avium subsp. paratuberculosis. Veterinary microbiology. 2004. 98. pp. 297-306. doi: 10.1016/j.vetmic.2003.10.025. PMID: 15036538.
  69. Duval-Valentin et al.. Transient promoter formation: a new feedback mechanism for regulation of IS911 transposition. The EMBO journal. 2001. 20. pp. 5802-11. doi: 10.1093/emboj/20.20.5802. PMID: 11598022.
  70. Ton-Hoang et al.. Assembly of a strong promoter following IS911 circularization and the role of circles in transposition. The EMBO journal. 1997. 16. pp. 3357-71. doi: 10.1093/emboj/16.11.3357. PMID: 9214651.
  71. Lyras & Rood. Transposition of Tn4451 and Tn4453 involves a circular intermediate that forms a promoter for the large resolvase, TnpX. Molecular microbiology. 2000. 38. pp. 588-601. doi: 10.1046/j.1365-2958.2000.02154.x. PMID: 11069682.
  72. Sánchez-Hevia et al.. Influence of the Hfq and Crc global regulators on the control of iron homeostasis in Pseudomonas putida. Environmental microbiology. 2018. 20. pp. 3484-3503. doi: 10.1111/1462-2920.14263. PMID: 29708644.
  73. Seemayer et al.. CCMpred--fast and precise prediction of protein residue-residue contacts from correlated mutations. Bioinformatics (Oxford, England). 2014. 30. pp. 3128-30. doi: 10.1093/bioinformatics/btu500. PMID: 25064567.
  74. Perkins-Balding et al.. Excision of IS492 requires flanking target sequences and results in circle formation in Pseudoalteromonas atlantica. Journal of bacteriology. 1999. 181. pp. 4937-48. doi: 10.1128/JB.181.16.4937-4948.1999. PMID: 10438765.
  75. Prosseda et al.. Plasticity of the P junc promoter of ISEc11, a new insertion sequence of the IS1111 family. Journal of bacteriology. 2006. 188. pp. 4681-9. doi: 10.1128/JB.00332-06. PMID: 16788177.
  76. Williams et al.. Organization and dynamics of the Mu transpososome: recombination by communication between two active sites. Genes & development. 1999. 13. pp. 2725-37. doi: 10.1101/gad.13.20.2725. PMID: 10541558.
  77. Naumann & Reznikoff. Trans catalysis in Tn5 transposition. Proceedings of the National Academy of Sciences of the United States of America. 2000. 97. pp. 8944-9. doi: 10.1073/pnas.160107997. PMID: 10908658.
  78. Crozat et al.. Resolution of Multimeric Forms of Circular Plasmids and Chromosomes. Microbiology spectrum. 2014. 2. doi: 10.1128/microbiolspec.PLAS-0025-2014. PMID: 26104344.
  79. Ellis et al.. A transposon-derived small RNA regulates gene expression in Salmonella Typhimurium. Nucleic acids research. 2017. 45. pp. 5470-5486. doi: 10.1093/nar/gkx094. PMID: 28335027.

How to Cite ?

TnPedia Team. (2025). TnPedia: IS110/IS1111 Family of Prokaryotic Insertion Sequences. Zenodo. https://doi.org/10.5281/zenodo.15539859
