IS Families/IS1202 family

From TnPedia
Revision as of 18:52, 2 July 2023 by TnCentral (talk | contribs)
This chapter has appeared in a modified form as:  A subclass of the IS1202 family of bacterial insertion sequences targets XerCD recombination sites. Patricia Siguier, Philippe Rousseau, François Cornet, Michael Chandler. Plasmid 2023 Jun 9;127:102696. doi:10.1016/j.plasmid.2023.102696 [1].


The IS1202 Family

History

The founding member of this family, the 1,747 bp IS1202, identified in Streptococcus pneumoniae (1), is bordered by 23 bp imperfect inverted repeat sequences (IR), contains a single open reading frame sufficient to encode a 54.4-kDa polypeptide and is flanked by a 27 bp direct target repeat sequence (DR). IS1202 was initially classified in ISfinder as an emerging IS family (ISNCY – not classified yet) (2) (see IS families table at: TnPedia), but further genome analysis (3–5) identified over 150 related examples which, together, constitute the IS1202 family.

Many were identified in public databases by reiterative BLAST approaches (6), with the primary transposase sequence of representative elements used as a BLASTP query (7). The collection of IS forms a coherent family, the IS1202 family, based on their transposase sequences (3,4). While certain members have affected important properties of their hosts (e.g. deletion of the capsular polysaccharide locus (cps), a major Streptococcus pneumoniae virulence factor key to survival in the blood (8), and insertional inactivation of the lipopolysaccharide gene lpxA, resulting in colistin dependence and possibly leading to colistin resistance (9)), a potentially more interesting property is that a specific subclass of the family targets xrs (xer) recombination sites (see Target-specificity of ISAba32 subgroup members), including the universal chromosome termination site, dif (see for example (10)).

Organization, Phylogenetic Analysis and Identification of Three Major Subgroups.

Family members range from 1,320 to 1,990 bp in length and carry a single Tpase orf of between 400 and 500 amino acids in a single reading frame (Fig. IS1202.1A-D). Several (tISKpn21, tISKpn65, tISKpn63, tISShal2 and tISRel10) also carry a single passenger gene, annotated as “hypothetical protein”, which is unrelated in each case.

Fig.IS1202.1. General structure of members of the IS1202 family. A) The IS is shown as a blue box with blue triangles indicating the left and right inverted repeats, IRL and IRR. The transposase, Tnp, open reading frame is shown within the box as a white arrow to indicate the direction of expression. Flanking Direct target repeat sequences, DR, are indicated by light blue boxes. The DR length of the three IS1202 subgroups is shown below. Siguier et al 2022; 2023 PMID: 37302728. B), C) and D) Length distribution of ISAba32, ISTde1 and IS1202 subgroups.

The 166 examples fall into three principal subgroups defined by Tpase alignments (Fig. IS1202.2). Each has been named after one of its members: IS1202, ISTde1 and ISAba32 (3,4). Note that these groups are largely similar to those proposed by Harmer et al (5), with slight differences presumably because those shown here are based on a larger IS library: the ISAba32 (61 examples; (3,4)) and ISAjo2 (16 examples; (5)) groups are equivalent, whereas ISCARN52, ISCARN62 and ISCARN63 and several others which appear in the IS1202 subgroup of Harmer et al (5) (10 examples) do not fall into the IS1202 subgroup of Siguier et al (3,4) (38 examples); rather, they are included in the ISTde1 subgroup along with ISCARN112 and ISEsa1 (ISEsa1 subgroup; 2 examples; (5)). A phylogenetic tree, rooted with members of the IS481 family, is shown in Fig. IS1202.2. The family members which also carry apparently unrelated passenger genes are not restricted to a single subgroup: whereas tISKpn21, tISKpn65, tISKpn63 and tISRel10 all belong to the ISAba32 subgroup, tISShal2 belongs to the IS1202 subgroup.

Fig.IS1202.2. Phylogenetic Tree Based on Transposase Amino Acid Sequences of 166 IS. Relatively closely related IS481 family members were used as an outgroup to root the tree. The IS names are color coded according to the length of the DRs they generate: lavender, very long DR, 24-29 bp; red, long DR, 15-18 bp; blue bold, short DR, 5-6 bp. Names marked in green are IS which do not have DR in available sequences. Those shaded in blue are IS which are located next to an xrs site. There is a clear correlation between length of DR and xrs proximity. The blue arrows indicate individual IS which vary slightly from the overall pattern: (2) sequences with no flanks; (1), (4) sequences with DR which do not correspond to the group; (3) potential xrs. Siguier et al 2022; 2023 PMID: 37302728


Direct Target Repeat Length

Members of the three IS1202 family subgroups also generate DRs of three different lengths: short, 5-6 bp (ISAba32); medium, 15-18 bp (ISTde1); and long, 24-29 bp (IS1202) (Fig. IS1202.1 and 2). A few examples have no direct repeats. However, in many of these cases, other copies of the same IS identified elsewhere did exhibit DRs (3,4). The absence of DR could therefore simply be the result of intra-replicon recombination between two resident IS copies, leading to the separation of the flanking DR sequences; of genetic drift in the DR; or, since all were found in genomes assembled from shotgun sequencing, of assembly errors. Similar results were obtained by Harmer et al (5).
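The relationship between DR length and subgroup can be illustrated with a minimal Python sketch. It assumes exact (undrifted) direct repeats and uses the length ranges quoted above; the function names and the flank sequences in the usage note are hypothetical and are not taken from the cited studies.

```python
# Reported DR length ranges per subgroup (values taken from the text above).
DR_RANGES = {
    "ISAba32": (5, 6),    # short DR
    "ISTde1": (15, 18),   # medium DR
    "IS1202": (24, 29),   # long DR
}


def shared_flank_length(left_flank, right_flank, max_len=30):
    """Return the largest k (up to max_len) for which the last k bases of
    left_flank equal the first k bases of right_flank.

    left_flank is the sequence immediately upstream of IRL and right_flank
    the sequence immediately downstream of IRR. Only exact matches count,
    so DRs that have drifted are not detected (simplifying assumption).
    """
    left, right = left_flank.upper(), right_flank.upper()
    for k in range(min(max_len, len(left), len(right)), 0, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def subgroup_from_dr_length(dr_length):
    """Return the subgroup whose reported DR range contains dr_length,
    or None when the length fits none of the three ranges."""
    for name, (low, high) in DR_RANGES.items():
        if low <= dr_length <= high:
            return name
    return None
```

For example, with the made-up flanks "ACGTTGACCA" (left) and "GACCATTTGG" (right), `shared_flank_length` returns 5, a length that falls in the ISAba32 range. A real analysis would need to tolerate mismatches and to guard against short chance matches.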

Transposase Signatures

A domain search (COG and HMMER/PFAM and a de novo search with MEME) revealed two major domains: an N-terminal helix-turn-helix (HTH) DNA-binding domain and a DDE-type RNase fold catalytic domain (Fig. IS1202. 3A).

Fig.IS1202.3. Schema of the transposase domains with the 6 conserved motifs. A) The six conserved motifs revealed by MEME are shown as a function of their position along the transposase. The N-terminal and C-terminal ends of the protein are indicated, as are the helix-turn-helix motif (HTH), the DDE triad and the two additional conserved D residues. The conserved residues of the DDE motif are indicated by vertical blue arrows. This is derived from the same 166 sequences mentioned in the legend of part B. B) Schema of the transposase domains with the 6 conserved motifs for each individual IS1202 subgroup. The DDE motifs revealed by MEME are shown in blue boxes together with the conserved additional DD. The conserved residues of the DDE motif are indicated by vertical blue arrows. Siguier et al 2022; 2023 PMID: 37302728

Within the DDE domain, there are two highly conserved additional aspartic acid (DD) residues between the two D residues of the triad, and a glutamine (Q) seven residues C-terminal to the conserved E (glutamic acid) instead of the characteristic K/R (lysine/arginine) (2,11), as also noted by Harmer et al (5). The motifs surrounding the DDE triad are retained by each of the subgroups individually (Fig. IS1202.3B).

Transposase alignment revealed two prominent group-specific indel sequences (Fig. IS1202.4): one of about 30 amino acids just before the catalytic domain (IS1202 subgroup), and a second smaller indel sequence of 1-7 amino acids (ISTde1 subgroup) between the second D and the E of the DDE domain. There was significant amino acid conservation in the larger indel sequence particularly at the N-terminal end (Fig. IS1202.5). All three subgroups also included a non-conserved C-terminal region (Fig. IS1202.4).

Fig.IS1202.4. Transposase Alignment using MAFFT and visualized with MSAviewer. 166 transposase sequences are included in the alignment. The position of the indels is indicated, as well as that of the HTH and DDE motif, together with the number of IS containing each indel in parentheses. Note that the indel sequence found in the ISTde1 group occurs within the DDE motif. The non-conserved C-terminal end of these proteins is clearly indicated. Amino acids are colored as proposed by Taylor (40). Siguier et al 2022; 2023 PMID: 37302728


Fig.IS1202.5. The MAFFT alignments were visualized using SnapGene. The color code above indicates the degree of conservation (red, high; blue, low). The amino acid conservation is highlighted in yellow (see consensus). A: IS1202 group indel. B: ISTde1 group indel. Siguier et al 2022; 2023 PMID: 37302728


AlphaFold (12) models (Fig. IS1202.6) revealed that the N-terminal HTH (Helix-Turn-Helix) domain, presumably involved in IR binding, is separated from the catalytic domain by a poorly defined segment (blue arrows). The variable C-terminal segment, predicted to be α-helical, is also poorly defined (orange arrows), as a region of low or very low predictive confidence. For ISTde1, the insertion splits the DDE motif. In all three cases, the N-terminal HTH domain appears separated from the rest of the transposase by a region of low predictive confidence. These indels are correlated with mechanistic changes in transposition strictly associated with the behavior of the IS1202 subgroup in which they are found, viz. a change in xrs targeting and in the associated DR length.

The transposases of the IS1202 family appear related, although distantly, to those of the IS481 and IS3 families, particularly in their DDE domains (e.g. the IS1202 transposase has 39% amino acid similarity to that of ISPfr5 of the IS481 family) (11).

Fig.IS1202.6. Results of AlphaFold modeling. The figure shows the predicted structure of a representative example of each of the three IS1202 subgroups (IS1202, ISTde1 and ISAba32), using the NCBI accession number for each. The N- and C-terminal ends are shown where visible, as are the catalytic domains containing the RNase fold and the DDE motif (CD) and the probable N-terminal HTH DNA binding domain. The color scheme shows the degree of certainty of the different regions of the model: dark blue, high; light blue, confident; yellow, low; orange, very low. Red boxes indicate the position of the indels. Their positions on the scaffold of ISAba32, which includes neither, are indicated by a green arrow. Thin blue arrows show the poorly defined segment separating the N-terminal HTH (Helix-Turn-Helix) domain, presumably involved in IR binding, from the catalytic domain. The thin orange arrows indicate the C-terminal segment, predicted to be α-helical but poorly defined as a region of low or very low predictive confidence. Siguier et al 2022; 2023 PMID: 37302728

Harmer et al (5) suggest that IS1202 family transposases show some similarity to the phage Mu and Tn7 transposases, although the resemblance of the DDE domain at the sequence level is not particularly strong (Fig. IS1202.7 top). However, HHpred analysis (Fig. IS1202.7) does show good structural similarities with the Tn7 TnsB and MuA proteins as well as with the transposase of the eukaryotic mariner element Mos1. Harmer et al (5) have also underlined the presence of two N-terminal HTH modules composing the probable DNA binding domain. These can be seen in the AlphaFold models of Fig. IS1202.6 (ISAba32: https://alphafold.ebi.ac.uk/entry/A0A5N5XUG9; ISTde1: https://alphafold.ebi.ac.uk/entry/Q73JR2; IS1202: https://alphafold.ebi.ac.uk/entry/Q54513).

It should be noted that a number of transposases exhibit tandem DNA binding domains: Mos1 also includes 2 N-terminal HTH modules (13), as does the transposase of IS21 (Fig. IS21.7) (14), whereas Tn7 TnsB carries N-terminal SH3 (beta-barrel) and HTH modules with a winged helix DBD further downstream (Fig. Tn7.2Fi).

Fig.IS1202.7. HHpred Analysis of Transposases of the founding member of each IS1202-family Subclass.

Terminal Inverted IRs

Alignment of the left and right terminal inverted repeats of each IS subgroup (Fig. IS1202.8) shows that, like the IRs of most IS (see: TnPedia), IS1202 family IRs carry two well conserved domains: a terminal domain of three base pairs, which is recognized for cleavage, and an internal region which generally serves as a DNA recognition sequence for transposase binding. The terminal domain of both IRL and IRR of members of two subgroups (ISAba32 and ISTde1) begins with 5’-TGT-3’ (as do those of the IS3 and IS481 families (11)), while those of the third subgroup, IS1202, are less conserved: IRR retains the TGT, but the left end is more variable (5’-Ta/gT-3’) (Fig. IS1202.9A, B and C).

Fig.IS1202.8. Alignment of IRL and IRR. The sequences of IS ends were aligned using WebLogo. They are defined by the direction of transcription of the transposase gene. IRL, by definition, is located on the 5′ side of the transposase orf. Top: Alignment of 166 left (IRL) and right (IRR) ends including all three IS1202 subgroups. Alignment of those of the individual subgroups are shown below: ISAba32, n=61; ISTde1, n=67; IS1202, n=38

All three subgroups have a second conserved region around position 20 of both IRL and IRR. This is somewhat more extensive for the IS1202 subgroup than for the other two subgroups. Not only does the ISAba32 subgroup carry a third relatively well conserved region further into the IR, but it also exhibits a completely conserved C residue at position 9 of IRL. Close examination of the end sequences (Fig. IS1202.9A, B and C) revealed that the two conserved regions of the ISAba32 group represent two tandem and partially conserved direct repeats (also noted in reference (5)) which, in other transposons and IS, constitute transposase binding sites (e.g. Tn7: Fig. Tn7.1A and the related Tn402: Fig. Tn402.1; Fig. Tn7.1D; IS21: Fig. IS21.3; Fig. IS21.7B). As shown in Fig. IS1202.8, these sequences are less conserved in the right end than in IRL. There is a “core” conserved heptanucleotide block 5’-AAATGTC-3’ with some variation in the initial 3 nucleotides. The IS1202 subgroup has two copies of the core sequence in IRL but only a single copy in IRR (Fig. IS1202.8; Fig. IS1202.9D). In this case the copy proximal to the IRL tip is frequently 5’-TAATGTC-3’ and can be extended slightly in both 5’ and 3’ directions. The third subgroup, ISTde1, has a slightly longer core repeat of 9 bp which, in many cases, can be extended by a single base pair at the 5’ and 3’ ends (Fig. IS1202.9E and F).
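As a rough illustration of how copies of the core block might be located in an IR, the following Python sketch scans a sequence for heptamers that end in TGTC and differ from AAA in at most a chosen number of their first three positions. The mismatch threshold is an assumption made for the example (the text only says there is "some variation" in the initial 3 nucleotides), and the test sequence in the usage note is invented, not a real IR.

```python
CORE = "AAATGTC"  # core heptanucleotide quoted in the text


def find_core_copies(sequence, max_mismatch=1):
    """Return (start, heptamer) pairs for every 7-mer whose last four
    bases are exactly TGTC and whose first three bases differ from AAA
    at no more than max_mismatch positions.

    max_mismatch is an illustrative parameter, not a published value.
    """
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - len(CORE) + 1):
        window = seq[start:start + len(CORE)]
        if window[3:] != CORE[3:]:
            continue  # the invariant TGTC part must match exactly
        mismatches = sum(a != b for a, b in zip(window[:3], CORE[:3]))
        if mismatches <= max_mismatch:
            hits.append((start, window))
    return hits
```

On the invented sequence "TGTAATGTCGGCAAATGTCAT", the default call reports two tandem copies, TAATGTC at offset 2 and AAATGTC at offset 12, while `max_mismatch=0` keeps only the second. A position weight matrix built from the aligned ends would be a more faithful model than a fixed mismatch count.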

Thus, in addition to DR length and differences in the transposase, the IS1202 family subgroups have distinct features in their terminal IRs.


Distribution.

The distribution of members of the three subgroups is quite different (3,4). ISAba32 subgroup members are found in plasmids and chromosomes and in unassembled shotgun sequences of mainly γ- (Acinetobacter), β- (Burkholderia) and some α-proteobacteria. The majority of ISTde1 subgroup members were identified in whole shotgun sequences, in a number of chromosomes but in only one plasmid. They are more widely dispersed and can be found in proteobacteria, Firmicutes, Armatimonadota, Acidobacteria, Atribacterota, Chloroflexota, Deltaproteobacteria, Elusimicrobiota, Gemmatimonadetes, Nitrospira and Synergistota. The IS1202 subgroup is found in whole shotgun sequences and also in assembled chromosomes. Members have not yet been identified in plasmids. They are found in Firmicutes, Tenericutes and a Spirochaete (3,4).

Target-specificity of ISAba32 subgroup members

For each IS, at least one (and frequently several) insertion site, corresponding to insertions in different loci, was annotated, generating a library of 245 insertions with their flanking sequences (3,4).

Xrs sites

Xrs (Xer Recombination Sites) is the generic name for specific recombination sites found on chromosomes and plasmids and acted on by the XerC and XerD recombinases (10). XerCD recombine chromosome- and plasmid-borne xrs (commonly known as dif sites for chromosomes and xer sites for plasmids) to resolve dimers formed by recombination between circular sister replicons.

Other xrs are used to integrate bacteriophages or genomic islands into chromosomes. More recently, numerous xrs have been found flanking mobile genes in plasmids and are thus inferred to be involved in their mobility (15–18). Mobile genetic elements inserted next to xrs have been repeatedly identified in plasmids of Acinetobacter baumannii, which often carry repeated xrs (called pdif in these cases, because of their homology to the chromosomal xrs, dif) arranged in modules in which they flank one or a small number of genes, often including different clinically important carbapenemase-encoding bla-OXA genes (15–18). Similar structures have been identified in a number of additional bacterial genera and species (19,20).

Insertions abutting xrs

A number of IS1202-related IS have been identified abutting xrs in bacterial plasmids and chromosomes. Initially, these were observed in plasmids of Acinetobacter baumannii (16,17,21), but they now include many bacterial genera and species such as Klebsiella pneumoniae and Burkholderia cenocepacia (3,4).

Only members of the ISAba32 subgroup target xrs

A large number of IS insertions neighboring xrs sites have now been identified (3–5) (Fig. IS1202.2, Fig. IS1202.10). They are invariably located 3-7 bp from the outer end of the XerC arm of xrs, with the length corresponding to the DR (Fig. IS1202.10), and oriented with IRL next to the predicted XerC-binding arm of the xrs (Fig. IS1202.10, Fig. IS1202.11A).

Only members of the ISAba32 subgroup have been observed inserted next to xrs (Fig. IS1202.2) (3,4). In one study (3,4), 61 of 166 IS1202 family members, all belonging to the ISAba32 subgroup, had inserted at 5-6 bp from an xrs (Fig. IS1202.10Bii; Fig. IS1202.10A). In most cases, a DR of 5-6 bp was detected. The insertions included both full-length and partial ISAba32 copies; in each partial copy, IRL (but not IRR) was conserved and the distance between the xrs and the partial IS was 5 or 6 bp (5,16,17,21).

Fig.IS1202.10

Many different xrs are targeted

Alignment of a number of different insertion sites of ISAba32 copies (78 different from a total of 128 insertions) showed that ISAba32 can target different xrs sites, as judged by the variation in their central regions (Fig. IS1202.11A). An additional example was identified for ISAjo2, where 3 copies were inserted next to 3 different xrs in Acinetobacter baumannii plasmid pAF-401 (Fig. IS1202.10Bi). This demonstrates that different ISAba32 subgroup members can target a number of different xrs sites.

Further alignments of a particular IS abutting particular xrs sites (Fig. IS1202.10C) showed that the central regions, where xrs recombination takes place, are not conserved, and the XerC and XerD binding sites are not identical. This argues against a direct recognition of the xrs sequence by the IS transposase.

ISAba32 subgroup members target chromosomal xrs (dif sites)

A number of chromosomally located insertion sites were identified (e.g. Fig. IS1202.12). The figure shows a cumulative chromosome GC skew plot indicating the positions of the replication origin and terminus (22). Most of these insertion sites are next to chromosomal dif sites located close to the replication terminus, indicating that insertion can target xrs acting in chromosome dimer resolution. The examples shown are: ISBcen27, present in two of the three Burkholderia cenocepacia MC0-3 chromosomes and also at two positions in one of the Burkholderia cenocepacia PC184 Mulks chromosomes; tISKpn21, which occurs in several Klebsiella and Serratia strains; as well as other IS in Bradyrhizobium and in a number of chromosomes of various Acinetobacter (3,4).
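For readers unfamiliar with cumulative GC skew plots, the following Python sketch shows the general idea under common conventions: the skew (G - C)/(G + C) is computed in consecutive windows and summed along the chromosome, and the minimum and maximum of the running sum are taken as approximate positions of the replication origin and terminus. This is a generic illustration, not the procedure used to produce Fig. IS1202.12; the window size and function names are assumptions.

```python
def cumulative_gc_skew(sequence, window=1000):
    """Return the running sum of (G - C) / (G + C) computed over
    consecutive, non-overlapping windows of the sequence."""
    seq = sequence.upper()
    cumulative = []
    total = 0.0
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skew = (g - c) / (g + c) if (g + c) else 0.0
        total += skew
        cumulative.append(total)
    return cumulative


def origin_and_terminus(cumulative, window):
    """Estimate origin (minimum of the curve) and terminus (maximum)
    as base-pair coordinates of the end of the corresponding window."""
    indices = range(len(cumulative))
    ori_index = min(indices, key=cumulative.__getitem__)
    ter_index = max(indices, key=cumulative.__getitem__)
    return (ori_index + 1) * window, (ter_index + 1) * window
```

On a real chromosome, the coordinates of annotated dif sites or IS insertions could then be compared with the estimated terminus. The estimate is only as fine as the window size, and the min/max convention assumes a G excess on the leading strand.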

Fig.IS1202.11

Sequential multiple targeted insertions at xrs

A number of xrs sites are abutted by several IS together with their accompanying DR (Fig. IS1202.10C), indicating that the xrs serves as a target for successive IS insertions. This is the case for K. pneumoniae ARLG-3226, which carries two tandem tISKpn21 copies at dif (although the distal tISKpn21 copy is missing a flanking DR) (Fig. IS1202.10Civ) in addition to a third copy at an xrs site at some distance. A second example is Klebsiella pneumoniae isolate 307 plasmid P1 (OX030709) (Fig. IS1202.10Cv), which carries 4 tISKpn21 copies each separated by the same 5 bp DR. If these insertions were targeted to the xrs site, this implies that the xrs-proximal IS copy was the last to arrive, since the xrs-distal copy would not have targeted xrs if it had arrived after the proximal copy. The third example, from an unnamed plasmid in Acinetobacter junii strain ZM06 (CP077416), is more complex (Fig. IS1202.10Cvi): there is one copy each of ISAba54 and ISAju2 separated by the 5 bp DR. There is also a second copy of ISAba54 inserted next to a second xrs with a 5 bp DR whose sequence is different from the others (marked *). Note that there is a third, non-contiguous xrs copy in this plasmid, which does not have an adjacent IS insertion. This example shows that different IS can target the same xrs, and the structure again implies that the xrs-proximal IS arrived last.

Targeting mechanisms

The mechanism of ISAba32 subgroup xrs targeting is at present a matter of speculation. In view of the sequence diversity of the xrs sites targeted, it seems unlikely that ISAba32 subgroup transposases directly recognize the xrs DNA sequence. Targeting could be the result of direct transposase interactions with the XerC and/or XerD proteins themselves, or of direct recognition of xrs architecture. Nor is it clear why insertion is directional, i.e. why it is always IRL which abuts the xrs XerC arm: clearly IRR is less well conserved than IRL, particularly in the internal region (Fig. IS1202.8, Fig. IS1202.9A, B, C).

One possible advantage of targeted insertion next to xrs sites is that insertion could increase expression of a downstream gene, either by forming a hybrid promoter (23) (see TnPedia) or by providing a mobile promoter (e.g. ISEcp1 (24); IS1380 family). The IS orientation with respect to neighboring orfs is, however, often not compatible with this. Another possibility is that xrs sites are a safe haven, as in the case of insertion of the Tn7 transposon directed by an attTn7 sequence downstream from the highly conserved glmS gene (see (25)).

It is important to point out that some xrs identified in enterobacteria are flanked on their XerC side by specific regions (called 'accessory regions') containing binding sites for various accessory proteins which serve an architectural role and control XerCD-mediated recombination (26–28). As a particular DNA structure, accessory regions might be targeted by the IS. We do not know at present whether the targeted xrs possess such flanking elements, which have so far been described only in enterobacteria.

Fig.IS1202.12

In addition, targeting may inactivate possible accessory region-mediated control, damaging dimer resolution at these sites. However, the fact that ISs target plasmid-borne pdif-cassettes and chromosome dif sites, neither of which is predicted to use this kind of control, argues against this possibility.

Alternatively, the IS could act as accessory sites themselves, facilitating formation of the appropriate topology required for recombination. That the targeted xrs sites in cassettes are probably active is supported by the observation that they are recognized by Acinetobacter baumannii XerC and XerD in vitro (P. Rousseau, pers. comm.).

Transposition Mechanism

Little is known concerning the transposition mechanism of this IS family. However, there are two reports in which circular IS copies have been identified. Hudson et al (29) identified circular copies of tISKpn21, a member of the ISAba32 subgroup, during analysis of the antibiotic resistance genes of a clinical carbapenem-resistant Klebsiella pneumoniae isolate carrying the metallo-β-lactamase gene, bla-NDM-1. MiSeq reads were found in which the tISKpn21 ends were linked and separated by 5 bp identical to the direct repeat. PCR was used to rule out that this was due to tandem tISKpn21 copies in the host genome. This product is typical of the circular intermediates generated by a number of IS families by a mechanism called copy-out-paste-in (30), in which one IS end attacks the other, several base pairs into the flanking donor DNA.

Moreover, the circle appeared to be derived from a resident plasmid copy of tISKpn21 which has different 5 bp target flanks (possibly resulting from inter-IS recombination): only the left end flank was observed between the IS ends in the circle, suggesting that the right end preferentially attacks the left during circularization (rather than the left end attacking the right, as stated by these authors) (see (31,32)). The second example was described by Nielsen et al (33) in a study of Sphingobium herbicidovorans MH, in which they identified a circular form of an IS related to IS1202 with abutted IRL and IRR ends. Unfortunately, the DNA sequence of the collection of circles is not available in an assembled form and it is therefore not possible to identify the IS1202-related IS.
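The kind of junction read described above for tISKpn21 (the IS right end, a short spacer, then the IS left end) can be searched for with a simple pattern match. The Python sketch below is a hypothetical illustration, not the method used by Hudson et al: it looks for exact matches to user-supplied end sequences on either strand of a read and returns the spacer. The end sequences in the usage note are invented.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def junction_spacer(read, is_left_end, is_right_end, spacer_len=5):
    """Return the spacer found between the IS right end and the IS left
    end in a read (either strand), or None if no such junction exists.

    is_left_end: first bases of the IS (start of IRL), top strand.
    is_right_end: last bases of the IS (end of IRR), top strand.
    Exact matching only; sequencing errors are not tolerated here.
    """
    pattern = re.compile(
        re.escape(is_right_end.upper())
        + "([ACGT]{%d})" % spacer_len
        + re.escape(is_left_end.upper())
    )
    for candidate in (read.upper(), reverse_complement(read)):
        match = pattern.search(candidate)
        if match:
            return match.group(1)
    return None
```

With the invented ends "TGTAGGCT" (left) and "AGCCAACA" (right), a read containing "AGCCAACA" + "CATTG" + "TGTAGGCT" yields the spacer "CATTG", which could then be compared with the 5 bp flanks of the resident copies. As in the original report, a tandem IS arrangement in the genome would give the same junction and has to be excluded separately.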

It should be noted that in IS which have adopted the copy-out-paste-in mechanism, there is generally an outward-facing -35 promoter element located in the right end and an inward-facing -10 element in the left end. This results in the temporary formation of a strong promoter in the circular intermediate, which permits high levels of transposase expression (for review see (30)). Inspection of the IR sequences suggests that this may be the case for tISKpn21 and may be a general property of the ISAba32 subgroup.

It remains to be seen whether the copy-out-paste-in transposition pathway is a general mechanism adopted by the entire IS1202 family.

Acknowledgements

This chapter is based on the results of Siguier et al 2022, 2023 (3,4), and the input of P. Siguier, P. Rousseau and F. Cornet is gratefully acknowledged.

Bibliography

  1. Siguier P, Rousseau P, Cornet F, Chandler M. A subclass of the IS1202 family of bacterial insertion sequences targets XerCD recombination sites. Plasmid. 2023 Jun 9;127:102696. [PubMed: 37302728] [DOI: 10.1016/j.plasmid.2023.102696]