Difference between revisions of "IS Families/IS1202 family"
Revision as of 18:30, 2 July 2023
This chapter has appeared in a modified form as: A subclass of the IS1202 family of bacterial insertion sequences targets XerCD recombination sites. Patricia Siguier, Philippe Rousseau, François Cornet, Michael Chandler. Plasmid 2023 Jun 9;127:102696. doi:10.1016/j.plasmid.2023.102696 [1].
The IS1202 Family
History
The founding member of this family, the 1,747 bp IS1202, identified in Streptococcus pneumoniae (1), is bordered by 23 bp imperfect inverted repeat sequences (IR), contains a single open reading frame sufficient to encode a 54.4-kDa polypeptide and is flanked by a 27 bp direct target repeat sequence (DR). IS1202 was initially classified in ISfinder as an emerging IS family (ISNCY – not classified yet) (2) (see the IS families table at: TnPedia) but further genome analysis (3–5) identified over 150 related examples which, together, constitute the IS1202 family.
Many were identified in public databases by reiterative BLAST approaches (6), using the primary transposase sequence of representative elements as a BLASTP query (7). The collection of IS forms a coherent family, the IS1202 family, based on their transposase sequences (3,4). Certain members have affected important properties of their hosts, e.g. deletion of the capsular polysaccharide locus (cps), a major Streptococcus pneumoniae virulence factor key to survival in the blood (8), and insertional inactivation of the lipopolysaccharide gene, lpxA, resulting in colistin dependence possibly leading to colistin resistance (9). A potentially more interesting property is that a specific subclass of the family targets xrs (xer) recombination sites (see Target-specificity of ISAba32 subgroup members), including the universal chromosome termination site, dif (see for example (10)).
Organization, Phylogenetic Analysis and Identification of Three Major Subgroups.
Family members range from 1,320 to 1,990 bp in length, with a single Tpase orf encoding between 400 and 500 amino acids in a single reading frame (Fig. IS1202.1A-D). Several (i.e. tISKpn21, tISKpn65, tISKpn63, tISShal2 and tISRel10) also carry a single passenger gene, annotated as “hypothetical protein”, which is unrelated in each case.
The 166 examples fall into three principal subgroups defined by Tpase alignments (Fig. IS1202.2). Each has been named after one of its members: IS1202, ISTde1 and ISAba32 (3,4). Note that these groups are largely similar to those proposed by Harmer et al. (5), with slight differences presumably because those shown here are based on a larger IS library. The ISAba32 (61 examples; (3,4)) and ISAjo2 (16 examples; (5)) groups are equivalent. In contrast, ISCARN52, ISCARN62, ISCARN63 and several others which appear in the IS1202 subgroup of Harmer et al. (5) (10 examples) do not fall into the IS1202 subgroup of Siguier et al. (3,4) (38 examples); rather, they are included in the ISTde1 subgroup along with ISCARN112 and ISEsa1 (ISEsa1 subgroup; 2 examples; (5)). A phylogenetic tree, rooted with members of the IS481 family, is shown in Fig. IS1202.2. The family members which also carry apparently unrelated passenger genes are not restricted to a single subgroup: whereas tISKpn21, tISKpn65, tISKpn63 and tISRel10 all belong to the ISAba32 subgroup, tISShal2 belongs to the IS1202 subgroup.
Direct Target Repeat Length
Members of the three IS1202 family subgroups also generate DRs of three different lengths: Short, 5-6 bp (ISAba32); Medium, 15-18 bp (ISTde1); and Long, 24-29 bp (IS1202) (Fig. IS1202.1 and Fig. IS1202.2). In a few examples, no direct repeats are present. However, in many of these cases, other copies of the same IS identified elsewhere did exhibit DRs (3,4). The absence of DRs could therefore simply be the result of intra-replicon recombination between two resident IS copies, leading to the separation of the flanking DR sequences; of genetic drift in the DR; or, since all were found in genomes assembled from shotgun sequencing, of assembly errors. Similar results were obtained by Harmer et al. (5).
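The correspondence between DR length and subgroup described above can be written as a simple lookup. The following is only an illustrative sketch: the ranges are those quoted in the text, while the function and variable names are hypothetical and not part of any ISfinder tool. A DR length alone is suggestive rather than diagnostic, and lengths outside the quoted ranges are left unassigned.

```python
# DR length ranges (bp) quoted in the text for each subgroup.
# The names DR_RANGES and subgroup_from_dr_length are illustrative only.
DR_RANGES = {
    "ISAba32": (5, 6),    # Short
    "ISTde1": (15, 18),   # Medium
    "IS1202": (24, 29),   # Long
}


def subgroup_from_dr_length(dr_length):
    """Return the subgroup whose quoted DR range contains dr_length, or None."""
    for subgroup, (low, high) in DR_RANGES.items():
        if low <= dr_length <= high:
            return subgroup
    # Absent DRs or lengths outside the quoted ranges cannot be assigned.
    return None


if __name__ == "__main__":
    # The founding member, IS1202, is flanked by a 27 bp DR.
    print(subgroup_from_dr_length(27))
```

Returning None rather than the nearest range is a deliberate choice here, since the text notes that some copies lack DRs altogether.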
Transposase Signatures
A domain search (COG, HMMER/PFAM and a de novo search with MEME) revealed two major domains: an N-terminal helix-turn-helix (HTH) DNA-binding domain and a DDE-type RNase fold catalytic domain (Fig. IS1202.3A).
Within the DDE domain, there are two additional highly conserved aspartic acid (DD) residues between the two D residues, and a glutamine (Q) seven residues C-terminal to the conserved E (glutamic acid) instead of the characteristic K/R (lysine/arginine) (2,11), as also noted by Harmer et al. (5). The motifs surrounding the DDE triad are retained by each of the subgroups individually (Fig. IS1202.3B).
Transposase alignment revealed two prominent group-specific indel sequences (Fig. IS1202.4): one of about 30 amino acids just before the catalytic domain (IS1202 subgroup), and a second smaller indel sequence of 1-7 amino acids (ISTde1 subgroup) between the second D and the E of the DDE domain. There was significant amino acid conservation in the larger indel sequence particularly at the N-terminal end (Fig. IS1202.5). All three subgroups also included a non-conserved C-terminal region (Fig. IS1202.4).
AlphaFold (12) models (Fig. IS1202.6) revealed that the N-terminal HTH (Helix-Turn-Helix) domain, presumably involved in IR binding, is separated from the catalytic domain carrying the catalytic site by a poorly defined segment (blue arrows). The variable C-terminal segment, predicted to be α-helical, is also poorly defined (orange arrows), appearing as a region of low or very low predictive confidence. For ISTde1, the insertion splits the DDE motif. In all three cases, the N-terminal HTH domain appears separated from the rest of the transposase by a region of low predictive confidence. These indels are correlated with mechanistic changes in transposition strictly associated with the behavior of the subgroup in which they are found, viz. changes in xrs targeting and in the associated DR length.
The transposases of the IS1202 family appear related, although distantly, to those of the IS481 and IS3 families, particularly in their DDE domains (e.g. the IS1202 transposase has 39% amino acid similarity to that of ISPfr5 of the IS481 family) (11).
Harmer et al. (5) suggest that IS1202 family transposases show some similarity to the phage Mu and Tn7 transposases, although the resemblance of the DDE domain at the sequence level is not particularly strong (Fig. IS1202.7 top). However, HHpred analysis (Fig. IS1202.7) indeed shows good structural similarities with the Tn7 TnsB and MuA proteins as well as with the transposase of the eukaryotic mariner element Mos1. Harmer et al. (5) have also underlined the presence of two N-terminal HTH modules composing the probable DNA-binding domain. These can be seen in the AlphaFold models of Fig. IS1202.6 (ISAba32: https://alphafold.ebi.ac.uk/entry/A0A5N5XUG9; ISTde1: https://alphafold.ebi.ac.uk/entry/Q73JR2; IS1202: https://alphafold.ebi.ac.uk/entry/Q54513).
It should be noted that a number of transposases exhibit tandem DNA-binding domains: Mos1 also includes 2 N-terminal HTH modules (13), as does the transposase of IS21 (Fig. IS21.7) (14), whereas Tn7 TnsB carries N-terminal SH3 (beta-barrel) and HTH modules with a winged helix DBD further downstream (Fig. Tn7.2Fi).
Terminal Inverted IRs
Alignment of the left and right terminal inverted repeats of each IS subgroup (Fig. IS1202.8) shows that, like the IRs of most IS (see: TnPedia), IS1202 family IRs carry two well conserved domains: a terminal domain of three base pairs, which is recognized for cleavage, and an internal region which generally serves as a DNA recognition sequence for transposase binding. The terminal domain of both IRL and IRR of members of two subgroups (ISAba32 and ISTde1) begins with 5’-TGT-3’ (as do those of the IS3 and IS481 families (11)), while those of the third subgroup, IS1202, are less conserved: IRR retains the conserved TGT, but the left end is less conserved (5’-Ta/gT-3’) (Fig. IS1202.9A, B and C).
All three subgroups have a second conserved region around position 20 of both IRL and IRR. This is somewhat more extensive for the IS1202 subgroup than for the other two subgroups. Not only does the ISAba32 subgroup carry a third relatively well conserved region further into the IR, but it also exhibits a completely conserved C residue at position 9 of IRL. Close examination of the end sequences (Fig. IS1202.9A,B and C) revealed that the two conserved regions of the ISAba32 group represent two tandem and partially conserved direct repeats (also noted in reference (5)) which, in other transposons and IS, constitute transposase binding sites (e.g. Tn7: Fig. Tn7.1A, Fig. Tn7.1D and the related Tn402: Fig. Tn402.1; IS21: Fig. IS21.3, Fig. IS21.7B). As shown in Fig. IS1202.8, these sequences are less conserved in the right end than in IRL. There is a “core” conserved heptanucleotide block 5’-AAATGTC-3’ with some variation in the initial 3 nucleotides. The IS1202 subgroup has two copies of the core sequence in IRL but only a single copy in IRR (Fig. IS1202.8; Fig. IS1202.9D). In this case the copy proximal to the IRL tip is frequently 5’-TAATGTC-3’ and can be extended slightly in both 5’ and 3’ directions. The third subgroup, ISTde1, has a slightly longer core repeat of 9 bp which, in many cases, can be extended by a single base pair at the 5’ and 3’ ends (Fig. IS1202.9F and F).
Thus, in addition to differences in DR length and in the transposase, the IS1202 family subgroups have distinct features in their terminal IRs.
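The IR features described above (a 5’-TGT-3’ tip and one or more copies of a core heptanucleotide) can be checked on a candidate end sequence with a short script. The following is only a minimal sketch: the function names and the toy sequences are invented for illustration, and the decision to let the first three positions of the core vary freely while requiring the last four (TGTC) to match is an assumption, not a rule taken from the alignments.

```python
import re

# Core heptanucleotide quoted in the text; only the last four bases are
# required to match here (assumption made for this sketch).
CORE = "AAATGTC"


def revcomp(seq):
    """Reverse complement, used to read IRR in the same 5'->3' sense as IRL."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def has_tgt_tip(ir):
    """True if the IR, read inward from the IS tip, begins with 5'-TGT-3'."""
    return ir.upper().startswith("TGT")


def core_hits(ir, variable_prefix=3):
    """0-based start positions of core-like heptamers in the IR.

    The first `variable_prefix` bases may be any nucleotide; the rest of
    CORE must match exactly. Overlapping hits are reported.
    """
    fixed = CORE[variable_prefix:]
    pattern = re.compile("(?=[ACGT]{%d}%s)" % (variable_prefix, fixed))
    return [m.start() for m in pattern.finditer(ir.upper())]


if __name__ == "__main__":
    toy_irl = "TGTCAGGCAAATGTCGGTAATGTCAA"   # invented, not a real IRL
    toy_right_end = "TTGACATTTCCACA"          # invented top-strand right end
    toy_irr = revcomp(toy_right_end)
    for name, ir in (("IRL", toy_irl), ("IRR", toy_irr)):
        starts = core_hits(ir)
        print(name, has_tgt_tip(ir), [(s, ir[s:s + 7]) for s in starts])
```

Reading IRR as the reverse complement of the top-strand right end puts both ends in the same tip-to-interior orientation, so that positions can be compared directly between IRL and IRR.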
Distribution.
The distribution of members of the three subgroups is quite different (3,4): ISAba32 subgroup members are found in plasmids and chromosomes and in unassembled shotgun sequences of mainly γ- (Acinetobacter), β- (Burkholderia) and some -proteobacteria; the majority of ISTde1 subgroup members were identified in whole shotgun sequences, in a number of chromosomes but in only one plasmid. They are more widely dispersed and can be found in and proteobacteria, Firmicutes, Armatimonadota, Acidobacteria, Atribacterota, Chloroflexota, Deltaproteobacteria, Elusimicrobiota, Gemmatimonadetes, Nitrospira and Synergistota; the IS1202 subgroup is found in whole shotgun sequences and also in assembled chromosomes. Members have not yet been identified in plasmids. They are found in Firmicutes, Tenericutes and a Spirochaete (3,4).
Target-specificity of ISAba32 subgroup members
For each IS, at least one (and frequently several) insertion site, corresponding to insertions in different loci, was annotated, generating a library of 245 insertions with their flanking sequences (3,4).
Xrs sites
Xrs (Xer Recombination Sites) is the generic name for specific recombination sites found on chromosomes and plasmids and acted on by the XerC and XerD recombinases (10). XerCD recombine chromosome- and plasmid-borne xrs to resolve dimers formed by recombination between circular sister replicons (these xrs are commonly known as dif sites for chromosomes and xer sites for plasmids).
Other xrs are used to integrate bacteriophages or genomic islands into chromosomes. More recently, numerous xrs have been found flanking mobile genes in plasmids and are thus inferred to be involved in their mobility (15–18). Mobile Genetic Elements inserted next to xrs have been repeatedly identified in plasmids of Acinetobacter baumannii, which often carry repeated xrs (called pdif in these cases, because of their homology to the chromosomal xrs, dif) arranged in modules in which they flank one or a small number of genes, often including different clinically important carbapenemase-encoding bla-OXA genes (15–18). Similar structures have been identified in a number of additional bacterial genera and species (19,20).
Insertions abutting xrs
A number of IS1202-related IS have been identified abutting xrs in bacterial plasmids and chromosomes. Initially, these were observed in plasmids of Acinetobacter baumannii (16,17,21) but now include many bacterial genera and species such as Klebsiella pneumoniae and Burkholderia cenocepacia (3,4).
Only members of the ISAba32 subgroup target xrs
A large number of IS insertions neighboring xrs sites have now been identified (3–5) (Fig. IS1202.2, Fig. IS1202.10). They are invariably located 3-7 bp from the outer end of the XerC-arm of xrs, with this distance corresponding to the length of the DR (Fig. IS1202.10), and oriented with IRL next to the predicted XerC-binding arm of the xrs (Fig. IS1202.10, Fig. IS1202.11A).
Only members of the ISAba32 subgroup have been observed inserted next to xrs (Fig. IS1202.2; (3,4)). In one study (3,4), 61 of 166 IS1202 family members, all belonging to the ISAba32 subgroup, had inserted 5-6 bp from an xrs (Fig. IS1202.10Bii; Fig. IS1202.10A). In most cases, a DR of 5-6 bp was detected. The insertions included both full length and partial ISAba32 copies and, in each partial copy, IRL (but not IRR) was conserved and the distance between the xrs and the partial IS was 5 or 6 bp (5,16,17,21).
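The geometry described above (IRL facing the XerC arm, a short spacer between the xrs and the IS, and the same sequence repeated as a DR beyond IRR) can be verified on an annotated locus. The sketch below is illustrative only: the function name, the coordinate convention (0-based, half-open) and the toy sequences are assumptions introduced here, and the 3-7 bp window is simply the range quoted in the text.

```python
def spacer_and_dr(seq, xerc_arm_end, is_start, is_end, min_len=3, max_len=7):
    """Describe the spacer between an xrs and an adjacent IS.

    seq          : top-strand sequence, xrs upstream of the IS (IRL side).
    xerc_arm_end : index just past the last base of the XerC-binding arm.
    is_start     : index of the first base of IRL.
    is_end       : index just past the last base of IRR.
    """
    spacer = seq[xerc_arm_end:is_start]
    # If the spacer is one copy of the DR, the same bases follow IRR.
    right_flank = seq[is_end:is_end + len(spacer)]
    return {
        "spacer": spacer,
        "spacer_len": len(spacer),
        "in_range": min_len <= len(spacer) <= max_len,
        "dr_matches": len(spacer) > 0 and spacer == right_flank,
    }


if __name__ == "__main__":
    # All sequences below are invented placeholders, not real xrs or IS.
    left_flank = "GATTACAGG"
    toy_xrs = "ATTTCGTATAAGGTGTATTATGTTAATT"   # 28 bp stand-in for an xrs
    dr = "ACGTA"
    toy_is = "TGT" + "A" * 20 + "ACA"
    locus = left_flank + toy_xrs + dr + toy_is + dr + "CCTTGG"

    xerc_arm_end = len(left_flank) + len(toy_xrs)
    is_start = xerc_arm_end + len(dr)
    is_end = is_start + len(toy_is)
    print(spacer_and_dr(locus, xerc_arm_end, is_start, is_end))
```

For partial IS copies in which IRR is missing, only `spacer_len` and `in_range` are meaningful, since there is no right-hand DR to compare against.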
Many different xrs are targeted
Alignment of a number of different insertion sites of ISAba32 copies (78 different from a total of 128 insertions) showed that ISAba32 can target different xrs sites, as judged by the variation in their central regions (Fig. IS1202.11A). An additional example was identified for ISAjo2, where 3 copies were inserted next to 3 different xrs in Acinetobacter baumannii plasmid pAF-401 (Fig. IS1202.10Bi). This demonstrates that different ISAba32 subgroup members can target a number of different xrs sites.
Further alignments of a particular IS abutting particular xrs sites (Fig. IS1202.10C) showed that the central regions, where xrs recombination takes place, are not conserved, and the XerC and XerD binding sites are not identical. This argues against a direct recognition of the xrs sequence by the IS transposase.
ISAba32 subgroup members target chromosomal xrs (dif sites)
A number of chromosomally located insertion sites were identified (e.g. Fig. IS1202.12). The figure shows a cumulative chromosome GC skew plot indicating positions of the replication origin and terminus (22). Most of these insertion sites are next to chromosomal dif sites located close to the replication terminus, indicating that insertion can target xrs acting in chromosome dimer resolution. The examples shown are: ISBcen27, present in two of the three Burkholderia cenocepacia MC0-3 chromosomes and also at two positions in one of the Burkholderia cenocepacia PC184 Mulks chromosomes; tISKpn21, which occurs in several Klebsiella and Serratia strains; as well as other IS in Bradyrhizobium and in a number of chromosomes of various Acinetobacter (3,4).
Sequential multiple targeted insertions at xrs
A number of xrs sites are abutted by several IS together with their accompanying DR (Fig. IS1202.10C), indicating that the xrs serves as a target for successive IS insertions. This is the case for K. pneumoniae ARLG-3226, which carries two tandem tISKpn21 copies at dif (although the distal tISKpn21 copy is missing a flanking DR) (Fig. IS1202.10Civ) in addition to a third copy at an xrs site at some distance. A second example is Klebsiella pneumoniae isolate 307 plasmid P1 (OX030709) (Fig. IS1202.10Cv), which carries 4 tISKpn21 copies, each separated by the same 5 bp DR. If these insertions were targeted to the xrs site, this implies that the xrs-proximal IS copy was the last to arrive, since the xrs-distal copy could not have targeted the xrs if it had arrived after the proximal copy. The third example, from an unnamed plasmid in Acinetobacter junii strain ZM06 (CP077416), is more complex (Fig. IS1202.10Cvi): there is one copy each of ISAba54 and ISAju2 separated by the 5 bp DR. There is also a second copy of ISAba54 inserted next to a second xrs with a 5 bp DR whose sequence is different from the others (marked *). Note that there is a third non-contiguous xrs copy in this plasmid, which does not have an adjacent IS insertion. This example shows that different IS can target the same xrs, and the structure again implies that the xrs-proximal IS arrived last.
Targeting mechanisms
The mechanism of ISAba32 subgroup xrs targeting is at present a matter of speculation. In view of the sequence diversity of the xrs sites targeted, it seems unlikely that ISAba32 subgroup transposases directly recognize the xrs DNA sequence. Targeting could be the result of direct transposase interactions with the XerC and/or XerD proteins themselves or of direct recognition of xrs architecture. Neither is it clear why insertion is directional, i.e. why it is always IRL which abuts the xrs XerC arm: clearly IRR is less well conserved than IRL, particularly in the internal region (Fig. IS1202.8, Fig. IS1202.9A,B,C).
One possible advantage of targeted insertion to xrs sites is that insertion could increase expression of a downstream gene either by forming a hybrid promoter (23) (see TnPedia) or by providing a mobile promoter (e.g. ISEcp1 of the IS1380 family (24)). The IS orientation with respect to neighboring orfs is, however, often not compatible with this. Another possibility is that the xrs sites are a safe haven, as in the case of insertion of the Tn7 transposon directed by an attTn7 sequence downstream from the highly conserved glmS gene (see (25)).
It is important to point out that some xrs identified in enterobacteria are flanked on their XerC-side by specific regions (called 'accessory regions') containing binding sites for various accessory proteins which serve an architectural role and control XerCD-mediated recombination (26–28). As a particular DNA structure, accessory regions might be targeted by the IS. We do not know at present whether the targeted xrs possess such flanking elements, which have so far been described only in enterobacteria.
In addition, targeting may inactivate possible accessory region-mediated control, damaging dimer resolution at these sites. However, the fact that ISs target plasmid-borne pdif-cassettes and chromosome dif sites, neither of which is predicted to use this kind of control, argues against this possibility.
Alternatively, the IS could act as accessory sites themselves, facilitating formation of the appropriate topology required for recombination. That the targeted xrs sites in cassettes are probably active is supported by the observation that they are recognized by Acinetobacter baumannii XerC and XerD in vitro (P. Rousseau, pers. comm.).
Transposition Mechanism
Little is known concerning the transposition mechanism of this IS family. However, there are two reports in which circular IS copies have been identified. Hudson et al (29) identified circular copies of tISKpn21, a member of the ISAba32 subgroup, during analysis of the antibiotic resistance genes of a clinical carbapenem-resistant Klebsiella pneumoniae isolate carrying the metallo-β-lactamase gene bla-NDM-1. MiSeq reads were found in which the tISKpn21 ends were linked and separated by 5 bp identical to the direct repeat. PCR was used to rule out that this was due to tandem tISKpn21 copies in the host genome. This product is typical of the circular intermediates generated by a number of IS families by a mechanism called copy-out-paste-in (30), in which one IS end attacks the other, several base pairs into the flanking donor DNA.
Moreover, the circle appeared to be derived from a resident plasmid copy of tISKpn21 which has different 5 bp target flanks (possibly resulting from inter-IS recombination): only the left end flank was observed between the IS ends in the circle, suggesting that the right end preferentially attacks the left during circularization (rather than the left end attacking the right as stated by these authors) (see (31,32)). The second example was described by Nielsen et al (33) in a study of Sphingobium herbicidovorans MH in which they identified a circular form of an IS related to IS1202 with abutted IRL and IRR ends. Unfortunately, the DNA sequence of the collection of circles is not available in an assembled form and it is therefore not possible to identify the IS1202-related IS.
It should be noted that in IS which have adopted the copy-out-paste-in mechanism, there is generally an outward-facing -35 promoter element located in the right end and an inward-facing -10 element in the left end. This results in the temporary formation of a strong promoter in the circular intermediate which permits high levels of transposase expression (for review see (30)). Inspection of the IR sequences suggests that this may be the case for tISKpn21 and may be a general property of the ISAba32 subgroup.
It remains to be seen whether the copy-out-paste-in transposition pathway is a general mechanism adopted by the entire IS1202 family.
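The circle junction described above (IRR and IRL abutted and separated by a few base pairs of former flanking DNA) is a simple signature to search for in sequencing reads. The following is a minimal sketch under stated assumptions: the function name, the 10 bp upper bound on the spacer and the toy end sequences are invented for illustration, only exact matches on one strand are considered, and this is not the procedure used in the cited studies.

```python
def find_circle_junction(read, irr_tail, irl_head, max_spacer=10):
    """Return the spacer between abutted IS ends, or None if no junction.

    read     : sequencing read (one strand only; the reverse complement
               should be tested separately).
    irr_tail : last bases of the IS (right end), top strand.
    irl_head : first bases of the IS (left end), top strand.
    An empty string means the two ends are directly abutted.
    """
    read = read.upper()
    irr_tail = irr_tail.upper()
    irl_head = irl_head.upper()
    pos = read.find(irr_tail)
    while pos != -1:
        after = pos + len(irr_tail)
        for gap in range(max_spacer + 1):
            # Accept the first spacer length that places IRL right after it.
            if read.startswith(irl_head, after + gap):
                return read[after:after + gap]
        pos = read.find(irr_tail, pos + 1)
    return None


if __name__ == "__main__":
    # Invented end sequences; a real search would use the ends of the IS
    # under study.
    irl_head = "TGTCAGGAAA"
    irr_tail = "TTTCCTGACA"
    junction_read = "GGCC" + irr_tail + "ACGTA" + irl_head + "TTAA"
    linear_read = "GGCC" + irl_head + "AAAA" + irr_tail + "ACGTA"
    print(find_circle_junction(junction_read, irr_tail, irl_head))
    print(find_circle_junction(linear_read, irr_tail, irl_head))
```

A read matching this pattern is also what two tandem IS copies sharing a DR would produce, which is why an independent test such as the PCR control mentioned in the text is still needed.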
Acknowledgements
This chapter is based on the results of Siguier et al 2022, 2023 (3,4) and the input of P. Siguier, P. Rousseau and F. Cornet is gratefully acknowledged.
Bibliography
- Siguier P, Rousseau P, Cornet F, Chandler M. A subclass of the IS1202 family of bacterial insertion sequences targets XerCD recombination sites. Plasmid: 2023 Jun 9, 127;102696 [PubMed:37302728] [DOI]