Difference between revisions of "IS Families/IS607 family"
Revision as of 20:38, 4 July 2020
General
The first member of what is now called the IS607 family was IS1535 which had been identified as one of a collection of IS in the genome of Mycobacterium tuberculosis H37Rv [1]. Indeed, this study identified six different but related IS elements, IS1535 to IS1539 and IS1602, which, at the time, were grouped into a new IS family called the IS1535 family.
IS607 itself (Fig. IS607.1) was first identified by subtractive hybridization of a collection of Helicobacter pylori strains from a number of geographic locations and characterized by the Berg lab [2]. Further tests showed that it was widespread and present in about 20% of H. pylori strains worldwide. It is 2027 bp long and has a similar organization to IS1535. A second member of this family, IS609 (ISHp609), is also present in many H. pylori strains [3].
Distribution
IS607 family members are geographically widespread in Helicobacter strains [2] and in Mycobacteria [1][4]. Full-length IS609 was widely distributed in Helicobacter strains isolated from Africa, the Americas, Europe and India but was found in only 1% of East Asian strains [3]. IS607 family members have subsequently been found in a wide range of bacterial species, including cyanobacteria [5], and in archaea [6]. Related sequences have also been found in eukaryotic genomes and viruses [7,8], probably primarily as a result of horizontal DNA transfer events.
One major problem in identifying IS607 family members is that, like members of the IS200/IS605 family, there are IS607 family members which have lost either orfA or, like IS200 itself, orfB. In the absence of clear sequence signatures which define the IS ends or of empty sites, it is difficult to define such reduced IS607 derivatives.
Organization
Most full-length IS607 family members are between 1900 and 2150 bp long (Fig. IS607.1) and carry two overlapping orfs in which the stop codon of the upstream orf overlaps the start codon of the downstream orf, suggesting that expression of the downstream gene is translationally coupled to that of the upstream gene [1] (see [9]), although IS609 was reported to encode two additional small orfs upstream of tnpA [2][3].
It was also noted that the product of the upstream orf, TnpA, shared similarity with serine site-specific recombinases (SR) while that of the downstream frame, TnpB, showed weak similarity with TnpB of other IS such as IS1136 and IS891 [1,10,11]. These IS do not include an upstream frame similar to orfA and, due to the characteristic potential secondary structures at their ends, have been placed in the IS200/IS605 family.
The terminal DNA sequences of family members are not related, although they carry several short directly repeated sequences at each end which appear to be helically phased [12,13] and may contain a longer imperfect inverted repeat sequence near the ends but at different distances [12,13] (Fig. IS607.1 and IS607.3). Fig. IS607.2 shows the collection of ends described in reference [12] in green and reference [13] in black bold. It was noted that many, but not all, IS607 family members in ISfinder carry a trinucleotide repeated at each end (shown in reference [12], Fig. IS607.2) which may be involved in the transposition reaction [12]. The absence of these in some examples may simply be due to an incorrect definition of the IS ends.
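The stop/start codon overlap described above can be looked for directly in a nucleotide sequence. The following is a minimal sketch rather than a published method: the function name `stop_start_overlaps`, the choice of the three common bacterial start codons and the decision to ignore reading frame are all assumptions made for illustration.

```python
# Illustrative sketch (not from the cited studies): locate places where a
# start codon shares bases with a stop codon, as in ATGA or TAATG overlaps.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG", "TTG"}  # assumed set of start codons


def stop_start_overlaps(seq):
    """Return (start_index, stop_index) pairs (0-based) where a start codon
    shares at least one base with a stop codon on the given strand.

    Reading frames are NOT checked, so every hit is only a candidate for a
    translationally coupled orfA/orfB junction.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2):
        if seq[i:i + 3] not in START_CODONS:
            continue
        # With the codon sets above, only two offsets can give an overlap:
        # a stop codon beginning 2 nt before the start codon (e.g. TAATG)
        # or 1 nt after it (e.g. ATGA).
        for offset in (-2, 1):
            j = i + offset
            if j >= 0 and seq[j:j + 3] in STOP_CODONS:
                hits.append((i, j))
    return hits
```

For example, `stop_start_overlaps("CCATGACC")` reports the ATGA overlap as `[(2, 3)]`. Confirming that the two codons actually terminate and initiate the annotated orfs would require the orf coordinates, which this sketch does not use.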
Mechanism
IS607 was observed to transpose in Escherichia coli, and the sequence of the IS-target junctions showed that IS607 inserted into a GG dinucleotide (Fig. IS607.1; see also Fig. IS607.2). Insertion sometimes generated a 2 bp target repeat but in other cases no target repeat was observed. Mutational studies demonstrated that TnpA was required for transposition [2] but, as is the case for members of the IS200/IS605 family, TnpB is not. IS609 was found to insert in many chromosomal sites and, in contrast to IS607, to have a preference for insertion with its left end next to a trinucleotide, TAT [3].
As pointed out by Boocock and Rice [12], the IS607 transposase is unusual for a serine site-specific recombinase because most such enzymes require extensive sequence-specificity for all recombining partners [14,15]. As might be expected for a transposase whose function is to optimise the number of potential target sites, IS607 TnpA does not [3,16]. Integration does not require a large conserved target sequence.
Serine recombinases are generally of two types: small SR (smSR), which orchestrate highly regulated recombination reactions such as resolution in the Tn3 transposon family or inversion of invertible DNA segments by Hin and Gin [17–19], and large SR, which are often involved in phage integration [20]. Both carry their DNA binding domains at the C-terminus, whereas TnpA of IS607 family members carries its DNA binding domain at the N-terminus.
The reaction pathway of well characterized smSR has been studied in exquisite detail. It involves a recombinase tetramer and both DNA recombination partners with multiple recombinase binding sites forming a precise architectural structure. However, the N-terminal DNA binding domain in IS607 family transposases would be expected to be incompatible with tetramer formation according to the present structural models [12,13,18,19,21].
DNA at the recombining sites is cleaved by attack of the conserved catalytic serine nucleophile to generate a 3′ hydroxyl and a covalent 5′ phospho-serine protein-DNA intermediate. Following cleavage, the two recombinase subunits of the tetramer attached to the 5′ DNA ends (the “cutting” dimer [12]) are rotated 180° by the other two (the “rotating” dimer [12]) [22–28]. The “opposing” 3′ hydroxyl groups then attack the phosphoserine linkages of the rotated partners to reform a phosphodiester bond and complete the strand transfer reaction. The GG dinucleotide at the transposon termini (Fig. IS607.2, blue underlined) and the invariant GG at the insertion target sites of these IS might represent the 2 bp “overlap” sequence observed at the recombination crossover site for other serine recombinases [12,13].
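The sequence-level outcome of exchange at a shared 2 bp overlap can be illustrated with a deliberately simplified string model. This is a sketch under strong assumptions, not a description of the experimentally determined reaction: it tracks only the top strand, ignores the staggered cuts and subunit rotation, assumes that a circular IS intermediate recombines with the target at identical GG overlaps, and uses a hypothetical function name (`integrate_circle`) and toy sequences.

```python
# Simplified, hypothetical model of conservative recombination between a
# circular IS intermediate and a target, both carrying the same 2 bp overlap.
def integrate_circle(circle, target, overlap="GG"):
    """Return the top-strand sequence of the integration product.

    circle : the circular intermediate written as a linear string opened
             immediately before its junction overlap (so it starts with it).
    target : linear target sequence; the first occurrence of the overlap
             is used as the crossover site.
    """
    circle, target, overlap = circle.upper(), target.upper(), overlap.upper()
    if not circle.startswith(overlap):
        raise ValueError("circle must be written starting at its junction overlap")
    site = target.find(overlap)
    if site < 0:
        raise ValueError("target contains no copy of the overlap sequence")
    body = circle[len(overlap):]  # IS sequence between the junction overlaps
    # After exchange, one copy of the overlap sits at each IS-target junction.
    return target[:site] + overlap + body + overlap + target[site + len(overlap):]
```

With the toy inputs `integrate_circle("GGACTTCA", "TTAGGCAT")` the model returns `"TTAGGACTTCAGGCAT"`, in which the inserted sequence is flanked by GG on both sides. In this simplified picture the flanking GG pairs could be read either as terminal GG dinucleotides of the element or as a 2 bp target repeat, depending on where the IS ends are drawn.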
Like many IS, IS607 is thought to transpose using a double-strand closed circular intermediate, as judged by the presence of an LE-RE junction identified by PCR in an E. coli transposition system (Grindley, personal communication, cited in ref [12]). In this pathway, the IS is excised from its donor site with both ends joined and the donor DNA is resealed [12]. This model is reasonable in view of the type of transposase involved but remains to be formally validated.
The Johnson laboratory [13] has investigated the transposases of three related IS607 family members: IS607 itself, IS1535, and ISC1926 from Sulfolobus islandicus, a hyperthermophilic archaeon [29]. Using a lambda hop assay [30], they confirmed that IS607 indeed inserts between the G residues in a GG dinucleotide target and that transposition requires an intact TnpA: replacement of the active site serine (vertical arrow in Fig. IS607.2) with glycine eliminated activity. While TnpA promoted transposition in this assay, a construct supplying both TnpA and TnpB did not, and it was concluded that TnpB inhibits transposition. This is similar to the observed effect of the related TnpB on transposition of the IS200/IS605 family member, ISDra2 [31].
Purified TnpA from all three IS (IS607 itself, IS1535, and ISC1926) was able to bind to the cognate IS ends in gel mobility shift assays (EMSA) [13]. More extensive studies using EMSA, DNase footprinting and exonuclease digestion were directed to TnpA of IS1535 (TnpAIS1535) because of its more robust binding activity. They indicated that: it binds cooperatively to multiple sites (the directly repeated sequences) in LE and can bind a second LE to form a paired end complex (PEC); TnpA “nucleation” occurs over four of the helically-phased 9 bp direct repeat LE motifs (Fig. IS607.2); LE motifs (a) through part of (d) are required for PEC formation, but efficiency is improved by including “non-specific” DNA both within LE and at the IS-host DNA junction; RE is a poor substrate for TnpA binding, possibly because it carries fewer repeat motifs; TnpA covered only the two repeated RE motifs and supported only a low level of RE-RE or RE-LE PEC formation; and, although RE was protected from exonuclease digestion, no clear footprint could be detected on it, indicating a lower binding affinity [13]. However, unpublished data from these authors on binding studies with TnpAIS607 and TnpAIS1926 suggested that this is not a general property of all members of the IS607 family, since neither exhibited such large differences in the capacity to bind the LE and RE ends of their cognate elements.
Further functional analysis of TnpAIS1535 revealed that removal of the 50 N-terminal amino acids eliminated binding activity [13], confirming the role of this region in TnpA function [12]. When well conserved residues in the catalytic domain (* in Fig. IS607.3) were substituted with cysteine, the capacity of the mutant proteins to generate PECs under oxidizing (cross-linked) or reducing conditions was identical. This was interpreted as showing that there are no large-scale conformational changes in the catalytic domain on binding. However, a similar substitution in the C-terminal HTH domain (# in Fig. IS607.3) eliminated PEC formation when oxidised, as did deletion of this region, suggesting that binding involves conformational changes in this domain.
Structural analysis also underlined the difference between IS607 family TnpA and other serine recombinases. Structures were solved for the combined catalytic and C-terminal HTH domains of TnpAIS1535 and of TnpAIS1926 [13] (Fig. IS607.3) and for TnpAISC1904 from Sulfolobus solfataricus P2, with similar results [12]. The structures were found to contain either one (TnpAIS1535) or two (TnpAIS1926) dimers. The topology of the catalytic domains was identical to that of the catalytic core of the smSR catalytic domain. However, whereas the dimer interface of smSR occurs between the long helices at the C-terminal end of the catalytic domain, that of both TnpAIS1535 and TnpAIS1926 is located between entirely different helices in the catalytic domain (Fig. IS607.4). It was proposed that the C-terminal HTH domain must be displaced to permit DNA to enter the active site. Moreover, since the DNA binding domain is located on the opposite side of the dimer to the active site, it seemed probable that cleavage occurs in trans (where a molecule bound to one recombining partner cleaves the other partner) [13], as suggested in the model proposed by Boocock and Rice [12]. Cleavage in trans is a characteristic of many transposable elements (see [32–34] for reviews). The unusual dimer structure originally noted for TnpAISC1904 led to an elegant detailed mechanistic model [12]. In the Boocock-Rice model, which addresses both integration and excision of the proposed circular IS intermediate, a TnpA tetramer forms a complex with both recombining DNA partners. For integration, the tetramer assembles on the IS circle LE-RE junction (Fig. IS607.4), while in excision it is proposed to assemble from dimers bound at each end during synapsis (Fig. IS607.5). In integration, it is proposed that a TnpA dimer (B) binds to LE (or RE) in the circle junction using its DNA binding domain. For catalysis to occur, the active site must be “demasked” (Fig. IS607.4 top).
The TnpAISC1904 structure indicated that the C-terminal HTH domain must be able to move to render the catalytic center accessible to DNA [12]. This was supported by cross-linking experiments indicating that the C-terminal HTH domain must be able to move for PEC formation, whereas cross-linking of the catalytic domain did not affect PEC formation. A second dimer is then proposed to bind to the RE (or LE) side of the junction, leading to binding of the insertion site and its engagement in the catalytic site. The model proposes that the free DNA binding domain of a single dimer is not capable of binding DNA non-specifically on its own but that two such domains in the tetramer are proficient in non-specific DNA binding. The tetramer bound to the IS junction can thus accommodate the relatively non-specific target DNA.
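Because the only target requirement reported here is a GG dinucleotide, candidate insertion points are easy to enumerate for any sequence. The sketch below is illustrative only: the function name `gg_insertion_points`, the 0-based coordinate convention and the treatment of CC on the top strand as a GG on the complementary strand are assumptions, and the output says nothing about how often any given site is actually used.

```python
# Illustrative sketch: list every position lying between the two G residues
# of a GG dinucleotide, on either strand of a double-stranded sequence.
def gg_insertion_points(seq):
    """Return (position, strand) tuples for candidate insertion points.

    position is the 0-based index of the second base of the dinucleotide,
    i.e. the insertion point lies between seq[position - 1] and seq[position].
    strand is "+" for GG on the given strand and "-" for GG on the
    complementary strand (seen as CC on the given strand).
    """
    seq = seq.upper()
    points = []
    for i in range(1, len(seq)):
        pair = seq[i - 1:i + 1]
        if pair == "GG":
            points.append((i, "+"))
        elif pair == "CC":
            points.append((i, "-"))
    return points
```

For example, `gg_insertion_points("AGGTCCA")` returns `[(2, "+"), (5, "-")]`. Overlapping dinucleotides are counted separately, so a run such as GGG yields two candidate points.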
To understand the mechanism of IS607 family transposition in detail (for example at which stage in the pathway the HTH configuration change occurs, the stoichiometry of the transpososome, how the different repeated sequence elements bind TnpA) will require further structural studies using DNA-protein cocrystals.
It should be noted that there is as yet no experimental evidence describing cleavage or strand transfer in vitro.