General Information/IS Organization

From TnPedia
Revision as of 18:42, 3 June 2020 by TnCentral

General

In addition to being small, insertion sequences are genetically compact (Fig.1.25.1). They generally encode no functions other than those involved in their mobility, although individual members of several families that carry additional genes are now being identified. IS-encoded functions include factors required in cis, in particular the recombinationally active DNA sequences that define the ends of the element, together with an enzyme, the transposase (Tpase), which recognises and processes these ends. The Tpase is generally encoded by one, or perhaps two, open reading frames that occupy nearly the entire length of the element.

Fig 1.25.1. General organization of IS ends. General organization of classical IS elements. The green box represents the IS element. Terminal inverted repeats (IRL and IRR) are shown in red. A single open reading frame (yellow) is shown within the IS. It stretches the entire length of the element and, although this is not always the case, is shown here to terminate within IRR. The indigenous Tpase promoter is located (by convention) in IRL. Transcription is from left to right. The arrows show that the protein acts on the ends of the element. The domain structure of the IRs is indicated by A (the region recognized by Tpase that is involved in cleavage) and B (the region to which Tpase binds in a sequence-specific way). XXX represents the short direct target repeat sequence that is duplicated during the insertion event.

Terminal inverted repeats

With several notable exceptions (the IS91, IS110 and IS200/605 families; see TABLE Characteristics of IS families), the majority of ISs carry short terminal IRs of between 10 and 40 bp. In those cases examined experimentally, the IRs can be divided into two functional domains (Fig.1.16.1). Domain “b” includes the two or three terminal base pairs (Fig.1.26.1) and is involved in the cleavage and strand transfer reactions leading to transposition of the element. Domain “a” is positioned within the IR and is involved in Tpase binding[1][2][3][4][5][6][7].
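
Because the IRs are defined by sequence, their extent can be estimated directly from an element's sequence. The Python sketch below is illustrative only: the function names, the 50 bp search cap and the mismatch allowance are assumptions, not a published tool. It pairs each base of the left end (IRL) with the corresponding base of the right end (IRR) read on the opposite strand, and reports how far the inverted repeat extends, optionally tolerating mismatches since IS IRs are often imperfect.

```python
# Complement table for uppercase DNA (assumption: no IUPAC ambiguity codes).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def terminal_ir_length(element: str, max_mismatches: int = 0, max_len: int = 50) -> int:
    """Estimate the length of the terminal inverted repeat of an element.

    Position i from the left end (IRL) is paired with position i from the
    right end read on the other strand (IRR), i.e. element[i] is compared
    with reverse_complement(element)[i]. The scan stops once more than
    `max_mismatches` mismatches have been seen; the value returned is the
    position of the last matching base, so trailing mismatches are not counted.
    """
    rc = reverse_complement(element)
    mismatches = 0
    length = 0
    for i in range(min(max_len, len(element) // 2)):
        if element[i] == rc[i]:
            length = i + 1  # the IR extends at least through this base
        else:
            mismatches += 1
            if mismatches > max_mismatches:
                break
    return length


# Hypothetical element: a 10 bp IRL, a spacer, and the matching IRR.
irl = "GGTCTTGAGG"
element = irl + "CCCCCAAAAA" + reverse_complement(irl)
print(terminal_ir_length(element))  # 10
```

The value of `max_mismatches` is arbitrary here; with a single mismatch in the IRR, a strict comparison stops at the mismatch, whereas allowing one mismatch recovers the full 10 bp IR.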

A similar organisation has also been proposed for the transposon Tn3[8] and for the related γδ transposon[9]. The simple, single terminal Tpase binding sites of ISs contrast with the multiple and asymmetric protein binding sites observed in bacteriophage Mu[10], transposon Tn7[11] and probably Tn552[12][13]. Multiple protein binding sites are also characteristic of the complex En/Spm and Ac elements of maize (see [14][15]) (Fig.1.26.2). Members of the IS21 family also carry multiple repeated sequences at both ends, which may also represent Tpase binding sites[16][17].

Because different binding patterns can be accommodated at each end, such multiple, asymmetric arrangements can provide a functional distinction between the two ends, either in the assembly or in the activity of the synaptic complex. In addition, indigenous IS promoters are often located partially within the IR upstream of the Tpase gene, by convention IRL. This arrangement may provide a mechanism for autoregulation of Tpase synthesis through Tpase binding to its own promoter region. Binding sites for host-specified proteins are also often found within or close to the terminal IRs, and these proteins may play a role in modulating transposition activity or Tpase expression.

Fig 1.26.1. An early analysis of the domain structure of IS ends. The DNA signals recognized for cleavage at each IS end are shown in red. Those required for transposase binding are shown in green. Often, the sequence of the region between these two domains is not conserved.
Fig 1.26.2. Most common transposon ends.

Domain structure of transposases

Fig 1.27.1. Domain organisation of transposases of the DDE family. The relative positions of the potential zinc finger (ZF), helix-turn-helix (HTH) and leucine zipper (LZ) motifs and of the ‘DDEK/R’ catalytic motif are indicated from left to right as light blue boxes. The figure illustrates the N-terminal and C-terminal extensions of the different transposase examples. (a) Classical IS1 with frameshift. The position of the frameshift window used to generate InsAB is indicated. (b) IS1 without frameshift and the ISMhu11 group, showing the deletion of the ZF, the C-terminal extension, and the increased spacing between the second D and the E residues. (c) The IS1595 family, showing the classical IS1595 group and the IS1016 group, which does not carry the N-terminal ZF. (d) The IS3 family, including members with and without the translational frameshift. (e) The closely related IS481 family, which lacks the N-terminal HTH domain and exhibits an additional C-terminal domain.

A general pattern for the functional organisation of Tpases appears to be emerging from the increasing number that have been analysed. Many can be divided into topologically distinct structural domains and, although several regions of the protein may contribute to a given function, the isolated domains themselves often exhibit a distinct function. The sequence-specific DNA binding activities of the proteins are generally located in the N-terminal region, while the catalytic domain is often localised towards the C-terminal end (IS1[18][19]; IS30[20]; Mu (see [21][22]); Tn3[23][24]; IS50[25]; IS903[26]; IS911[27]; for a review see [28]) (Fig.1.27.1). One functional interpretation of this arrangement for prokaryotic elements is that it may permit a nascent protein molecule to interact with its target sequences on the IS, thus coupling expression and activity. This notion is reinforced by the observation that the C-terminal region of the IS50, IS10 and IS911 Tpases appears to reduce DNA binding activity, possibly by masking the DNA binding domain[29][30]. This arrangement might favor activity of the protein in cis, a property shared by several Tpases (see Activity in cis). Similar masking appears to occur with the IS1 (D. Zerbib and M. Chandler, unpublished) and IS911[31][32] Tpases. In several cases, these domains are assembled into a single protein from consecutive ORFs by translational frameshifting (see Programmed translational frameshifting). In the case of IS911, it has been demonstrated that transposase binds the IS ends while the protein is being translated[33].
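
Programmed -1 translational frameshifting can be illustrated with a toy Python sketch. Everything here is an assumption for illustration: the sequence is invented, the shift position is supplied directly as the parameter `shift_at` rather than predicted from a slippery sequence or stimulatory structure, and the function names are hypothetical. Ribosomes that do not shift stop at the frame-0 stop codon and make only the upstream product; those that slip back one base continue into the overlapping downstream reading frame and make a fused protein.

```python
# Standard genetic code: codons enumerated with bases in the order T, C, A, G.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: _AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(_BASES)
    for j, second in enumerate(_BASES)
    for k, third in enumerate(_BASES)
}


def translate(seq: str, start: int = 0) -> str:
    """Translate uppercase ACGT codons from `start` until a stop (*) or the end."""
    protein = []
    for pos in range(start, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def translate_with_minus1_shift(seq: str, shift_at: int) -> str:
    """Hypothetical -1 frameshift: read frame 0 up to `shift_at`, then slip back one base.

    `shift_at` must be a frame-0 codon boundary. After the slip, the base at
    shift_at - 1 is read a second time as the first base of the next codon.
    """
    if shift_at < 3 or shift_at % 3 != 0:
        raise ValueError("shift_at must be a positive multiple of 3")
    upstream = translate(seq[:shift_at])
    if len(upstream) < shift_at // 3:
        # A frame-0 stop codon before the shift site ends translation early.
        return upstream
    return upstream + translate(seq, shift_at - 1)


# Toy sequence: frame 0 stops at TAA immediately after the AAAAAA run,
# while the -1 frame continues for four more codons.
toy = "ATGGCTAAAAAA" + "TAAGGCTGGTATAA"
print(translate(toy))                        # MAKK (upstream product only)
print(translate_with_minus1_shift(toy, 12))  # MAKKIRLV (fused product)
```

The two outputs mirror, in miniature, the upstream-only and fused products made from two consecutive reading frames.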

One exception to this pattern is the IS110 family, whose DEDD transposase is closely related to the RuvC Holliday junction resolvase (see [34][35]) and in which the catalytic domain appears to precede the DNA binding domain.

In addition to functional domains for DNA binding and catalysis, many, if not all, transposases form multimers that are essential for their activity (see [36][37]). This is true of prokaryotic elements such as bacteriophage Mu (see [38]), IS50[39], IS911[40][41][42], IS608 and ISDra2[43][44] (but apparently not IS10[45]), and of eukaryotic elements such as the retroviruses (see [46][47][48][49]), whose integrase (IN, the retroviral equivalent of a transposase) appears to be a dimer of dimers both with and without bound DNA[50][51][52][53]. Multimeric forms have also been described for the purified P element transposase[54] and for the transposases of the mariner-like element Mos1[55][56][57] and of Hermes, which appears to be an octamer[58][59][60]. As the number of structural studies of these enzymes increases, it will be of great interest to compare equivalent functional domains, as has recently been possible for the catalytic domains of retroviral integrases, Mu transposase and other polynucleotidyl transferases such as the Holliday junction resolvase RuvC and RNase H (see [61][62]). One particular structural motif, the leucine zipper, clearly plays a vital role in the multimerization of a number of transposases[63][64][65][66][67][68][69][70].

Finally, several DDE transposases carry what may be thought of as an “orphan” insertion domain located between the second D and the final E of the DDE motif[71] (Table transposases examined by secondary structure prediction programs). These insertion domains can be either largely α-helical or largely β-stranded, and they are likely to play subtle roles in the chemical pathways involved in the transposition of their cognate transposable elements. The largely β-stranded insertion domain of the IS50 (Tn5) transposase (see Major DDE transposition pathways, Fig.1.8.3) interacts with, and helps to stabilise, a hairpin structure formed at the transposon end as an intermediate during transposition[72][73][74], and therefore performs a crucial function. Another type of insertion domain is present at the same topological position in the eukaryotic Hermes transposase[75]. This long insertion domain is entirely α-helical. Whereas the Tn5 hairpin intermediate is formed on the transposon end, Hermes transposes via an intermediate in which the hairpin is formed on the flanking donor DNA[76][77], and its insertion domain appears to help stabilise this hairpin. A similar α-helical insertion has been identified in the V(D)J recombinase RAG1[78], which also generates a hairpin intermediate similar to that of Hermes. In Hermes, the domain also assists in forming transposase multimers.

Direct target repeats

Fig 1.28.1. The origin of the direct target repeats. The green box represents the IS element. Terminal inverted repeats (IRL and IRR) are shown in red, target DNA in grey and the target sequence in black. The 3'OH groups at the transposon ends, generated by transposase-catalyzed hydrolysis, attack the short target sequence in a staggered manner, resulting in integration and in a short gap at each end of the integrated transposon. These gaps are then repaired to generate the final product, in which the target sequence is duplicated at each end.

Another general feature of IS elements is that, on insertion, most generate short direct repeats (DRs) of the target DNA flanking the IS. This is explained by the way the two transposon ends attack the target during insertion: each end attacks a different strand, and the two attack points are staggered. The DR is generated by repair of the resulting “gaps” between the cleavage sites (Fig.1.28.1).
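
The net effect of staggered attack followed by gap repair can be illustrated with a short Python sketch that tracks only the top strand. The function name and parameters are hypothetical: `site` marks the first of the two cleavage points and `dr_length` the stagger between them, which sets the length of the duplication. Lower-case letters are used for the element purely to make the junctions easy to see.

```python
def insert_with_target_duplication(target: str, site: int, element: str,
                                   dr_length: int) -> str:
    """Top-strand sequence after a staggered insertion and gap repair.

    Hypothetical model: the two transposon ends join to opposite target
    strands at cleavage sites `dr_length` bases apart, the first lying just
    before target[site]. Filling the gap left at each junction copies
    target[site:site + dr_length], so that stretch ends up on both sides.
    """
    if dr_length < 0 or not 0 <= site <= len(target) - dr_length:
        raise ValueError("both cleavage sites must fall within the target")
    dr = target[site:site + dr_length]
    return target[:site] + dr + element + dr + target[site + dr_length:]


# A 5 bp stagger yields a 5 bp direct repeat (GCTAG) flanking the element.
product = insert_with_target_duplication("TTTTGCTAGCCCCC", 4, "ggtcttgagg", 5)
print(product)  # TTTTGCTAGggtcttgaggGCTAGCCCCC
```

Setting `dr_length` to 0 models a blunt (unstaggered) attack, in which case no target bases are duplicated.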

The length of the DR, generally between 2 and 14 bp, is characteristic of a given element: each element generally generates a duplication of fixed length. This length is determined by the architecture of the transposition complex, or transpososome, which constrains the distance between the cleavage sites on the two strands of the target DNA[79][80][81].

However, certain ISs have been shown to generate DRs of atypical length at low frequency, presumably reflecting small variations in the geometry of the transpososome (see [82]). Although there are notable exceptions in which DRs are systematically absent (either within a given family or in several independent transposition events of a given element), care should be taken in interpreting the absence of DRs in isolated cases.
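
Because DRs are short, a chance match between the two flanks, or a misplaced IS boundary, can easily mislead. A minimal Python sketch of a flank comparison is shown below, assuming that the coordinates of the element are known exactly. The function name is hypothetical, and the default search range of 2 to 14 bp simply follows the typical DR lengths given above.

```python
def find_direct_repeat(seq: str, is_start: int, is_end: int,
                       min_len: int = 2, max_len: int = 14) -> str:
    """Return the longest direct repeat flanking seq[is_start:is_end], or "".

    Lengths are tested from max_len down to min_len. Each length is checked
    independently: a k bp repeat does not imply that the k-1 bases adjacent
    to the element on each side also match.
    """
    for k in range(max_len, min_len - 1, -1):
        if k > is_start or is_end + k > len(seq):
            continue  # not enough flanking sequence for this length
        left = seq[is_start - k:is_start]
        right = seq[is_end:is_end + k]
        if left == right:
            return left
    return ""


# Hypothetical insertion of a 10 bp element flanked by a 5 bp DR (GCTAG).
seq = "TTTTGCTAG" + "ggtcttgagg" + "GCTAGCCCCC"
print(find_direct_repeat(seq, 9, 19))  # GCTAG
```

For random sequence with equal base frequencies, two flanks match over k bp by chance with probability 4^-k, so very short "repeats" found this way deserve particular caution.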

A lack of DRs can simply result from homologous inter- or intramolecular recombination between two IS elements, each with a different DR sequence. This would result in a hybrid element flanked by one DR from each parent. It can also arise from the formation of adjacent deletions by duplicative intramolecular transposition. In this case, a single copy of the DR is located on each of the reciprocal deletion products (see for example [83][84] or, more recently in a clinical context, [85]).
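
The first scenario can be sketched in Python by combining a toy insertion model with a single crossover inside two identical IS copies. The sequences, coordinates and function names are invented for illustration, both copies are assumed to share the same sequence and orientation, and only one of the two reciprocal recombination products is built. Each parental copy is flanked by its own DR, but the recombinant carries the left DR of one parent and the right DR of the other, so its flanks no longer match.

```python
# Hypothetical IS sequence (lower case only to make the junctions visible).
IS = "ggtcttgaggcccccaaaaacctcaagacc"


def insert_with_dr(target: str, site: int, element: str, dr_length: int) -> str:
    """Top-strand product of an insertion that duplicates target[site:site+dr_length]."""
    dr = target[site:site + dr_length]
    return target[:site] + dr + element + dr + target[site + dr_length:]


def recombine(mol1: str, is1_start: int, mol2: str, is2_start: int, offset: int) -> str:
    """One product of a single crossover `offset` bases inside two identical IS copies.

    The product carries mol1 to the left of the crossover and mol2 to the right.
    """
    return mol1[:is1_start + offset] + mol2[is2_start + offset:]


def flanks(seq: str, is_start: int, is_len: int, k: int) -> tuple:
    """Return the k bases immediately to the left and right of an IS copy."""
    is_end = is_start + is_len
    return seq[is_start - k:is_start], seq[is_end:is_end + k]


# Two independent insertions of the same IS, each with its own 5 bp DR.
mol1 = insert_with_dr("TTTTGCTAGTTTT", 4, IS, 5)  # DR = GCTAG
mol2 = insert_with_dr("AAAACATGCAAAA", 4, IS, 5)  # DR = CATGC
hybrid = recombine(mol1, 9, mol2, 9, offset=10)
print(flanks(mol1, 9, len(IS), 5))    # ('GCTAG', 'GCTAG')
print(flanks(hybrid, 9, len(IS), 5))  # ('GCTAG', 'CATGC'): no DR
```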

Three ISs, IS1549, IS1634 and IS1630, have been identified that appear to generate long DRs of quite variable length[86][87][88]. Two of them, IS1549 and IS1634, are distantly related to the IS4 family, and one, IS1630, belongs to the IS30 family. The mechanism involved in generating such long DRs is at present unknown. However, it seems reasonable to propose that the target DNA may form a loop within the transpososome, thereby bringing relatively distant phosphodiester bonds into close proximity.






Bibliography

  1. <pubmed>8898394</pubmed>
  2. <pubmed>2161528</pubmed>
  3. <pubmed>2551675</pubmed>
  4. <pubmed>6306482</pubmed>
  5. <pubmed>2832849</pubmed>
  6. <pubmed>28776821</pubmed>
  7. <pubmed>11352577</pubmed>
  8. <pubmed>2155858</pubmed>
  9. <pubmed>7723015</pubmed>
  10. <pubmed>6094016</pubmed>
  11. <pubmed>8556868</pubmed>
  12. <pubmed>2170815</pubmed>
  13. <pubmed>7828593</pubmed>
  14. <pubmed>8556865</pubmed>
  15. <pubmed>8556866</pubmed>
  16. <pubmed>11125167</pubmed>
  17. <pubmed>9729608</pubmed>
  18. <pubmed>2553980</pubmed>
  19. <pubmed>2162466</pubmed>
  20. <pubmed>2154486</pubmed>
  21. <pubmed>8556870</pubmed>
  22. <pubmed>8646783</pubmed>
  23. <pubmed>8382339</pubmed>
  24. <pubmed>8080658</pubmed>
  25. <pubmed>8289277</pubmed>
  26. <pubmed>9417930</pubmed>
  27. <pubmed>8950268</pubmed>
  28. <pubmed>10547692</pubmed>
  29. <pubmed>8057357</pubmed>
  30. <pubmed>8412678</pubmed>
  31. <pubmed>9761671</pubmed>
  32. <pubmed>11352577</pubmed>
  33. <pubmed>22195971</pubmed>
  34. <pubmed>15866929</pubmed>
  35. <pubmed>12897009</pubmed>
  36. <pubmed>10547692</pubmed>
  37. <pubmed>10677279</pubmed>
  38. <pubmed>8805293</pubmed>
  39. <pubmed>7958902</pubmed>
  40. <pubmed>9761671</pubmed>
  41. <pubmed>11352577</pubmed>
  42. <pubmed>10677279</pubmed>
  43. <pubmed>16209952</pubmed>
  44. <pubmed>20890269</pubmed>
  45. <pubmed>8565068</pubmed>
  46. <pubmed>7526778</pubmed>
  47. <pubmed>1322888</pubmed>
  48. <pubmed>12446721</pubmed>
  49. <pubmed>15718297</pubmed>
  50. <pubmed>12446721</pubmed>
  51. <pubmed>17157316</pubmed>
  52. <pubmed>19609359</pubmed>
  53. <pubmed>19229293</pubmed>
  54. <pubmed>17644523</pubmed>
  55. <pubmed>8913752</pubmed>
  56. <pubmed>17565190</pubmed>
  57. <pubmed>19766564</pubmed>
  58. <pubmed>16041385</pubmed>
  59. <pubmed>16511103</pubmed>
  60. <pubmed>25036632</pubmed>
  61. <pubmed>8696976</pubmed>
  62. <pubmed>8548793</pubmed>
  63. <pubmed>9761671</pubmed>
  64. <pubmed>10677279</pubmed>
  65. <pubmed>2147779</pubmed>
  66. <pubmed>8647395</pubmed>
  67. <pubmed>22032517</pubmed>
  68. <pubmed>8643520</pubmed>
  69. <pubmed>19416360</pubmed>
  70. <pubmed>8520113</pubmed>
  71. <pubmed>23217365</pubmed>
  72. <pubmed>10884228</pubmed>
  73. <pubmed>15102449</pubmed>
  74. <pubmed>17176076</pubmed>
  75. <pubmed>16041385</pubmed>
  76. <pubmed>15616554</pubmed>
  77. <pubmed>30239795</pubmed>
  78. <pubmed>15616554</pubmed>
  79. <pubmed>16181782</pubmed>
  80. <pubmed>23217365</pubmed>
  81. <pubmed>21439812</pubmed>
  82. <pubmed>26104718</pubmed>
  83. <pubmed>6314502</pubmed>
  84. <pubmed>7489730</pubmed>
  85. <pubmed>26060276</pubmed>
  86. <pubmed>10601219</pubmed>
  87. <pubmed>9495740</pubmed>
  88. <pubmed>9973360</pubmed>