Difference between revisions of "IS Families/IS110 family"
Revision as of 17:57, 2 August 2020
IS110 was originally identified in 1985 in Streptomyces coelicolor A3(2) as an element present in a derivative of bacteriophage phiC31 carrying a selectable viomycin resistance gene. The phage was deleted for its attachment site and therefore unable to lysogenise its host. The presence of IS110 enabled the phage to integrate using homologous recombination with resident IS110 copies in the chromosome.
There are over 330 examples of IS110 family members from nearly 130 bacterial and archaeal species. The Tpases of several have been identified in various sequenced bacterial genomes, although the ends of most of these elements have not been defined and they are therefore not included in ISfinder. Members such as the Mycobacterium paratuberculosis-specific IS900 and IS901 and the Coxiella burnetii IS1111 have been used as highly specific markers for precise strain identification.
The family includes two subgroups which, it has been suggested, may represent two distinct families: IS110 and IS1111. Members of the IS1111 subgroup are distinguished from those of the IS110 group principally by the presence of small (7 to 17 bp) sub-terminal IRs (Fig. IS110.1 and Fig. IS110.XX). Perhaps one of the better studied is the IS110 group member IS492 from Pseudomonas atlantica, originally identified by its effect on extracellular polysaccharide production (eps): it inactivates the gene by insertion and reactivates it by excision.
Members of the family carry a DEDD transposase and, at present, this is the only IS family known to encode this type of enzyme. DEDD transposases are related to the RuvC Holliday junction resolvase. The Tpase is closely related to the Piv and MooV invertases from Moraxella lacunata / M. bovis and Neisseria gonorrhoeae (Fig. IS110.2). Piv catalyses inversion of a DNA segment permitting expression of a type IV pilin. Intriguingly, early studies revealed that the transposase of one IS, IS621, clustered within the piv clade (Fig. IS110.2 top) and that the IS carries ends with similarities to those of the 26 bp pilin gene inversion sequences (Fig. IS110.2 bottom). It should be noted that several piv-like genes (irg1-8 for invertase-related gene) have been identified in Neisseria gonorrhoeae strain FA1090. None could complement either the Moraxella lacunata Piv or the IS492 transposase, and inactivation of all eight genes and overexpression of one copy of each failed to show an effect on pilin variation, DNA transformation or repair. Furthermore, analysis of DNA flanking the coding sequences supports the hypothesis that the Piv homologues are indeed transposases for two new IS110 family members, ISNgo2 and ISNgo3. ISNgo2 (irg3, 4, 5, 6 and 8) is present in multiple copies in N. gonorrhoeae while ISNgo3 (irg7 and also closely related to pivNM1) is found in single copy in N. gonorrhoeae and in duplicate copies in Neisseria meningitidis. However, neither has yet been formally shown to transpose. Care should therefore be exercised in distinguishing between IS110 family transposases and functional piv genes.
One major difference in the organization of IS110 family members and the inversion systems is that in the piv system, the recombinase is located outside the invertible segment, while in the IS110 family, it is located within the IS element.
Although the Tpases of the IS110 and IS1111 groups are very similar to each other, more detailed analysis shows that they generally separate into two distinct groups, delineating the IS110 members from those of the IS1111 group (Fig. IS110.3), together with a segment containing a mixture of both IS subgroups. It is possible that the few IS110 elements found within the IS1111 group and the IS1111 elements within the IS110 group have been misclassified.
Some family members have been reported to generate small Direct Repeats (DRs) while others do not. However, in most cases where flanking DRs occur, the data can be interpreted to show that one DR copy is present in the target while the second copy belongs to the IS and is transmitted via a circular transposition intermediate, suggesting that integration is sequence-targeted.
Members (Fig. IS110.4) vary between 1136 bp and 1558 bp, with most clustered in the 1450 bp size range. The length distribution of the IS110 group is more dispersed than that of the IS1111 group. The organization of IS110 family members is quite different from that of IS with DDE transposases: they do not contain the typical terminal IRs of the DDE IS and do not generally generate flanking target DRs on insertion. This implies that their transposition occurs using a different mechanism to that of DDE IS. A single long, relatively well conserved, reading frame is present and shows some clusters of conservation within the N- and C-terminal portions. One characteristic which distinguishes IS110 family members from all other elements whose Tpases exhibit a predicted RNase fold is that the predicted catalytic domain of their DEDD Tpases is located N-terminal to the DNA binding domain (Fig. IS110.1). In the DDE Tpases it is generally located downstream, towards the C-terminal end of the protein. The alignment shown in Fig. IS110.5, based on 149 IS110 and 187 IS1111 group members, shows that the N-terminal catalytic domains of the IS110 and IS1111 groups share significant identities. The probable C-terminal DNA binding domains of the two groups vary somewhat from each other (Fig. IS110.6). Those of the IS1111 group show significant conservation compared with IS110 group members, perhaps reflecting the different types of ends carried by each group.
It has proved difficult to determine the activity of these Tpases in detail in vitro. Transposition of IS with DEDD Tpases may be unusual and involve Holliday junction (HJ) intermediates which must be resolved using a RuvC-like mechanism. This type of recombination would be consistent with the close relationship between DEDD Tpases and the Piv/MooV invertases, which presumably resolve HJ structures during inversion. The difference in domain organization between the DEDD and DDE Tpases reinforces the idea that the two IS types possess different transposition mechanisms.
Few data are available concerning enzymatic activities of the putative Tpases of this family of elements: the IS900 Tpase has been detected by immunological methods in the Mycobacterium paratuberculosis host, and IS492 Tpase has been purified and appears to exhibit DNA cleavage activity specific for the ends of the element (Perkins-Balding and Glasgow, pers. comm.), but there is as yet no published information.
However, several members of this family from both the IS110 and IS1111 groups produce double strand circular transposon intermediates (e.g. IS492; ISPa11; ISEc11; IS117; IS1383). It should be noted that, although such circles are almost certainly transposition intermediates, as in other IS families, and, where examined, their formation requires transposase expression, IS110 family transposon circles could simply be generated by site-specific recombination rather than by the copy-out-paste-in mechanism adopted by families such as the IS3 family.
That the circles may be transposition intermediates was suggested by the observation that Streptomyces coelicolor IS117 was initially demonstrated in a circular form which integrates at a frequency two orders of magnitude higher than when cloned as a "linear" copy. For IS117/IS116 (IS110), IS492 (IS110), IS1383 (IS1111), ISEc11 (IS1111), IS4321/IS5075 (IS1111) and ISPa11 (IS1111), DNA fragments carrying abutted IS ends were detected by PCR analysis in vivo and the structures confirmed by nucleotide sequencing. Their appearance was dependent on an intact Tpase gene and their nucleotide sequence is consistent with the formation of a circular form of the element.
Henderson et al. (1989) were perhaps the first to suggest that this family used site-specific recombination to transpose. IS117, originally identified as a “mini” circle, shows a 2/3 base pair identity between the circle junction and its specific site of insertion into the host chromosome (Fig. IS110.7). Transposition was often found to result in tandem dimer inserts, behavior which might indicate some type of rolling circle insertion mechanism such as that observed for IS91 family elements.
Another member of the IS110 group, IS492, clearly undergoes Tpase-dependent precise excision to regenerate a functional eps gene in Pseudomonas atlantica (Fig. IS110.8A). The inserted IS copy is flanked by 5 bp directly repeated sequences (5’-CTTGT-3’) (Fig. IS110.8B). The circle junction carries a single copy of this sequence (Fig. IS110.8C), as does the empty target site. This suggested that one copy is carried by the IS and is required for activity. Sequential deletion of the IS ends (Fig. IS110.8D) clearly showed that the pentanucleotide and/or sequences immediately upstream were required for excision. On the other hand, a sequence 5’-GTTT-3’ located upstream in those insertions analyzed (Fig. IS110.9) was not required for excision. It is possible that it is needed for circle integration.
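The bookkeeping behind this observation can be illustrated with a minimal Python sketch. It assumes that excision behaves like a recombination between the two 5’-CTTGT-3’ copies; the flanking and IS body sequences are invented placeholders, and the choice of which repeat copy leaves with the circle is arbitrary. It is not a model of the enzymatic mechanism.

```python
def excise_between_repeats(donor, repeat):
    """Toy model of precise excision by recombination between two direct repeats.

    Returns (empty_site, circle). The circle is written as a linear string
    that starts with the IS body and ends with the single repeat copy,
    which sits at the circle junction once the two ends are joined.
    """
    first = donor.find(repeat)
    second = donor.find(repeat, first + len(repeat)) if first >= 0 else -1
    if first < 0 or second < 0:
        raise ValueError("donor must contain two copies of the repeat")
    # Assumption: the IS leaves together with the second repeat copy.
    circle = donor[first + len(repeat): second + len(repeat)]
    # The donor backbone keeps the first repeat copy (the "empty" target site).
    empty_site = donor[: first + len(repeat)] + donor[second + len(repeat):]
    return empty_site, circle


# Placeholder sequences (hypothetical, chosen so that CTTGT occurs only as the repeat).
donor = "GGATCC" + "CTTGT" + "AAAACCCCGGGG" + "CTTGT" + "TTAAGC"
empty_site, circle = excise_between_repeats(donor, "CTTGT")
print(empty_site)  # backbone with one CTTGT copy
print(circle)      # IS body followed by one CTTGT copy
```

In this toy model both products retain exactly one copy of the pentanucleotide, which is the pattern reported for the IS492 circle junction and the empty target site.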
Similar flanking sequences have also been identified in insertions of IS900, IS901, IS902, IS116, IS1110, and IS2112 (Fig. IS110.10), and IS621 was also shown to have a flanking sequence, in this case a dinucleotide, CT.
The ends of IS1111 group members differ from those of the IS110 group by including short subterminal IRs. IS1383 was identified as flanking insertions into each end of the IS5 family member, IS1384, and was also shown to generate IS circle junctions (Fig. IS110.11A). As in most members of this group, IRL is located further from the IS tip than is IRR. In this case IRL is preceded by the sequence 5’-agatgg-3’ (lower case indicates the IS end sequences upstream and downstream of IRL and IRR respectively). The insertions into the ends of IS1384 had occurred into a resident AG(A) sequence, and excision to form the circle junction appeared to have occurred by recombination between the resident AG(A) and the terminal aga at the left end of IS1383. This is compatible with a site-specific recombination mechanism in IS1383 transposition. A similar arrangement was observed for a second IS1111 group member, ISEc11, where a flanking tetranucleotide AAAT also appeared as part of the circle junction (Fig. IS110.11B), and it has also been argued that this is compatible with a site (sequence)-specific recombination transposition mechanism. In two additional cases from the Hall lab, IS4321/IS5075 and ISPa11, however, no such “micro-homologies” were detected (Fig. IS110.11C and D). It should nevertheless be noted that transposon circles are generated in vivo and analyzed by PCR. Since there may be a number of copies of the IS in the host genome, this might compromise the sequence of the PCR product.
Because the number of fully studied examples of IS1111 group members is limited, it is possible that the flanking “micro-homologies” observed for IS1383 and ISEc11 are chance occurrences, that excision and insertion of IS1111 members are truly mechanistically different from those of IS110 group members, and that their division into separate families is justified.
Transient Promoter Formation: the circle junction
As for many other IS which use double strand circular intermediates, circle formation results in the assembly of a junction promoter formed from a -35 promoter element in the right end oriented outwards and a -10 promoter element in the left end oriented inwards [39–41]. For the IS110 family, this was originally identified in circular forms of IS492 (Fig. IS110.12). A compiled list of many IS1111 group IS and in silico construction of IS circle junctions indicated that all had the capacity to generate probable promoters. Due to small variations in the distance of the subterminal IRs from the probable end of the IS, some were separated by 10 bp and some by 9 bp. A notable observation is that while the -35 promoter elements are located entirely within the right IS end, the -10 promoter element was not located entirely within the left end but was composed of sequences from both the left and right ends and was only assembled on circle formation. However, unlike the IS492 junction promoter, which appears to be significantly stronger than the lacUV5 promoter, and the junction promoters of ISEc11 and a naturally occurring derivative, ISEc11p, which are also functional, few of these have been examined for activity.
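The in silico construction of a circle junction can be sketched in a few lines of Python. The sketch below is only an illustration under stated assumptions: it abuts a right end and a left end, then looks for hexamers resembling the E. coli sigma70 consensus elements (TTGACA and TATAAT) by simple mismatch counting. The mismatch threshold, the spacer window and the test sequences are assumptions introduced here; this is not the procedure used in the studies cited above.

```python
MINUS_35 = "TTGACA"  # sigma70 -35 consensus
MINUS_10 = "TATAAT"  # sigma70 -10 consensus


def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def junction_promoters(right_end, left_end, max_mismatch=2, spacers=(16, 17, 18)):
    """Scan an abutted right-end/left-end junction for candidate promoters.

    Returns tuples (pos_35, pos_10, spacer, spans_junction), where
    spans_junction is True when the -10 hexamer contains bases from both ends.
    """
    junction = right_end + left_end
    boundary = len(right_end)  # index of the first left-end base
    hits = []
    for i in range(len(junction) - 5):
        if mismatches(junction[i:i + 6], MINUS_35) > max_mismatch:
            continue
        for spacer in spacers:
            j = i + 6 + spacer
            if j + 6 > len(junction):
                continue
            if mismatches(junction[j:j + 6], MINUS_10) <= max_mismatch:
                hits.append((i, j, spacer, j < boundary < j + 6))
    return hits


# Hypothetical ends: the -10 hexamer is split between the right end (TAT) and the left end (AAT).
right_end = "CC" + "TTGACA" + "G" * 17 + "TAT"
left_end = "AATCCGG"
print(junction_promoters(right_end, left_end))
```

A hit whose last field is True corresponds to the situation described above, in which the -10 element exists only once the two ends are joined. The spacer window is left as a parameter so that other spacings can be explored.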
Insertion specificity and target secondary structures
The particular insertion specificities of the IS110 family have been mentioned in the context of the mechanism of transposition and are often one factor in making definition of the IS ends difficult. However, one characteristic of insertion of this family of IS is that they prefer sequences with the propensity to form secondary structures. This is consistent with the fact that the transposases are similar to RuvC, an endonuclease involved in resolving branched Holliday junctions during recombination.
For example, IS621 insertions were observed to be flanked by a CT dinucleotide. On further examination this was shown to be a dinucleotide located at the foot of REP sequences in the host Escherichia coli genome (Fig. IS110.13A). REP sequences are small Repeated Extragenic Palindromic sequences often present in many hundreds of copies in bacterial genomes and which play a variety of structural and regulatory roles [43–49]. Both Z1 and Z2 REP [44–46] sequences are used as targets and all 10 copies of IS621 in the E. coli ECO28 genome were found in this position in resident REP sequences.
There are at least six other examples of this type of “structural” insertion specificity (Fig. IS110.2). All 8 copies of ISPpu10 were identified in short REP sequences of Pseudomonas putida KT2440 [50,51] and a cloned ISPpu10 derivative was shown experimentally to transpose into this REP target (Fig. IS110.13B). Eight (of 8) copies of a related IS, ISPpu9, were identified in the same REP sequence at the same position but inserted in the opposite orientation (i.e. on the opposite strand) (Fig. IS110.13B), while 4/4 examples of ISRm19 were identified in a REP sequence of Rhizobium meliloti (Fig. IS110.13C). Similarly, ISPa11 of the IS1111 group inserts specifically into a Pseudomonas aeruginosa REP (6 examples, and one example from Partridge and Hall (2003)) (Fig. IS110.13D).
Two types of insertion have been described. In type I, the IS inserts at the same position within the REP, whereas type II insertions occur adjacent to a REP. Most IS110 family members exhibit type I insertion patterns in all examples identified. However, one IS, ISPsy7, exhibited a type II insertion pattern, but only in 6/10 examples, and a second unspecified IS from Neisseria meningitidis MC58 was also reported to exhibit a type II pattern in 3/5 cases examined. It is possible that this N. meningitidis IS is the same as that described by Skaar et al.
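Since REP sequences are palindromic, candidate targets with the potential to form a stem-loop can be located with a simple inverted-repeat search. The following Python sketch assumes a perfect stem of fixed length and a bounded loop size; both parameters and the test sequence are illustrative choices, and real REP sequences are imperfect palindromes that would need a more tolerant search.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_hairpins(seq, stem=6, min_loop=3, max_loop=8):
    """Return (start, loop_length) for each perfect inverted repeat found.

    start is the index of the first base of the left stem arm; the right arm
    begins at start + stem + loop_length.
    """
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        target = revcomp(seq[i:i + stem])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            if j + stem > len(seq):
                break
            if seq[j:j + stem] == target:
                hits.append((i, loop))
    return hits


# Hypothetical sequence containing one 6 bp stem with a 4 nt loop.
example = "AA" + "GCCGGA" + "TTTT" + "TCCGGC" + "CT"
print(find_hairpins(example))
```

Comparing the reported stem coordinates with an insertion point then gives a rough indication of whether an insertion lies within the palindrome or next to it.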
At least six different members of the IS1111 subgroup (ISKpn4, ISPa21, ISPst6, ISUnCu1 = ISPa62, ISAvX1 = ISAzvi12 and ISPa25) show a preference for another type of target which can assume a structured configuration, the attC sequences of integrons [53,54]. IS which insert into attC sequences are grouped into a specific clade (Fig. IS110.2). The integron attC is central to integration of circular integron cassettes and had been called the “59 base pair element” but can vary considerably in length. Studies from the Mazel lab have shown that attC sequences form foldback structures (Fig. IS110.14 top) with imperfect matches in which extrahelical bases are involved in driving the direction of the excision and integration reactions [55,57–59]. Integration of IS1111 group members appears to occur at a specific position on these attC foldback sequences (Fig. IS110.14).
Other IS of this family also appear to insert into conserved target sequences: IS1533 occurs in 84 copies in Leptospira borgpetersenii and inserts into a partially conserved sequence (ttAGACAAAA[IS1533]TATCAGagcc-gtct--aaa); ISRfsp2 from Roseiflexus sp RS-1, present in 40 copies in the host genome, is flanked by the sequence CTCtGCGaaCGCtGCGc[ISRfsp2]CTCtGCGGtg (Fig. IS110.15), while ISMpa1 from Mycobacterium avium subsp. paratuberculosis is flanked by the consensus CCAGN0–1CTA[ISMpa1]GCCN0–6GCCG.
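A consensus of this kind can be turned into a pattern for screening candidate insertions. The Python sketch below reads "N0–1" and "N0–6" as zero to one and zero to six unspecified nucleotides (an interpretation of the flattened subscripts) and assumes that the IS sits immediately between CTA and GCC. The function name and test flanks are hypothetical.

```python
import re

# Left flank must end with CCAG, 0-1 unspecified bases, then CTA, right at the IS boundary.
LEFT_FLANK = re.compile(r"CCAG[ACGT]{0,1}CTA$")
# Right flank must start with GCC, 0-6 unspecified bases, then GCCG, right at the IS boundary.
RIGHT_FLANK = re.compile(r"^GCC[ACGT]{0,6}GCCG")


def matches_ismpa1_flanks(left_flank, right_flank):
    """True if both flanks fit the ISMpa1 consensus as interpreted above."""
    left_ok = LEFT_FLANK.search(left_flank.upper()) is not None
    right_ok = RIGHT_FLANK.search(right_flank.upper()) is not None
    return left_ok and right_ok


# Hypothetical flanks for illustration.
print(matches_ismpa1_flanks("TTCCAGTCTA", "GCCAAGCCGTT"))
print(matches_ismpa1_flanks("TTCCAGTTCTA", "GCCGCCG"))
```

The first call uses flanks that fit the consensus, while in the second the left flank has two bases between CCAG and CTA and therefore does not.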