Haloarchaea and Haloviruses| MIKE DYALL-SMITH'S RESEARCH
Haloquadratum walsbyi: limited diversity in a global pond
Mike Dyall-Smith, Friedhelm Pfeiffer, Kathrin Klee, Peter Palm, Karin Gross, Stephan Schuster, Markus Rampp & Dieter Oesterhelt
This study offers new insights into how extremely halophilic archaea evolve and spread around the world, but it also has an interesting story behind it. How was it that I left Australia to work in Germany for 3 years (2008-2010) on such a project? In late 2007 I received an unexpected request. Dieter Oesterhelt, director of the Department of Membrane Biochemistry, Max-Planck-Institute (MPI), invited me to join his department as a visiting professor. He needed an expert on Haloquadratum walsbyi for a project to sequence the Australian isolate (the type strain, C23T), and compare it to the previously sequenced Spanish strain (HBSQ001). My lab had been instrumental in cultivating this organism, and the offer came at an opportune time for me. It led, via a series of most fortunate (and again unexpected) events, to my initial 6 month contract being extended to 3 years, working in one of the most prestigious institutions, with wonderful scientific colleagues, in one of the most livable cities in the world. Completing the genome sequence of Hqr. walsbyi strain C23T and closing all replicons took the first 6 months, but we spent another 2.5 years with a thorough annotation (manually checked and cross-checked), error correction (454 makes mistakes!), experimental work, comparative analyses, and writing it all up. By another stroke of good luck, the two strains we compared were extraordinarily similar in sequence but different enough to identify the origins of most of the genetic changes between them, providing an astonishingly detailed insight into their evolution. Friedhelm's hunt for all the mobile genetic elements, from transposons down to small repeat sequences of ~ 60 bp, was a tour-de-force that not only documented the frequency, nature and diversity of these elements but also revealed a number of curious, underlying modes of recombination that await experimental investigation. 
The study provides an invaluable basis for future work on the global dispersal and evolution of Haloquadratum. And apart from the fascinating science, my family joined me in Munich, my son went to the local school (Feodor-Lynen Gymnasium) and learned to speak fluent German, my wife met many of the leading dermatologists in Munich and assisted them to get their research published, I took up Jiu-Jitsu under Andreas Schoppe, spoke at many universities and conferences (including the famous Castle Ringberg, owned by the MPI), we all became great fans of Germany, travelled to many parts of Europe, and we made many new friends. I brought back an addiction to German pop-music, from Xavier Naidoo to the famous rappers Bushido and Sido, but back to the main story...
A new start, a new country, a new language: I flew from Melbourne to Munich in Feb. 2008, from a hot Australian summer to a freezing German winter, and was met at the airport by an old friend, Valery Tarasov, a Russian colleague who had previously worked in my lab in Melbourne for several months. By a stroke of luck, he had shifted from Pushchino, just outside Moscow, to work in the Oesterhelt department. Meeting again in Munich proved once again that the world is small. I had come with only two words of German, Bitte and Danke, so Valery helped me negotiate the public transport system to the MPI at Martinsried. There I checked in and picked up my keys to the on-campus guesthouse, an amazing 4-story-high accommodation centre (shown behind me in the top picture of this page) a short walk from the main entrance to the Institute. I could work 24 hr a day, 7 days a week - paradise for a research scientist.
The MPI in Martinsried is massive, with over 1000 employees and about every type of equipment known to modern science. After meeting with Dieter and the other project participants (Friedhelm, Peter and Kathrin), I was given a bench in the 'Oe labor', and I started preparing media and inoculating cultures. The department occupied 2 stories, had its own library, tea-room (Halocafe), many labs and equipment rooms, but finding things was difficult because all the rooms looked the same. Fortunately I had many people around who were happy to assist, including Bettina, Susanne, Valery, Peter, Kati, Tanja, Rita and Doron. I rapidly learned two more words of German, Drücken and Ziehen. These were written on all the doors, and if you didn't know their meaning, you could get a bit frustrated, as they mean push & pull!
Hqr. walsbyi is a little tricky to grow and has a long generation time, so it took me a few months to obtain enough cells to extract DNA and send it to Stephan Schuster (Penn. State Univ., USA) for sequencing. I remember Peter Palm helping me to pack the precious DNA solution in a large box filled with dry ice, and to fill in all the appropriate forms for shipping via FedEx. A few weeks later, the genome sequence came back - in 220 contigs, which had to be stitched together by bridging PCRs. Friedhelm Pfeiffer (bioinformaticist) was able to use the published Spanish genome sequence as a scaffold to align the C23T contigs, and then PCR primers could be designed (Kathrin helping here) for connecting PCRs, whose products were sequenced in the core facility. Needless to say, with over 200 contigs to match, I did a lot of PCRs and sequencing reactions. Friedhelm would put the sequences into the ordered scaffold of contigs to systematically join them all up. The final EMBL/GenBank files don't tell you all the work behind them.
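The scaffolding step can be pictured with a small sketch. This is only an illustration, not the pipeline that was actually used: it assumes each contig has already been aligned to the reference genome and reduced to a single start and end coordinate, and the names Placement and bridging_pcr_plan are made up for the example. It orders the contigs along a circular reference and lists each pair of neighbours that would need a bridging PCR, with the gap size estimated from the reference.

```python
from typing import NamedTuple


class Placement(NamedTuple):
    """Hypothetical record: where one contig aligns on the reference."""
    contig: str
    ref_start: int  # 0-based start on the reference
    ref_end: int    # end on the reference (exclusive)


def bridging_pcr_plan(placements, ref_length):
    """List neighbouring contig pairs and the estimated gap between them.

    The reference is treated as circular, so the last contig is paired
    with the first one. A negative gap means the two contigs overlap
    on the reference.
    """
    ordered = sorted(placements, key=lambda p: p.ref_start)
    plan = []
    for i, left in enumerate(ordered):
        is_last = i == len(ordered) - 1
        right = ordered[0] if is_last else ordered[i + 1]
        gap = right.ref_start - left.ref_end
        if is_last:
            gap += ref_length  # wrap around the circular replicon
        plan.append((left.contig, right.contig, gap))
    return plan


if __name__ == "__main__":
    toy = [
        Placement("c3", 700, 950),
        Placement("c1", 0, 300),
        Placement("c2", 320, 690),
    ]
    for left, right, gap in bridging_pcr_plan(toy, ref_length=1000):
        print(f"{left} -> {right}: estimated gap {gap} bp")
```

In practice the reference only gives a guess at the gap size, since the two strains differ in places, so each junction still has to be confirmed by sequencing the PCR product.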
One becomes two: the puzzle of plasmid PL6: Plasmid extracts of strain C23T showed a plasmid band of about 6 kb, so I expected a plasmid of this size. However, the 454 contigs we had were confusing, with similar but not identical sequences that did not assemble to the correct size. Friedhelm realised the contigs represented two closely related and similarly sized plasmids, and sorted the pieces into two groups that I could then put together by PCR-sequencing. The two plasmids, PL6A and PL6B, do not contain any genes or sequences that give clues to the mode of replication, and so represent a totally new type of replicon. I found homologous plasmids in other strains of Haloquadratum that I had isolated from different parts of Australia, so they must be common. They also seem to have a connection to virus/plasmid gene clusters that are widely distributed in the genomes of other haloarchaea, which we call ViPREs, for Virus and Plasmid-Related Elements. We still don't know how it is that two closely related plasmids exist in the same strain, but the fact they are multicopy and small makes them very attractive replicons for designing cloning/expression vectors suitable for Haloquadratum.
Plasmid PL100 goes AWOL. Another story relates to a much larger plasmid of ~ 100 kb (PL100) carried by strain C23T. It was not represented in the sequence data returned from Stephan Schuster because it was not in the DNA we had sent him! The plasmid had been lost from the culture that was selected for DNA extraction. After some detective work screening other cultures, we found one that still retained PL100. This had to be PCR-amplified in sections and sequenced by primer-walking, and since it ended up being 100 kb, some time was needed to close the circle. This incident really brought home to me the problem of plasmid loss in laboratory cultures, and you wonder how many other microbes with full chromosome sequences deposited in Genbank are lacking plasmids because of loss in culture.
Nothing's perfect: certainly not 454 data. I thought 454 sequencing was superbly accurate given the massive coverage (tens to hundreds of reads across every base position). However, when the Spanish genome sequence was compared to the Australian strain C23T, there were a number of pseudogenes present in C23T that were complete ORFs in the other strain, and the base change producing the pseudogene in C23T was at a lower-quality sequencing position. I checked these positions by PCR-sequencing, finding that many were errors. These corrections restored 42 'pseudogenes' to complete open reading frames. I believed this would be the end of the story, but when I did some proteomic analyses, I found proteins of C23T that were not supposed to be expressed at all because they were annotated as pseudogenes. Again, PCR-sequencing resolved 7 such cases as sequencing errors. I have now become much more sceptical about these types of pseudogenes in 454 sequenced genomes. Give me clear, colourful, BigDye chromatograms any day.
The chromosome: amazing conservation of gene order: The genome of strain C23T turned out to be similar in size (3.1 Mb) and extremely similar in sequence to the Spanish strain. One of the real surprises was just how similar they were. It was expected that inversions or rearrangements of the chromosomes would occur more frequently than base changes, but across the 84% of shared/core sequence between the two strains (which was 98.6% identical in nucleotide sequence), there was not a single case of inversion or rearrangement! I am not sure if this is some kind of record given they were independent isolates, recovered using different methods in different labs, and from very different locations - 12,363 km apart. The inference from this is that the strains had hardly diverged, and so shared a common ancestor in the very recent past. This would not be surprising if they were 'normal' Bacteria, able to live in common environments such as soil or seawater, but Haloquadratum has very delicate cells. They do not form spores, lyse in fresh water, and only live in waters with saturating salt concentrations (10x that of seawater). Hypersaline lakes and saltern crystallizer ponds are isolated bodies of water, often separated by considerable distances. How could Haloquadratum move so quickly over such enormous distances and survive? Two likely means of global dispersal would be migratory birds, which often feed and nest in coastal salterns, or minute salt crystals carried on air currents. Somebody half-jokingly suggested to me that haloarchaeal researchers may be to blame, but we do not sample in one country and fly directly to another to contaminate the other pond, whereas the many thousands of migratory birds essentially do this every year. Recently, it has been shown that Haloquadratum can be preserved inside salt crystals for more than 2 years (Gramain et al. 2011), greatly increasing the possibility of natural dispersal over large distances, either by wind or in salt deposits on the exterior surfaces of birds.
Much of the work up to this point I presented at a departmental retreat at the MPI's very own castle, Schloss Ringberg, at the end of 2008. This was a very memorable place, and I was honoured by being allocated the master bedroom.
A global salt pond: The strain-specific sequences of the two genomes provided even more evidence for a rapid dispersal between Spain and Australia. CRISPRs are hypervariable regions of prokaryotic chromosomes that collect short sequences from foreign DNA elements (viruses and plasmids) and use them to defend against future invasions by the same elements. A CRISPR sequence in the Australian Haloquadratum strain exactly matched part of an endonuclease gene found only in the Spanish strain. The most likely explanation for this correspondence is that the Australian strain had, at some time in the recent past, been invaded by DNA containing the endonuclease gene, and had sampled some of it to integrate into its CRISPR region. This means that DNA like that of the Spanish strain is commonly found in Australia, probably in other strains of Haloquadratum. In the other direction, three CRISPR sequences of the Spanish strain closely matched protein-encoding genes found only in the Australian strain. A similar but inverse scenario applies here, implying that DNA like that of the Australian strain is found in Spain. There were also CRISPR sequences of the Australian strain that matched virus DNA (metavirome) sequences from the Santa Pola saltern in Spain, showing that the same viruses were present in both Spanish and Australian salt lakes. Finally, a 15 kb environmental DNA sequence recovered from the Spanish saltern (but not present in the sequenced Spanish Hqr. walsbyi strain HBSQ001) exactly matched a region of the C23T genome, again indicating that other strains of Hqr. walsbyi in Spain carried sequences almost identical to strain C23T. In summary, the two strains are seeing the same DNA sequences around them, both in other Haloquadratum and in haloviruses.
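The idea of matching CRISPR sequences from one strain against the DNA of another can be shown with a toy sketch. It is not the analysis from the paper: it only finds exact matches (the Spanish-strain examples above were close rather than exact matches, which would need an alignment tool), the function names are invented for the illustration, and the sequences are made up. Both strands are searched, because a sampled sequence can sit in either orientation.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    table = str.maketrans("ACGT", "TGCA")
    return seq.upper().translate(table)[::-1]


def find_spacer_hits(spacers, target):
    """Find exact matches of CRISPR spacers in a target sequence.

    spacers: dict mapping a spacer name to its sequence.
    Returns a list of (name, strand, position) tuples, where position is
    the 0-based start of the match on the forward strand of the target.
    A palindromic spacer would be reported on both strands.
    """
    target = target.upper()
    hits = []
    for name, spacer in spacers.items():
        forward = spacer.upper()
        queries = (("+", forward), ("-", reverse_complement(forward)))
        for strand, query in queries:
            pos = target.find(query)
            while pos != -1:
                hits.append((name, strand, pos))
                pos = target.find(query, pos + 1)
    return hits


if __name__ == "__main__":
    # Made-up sequences, for illustration only.
    other_strain_dna = "GGGACGTTCAAGTCCCTTT"
    spacers = {"sp1": "ACGTTCAAG", "sp2": "AAAGGGACT"}
    for hit in find_spacer_hits(spacers, other_strain_dna):
        print(hit)
```

A hit only says that the spacer and the target share a sequence; the direction of the inference (who sampled whom) comes from knowing which strain carries the spacer in its CRISPR region.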
From the viewpoint of Haloquadratum, it is as if there is little difference between living in Spain or Australia, as the cells and viruses they interact with at the two locations are much the same. It is like they live in the same pond: a global salt pond. This would be essentially true if dispersal is very rapid and world-wide, homogenising the global population by rapid transfers of viable cells (and viruses). But even assuming a rapid and regular cross-inoculation of salt lakes, it is still curious to me that this one species is so dominant. Hqr. walsbyi appears to be the only species of the genus, yet it is found in salt lakes around the world, and often makes up to 80% or more of the total microbial population. By any measure, this is amazingly successful. How is it that this single species, with relatively little divergence between strains, is so comprehensively dominant? It is not that there is no competition: many other types of haloarchaea live in salt lakes, and the most common of these have many described species (e.g. Haloarcula, Halorubrum). What is so special about Haloquadratum?
The big differences: Genomic Islands (GI) and Divergent Regions (DR): While 84% of the two genome sequences (strains HBSQ001 and C23T) were highly similar and retained the same gene order (synteny), the remaining 16% differed between the two. These differences were not evenly spread around the genomes but formed twelve distinct regions named, for want of better words, divergent regions (DR). Four of these corresponded to hypervariable regions, so-called Genomic Islands, that had been previously identified by Cuadros-Orellana et al. (2007) when they compared the Spanish Hqr. walsbyi genome to Haloquadratum-like environmental sequences recovered from the Santa Pola saltern, the same place from which strain HBSQ001 was isolated. The present study differs in that we compared two entire genomes from isolates that had been recovered from ponds separated by over 12,000 km. The Divergent Regions are a mix of indels (a region has been lost or gained by one strain compared to the other), true divergence (e.g. the same gene has diverged in situ in the genome), and replacements (where two different sequences occupy the same position in the shared sequence of the two strains). Replacement DRs are intriguing because the mechanism underlying the precise swapping of genetic material is not clear. The most dramatic example is a large, 104 kb sequence in strain HBSQ001 that is replaced by a tiny 429 bp stretch of unrelated sequence in strain C23T; yet they sit between the same bases in the shared or 'backbone' sequence. Replacements show no obvious border sequences that indicate how this process of switching could occur. A puzzle for future study.
Repeat-Mediated Deletions: small repeats cause deletions: Indels (insertions or deletions) were a common difference between the strains, and in many cases it could be seen that these represented deletion events rather than insertions. One example is the removal of part of the coding sequence of a widely conserved protein gene. Looking at the original sequence (in the undeleted strain) consistently showed small repeat sequences at the termini, usually < 20 bp long, and some probably down to 4 bp. In the deletion strain there was only one residual copy of this repeat, consistent with a "micro-patch" homologous recombination event between the direct repeats. The deletions range in size from 10 bp up to 34 kb! Classical homologous recombination shows a lower limit of about 50 bp, and becomes much more efficient above this, but the evidence from Haloquadratum indicates there is some other mechanism that operates at much shorter repeat lengths, well below the 50 bp limit. Once Friedhelm detected this in Haloquadratum, he could find similar 'repeat-mediated deletions' in Hbt. salinarum - so it may be a common and widespread phenomenon that has been previously unrecognised. This was one of the unanticipated outcomes of the genome comparison, and was a consequence of just the right amount of variation between the two strains plus a lot of hard work over a long time.
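The signature of such a deletion (a direct repeat at both ends of the lost segment, with one copy left behind) can be illustrated with a short sketch. It is not the procedure used in the study, the function names are hypothetical, and the sequences are invented. Given the same region from the undeleted and the deleted strain, it compares how far the two sequences agree from the left and from the right. If the left and right agreements overlap, the deletion could have happened at any position within that overlap, and the overlap is exactly the surviving copy of the repeat.

```python
def common_prefix_len(a, b):
    """Length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def find_repeat_mediated_deletion(intact, deleted):
    """Locate a single clean deletion and the direct repeat flanking it.

    Returns (start, length, repeat), where intact[start:start + length] is
    the leftmost placement of the deleted segment and repeat is the direct
    repeat found at both termini (an empty string if there is none).
    Returns None if the difference is not a single clean deletion.
    """
    length = len(intact) - len(deleted)
    if length <= 0:
        return None
    prefix = common_prefix_len(intact, deleted)
    suffix = common_prefix_len(intact[::-1], deleted[::-1])
    if prefix + suffix < len(deleted):
        return None  # the flanks do not cover the deleted-strain sequence
    # Ambiguity in where the deletion can be placed equals the repeat length.
    repeat_len = prefix + suffix - len(deleted)
    start = len(deleted) - suffix
    return start, length, intact[start:start + repeat_len]


if __name__ == "__main__":
    # Toy layout: left flank + repeat + middle + repeat + right flank.
    intact = "AAGC" + "TTAGG" + "GGGATCCA" + "TTAGG" + "ACGA"
    deleted = "AAGC" + "TTAGG" + "ACGA"
    print(find_repeat_mediated_deletion(intact, deleted))
```

In the toy case one repeat copy plus the middle segment (13 bp) is lost and a single TTAGG remains, which is the pattern described above. Real comparisons are messier, because the flanking sequences of two strains are rarely identical base for base.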
An overload of parasites: mobile genetic elements (MGEs): Both strains of Haloquadratum carry about 530 mobile genetic elements ranging in size and complexity from well-known IS elements carrying transposase ORFs down to small repeat sequences (SMRs), some with inverted terminal repeats related to known IS elements (MITES), and others without (PATES). In many published genomic sequence studies, the borders of IS elements are not determined, only the transposase ORFs are annotated, and smaller 'repeat sequences' may not be annotated at all. One is always left wondering how many mobile elements are really present in a genome and what they might be doing. Because the two strains of Haloquadratum were so similar in sequence, it was possible to track the movements of all mobile elements, including those with no encoded transposase, and this allowed their true borders - their termini - to be determined. The process was painstakingly slow for the smaller, less conserved elements with fuzzy termini, but the results provided incredible detail and uncovered two significant processes. One was the targeting of certain elements by other mobile genetic elements, and although this has been documented in other microbes, the extent and specificity with which this was occurring in Haloquadratum was only able to be realised once all the MGEs had been identified and their borders mapped. Not an easy task.
A parasite detox system? The second process was the frequent occurrence of internal deletions occurring within MGEs, removing their transposase gene and so crippling them unless transposase is supplied in trans from a complete copy. These deletions could be seen to occur by the repeat-mediated deletion mechanism described above, and the high frequency of these events in MGEs led to the suggestion that such deletions may act as a defence system for the host, allowing potentially dangerous IS elements to be inactivated. Even more attractive for the cell would be the complete removal of transposons, leaving not a trace. This would be possible via the small repeats they make upon insertion: the target-site duplications. The strange IS605-type transposons and probable derivatives, PATES, could also be removed via their near terminal short repeats. It looks like a continual battle between inserting parasitic DNA and a cellular weeding system that tries to pull them out again, as cleanly as possible.
End of an Era: The Oesterhelt department (Membrane Biochemistry) closed at the end of 2010, two years later than was originally expected. This was my home for three incredible years. The Haloquadratum genome paper was written and submitted before I left Germany, and my family and I flew out of Munich in early January 2011. From an icy winter we went back to the heat of Melbourne, and not long afterwards we drove north up to Wagga Wagga in NSW, which was even hotter, and more humid. The manuscript was accepted for publication in PLoS ONE, and published on June 20, 2011. It is the longest journal paper I have ever been an author on, and it is just as well there are now electronic journals that have no page limits. Thanks to Dieter and everyone in the department for making it such a friendly and productive time.
Some memorable and motivational music to end: Alles wird gut (by Bushido, a famous German rapper).