University of Liverpool
Functional and Comparative Genomics
So Neil…what do you consider yourself in this field?
“Biologists would call me a Bioinformatician…
I call myself a Geneticist…
Bioinformaticians would call me…dangerous.”
So I always enjoy just being in the room with Neil Hall; you never know what he’s going to say next. He has that sort of perfect subtle humor that has you laughing immediately over three words he’s just said off-handedly.
He chatted with me over dinner at the meat dungeon which was quite a feat at times because of all the excited conversation happening around us. But I managed to ask him about his research and learn a few surprising things in the process…
So Neil is a biologist by training and considers himself a geneticist. Around 1997, though, the government institute he was working for went through an employee ‘reduction’ phase and gave scientists like Neil re-training budgets which they could use to place themselves elsewhere. Neil decided to get a master’s that taught him computing and programming (in Perl!…ya! Perl has yet another supporter), and ended up taking a position at the Sanger Institute, which had an incredible amount of resources.
Neil has studied a variety of organisms including Plasmodium, Trypanosomes, Entamoeba, Arcobacter butzleri, Salmonella and Wheat. Yes, wheat: a publication that resulted from a carshare (carpooling for you Americans) that he had with a plant scientist. What do you get when you put a plant scientist and a geneticist in a car commuting for multiple days on end? A grant proposal on sequencing the genome of wheat and a Nature publication… Don’t ever underestimate the power of carpooling (carsharing).
Today we explore his work on Plasmodium, specifically his work on Plasmodium chabaudi and Plasmodium berghei which are model malaria species in mice.
Hall et al., 2005. A Comprehensive survey of the Plasmodium life cycle by genomic, transcriptomic, and proteomic analyses. Science. 307: 82.
“Rodent malaria parasite species provide model systems that allow issues to be addressed that are impossible with the human-infectious species Plasmodium falciparum and P. vivax. Three closely related species, P. chabaudi, P. yoelii, and P. berghei, are in common use in the laboratory. Comparative sequencing and analysis of the genomes of such model species, in addition to the complete genome sequence of P. falciparum, provide insights into the evolution of Plasmodium genes and gene families.”
The parasite alternates between morphologically related invasive stages and replicative stages, with a single phase of sexual development that mediates transmission between the human host and mosquito vector. They sought to integrate genomic sequence analysis with transcriptome and (old school) proteomics data to categorize protein expression, analyze regulatory mechanisms of gene expression, and identify species-specific gene families and genes under selective pressure.
What they did:
- Partial shotgun sequencing of the genomes of P. chabaudi and P. berghei.
- Inferred orthologous genes using bidirectional BLAST searching
- Manually examined orthology of gene models…basically the genes were identified based on the P. falciparum genome. They aligned the sequences of 50-100 genes and funneled those known ‘real’ genes into an algorithm so the algorithm could ‘learn’ or rather ‘build’ what it expected a real gene to ‘look like’ based on the known P. falciparum genome. From there they could feed it the data they had and it would predict genes.
“It was imperfect. Back then, sequencing was heinously expensive so you got your data and basically just muscled it, you got ‘in the zone’ and just made it happen. There were times I had 50-100 genome browsers just open and you had to painstakingly go through each one. Back then there were offices filled with people whose job it was to click on a gene and determine manually if that was a gene or not (dramatic pause) *sigh* –hypothetical protein, and then they’d move on. I remember thinking while we were all doing this analysis, the slow painful way, that ‘there’s got to be a better way!’ Now we have programs like Prokka or MAKER that basically do the algorithms we did in our heads back then. I still look at alignments sometimes and can tell splicing sites just by looking at them…we called it ‘Using the Force.’”
Don’t get too excited though…Neil is unwilling to open up the NeilSpliceCaller algorithm (NSCalg) and come over to your desk and predict your genes from NGS data alignments…aw shucks.
- Calculated dN/dS across the genome between orthologs of P. chabaudi and P. berghei
- Measured codon volatility
- Transcriptomics from three time points during the young and mature trophozoites and 2 time points from the immature and mature schizonts as well as from purified immature and mature gametocytes.
- Proteomic data were analyzed using multidimensional protein identification technology (MudPIT)
Ya…I was wondering what that was too…Pub citation: Washburn, MP et al., 2001. Large scale analysis of the yeast proteome by multidimensional protein identification technology. Nature Biotechnol (It uses an algorithm called SEQUEST).
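For the curious, the ‘bidirectional BLAST’ step in the list above is usually done as a reciprocal-best-hit search: gene A in one species and gene B in the other are called orthologs only if each is the other’s top hit. Here is a minimal sketch of that idea in Python. The gene IDs and bit scores are made up for illustration, and the post does not say which BLAST settings or cutoffs the paper used, so treat this as a toy under those assumptions and not as the authors’ pipeline.

```python
from typing import Dict, Iterable, List, Tuple

# One BLAST hit reduced to (query_id, subject_id, bitscore).
# In BLAST tabular output these are columns 1, 2 and 12.
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query gene to its single highest-scoring subject gene."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit],
                         b_vs_a: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Return (a_gene, b_gene) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Hypothetical hits, invented for this example only.
chabaudi_vs_berghei = [
    ("PC_gene1", "PB_gene1", 950.0),
    ("PC_gene1", "PB_gene7", 120.0),
    ("PC_gene2", "PB_gene2", 400.0),
    ("PC_gene3", "PB_gene2", 380.0),
]
berghei_vs_chabaudi = [
    ("PB_gene1", "PC_gene1", 940.0),
    ("PB_gene2", "PC_gene2", 410.0),
    ("PB_gene2", "PC_gene3", 370.0),
    ("PB_gene7", "PC_gene9", 300.0),
]

orthologs = reciprocal_best_hits(chabaudi_vs_berghei, berghei_vs_chabaudi)
print(orthologs)
```

Note that PC_gene3 drops out: its best hit is PB_gene2, but PB_gene2 prefers PC_gene2. Cases like that are exactly the ones somebody had to go and look at by hand.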
What they got:
- Assemblies of ~17 and ~18 Mb for P. chabaudi and P. berghei, respectively.
- 4391 genes predicted from the combination of all three rodent parasites (P. chabaudi, P. yoelii, and P. berghei)
- The genes they found/predicted were distributed across the ‘core’ regions of the 14 chromosomes (per the P. falciparum reference genome)
- Using the force resulted in identification of 109 additional orthologs.
- Orthologous gene pairs were under purifying selection
- The genes annotated with the highest dN/dS include many expected to play a role in host-parasite interactions
- 1836 proteins were detected using the multidimensional analysis, 136 expressed in at least four of the five stages analyzed.
- Enzymes of the tricarboxylic acid cycle, oxidative phosphorylation, and many other mitochondrial proteins were up-regulated in the gametocyte when compared to the asexual blood stages and were even more abundant in the ookinete
- There was clear evidence of stage-specific expression of different members of the protein families detected.
- Detected 472 proteins in replicative stages (blood and oocysts)
- Over half of the proteins detected in the proteome analysis were detected in only one stage, suggesting stage-specific specialization is substantial; however, many of these are also strategy-specific and in many cases invasion-specific.
“Often we would detect transcripts in the transcriptome that we couldn’t find in the proteome analysis. What turned out to be happening is that transcripts would be built up in the stage prior to host transfer and not turned into protein until the next stage after they’d transferred hosts. If you think about it, it’s in the parasite’s best interest, as colonizing a new host and ramping up takes quite a bit of time, so by having transcripts stored up for the next (invasion) stage they could immediately make protein and continue on their way. This is supported in the posttranscriptional gene silencing…where transcripts are produced only ‘when needed’”
In the paper they call this the ‘transcripts on the go’ model.
All in all…they found that posttranscriptional gene silencing via translational repression is a major mechanism regulating gene expression in Plasmodium. The study provided insights into genome evolution, expression of multigene families, and mechanisms of posttranscriptional gene regulation. This work is important as a model system for studying the orthologous features of human malaria parasites.
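A quick aside on the dN/dS results listed under ‘What they got’: dN is the rate of nonsynonymous (amino-acid-changing) substitutions and dS the rate of synonymous ones, so a ratio well below 1 points to purifying selection, a ratio near 1 to neutral evolution, and a ratio above 1 to positive selection. The sketch below only shows how that ratio is read once dN and dS have been estimated. The input numbers and the `tolerance` band around 1 are my own illustrative assumptions, not values from the paper.

```python
from typing import Tuple


def classify_selection(dn: float, ds: float,
                       tolerance: float = 0.1) -> Tuple[float, str]:
    """Return (dN/dS, label) for one ortholog pair.

    `tolerance` is an arbitrary band around 1.0 chosen for this example;
    real analyses use statistical tests, not a fixed cutoff.
    """
    if ds <= 0:
        raise ValueError("dS must be positive to compute dN/dS")
    omega = dn / ds
    if omega < 1.0 - tolerance:
        label = "purifying"
    elif omega > 1.0 + tolerance:
        label = "positive"
    else:
        label = "neutral"
    return omega, label


# Hypothetical estimates for two ortholog pairs.
housekeeping_pair = classify_selection(dn=0.1, ds=0.4)
surface_antigen_pair = classify_selection(dn=0.6, ds=0.3)
print(housekeeping_pair[1], surface_antigen_pair[1])
```

In this toy example the first pair comes out as purifying and the second as positive, which mirrors the pattern reported above: most ortholog pairs are conserved, while the highest ratios show up in genes expected to interact with the host.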
At one point during our conversation Anders came over and went to grab the beer Neil had just ordered, asking if it was his (as he was still waiting for his own). Neil protectively grabbed the beer, saying ‘Careful!’ –Never part a Brit from his beer.
So you’ve worked on a variety of organisms as we all know…which one is your favorite?
- “It would have to be the slime mold Dictyostelium; they have to be just about the coolest organism around. They are capable of being unicellular or multicellular, and when the cells join up and move they look like a slug. And you can’t do anything with it! It doesn’t culture, can’t store it and it’s one of the most neglected human pathogens in my opinion.” Here’s a super awesome video I found depicting this behavior:
Do you program?
- “Badly, yes, I can hack away at Python and do some Perl but I would by no means call myself a programmer.”
Do you have any advice for the biologists attempting to ‘be’ computer scientists as well to accomplish the research in their field?
- “I tried being a ‘hard core programmer’ but realized in the end that it just wasn’t me. I am a biologist, I am a geneticist. So while I think it is important to know and understand that side of your research, to be capable if you need to do it, to have a healthy respect and appreciation for it…I think it’s also important for you to remember: ‘You are a Biologist‘ You don’t have to be a computer scientist but you do need to be able to be independent and competent and able to speak other ‘languages’ if necessary to accomplish your research and work with collaborators.”
3 words or a small phrase to sum up your feelings about your field, comparative genomics/genetics?
“Technologically Advanced Guesswork!“
Who was the driving influence or inspiration in your development as a scientist?
- “Bart Barrell who appears prominently in the history of the Sanger Institute. He worked with Fred Sanger and they sequenced the first PhiX 174 genome. He was a very knowledgeable man who was more than an administrator and grant writer, he did his own research and he was inspiring.”
Tangent from Neil…
“So I’d just been hired by the Sanger Institute and had just finished all their training modules about building security and whatnot and I saw this man wandering down the halls, he was an old man and I was wondering is he lost? Should I call security? And I almost did, thankfully I was in my boss’s office at the time and he had a picture of Fred Sanger in prominent view which I saw and promptly made the connection. But yes…for a moment there I almost called security on Fred Sanger…at the Sanger Institute.”
So there you have it! Our Neil Hall…sequencer of genomes, friend to parasites the world over and the occasional wheat plant with an amazing breadth of knowledge; we are truly excited to have him here speaking at the workshop!
Any parting words Neil?
“Um…well, I do all this and I’ve got four children; oh and I tweet…my daughter follows me on twitter”
The Pacbio appears to be eating the engineer pic.twitter.com/M61rwbwkyY
— Neil Hall (@neilhall_uk) January 14, 2014
Well done Neil, well done.