Sex, Drugs and Genomics with Neil Hall…

Final session of Workshop on Genomics 2014, and it started well…with a black screen and a finicky projector. Story via Storify! (PS. this was my first attempt at using Storify…quite fun actually):

You think Daniel was kidding with the whole Sex, Drugs and Genomics…he wasn’t. Neil was worried about his talk being the last talk of the workshop: it was after lunch and people were tired, so he decided to helpfully sum up the ending sections of his talk in case they nodded off…


To learn more about the interpretive dance of Neil Hall and what happened during his first week at the Sanger Institute check out his faculty highlight.

Neil Hall
University of Liverpool

Topic: Genomics of Pathogens: Positive Selection and Disruptive Technology

First off, Neil, as appreciative as he was of Chris’ talk, which came right before his, felt the need to end the workshop on less of a gloom and doom note, as he put it, and sought to give us a light at the end of the tunnel. Additionally, he said that everything Chris mentioned…his group has done at some point, so next time Chris should get it right and mention their work, because if someone did something wrong, his lab did it wrong first.

Already we could tell this was going to be a good session…

He mentions Anthony P. Fejes’ blog, which is quite good, and in particular the post What is a Bioinformatician…where Anthony had his own take. Neil also showed us where he lay on the scheme of bioinformaticians…




He mentions that nowadays all sequencing experiments have the same general workflow:



And now science using sequencing can be hypothesis driven.

The Red Queen Hypothesis:

“Now, here you see,” said the Red Queen to Alice, “it takes all the running you can do to keep in the same place.” ~Lewis Carroll, Through the Looking Glass

In terms of disease and evolution we can talk about polymorphism vs divergence. Polymorphism within a species can be used to generate null hypotheses for divergence between species (i.e. Dn/Ds = Pn/Ps; the McDonald-Kreitman test). Theoretically, positively selected variants should spread rapidly and become fixed, leading to Dn/Ds > Pn/Ps, whereas under purifying selection you would have Dn/Ds < Pn/Ps.
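To make the comparison concrete, here is a minimal Python sketch of a McDonald-Kreitman style calculation. The counts are made up for illustration and do not come from Neil's talk, and the function names (mk_test, fisher_two_sided) are hypothetical. It computes the two ratios, the neutrality index (Pn/Ps divided by Dn/Ds) and a two-sided Fisher's exact p-value for the 2x2 table.

```python
from math import comb

def mk_test(dn, ds, pn, ps):
    """Return Dn/Ds, Pn/Ps, the neutrality index and alpha = 1 - NI."""
    div_ratio = dn / ds            # fixed differences between species
    poly_ratio = pn / ps           # polymorphisms within a species
    neutrality_index = poly_ratio / div_ratio
    alpha = 1 - neutrality_index   # rough share of adaptive substitutions
    return div_ratio, poly_ratio, neutrality_index, alpha

def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

# Illustrative counts only (an assumption, not data from the talk).
dn, ds, pn, ps = 7, 17, 2, 42
div_ratio, poly_ratio, ni, alpha = mk_test(dn, ds, pn, ps)
p_value = fisher_two_sided(dn, ds, pn, ps)
verdict = "positive" if div_ratio > poly_ratio else "purifying or neutral"
print(f"Dn/Ds={div_ratio:.3f} Pn/Ps={poly_ratio:.3f} NI={ni:.3f} "
      f"p={p_value:.4f} -> {verdict}")
```

With these toy counts Dn/Ds exceeds Pn/Ps, which is the pattern expected under positive selection; real analyses would also need to worry about slightly deleterious polymorphisms and demography.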

When you look at Pseudomonas and phage, they will co-evolve over time if you leave them alone long enough in culture, because the phage adapts to the host and the host then adapts to the phage, leading to this back and forth evolutionary dance.

With the advent of next gen sequencing you can now sequence replicate populations to high coverage, especially for bacteria and viruses, which have genomes orders of magnitude smaller than those of eukaryotes. You can then measure the frequencies of variants that arise by mutation and are subsequently selected, and you can ask:

  • What is the rate of molecular evolution with and without the ‘Red Queen’?
  • Diversity and divergence between populations?
  • What genes might be under selection?

In the work of Patternson et al., 2012 looking at hosts and phage, they found that replicates followed a similar trajectory away from the ancestral sequence as they adapted to laboratory conditions. Replicates evolved similarly within a treatment but differently in different treatments. They also showed independent evolution within each replicate and a higher rate in the co-evolved than in the evolved treatment. They found positive selection to be stronger in the co-evolved phage, and co-evolved phage show twice the genetic distance from the ancestor. They also found that phage attachment proteins in the tail contain the most diversity within and between populations, and suggest that there is fluctuating selection within populations.

Onto Malaria…

Neil decided to go into some background about NGS and the study of malaria as everyone looked young enough to perhaps not remember the days before next gen sequencing.

  • 300-500 million cases, 1.5-2 million deaths/yr…basically malaria kills a child every 40 seconds.
  • Resistance to chloroquine is present in almost all endemic countries and resistance is developing to most new antimalarials being used now
  • There is no practical vaccine available
  • Caused by the parasite Plasmodium spp. which is a eukaryotic single celled organism with a complex life cycle.
  • The challenge with the parasite is that it keeps changing to evade the human immune response and so persists for a long time.
  • Cytoadherence and antigenic variation are crucial to pathogenicity in P. falciparum
  • The parasite expresses proteins that are exported to the red blood cell surface (PfEMP, encoded by var genes); they then mediate interaction with host endothelial cells, causing cytoadherence. This cytoadherence causes cerebral malaria and can lead to death
  • Var and Rif genes are also antigenic, causing an immune response (essentially pricking the immune system), and these genes can change antigenically = antigenic variation.
Before the days of Next Gen Sequencing

Back then it took a small army to get a genome sequenced.

see all the colored bars? Each one curated by a human being in an exhaustive process.

This particular genome was a collaboration between the Sanger Institute, TIGR, Stanford, Oxford and NMRI, each institution taking a chromosome and sequencing it. They used YAC cloning and shotgun sequencing, then aligned the contigs and closed gaps using PCR/primer walking. A painstaking process compared to what we can do today. Additionally, every gene had to be manually curated and verified, so there were cubicles of people at these institutes whose job it was to check ORF calls, examine them against PFAM databases and annotate the gene. They ended up with a smaller genome than they expected…22,853,764 bp; it was AT rich, 52% was coding sequence, it had a minimal set of tRNAs, 90% of the genes had not been previously described and 50% had no homologues. They were able to iron out chromosome structure as well as adhesive domains. They found that all the genes involved in antigenicity and adherence were concentrated around the telomeres of the chromosomes. Why?

  • Telomeres are hot spots for recombination
  • Telomeres co-localize in the nucleus and non-homologous recombination can occur
  • Lots of gene conversion can occur + generation of new genes.

Also of interest is that P. falciparum has fewer transcription factors than you might expect given its genome size; either they are highly diverged or other mechanisms play a major role in regulation of gene expression. Why would so few regulators be needed?

targets of drug therapies, vaccines

Well, Plasmodium does have a predetermined life cycle, it does not have nutritional choices and it lives in a defined, stable and nutrient rich environment; it’s not competing with other species, and though its environments change as it switches hosts, they occur sequentially, always in the same order, so the parasite knows what is coming next.

They believed malaria evolved via a Pac-Man type of evolution, where a eukaryotic cell ate a cyanobacterium, leading to a chloroplast and forming a sort of ancient alga; this construct was then eaten by a bigger eukaryote, the alga formed a symbiotic relationship with the host and over time evolved into an apicoplast, DNA from the apicoplast migrated to the nucleus, and you have the modern day Plasmodium. Analysis of the apicoplast showed that the products of ~10% of nuclear genes may be imported into the apicoplast (identified via transit peptide); phylogenetic analysis of 333 putative plastid targeted proteins showed 26 plastid derived, 35 mitochondrial and 85 possibly mitochondrial.

“A range of herbicides that target plastid metabolism of undesired plants are also parasiticidal, making them potential new leads for antimalarial drugs” ~Kalanon and McFadden 2012

So what has the genome achieved?

  • A detailed understanding of the molecular basis of antigenic variation
  • A metabolic map for drug target identification
  • A list of genes targeted to the apicoplast, a known drug target.

Comparative genomics of Plasmodium:

  • Host jumping is rare
  • Most Plasmodium are highly adapted to a single host.


When we look at the genes further we have 3 types:

  • Homologs: genes that are similar due to related ancestry
  • Orthologs: Homologous genes separated due to a speciation event.
  • Paralogs: Homologous genes separated by a duplication event

For Plasmodium falciparum:

  • reciprocal orthologs were identified using BLAST cutoff of 50
  • 5268 genes identified
  • 4391 have orthologs in other species
  • Another 109 orthologs were identified using synteny
  • 736 have no orthologs
  • There was clustering of telomeric gene families and gene family expansion was species specific (mostly occurring at the telomeres). Some genes which are single copy in one species are multicopy in another and presumably have acquired new functions.
  • They found positive selection on Plasmodium genes to be stronger in the mosquito vector via dN/dS analysis.
  • They looked at trends to see if selection acted differently
  • They looked at different expressions in proteins.
  • MEME: program that IDs motifs then uses that to search back against genome to pull out more genes.
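The reciprocal ortholog step at the top of the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hit tuples are toy data, the gene names are invented, and the "cutoff of 50" is treated here as a minimum score, which the talk did not spell out. A pair is kept when each gene is the other's top-scoring hit.

```python
def best_hits(hits, min_score=50):
    """Map each query to its top-scoring subject at or above min_score."""
    best = {}
    for query, subject, score in hits:
        if score < min_score:
            continue  # below the (assumed) score cutoff
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(a_vs_b, b_vs_a, min_score=50):
    """Return sorted (a, b) pairs where a and b are each other's best hit."""
    forward = best_hits(a_vs_b, min_score)
    reverse = best_hits(b_vs_a, min_score)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)

# Toy hit tables: (query, subject, score). All names are hypothetical.
species_a_vs_b = [
    ("geneA1", "geneB1", 120),
    ("geneA1", "geneB2", 80),
    ("geneA2", "geneB2", 60),
    ("geneA3", "geneB3", 40),   # dropped: below the cutoff
]
species_b_vs_a = [
    ("geneB1", "geneA1", 118),
    ("geneB2", "geneA1", 75),   # geneB2 prefers geneA1, so geneA2 is unpaired
    ("geneB3", "geneA3", 45),
]
orthologs = reciprocal_best_hits(species_a_vs_b, species_b_vs_a)
print(orthologs)
```

Genes left unpaired by this step are the ones where other evidence, such as the synteny mentioned above, could still rescue an ortholog call.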

They also found translational control. In general transcription is relatively slow which is highly inconvenient when you change hosts so quickly. But the parasite knows what’s coming next and if you know what is coming up you can prepare for it by storing up transcripts which can be quickly converted to protein upon entering the next environment; therefore P. falciparum regulates its translation so that not all transcripts are translated as soon as they are made.


Population Genomics of Plasmodium

  • They compared 470 genomes 18 strains total from around the world. Measured total variation within species. Diversity is strongly associated with host interaction.
  • MalariaGEN Project: characterize variation in P. falciparum, establish a repository of population genomic data, 1685 samples from 25 locations in 17 countries, map phenotypes to drug resistance.
  • They found minor alleles dominate; especially in Africa, you have more low frequency alleles
  • Many derived alleles are associated with NS (nonsynonymous) mutations
  • There was a large separation between continents, separation of populations in areas of low transmission, and more structure in areas of low transmission, hence less structure in endemic areas.


  • Quantification of within host diversity showed that an inbreeding coefficient (Fws) could be estimated via heterozygosity within a host. This value is probably related to transmission rates and human population distribution, and inbreeding can lead to drug resistance.
  • To identify drug resistance loci you can use traditional genetic methods (crosses), allele selection and genome wide studies for selective sweeps. Selective sweeps are not easy to identify but you can do genetic crosses.
  • It’s also worth noting a caveat with allele selection, which you do with rodent malaria parasites: drug resistance loci in rodent malaria aren’t the same loci you get in human malaria.

  • Long range linkage disequilibrium (LD) and low diversity are a marker for drug resistance, but there is a danger in this type of analysis when it comes to population structuring.
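For the within host inbreeding coefficient (Fws) mentioned in the list above, here is a rough Python sketch of the idea: compare heterozygosity within one infection (Hw) against heterozygosity in the local population (Hs), with Fws = 1 - Hw/Hs. This is a simplification under my own assumptions; it pools heterozygosity over all SNPs rather than handling allele frequency bins, the allele frequencies are invented, and the function names are hypothetical.

```python
def heterozygosity(p):
    """Expected heterozygosity at a biallelic SNP with allele frequency p."""
    return 2 * p * (1 - p)

def fws(within_host_freqs, population_freqs):
    """Simplified Fws = 1 - Hw/Hs, pooling heterozygosity across SNPs."""
    hw = sum(heterozygosity(p) for p in within_host_freqs)
    hs = sum(heterozygosity(p) for p in population_freqs)
    return 1 - hw / hs

# Invented allele frequencies at three SNPs (assumption, not real data).
population = [0.5, 0.2, 0.3]
clonal_infection = [1.0, 0.0, 0.0]   # one genotype: no within host diversity
mixed_infection = [0.5, 0.2, 0.3]    # as diverse as the whole population

print(fws(clonal_infection, population))  # high Fws: inbred, single clone
print(fws(mixed_infection, population))   # low Fws: mixed infection
```

A value near 1 points to a single clone in the host, while a value near 0 points to a mixed infection, which is why the measure tracks transmission intensity.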


Entamoeba histolytica

Neil is a one man soap box for studies on Entamoeba:

“My background is genetics, it’s all I want to do…I want to turn this human pathogen into a model system for studying genetics.”

  • It is a very neglected tropical disease; in fact, if you look it up in Wikipedia under neglected tropical diseases…it’s not even there! So it’s time to do some genetics from scratch.
  • Simpler life cycle
  • Waterborne disease
  • 80-90% of people infected are asymptomatic…why?
  • It is a diverse genus of commensals and some parasites

Findings from the genome that was constructed:

  • Lost a lot of metabolic functions ie. mitochondria
  • Expansion in environmental sensing genes
  • Predatory life cycle has to respond to nutritional changes
  • Has unusual metabolic pathways because we can’t find things it’s supposed to have
  • Lots of HGT with bacteria, not surprising since it eats bacteria.
  • What about virulence? Some strains are more virulent than others in model infections

We hypothesize that there is an underlying genetic cause of virulence…no microsatellites and no genetic studies have been performed.

  • Sex is important: recombination of genes, generating new variants, establish into new niches, variation helps with adaptation, spread of traits for drug resistance
  • Lots of sex versus occasional sex versus no sex: every life cycle versus replicate clonally versus clonal epidemics with occasional recombination
    It's been a good meeting for Neil...
  • You can do genome resequencing of different strains but the results have been modest in comparison with other parasites because they don’t grow; they are difficult to culture. So what they did was basically take everything out of the LSTMH freezer and just sequence it…with very high bootstrap values…

Here Neil paused for a moment…he’d been at the workshop for several days and had heard Antonis talk about how you can’t trust bootstrap values, and of course Chris, who had just spoken before him, slamming all the statistical techniques and lack of validation in the field leading to good ‘stories’ which weren’t necessarily correct…to which Neil responded that if anyone did it wrong…his lab did it wrong first!

So Neil paused after this comment about bootstrap values and how excited they were to get high values…then kind of sighed and said

“It’s been a good meeting for me…” and much laughter ensued.

Back to some more Sex…

  • They did tests for recombination (4-gametes test) using 4000 high quality candidate marker sites and found evidence of historical recombination. They further investigated this at the request of a reviewer to show that it was indeed recombination versus gene conversion.
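For anyone unfamiliar with the 4-gametes test, the logic is simple enough to sketch in Python. Under an infinite sites assumption, two biallelic sites can only show all four allele combinations (00, 01, 10, 11) if recombination, or a repeat mutation, has happened between them. The haplotypes below are invented and the function name is hypothetical; this is not the pipeline Neil's group used.

```python
from itertools import combinations

def four_gamete_pairs(haplotypes):
    """Return pairs of site indices at which all four gametes are observed."""
    n_sites = len(haplotypes[0])
    incompatible = []
    for i, j in combinations(range(n_sites), 2):
        # Collect the distinct two-site allele combinations across samples.
        gametes = {(hap[i], hap[j]) for hap in haplotypes}
        if len(gametes) == 4:
            incompatible.append((i, j))
    return incompatible

# Toy haplotypes, one string per strain, '0' = reference, '1' = alternate.
strains = ["000", "010", "100", "110", "111"]
pairs = four_gamete_pairs(strains)
print(pairs)
```

Site pairs returned by the function are the ones that need a recombination event (or recurrent mutation, or gene conversion) to explain them, which is presumably why the reviewer asked for the follow-up analysis.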


  • The sources of variation in E. histolytica were SNP, gene loss/gain and copy number variation.
  • They looked at genomic plasticity: gene copy number variation which is associated with differential transcription.
  • The sources of intra-specific recombination included: recombination, CNV gene loss/gain
  • These processes affect the emergence and spread of drug resistance and virulence.

Genotyping of field samples:

  • They collected 19 amebic liver aspirates
  • Made 26 xenic cultures from 14 asymptomatic infections and 12 diarrheal infections
  • 20 E. histolytica positives were from diarrheal stool
  • They looked at 21 marker loci using Illumina sequencing
  • SNP genotypes from a single population were consistent with frequent recombination, and specific SNPs segregate with disease; for instance, EHI_080100 reference alleles lead to diarrhea/dysentery as opposed to colitis/liver abscess.

So if recombination is occurring…when is it occurring?

  • Used E. invadens as a model for encystation
  • Extracted RNA longitudinally
  • Mapped reads using TopHat
  • Examined expression profiles and clustered using Short Time-Series Expression Miner (STEM)
  • Looked at differential expression using CuffDiff
  • They found that during encystation there was downregulation of genes whereas during excystation there was upregulation, and this was confirmed using Northern analysis
  • Expression levels of genes involved in meiosis increase at 24 hrs after encystation.
  • Phospholipase D expression and activity increased during encystation as well

In Sum:

  • Genomic data support the hypothesis that recombination is occurring in Entamoeba.
  • They were able to describe the transcriptional changes during encystation and excystation in E. invadens
  • They saw strong evidence for meiosis occurring during the encystation process.
  • They identified a lipid signaling molecule that is involved in induction of encystation.

Some final thoughts:

“Genomics is cool. You don’t need to be superman to do bioinformatics you will need to fully understand your data and use/learn whatever techniques are required to do that but at the end of the day remember…you are a biologist.”

Fantastic end to a fantastic workshop…many thanks, Neil
