Our Modern Day John Snow teaches us about cool crap

JOHN SNOW, considered the father of epidemiology

So Nick Loman probably needs no introduction, at least not to his 4,919 followers on Twitter! To learn more about Nick, head over to his faculty highlight…today we will focus on his presentation.

Nick Loman
University of Birmingham

Topic: Clinical Metagenomics

So Nick started us off with a battle of definitions as to what exactly metagenomics is and is not, i.e. 16S rRNA surveys are not metagenomics, and he highlighted two blog posts saying as much…from Dr. Mick Watson and Dr. Jonathan Eisen.

When talking about 16S versus Metagenomics we can look at a few of the pros and cons.

16S offers cheaper sequencing and targets a single marker. It’s pretty much limited to bacteria, but it’s easy to analyze, and there are lots of known biases which can then be accounted for. Taxonomic assignment at the species level is problematic, though, and you can only infer function, but you can go ‘deeper’ in terms of sequencing for your money.
Metagenomics is more expensive, and the expense will depend on how deep you want to sequence and how many samples you have. In theory it can detect anything/everything, though it’s harder to analyze, may have its own biases, and is overall a shallower analysis, as you wouldn’t be able to go as deep as with a single marker. You can obtain strain-level identification, though, and functional information is directly accessible, so those are things to consider as well.

If you really get into this debate it turns into a somewhat religious experience as there are factions on both sides quite passionate about their stance.

So as with so many other talks, let’s get some pesky definitions out of the way so we are all on the same page:

Metagenome: The collection of genomes and genes from the members of a microbiota

Microbiota: The assemblage of microorganisms present in a defined environment

Microbiome: This term refers to the entire habitat, including the microorganisms, their genomes (i.e., genes) and the surrounding environmental conditions

Same page…done.

So what are we trying to accomplish with Metagenomics?

Well we are attempting to find out:

Who is there? This would involve taxonomic assignment.
What are they doing? Functional analysis that also encompasses ‘what are they capable of doing?’
Who’s doing what (and to whom)? Genomic reconstruction.

Functional signatures are not the same as taxonomic signatures (as Rob Knight showed so nicely in his talk).

So let’s jump into some clinical microbiology so you can appreciate how much crap goes into processing, culturing, analyzing and identifying samples (no pun intended…)

…who am I kidding, pun totally intended.

and compare it with the ‘new’ digital version of clinical microbiology and identification

So let’s jump into a story of Pseudomonas aeruginosa and burn patients…or rather burn treatment centers within hospitals.

Gram negative bacterium
Opportunistic pathogen in burns patients
Infection can lead to graft breakdown and sepsis
Isolated from 30% of burns patients
The sources of infection include endogenous, cross infection and the one we’ll focus on, water.
Several outbreaks have been linked to water and this causes a problem for burn patients in particular because an integral part of their treatment is showering.
So they set up a surveillance study to determine the relative contribution of endogenous sources, cross infection and water to infections of burn patients, and they used whole genome sequencing to infer links between positive isolates.
They found 3 main sequence types, one of which was affiliated with the water samples.

This is where Nick goes all John Snow on us…they mapped the piping in the Burn Unit rooms (BCU) with the sequences found within ST395

They used a metagenomics protocol with 1 MiSeq run and got the Pseudomonas genome back at 5x coverage. They followed up with assembly and a SNP calling pipeline: they built a tree with FastTree, called variants from the low coverage data and placed those variants on the reference tree with the pplacer program.
In sum…it was pretty cool
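To give a feel for why low coverage can still be enough when you already know which SNP sites matter, here is a minimal Python sketch. To be clear, this is not the pipeline from the talk and it is not what pplacer does: the panel of reference SNPs, the isolate names, the positions and the pileup are all made up for illustration, and the "placement" is just a mismatch count against each reference profile.

```python
from collections import Counter

# Hypothetical panel: known SNP position -> allele for each reference isolate.
REFERENCE_SNPS = {
    "isolate_A": {101: "A", 250: "G", 900: "T"},
    "isolate_B": {101: "A", 250: "C", 900: "T"},
    "isolate_C": {101: "G", 250: "C", 900: "C"},
}


def call_known_sites(pileup, min_depth=1):
    """Take the majority base at each position covered by >= min_depth reads."""
    calls = {}
    for pos, bases in pileup.items():
        if len(bases) >= min_depth:
            calls[pos] = Counter(bases).most_common(1)[0][0]
    return calls


def closest_reference(calls, panel=REFERENCE_SNPS):
    """Rank reference isolates by mismatches over the sites that were called."""
    scores = []
    for name, alleles in panel.items():
        shared = [p for p in calls if p in alleles]
        mismatches = sum(calls[p] != alleles[p] for p in shared)
        # Fewer mismatches first; more shared sites breaks ties.
        scores.append((mismatches, -len(shared), name))
    return sorted(scores)


# Made-up pileup: position -> bases seen in the handful of reads covering it.
pileup = {101: "AA", 250: "C", 900: "TTT"}
calls = call_known_sites(pileup)
best = closest_reference(calls)[0]
print(best[2], "mismatches:", best[0])
```

Even with one to three reads per site, the sample only has to be compared against a short list of known alleles, which is a much easier problem than discovering variants from scratch.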

Nick followed up with another study on the German E. coli outbreak of 2011; however, I am not going to go into exhaustive detail as this paper was covered in his faculty highlight, so head over there for the rundown. What I will mention, though, is the amount of crowd sourcing that went into diagnosis and analysis of this outbreak, and Nick highlights that nicely in his presentation.

Nick posted a blog linking to Ion Torrent data and asked for help crowd sourcing the analysis of it.
Mike the Mad Biologist followed up with his own blog post on the outbreak:

And…this was followed up by a compilation of blog posts about the outbreak which are listed in Nick’s presentation.

From here, the students had a lot of questions with respect to metagenomics, and Nick structured the rest of his talk in an effort to address all those questions, so I will structure the download the same way…

First of all here’s the workflow:

What are the best ways to address getting representation of bacteria, viruses, fungi and others? Techniques for doing so?

First of all, human contamination is a real issue. But how ‘big’ of an issue it is depends on your technique. For instance, depending on the consistency of your stool you might actually have fairly low human contamination in the sequencing data…

Programmatically you can also do a series of ‘filtering’ steps similar to what they did in the E. coli study to assist in narrowing down your population of interest.
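As a rough idea of what one such filtering step could look like, here is a toy Python sketch that drops reads sharing too many k-mers with a host reference. This is an assumption on my part, not the method used in the E. coli study: real pipelines typically align reads to the human genome with a proper aligner, and the function names, the tiny sequences, the k-mer size and the threshold below are all hypothetical. It also ignores reverse complements to keep things short.

```python
def kmers(seq, k):
    """Return the set of k-mers in a sequence (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def host_fraction(read, host_kmers, k):
    """Fraction of the read's k-mers that also occur in the host reference."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return 0.0  # read shorter than k: nothing to compare
    return len(read_kmers & host_kmers) / len(read_kmers)


def filter_host_reads(reads, host_reference, k=8, max_host_fraction=0.5):
    """Keep reads whose host k-mer fraction is at or below the threshold."""
    host_kmers = kmers(host_reference, k)
    return [r for r in reads
            if host_fraction(r, host_kmers, k) <= max_host_fraction]


# Made-up "host" sequence and two reads: one copied from the host, one not.
host_reference = "ACGTACGTTAGCCGATAGCTAGGCTA"
reads = [host_reference[2:18], "TTTTGGGGCCCCAAAATTTTGG"]
kept = filter_host_reads(reads, host_reference)
print(len(kept), "of", len(reads), "reads kept")
```

The threshold is the knob to think about: too strict and you throw away microbial reads that happen to look a bit human, too loose and the host reads swamp everything downstream.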

One thing to also look out for is low input DNA, because it is possible to sequence the ‘KitOme’ rather than the organisms of interest. They found this when they were attempting to understand the metagenomic implications of chronic osteomyelitis.

So contamination is a very real possibility; consider your sample collection, DNA extraction, PCR reagents and possible lab contamination. Some of the ways you can control for contamination are techniques that either enrich your sample or deplete the ‘background’ that you are not interested in. There are kits that do this, but each one has its pros and cons.

What are the best analysis pipelines for full viral sequencing to detect whether mutations are true or not? Comparing closely related taxa?

As an initial approach, should one try 16S sequencing prior to shotgun sequencing if interested in bacteria (or 18S/ITS1 for fungi)? Which region?
Shotgun metagenomics versus single cell genomics – for high diversity samples is a shift toward single cell beneficial?

My general feeling is 16S initial surveys are fine but if you have the resources and want to try metagenomics, I say give it a go!

So what about bacterial single cell genomics? This is still very new and there are several considerations you need to think about when doing this type of experiment/analysis: your samples, your replicates, extraction methods, background contamination, controls, whether you want to work in longitudinal or cross-sectional ‘space’, what sequencing technology you want to use (which in turn will affect your read length), and of course how many reads will work for the analysis. ALL of this will depend on the goal or question you are attempting to answer in your research.

Any expertise in microbial or viral single cell genomics? Software suggestions for assembling viral genomes and large scale microbial genome comparison?

Metatranscriptomics versus metagenomics?

Be sure to think about whether you are after functional assignments or taxonomy, because the suite of techniques is going to change depending on what exactly you want to look at. For instance…read-based, environmental gene tag or contig-based (from assembly), pathway-based, genome-based (from a great assembly)…do you want to look at similarity, phylogeny or composition?

Programs we use: Ray, Velvet, IDBA-UD

Benefits/disadvantages of each?

In terms of similarity methods:

Choice of aligner determines speed
Can make inference from a single read
Can work in translated BLAST mode
Very sensitive
BLAST is very slow!
Specificity can be low with naïve approach
Many reads are taxonomically uninformative or taxonomically misleading
We use MEGAN in our work…
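MEGAN is best known for assigning reads with a lowest common ancestor (LCA) approach, which is one way of dealing with those taxonomically misleading reads: if a read hits several taxa about equally well, it gets pushed up the tree to the rank they share. Here is a toy Python version of that idea. It is only a sketch, not MEGAN’s actual implementation; the scores, lineages and the `top_fraction` cutoff are made up for illustration.

```python
def lowest_common_ancestor(lineages):
    """Return the deepest taxonomic prefix shared by all lineages."""
    if not lineages:
        return []
    shared = []
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break  # lineages diverge at this rank
        shared.append(ranks[0])
    return shared


def assign_read(hits, top_fraction=0.9):
    """Assign a read to the LCA of hits scoring within top_fraction of the best."""
    if not hits:
        return []
    best = max(score for score, _ in hits)
    kept = [lineage for score, lineage in hits if score >= top_fraction * best]
    return lowest_common_ancestor(kept)


# Made-up BLAST-style hits for one read: (bit score, lineage from root to species).
gamma = ["Bacteria", "Proteobacteria", "Gammaproteobacteria"]
pseudo = gamma + ["Pseudomonadales", "Pseudomonadaceae", "Pseudomonas"]
hits = [
    (200.0, pseudo + ["Pseudomonas aeruginosa"]),
    (195.0, pseudo + ["Pseudomonas fluorescens"]),
    (90.0, gamma + ["Enterobacterales", "Enterobacteriaceae",
                    "Escherichia", "Escherichia coli"]),
]
print(assign_read(hits)[-1])
print(assign_read(hits, top_fraction=0.4)[-1])
```

With the stricter cutoff only the two Pseudomonas hits survive, so the read lands at genus level; loosen the cutoff and the weak E. coli hit drags the assignment all the way up to class. That trade-off between sensitivity and specificity is exactly the ‘naïve approach’ problem in the list above.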

In terms of Phylogeny methods:

Accurate
Give measurement of uncertainty
Can use multiple conserved marker genes (~40 common)
Slow
Dependent on quality and completeness of reference tree
Prokaryo-centric
Possible programs: PhyloSift, mOTUs, MLTreeMap

In terms of Composition:

We use MetaPhlAn…it’s a hybrid similarity/phylogenetic method

Best tools for annotation?

You can call genes or ORFs using Prokka (bacteria) or MAKER (eukaryotes), both mentioned earlier in this workshop. Other ways to make functional assignments are resources like SEED, COGs, eggNOG, or KEGG.

More advanced programs like HUMAnN or LEfSe are quite useful, though HUMAnN is restricted to well…humans.

Another tool that’s been developed is called CONCOCT: Clustering contigs on coverage and composition. The workflow runs a coassembly across all samples, maps reads back to contigs to get the mean coverage of each contig in each sample, generates a k-mer frequency vector for each contig, joins the vectors and log transforms them, and then does PCA.
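To make the ‘join vectors and log transform’ part a bit more concrete, here is a small Python sketch of building one contig’s feature vector from its per-sample coverage and its k-mer counts. This is my own simplified illustration, not CONCOCT’s code: the function names, the dinucleotide k-mer size, the pseudocount and the example numbers are all assumptions, there is no normalisation, and it stops before the PCA and clustering steps.

```python
import math
from itertools import product


def kmer_frequencies(seq, k=2):
    """Count every DNA k-mer in seq, reported in a fixed alphabetical order."""
    order = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(order, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skips k-mers containing N or other symbols
            counts[kmer] += 1
    return [counts[kmer] for kmer in order]


def contig_features(seq, mean_coverages, k=2, pseudocount=1.0):
    """Join per-sample coverage and k-mer counts, then log-transform."""
    joined = list(mean_coverages) + kmer_frequencies(seq, k)
    # The pseudocount keeps zero coverage or zero counts from hitting log(0).
    return [math.log(value + pseudocount) for value in joined]


# Made-up contig with mean coverage in three hypothetical samples.
features = contig_features("ACGTACGTNN", mean_coverages=[12.5, 0.0, 3.0])
print(len(features), "features for this contig")
```

Stack one of these vectors per contig into a matrix and that matrix is what you would hand to PCA; contigs from the same genome should have similar coverage across samples and similar composition, so they end up close together.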

Thoughts on combining methodologies – i.e. PacBio sequencing for scaffolding and Illumina/454 for depth/decreased error?

Nick’s thoughts? We’ll let the slides speak for themselves…

Now to be fair, Neil was the instigator of CrapBio and you can read all about it on the blog post cited in the slide. I encourage you to look at the comments, as apparently CAKseq made a bid for legal action and inquiry into CrapBio, so the debate rages on…

Nick has provided us a valuable look into the world of clinical metagenomics and pathogen discovery, and I look forward to what will come out of his lab and his Twitter account in the near future.

By the way…I encourage you to follow Nick on Twitter @pathogenomenick, he always has highly entertaining

@pathogenomenick “make ’em barf”?

— Mike Cox (@MikeyJ) January 23, 2014

and scientifically relevant things to say and I’ve enjoyed it thus far:

SNP genotyping from very low coverage samples can be very accurate if you have well-characterised set of reference SNPs.

— Nick Loman (@pathogenomenick) January 22, 2014

In the time it’s taken me to write this blog Nick gained 3 more followers…woot woot.

Feel free to click follow…he’ll be very pleased indeed…right cultured culture?

…Dr. Mel
