The glories of the gut: QIIME and the prospect for Zer0-molecule genomics

Before I launch into the second half of Rob’s morning session…I forgot to mention his work with the Earth Microbiome Project (EMP). The project aims to extend this microbiome vision to the entire microbial world, not just the gut or humans in general. What principles can we derive that cross systems and scales? The EMP now provides a framework for analyzing natural samples that will allow for ecological insights, and they’ve already processed thousands of samples with more pledged.

There have been a lot of pilot studies for the EMP, one in Yellowstone Nat’l Park (that’s where I did my Ph.D. work!)…but also in Yucatan, Mexico; vineyards (Merlot microbiome…now I could get behind that! Mmmm…wine); Moorea in Polynesia; coniferous forests, the Great Indian Desert, Antarctica, ant microbiome, iguana microbiome…studies on the Deepwater Horizon effects…which were greater in the sediment than in the water column, interestingly enough; and the list goes on…

This will end up being the ultimate kind of ‘source tracking’ and will allow us to look at ecosystem effects. It really is a neat project. To learn more go here.

Onto Part II: QIIME

Rob started out by listing some questions that humans have always asked about life:

  • What kinds of animals are there? (Taxonomy)
  • How many kinds of animals can I find where I hunt? (alpha diversity)
  • How different is one place from another? (beta diversity)
  • Who can I eat and who can eat me? (interactions)

So let’s expand a bit on alpha and beta diversity:

  • Alpha diversity: “How many species are in a sample?” For example, are polluted environments less diverse than pristine environments? For a qualitative analysis you would ask about presence versus absence only, whereas for a quantitative one you would look at ‘evenness’; in other words, does someone ‘dominate’ or is everyone there in equal abundance?
  • Beta diversity: “How many species are shared between samples?” For example, does the microbiota differ with different disease states? This is a metric for comparisons. For a qualitative analysis you would ask how many species are shared among different samples, whereas for a quantitative one you would call samples more similar if the same species are numerically dominant versus rare in both.
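To make the qualitative versus quantitative distinction concrete, here is a minimal sketch in plain Python. The metric choices are my own picks for illustration (observed richness and Pielou evenness for alpha diversity, Jaccard and Bray-Curtis for beta diversity), not necessarily the ones Rob showed, and the two count vectors are made up.

```python
import math

def observed_richness(counts):
    """Qualitative alpha diversity: how many species are present at all."""
    return sum(1 for c in counts if c > 0)

def shannon(counts):
    """Shannon index (natural log) computed from raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def pielou_evenness(counts):
    """Quantitative alpha diversity: 1.0 means every species is equally abundant."""
    richness = observed_richness(counts)
    return shannon(counts) / math.log(richness) if richness > 1 else 0.0

def jaccard_distance(a, b):
    """Qualitative beta diversity: based only on shared presence/absence."""
    present_a = {i for i, c in enumerate(a) if c > 0}
    present_b = {i for i, c in enumerate(b) if c > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1 - len(present_a & present_b) / len(union)

def bray_curtis(a, b):
    """Quantitative beta diversity: takes abundances into account."""
    total = sum(a) + sum(b)
    return sum(abs(x - y) for x, y in zip(a, b)) / total if total else 0.0

# Made-up counts for the same four species in two samples.
even = [25, 25, 25, 25]
dominated = [97, 1, 1, 1]

print(observed_richness(even), observed_richness(dominated))  # 4 4
print(jaccard_distance(even, dominated))                      # 0.0
print(bray_curtis(even, dominated))                           # 0.72
```

Both samples contain the same four species, so the qualitative measures cannot tell them apart; only the quantitative ones pick up that one species dominates the second sample.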

Rob went on to talk a little about sequencing, though he did not go into great detail as we’ve already been inundated with knowledge about the sequencing market. He did see fit to enlighten us on sequencing technologies to look out for, which had most of the auditorium in stitches from laughing so hard…





Ok, back to being serious…All this new sequencing (single genes, genomes, shotgun studies) is capable of producing a lot of abundance tables, which can be used for downstream analysis of either 16S data or shotgun data (alpha/beta diversity, phylotypes, statistics, metabolic networking etc). And with all this sequencing we have -Omics studies coming out of everywhere, though he showed a slide that gives us some hope that the Omics terms are perhaps tapering off. The true omics studies are: genomics, metagenomics, metatranscriptomics, metaproteomics, metabolomics…these are the baseline types of Omics work and they all produce essentially the same types of tables. See Daniel’s paper in GigaScience for more on the Omics madness that has been going on and the introduction of standardization via BIOM.

One of the key challenges in what we do is relating all the different microbial communities. Squinting at pie charts is rough; phylogeny or taxonomy are easier. But when we talk about phylogenetic versus taxonomic measures, taxonomy really does incorporate phylogeny…it assumes a star phylogeny, which isn’t a correct assumption when looking at natural populations. So it is actually more accurate to look at distance measures via phylogenetic analysis. Phylogenies are both a blessing and a curse: they can be super informative, but as the number of sequences/species analyzed goes up they get more and more difficult to interpret or to see the similarities in. Additionally, when you are looking at tree after tree after tree in the manuscripts you are reading…eventually this happens:


So this sets the stage for a new type of analysis that perhaps will help keep our heads from spontaneously exploding…

What you do is calculate a distance metric and draw relational information from many samples on one tree using phylogenetic information. From there you build a distance matrix, do hierarchical clustering and look at PCoA plots.
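As a rough illustration of the clustering step, here is a small average-linkage (UPGMA) sketch in plain Python. The sample names and distances are invented, and the matrix is typed in by hand rather than computed from a phylogeny, so treat this as a toy version of the idea rather than what QIIME actually runs. The PCoA step would normally be handed to a numerical library and is left out here.

```python
def upgma(dist, labels):
    """Average-linkage hierarchical clustering on a square distance matrix.

    Returns the tree as nested tuples plus the list of merges
    (height, sorted member labels) in the order they happened.
    """
    # Each cluster id maps to (subtree, indices of the samples it contains).
    clusters = {i: (labels[i], [i]) for i in range(len(labels))}
    merges = []
    next_id = len(labels)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                members_a = clusters[ids[x]][1]
                members_b = clusters[ids[y]][1]
                # Average distance over every between-cluster pair of samples.
                avg = sum(dist[i][j] for i in members_a for j in members_b)
                avg /= len(members_a) * len(members_b)
                if best is None or avg < best[0]:
                    best = (avg, ids[x], ids[y])
        height, a, b = best
        tree_a, members_a = clusters.pop(a)
        tree_b, members_b = clusters.pop(b)
        merged = members_a + members_b
        clusters[next_id] = ((tree_a, tree_b), merged)
        merges.append((height, sorted(labels[i] for i in merged)))
        next_id += 1
    (tree, _members), = clusters.values()
    return tree, merges

# Invented distances: samples within a habitat are close, across habitats far.
labels = ["saline_1", "saline_2", "freshwater_1", "freshwater_2"]
dist = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.1],
    [0.9, 0.9, 0.1, 0.0],
]

tree, merges = upgma(dist, labels)
print(tree)  # (('saline_1', 'saline_2'), ('freshwater_1', 'freshwater_2'))
```

The deepest split in the resulting tree falls between the two invented habitats, which is the kind of pattern you would then look for in the real clustering and PCoA output.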


When they used this technique across environmental habitats they found a major split between saline and non-saline environments. Additionally, they found that environments we would typically think of as outliers due to their extreme nature actually weren’t outliers. In fact the true outliers were environments like the vertebrate gut.

When next gen sequencing came about, they looked at how much resolution they could get with a shorter segment of the 16S rRNA gene as opposed to having to use the whole gene. They found that you could indeed get the same results with a smaller fragment as with the whole gene, as long as you were assaying the right fragment.

The goal of QIIME was to develop a framework and resolve some of the technical challenges associated with next gen data.

“Wrap ‘name-brand’ algorithms”

The workflow includes: error correction allowing for precise demultiplexing, denoising, looking for chimeras (it offers a few tools like UCHIME, Perseus, or ChimeraSlayer), OTU picking…as well as many other useful functionalities. 16S may not be the whole story…but it’s more than you might think.
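To give a feel for what error-tolerant demultiplexing means, here is a toy sketch in plain Python. It is not QIIME’s implementation: it simply assigns each read to the sample whose barcode is within one mismatch (Hamming distance), and the barcodes, sample names and reads are all made up.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have the same length")
    return sum(x != y for x, y in zip(a, b))

def assign_sample(read_barcode, barcode_to_sample, max_mismatches=1):
    """Return the sample for a barcode, or None if no unambiguous match."""
    hits = sorted(
        (hamming(read_barcode, barcode), sample)
        for barcode, sample in barcode_to_sample.items()
    )
    best_distance, best_sample = hits[0]
    if best_distance > max_mismatches:
        return None  # too many errors to trust any assignment
    if len(hits) > 1 and hits[1][0] == best_distance:
        return None  # two samples fit equally well
    return best_sample

def demultiplex(reads, barcode_to_sample, barcode_length, max_mismatches=1):
    """Group reads by sample and strip the barcode from each read."""
    bins = {}
    for read in reads:
        barcode, insert = read[:barcode_length], read[barcode_length:]
        sample = assign_sample(barcode, barcode_to_sample, max_mismatches)
        bins.setdefault(sample, []).append(insert)
    return bins

# Made-up barcodes that differ at all six positions.
barcode_to_sample = {"ACGTAC": "gut", "TGCATG": "soil"}
reads = [
    "ACGTACGGGTTT",  # exact gut barcode
    "ACGAACCCCAAA",  # gut barcode with one sequencing error
    "TGCATGAAACCC",  # exact soil barcode
    "GGGGGGTTTTTT",  # matches nothing closely enough
]

bins = demultiplex(reads, barcode_to_sample, barcode_length=6)
print(bins["gut"])   # ['GGGTTT', 'CCCAAA']
print(bins["soil"])  # ['AAACCC']
print(bins[None])    # ['TTTTTT']
```

The second read still lands in the right bin despite its error, which only works because the barcodes were chosen to be far apart from each other. That is also why a sloppy barcode or labeling scheme is so costly.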

One of the cautions Rob makes about NGS concerns the labeling system. You really must move to more sophisticated, standardized labeling, because mistakes in labeling can leave you with data that is hard or impossible to interpret, or lead to incorrect interpretations.

QIIME has also been adapted to run on the cloud, which is a big plus for laboratories that want to run such analyses but don’t have the computing or electricity infrastructure to keep servers, or even computers, running without interruption.

In summary…QIIME is pretty damn cool and we’ll learn more about it this coming week.

Q/A Session:

What separates the American Gut Project or rather keeps it from falling into the ‘hole’ that 23andMe did?

  • The American Gut Project doesn’t claim a health benefit or compete with FDA-approved tests. They are clear on what their project can and cannot do: people take part out of interest and for scientific knowledge rather than for health reasons (a ‘motto’ or framework similar to National Geographic scientific studies), and this is made explicit in the project. They would love to go in the direction of diagnosis in the future, but a lot of things need to occur first, not the least of which is improving technology and analysis. The project also got specific IRB approval to sample kids, who had to give assent along with their guardian. It was made clear what data could and couldn’t be released; anything that wasn’t identifiable could be released. They reiterated several times to those involved in the project that the AGP was not a medical test.

Think we should do a microbiome study of Cesky Krumlov participants before and after the workshop to see what the change in location/diet does to our microbiomes? Perhaps how inordinate amounts of alcohol consumption might affect our microbiomes?

  • That would be interesting: the effects of travel, changing diets and alcohol on the microbiome; though it’s difficult to get approval for such things. Perhaps in a future iteration of the workshop. We would need a baseline of a few days first though. There have been more ‘personal’ studies done in this area, our group being one of them, and there is a pretty large difference, although for their group they also had the misfortune of all contracting an illness during the trip as well. Another study (unpublished) in Thailand found similar changes.

How probable is targeting the microbiome for disease interventions?

  • Just in the last few years…there has been work on the cardiovascular disease/microbiome link (oxidation of dietary amines), colon cancer/microbiome (distal cancers; diffusion of metabolites in the blood stream), multiple sclerosis demyelination and microbiome implications, neurological diseases (plausible to target the microbiota), as well as drugs that are being developed with the microbiome in mind. Much is still unpublished at the moment.

Has any work been done on gut microeukaryotes?

  • Yeast have been looked at using fungal ITS primers; a survey was done on the global gut, but Gates Foundation restrictions don’t allow publishing at the moment. Fungi and viruses are definitely understudied due to the challenges in picking them up using general primers.

How do you set up a good scientific crowdsourcing project? 

  • Completely naively. We underestimated the amount of software development that would have to occur. We overestimated the ability to reuse stuff from previous projects. The average project gets 1-10K. So we were expecting maybe a few hundred interested parties…we got thousands! I think it’s important to have a compelling project where people will get something out of it. You need to walk a fine line on what people are going to get and what you promise. NO diagnosis out of AG! That’s where 23andMe had issues. You will have to put up with a lot of stupid questions. You have to come to grips with the state of technology and make sure your ‘investors’ are aware as well of what you can and cannot do. If we were starting over, we’d pay more attention to timing: when people sign up and when they get results. Logistics like sending tubes, paper certificates and other small things consume a lot of time. It’s amazing the little things that come up that you don’t think about when you first set up the project…all the ‘little logistics’. Perhaps try to get a company to handle your administration end as much as possible, though that may also get difficult with the stringency around personally identifying information. Be sure not only to have something you deliver back to participants but also to make sure it is understandable!

How much support from your institute?

  • ZERO, we had to pay them to do the project. In general it was a ‘go ahead and do it’ attitude, and then we would wait to see if the university complained, because they offered little to no guidance/involvement and many times their input only served to make things more confusing or difficult anyway.

Any anecdotes to share?

  • It’s amazing that no matter how simple the instructions, someone will inevitably do it wrong.
  • Someone told us the dog had eaten their swab after they sampled…we asked for proof and they sent a picture of the dog eating the swab. We sent them a new kit.
  • Someone didn’t realize computers were involved so they wanted a refund.
  • Some didn’t realize we needed a swab sample…what did they think? We do it magically?
  • Be prepared to give a fair amount of refunds.
  • There will be some ‘crazies’.

Is your sampling biased?

  • Absolutely, which is why we can’t go the medical advisement route just yet, but we can use this for general conditions and as proof of concept, i.e. preliminary data to inform studies on cohorts etc. and to test hypotheses. It is NOT a universal control group, but then you can’t really have a universal control group either, as there really is no ‘reference microbiome’. In fact, obtaining a reference microbiome to represent all mankind, like we’ve tried to do with the human genome, will probably never happen.

The slide set is too big for me to load on the website at the moment, so I am figuring that out, it’ll go up as soon as I can get it up. Adam has offered me some options so I will be working on that this weekend/coming week.

Many Thanks to Rob and his amazing experience and insight into metagenomics and microbiome analysis.

…Dr. Mel