Workshop on Genomics 2014 Faculty Highlight
Mike Zody
The Broad Institute
Chief Sequencing Technologist
Sequencing Guru
Software Engineer
We think at night he puts on a mask and cape and fights crime…
So already many of you are probably looking at the picture I’ve included and wondering how in the world does a chick (chicken) in armor have anything to do with Mike Zody? Excellent question…
By training Mike was a metallurgist (studied metal science) and engineer. In fact his advisor was one of the guys that worked on the Manhattan Project! After finishing his degree there weren’t a ton of jobs requiring skills in metal science so he started looking at software engineering. Most of the software engineering jobs looked quite boring at first glance until he got an offer from Whitehead (Eric Lander’s lab later split off, and so The Broad was created), which needed a software engineer to help design software and computing infrastructure for genome science. For his Ph.D. one of the topics he worked on was whole genome sequencing of chickens! Hence the chicken in metal armor. Probably a bit of a stretch but if you’ve been reading this blog at all, it really shouldn’t surprise you that I would post this.
So chickens? Honestly does it surprise you much? Many of you have interacted with him quite a bit over the past couple days and the only question you have left is “Is there anything Mike Zody DOESN’T know? Or HASN’T worked with?” Well, I’ll tell you that he revealed to me that he’s probably not so much up on his bacteriology/microbial knowledge…so now you know his kryptonite.
Back to chickens! When I asked what research he thought would be good to highlight, he sent me: Rubin, Zody et al., 2010. Whole genome sequencing reveals loci under selection during chicken domestication. Nature. So chicken genome science is where we are headed today.
I’m going to quote a bit from the introduction as my chicken science is rusty…
“For most of their history, domestic chicken populations have been bred for two purposes, egg laying and meat production cite. The effective chicken population size must have been huge in the past, before specialized commercial populations were established during the twentieth century, as a large proportion of farms kept a group of chickens interconnected with other groups by trade between regions, countries and continents. This is consistent with the extensive sequence diversity present in domestic chicken (~5 single nucleotide polymorphisms (SNPs) per kilobase (kb) in pairwise comparisons) cite. During the twentieth century, specialized layer and broiler breeds were established to circumvent the inherent conflict in selecting for both growth traits (meat production) and reproductive traits (egg production) in the same bird. This approach, combined with the implementation of modern breeding methods based on quantitative genetics theory, has been extremely successful in improving productivity cite.”
The Master Plan…
They sought to identify common alleles at the polymorphic sites and find selective sweeps shared among populations with the same trait (broiler, egg laying, jungle fowl). Looking at this would allow inferences to be made on how genetic variation shapes phenotypic diversity. They were looking specifically for alleles that might have led to the domestication of chickens as compared to their jungle fowl counterparts and to the subsequent specialization into the broiler and egg-laying types.
What they Did:
- Massively parallel sequencing using SOLiD technology (this was back in 2010 remember?) It was the latest and greatest then. SOLiD generated 35 bp reads.
- They resequenced birds from 8 populations of domestic chickens as well as red jungle fowl from zoo populations. They already had a Sanger-generated reference genome for the jungle fowl (super helpful for quality checking and assembling such short reads!).
- Additionally, they sampled from egg-layer and broiler populations
- Alignment/Assembly using the MAPREADS program allowing for up to three mismatches (including ‘valid adjacent’ changes as a single mismatch) and no indels. Only reads aligning uniquely in the genome were retained.
- SNP identification using fairly stringent criteria/filtering with the Corona Lite pipeline from Life Technologies (i.e., they ended up eliminating ~40,000 putative SNPs because they could find support for the reference allele and assumed them to be sequencing errors).
- Constructed a distance tree using allele frequencies
- Detected selective sweeps by searching the genome for regions with a high degree of fixation, then ruled out genetic drift for each putative region (used pooled sequence data, calculated heterozygosity along autosomes). They then cross-referenced these results with other evidence, such as verification of the sweep in additional chicken populations and data on co-localization with major quantitative trait loci (QTLs) as well as differentially expressed genes.
- Further interrogated a putative sweep in the TSHR region as it seemed to be a defining sweep between domesticated and jungle fowl.
- Used DASher software for a bioinformatic look at the amino-acid substitution in TSHR.
- They looked at loss of function regions of the genome (deletions).
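For the curious, here is roughly what the sweep scan boils down to. This is a minimal Python sketch, not the authors' code. As I understand the paper, each genome window gets a pooled heterozygosity score, Hp = 2·ΣnMAJ·ΣnMIN / (ΣnMAJ + ΣnMIN)², computed from the summed major- and minor-allele read counts at the SNPs in that window, and the scores are then Z-transformed into ZHp. The function names and the read counts below are made up for illustration.

```python
from statistics import mean, pstdev


def pooled_heterozygosity(major_counts, minor_counts):
    """Hp for one window from per-SNP major/minor allele read counts."""
    n_maj = sum(major_counts)
    n_min = sum(minor_counts)
    total = n_maj + n_min
    if total == 0:
        raise ValueError("window has no reads at any SNP")
    return 2 * n_maj * n_min / total ** 2


def z_transform(values):
    """Z-transform a list of window scores (population standard deviation
    is an assumption here; the choice is not stated in this post)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("all windows have identical scores")
    return [(v - mu) / sigma for v in values]


# Toy data: (major-allele read counts, minor-allele read counts) per window.
windows = [
    ([10, 12, 9], [8, 7, 10]),
    ([11, 9, 10], [9, 10, 8]),
    ([20, 19, 21], [0, 1, 0]),  # nearly fixed: almost no minor-allele reads
    ([12, 8, 11], [7, 11, 9]),
]

hp = [pooled_heterozygosity(maj, mnr) for maj, mnr in windows]
zhp = z_transform(hp)

for i, (h, z) in enumerate(zip(hp, zhp)):
    print(f"window {i}: Hp = {h:.3f}, ZHp = {z:.2f}")
```

Windows with a strongly negative ZHp, like the nearly fixed third toy window, are the sweep candidates. A real scan would slide windows along each autosome and use the actual pooled read counts.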
What they Found (some pulled directly from text because I didn’t want to re-link the citations manually because well…I’m lazy and their writing was understandable so paraphrasing was less needed):
- The reads covered 92% of the genome (1043 Mb), impressive for how short the reads were!
- 90 Mb was not covered and was probably either repeat sequences or sequences under-represented in the emPCR step.
- They detected 7,453,845 SNP loci (still quite a lot after their stringent filtering)
- Sweep detected on chromosome 1 in a non-coding region upstream of SEMA3A, which encodes semaphorin 3A, an axon guidance molecule with an essential role in brain development cite.
- Sweep detected in a non-coding region 160 kb upstream of the gene for V-set and transmembrane-domain-containing protein 2A (VSTM2A), which is a predicted target-SNARE gene on chromosome 2.
- Sweep detected in the locus encoding thyroid stimulating hormone receptor (TSHR) on chromosome 5.
- They focused on TSHR because it had the lowest ZHp (sweep) score (-9.2), showed almost complete fixation over a 40-kb region, and because of the well-established biological significance of TSHR signalling for metabolic regulation and reproduction.
- Every domestic chicken tested, representing commercial as well as local populations, carried at least one copy of the sweep haplotype of TSHR.
- TSHR may be a domestication locus in chicken, that is, a locus where essentially all individuals of a domesticated species carry a mutant allele.
- They saw the TSHR sweep at an intermediate frequency in red jungle fowl representing zoo populations and believed that this was most likely due to the fact that many zoo populations have a history of some hybridization with domestic chicken.
- They found a non-conservative amino-acid substitution, namely glycine to arginine at residue 558, which is thought to have been the sweep target for TSHR.
- A bioinformatic analysis using DASher cite indicated that the glycine-to-arginine substitution pushes this residue outwards from the membrane and may therefore influence ligand interaction.
- Other genes looked at: insulin-like growth factor 1 (IGF1) and insulin receptor (INSR), which affect growth traits and insulin signalling, and TBC1D1 (TBC1 (tre-2/USP6, BUB2, cdc16) domain family).
- They found 1,300 deletions fixed or close to fixation in at least one population.
- They couldn’t link deletions with chicken domestication from jungle fowl. However, they were able to link a deletion with the split between egg layers and broilers within domesticated fowl. The deletion in the growth hormone receptor (GHR) gene has previously been reported to be a causative mutation for sex-linked dwarfism cite and has been used in some commercial broiler lines to reduce growth and feed consumption in parental lines.
- They also found a novel deletion that removes all but the first exon of the gene SH3RF2 (SH3 domain containing ring finger 2). SH3RF2 lies within a QTL region for body weight.
- Analysis of 400 birds revealed a highly significant association between the presence of the deletion and increased growth (P < 0.001).
- Additionally, expression analysis revealed SH3RF2 expression in the low growth line but not in the high growth line, which is expected because the latter is fixed for the deletion. The result was of interest as it is well established that chickens from the high growth line have a genetic defect in hypothalamic appetite regulation.
What does it all mean!!???…
“The present study casts light on the genetic basis of domestication, but also has implications both for the use of chicken as a model organism for biomedical research and for the application of genomics to practical chicken breeding. The successful outcome of this approach suggests that it should be applied to other domesticated species as well as to natural populations, where it may reveal the genetic basis for rapid evolutionary adaptations.”
…and Chickens are pretty cool.
The following is my paraphrasing of Mike’s and my conversation at Mastal at dinner one night, not direct quotes.
So…chickens huh? I always thought you were on the human genome/viral side; what role did you play in this manuscript?
- “Funny, ya…this was actually the last paper in my Ph.D. process in Uppsala with Carl-Johan Rubin. I’ve been at Whitehead and Broad (now just The Broad) for 12 years. I also have papers from my thesis on humans, chimps and a chicken paper. I’ve also worked on dog, possum, horse and the late blight of potatoes (Irish potato famine) pathogen (Phytophthora infestans).” *Going through Mel’s head*: I secretly wonder if Mike curates all the pages on Wikipedia…Phytophthora?!
So this study was published in 2010 using SOLiD technology, which is now all but obsolete. I noticed you had quite low coverage and super short reads, though you did have the benefit of a reference genome (generated via Sanger). If you were to redo this study ‘today’ what would you change?
- “Well, now you can definitely generate a lot more data. It’s funny because we did a lot of things in the study that I tell students now not to do; for example, our coverage was too low and we pooled samples without barcoding. We did have a large pool of chromosomes though and with SOLiD’s di-base correction our reads were very accurate (99.9+%). So what would I change? I would barcode individuals for redundancy, get at least 4x coverage per individual, have more sequences and use an imputation algorithm that would help in identifying patterns of co-localized variation; but that sort of algorithm is only useful if you have deeper coverage and tagging or barcoding of individuals.”
Do you have a favorite programming language?
- “I actually prefer Perl (Perl finally has a champion at the workshop!) but I have tried Python, and worked in C and Java. For biologists I recommend Perl because it’s so good at text file manipulation. When you think of the debate between Python and Perl it’s really like the preference between Mac and PC. They both get the job done, it’s just a matter of preference and what works well for you and the way you program. And R is great for numerical analysis.”
What’s your advice for new bioinformaticists?
- “As scary as it might sound you really need to learn a programming language. The easiest way to start is to find a problem you are really passionate about and work on it. You are dealing with large datasets so while it’s important to know what is going on in the lab you really need to know how to manipulate large datasets and programming will help you do that.”
“There is no problem in programming that can’t be solved with an additional layer of abstraction except the problem of too much abstraction” ~Ted Sharpe
What or who inspired you in your career in sequencing and genomics?
- “There wasn’t really anyone in the field when I arrived, given it was so new, so there wasn’t any one person that influenced me in that way. It was more my colleagues at The Broad who were really great, and perhaps the defining moment for me was the Human Genome Project. The Human Genome Project DROVE me. It was exciting, it was challenging, we had competition with Celera, and many times we didn’t know from day to day if it was going to work or completely fail. A lot of adrenaline was flowing during the Human Genome Project. People were personally invested and when the project ended many of them left as well because it was like…‘Ok, what’s left? We succeeded in sequencing the human genome, what is there to compare to that in the future?’”
Funny you should mention the competition involved with Celera in the Human Genome Project…I’ve heard the race for the genome compared to the Space Race…what do you think about that?
- “Ya I’ve heard that too, Eric Lander also drew that comparison with the ‘moon race’ and the Apollo program; sort of a genetic space race…but I would disagree. I feel like the Human Genome Project was more like the Manhattan Project…though with hopefully better future implications and ramifications. It was intensely multidisciplinary and could’ve never been accomplished without being so (like the Manhattan Project), and everyone was actively involved in doing it. Additionally, it shares the commonality that once it’s done you can never go back from that. We can never undo the atomic bomb unfortunately and we can never ‘not’ have sequenced the human genome. The comparison stops hopefully in the ramifications of the projects in that the Human Genome Project has spawned many other great projects and continues to be a positive scientific force. It changed biology and medicine for the better, it changed science in the 21st century.”
Are you still involved in the Human Genome Project?
- “Less and less. Occasionally I get an email regarding some curation efforts but for the most part I am not really involved anymore.”
Lastly, three words or short phrase to sum up your field of Sequencing…
Continuously changing technology…
So be ready everyone, technology is only going to get more involved from here with the advent of the single molecule and single cell work that is going on right now. Thanks, Mike, for joining us at the Workshop on Genomics. It’s always mind blowing to hear what you’ve been working on and what you will do next, and I’m sure you will continue to surprise us for many years to come…
…Dr. Mel