Winning Essays - Evolution and Genomics

Five travel awards were made to participants of the 2012 Workshop on Genomics in Český Krumlov. Awardees were selected through a competition in which participants submitted a short essay describing how the advent of modern sequencing technologies had altered their research program. The essays were reviewed by Workshop faculty, and awards were given to the five essays displayed below.

[hr]

Natalia Bayona, Instituto de Ciencias del Mar y Limnología, Mexico

Serap Gonen, The Roslin Institute, University of Edinburgh, UK

Ulrich Knief, Max Planck Institute for Ornithology, Germany

Mandar Phatak, Tata Institute of Fundamental Research, India

Kim Steige, Department of Evolutionary Biology, Evolutionary Biology Center (EBC), Uppsala University, Sweden

[hr]

Natalia Bayona

Instituto de Ciencias del Mar y Limnología, Mexico

When I started in the genetics field in early 2000, I found myself in a magnificent area of study. I was astonished at how the DNA molecule could help ecologists resolve simple questions about population and species processes, and I was able to research the role that genetics plays in some physiological mechanisms. A big window had opened in front of my eyes, and it remains open. It is amazing how, in less than a decade, technology, hand in hand with computational advances, has revolutionized the way we see the world and the way we do research.

I should mention some of the revolutionary historical developments that have played a fundamental role in molecular genetics since Gregor Mendel's well-known experiments in the mid-1800s. In the twentieth century these led to a boom in what is commonly called classical genetics, in which heredity was studied using phenotypic data (chromosomes, morphological and protein characters), and eventually to what Warren Weaver termed molecular biology.

The study of variation in the DNA molecule itself was made possible by the work of several scientists, including the Avery-MacLeod-McCarty experiments, which isolated DNA and demonstrated that it carries hereditary material. Watson and Crick subsequently described the double-helix structure of the molecule in 1953.

In the 1970s and 1980s, two revolutionary developments ushered in the era of DNA-based studies of molecular variance. The first was the DNA sequencing methods described by Maxam and Gilbert and by Sanger. The second was Dr. Kary Mullis's work on the Polymerase Chain Reaction (PCR) in 1986, a technique that transformed both how the genetics community worked and how it thought about molecular ecology as a discipline. These advances allowed scientists to develop and implement new PCR-based techniques for studying molecular variance and to overcome other challenges, such as the selection, isolation, and analysis of variable and neutral markers. Since then, the amount of genetic information generated has been enormous, and over the years it has shaped the study of model organisms such as Drosophila.

During my undergraduate studies I worked in population genetics, but I did not have to deal with DNA isolation problems because commercial kits made the job easier. Nor did I have to run PCR with handcrafted techniques, because thermal cyclers were available. Advances in DNA technologies had clearly driven progress in molecular ecology.

In those years, genomics was mainly focused on the recently completed Human Genome Project. For me, genomics, and the “omics” projects in general, were distant objectives that could only be reached by developed countries, well-funded researchers, and, in particular, projects focused on model and/or charismatic organisms (such as humans). Genomic projects in non-model species were therefore a utopian dream for me.

Even though sequencing methods based on real-time pyrophosphate detection were published in 1998, the first commercial Next-Generation Sequencing platform, the 454 (later acquired by Roche®), was not released until 2005. I discovered these techniques only two years ago: in 2009, during an internship on microsatellite isolation using enriched genomic libraries, a geneticist asked me why I did not use the new sequencing technologies to find microsatellite loci.

That was the first time I had heard about these techniques. Since then I have been reading papers about their use in population genetics studies, which is my main interest. I was lucky: during the two months of my internship abroad, a postdoctoral fellow who had worked on pyrosequencing and microsatellite discovery with the Roche 454 platform gave a two-day course. He talked about the available platforms, their costs, performance, and formats, and gave some examples from his own work. The literature I found also covered the chemistry of the technique and applications such as SNP and microsatellite discovery. All this information helped me understand the many new approaches available for studying the genetics of adaptation and ecological interactions in natural populations, not just in model organisms.
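Microsatellite discovery from NGS reads starts by scanning reads for short tandem repeats. The sketch below is a minimal illustration only, assuming perfect (uninterrupted) repeats of 2-4 bp motifs and a hypothetical threshold of five copies; the name `find_microsatellites` and the threshold are illustrative choices, not a published pipeline, and dedicated SSR-finding tools also handle imperfect repeats and primer design.

```python
import re

MIN_COPIES = 5  # hypothetical threshold: at least five consecutive motif copies

# Group 1: the whole repeat run; group 2: the 2-4 bp motif (lazy, so the
# shortest repeating unit is tried first). The backreference \2 requires the
# same motif to recur end to end at least MIN_COPIES - 1 more times.
SSR = re.compile(r"(([ACGT]{2,4}?)\2{%d,})" % (MIN_COPIES - 1))


def find_microsatellites(read: str) -> list[tuple[int, str, int]]:
    """Return (start, motif, copies) for perfect 2-4 bp tandem repeats in one read."""
    hits = []
    for match in SSR.finditer(read.upper()):
        run, motif = match.group(1), match.group(2)
        if len(set(motif)) == 1:
            # A motif such as "AA" is a homopolymer run, not a di-/tri-/tetranucleotide SSR.
            continue
        hits.append((match.start(), motif, len(run) // len(motif)))
    return hits


print(find_microsatellites("GGT" + "CA" * 6 + "TTG"))  # [(3, 'CA', 6)]
```

Scanning thousands of reads this way is what turns a single sequencing run into a large pool of candidate microsatellite loci, compared with screening enriched libraries clone by clone.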

Thanks to this revolutionary new era, it is possible to obtain thousands of new markers across almost any genome of interest in a single step, even in populations with little or no prior genetic information, and at reasonable cost, which matters greatly for small and/or underfunded laboratories. We are entering the age of “ecological genomics”, in which it is feasible to gain genome-scale insights into natural variation by comparing individual, population, and species genomes. We are living in the ideal time for population genomics, which Luikart in 2003 described as “capable to cover hundreds of polymorphic markers that cover the entire genome in a single, simple and reliable experiment”. Let’s take part in this new revolution and rethink what we can do, how we should do it, and what its scientific implications are.

[hr]

Serap Gonen

The Roslin Institute, University of Edinburgh, UK

Modern sequencing technologies, also known as next- and third-generation sequencing technologies, have revolutionised all areas of genomic research. Their ability to rapidly sequence the same region many times in parallel, combined with the continuing decrease in sequencing cost, means they are becoming the preferred method of sequencing DNA (Hurd and Nelson, 2009). Because several next-generation sequencing (NGS) platforms are available, the platform can be chosen to suit the needs of the experiment. For example, for de novo genome assembly the 454 pyrosequencer may be the best choice because it gives longer reads (Metzker, 2010) than Illumina platforms, which may be preferred when looking at small regions of the genome at high coverage (Willing et al., 2011). If the DNA sample is rare or longer reads are required, a third-generation platform may be preferable, since it provides single-molecule sequencing without the need to amplify DNA by PCR, which can introduce errors (Metzker, 2010). However, because of the step change in the scale of data produced, new and evolving computing and bioinformatics tools are required for analysis (Pop and Salzberg, 2008).

The focus of my PhD research project is the genomic basis of disease resistance in Atlantic salmon, a species that is farmed globally and is therefore economically important (Moen et al., 2009). The two viral diseases I am studying, infectious pancreatic necrosis (IPN) and pancreas disease (PD), both cause high mortality in aquaculture. Studies show that resistance to both viruses is heritable, so selecting for resistant individuals is possible (Houston et al., 2008; Norris et al., 2008). However, a resistance phenotype only becomes clear after a fish has been infected, after which the fish cannot be used for selective breeding (Moen et al., 2009). One way to overcome this problem is marker-assisted selection (MAS), which requires identifying genetic markers that can be used across populations to predict an individual's response to infection (Gilbey et al., 2006). NGS technologies now make it possible to identify and genotype marker alleles for MAS even when there is no reference genome, as is the case for Atlantic salmon (Willing et al., 2011). The most favoured markers are single nucleotide polymorphisms (SNPs), since they are the most frequent type of polymorphism, can be found in exonic sequences (and may therefore be directly associated with mutations causing a phenotype), and are biallelic, so that marker-trait associations can be consistent across populations (Hayes et al., 2007).

Identifying markers for whole-genome scans is a difficult task in itself. Doing so for a partially tetraploid genome, in which a marker can behave as diploid or polyploid in any one individual, is an even greater challenge (Hohenlohe et al., 2011; Hayes et al., 2007). Atlantic salmon and their relatives are descendants of an ancestral tetraploid; the genome duplication event is thought to have taken place 25–125 million years ago. The genome is in the process of reverting to a diploid state, but areas of residual tetraploidy remain (Houston et al., 2008).

In our group’s research, thousands of Atlantic salmon genetic markers have been identified and genotyped using NGS. Combining established genome characterisation techniques, such as restriction enzyme digestion, with NGS has been particularly valuable for identifying novel markers such as SNPs for use in genome scans (Baird et al., 2008). The marker identification and genotyping method we used is restriction site associated DNA (RAD) sequencing. This technique involves sequencing the DNA flanking restriction sites, often on the Illumina platform for high read coverage (Hohenlohe et al., 2011). Because a given restriction site is shared across the genomes in a population (assuming no polymorphism disrupts the cut site), the same site is sequenced many times in a single run, and this sequencing depth increases calling accuracy. This is the reason for the improvement in SNP detection and calling (Baird et al., 2008). Paired-end RAD‐sequencing has also been applied to provide a mini-contig of sequence data associated with each SNP, which aids further SNP discovery and comparative genomics. This is done by randomly shearing the fragments produced by restriction digestion to within a certain size range and then sequencing both ends of the resulting fragments on the Illumina sequencer (Willing et al., 2011).
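The depth argument can be made concrete with a simplified calculation. Assuming error-free reads, and that each read at a heterozygous site samples either allele with probability 0.5, the chance that all n reads show the same allele (so the site looks homozygous) is 2 × 0.5^n. The hypothetical helpers below compute this; real RAD genotype callers use error-aware likelihood models rather than this toy binomial.

```python
from math import comb


def prob_missed_heterozygote(depth: int) -> float:
    """P(all reads at a true heterozygous site show the same allele).

    Assumes error-free reads, each sampling either allele with probability 0.5:
    P(all allele A) + P(all allele B) = 2 * 0.5**depth.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return 2 * 0.5 ** depth


def prob_both_alleles_seen(depth: int, min_reads_per_allele: int = 1) -> float:
    """P(each allele appears in at least `min_reads_per_allele` reads), same assumptions."""
    m = min_reads_per_allele
    # Sum binomial probabilities over allele-A counts k with m <= k <= depth - m.
    return sum(comb(depth, k) for k in range(m, depth - m + 1)) * 0.5 ** depth


for n in (2, 5, 10, 20):
    print(n, prob_missed_heterozygote(n), round(prob_both_alleles_seen(n, 2), 4))
```

For example, a heterozygote covered by 5 reads is missed with probability 0.0625, but at 10 reads this falls to about 0.002, which is why loci sequenced deeply in a single run give more reliable SNP calls.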

As part of my PhD project, the genetic markers identified by NGS will now be used to construct linkage maps for whole-genome scans, for example to identify sets of marker alleles thought to co‐segregate with a trait at the family or population level. These sets of markers identify regions of the genome known as quantitative trait loci (QTL), which are thought to harbour genes that influence the trait under study (Gilbey et al., 2006). A few sparse genetic maps exist for the Atlantic salmon genome, and many of their markers are microsatellites (Gilbey et al., 2004). Although microsatellites have their advantages, they have two main disadvantages: they are not dense enough to narrow down the region in which a QTL is detected, and, because they are highly polymorphic, marker alleles identifying QTLs cannot be extrapolated across populations (Hayes et al., 2007).

In our group, we have already identified a set of microsatellite markers that flag a region of the genome thought to contain genes influencing resistance to IPN (Houston et al., 2008). My project will narrow down this region using new SNP markers generated by RAD‐sequencing and identify candidate genes in the region that may influence IPN resistance in Atlantic salmon. A dense SNP linkage map built from old and new SNPs will be very useful for future high-resolution QTL analyses, genomic selection and population studies (Gilbey et al., 2006).

In conclusion, NGS technologies have not only reduced the time and effort required to sequence genomes, but are also useful for identifying new markers and for de novo sequencing of non-model genomes. NGS has been particularly useful in my research, especially for identifying new SNP markers for dense linkage mapping to locate QTLs and candidate genes influencing disease resistance in Atlantic salmon. The end‐result of this project will be improved knowledge of the genes involved in the salmon response to viral disease, and genetic marker tests for improved disease resistance in aquaculture populations.

References:

Baird, N. A., Etter, P. D., Atwood, T. S., Currey, M. C., Shiver, A. L., Lewis, Z. A., Selker, E. U., Cresko, W. A. and Johnson, E. A., 2008. Rapid SNP discovery and genetic mapping using sequenced RAD markers. PLoS ONE. 3.

Gilbey, J., Verspoor, E., McLay, A. and Houlihan, D., 2004. A microsatellite linkage map for Atlantic salmon (Salmo salar). Animal Genetics. 35.

Gilbey, J., Verspoor, E., Mo, T. A., Sterud, E., Olstad, K., Hytterød, S., Jones, C. and Noble, L., 2006. Identification of genetic markers associated with Gyrodactylus salaris resistance in Atlantic salmon Salmo salar. Diseases of Aquatic Organisms. 71.

Hayes, B., Laerdahl, J. K., Lien, S., Moen, T., Berg, P., Hindar, K., Davidson, W.S., Koop, B. F., Adzhubei, A. and Høyheim, B., 2007. An extensive resource of single nucleotide polymorphism markers associated with Atlantic salmon (Salmo salar) expressed sequences. Aquaculture. 265.

Hohenlohe, P. A., Amish, S. J., Catchen, J. M., Allendorf, F. W. and Luikart, G., 2011. Next-generation RAD sequencing identifies thousands of SNPs for assessing hybridization between rainbow and westslope cutthroat trout. Molecular Ecology Resources. 11.

Houston, R. D., Haley, C. S., Hamilton, A., Guy, D. R., Tinch, A. E., Taggart, J. B., McAndrew, B. J. and Bishop, S. C., 2008. Major quantitative trait loci affect resistance to infectious pancreatic necrosis in Atlantic salmon (Salmo salar). Genetics. 178.

Hurd, P. J. and Nelson, C. J., 2009. Advantages of next-generation sequencing versus the microarray in epigenetic research. Briefings in Functional Genomics and Proteomics. 8.

Metzker, M. L., 2010. Sequencing technologies – the next generation. Nature Reviews Genetics. 11.

Moen, T., Baranski, M., Sonesson, A. K. and Kjøglum, S., 2009. Confirmation and fine-mapping of a major QTL for resistance to infectious pancreatic necrosis in Atlantic salmon (Salmo salar): population-level associations between markers and trait. BMC Genomics. 10.

Norris, A., Foyle, L. and Ratcliff, J., 2008. Heritability of mortality in response to a natural pancreas disease (SPDV) challenge in Atlantic salmon, Salmo salar L., post-smolts on a West of Ireland sea site. Journal of Fish Diseases. 31.

Pop, M. and Salzberg, S. L., 2008. Bioinformatics challenges of new sequencing technology. Trends in Genetics. 24.

Willing, E., Hoffmann, M., Klein, J. D., Weigel, D. and Dreyer, C., 2011. Paired-end RAD-seq for de novo assembly and marker design without available reference. Bioinformatics. 27.

[hr]

Ulrich Knief

Max Planck Institute for Ornithology, Germany

In my PhD project I focus on evolutionary genetics in the zebra finch. Particularly, I am interested in identifying genetic factors causing hatching failure, i.e. embryo mortality, in this species and locating them in the genome.

For many wild bird populations, hatching failure rates of around 11.5% have been reported (Morrow et al. 2002; Spottiswoode & Møller 2004), and wild zebra finches have a hatching failure rate of about 17% (Zann 1996). Hatching failure is thus a surprisingly common phenomenon, given that natural selection constantly acts against it. While some hatching failures may have environmental causes, genetic factors could play a major role as well. These genetic factors should mainly exhibit dominance effects, because additive variation in embryo mortality is constantly removed by natural selection.

Chromosomal inversion polymorphisms are one such possible genetic factor: they may cause over‐ or underdominant effects on offspring survival by carrying either recessive deleterious mutations or incompatibility alleles, respectively (Sturtevant & Mather 1938, Kirkpatrick & Barton 2006 and references therein, Kirkpatrick 2010). Even given these detrimental effects, several scenarios are conceivable under which inversions spread through a population and ultimately form stable polymorphisms (Kirkpatrick & Barton 2006). All are based on the idea that an inversion happens to carry some beneficial alleles and suppresses recombination in heterozygotes.

Our captive population of zebra finches shows a hatching failure rate of approximately 30%, which is not associated with the level of inbreeding. Hence, there might be some kind of incompatibility mechanism underlying it. Several other lines of evidence further point to inversion polymorphisms as a major cause of embryo mortality in our population.

Until recently, it was almost impossible to locate inversions precisely in the genome of any species. Only large pericentric inversions, which shift the position of the centromere, could be detected in chromosome spreads (karyotyping) (prebanding era 1879‐1970; e.g. Shields 1982). Unfortunately, this method is only feasible on a small sample of individuals, and its precision in demarcating inversion breakpoints is limited.

Subsequently, staining techniques on chromosome spreads helped to identify large paracentric inversions, but these methods still suffer from small sample sizes and poor precision (chromosomal banding era 1970‐1986; Kirkpatrick 2010). Later, fluorescence in situ hybridization (FISH) allowed paracentric inversions to be detected, but although its precision is in principle high, it cannot locate inversion breakpoints exactly in the genome (molecular cytogenetic era 1986‐2004; Liehr & Pellestor 2009). Moreover, screening whole genomes with this method is labor-intensive, and small inversions still cannot be detected. Consequently, knowledge about inversions, their frequencies and their phenotypic effects was limited and mainly restricted to insects (e.g. Balanyá et al. 2006), which have large polytene chromosomes in their salivary glands.

With the advent of massively parallel paired‐end sequencing, progress has been made in detecting inversions and characterizing their phenotypic effects in humans, because this method can identify inversions with almost single-base-pair resolution in a time‐ and cost‐efficient manner (Talkowski et al. 2011). It is also possible to call smaller inversions (less than 10 kb in size), since detection only requires mate‐pairs whose orientation is inconsistent with the reference genome (Korbel et al. 2007).
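The orientation signal described above can be sketched as a simple classifier. This assumes a standard forward-reverse (FR) Illumina paired-end library, in which a correctly oriented pair has one mate on each strand with the forward mate upstream; mate-pair libraries have different expected orientations. The `ReadPair` record, the `max_insert` and `window` values, and the function names are hypothetical, and a real caller would also model insert-size distributions and cluster supporting pairs more carefully to delimit breakpoints.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Hypothetical, simplified alignment record for one read pair."""
    chrom1: str
    pos1: int        # leftmost aligned position of mate 1
    reverse1: bool   # True if mate 1 aligned to the reverse strand
    chrom2: str
    pos2: int
    reverse2: bool


def classify_pair(pair: ReadPair, max_insert: int = 1000) -> str:
    """Classify a pair, assuming a forward-reverse (FR) paired-end library."""
    if pair.chrom1 != pair.chrom2:
        return "other_discordant"
    if pair.reverse1 == pair.reverse2:
        # Both mates on the same strand (++ or --): one mate lies in a segment
        # whose orientation differs from the reference, the inversion signature.
        return "inversion_candidate"
    # In an FR library the upstream (leftmost) mate should be the forward one.
    left_is_reverse = pair.reverse1 if pair.pos1 <= pair.pos2 else pair.reverse2
    # The position difference is used here as a rough proxy for insert size.
    if left_is_reverse or abs(pair.pos2 - pair.pos1) > max_insert:
        return "other_discordant"
    return "concordant"


def inversion_support(pairs, window: int = 10_000) -> Counter:
    """Count inversion-candidate pairs per (chromosome, window) bin.

    Bins supported by many pairs point to candidate breakpoint regions.
    """
    counts = Counter()
    for pair in pairs:
        if classify_pair(pair) == "inversion_candidate":
            counts[(pair.chrom1, min(pair.pos1, pair.pos2) // window)] += 1
    return counts
```

Because only strand orientation is needed, this signal does not depend on the inversion being large enough to see cytologically, which is what makes small inversions callable.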

Excited by this new technique, I changed my research program from merely finding chromosomes that carry incompatibility alleles and speculating about the underlying mechanisms to actually identifying the causal variants promoting incompatibility and embryo mortality. Understanding the proximate causes of embryo mortality is not only interesting for an evolutionary biologist like myself, but also relevant to broader research fields ranging from developmental biology to human medicine.

I am going to sequence pooled whole-genome DNA samples of 100 zebra finches on the Illumina HiSeq 2000 platform at approximately 200‐fold coverage. This will allow me to locate inversion polymorphisms with high precision and then to genotype many individuals cheaply by PCR. Subsequently, I will link the inversion genotypes of approximately 5,000 birds and dead embryos, together with their pedigree information, to embryo mortality.
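A quick calculation shows what roughly 200-fold coverage of a pool of 100 diploid birds means per chromosome copy. This sketch assumes that each bird contributes an equal amount of DNA and that coverage is uniform, both idealizations; the helper names and the 0.25 frequency in the example are hypothetical.

```python
def per_copy_coverage(pool_coverage: float, n_individuals: int, ploidy: int = 2) -> float:
    """Average read depth contributed by each chromosome copy in an equimolar pool."""
    return pool_coverage / (n_individuals * ploidy)


def expected_variant_reads(pool_coverage: float, variant_freq: float) -> float:
    """Expected reads at a site that derive from chromosomes carrying a variant
    (e.g. an inverted arrangement) present at frequency `variant_freq` in the pool."""
    return pool_coverage * variant_freq


print(per_copy_coverage(200, 100))        # 200 chromosome copies -> 1.0x each
print(expected_variant_reads(200, 0.25))  # 50.0 reads from inverted chromosomes
```

Each of the 200 chromosome copies is thus covered only about once, yet an arrangement carried by a quarter of the pool would still contribute about 50 reads at a site, which is why pooling suits the discovery of common polymorphisms before cheaper individual genotyping. Reads or pairs that actually span a breakpoint will be fewer and depend on read length and insert size.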

In essence, next‐generation sequencing gives me the opportunity to study the proximate causes of embryo mortality in detail, and thus turns my zebra finch-specific research program into one of more general relevance.

Literature

Kirkpatrick M (2010) How and why chromosome inversions evolve. PLoS Biology 8, e1000501.

Kirkpatrick M, Barton N (2006) Chromosome inversions, local adaptation and speciation. Genetics 173, 419‐434.

Korbel JO, Urban AE, Affourtit JP, et al. (2007) Paired‐end mapping reveals extensive structural variation in the human genome. Science 318, 420‐426.

Liehr T, Pellestor F (2009) Molecular cytogenetics: the standard FISH and PRINS procedure. In: Fluorescence in situ hybridization (FISH) – application guide. Springer‐Verlag Berlin, Germany.

Morrow EH, Arnqvist G, Pitcher TE (2002) The evolution of infertility: does hatching rate in birds coevolve with female polyandry? Journal of Evolutionary Biology 15, 702‐709.

Shields GF (1982) Comparative avian cytogenetics – a review. Condor 84, 45‐58.

Spottiswoode C, Møller AP (2004) Genetic similarity and hatching success in birds. Proceedings of the Royal Society of London Series B‐Biological Sciences 271, 267‐272.

Sturtevant AH, Mather K (1938) The interrelations of inversions, heterosis and recombination. American Naturalist 72, 447‐452.

Talkowski ME, Ernst C, Heilbut A, et al. (2011) Next‐generation sequencing strategies enable routine detection of balanced chromosome rearrangements for clinical diagnostics and genetic research. American Journal of Human Genetics 88, 469‐481.

Zann RA (1996) The zebra finch: a synthesis of field and laboratory studies. Oxford University Press, USA.

[hr]

Mandar Phatak

Tata Institute of Fundamental Research, India

During my PhD project, I will be looking at the role of various components of the intracellular trafficking machinery in maintaining epidermal homeostasis, using zebrafish as a model. As the interface between the larva and its external environment, the epidermis has to function as a barrier and protect the larva from chemical and mechanical damage. At the same time, as development progresses, the epidermis has to respond to various patterning and growth signals. To achieve this, the epidermis has to be highly adaptive, and this adaptiveness is achieved through the ability to regulate cell shape, proliferation, and polarity. Intracellular trafficking genes have previously been found to play a role in establishing and maintaining cell size, shape, and polarity in a multitude of in vitro and a few in vivo systems. However, their role in maintaining epidermal homeostasis has remained unclear.

The conventional approach to finding genes involved in a particular process is either to study deficiencies/knockdowns of selected genes that are most likely to be involved in that process (reverse genetics) or to screen for mutations that produce a specific phenotype (forward genetics), which in my case is a defect in epidermal integrity. I am taking a forward genetic approach and will perform a mutagenesis screen for epidermal homeostasis mutants. In a mutagenesis screen, families are raised from individual offspring (F1) of mutagenized males, and the progeny of F2 fish are screened for aberrant phenotypes. To map a mutation, a fish carrying it is crossed to a reference line, and the F2 progeny are genotyped to calculate recombination rates between the mutation and established genetic markers. Once the mutation is mapped onto the genome, the genes in the region (candidate genes) are individually cloned and sequenced to check for mutations. This process, although very effective, has its own limitations.

Breeding fish for 3-4 generations requires about a year and takes up a lot of lab space and resources, which significantly limits the throughput of a mutagenesis screen. A mutagenesis screen usually relies on an assay for the phenotype of interest, which in my case is loss of epidermal integrity. This assay is performed at a specific age or developmental stage, which creates a bias towards mutations with obvious morphological phenotypes. The next step, recombination mapping, is also very time-consuming and laborious. In zebrafish, the average distance between SSLP (microsatellite) markers is 1.1 cM, or ~660 kb, so pinpointing the exact mutant gene can be very difficult. A further hurdle for this candidate gene approach is the quality of the zebrafish genome assembly: because the assembly is unfinished, it is often hard to obtain a reliable set of candidate genes.
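The recombination mapping step reduces to a small calculation. In this sketch, the recombination fraction r between the mutation and a marker is the proportion of scored meioses that are recombinant (for homozygous mutant embryos, each embryo contributes two meioses); r is converted to map distance with Haldane's map function, d = -50 ln(1 - 2r) cM, and then to an approximate physical interval using the roughly 600 kb per cM implied by the 1.1 cM ≈ 660 kb figure. The counts in the example are hypothetical, and because the conversion uses a genome-wide average, local recombination rates can make the real interval quite different.

```python
import math

# Approximate physical scale implied by 1.1 cM ~ 660 kb (a genome-wide average).
KB_PER_CM = 660 / 1.1


def recombination_fraction(recombinants: int, meioses: int) -> float:
    """Proportion of scored meioses in which marker and mutation recombined."""
    if meioses <= 0:
        raise ValueError("need at least one scored meiosis")
    return recombinants / meioses


def haldane_cm(r: float) -> float:
    """Haldane map distance in centimorgans for recombination fraction r (0 <= r < 0.5)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must lie in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


# Hypothetical example: 4 recombinants among 100 mutant embryos (200 meioses).
r = recombination_fraction(4, 200)
d = haldane_cm(r)
print(f"r = {r:.3f}, d = {d:.2f} cM, ~{d * KB_PER_CM:.0f} kb")
```

In this example a marker about 2 cM from the mutation still leaves an interval on the order of 1.2 Mb, which illustrates why sparse markers make it hard to pinpoint the mutant gene.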

With the advent of Next Generation Sequencing (NGS), I can now think of ways in which some of these steps can be bypassed or shortened. NGS is slowly becoming popular in the zebrafish community. Although it is primarily used for genotyping, analysis of allele frequencies, and detection of copy number variation and SNPs in the zebrafish genome, several recent publications have identified mutations using targeted resequencing of captured exons. With the arrival of affordable benchtop next-generation sequencers such as the Ion Torrent, it has become possible to use the technology to replace some of the steps described above.

I am currently working on a mutant identified in a previous mutagenesis screen. The mutation maps to a region of the genome that is poorly assembled. The conventional approach would be a BAC walk followed by manual assembly of the region of interest. However, this approach is time-consuming and error-prone, so we are using data from a whole-genome massively parallel sequencing project to produce a de novo assembly of the region. In the future, I plan to use targeted resequencing of the exome to identify mutations in the F1 generation. This will serve three purposes. First, it will save the time and space required for raising two generations of fish, recombination mapping, and cloning: with barcoding, the whole process should ideally take 10 days to process 16 samples. Second, it will remove the bias towards mutations with an obvious phenotype and increase the throughput of the whole process by allowing a larger number of fish to be screened. Third, I will be free to use the screen in both a forward and a reverse genetic manner. To study the effects of a particular gene, it is always useful to have a mutant line, but no mutant lines are available for many important genes involved in intracellular trafficking, probably because they do not show obvious morphological phenotypes. Such mutants might have very interesting intracellular defects that can only be studied with cell biological techniques, which are impractical as screening assays. A partial exome sequencing approach will allow me to concentrate on a set of interesting genes and screen for mutations in those genes alongside the usual assay-based screening.
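Processing 16 samples in one run depends on barcoding: each library carries a short sample-specific sequence, and reads are sorted back to their samples after sequencing. Below is a minimal sketch, assuming a fixed-length inline barcode at the start of each read. On real instruments this step is usually handled by the sequencer's own software, and the function names, barcodes, and sample names here are hypothetical.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, barcodes: dict, max_mismatches: int = 0) -> dict:
    """Sort reads into samples by a fixed-length inline barcode at the read start.

    `barcodes` maps barcode sequence -> sample name. Reads whose prefix matches
    no barcode, or more than one within `max_mismatches`, go to "unassigned".
    Assigned reads are returned with the barcode trimmed off.
    """
    lengths = {len(b) for b in barcodes}
    if len(lengths) != 1:
        raise ValueError("this sketch assumes all barcodes have the same length")
    k = lengths.pop()
    bins = defaultdict(list)
    for read in reads:
        prefix = read[:k]
        # Reads shorter than the barcode cannot be assigned.
        hits = [] if len(prefix) < k else [
            sample for code, sample in barcodes.items()
            if hamming(prefix, code) <= max_mismatches
        ]
        if len(hits) == 1:
            bins[hits[0]].append(read[k:])
        else:
            bins["unassigned"].append(read)
    return dict(bins)


barcodes = {"ACGT": "fish01", "TGCA": "fish02"}
print(demultiplex(["ACGTAAAA", "TGCACCCC", "GGGGTTTT"], barcodes))
```

Allowing one mismatch (`max_mismatches=1`) rescues reads with a sequencing error in the barcode, at the cost of requiring barcodes that differ from one another at several positions.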

NGS technologies have completely changed the workflow of my project and have given our laboratory a broader perspective. It is now possible to study different components of the same pathway in a relatively short time and to study their transcriptional regulation. In the last few years, many studies have efficiently used exome sequencing on next-generation platforms to find genes responsible for rare monogenic as well as polygenic human diseases. By combining the strengths of the zebrafish system with the speed, efficiency, and throughput of NGS, significant progress can be made in discovering the roles of various genes in the morphogenesis and homeostatic maintenance of vertebrate tissues.

[hr]

Kim Steige

Department of Evolutionary Biology, Evolutionary Biology Center (EBC), Uppsala University, Sweden

An important research question is the adaptive significance of regulatory changes in plants in relation to speciation events. Answering it helps us understand how natural selection affects genomes and how adaptive phenotypic diversity is generated. An ideal model for studying this is the plant genus Capsella. The two diploid sister species Capsella grandiflora and C. rubella diverged less than 50,000 years ago (Foxe et al. 2009, Guo et al. 2009). They differ greatly in floral and reproductive traits, which is linked to the fact that C. grandiflora is a self-incompatible outcrosser, whereas C. rubella is predominantly selfing. These large morphological differences arose from the limited standing variation in C. rubella, as this species went through a severe bottleneck during speciation; it has even been hypothesized that C. rubella originated from a single C. grandiflora individual that lost self-incompatibility. This raises the question of whether the morphological differences are due to cis-regulatory changes.

Which method is best suited to this question? Previously used techniques, such as quantitative PCR or microarrays, are limited to a relatively small number of genes and rely on prior knowledge about the genes of interest. Next-generation sequencing addresses this problem by enabling us to study transcriptome variation between species or populations directly. Another advantage is that previously unknown genes can be detected by aligning transcripts to the genome sequence; such novel genes would be interesting candidates for more detailed analyses.
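At its simplest, detecting previously unknown genes from RNA-seq means finding transcribed regions that overlap no annotated gene. The sketch below assumes transcripts have already been aligned and assembled into genomic intervals (chromosome, start, end; half-open coordinates); it ignores strand and splicing, and the function name and example coordinates are hypothetical. A genome-scale analysis would use an interval tree or sorted sweep rather than this linear scan.

```python
from collections import defaultdict

Interval = tuple[str, int, int]  # (chromosome, start, end), half-open


def unannotated(transcripts: list[Interval], genes: list[Interval]) -> list[Interval]:
    """Return transcribed intervals that overlap no annotated gene."""
    genes_by_chrom = defaultdict(list)
    for chrom, start, end in genes:
        genes_by_chrom[chrom].append((start, end))
    novel = []
    for chrom, start, end in transcripts:
        # Two half-open intervals overlap if each starts before the other ends.
        overlaps = any(start < g_end and g_start < end
                       for g_start, g_end in genes_by_chrom.get(chrom, []))
        if not overlaps:
            novel.append((chrom, start, end))
    return novel


genes = [("chr1", 100, 200)]
transcripts = [("chr1", 150, 250), ("chr1", 300, 400), ("chr2", 100, 200)]
print(unannotated(transcripts, genes))  # [('chr1', 300, 400), ('chr2', 100, 200)]
```

Regions flagged this way are only candidates; they still need to be checked for coding potential, conservation, or reproducibility across samples before being called novel genes.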

One of these next-generation sequencing methods, RNA sequencing (RNA-seq), generates huge amounts of qualitative and quantitative data that help clarify regulatory changes in plants. In my research, I will use RNA-seq to compare expression differences between two parental lines with the expression of the corresponding parental alleles in F1 hybrids. This will allow me to distinguish between cis- and trans-regulatory effects on gene expression (Bullard et al. 2010), and could lead to the detection of novel genes involved in the morphological changes of Capsella rubella.
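The logic for separating cis and trans effects can be sketched in a few lines. Between the parents, an expression difference reflects both cis and trans effects; in the F1 hybrid, both alleles share the same trans environment, so the allelic ratio reflects cis effects alone, and the trans component is the remainder (log2 parental ratio minus log2 F1 allelic ratio). This is a minimal sketch assuming library-size-normalized counts and an arbitrary pseudocount; it is not the statistical model of Bullard et al. (2010), and real analyses test these differences statistically across replicates.

```python
import math


def cis_trans(parent1: float, parent2: float,
              f1_allele1: float, f1_allele2: float,
              pseudocount: float = 0.5) -> dict:
    """Split a parental log2 expression difference into cis and trans parts.

    parent1/parent2: normalized expression of the gene in each parental line.
    f1_allele1/f1_allele2: normalized allele-specific counts in the F1 hybrid.
    The pseudocount (a hypothetical default) avoids log2 of zero.
    """
    total = math.log2((parent1 + pseudocount) / (parent2 + pseudocount))
    cis = math.log2((f1_allele1 + pseudocount) / (f1_allele2 + pseudocount))
    return {"total": total, "cis": cis, "trans": total - cis}


# Hypothetical counts: 4-fold parental difference, 2-fold allelic imbalance in F1.
print(cis_trans(400, 100, 200, 100, pseudocount=0))
```

In this example, half of the log2 parental difference is attributable to cis and half to trans; a gene with a parental difference but no allelic imbalance in the F1 would instead point to purely trans regulation.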

Another question that can be addressed with next-generation sequencing is the impact of natural selection on cis-regulatory regions compared with protein-coding regions. Compared with traditional Sanger sequencing, next-generation sequencing allows us to collect a much greater amount of sequence data in a much shorter time. Using this method, I will be able to generate sequences of the 5’ untranslated regions of genes that have already been analyzed (Slotte et al. 2010) and test whether the rate and strength of selection differ from those in the coding regions. The 5’ untranslated region is of great interest for studies of regulatory evolution, as cis-regulatory elements can be located there.

In conclusion, next-generation sequencing methods have made it much easier to generate huge amounts of sequence data in a relatively short time. Studying gene expression has also become much easier, as prior knowledge of gene sequences or of the specific genes involved is no longer necessary to assess differences between species, accessions, or tissues.

References:

Bullard JH, Mostovoy Y, Dudoit S, Brem RB (2010) Polygenic and directional regulatory evolution across pathways in Saccharomyces. Proc Natl Acad Sci USA 107: 5058–5063

Foxe JP, Slotte T, Stahl EA, Neuffer B, Hurka H, Wright SI (2009) Recent speciation associated with the evolution of selfing in Capsella. Proc Natl Acad Sci USA 106: 5241–5245

Guo YL, Bechsgaard JS, Slotte T, Neuffer B, Lascoux M, Weigel D, Schierup MH (2009) Recent speciation of Capsella rubella from Capsella grandiflora, associated with loss of self-incompatibility and an extreme bottleneck. Proc Natl Acad Sci USA 106: 5246–5251

Slotte T, Foxe JP, Hazzouri KM, Wright SI (2010) Genome-wide evidence for efficient positive and purifying selection in Capsella grandiflora, a plant species with a large effective population size. Mol Biol Evol 27: 1813–1821