Detecting Positive Selection

Magdalena Bohutínská, January 2025



Slides

Presentation slides


Section 1: Simple selection scan design and alpine adaptation in Arabidopsis


Background
In this activity, we will focus on the role of natural selection in foothill and alpine populations of Arabidopsis arenosa, a close relative of the model plant species A. thaliana. We will try to identify which genes have contributed to its adaptation to the alpine environment of the Tatra Mountains.

Natural populations of Arabidopsis arenosa usually grow in foothill areas (0–800 m) but occasionally also occur in alpine environments (>1600 m). Foothill and alpine populations show many striking phenotypic differences: alpine plants are shorter, have a more robust leaf cuticle, fewer trichomes, larger flowers and seeds, and a more extensive root system, and they survive for more seasons. The phenotypes and population genetic structure of alpine and foothill populations are shown in the figure below (Knotek et al. 2021).

Q Can you think of a functional explanation for any of these phenotypic differences?

Q Does the analysis of genetic structure suggest a single origin or repeated origins for the alpine populations in our dataset?

In reciprocal transplant experiments, in which plants were moved from foothill to alpine sites and vice versa, plants performed better in their native environment. The responses of alpine and foothill plants to transplantation into alpine and foothill gardens are shown below in a figure from Wos et al. (2022).

Q What does this observation indicate?

That is probably all you need to know about the system for now. You will learn more from the genomes themselves!

Objectives

During this activity, we aim to answer the following question:

  • Which genes show signals of positive selection and are therefore likely responsible for the differences between foothill and alpine populations?

In the final step, we will also discuss how to infer the potential functions of these candidate genes and how they could be studied further.
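To make the idea of a simple two-population selection scan concrete, here is a minimal sketch in Python (not the workshop's R script). It computes Hudson's Fst between foothill and alpine samples for each SNP, combines SNPs per gene as a ratio of averages, and flags genes in the top tail of the empirical distribution as candidates. Hudson's Fst, the pre-assignment of SNPs to genes, the toy allele frequencies, and the gene names are all assumptions for illustration; the workshop script may use different metrics, windows, and cutoffs.

```python
"""Sketch of a two-population selection scan (foothill vs alpine).

Assumptions (not from the workshop script): Hudson's Fst as the metric,
SNPs pre-assigned to genes, and hypothetical toy data.
"""
from collections import defaultdict

# Hypothetical toy data: (gene, alt-allele freq foothill, allele copies foothill,
#                         alt-allele freq alpine, allele copies alpine)
TOY_SNPS = [
    ("geneA", 0.10, 40, 0.90, 40),
    ("geneA", 0.15, 40, 0.85, 40),
    ("geneB", 0.50, 40, 0.55, 40),
    ("geneC", 0.30, 40, 0.35, 40),
    ("geneD", 0.20, 40, 0.60, 40),
    ("geneE", 0.45, 40, 0.40, 40),
]


def hudson_terms(p1, n1, p2, n2):
    """Numerator and denominator of Hudson's Fst for a single SNP."""
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den


def gene_fst(snps):
    """Per-gene Fst as a ratio of averages: sum of numerators / sum of denominators."""
    num, den = defaultdict(float), defaultdict(float)
    for gene, p1, n1, p2, n2 in snps:
        a, b = hudson_terms(p1, n1, p2, n2)
        num[gene] += a
        den[gene] += b
    # Genes whose SNPs are all fixed for the same allele in both samples (den == 0) are skipped.
    return {g: num[g] / den[g] for g in num if den[g] > 0}


def top_outliers(scores, fraction):
    """Genes in the top `fraction` of the empirical Fst distribution, highest first."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    return ranked[:k]


if __name__ == "__main__":
    fst = gene_fst(TOY_SNPS)
    print({g: round(v, 3) for g, v in fst.items()})
    print(top_outliers(fst, 0.4))  # ['geneA', 'geneD']
```

Hudson's estimator corrects for sample size, so weakly differentiated genes can get slightly negative values; they simply rank at the bottom. In a genome-wide scan the cutoff (here the top 40% of five toy genes) would typically be a much stricter quantile, such as the top 1%.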

How to do this activity

  • Open RStudio in your browser (http://ec2-XX-XXX-XXX-XXX.compute-1.amazonaws.com:8787)
  • Open the file ~/workshop_materials/28_detecting_positive_selection/geneticsOfLocalAdaptation.R
  • All instructions, discussion points, and commands are included in the file, but feel free to ask us anything at any time!
  • We encourage you to discuss the analyses as much as possible in pairs or with faculty members
  • Work on the bonus sections only if you are ahead of schedule. At the end of the activity, we will have a short group discussion to wrap up.

Section 2: More complex designs

In real-world scenarios, datasets are rarely as simple as the one used in the alpine adaptation example. Often, we deal with multiple populations across which we aim to detect signals of positive selection, as in the stickleback example from the morning lecture. How should we design a selection scan in such cases? The design naturally depends on the specific research question. Here, we will explore two possible approaches.

2.1. If we are interested in signals of selection associated with a continuous environmental variable (e.g., temperature) or a continuous adaptive phenotype (e.g., body size), we can perform a genotype-environment or genotype-phenotype association analysis across all our populations.
This will be demonstrated in the R tutorial section dedicated to associations between plants and environmental gradients.
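As a rough illustration of the association idea, the sketch below assumes one allele frequency per SNP per population and one environmental value per population. It computes Spearman's rank correlation (Pearson correlation on average ranks) between each SNP's frequencies and the environment, so SNPs whose frequencies track the gradient most closely rank highest. The temperatures, SNP names, and frequencies are hypothetical. Unlike dedicated association methods, this naive correlation does not account for population structure or shared ancestry, and the R tutorial may use a different approach.

```python
"""Sketch of a naive genotype-environment association (GEA) scan.

Assumptions: one allele frequency per SNP per population, one environmental
value per population; all data below are hypothetical. No correction for
population structure is applied.
"""
import math

# Hypothetical mean temperature (deg C) for five populations.
TOY_TEMPERATURE = [4.0, 6.5, 9.0, 11.5, 14.0]
# Hypothetical alternative-allele frequencies, same population order.
TOY_FREQS = {
    "snp1": [0.90, 0.80, 0.60, 0.30, 0.10],
    "snp2": [0.50, 0.40, 0.60, 0.50, 0.45],
    "snp3": [0.10, 0.20, 0.25, 0.50, 0.70],
}


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; NaN if either variable has zero variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def gea_scan(freqs, env):
    """Spearman's rho between each SNP's frequencies and the environment."""
    env_ranks = average_ranks(env)
    rho = {snp: pearson(average_ranks(f), env_ranks) for snp, f in freqs.items()}
    # Drop SNPs that do not vary across populations (rho undefined).
    return {snp: r for snp, r in rho.items() if not math.isnan(r)}


if __name__ == "__main__":
    rho = gea_scan(TOY_FREQS, TOY_TEMPERATURE)
    # Strongest associations (either direction) first.
    for snp in sorted(rho, key=lambda s: abs(rho[s]), reverse=True):
        print(snp, round(rho[snp], 3))
```

With these toy values, snp1 and snp3 are perfectly rank-correlated with temperature (rho of -1 and +1), while snp2 shows almost no association. Ranking by the absolute value treats increases and decreases along the gradient alike.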

2.2. If our populations differ in a binary environmental or phenotypic trait (e.g., benthic vs. limnetic sticklebacks) or do not span the entire gradient of a given trait, a selection scan contrasting the two groups may be more appropriate. In this case, the question is how to integrate information across different populations.
– One option is to combine all individuals representing a specific trait (e.g., all benthic sticklebacks) into one large population and compare them to all individuals with a contrasting trait (e.g., all limnetic sticklebacks).

Q What potential issues might arise with this approach?

– A second option is to calculate selection scan metrics separately for each population and then integrate the information.
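The two options can be contrasted in a minimal sketch. It assumes per-SNP allele counts for two hypothetical benthic-limnetic lake pairs and uses Hudson's Fst as the scan metric. Option 1 pools all benthic and all limnetic counts before computing Fst once per SNP. Option 2 computes Fst within each lake pair and keeps SNPs that exceed the cutoff in at least `min_pairs` pairs. The counts, the population names, and the fixed cutoff of 0.3 are hypothetical; real scans usually define outliers from an empirical quantile of the genome-wide distribution.

```python
"""Two ways to combine several benthic-limnetic pairs in a selection scan.

All counts, population names and the cutoff are hypothetical.
"""

FST_CUTOFF = 0.3  # hypothetical fixed cutoff, for illustration only

# SNP -> population -> (alternative-allele count, allele copies sampled)
COUNTS = {
    "s1": {"A_benthic": (36, 40), "A_limnetic": (4, 40),
           "B_benthic": (34, 40), "B_limnetic": (6, 40)},
    "s2": {"A_benthic": (20, 40), "A_limnetic": (22, 40),
           "B_benthic": (18, 40), "B_limnetic": (20, 40)},
    "s3": {"A_benthic": (38, 40), "A_limnetic": (2, 40),
           "B_benthic": (20, 40), "B_limnetic": (20, 40)},
}
PAIRS = [("A_benthic", "A_limnetic"), ("B_benthic", "B_limnetic")]


def hudson_fst(c1, n1, c2, n2):
    """Hudson's Fst for one SNP from allele counts in two samples."""
    p1, p2 = c1 / n1, c2 / n2
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num / den if den > 0 else float("nan")


def pooled_outliers(counts, pairs, cutoff):
    """Option 1: merge all benthic and all limnetic samples, then scan once."""
    hits = []
    for snp, pops in counts.items():
        c1 = sum(pops[ben][0] for ben, _ in pairs)
        n1 = sum(pops[ben][1] for ben, _ in pairs)
        c2 = sum(pops[lim][0] for _, lim in pairs)
        n2 = sum(pops[lim][1] for _, lim in pairs)
        if hudson_fst(c1, n1, c2, n2) > cutoff:
            hits.append(snp)
    return hits


def parallel_outliers(counts, pairs, cutoff, min_pairs):
    """Option 2: scan each pair separately; keep SNPs that are outliers in >= min_pairs pairs."""
    hits = []
    for snp, pops in counts.items():
        n_outlier = sum(
            hudson_fst(*pops[ben], *pops[lim]) > cutoff for ben, lim in pairs
        )
        if n_outlier >= min_pairs:
            hits.append(snp)
    return hits


if __name__ == "__main__":
    print(pooled_outliers(COUNTS, PAIRS, FST_CUTOFF))  # ['s1', 's3']
    print(parallel_outliers(COUNTS, PAIRS, FST_CUTOFF, min_pairs=len(PAIRS)))  # ['s1']
```

In this toy example, the two options disagree about s3, which is strongly differentiated in lake A only. Setting `min_pairs` below the number of pairs relaxes option 2 from "outlier in every pair" to "outlier in at least some pairs".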

Both approaches will be demonstrated using benthic and limnetic sticklebacks from the Mat-Su region in Alaska, which you learned about in the morning lecture. The geographic distribution and population genetic structure of these benthic and limnetic populations are shown in the figure below.