Detecting Selection with Extended Haplotype Statistics

Joana Meier, January 2025



Presentation slides

Slides


Summary

In this activity we will identify selective sweeps in house sparrows. We will test whether the European house sparrow (Passer domesticus domesticus) shows any recent sweeps that are associated with human commensalism and are therefore absent in wild sparrows (Passer domesticus bactrianus). For this we will use extended haplotype statistics, which compare the lengths of the haplotypes carrying alternative alleles. In a selective sweep, we expect not only the variant under selection to increase in frequency but also the variants linked to it, which leads to long haplotypes. We can compare haplotype lengths between alternative alleles within the same population or species with iHS, or between populations or species with XP-EHH. Once we have found the region with the strongest signature of a selective sweep in the European house sparrow, we will check which genes lie in this region.
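To make the idea of comparing haplotype lengths concrete, here is a minimal base R sketch of extended haplotype homozygosity (EHH) on an invented toy data set. The haplotype matrix, the unit marker spacing and the helper names (ehh, trapezoid_area) are assumptions made for illustration only; they are not taken from the sparrow data or from the tutorial scripts. Real tools additionally scan in both directions from the focal SNP, stop integrating once EHH falls below a threshold, and standardise iHS across SNPs of similar allele frequency.

```r
# Toy haplotypes (invented): rows are haplotypes, columns are SNPs,
# 0 = ancestral allele, 1 = derived allele. SNP 1 is the focal SNP.
haps <- rbind(
  c(1, 1, 1, 1, 1),
  c(1, 1, 1, 1, 1),
  c(1, 1, 1, 1, 0),
  c(1, 1, 1, 0, 0),
  c(0, 1, 0, 1, 0),
  c(0, 0, 1, 0, 1),
  c(0, 1, 1, 0, 0),
  c(0, 0, 0, 1, 1)
)
positions <- 1:5  # hypothetical marker positions with unit spacing

# EHH: probability that two randomly drawn haplotypes carrying `allele`
# at the focal SNP are identical from the focal SNP up to each marker.
ehh <- function(haps, focal, allele) {
  carriers <- haps[haps[, focal] == allele, , drop = FALSE]
  n <- nrow(carriers)
  sapply(focal:ncol(haps), function(end) {
    # Collapse each carrier haplotype into one string over focal..end
    keys <- apply(carriers[, focal:end, drop = FALSE], 1, paste, collapse = "")
    counts <- as.vector(table(keys))
    sum(choose(counts, 2)) / choose(n, 2)
  })
}

# Area under the EHH curve by the trapezoid rule (integrated EHH, iHH)
trapezoid_area <- function(x, y) {
  sum(diff(x) * (y[-length(y)] + y[-1]) / 2)
}

ehh_derived   <- ehh(haps, focal = 1, allele = 1)
ehh_ancestral <- ehh(haps, focal = 1, allele = 0)

ihh_derived   <- trapezoid_area(positions, ehh_derived)
ihh_ancestral <- trapezoid_area(positions, ehh_ancestral)

# Unstandardised iHS as a log ratio of the two areas; the sign
# convention (ancestral over derived, or the reverse) differs between tools.
unstandardised_ihs <- log(ihh_ancestral / ihh_derived)

print(round(ehh_derived, 3))
print(round(ehh_ancestral, 3))
print(unstandardised_ihs)
```

In this toy example the derived allele sits on haplotypes that stay identical for several markers, whereas homozygosity on the ancestral background decays almost immediately. That contrast is the pattern a recent sweep is expected to leave. XP-EHH follows the same logic but compares the integrated EHH of all haplotypes at a SNP between two populations rather than between two alleles within one population.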

This tutorial is adapted from https://speciationgenomics.github.io, and the data are based on Ravinet et al. (2018).

How to run this activity

The activity on the detection of selection with extended haplotype statistics will be run first through the Terminal, using SSH, and then with RStudio.

  • Connect to the instance using SSH. Use “wpsg” as the username and the usual password:
    ssh wpsg@ec2-XXX-XXX-XXX-XXX.compute-1.amazonaws.com (replace XXX with your Amazon instance IP address)
  • Once connected through SSH, navigate into the tutorial directory: cd ~/workshop_materials/28_ehh.
  • Follow the instructions for the first part of the activity – phasing the data – on this GitHub page.

The remainder of the activity will be done with RStudio.

  • To connect to the instance through RStudio, copy the instance address into the URL field of a browser, followed by a colon and the number of the port for RStudio, 8787:
    ec2-XXX-XXX-XXX-XXX.compute-1.amazonaws.com:8787
  • In RStudio, set the activity directory as the working directory: setwd("~/workshop_materials/28_ehh/data")

  • Follow the second part of the activity – running extended haplotype statistics – on this GitHub page.
  • Follow the third part of the activity – identifying genes within selective sweeps – on this GitHub page.