Estimating genome-wide genealogies using Relate

Jasmin Rees and Leo Speidel, 15 June 2022

Background

Relate estimates the joint genealogies of many thousands of modern individuals genome-wide. These genealogies describe how individuals are related through their most recent common ancestors back in time, and can be seen as the genetic analogue of a family tree for unrelated individuals.

The output of Relate is a sequence of binary trees, each describing the genealogical relationships locally in that part of the genome. Neighbouring genealogical trees differ because of recombination events that change the genetic relationships of individuals.
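To make the idea of a "sequence of local binary trees" concrete, here is a minimal Python sketch. It uses a toy representation made up for illustration and is not Relate's output format: each hypothetical `LocalTree` stores the genomic interval it covers and a parent array (leaves are the sampled haplotypes), `tree_at` finds the tree covering a given position, and `mrca` returns the most recent common ancestor of two samples in that tree. Coalescence times and branch lengths are deliberately omitted.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class LocalTree:
    """Hypothetical toy tree valid on the half-open genomic interval [start, end).

    parent[i] is the index of node i's parent; the root has parent -1.
    Leaves (sampled haplotypes) are nodes 0..n-1.
    """
    start: int
    end: int
    parent: list[int]

    def ancestors(self, node: int) -> list[int]:
        """Return the node and all of its ancestors, ordered from the node up to the root."""
        path = [node]
        while self.parent[node] != -1:
            node = self.parent[node]
            path.append(node)
        return path

    def mrca(self, a: int, b: int) -> int:
        """Most recent common ancestor: the first ancestor of b that is also an ancestor of a."""
        seen = set(self.ancestors(a))
        for node in self.ancestors(b):
            if node in seen:
                return node
        raise ValueError("nodes are not in the same tree")


def tree_at(trees: list[LocalTree], position: int) -> LocalTree:
    """Return the local tree covering a genomic position (trees sorted by start)."""
    starts = [t.start for t in trees]
    i = bisect_right(starts, position) - 1
    if i < 0 or position >= trees[i].end:
        raise ValueError(f"position {position} is not covered by any tree")
    return trees[i]


# Four haplotypes (leaves 0-3), internal nodes 4 and 5, root 6.
trees = [
    LocalTree(0, 1000, [4, 4, 5, 5, 6, 6, -1]),     # ((0,1),(2,3))
    LocalTree(1000, 2000, [4, 5, 4, 5, 6, 6, -1]),  # ((0,2),(1,3)) after a recombination
]

print(tree_at(trees, 500).mrca(0, 1))   # 4
print(tree_at(trees, 1500).mrca(0, 1))  # 6
```

In this toy example, haplotypes 0 and 1 share a recent ancestor (node 4) to the left of position 1000, but to the right of it they only meet at the root (node 6), mirroring how a recombination event changes the local genealogy between neighbouring trees.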

The method is published in L. Speidel, M. Forest, S. Shi, S. R. Myers. A method for genome-wide genealogy estimation for thousands of samples. Nature Genetics 51, 1321–1329 (2019).

Detailed documentation for Relate is available here.

To infer coalescence rates for low-coverage genomes, we developed a tool called Colate, published in L. Speidel, L. Cassidy, R. W. Davies, G. Hellenthal, P. Skoglund, S. R. Myers. Inferring Population Histories for Ancient Genomes Using Genome-Wide Genealogies. Molecular Biology and Evolution 38(9), 3497–3511 (2021).

How to run this exercise

For this activity, you will need to connect to the Amazon instance both via a terminal and via Guacamole. For the terminal connection, you can either use a terminal program on your own computer and connect via SSH (ssh [email protected]), or use the terminal that comes with Guacamole (ec2-XX-XXX-XX-XXX.compute-1.amazonaws.com:8080/guacamole/).

Preferably, run the R code and scripts in the terminal; you can view the resulting PDF and PNG files through the Guacamole Ubuntu GUI. (If you feel more comfortable in RStudio Server, you can also run the R code and scripts there.)

Getting started

  • Open a terminal and navigate to the directory ~/workshop_materials/a15_relate.
  • Download the activity instructions and follow them.