Estimating genome-wide genealogies using Relate

Leo Speidel and Simon R. Myers, 24 January 2020

Background

Relate estimates the joint genealogies of many thousands of modern individuals genome-wide. These genealogies describe how individuals are related through their most recent common ancestors back in time, and can be seen as the genetic analogue of a family tree for unrelated individuals.
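To make "related through their most recent common ancestors" concrete, here is a minimal Python sketch of a single genealogical tree. The tree is a hypothetical toy example stored as a parent map with node ages in generations. It is not the format Relate writes to disk. The sketch finds the most recent common ancestor (MRCA) of two sampled leaves and reads off its age:

```python
from typing import Dict, List, Optional

# Hypothetical toy genealogy: 4 sampled leaves (0-3) and 3 internal ancestors (4-6).
# parent[i] is the parent of node i; the root (6) has no parent.
parent: Dict[int, Optional[int]] = {0: 4, 1: 4, 2: 5, 3: 5, 4: 6, 5: 6, 6: None}
# Illustrative node ages in generations before present; leaves are sampled today.
age: Dict[int, float] = {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 120.0, 5: 300.0, 6: 2500.0}


def ancestors(node: int) -> List[int]:
    """Return the path from a node up to the root, including the node itself."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def mrca(a: int, b: int) -> int:
    """Most recent common ancestor: the first node on b's path that is also on a's path."""
    on_path_a = set(ancestors(a))
    for node in ancestors(b):
        if node in on_path_a:
            return node
    raise ValueError("nodes are not in the same tree")


print(mrca(0, 1), age[mrca(0, 1)])  # 4 120.0
print(mrca(0, 3), age[mrca(0, 3)])  # 6 2500.0
```

Leaves 0 and 1 share a recent ancestor, whereas leaves 0 and 3 are only related through the much older root. This is the sense in which a genealogy works like a family tree for unrelated individuals.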

The output of Relate is a sequence of binary trees, each describing the genealogical relationships in a local region of the genome. Neighbouring genealogical trees differ because of recombination events that change the genetic relationships of individuals.
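The idea of a sequence of local trees can be sketched in Python as follows. The region, start positions and trees are hypothetical values chosen for illustration. Relate's own output files use a different format. Each tree covers the stretch of genome from its start position up to the next tree's start, and a tree can be looked up by base-pair position with `bisect`:

```python
import bisect
from typing import Dict, List, Optional, Tuple

# A local binary tree stored as a parent map: node -> parent (None for the root).
Tree = Dict[int, Optional[int]]

# Hypothetical local trees for 4 samples (leaves 0-3) along a toy region.
# Each entry is (first base-pair position covered, tree); a tree applies
# until the start position of the next entry.
local_trees: List[Tuple[int, Tree]] = [
    (0,      {0: 4, 1: 4, 2: 5, 3: 5, 4: 6, 5: 6, 6: None}),  # ((0,1),(2,3))
    (12_000, {0: 4, 1: 4, 4: 5, 2: 5, 5: 6, 3: 6, 6: None}),  # (((0,1),2),3)
    (21_000, {0: 4, 2: 4, 1: 5, 3: 5, 4: 6, 5: 6, 6: None}),  # ((0,2),(1,3))
]
starts = [start for start, _ in local_trees]


def tree_at(position: int) -> Tree:
    """Return the local tree covering a base-pair position."""
    index = bisect.bisect_right(starts, position) - 1
    if index < 0:
        raise ValueError("position lies before the first tree")
    return local_trees[index][1]


def sister(tree: Tree, node: int) -> int:
    """Return the other child of a (non-root) node's parent in a binary tree."""
    p = tree[node]
    return next(child for child, par in tree.items() if par == p and child != node)


# The closest relative of leaf 2 changes between neighbouring trees.
for pos in (5_000, 15_000, 25_000):
    print(pos, sister(tree_at(pos), 2))  # 5000 3, then 15000 4, then 25000 0
```

In this toy example, leaf 2 is grouped with leaf 3 in the first tree, with the ancestor of leaves 0 and 1 (node 4) in the second, and with leaf 0 in the third. This mirrors how recombination changes the local relationships between neighbouring trees.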

The method is published in L. Speidel, M. Forest, S. Shi, S. R. Myers. A method for genome-wide genealogy estimation for thousands of samples. Nature Genetics 51, 1321–1329 (2019).

Detailed documentation for Relate is available online.

How to run this exercise

For this activity, you will need to connect to the Amazon instance both via a terminal and via RStudio. For the terminal connection, you can either use a terminal program on your own computer and connect using SSH (ssh <username>@ec2-XX-XXX-XX-XXX.compute-1.amazonaws.com), or use the terminal that comes with Guacamole (ec2-XX-XXX-XX-XXX.compute-1.amazonaws.com:8080/guacamole/).

To run R scripts and view PDF files produced with R, open an RStudio connection in your browser (ec2-XX-XXX-XX-XXX.compute-1.amazonaws.com:8787) and navigate to the directory ~/workshop_materials/24_relate.

Getting started

  • Open a terminal and navigate to the directory ~/workshop_materials/24_relate.
  • Download the activity instructions and follow them.