This example is based on a recent Science paper, and the figures here are reproduced from the supplement. Please do not read (or reread) the paper until after you have finished the exercise.
There are 46 H. pylori genomes in the sample, one of which is an ancient genome from “Otzi”, the Tyrolean iceman. The aim of the practical is to use the SNPs in the genomes to establish how his H. pylori is related to modern isolates, in particular those from Europeans (hpEurope) and Indians (hpAsia2).
Helicobacter pylori is a bacterium with a 1.67 Megabase haploid genome, and the sample of 46 genomes is variable at 172,419 SNPs. Here are the results from three commonly used methods applied to these SNPs.
Neighbor joining tree
STRUCTURE
PCA
Questions 1-3:
(1) What do the results of the three methods tell us about relationships between hpEurope and hpAsia2 strains? What evolutionary scenarios might have accounted for this? Please note that for this and other questions, the hypotheses you suggest do NOT need to be consistent with history or geography, only with the genetic data.
(2) Suggest TWO evolutionary/historical hypotheses for the relationships between the hpAsia2 and hpEAsia populations.
(3) Please describe at least TWO evolutionary/historical hypotheses for the relationship of Otzi to the other samples that are consistent with the results of each of the three methods. Try to make the hypotheses as distinct as possible, given the data. Then describe TWO evolutionary/historical hypotheses that are consistent with the results of all three methods.
Now we’ll begin working with fineSTRUCTURE version 2. On the cloud, you can run this program from the command line by typing “fs”. There is also a launcher for the fineSTRUCTURE GUI on the desktop. The data files for this activity can be found in ~/wpsg_2016/activities/fineSTRUCTURE
fineSTRUCTURE performs three steps.
(i) It estimates parameters of a “chromosome painting” Hidden Markov Model.
(ii) It paints each of the chromosomes as a mosaic of each of the others.
(iii) It uses a summary of the painting, the “coancestry matrix”, in order to divide the sample into populations.
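To make step (iii) more concrete, here is a toy sketch of how a painting can be summarised as a matrix of chunk counts. It assumes a single hard painting in which each recipient copies from exactly one donor at each SNP, and it counts one chunk per maximal run of consecutive SNPs copied from the same donor. The real program works with expectations under the HMM rather than one hard assignment, so this is only an illustration. The function name and the toy painting are hypothetical.

```python
from itertools import groupby

def chunk_counts(painting, n_individuals):
    """Count chunks copied by each recipient from each donor.

    painting[r] is the list of donor indices that recipient r copies
    from at each SNP (a hard painting; an assumption of this sketch).
    """
    counts = [[0] * n_individuals for _ in range(n_individuals)]
    for recipient, donors in enumerate(painting):
        # groupby yields one group per run of identical consecutive donors
        for donor, _run in groupby(donors):
            counts[recipient][donor] += 1
    return counts

# Hypothetical painting of 3 haplotypes at 6 SNPs (not real data).
toy_painting = [
    [1, 1, 1, 2, 2, 1],  # haplotype 0 copies from 1, then 2, then 1 again
    [0, 0, 0, 0, 2, 2],  # haplotype 1 copies from 0, then 2
    [1, 1, 0, 0, 0, 0],  # haplotype 2 copies from 1, then 0
]
toy_matrix = chunk_counts(toy_painting, 3)
for row in toy_matrix:
    print(row)
```

Row r of the result is the number of chunks recipient r received from each donor; the diagonal is zero because no haplotype is painted using itself.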
fineSTRUCTURE takes three inputs.
(i) 46hpyloriHaplotypes.phase is the genetic data from the 46 strains in PHASE format. Note that H. pylori is haploid.
(ii) hpyloriUnifRecMap.rcomb is a file giving the genetic distance between the SNPs in the genome. Since there is no genetic map for H. pylori, this is simply the physical distance between SNPs in the reference genome.
(iii) 46hpyloristrains.id is a file giving the names of each strain. It can also be used to set a flag as to whether the strain should be used in the analysis.
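The idea behind the uniform map in (ii) can be sketched in a few lines: with no genetic map, the genetic distance between adjacent SNPs is taken to be proportional to the physical distance between them. The function below is a hypothetical helper that illustrates this calculation only; it does not read or write the .rcomb file, whose exact layout is not described here, and the positions are made up.

```python
def uniform_genetic_distances(positions_bp, rate_per_bp=1.0):
    """Genetic distance between adjacent SNPs under a uniform rate.

    positions_bp: SNP positions in the reference genome, strictly increasing.
    rate_per_bp:  assumed constant rate; 1.0 makes genetic distance equal
                  to physical distance.
    """
    pairs = list(zip(positions_bp, positions_bp[1:]))
    if any(right <= left for left, right in pairs):
        raise ValueError("positions must be strictly increasing")
    return [(right - left) * rate_per_bp for left, right in pairs]

# Hypothetical SNP positions (not taken from the data files).
toy_positions = [12, 40, 41, 95]
toy_distances = uniform_genetic_distances(toy_positions)
print(toy_distances)
```

Under this assumption, tightly spaced SNPs are treated as tightly linked and widely spaced SNPs as more likely to be separated by recombination.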
The command we will use to run version 2 of fineSTRUCTURE (fs-2.0.7) is:
fs hpyloriproject1.cp -n -phasefiles 46hpyloriHaplotypes.phase -recombfiles hpyloriUnifRecMap.rcomb -idfile 46hpyloristrains.id -ploidy 1 -go
hpyloriproject1.cp is the file that records the details of the fineSTRUCTURE runs, with intermediate files being written into the directory hpyloriproject1.
“-n” indicates that a previous settings file (if there is one) should be overwritten.
“-ploidy 1” indicates the data is haploid. The default is for the data to be diploid.
“-go” indicates to run through the entire fineSTRUCTURE pipeline.
“fs -help” gives an overview of fineSTRUCTURE help and the most basic parameters.
“fs -help parameters” gives the parameters and their default values, which we use in this run.
An alternative is to assume all the SNPs are unlinked. This is performed by running the data in the same way, but with the recombination map file omitted. In this case, the data used by the algorithm is very similar to that used by STRUCTURE, PCA and neighbour joining.
fs hpyloriproject.cp -n -phasefiles 46hpyloriHaplotypes.phase -idfile 46hpyloristrains.id -ploidy 1 -go
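The only difference between the linked and unlinked runs is whether a recombination map is supplied. A small sketch that assembles both command lines makes this explicit. It uses only the flags shown in this handout; the helper name build_fs_command is hypothetical, and the script only prints the commands rather than running them.

```python
def build_fs_command(project, phase_file, id_file, recomb_file=None):
    """Assemble the fs command line used in this practical.

    Omitting recomb_file gives the unlinked model.
    """
    cmd = ["fs", project, "-n", "-phasefiles", phase_file]
    if recomb_file is not None:
        cmd += ["-recombfiles", recomb_file]  # linked model only
    cmd += ["-idfile", id_file, "-ploidy", "1", "-go"]
    return cmd

linked = build_fs_command(
    "hpyloriproject1.cp",
    "46hpyloriHaplotypes.phase",
    "46hpyloristrains.id",
    recomb_file="hpyloriUnifRecMap.rcomb",
)
unlinked = build_fs_command(
    "hpyloriproject.cp",
    "46hpyloriHaplotypes.phase",
    "46hpyloristrains.id",
)
print(" ".join(linked))
print(" ".join(unlinked))
```

To actually launch a run from Python, one of these lists could be passed to subprocess.run from inside the data directory.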
Questions 4-10:
(4) Use the GUI (part of fs version 0.01) to visualize the results. Describe them. Describe the differences between the linked and unlinked results.
Hint: the “change colour scale” function in the GUI lets you concentrate more of the colour palette on the part of the overall range of values that is most informative.
(5) Which population consists of individuals that are not particularly related to each other? How do results like this arise?
(6) Intransitive relationships between populations are a possible signature of admixture; e.g., if pop1 and pop2 have high coancestry with each other and pop1 and pop3 have high coancestry with each other, but pop2 and pop3 do not. One scenario that can explain this pattern is that pop1 and pop2 were sister taxa before pop1 and pop3 admixed with each other. Find three such population triplets in the data.
(7) fineSTRUCTURE infers a large number of populations under both models; why do you think this is?
(8) Now onto the relationship between Otzi and the other samples. What do the coancestry values of Otzi with hpEurope and hpAsia2 isolates tell us about Otzi’s relationships with these samples?
(9) How can your answers to question 8 be reconciled with results from the three other methods above? Describe a single evolutionary hypothesis that explains all the data. Are there any alternatives?
(10) Based on this hypothesis, how might the subtle differences between the linked and unlinked models in the results for Otzi be explained?
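For question 6, the intransitive pattern (one population with high coancestry to two others that have low coancestry with each other) can also be searched for mechanically once you have population-level coancestry values. The sketch below makes several assumptions: the values have been symmetrised, “high” and “low” are fixed thresholds you choose yourself, and the population names and numbers are invented for illustration. It is not part of fineSTRUCTURE.

```python
def symmetric_matrix(pair_values):
    """Build a nested dict m[p][q] == m[q][p] from {(p, q): value}."""
    matrix = {}
    for (p, q), value in pair_values.items():
        matrix.setdefault(p, {})[q] = value
        matrix.setdefault(q, {})[p] = value
    return matrix

def intransitive_triplets(coancestry, high, low):
    """Return (hub, a, b) where hub is close to a and b, but a and b are not."""
    pops = sorted(coancestry)
    found = []
    for hub in pops:
        others = [p for p in pops if p != hub]
        for i, a in enumerate(others):
            for b in others[i + 1:]:
                if (coancestry[hub][a] >= high
                        and coancestry[hub][b] >= high
                        and coancestry[a][b] <= low):
                    found.append((hub, a, b))
    return found

# Invented population-level coancestry values (not real results).
toy = symmetric_matrix({
    ("A", "B"): 9, ("A", "C"): 8, ("B", "C"): 1,
    ("A", "D"): 2, ("B", "D"): 2, ("C", "D"): 7,
})
triplets = intransitive_triplets(toy, high=6, low=3)
print(triplets)
```

In this toy example A is close to both B and C, and C is close to both A and D, so two triplets are reported. With real output the thresholds need care, since coancestry values depend on sample sizes and on the model used.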