table of contents
- overview
- getting started
- exercise 1: species-tree inference under the coalescent
- exercise 2: lineage-tree inference under the coalescent
- exercise 3: single-locus non-coalescent analysis
overview
In this lab exercise, we will use SVDquartets to analyze the rattlesnake data from Kubatko et al. (Syst. Biol. 60(4): 393-409, 2011) that was discussed in the lectures. The data consist of 2 species, each divided into 3 subspecies, and an outgroup. There are 26 individuals (52 sequences) and 19 genes, for a total of 8,466 sites.
getting started
The data can be downloaded in NEXUS format from www.stat.osu.edu/~lkubatko/data-snakes.nex. The easiest way is to issue the following commands:
cd ~/workshop_materials/svd_quartets
wget http://www.stat.osu.edu/~lkubatko/data-snakes.nex
exercise 1: species-tree inference under the coalescent
- Start PAUP* and execute the data-snakes.nex data file by issuing the following command at the PAUP* prompt:
exe data-snakes.nex;
- Let's look at the options available for SVDquartets in PAUP* by typing:
svdq ?
- Now we'll run a standard SVDquartets analysis. For this exercise, set the number of randomly generated quartets to 20,000 and perform a species-tree analysis with 100 bootstrap replicates. This reproduces the analysis in Chifman and Kubatko (2014). The complete command is:
SVDQuartets nquartets=20000 speciesTree partition=snakespecies bootstrap;
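For a sense of scale, the short Python calculation below counts how many distinct quartets can be formed and what fraction 20,000 random quartets represent. It assumes the quartets are drawn from the 52 sequences (lineages) in the file; this is an illustration of the arithmetic, not a description of PAUP*'s sampling code.

```python
from math import comb

N_SEQUENCES = 52      # 26 individuals x 2 sequences each
N_SAMPLED = 20_000    # the value passed as nquartets=20000

total = comb(N_SEQUENCES, 4)   # number of distinct 4-sequence subsets
fraction = N_SAMPLED / total   # share of all quartets that gets sampled

print(f"possible quartets: {total}")      # possible quartets: 270725
print(f"fraction sampled:  {fraction:.1%}")  # fraction sampled:  7.4%
```

Because only a few percent of the possible quartets are sampled, repeated runs draw different quartets, which is one reason the answers below vary slightly from run to run.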
Question: What is the bootstrap support for the clade containing the three S. catenatus subspecies and for the clade containing the three S. miliarius subspecies?
Answers will vary slightly due to the random selection of quartets and bootstrap samples. The catenatus clade will typically have bootstrap support above 90, while the miliarius clade typically has bootstrap support between 45 and 65.
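Bootstrap support is the percentage of replicate trees that contain a given clade. The Python sketch below illustrates a generic nonparametric bootstrap (sites drawn with replacement) and the support calculation; it is not PAUP*'s internal code. As simplifying assumptions, each replicate tree is represented only by its set of clades (taxon sets), and the helper names and toy taxa A to E are hypothetical.

```python
import random


def resample_columns(alignment, rng):
    """Bootstrap pseudo-alignment: alignment columns drawn with replacement.

    alignment maps taxon name -> aligned sequence (all the same length);
    the same resampled columns are used for every taxon.
    """
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[c] for c in cols) for taxon, seq in alignment.items()}


def clade_support(replicate_trees, clade):
    """Percentage of replicate trees (each a set of frozenset clades) containing clade."""
    target = frozenset(clade)
    hits = sum(1 for tree in replicate_trees if target in tree)
    return 100.0 * hits / len(replicate_trees)


# Toy example: four replicate trees over hypothetical taxa A-E.
reps = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("DE")},
    {frozenset("AC"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABC"), frozenset("DE")},
]
print(clade_support(reps, "AB"))  # 75.0
print(clade_support(reps, "DE"))  # 50.0

rng = random.Random(1)
boot = resample_columns({"A": "ACGTAC", "B": "ACGTTT"}, rng)
```

With 100 replicates, a support value of 90 means the clade appeared in 90 of the 100 replicate trees.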
- Now we'll run a basic analysis with the "showScores" option turned on. With this option enabled, PAUP* prints the scores of all three possible trees for each quartet considered, which gives some insight into how the SVDquartets method works. The command to use is:
SVDQuartets nquartets=20000 speciesTree partition=snakespecies showScores=yes bootstrap=no;
Examine the output -- what do the scores represent?
The output from analyzing each sampled quartet is shown: three scores, one for each of the three possible unrooted trees on those four lineages. Each score measures how far the matrix of site-pattern frequencies, arranged according to that tree's split, is from the low rank expected under the true tree, so the lowest score identifies the best-supported quartet tree. Scanning over the list of sampled quartets, several different kinds of relationships are observed: sometimes one tree has a much lower score than the other two, and sometimes the three scores are much more even. Note that the numbers identify lineages in the order they appear in the input file; species labels are then applied to build the species tree. The inferred quartets among species can be inspected by selecting the 'Write "quartets" file' option.
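To make the idea behind the scores concrete, here is a minimal Python sketch of the SVD score described by Chifman and Kubatko (2014). For a split ab|cd, site-pattern frequencies are arranged in a 16 x 16 "flattening" matrix (rows indexed by the bases of a and b, columns by those of c and d); under the true split this matrix has rank at most 10, so the score is the distance to the nearest rank-10 matrix, the square root of the sum of the squared singular values 11 through 16. Assumptions: sites with gaps or ambiguity codes are simply skipped, the helper names are hypothetical, and PAUP* may scale or normalize the scores it prints differently, so the numbers will not match its output exactly. To stay dependency-free the sketch gets the squared singular values as eigenvalues of M^T M with a small Jacobi routine; with NumPy, numpy.linalg.svd(M, compute_uv=False) would return the singular values directly.

```python
import math
from itertools import product

BASES = "ACGT"
INDEX = {b: i for i, b in enumerate(BASES)}


def flattening(seqs, split):
    """16 x 16 site-pattern frequency matrix for the split (a, b | c, d).

    Rows are indexed by the bases of taxa a and b, columns by those of c and d.
    Sites with anything other than A, C, G, T in the four taxa are skipped.
    """
    (a, b), (c, d) = split
    m = [[0.0] * 16 for _ in range(16)]
    n_sites = 0
    for xa, xb, xc, xd in zip(seqs[a], seqs[b], seqs[c], seqs[d]):
        if all(x in INDEX for x in (xa, xb, xc, xd)):
            m[4 * INDEX[xa] + INDEX[xb]][4 * INDEX[xc] + INDEX[xd]] += 1.0
            n_sites += 1
    if n_sites == 0:
        raise ValueError("no unambiguous sites for this quartet")
    return [[v / n_sites for v in row] for row in m]


def gram(m):
    """M^T M; its eigenvalues are the squared singular values of M."""
    n = len(m)
    return [[sum(m[k][i] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def symmetric_eigenvalues(a, max_sweeps=100, rel_tol=1e-24):
    """Eigenvalues (ascending) of a symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    scale = sum(x * x for row in a for x in row)  # invariant under rotations
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= rel_tol * scale:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # update columns p and q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # update rows p and q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def svd_score(seqs, split, rank=10):
    """Frobenius distance from the flattening to the nearest rank-10 matrix."""
    eig = symmetric_eigenvalues(gram(flattening(seqs, split)))
    # The 16 - rank smallest eigenvalues are the squared singular values
    # that a rank-10 approximation discards (clipped to remove round-off).
    return math.sqrt(sum(max(0.0, e) for e in eig[: 16 - rank]))


def quartet_scores(seqs, quartet):
    """Scores for the three unrooted trees on a quartet; the lowest is favored."""
    i, j, k, l = quartet
    splits = [((i, j), (k, l)), ((i, k), (j, l)), ((i, l), (j, k))]
    return {split: svd_score(seqs, split) for split in splits}


# Toy data: taxa 0 and 1 are identical, as are taxa 2 and 3, and the sixteen
# sites cover every (taxon 0, taxon 2) base combination exactly once.
pairs = list(product(BASES, repeat=2))
x = "".join(u for u, _ in pairs)
y = "".join(v for _, v in pairs)
scores = quartet_scores([x, x, y, y], (0, 1, 2, 3))
for split, score in scores.items():
    print(split, round(score, 4))
# ((0, 1), (2, 3)) 0.0
# ((0, 2), (1, 3)) 0.1531
# ((0, 3), (1, 2)) 0.1531
```

In the toy data the split 01|23 matches how the sequences were built, so its flattening has rank 1 and scores 0, while the two competing splits give full-rank (diagonal) flattenings and equal, larger scores. Real quartets are noisier, which is why the PAUP* output sometimes shows one clearly lowest score and sometimes three similar ones.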
exercise 2: lineage-tree inference under the coalescent
- Now suppose that we are interested in the relationships among the individual lineages under the coalescent model. We can have PAUP* ignore the species partition and instead apply the method treating each sequence as a distinct lineage. The command is:
SVDQuartets nquartets=20000 speciesTree=no showScores=no bootstrap;
- Compare the lineage tree and bootstrap values you observed in this analysis to the species tree and bootstrap values you observed in exercise 1. Do the results make sense?
The bootstrap support at the species level should appear similar (subject to sampling variability) in the two analyses.
- In the GUI version of PAUP*, the trees estimated by SVDquartets can be displayed graphically. To do this, select the "Print/View SVDquartets Bootstrap" option from the "Trees" menu. Notice that the subspecies are color-coded, which makes it easy to assess visually how the subspecies are related in the estimated tree. The commands that create the color-coding are in the file data-snakes.nex.
exercise 3: single-locus non-coalescent analysis
- When analyzing a single locus that is assumed to be a non-recombining unit, there is no need to incorporate the coalescent process into the model, since all sites in the gene can be assumed to share a single underlying phylogeny. In this case, an SVDquartets analysis can still be carried out with a few changes. We'll try this for the mitochondrial gene ATP in the rattlesnake data set, as follows:
- Select only the mitochondrial locus using the following commands:
charset ATP=1-665;
include ATP / only;
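The charset ATP=1-665 uses 1-based, inclusive NEXUS positions, and include ATP / only restricts the analysis to those characters. As a rough Python analogue, the hypothetical helpers below convert a simple charset specification into 0-based column indices and keep only those columns. They assume only plain "start-end" ranges and single positions separated by spaces; other NEXUS charset forms (such as step or "." syntax) are not handled.

```python
def charset_columns(spec):
    """0-based column indices for a simple NEXUS charset such as '1-665'."""
    cols = []
    for token in spec.split():
        if "-" in token:
            start, end = (int(part) for part in token.split("-"))
            cols.extend(range(start - 1, end))  # NEXUS ranges are inclusive
        else:
            cols.append(int(token) - 1)
    return cols


def include_only(alignment, spec):
    """Keep only the charset's columns, roughly like 'include <charset> / only;'."""
    cols = charset_columns(spec)
    return {taxon: "".join(seq[c] for c in cols) for taxon, seq in alignment.items()}


atp_cols = charset_columns("1-665")
print(len(atp_cols), atp_cols[0], atp_cols[-1])  # 665 0 664
```

So the ATP analysis uses 665 of the 8,466 sites in the full data set.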
- To carry out the proper analysis, we must tell PAUP* to assume that a single tree (the gene tree) underlies all sites in the data set using the option "treemodel=shared". Also note that we would not use a species tree partition in this setting. The command is:
SVDQuartets treemodel=shared bootstrap;
- Compare the mitochondrial tree to the lineage tree from the full data in exercise 2.
In general, there is agreement between the mitochondrial gene tree and the lineage tree based on all loci. The mitochondrial gene is relatively informative about these relationships.
- Repeat this analysis for some of the other genes. You can see the list of genes in the MrBayes block in the data-snakes.nex file. What do you find?
The genes vary widely in terms of informativeness. Some trees display species-level relationships that are congruent with the species tree found in exercise 1, while other gene trees are largely unresolved. This is typical for multi-locus analyses.
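To run several genes in one go, a hypothetical Python helper like the one below can print the same three commands used for ATP once per gene, ready to paste at the PAUP* prompt. Only ATP=1-665 comes from this lab; GENE_X and its range are placeholders, and the real gene names and coordinates must be copied from the MrBayes block in data-snakes.nex.

```python
def svdq_commands(charsets):
    """PAUP* commands for one single-locus SVDquartets analysis per charset.

    charsets maps a charset name to its NEXUS range, e.g. {"ATP": "1-665"}.
    """
    lines = []
    for name, spec in charsets.items():
        lines.append(f"charset {name}={spec};")
        lines.append(f"include {name} / only;")
        lines.append("SVDQuartets treemodel=shared bootstrap;")
    return "\n".join(lines)


genes = {
    "ATP": "1-665",        # the mitochondrial locus from exercise 3
    "GENE_X": "666-1000",  # hypothetical: use real names/ranges from the MrBayes block
}
print(svdq_commands(genes))
```

Keeping the commands identical for every gene makes the resulting gene trees directly comparable with each other and with the species tree from exercise 1.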