SVDquartets activity
Table of contents
- Expected learning outcome
- Getting started
- Exercise 1: species-tree inference under the coalescent
- Exercise 2: lineage-tree inference under the coalescent
- Exercise 3: single-locus analysis
Expected learning outcome
The objective of this activity is to carry out a species-level phylogenetic analysis of multi-locus or SNP data under the coalescent model using SVDquartets, as implemented in PAUP*. Both species-level and lineage-level inferences will be considered. We will also see how the basic SVDquartets method can be used for (non-coalescent-based) single-locus analysis.
Getting started
The primary data set we will use for this tutorial is the rattlesnake data of Kubatko et al. (Syst. Biol. 60(4): 393-409, 2011). The data consist of 2 species, each divided into 3 subspecies, and an outgroup. There are 26 individuals (52 sequences) and 19 genes, for a total of 8,466 sites. The data can be downloaded in nexus format from www.stat.osu.edu/~lkubatko/data-snakes.nex.
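SVDquartets works with quartets, meaning sets of four sequences, and the number of possible quartets grows quickly with the number of sequences. The following short Python calculation is illustrative and not part of PAUP*. It assumes quartets are drawn from the 52 sequences and shows how many exist, which puts the 20,000 randomly generated quartets used in Exercise 1 in perspective.

```python
from math import comb

n_sequences = 52      # 26 individuals x 2 sequences each
n_sampled = 20_000    # randomly generated quartets requested in Exercise 1

# Every set of 4 distinct sequences is one quartet.
total_quartets = comb(n_sequences, 4)
fraction = n_sampled / total_quartets

print(f"{total_quartets:,} possible quartets")            # 270,725 possible quartets
print(f"{n_sampled:,} sampled = {fraction:.1%} of them")  # 20,000 sampled = 7.4% of them
```

Because each run scores only a random subset of quartets, results can differ slightly from run to run.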
Exercise 1: species-tree inference under the coalescent
- Start PAUP* and execute the data-snakes.nex data file, either by selecting this file using the “Open…” option in the File menu, or by issuing the following at the command line:
exe data-snakes.nex;
- Go to the “Analysis” menu and select “SVDquartets…”. We will discuss the possible options as a group. For this exercise, set the number of randomly generated quartets to 20,000 (or fewer, if you have a slow computer), select the bootstrapping option, and select the species-tree analysis option. This replicates the analysis in Chifman and Kubatko (2014). Click “OK”. Note that the entire analysis could also be run from the command line with the svdquartets command. For a list of its options, use PAUP*’s help by typing
svdq ?
What is the bootstrap support for the clade containing the three S. catenatus subspecies and for the clade containing S. miliarius miliarius and S. miliarius barbouri?
Answer: Results will vary slightly because of the random selection of quartets and bootstrap samples. The catenatus clade will typically have bootstrap support above 90, while support for the clade containing S. miliarius miliarius and S. miliarius barbouri is typically between 45 and 65.
- Now we’ll run another analysis by returning to the SVDquartets dialog box in the Analysis menu. Deselect the bootstrap option and select the “Show quartet scores” option. Click “OK”. Examine the output: what do the scores represent?
Answer: The output from analyzing each quartet is shown. For each sampled quartet, three scores are given, one for each of the three possible unrooted trees; the tree with the lowest score is the best-supported resolution of that quartet. Scanning the list of sampled quartets, you will see several kinds of relationships: sometimes one tree has a much lower score than the other two, and sometimes the scores for all three trees are much more even. Note that the numbers correspond to lineages, in the order in which they appear in the input file. Species labels are applied to build the species tree. The inferred quartets among species can be examined by selecting the “Write ‘quartets’ file” option.
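The three scores reported for each quartet are SVD scores. Following the definition in Chifman and Kubatko (2014), for a split such as {1,2}|{3,4} the site-pattern frequencies are arranged in a 16×16 "flattening" matrix. Its rows are the joint states of the first pair and its columns are the joint states of the second pair. The score is the square root of the sum of the squared singular values beyond rank 10, the rank bound for the true split under the coalescent. The following NumPy code is a minimal sketch of that idea and is not PAUP*'s implementation. It simply skips columns containing gaps or ambiguity codes, and its normalization may differ, so its numbers will not match PAUP*'s output.

```python
import numpy as np

BASES = "ACGT"
INDEX = {b: i for i, b in enumerate(BASES)}

# The three ways to split four taxa (indices into the sequence list) into two pairs.
SPLITS = {
    "01|23": ((0, 1), (2, 3)),
    "02|13": ((0, 2), (1, 3)),
    "03|12": ((0, 3), (1, 2)),
}


def flattening(seqs, split):
    """16x16 matrix of site-pattern frequencies for the split (i, j)|(k, l).

    Rows index the joint state of taxa i and j; columns index taxa k and l.
    Columns containing anything other than A, C, G, T are skipped (a simplification).
    """
    (i, j), (k, l) = split
    flat = np.zeros((16, 16))
    for site in zip(*seqs):
        if all(c in INDEX for c in site):
            row = 4 * INDEX[site[i]] + INDEX[site[j]]
            col = 4 * INDEX[site[k]] + INDEX[site[l]]
            flat[row, col] += 1
    total = flat.sum()
    if total == 0:
        raise ValueError("no unambiguous sites")
    return flat / total


def svd_score(flat, rank=10):
    """Square root of the sum of squared singular values beyond `rank` (0 = perfect fit)."""
    sigma = np.linalg.svd(flat, compute_uv=False)  # returned in descending order
    return float(np.sqrt(np.sum(sigma[rank:] ** 2)))


def quartet_scores(seqs, rank=10):
    """SVD score for each of the three unrooted trees on four aligned sequences."""
    return {name: svd_score(flattening(seqs, split), rank) for name, split in SPLITS.items()}


# Toy example: sequences 0 and 1 are identical, as are 2 and 3,
# and together the two pairs show each of the 16 joint states once.
a = "AAAACCCCGGGGTTTT"
c = "ACGTACGTACGTACGT"
scores = quartet_scores([a, a, c, c])
print(scores)
print("best split:", min(scores, key=scores.get))  # best split: 01|23
```

In this toy case the split 01|23 scores (numerically) zero, and the other two splits each score about 0.153. This mirrors the "one much lower score" pattern you may see in the PAUP* output.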
Exercise 2: lineage-tree inference under the coalescent
- Now suppose that we are interested in the relationships among the individual lineages under the coalescent model. In the “SVDquartets” dialog box, restore the defaults (make sure that the species-tree analysis option is not selected), and select the bootstrap option. Set the number of sampled quartets to the same value as in Exercise 1. Click “OK”.
- Once the analysis completes, the estimated tree can be displayed by selecting the “Print/View SVDquartets Bootstrap” option from the “Trees” menu. Notice that the subspecies have been color-coded, which makes it easy to assess the relationships among subspecies in the estimated tree visually. The commands that create the color-coding are in the file data-snakes.nex.
- Compare the lineage tree and bootstrap values from this analysis with the species tree and bootstrap values from Exercise 1. Do the results make sense? Close the “Print Trees Preview” window when you’ve answered this question.
Answer: The bootstrap support at the species level should be similar (subject to sampling variability) in the two analyses.
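Comparing support values between the two analyses comes down to one quantity: the percentage of bootstrap replicate trees that contain a given clade. The following Python code is a generic sketch of a site-resampling bootstrap and of that percentage calculation. It is not PAUP*'s internal procedure. The function names, the representation of each tree as a set of clades, and the tip labels A1, A2, B1, B2 are hypothetical.

```python
import random


def resample_sites(alignment, rng):
    """One nonparametric bootstrap replicate: draw alignment columns with replacement.

    `alignment` maps sequence names to aligned strings of equal length.
    """
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def clade_support(replicate_clades, clade):
    """Percentage of replicate trees that contain `clade`.

    Each replicate tree is given as a set of clades (frozensets of tip names),
    e.g. after rooting every replicate on the outgroup.
    """
    target = frozenset(clade)
    hits = sum(target in clades for clades in replicate_clades)
    return 100.0 * hits / len(replicate_clades)


# Hypothetical tip labels: A1, A2 from one subspecies, B1, B2 from another.
replicates = [
    {frozenset({"A1", "A2"}), frozenset({"B1", "B2"})},
    {frozenset({"A1", "A2"})},
    {frozenset({"A1", "A2"}), frozenset({"B1", "B2"})},
    {frozenset({"A1", "B1"})},
]
print(clade_support(replicates, {"A1", "A2"}))  # 75.0
print(clade_support(replicates, {"B1", "B2"}))  # 50.0

rng = random.Random(1)
toy = {"A1": "ACGTAC", "A2": "ACGTAA", "B1": "TCGTAC"}
print(resample_sites(toy, rng))
```

In a lineage tree, the species-level support for a grouping corresponds to the clade containing all lineages of those subspecies. That is the number to set beside the species-tree support from Exercise 1.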
Exercise 3: single-locus non-coalescent analysis
- When analyzing a single locus that is assumed to be a non-recombining unit, there is no need to incorporate the coalescent process in the model, since all sites in the gene can be assumed to share a single underlying phylogeny. In this case, an analysis based on SVDquartets can still be carried out, with a few changes. We’ll try this for the mitochondrial gene ATP in the rattlesnake data set, as follows:
- Select only the mitochondrial locus using the following commands:
charset ATP=1-665;
include ATP / only;
- In the SVDquartets dialog box, change the matrix rank to 4, and make sure the species-tree analysis option is not selected.
- Compare the mitochondrial tree with the lineage tree from the full data in Exercise 2.
Answer: In general, the mitochondrial gene tree agrees with the lineage tree based on all loci. The mitochondrial gene is relatively informative about these relationships.
- Repeat this analysis for some of the other genes. You can see the list of genes in the MrBayes block in the data-snakes.nex file. What do you find?
Answer: The genes vary widely in informativeness. Some gene trees display species-level relationships that are congruent with the species tree found in Exercise 1, while others are largely unresolved. This is typical of multi-locus data sets.
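Why change the matrix rank to 4 for a single locus? When a locus evolves on one tree, the 16×16 flattening for the true split has rank at most 4, because it factors through the four possible states at each internal node. Under the multispecies coalescent, the bound that SVDquartets relies on is 10 (Chifman and Kubatko 2014). The following NumPy sketch makes the single-tree case concrete using exact site-pattern probabilities on a quartet tree. The Jukes-Cantor (JC69) model and the branch lengths are illustrative assumptions. They are not settings taken from PAUP* or from the rattlesnake data.

```python
import numpy as np


def jc_matrix(t):
    """Jukes-Cantor (JC69) transition probabilities for branch length t."""
    e = np.exp(-4.0 * t / 3.0)
    return np.full((4, 4), 0.25 - 0.25 * e) + np.eye(4) * e


def pattern_probs(tips, internal):
    """Exact site-pattern probabilities for one gene tree with topology 01|23.

    Node u joins taxa 0 and 1, node v joins taxa 2 and 3. Rooting at u with
    uniform base frequencies is valid because JC69 is time-reversible.
    """
    p0, p1, p2, p3 = (jc_matrix(t) for t in tips)
    p_int = jc_matrix(internal)
    return 0.25 * np.einsum("ua,ub,uv,vc,vd->abcd", p0, p1, p_int, p2, p3)


def flattening(probs, split):
    """Reshape the 4x4x4x4 probability array into the 16x16 flattening for `split`."""
    (i, j), (k, l) = split
    return np.transpose(probs, (i, j, k, l)).reshape(16, 16)


SPLITS = {"01|23": ((0, 1), (2, 3)), "02|13": ((0, 2), (1, 3)), "03|12": ((0, 3), (1, 2))}

probs = pattern_probs(tips=(0.1, 0.1, 0.1, 0.1), internal=0.2)
ranks, scores = {}, {}
for name, split in SPLITS.items():
    sigma = np.linalg.svd(flattening(probs, split), compute_uv=False)
    ranks[name] = int(np.sum(sigma > 1e-12))               # numerical rank
    scores[name] = float(np.sqrt(np.sum(sigma[4:] ** 2)))  # SVD score with rank 4
    print(name, ranks[name], round(scores[name], 6))
```

The true split 01|23 should come out with rank 4 and a score of essentially zero, while the two incorrect splits are full rank (16). Scoring with rank 4 sums every singular value beyond the fourth, so incorrect splits are penalized more heavily than with the rank-10 setting, while the true split still scores essentially zero.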