table of contents

  • expected learning outcomes
  • getting started
  • exercise 1
  • exercise 2
  • explore the program

expected learning outcomes

The student will become familiar with Structurama (v2.0), a program for partitioning individuals into populations.

 

getting started

 

NOTE: Ubuntu Linux users: download the following archive, which contains a compiled version of Structurama: structurama2.tar.gz. Unpack it and move the st2 file to ~/wme_jan2011/local/bin. You should then be able to invoke the program from any directory.

1. Change directory to the directory holding the example data file (mus.in).

cd [path here]

2. Start Structurama:

st2

You should see the following information on the screen:

 

Structurama 2.0 by
John P. Huelsenbeck (1), Edna T. Huelsenbeck (1), and Peter Andolfatto (2)
(1) Department of Integrative Biology University of California, Berkeley
(2) Department of Ecology & Evolutionary Biology Princeton University
Type "Help" to get started.
Structurama >

 

Note the prompt (Structurama >) for entering new commands. At this point, you might want to follow the program’s advice and type help to see a list of the available commands. If you type help followed by a command name, you will see details on how to use that command. For example, help model describes the model command, which sets the assumptions of the analysis.

3. Read the example data file into memory by using the following command:

Structurama > exe mus.in

The file mus.in contains the data from the following study:

Orth, A., T. Adama, W. Din, and F. Bonhomme. 1998. Hybridation naturelle entre deux sous-espèces de souris domestique Mus musculus domesticus et Mus musculus castaneus près de Lake Casitas (Californie). Genome 41:104–110.

You should see the following information scroll along the screen as the program reads the mus.in data file:

Structurama > exe mus.in
Reading file "mus.in"
Number of individuals = 74
Number of loci = 15
Number of unique alleles at locus 1 = 2 (diploid)
Number of unique alleles at locus 2 = 2 (diploid)
Number of unique alleles at locus 3 = 2 (diploid)
Number of unique alleles at locus 4 = 3 (diploid)
Number of unique alleles at locus 5 = 3 (diploid)
Number of unique alleles at locus 6 = 3 (diploid)
Number of unique alleles at locus 7 = 4 (diploid)
Number of unique alleles at locus 8 = 4 (diploid)
Number of unique alleles at locus 9 = 3 (diploid)
Number of unique alleles at locus 10 = 2 (diploid)
Number of unique alleles at locus 11 = 3 (diploid)
Number of unique alleles at locus 12 = 5 (diploid)
Number of unique alleles at locus 13 = 4 (diploid)
Number of unique alleles at locus 14 = 3 (diploid)
Number of unique alleles at locus 15 = 2 (diploid)
Successfully read in file "mus.in"
Structurama >
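If you want to double-check that the file was read as expected, the screen output above can be summarized with a few lines of Python. This is only a minimal sketch that parses the text shown in this tutorial; the names LOG and summarize_log are made up for illustration and are not part of Structurama.

```python
import re

# Screen output copied from the mus.in example above.
LOG = """\
Reading file "mus.in"
Number of individuals = 74
Number of loci = 15
Number of unique alleles at locus 1 = 2 (diploid)
Number of unique alleles at locus 2 = 2 (diploid)
Number of unique alleles at locus 3 = 2 (diploid)
Number of unique alleles at locus 4 = 3 (diploid)
Number of unique alleles at locus 5 = 3 (diploid)
Number of unique alleles at locus 6 = 3 (diploid)
Number of unique alleles at locus 7 = 4 (diploid)
Number of unique alleles at locus 8 = 4 (diploid)
Number of unique alleles at locus 9 = 3 (diploid)
Number of unique alleles at locus 10 = 2 (diploid)
Number of unique alleles at locus 11 = 3 (diploid)
Number of unique alleles at locus 12 = 5 (diploid)
Number of unique alleles at locus 13 = 4 (diploid)
Number of unique alleles at locus 14 = 3 (diploid)
Number of unique alleles at locus 15 = 2 (diploid)
Successfully read in file "mus.in"
"""


def summarize_log(text):
    """Extract the individual count, locus count, and alleles per locus."""
    n_individuals = int(re.search(r"Number of individuals = (\d+)", text).group(1))
    n_loci = int(re.search(r"Number of loci = (\d+)", text).group(1))
    alleles = {
        int(locus): int(count)
        for locus, count in re.findall(
            r"Number of unique alleles at locus (\d+) = (\d+)", text
        )
    }
    return n_individuals, n_loci, alleles


n_individuals, n_loci, alleles = summarize_log(LOG)
# Every locus announced in the header should have its own allele line.
assert len(alleles) == n_loci
print(n_individuals, n_loci, sum(alleles.values()))
```

The check that the number of per-locus lines matches the reported number of loci is a simple way to notice a truncated or malformed data file.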

 

exercise 1

 

1. Set the model assumptions for the analysis. For the first analysis, we are going to assume that there is no admixture and that the number of populations is fixed at 2. The following command accomplishes this:

Structurama > model numpops=2 admixture=no

2. Run the Markov chain Monte Carlo (MCMC) analysis with the following command:

Structurama > mcmc ngen=100000 nchains=1

After typing this command, you should see progress information for the MCMC analysis printed to the screen. The program may prompt you to overwrite previously existing files; if so, respond with yes.

3. After the MCMC analysis has completed, you will have a file that contains the samples output by the chain. The file should be called mus.in.p. Structurama provides several ways to summarize the MCMC samples. One method finds a so-called ‘mean partition’: a partition that minimizes the sum of squared distances to the sampled partitions. To see the mean partition, type the following command:

Structurama > showmeanpart

In many situations, the program has more trouble finding the mean partition than it does performing the initial MCMC analysis, so this step may take a while.

You should examine the mean partition. What does the mean partition imply about the population structure present in the mice?
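The idea behind the mean partition in step 3 can be illustrated with a small Python sketch. Two simplifying assumptions are made here and should not be read as Structurama's actual method: the distance between two partitions is taken to be the number of pairs of individuals that are grouped together in one partition but apart in the other, and the search is restricted to the sampled partitions themselves. The sample data and function names are hypothetical.

```python
from itertools import combinations


def pair_distance(p, q):
    """Count pairs of individuals grouped together in one partition
    but placed in different populations in the other (assumed distance)."""
    return sum(
        (p[i] == p[j]) != (q[i] == q[j])
        for i, j in combinations(range(len(p)), 2)
    )


def mean_partition_among_samples(samples):
    """Return the sampled partition with the smallest sum of squared
    distances to all sampled partitions."""
    return min(
        samples,
        key=lambda candidate: sum(pair_distance(candidate, s) ** 2 for s in samples),
    )


# Hypothetical MCMC samples: one population label per individual (6 individuals).
samples = [
    [1, 1, 1, 2, 2, 2],
    [1, 1, 1, 2, 2, 2],
    [2, 2, 2, 1, 1, 1],  # same grouping as above, labels swapped
    [1, 1, 2, 2, 2, 2],
    [1, 1, 1, 1, 2, 2],
]

best = mean_partition_among_samples(samples)
print(best)
```

Because the distance depends only on which individuals are grouped together, relabeling the populations (as in the third sample) does not change it. A full search over all possible partitions, rather than only the sampled ones, is far more expensive, which is consistent with this step being slow.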

 

exercise 2

 

1. Set the model assumptions for the analysis. For the second analysis, we are going to assume that there is no admixture and that the number of populations is an unknown random variable with a Dirichlet process prior. The following command accomplishes this:

Structurama > model numpops=rv admixture=no

2. Run the Markov chain Monte Carlo (MCMC) analysis with the following command:

Structurama > mcmc ngen=100000 nchains=1

As before, if prompted to overwrite pre-existing files, respond with yes.

3. Examine the mean partition for this second analysis by typing:

Structurama > showmeanpart

How many populations does the mean partition contain? How does this mean partition differ from that of the previous analysis, which assumed two populations?
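To get a feel for what the Dirichlet process prior in step 1 of this exercise implies before any data are seen, the sketch below computes the prior expected number of populations and draws one partition from the prior using the standard "Chinese restaurant process" construction. The concentration parameter alpha and the value 1.0 are assumptions chosen for illustration; this tutorial does not say what value Structurama uses, and the code is not taken from the program.

```python
import random


def expected_num_populations(n, alpha):
    """Prior expected number of populations for n individuals:
    the sum over i = 0..n-1 of alpha / (alpha + i)."""
    return sum(alpha / (alpha + i) for i in range(n))


def draw_partition(n, alpha, rng):
    """Draw one partition of n individuals from the Dirichlet process prior."""
    sizes = []   # sizes[k] = individuals already assigned to population k
    labels = []  # labels[i] = population of individual i
    for _ in range(n):
        # Join an existing population with weight equal to its size,
        # or start a new population with weight alpha.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)
        else:
            sizes[k] += 1
        labels.append(k)
    return labels


n_individuals = 74  # number of individuals in mus.in
alpha = 1.0         # assumed concentration parameter (illustration only)

prior_mean = expected_num_populations(n_individuals, alpha)
labels = draw_partition(n_individuals, alpha, random.Random(1))
print(round(prior_mean, 2), len(set(labels)))
```

Larger values of alpha favor more populations and smaller values favor fewer, so the prior expectation is worth keeping in mind when interpreting the number of populations in the mean partition.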

 

explore the program

 

The two exercises walked you through two of the four models implemented by Structurama 2.0. The other two models allow the individuals to be admixed and can be specified using the following commands. For the first, which fixes the number of populations, supply the number after numpops= (exercise 1 used 2):

Structurama > model numpops= admixture=yes

and

Structurama > model numpops=rv admixture=yes

You should explore the mus.in data using these additional models. Be warned, however, that the mean partition will take much longer to calculate, because the partitions sampled by the program are much larger with admixture turned on. The total number of elements to be partitioned equals the number of individuals times the number of allele copies per individual (the number of loci times two for diploid data). The algorithm for finding the mean partition in such cases needs to be improved. The program provides a few other commands for summarizing the results of an MCMC analysis (showtogetherness and shownumpops).
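To make the size difference concrete, the element count described above can be worked out for mus.in from the numbers the program reported when reading the file (74 diploid individuals, 15 loci). The variable names are for illustration only.

```python
n_individuals = 74  # reported when mus.in was read
n_loci = 15         # reported when mus.in was read
ploidy = 2          # all loci were reported as diploid

# Without admixture, whole individuals are partitioned.
elements_no_admixture = n_individuals

# With admixture, every allele copy of every individual is partitioned.
elements_admixture = n_individuals * n_loci * ploidy

print(elements_no_admixture, elements_admixture)
print(elements_admixture // elements_no_admixture)  # growth factor
```

With admixture turned on, the partitions contain 2220 elements instead of 74, a 30-fold increase, so expect showmeanpart to take considerably longer for these models.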