GARLI (Genetic Algorithm for Rapid Likelihood Inference) performs phylogenetic searches on aligned nucleotide, codon and amino acid data sets using the maximum likelihood criterion. In practice, it can complete maximum likelihood tree searches on large data sets in a matter of hours.
GARLI is loosely based on the program GAML (Lewis 1998). It uses a stochastic, genetic-algorithm-like approach to simultaneously find the topology, branch lengths and substitution model parameters that maximize the log-likelihood (lnL). The search evolves a population of candidate solutions, termed individuals, each of which encodes a tree topology, a set of branch lengths and a set of model parameters. Each individual is assigned a fitness based on its lnL score. In each generation, random mutations are applied to some components of the individuals, and their fitnesses are recalculated. Individuals are then chosen as parents of the next generation with probability proportional to their fitness. This process is repeated many times, and the population evolves toward higher-fitness solutions.
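The generation cycle described above can be sketched as a toy genetic algorithm. This is an illustration of fitness-proportional (roulette-wheel) selection, not GARLI's code: the names `selection_probabilities`, `choose_parents` and `next_generation`, the `intensity` parameter and the exponential mapping from lnL to fitness are all hypothetical assumptions. An exponential transform is used here because lnL values are negative and cannot serve directly as selection weights.

```python
import math
import random

def selection_probabilities(lnls, intensity=0.5):
    """Turn lnL scores into selection probabilities (hypothetical transform).

    fitness_i = exp(intensity * (lnL_i - best lnL)), so the best individual
    has fitness 1.0 and worse individuals decay smoothly toward 0.
    """
    best = max(lnls)
    fitness = [math.exp(intensity * (lnl - best)) for lnl in lnls]
    total = sum(fitness)
    return [f / total for f in fitness]

def choose_parents(population, lnls, k, rng=random):
    """Roulette-wheel selection: sample k parents with replacement,
    each with probability proportional to its fitness."""
    return rng.choices(population, weights=selection_probabilities(lnls), k=k)

def next_generation(population, score, mutate, rng=random):
    """One generation of a toy GA: score, select parents, mutate copies."""
    lnls = [score(ind) for ind in population]
    parents = choose_parents(population, lnls, len(population), rng)
    return [mutate(parent, rng) for parent in parents]

if __name__ == "__main__":
    # Toy problem: an "individual" is a single number whose lnL peaks at 3.0.
    rng = random.Random(42)
    pop = [0.0] * 20
    for _ in range(200):
        pop = next_generation(pop, lambda x: -(x - 3.0) ** 2,
                              lambda x, r: x + r.gauss(0.0, 0.1), rng)
    print(sum(pop) / len(pop))
```

In a real phylogenetic search each individual would hold a tree, branch lengths and model parameters, and `score` would be the likelihood calculation, which dominates the run time.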
GARLI uses three classes of mutation: topological mutations, model parameter mutations and branch-length mutations. Topological mutations consist of the standard nearest-neighbor interchange (NNI) and subtree pruning and regrafting (SPR) rearrangements, as well as a localized form of SPR in which the pruned subtree may only be reattached to branches within a certain radius of its former location. Each topological mutation is followed by a rough optimization of branch lengths. A model mutation chooses one of the model parameters and multiplies it by a gamma-distributed random variable with a mean of 1.0. A branch-length mutation chooses a number of branches and multiplies the current length of each by a separate gamma-distributed random variable.
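The two multiplicative mutations can be sketched as follows, assuming a gamma distribution with shape a and scale 1/a, whose mean is a × (1/a) = 1.0; a larger shape gives smaller perturbations. The function names and default shape values are hypothetical, chosen only for illustration, and real parameter bounds (for example on a proportion of invariable sites) are not enforced here.

```python
import random

def gamma_multiplier(shape, rng=random):
    """Draw a multiplier from Gamma(shape, scale=1/shape), which has mean 1.0."""
    return rng.gammavariate(shape, 1.0 / shape)

def mutate_model(params, shape=100.0, rng=random):
    """Model mutation: rescale one randomly chosen parameter.

    Hypothetical sketch; bounds checking on the new value is omitted.
    """
    new = dict(params)
    key = rng.choice(sorted(new))
    new[key] *= gamma_multiplier(shape, rng)
    return new

def mutate_branch_lengths(lengths, n_branches, shape=10.0, rng=random):
    """Branch-length mutation: pick n_branches distinct branches and
    multiply each by its own independently drawn gamma multiplier."""
    new = list(lengths)
    for i in rng.sample(range(len(new)), n_branches):
        new[i] *= gamma_multiplier(shape, rng)
    return new

if __name__ == "__main__":
    rng = random.Random(7)
    print(mutate_model({"kappa": 2.0, "alpha": 0.5}, rng=rng))
    print(mutate_branch_lengths([0.1, 0.05, 0.2, 0.02], 2, rng=rng))
```

Because the multiplier is always positive, these moves keep branch lengths and rate parameters positive without any extra correction.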
Substitution models available in version 1.0 include:
- Nucleotide models: All models nested within the General Time Reversible (GTR) model, optionally with discrete gamma-distributed rate heterogeneity and/or an inferred proportion of invariable sites.
- Amino acid models: Many of the well-known fixed amino acid rate matrices (Dayhoff, Jones, WAG, mtRev, mtmam), with either fixed or observed (aka “+F”) amino acid frequencies, and discrete gamma-distributed rate heterogeneity and/or an inferred proportion of invariable sites.
- Codon models: The basic Goldman and Yang (1994) model and other related models, with a number of options for codon frequencies (equal, “F1x4”, “F3x4”, observed) and one or more estimated non-synonymous rate categories (aka dN/dS or omega parameters).
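The codon frequency options can be illustrated with a short sketch, assuming the usual construction: under "F1x4" a codon's frequency is the product of four base frequencies pooled over all codon positions, under "F3x4" it uses separate base frequencies for each of the three positions, and in both cases the result is renormalized over the 61 sense codons of the standard genetic code. The function names are hypothetical, and the sketch is not taken from GARLI's source.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code
SENSE_CODONS = ["".join(p) for p in product(BASES, repeat=3)
                if "".join(p) not in STOP_CODONS]

def _nucleotide_freqs(codons, positions):
    """Base frequencies pooled over the given codon positions (0, 1, 2)."""
    counts = Counter(codon[p] for codon in codons for p in positions)
    total = sum(counts.values())
    return {b: counts[b] / total for b in BASES}

def codon_frequencies(codons, scheme):
    """Equilibrium codon frequencies from a list of uppercase sense codons."""
    if scheme == "equal":
        return {c: 1.0 / len(SENSE_CODONS) for c in SENSE_CODONS}
    if scheme == "observed":
        counts = Counter(c for c in codons if c in SENSE_CODONS)
        total = sum(counts.values())
        return {c: counts[c] / total for c in SENSE_CODONS}
    if scheme == "F1x4":
        # One set of 4 base frequencies, pooled over all three positions.
        per_position = [_nucleotide_freqs(codons, (0, 1, 2))] * 3
    elif scheme == "F3x4":
        # A separate set of 4 base frequencies for each codon position.
        per_position = [_nucleotide_freqs(codons, (p,)) for p in range(3)]
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    # Codon frequency = product of its base frequencies, renormalized
    # because the 3 stop codons are excluded from the state space.
    raw = {c: per_position[0][c[0]] * per_position[1][c[1]] * per_position[2][c[2]]
           for c in SENSE_CODONS}
    total = sum(raw.values())
    return {c: f / total for c, f in raw.items()}
```

The design trade-off is in parameter count: "F1x4" estimates 3 free frequencies, "F3x4" estimates 9, and "observed" uses 60, so the richer schemes track the data more closely at the cost of more parameters.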
GARLI is available for all major operating systems at http://garli.googlecode.com.
For a quick reference, see the cheat sheet.
For detailed instructions on using GARLI, see the GARLI wiki.
For algorithmic details, see Derrick Zwickl’s dissertation.
Input Format:
GARLI expects the garli.conf configuration file and the data file to be in the directory from which it is run.
To run GARLI in the terminal, type:
garli
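Because GARLI looks for its inputs in the directory it is started from, a wrapper script can check that directory before launching. The sketch below is hypothetical: it assumes the data file is named by a `datafname = ...` line in garli.conf and that the executable is on the PATH as `garli` (on some systems the binary may carry a different name).

```python
import shutil
import subprocess
from pathlib import Path

def parse_datafname(conf_text):
    """Return the value of the first `datafname = ...` line, or None.

    Assumes the data file is named by a `datafname` key in garli.conf.
    """
    for line in conf_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip().lower() == "datafname":
            return value.strip()
    return None

def check_run_dir(run_dir="."):
    """Return a list of problems that would keep GARLI from finding its inputs."""
    run_dir = Path(run_dir)
    conf = run_dir / "garli.conf"
    if not conf.is_file():
        return [f"garli.conf not found in {run_dir}"]
    data_name = parse_datafname(conf.read_text())
    if data_name is None:
        return ["garli.conf has no datafname entry"]
    if not (run_dir / data_name).is_file():
        return [f"data file {data_name!r} not found in {run_dir}"]
    return []

def run_garli(run_dir="."):
    """Check the inputs, then run `garli` with run_dir as the working directory."""
    problems = check_run_dir(run_dir)
    if problems:
        raise FileNotFoundError("; ".join(problems))
    if shutil.which("garli") is None:
        raise FileNotFoundError("no `garli` executable on PATH")
    # cwd matters: GARLI reads garli.conf from the directory it is started in.
    return subprocess.run(["garli"], cwd=run_dir, check=True)
```

Passing `cwd=run_dir` to `subprocess.run` has the same effect as changing into that directory in the terminal before typing `garli`.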