MAFFT (Multiple Alignment using Fast Fourier Transform) is a multiple sequence alignment program. It uses the Fast Fourier Transform (FFT) to rapidly identify homologous regions, representing each amino acid by its physical properties (volume and polarity) (Katoh et al., 2002; 2005; 2008; 2009). The program offers both progressive and iterative alignment methods, and it accepts nucleotide or amino acid sequences in FASTA format. MAFFT is useful for hard-to-align sequences such as those containing large gaps (e.g., rRNA sequences containing variable loop regions).
To start MAFFT in interactive mode, type the following at the command prompt of the terminal window:
mafft
MAFFT will ask you for an input file, which must be in FASTA format. You can use your own FASTA file or one of the example data sets (coding DNA or amino acid). Additional unaligned nucleotide and amino acid sequences in FASTA format are available.
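If no data set is at hand, a tiny input file is enough to try the program. The sketch below writes three short, made-up nucleotide sequences to a hypothetical file named example.fasta and counts the records; the sequence names and contents are invented for illustration only.

```shell
# Write three short, made-up nucleotide sequences to example.fasta.
# Each FASTA record is a '>' header line followed by the sequence.
cat > example.fasta <<'EOF'
>seq1
ATGGCTAGCTAGGCTAACGT
>seq2
ATGGCTAGCTGGCTAACGT
>seq3
ATGGCAAGCTAGGCTTACGT
EOF

# Count the records by counting header lines (prints 3 for the file above).
grep -c '^>' example.fasta
```

The file name example.fasta can then be given when MAFFT asks for the input file.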
Type the name of the output file. By default, MAFFT outputs the file in FASTA format. For example, you can type:
output1.fasta
The program will ask you several questions, such as the number of trees to rebuild, the number of iterations, the scoring matrix, and the gap penalties. Hit the return key to keep the default setting for any of the questions. To run an iterative alignment, change the number of iterations from 0 to a positive number.
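The same choices can be made without the interactive prompts by passing options on the command line. The sketch below assumes MAFFT is installed and on the PATH, and it uses the hypothetical file names input.fasta and output_iterative.fasta. The option --retree sets the number of guide tree rebuilds and --maxiterate sets the maximum number of refinement iterations; the values shown are illustrative, not recommendations.

```shell
# Hypothetical file names; replace them with your own.
in=input.fasta
out=output_iterative.fasta

# Run only if mafft is installed and the input file is readable.
if command -v mafft >/dev/null 2>&1 && [ -r "$in" ]; then
    # --retree 2        rebuild the guide tree twice
    # --maxiterate 1000 allow up to 1000 cycles of iterative refinement
    # The alignment is written to standard output, so redirect it to a file.
    mafft --retree 2 --maxiterate 1000 "$in" > "$out"
else
    echo "mafft or $in not found; skipping" >&2
fi
```

Leaving out --maxiterate keeps the number of iterations at 0, which gives a purely progressive alignment.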
The default setting in MAFFT is a progressive FFT alignment with two tree-building cycles (FFT-NS-2). MAFFT writes the alignment to standard output, so a quick way to generate an alignment with all default settings is to redirect the output to a file:
mafft input.fasta > output.fasta
MAFFT can also choose an appropriate alignment strategy automatically, according to the size of the data set:
mafft --auto input.fasta > output.fasta
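After a run, a quick sanity check is to confirm that every sequence in the output has the same length, since all rows of an alignment have the same number of columns. The sketch below is one way to do this with awk. The function name check_alignment and the file aligned_demo.fasta are hypothetical, and the demo alignment is made up rather than produced by MAFFT.

```shell
# Print the number of sequences and columns in an aligned FASTA file,
# or fail if the file is empty or the sequences differ in length.
check_alignment() {
    awk '
        /^>/ { if (n) len[n] = cur; n++; cur = 0; next }  # header: store previous length
        { gsub(/[ \t\r]/, ""); cur += length($0) }        # count residues and gap characters
        END {
            if (n == 0) { print "no sequences found"; exit 1 }
            len[n] = cur
            for (i = 2; i <= n; i++)
                if (len[i] != len[1]) { print "unequal lengths"; exit 1 }
            print n " sequences, " len[1] " columns"
        }
    ' "$1"
}

# Made-up aligned FASTA for demonstration; point the function at real
# MAFFT output (for example output.fasta) instead.
cat > aligned_demo.fasta <<'EOF'
>seq1
ATGGCTAGCTAGGCTAACGT
>seq2
ATGGCTAGCT-GGCTAACGT
>seq3
ATGGCAAGCTAGGCTTACGT
EOF

check_alignment aligned_demo.fasta
```

For the demo file this reports 3 sequences and 20 columns. The check only confirms that the file is a well-formed alignment; it says nothing about alignment quality.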
For a list of other parameters, type:
mafft --help
MAFFT is also available via a web interface.