A sequence in Fasta format begins with a single-line description followed by lines of sequence data. The description line is distinguished from the sequence data by a greater-than (“>”) symbol in the first column. It is recommended that all lines of text be shorter than 80 characters in length to faciliate viewing and editing.
Sequences are expected to be represented in the standard IUB/IUPAC amino acid and nucleic acid codes, with these exceptions: lower-case letters are accepted and are mapped into upper-case; a single hyphen or dash can be used to represent a gap of indeterminate length; and in amino acid sequences, U and * are acceptable letters (see below). Before submitting a request, any numerical digits in the query sequence should either be removed or replaced by appropriate letter codes (e.g., N for unknown nucleic acid residue or X for unknown amino acid residue).
Example DNA and protein sequence, both aligned and unaligned can be obtained from the menu to the right.