The sequence formats commonly used in BxSeqTools are GenBank, FASTA, and Free Text.
GenBank Format:
See the GenBank documentation for a detailed explanation of the record fields, feature keys, feature qualifiers, and feature locations.
Example:
LOCUS       AAURRA       118 bp    ss-rRNA    linear   16-JUN-1986
DEFINITION  A.auricula-judae (mushroom) 5S ribosomal RNA.
ACCESSION   K03160
VERSION     K03160.1  GI:173593
KEYWORDS    5S ribosomal RNA; ribosomal RNA.
SOURCE      A.auricula-judae (mushroom) ribosomal RNA.
  ORGANISM  Auricularia auricula-judae
            Eukaryota; Fungi; Eumycota; Basidiomycotina; Phragmobasidiomycetes;
            Heterobasidiomycetidae; Auriculariales; Auriculariaceae.
REFERENCE   1  (bases 1 to 118)
  AUTHORS   Huysmans,E., Dams,E., Vandenberghe,A. and De Wachter,R.
  TITLE     The nucleotide sequences of the 5S rRNAs of four mushrooms and
            their use in studying the phylogenetic position of basidiomycetes
            among the eukaryotes
  JOURNAL   Nucleic Acids Res. 11, 2871-2880 (1983)
FEATURES             Location/Qualifiers
     rRNA            1..118
                     /note="5S ribosomal RNA"
ORIGIN
        1 ATCCACGGCC ATAGGACTCT GAAAGCACTG CATCCCGTCC GATCTGCAAA GTTAACCAGA
       61 GTACCGCCCA GTTAGTACCA CGGTGGGGGA CCACGCGGGA ATCCTGGGTG CTGTGGTT
//
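In a GenBank record the sequence itself sits between the ORIGIN line and the terminating "//". It is split into numbered rows of 10-base blocks. The following is a minimal Python sketch of extracting it from a single record. The function name `genbank_origin_sequence` is hypothetical and not part of BxSeqTools. The sketch assumes exactly one record per input, and it uses an abridged copy of the record above.

```python
import re

def genbank_origin_sequence(record: str) -> str:
    """Return the bases from the ORIGIN section of ONE GenBank record (sketch)."""
    seq_parts = []
    in_origin = False
    for line in record.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True          # sequence rows start on the next line
            continue
        if line.startswith("//"):
            break                     # end of record
        if in_origin:
            # Drop the leading position number and the spaces between blocks.
            seq_parts.append(re.sub(r"[^A-Za-z]", "", line))
    return "".join(seq_parts).upper()

# Abridged version of the AAURRA record (header fields omitted).
record = """LOCUS       AAURRA       118 bp    ss-rRNA    linear   16-JUN-1986
ORIGIN
        1 ATCCACGGCC ATAGGACTCT GAAAGCACTG CATCCCGTCC GATCTGCAAA GTTAACCAGA
       61 GTACCGCCCA GTTAGTACCA CGGTGGGGGA CCACGCGGGA ATCCTGGGTG CTGTGGTT
//
"""
seq = genbank_origin_sequence(record)
print(len(seq))  # 118, matching "118 bp" on the LOCUS line
```

For files holding many records, or when the feature table is also needed, a dedicated parser such as Biopython's `SeqIO` is usually a better choice than a hand-written loop.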
FASTA Format:
A sequence in FASTA format begins with a single-line description, followed by lines of sequence data.
- The description line starts with a greater than symbol (">").
- The word immediately following the ">" is the "ID" (name) of the sequence; the rest of the line is the description.
- The "ID" and the description are optional.
- A sequence ends when another line begins with a greater than symbol (">"), which starts the next sequence.
The following example contains three sequences (Example1, Example2, and the human CDK9 sequence whose ID is gi|17017983|ref|NM_001261.2|):
>Example1 envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT
QIWQKHRTSNDSALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWC
HFPSNWKGAWKEVKEEIVNLPKERYRGTNDPKRIFFQRQWGDPETANLWFNCHGEFFYCK
MDWFLNYLNNLTV
>Example2 synthetic peptide
HITREPLKHIPKERYRGTNDTLSPQIESIWAAELDRYKLVKTNCSNVS
>gi|17017983|ref|NM_001261.2| Homo sapiens cyclin-dependent kinase 9
CGCCCGCCGGAGGGGCCTGGAGTGCGGCGGCGGCGGGACCCGGAGCAGGAGCGGCGGCAGC
AGCGACTGGGGGCGGCGGCGGCGCGTTGGAGGCGGCCATGGCAAAGCAGTACGACTCGGTG
GAGTGCCCTTTTTGTGATGAAGTTTCCAAATACGAGAAGCTCGCCAAGATCGGCCAAGGCA
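The FASTA rules above can be turned into a small parser. The following is a minimal Python sketch, not BxSeqTools' own code. `parse_fasta` and `_make_record` are hypothetical names. Because the ID and description are optional, the sketch treats sequence lines that appear before any ">" line as one record with an empty ID and description. That handling is an assumption.

```python
from typing import List, Optional, Tuple

Record = Tuple[str, str, str]  # (ID, description, sequence)

def _make_record(header: str, seq_lines: List[str]) -> Record:
    # The first word after '>' is the ID; the rest of the line is the description.
    parts = header.split(maxsplit=1)
    seq_id = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return seq_id, description, "".join(seq_lines)

def parse_fasta(text: str) -> List[Record]:
    records: List[Record] = []
    header: Optional[str] = None
    seq_lines: List[str] = []
    for line in text.splitlines():
        if line.startswith(">"):
            # A new description line ends the previous sequence.
            if header is not None:
                records.append(_make_record(header, seq_lines))
            header, seq_lines = line[1:].strip(), []
        elif line.strip():
            if header is None:
                header = ""  # assumption: sequence with no description line
            seq_lines.append(line.strip())
    if header is not None:
        records.append(_make_record(header, seq_lines))
    return records

example = """>Example2 synthetic peptide
HITREPLKHIPKERYRGTNDTLSPQIESIWAAELDRYKLVKTNCSNVS
>gi|17017983|ref|NM_001261.2| Homo sapiens cyclin-dependent kinase 9
CGCCCGCCGGAGGGGCCTGGAGTGCGGCGGCGGCGGGACCCGGAGCAGGAGCGGCGGCAGC
AGCGACTGGGGGCGGCGGCGGCGCGTTGGAGGCGGCCATGGCAAAGCAGTACGACTCGGTG
"""
records = parse_fasta(example)
for seq_id, description, seq in records:
    print(seq_id, "|", description, "|", len(seq))
```

The second record shows why the ID is defined as "the first word": the whole string gi|17017983|ref|NM_001261.2| becomes the ID, because it contains no spaces.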
Free Text Format:
Example (most BxSeqTools programs automatically remove non-IUPAC characters, such as the digits, spaces, and "//" below):
121 atccacggcc ataggactct gaaagcactg catcccgtcc gatctgcaaa gttaaccaga
61 gtaCCgccca gttagtaccGGGa cggtggggga ccagga atcctgggtg ctgtggtt
//
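Removing non-IUPAC characters from free text can be sketched as a simple character filter. The following Python sketch is an assumption about how such cleanup might work, not BxSeqTools' actual implementation. `clean_free_text` is a hypothetical name. The sketch keeps the nucleic acid codes from the table below plus "-", and it upper-cases everything.

```python
# Assumed filter: IUPAC nucleic acid codes plus '-' for a gap.
IUPAC_NUCLEOTIDES = set("ACGTURYKMSWBDHVN-")

def clean_free_text(text: str) -> str:
    """Upper-case the text and keep only IUPAC nucleotide codes and '-'."""
    return "".join(ch for ch in text.upper() if ch in IUPAC_NUCLEOTIDES)

free_text = """121 atccacggcc ataggactct gaaagcactg catcccgtcc gatctgcaaa gttaaccaga
61 gtaCCgccca gttagtaccGGGa cggtggggga ccagga atcctgggtg ctgtggtt
//"""
cleaned = clean_free_text(free_text)
print(cleaned)
```

A pure character filter is tolerant but not selective. Stray words in the input, such as a pasted "ORIGIN" label, would leave behind letters that happen to be valid codes (R, G, N). Real input should therefore be checked after cleaning.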
Sequences are expected to be represented in the standard IUB/IUPAC amino acid and nucleic acid codes, with these exceptions: lower-case letters are accepted and are mapped into upper-case; a single hyphen or dash can be used to represent a gap of indeterminate length; and in amino acid sequences, U and * are acceptable letters (see below). Before submitting a request, any numerical digits in the query sequence should either be removed or replaced by appropriate letter codes (e.g., N for unknown nucleic acid residue or X for unknown amino acid residue).
The nucleic acid codes supported are:
A --> adenine               M --> A C (amino)
C --> cytosine              S --> G C (strong)
G --> guanine               W --> A T (weak)
T --> thymine               B --> G T C
U --> uracil                D --> G A T
R --> G A (purine)          H --> A C T
Y --> T C (pyrimidine)      V --> G C A
K --> G T (keto)            N --> A G C T (any)
-     gap of indeterminate length
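Each ambiguity code stands for a set of bases, so a code "matches" a concrete base when that base is in its set. The following Python sketch encodes the table above. `code_matches`, `motif_matches`, and `find_motif` are hypothetical helpers. Treating U as equivalent to T is an assumption made so that RNA and DNA compare equally. The gap symbol "-" is deliberately left out of the mapping.

```python
# Nucleic acid ambiguity codes from the table above (gap '-' not included).
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC",
    "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

def code_matches(code: str, base: str) -> bool:
    """True if a concrete base is one of the bases the code stands for (U treated as T)."""
    allowed = IUPAC_CODES[code.upper()].replace("U", "T")
    return base.upper().replace("U", "T") in allowed

def motif_matches(motif: str, seq: str) -> bool:
    """True if seq has the motif's length and every position is allowed by the motif code."""
    return len(motif) == len(seq) and all(code_matches(c, b) for c, b in zip(motif, seq))

def find_motif(motif: str, seq: str) -> list:
    """0-based start positions where a (possibly ambiguous) motif occurs in seq."""
    k = len(motif)
    return [i for i in range(len(seq) - k + 1) if motif_matches(motif, seq[i:i + k])]

print(find_motif("GGNCC", "AAGGACCTTGGTCC"))  # [2, 9]
```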
The accepted amino acid codes are:
A  alanine                    P  proline
B  aspartate or asparagine    Q  glutamine
C  cysteine                   R  arginine
D  aspartate                  S  serine
E  glutamate                  T  threonine
F  phenylalanine              U  selenocysteine
G  glycine                    V  valine
H  histidine                  W  tryptophan
I  isoleucine                 Y  tyrosine
K  lysine                     Z  glutamate or glutamine
L  leucine                    X  any
M  methionine                 *  translation stop
N  asparagine                 -  gap of indeterminate length
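The accepted amino acid letters can be combined with the rules stated earlier for a validation step. Those rules are: lower case is mapped to upper case, a hyphen marks a gap, U and * are allowed, and digits are not. The following Python sketch shows one way to do it. `normalize_protein` is a hypothetical helper. Dropping whitespace is an assumption, and the sketch does not check that a gap is written as only a single hyphen.

```python
# Accepted amino acid codes from the table above, including U, *, and '-'.
AMINO_ACID_CODES = set("ABCDEFGHIKLMNPQRSTUVWYZX*-")

def normalize_protein(seq: str) -> str:
    """Drop whitespace, upper-case, and reject characters outside the accepted codes."""
    cleaned = "".join(seq.split()).upper()
    bad = sorted({ch for ch in cleaned if ch not in AMINO_ACID_CODES})
    if bad:
        # Digits end up here; they should be removed or replaced (e.g. by X) first.
        raise ValueError(f"unsupported characters: {''.join(bad)}")
    return cleaned

print(normalize_protein("mdwf lnyl\nnnltv*"))  # MDWFLNYLNNLTV*
```

Raising an error instead of silently deleting bad characters follows the advice above: digits should be fixed by the user before submission rather than guessed away.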