Sequence Formats Description
Tip: Description of All Sequence Formats supported by BxSeqTools

Help: Common Sequence Formats

The sequence formats commonly used in BxSeqTools are GenBank, FASTA, and Free Text.

 

GenBank Format:

See detailed explanation, fields, feature keys, feature qualifiers, or feature locations

Example:
LOCUS       AAURRA                   118 bp ss-rRNA    linear       16-JUN-1986
DEFINITION  A.auricula-judae (mushroom) 5S ribosomal RNA.
ACCESSION   K03160
VERSION     K03160.1 GI:173593
KEYWORDS    5S ribosomal RNA; ribosomal RNA.
SOURCE      A.auricula-judae (mushroom) ribosomal RNA.
  ORGANISM  Auricularia auricula-judae
            Eukaryota; Fungi; Eumycota; Basidiomycotina; Phragmobasidiomycetes;
            Heterobasidiomycetidae; Auriculariales; Auriculariaceae.
REFERENCE   1 (bases 1 to 118)
  AUTHORS   Huysmans,E., Dams,E., Vandenberghe,A. and De Wachter,R.
  TITLE     The nucleotide sequences of the 5S rRNAs of four mushrooms and
            their use in studying the phylogenetic position of basidiomycetes
            among the eukaryotes
  JOURNAL   Nucleic Acids Res. 11, 2871-2880 (1983)
FEATURES             Location/Qualifiers
     rRNA            1..118
                     /note="5S ribosomal RNA"
ORIGIN
        1 ATCCACGGCC ATAGGACTCT GAAAGCACTG CATCCCGTCC GATCTGCAAA GTTAACCAGA
       61 GTACCGCCCA GTTAGTACCA CGGTGGGGGA CCACGCGGGA ATCCTGGGTG CTGTGGTT
//
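
As a rough illustration of how the sequence can be pulled out of a record like the one above, the following Python sketch reads the lines between ORIGIN and the closing "//" and discards the position numbers and spaces. The function name genbank_sequence is hypothetical and is not part of BxSeqTools; the record in the sketch is trimmed to the lines that matter here.

```python
import re

def genbank_sequence(record_text):
    """Return the sequence between ORIGIN and // as one upper-case string."""
    in_origin = False
    chunks = []
    for line in record_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("ORIGIN"):
            in_origin = True
            continue
        if stripped == "//":
            break
        if in_origin:
            # Drop the leading position number and the spaces between blocks.
            chunks.append(re.sub(r"[^A-Za-z]", "", stripped))
    return "".join(chunks).upper()

# Trimmed copy of the example record (header fields omitted for brevity).
record = """LOCUS       AAURRA                   118 bp ss-rRNA    linear       16-JUN-1986
ORIGIN
        1 ATCCACGGCC ATAGGACTCT GAAAGCACTG CATCCCGTCC GATCTGCAAA GTTAACCAGA
       61 GTACCGCCCA GTTAGTACCA CGGTGGGGGA CCACGCGGGA ATCCTGGGTG CTGTGGTT
//"""

sequence = genbank_sequence(record)
print(len(sequence))  # 118, matching the length on the LOCUS line
```

The sketch only handles the sequence block; it does not interpret the header fields or the feature table.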

 

FASTA Format:

A sequence in FASTA format begins with a single-line description, followed by lines of sequence data.

  • The description line starts with a greater-than symbol (">").
  • The word immediately following the greater-than symbol is the "ID" (name) of the sequence; the rest of the line is the description.
  • Both the "ID" and the description are optional.
  • A sequence ends where another line begins with a greater-than symbol (">"), which starts the next sequence.

 

The following example contains three sequences
(Example1, Example2, and hCdk9):
>Example1 envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT
QIWQKHRTSNDSALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWC
HFPSNWKGAWKEVKEEIVNLPKERYRGTNDPKRIFFQRQWGDPETANLWFNCHGEFFYCK
MDWFLNYLNNLTV
>Example2 synthetic peptide
HITREPLKHIPKERYRGTNDTLSPQIESIWAAELDRYKLVKTNCSNVS
>gi|17017983|ref|NM_001261.2| Homo sapiens cyclin-dependent kinase 9
CGCCCGCCGGAGGGGCCTGGAGTGCGGCGGCGGCGGGACCCGGAGCAGGAGCGGCGGCAGC
AGCGACTGGGGGCGGCGGCGGCGCGTTGGAGGCGGCCATGGCAAAGCAGTACGACTCGGTG
GAGTGCCCTTTTTGTGATGAAGTTTCCAAATACGAGAAGCTCGCCAAGATCGGCCAAGGCA
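
The FASTA rules described above can be sketched in a few lines of Python. This is only a minimal illustration, not the parser BxSeqTools uses: the names parse_fasta and _make_record are hypothetical, sequence data appearing before the first ">" line is assumed to be ignored, and the sample sequences are shortened.

```python
def _make_record(header, seq_lines):
    # The first word of the header is the ID; the rest is the description.
    # Both are optional, so either may come back as an empty string.
    parts = header.split(None, 1)
    seq_id = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return seq_id, description, "".join(seq_lines)

def parse_fasta(text):
    """Return a list of (seq_id, description, sequence) tuples."""
    records = []
    header = None
    seq_lines = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new ">" line closes the previous sequence, if there was one.
            if header is not None:
                records.append(_make_record(header, seq_lines))
            header = line[1:]
            seq_lines = []
        elif header is not None:
            seq_lines.append(line.replace(" ", ""))
    if header is not None:
        records.append(_make_record(header, seq_lines))
    return records

# Shortened sample input in the same style as the example above.
example = """>Example1 envelope protein
ELRLRYCAPAGF
ALLKCNDADYDG
>Example2 synthetic peptide
HITREPLKHIPK
"""

records = parse_fasta(example)
for seq_id, description, sequence in records:
    print(seq_id, description, len(sequence))
```

Sequence lines are joined without separators, so a sequence wrapped over several lines comes back as a single string.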

 

Free Text Format:

Example: (Most BxSeqTools programs will automatically remove non-IUPAC characters)
        121 atccacggcc ataggactct gaaagcactg catcccgtcc gatctgcaaa gttaaccaga
       61 gtaCCgccca gttagtaccGGGa cggtggggga ccagga atcctgggtg ctgtggtt
//
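
The clean-up of free text can be pictured with the short Python sketch below, which upper-cases the input and keeps only IUPAC nucleotide letters. It is an assumption about the general idea, not the actual BxSeqTools code: clean_nucleotide_text is a hypothetical name, the sketch also drops gap characters, and the sample input is a shortened version of the example above.

```python
import re

# IUPAC nucleotide letters, including the ambiguity codes.
IUPAC_NUCLEOTIDES = "ACGTURYKMSWBDHVN"

def clean_nucleotide_text(text):
    """Upper-case the text and keep only IUPAC nucleotide letters."""
    return re.sub(f"[^{IUPAC_NUCLEOTIDES}]", "", text.upper())

# Shortened free-text input with position numbers, spaces, and mixed case.
raw = """        121 atccacggcc ataggactct
       61 gtaCCgccca gttagtaccGGGa
//"""

cleaned = clean_nucleotide_text(raw)
print(cleaned)  # ATCCACGGCCATAGGACTCTGTACCGCCCAGTTAGTACCGGGA
```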

Sequences are expected to be represented in the standard IUB/IUPAC amino acid and nucleic acid codes, with these exceptions: lower-case letters are accepted and are mapped into upper-case; a single hyphen or dash can be used to represent a gap of indeterminate length; and in amino acid sequences, U and * are acceptable letters (see below). Before submitting a request, any numerical digits in the query sequence should either be removed or replaced by appropriate letter codes (e.g., N for unknown nucleic acid residue or X for unknown amino acid residue).
The nucleic acid codes supported are:

        A --> adenosine           M --> A C (amino)
        C --> cytidine            S --> G C (strong)
        G --> guanosine           W --> A T (weak)
        T --> thymidine           B --> G T C
        U --> uridine             D --> G A T
        R --> G A (purine)        H --> A C T
        Y --> T C (pyrimidine)    V --> G C A
        K --> G T (keto)          N --> A G C T (any)
                                  -  gap of indeterminate length
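
To show how the ambiguity codes in this table can be used, the Python sketch below stores them in a dictionary and lists every unambiguous sequence that an ambiguous one can stand for. The names AMBIGUITY and expand are hypothetical; the gap character is left out of the sketch, so a sequence containing "-" raises a KeyError.

```python
from itertools import product

# Each code maps to the bases it can represent, as listed in the table.
AMBIGUITY = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "GA", "Y": "TC", "K": "GT", "M": "AC", "S": "GC", "W": "AT",
    "B": "GTC", "D": "GAT", "H": "ACT", "V": "GCA", "N": "AGCT",
}

def expand(seq):
    """List every unambiguous sequence matched by an ambiguous one."""
    choices = [AMBIGUITY[base] for base in seq.upper()]
    return ["".join(combo) for combo in product(*choices)]

print(expand("ARY"))  # ['AGT', 'AGC', 'AAT', 'AAC']
```

The number of results grows quickly with the number of ambiguous positions, so this approach is only practical for short sequences such as primers.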
          
The accepted amino acid codes are:
    A  alanine                         P  proline
    B  aspartate or asparagine         Q  glutamine
    C  cysteine                        R  arginine
    D  aspartate                       S  serine
    E  glutamate                       T  threonine
    F  phenylalanine                   U  selenocysteine
    G  glycine                         V  valine
    H  histidine                       W  tryptophan
    I  isoleucine                      Y  tyrosine
    K  lysine                          Z  glutamate or glutamine
    L  leucine                         X  any
    M  methionine                      *  translation stop
    N  asparagine                      -  gap of indeterminate length
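
A protein sequence can be checked against this table with a simple set comparison, as in the Python sketch below. The names AMINO_ACID_CODES and invalid_residues are hypothetical, and the accepted set is taken directly from the table above, so letters that are not listed there (J and O) are reported as invalid.

```python
# Accepted codes copied from the table: 24 letters plus "*" and "-".
AMINO_ACID_CODES = set("ABCDEFGHIKLMNPQRSTUVWXYZ*-")

def invalid_residues(seq):
    """Return the sorted characters that are not in the accepted code table."""
    return sorted(set(seq.upper()) - AMINO_ACID_CODES)

print(invalid_residues("HITREPLKHIPK"))  # []
print(invalid_residues("mktj1"))         # ['1', 'J']
```

Lower-case input is mapped to upper case before the check; digits and whitespace are reported as invalid rather than removed.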