GenBank Fields Description
Tip: Description of GenBank format fields

Help: Explanation of GenBank Fields

 
Field Description
LOCUS A short mnemonic name for the entry, chosen to suggest the sequence's definition. Mandatory keyword/exactly one record.
DEFINITION A concise description of the sequence. Mandatory keyword/one or more records.
ACCESSION The primary accession number is a unique, unchanging code assigned to each entry. (Please use this code when citing information from GenBank.) Mandatory keyword/one or more records.
VERSION A compound identifier consisting of the primary accession number and a numeric version number associated with the current version of the sequence data in the record. This is followed by an integer key (a "GI") assigned to the sequence by NCBI. Mandatory keyword/exactly one record.
NID An alternative method of presenting the NCBI GI identifier (described above). The NID is obsolete and was removed from the GenBank flatfile format in December 1999.
KEYWORDS Short phrases describing gene products and other information about an entry. Mandatory keyword in all annotated entries/one or more records.
SEGMENT Information on the order in which this entry appears in a series of discontinuous sequences from the same molecule. Optional keyword (only in segmented entries)/exactly one record.
SOURCE Common name of the organism or the name most frequently used in the literature. Mandatory keyword in all annotated entries/one or more records/includes one subkeyword.
  ORGANISM - Formal scientific name of the organism (first line) and taxonomic classification levels (second and subsequent lines). Mandatory subkeyword in all annotated entries/two or more records.
REFERENCE Citations for all articles containing data reported in this entry. Includes seven subkeywords and may repeat. Mandatory keyword/one or more records.
  AUTHORS - Lists the authors of the citation. Optional subkeyword/one or more records.
  CONSRTM - Lists the collective names of consortiums associated with the citation (e.g., International Human Genome Sequencing Consortium), rather than individual author names. Optional subkeyword/one or more records.
  TITLE - Full title of the citation. Optional subkeyword (present in all but unpublished citations)/one or more records.
  JOURNAL - Lists the journal name, volume, year, and page numbers of the citation. Mandatory subkeyword/one or more records.
  MEDLINE - Provides the Medline unique identifier for a citation. Optional subkeyword/one record.
  PUBMED - Provides the PubMed unique identifier for a citation. Optional subkeyword/one record.
  REMARK - Specifies the relevance of a citation to an entry. Optional subkeyword/one or more records.
COMMENT Cross-references to other sequence entries, comparisons to other collections, notes of changes in LOCUS names, and other remarks. Optional keyword/one or more records/may include blank records.
FEATURES Table containing information on portions of the sequence that code for proteins and RNA sequences and information on experimentally determined sites of biological significance. Optional keyword/one or more records.
BASE COUNT Summary of the number of occurrences of each base code in the sequence. Mandatory keyword/exactly one record.
ORIGIN Specification of how the first base of the reported sequence is operationally located within the genome. Where possible, this includes its location within a larger genetic map. Mandatory keyword/exactly one record. The ORIGIN line is followed by the sequence data (multiple records).
// Entry termination symbol. Mandatory at the end of an entry/exactly one record.
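
Example: the field layout described above can be read with a few lines of Python. The following is a minimal sketch, not a complete parser. It assumes the conventional flat-file layout in which a keyword starts in the first column and occupies the first 12 characters of the line, while subkeyword and continuation lines are indented. The sample entry, the accession XX000000, the GI value, and the function names split_fields, parse_version, and sequence_from_origin are all made up for illustration.

```python
import re

# Hypothetical, abbreviated entry written for this example; not a real GenBank record.
SAMPLE_RECORD = """\
LOCUS       EXAMPLE1       24 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Made-up synthetic example sequence, for
            illustration only.
ACCESSION   XX000000
VERSION     XX000000.1  GI:12345
KEYWORDS    .
SOURCE      synthetic construct
  ORGANISM  synthetic construct
            other sequences; artificial sequences.
ORIGIN
        1 acgtacgtac gtacgtacgt acgt
//
"""


def split_fields(record_text):
    """Group the lines of one entry under their top-level keywords.

    Returns a list of (keyword, [records]) pairs so that keywords which
    may repeat, such as REFERENCE, are kept in order.
    """
    fields = []
    for line in record_text.splitlines():
        if line.startswith("//"):
            break  # entry termination symbol
        if not line.strip():
            if fields:
                fields[-1][1].append("")  # blank records are allowed in COMMENT
            continue
        if line[0] != " ":
            # Assumption: the keyword sits in the first 12 columns
            # (this also covers the two-word keyword BASE COUNT).
            fields.append((line[:12].strip(), [line[12:].strip()]))
        elif fields:
            # Subkeyword or continuation line: keep it with its parent keyword.
            fields[-1][1].append(line.strip())
    return fields


def parse_version(value):
    """Split a VERSION value into (accession, version number, GI or None)."""
    match = re.match(r"^(\S+)\.(\d+)(?:\s+GI:(\d+))?$", value.strip())
    if match is None:
        raise ValueError(f"unrecognised VERSION value: {value!r}")
    accession, version, gi = match.groups()
    return accession, int(version), int(gi) if gi else None


def sequence_from_origin(records):
    """Join the sequence records that follow the ORIGIN line, dropping positions and spaces."""
    return "".join(ch for rec in records[1:] for ch in rec if ch.isalpha())


fields = split_fields(SAMPLE_RECORD)
by_keyword = dict(fields)  # fine here; a repeated keyword would keep only its last occurrence
print([keyword for keyword, _ in fields])
print(parse_version(by_keyword["VERSION"][0]))
print(sequence_from_origin(by_keyword["ORIGIN"]))
```

The sketch keeps subkeywords such as ORGANISM as plain records of their parent keyword and does not interpret the FEATURES table. For real data, an established parser is likely a better choice.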