7.4 Appendix IV: Summary of qualifiers for feature keys
7.4.1 Qualifier List
The following is a list of available qualifiers for feature keys and their
usage.
The information is arranged as follows:
Qualifier name of qualifier; qualifier requires a value if followed by
an equal sign
Definition definition of the qualifier
Value format format of value, if required
Example example of qualifier with value
Comment comments, questions and clarifications
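The convention described above (a qualifier takes a value only when its name is followed by an equal sign) can be sketched as a small parser. This is an illustration only: the function name parse_qualifier is hypothetical and not part of this document, and the sketch assumes the qualifier has already been joined onto a single line.

```python
def parse_qualifier(text):
    """Split one qualifier string into (name, value).

    Hypothetical helper for illustration. Returns value=None for
    qualifiers that take no value, such as /pseudo.
    """
    text = text.strip()
    if not text.startswith("/"):
        raise ValueError("a qualifier must begin with '/'")
    name, sep, value = text[1:].partition("=")
    if not sep:
        return name, None
    # Quoted free text: drop the enclosing double quotes only.
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        value = value[1:-1]
    return name, value
```

With this sketch, parse_qualifier('/gene="ilvE"') yields the pair ("gene", "ilvE"), while parse_qualifier("/pseudo") yields ("pseudo", None).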
Qualifier /allele=
Definition name of the allele for the given gene
Value format "text"
Example /allele="adh1-1"
Comment all gene-related features (exon, CDS etc) for a given
gene should share the same /allele qualifier value;
the /allele qualifier value must, by definition, be
different from the /gene qualifier value; when used with
the variation feature key, the allele qualifier value
should be that of the variant.
Qualifier /anticodon=(pos: ,aa: )
Definition location of the anticodon of tRNA and the amino acid for which
it codes
Value format pos:<base_range>,aa:<amino_acid> where base_range
is the position of the anticodon and amino_acid is the
abbreviation for the amino acid encoded
Example /anticodon=(pos:34..36,aa:Phe)
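As a rough sketch of how the /anticodon value format might be read by a program, the following uses a regular expression. The names ANTICODON_RE and parse_anticodon are hypothetical, and the sketch assumes a simple start..end base range with no operators or remote references.

```python
import re

# Follows the value format pos:<base_range>,aa:<amino_acid>.
# Assumption: the base range is a plain start..end span.
ANTICODON_RE = re.compile(r"\(pos:(\d+)\.\.(\d+),aa:([A-Za-z]+)\)")

def parse_anticodon(value):
    """Return (start, end, amino_acid) from an /anticodon value."""
    match = ANTICODON_RE.fullmatch(value.replace(" ", ""))
    if match is None:
        raise ValueError(f"not an /anticodon value: {value!r}")
    return int(match.group(1)), int(match.group(2)), match.group(3)
```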
Qualifier /bound_moiety=
Definition moiety bound
Value format "text"
Example /bound_moiety="repressor"
Qualifier /cell_line=
Definition cell line from which the sequence was obtained
Value format "text"
Example /cell_line="MCF7"
Qualifier /cell_type=
Definition cell type from which the sequence was obtained
Value format "text"
Example /cell_type="leukocyte"
Qualifier /chromosome=
Definition chromosome (e.g. chromosome number) from which
the sequence was obtained
Value format "text"
Example /chromosome="1"
Qualifier /citation=
Definition reference to a citation listed in the entry reference field
Value format [integer-number] where integer-number is the number of the
reference as enumerated in the reference field
Example /citation=[3]
Comment used to indicate the citation providing the claim of and/or
evidence for a feature; brackets are used for conformity.
Qualifier /clone=
Definition clone from which the sequence was obtained
Value format "text"
Example /clone="lambda-hIL7.3"
Comment not more than one clone should be specified for a given source
feature; to indicate that the sequence was obtained from
multiple clones, multiple source features should be given.
Qualifier /clone_lib=
Definition clone library from which the sequence was obtained
Value format "text"
Example /clone_lib="lambda-hIL7"
Qualifier /codon=
Definition specifies a codon which is different from any found in the
reference genetic code
Value format (seq:"codon-sequence",aa:<amino_acid>) where
"codon-sequence" contains the bases of the codon
and <amino_acid> is the abbreviation for the translated amino
acid, the abbreviation for a modified or unusual amino acid from
section 7.5, or the word OTHER
Example /codon=(seq:"ttt",aa:Leu)
Comment used to specify unusual genetic codes, organellar codes, etc,
that are different from the "normal" code for the organism;
the codon specified by "seq" codes for the amino acid or stop
codon specified by "aa";
the codon that is specified is used throughout the CDS;
amino acids that are not on the controlled vocabulary list
can be annotated by using "aa:OTHER" as the amino acid
designation, and by giving the name of the residue in a /note
qualifier; only nucleotides a, g, c or t can be used in
"codon-sequence";
multiple /codon qualifiers should be used to describe ambiguous
nucleotides.
Qualifier /codon_start=
Definition indicates the offset at which the first complete codon of a
coding feature can be found, relative to the first base of that
feature.
Value format 1 or 2 or 3
Example /codon_start=2
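The effect of /codon_start on reading a coding feature can be illustrated with a short sketch. The function name complete_codons is hypothetical, and the sketch assumes the bases of the feature have already been extracted and joined in 5' to 3' order.

```python
def complete_codons(feature_seq, codon_start=1):
    """Split a coding feature's bases into complete codons.

    codon_start is 1, 2 or 3: the 1-based position, within the
    feature, of the first base of the first complete codon.
    """
    if codon_start not in (1, 2, 3):
        raise ValueError("codon_start must be 1, 2 or 3")
    offset = codon_start - 1  # bases to skip before the first codon
    usable = len(feature_seq) - offset
    end = offset + usable - usable % 3  # drop a trailing partial codon
    return [feature_seq[i:i + 3] for i in range(offset, end, 3)]
```

For a feature with bases "gatggccaaa" and /codon_start=2, the first base is skipped and the codons read are atg, gcc and aaa.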
Qualifier /cons_splice=
Definition differentiates between intron splice sites that conform
to the 5'-GT ... AG-3' splice site consensus and those that do not
Value format (5'site:<value>, 3'site:<value>), where <value>
can be 'YES', 'NO' or 'ABSENT'
Example /cons_splice=(5'site:YES, 3'site:NO)
/cons_splice=(5'site:ABSENT, 3'site:NO)
Comment since the vast majority of splice sites conform to the
consensus, this qualifier should be used only when one
does not and the sequence has been checked; 'ABSENT'
can be used when one of the termini is not part of the
sequence and information on splice site is not
available.
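The consensus check behind /cons_splice can be sketched as follows. The function name cons_splice is hypothetical; the sketch assumes the complete intron is present and given as DNA in 5' to 3' orientation, so the 'ABSENT' case is not handled.

```python
def cons_splice(intron_seq):
    """Return a /cons_splice qualifier, or None if both sites conform.

    Assumes intron_seq is the complete intron, 5' to 3', as DNA.
    """
    seq = intron_seq.lower()
    five = "YES" if seq.startswith("gt") else "NO"
    three = "YES" if seq.endswith("ag") else "NO"
    if five == "YES" and three == "YES":
        return None  # consensus intron: the qualifier is not needed
    return f"/cons_splice=(5'site:{five}, 3'site:{three})"
```

Returning None for a conforming intron mirrors the comment above that the qualifier should be used only when a site does not conform.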
Qualifier /country=
Definition country of origin for DNA sample, intended
for epidemiological or population studies.
Value format "any country from
http://www.ncbi.nlm.nih.gov/projects/collab/country.html"
Example /country="Canada"
Comment /country should be a single token taken from the country list;
/country can also have the following format: country:sub_region,
such as: /country="Canada:Vancouver".
Qualifier /cultivar=
Definition cultivar (cultivated variety) of plant from which sequence was
obtained.
Value format "text"
Example /cultivar="Nipponbare"
/cultivar="Tenuifolius"
/cultivar="Candy Cane"
/cultivar="IR36"
Comment 'cultivar' is applied solely to products of artificial
selection; use the variety qualifier for natural, named
plant and fungal varieties.
Qualifier /db_xref=
Definition database cross-reference: pointer to related information in
another database.
Value format "<database>:<identifier>" where database is
the name of the database containing related information, and
identifier is the internal identifier of the related information
according to the naming conventions of the cross-referenced
database.
Example /db_xref="SWISS-PROT:P12345"
Comment the complete list of allowed database types is kept on
NCBI's public WWW server, at URL:
http://www.ncbi.nlm.nih.gov/projects/collab/
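A minimal sketch of splitting a /db_xref value into its two parts is given below. The function name parse_db_xref is hypothetical, and the sketch assumes the database name itself contains no colon, so the split is made at the first colon only. It does not check the database name against the list of allowed database types.

```python
def parse_db_xref(value):
    """Split '<database>:<identifier>' into (database, identifier)."""
    database, sep, identifier = value.partition(":")
    if not sep or not database or not identifier:
        raise ValueError(f"not a db_xref value: {value!r}")
    return database, identifier
```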
Qualifier /dev_stage=
Definition if the sequence was obtained from an organism in a specific
developmental stage, it is specified with this qualifier
Value format "text"
Example /dev_stage="fourth instar larva"
Qualifier /direction=
Definition direction of DNA replication
Value format left, right, or both where left indicates toward the 5' end of
the entry sequence (as presented) and right indicates toward
the 3' end
Example /direction=LEFT
Qualifier /EC_number=
Definition Enzyme Commission number for enzyme product of sequence
Value format "text"
Example /EC_number="1.1.2.4"
Comment valid values for EC numbers are defined in the list prepared
by the IUPAC-IUB Commission on Biochemical Enzyme Nomenclature
(published in Enzyme Nomenclature 1984 New York: Academic
Press (1984) or a more recent revision thereof).
Qualifier /ecotype=
Definition a population within a given species displaying genetically
based, phenotypic traits that reflect adaptation to a local
habitat.
Value format "text"
Example /ecotype="Columbia"
Comment an example of such a population is one that has developed
hairier than normal leaves in response to an especially sunny habitat.
'Ecotype' is often applied to standard genetic stocks of
Arabidopsis thaliana, but it can be applied to any sessile
organism.
Qualifier /environmental_sample
Definition identifies sequences derived by direct molecular
isolation (PCR, DGGE, or other anonymous methods) from
an environmental sample with no reliable identification
of the source organism
Value format none
Example /environmental_sample
Comment used only with the source feature key; source feature
keys containing the /environmental_sample qualifier
should also contain the /isolation_source qualifier.
Qualifier /estimated_length=
Definition estimated length of the gap in the sequence
Value format unknown
Example /estimated_length=unknown
Comment the gap feature key is currently applied only to
gaps of unknown length; the value format for /estimated_length
will be extended in the next edition of the Feature Table
document (April 2004)
Qualifier /evidence=
Definition value indicating the nature of supporting evidence,
distinguishing between experimentally determined and
theoretically derived data
Value format experimental, not_experimental
Example /evidence=experimental
Comment experimental indicates that the feature identification or
assignment is supported by direct experimental evidence;
not_experimental indicates that the data for the feature are
derived (e.g. promoter as identified by consensus match).
Qualifier /exception=
Definition indicates that the amino acid or RNA sequence
will not translate or agree with the DNA sequence according
to standard biological rules.
Value format "text"
Example /exception="RNA editing"
/exception="reasons given in citation"
Comment only to be used to describe biological mechanisms such
as RNA editing; where the exception cannot easily be described
a published citation must be referred to; the protein translation
of a CDS with /exception will differ from the corresponding
conceptual translation;
- must not be used where transl_except would be adequate,
e.g. in case of stop codon completion use:
/transl_except=(pos:6883,aa:TERM)
/note="TAA stop codon is completed by addition of 3' A
residues to mRNA".
- must not be used for ribosomal slippage, instead use join
operator, e.g.: CDS join(486..1784,1787..4810)
/note="ribosomal slip on tttt sequence at 1784..1787"
Qualifier /focus
Definition defines the source feature of primary biological interest for
records that have multiple source features originating from
different organisms
Value format none
Example /focus
Comment the /focus qualifier identifies the organism which is
displayed in the organism line and determines the
DDBJ/EMBL/GenBank taxonomic division the entry will appear in;
if no translation table is specified, the organism with /focus
will define the translation table; within an entry with several
source features, only one will exist with /focus on it;
multi-source entries with a /transgenic source feature
do not require a /focus qualifier.
Qualifier /frequency=
Definition frequency of the occurrence of a feature
Value format text representing the fraction of population carrying the
variation expressed as a decimal fraction
Example /frequency=".85"
Qualifier /function=
Definition function attributed to a sequence
Value format "text"
Example /function="essential for recognition of cofactor"
Comment /function is used when the gene name and/or product name do not
convey the function attributable to a sequence.
Qualifier /gene=
Definition symbol of the gene corresponding to a sequence region
Value format "text"
Example /gene="ilvE"
Comment see O'Brien, S.J., ed., Genetic Maps 1987, Cold Spring Harbor
or a recent revision.
Qualifier /germline
Definition if the sequence shown is DNA and a member of the immunoglobulin
family, this qualifier is used to denote that the sequence is
from unrearranged DNA.
Value format none
Example /germline
Comment /germline cannot be used in the same entry/record as /rearranged
Qualifier /haplotype=
Definition haplotype of organism from which the sequence was obtained
Value format "text"
Example /haplotype="Dw3 B5 Cw1 A1"
Qualifier /insertion_seq=
Definition insertion sequence element from which the sequence
was obtained
Value format "text"
Example /insertion_seq="IS-11"
Comment /insertion_seq is legal on the repeat_region feature key.
Qualifier /isolate=
Definition individual isolate from which the sequence was obtained
Value format "text"
Example /isolate="Patient #152"
Qualifier /isolation_source=
Definition describes the physical, environmental and/or local
geographical source of the biological sample from which
the sequence was derived
Value format "text"
Example /isolation_source="rumen isolates from standard
Pelleted ration-fed steer #67"
/isolation_source="permanent Antarctic sea ice"
/isolation_source="denitrifying activated sludge from
carbon_limited continuous reactor"
Comment used only with the source feature key;
source feature keys containing an /environmental_sample
qualifier should also contain an /isolation_source
qualifier; the /country qualifier should be used to
describe the country and major geographical sub-region.
Qualifier /label=
Definition a label used to permanently tag a feature
Value format feature_label
Example /label=Alb1_exon1
Comment feature labels follow the naming conventions
for all feature table objects
(see Sections 3.1 and 3.4)
Qualifier /lab_host=
Definition laboratory host used to propagate the organism from which the
sequence was obtained
Value format "text"
Example /lab_host="chicken embryos"
Qualifier /locus_tag=
Definition feature tag assigned for tracking purposes
Value format "text" (single token)
Example /locus_tag="RSc0382"
/locus_tag="YPO0002"
Comment /locus_tag can be used with any feature where /gene is valid;
identical /locus_tag values may be used within an entry/record,
but only if the identical /locus_tag values are associated
with the same gene; in all other circumstances the /locus_tag
value must be unique within that entry/record.
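The uniqueness rule for /locus_tag can be sketched as a simple check over the features of one entry. The function name locus_tag_conflicts and the (locus_tag, gene) pair layout are assumptions made for this example, and the gene names in the usage note are placeholders.

```python
def locus_tag_conflicts(features):
    """Return locus tags shared by features of different genes.

    features is an iterable of (locus_tag, gene) pairs taken from a
    single entry; this layout is an assumption of the sketch.
    """
    genes_by_tag = {}
    for locus_tag, gene in features:
        genes_by_tag.setdefault(locus_tag, set()).add(gene)
    return sorted(tag for tag, genes in genes_by_tag.items() if len(genes) > 1)
```

A gene and its CDS may both carry /locus_tag="RSc0382" as long as both carry the same /gene value; the same tag on features of two different genes would be reported.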
Qualifier /map=
Definition genomic map position of feature
Value format "text"
Example /map="8q12-13"
Qualifier /macronuclear
Definition if the sequence shown is DNA and from an organism which
undergoes chromosomal differentiation between macronuclear and
micronuclear stages, this qualifier is used to denote that the
sequence is from macronuclear DNA.
Value format none
Example /macronuclear
Qualifier /mod_base=
Definition abbreviation for a modified nucleotide base
Value format modified_base
Example /mod_base=m5c
Comment modified nucleotides not found in the restricted vocabulary
list can be annotated by entering '/mod_base=OTHER' with
'/note="name of modified base"'
Qualifier /mol_type=
Definition in vivo molecule type of sequence
Value format "genomic DNA", "genomic RNA", "mRNA", "tRNA", "rRNA",
"snoRNA", "snRNA", "scRNA", "pre-RNA", "other RNA",
"other DNA", "unassigned DNA", "unassigned RNA"
Example /mol_type="genomic DNA"
Comment these text values describe the in vivo molecule that has been
sequenced and not the sequencing technique that has been used
(e.g. mRNA is a valid value, cDNA is not);
the value "genomic DNA" does not imply that the molecule is
nuclear (e.g. organelle and plasmid DNA should be described
using "genomic DNA");
ribosomal RNA genes should be described using "genomic DNA";
"rRNA" should only be used if the ribosomal RNA molecule itself
has been sequenced;
/mol_type is mandatory on every source feature key;
all /mol_type values within one entry/record must be the same;
values "other RNA" and "other DNA" should be applied to
synthetic sequences; values "unassigned DNA" and "unassigned
RNA" should be applied where the in vivo molecule is unknown.
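The /mol_type rules above (controlled values, mandatory on every source feature, identical within one entry) can be sketched as a check. The names MOL_TYPES and check_mol_types are hypothetical, and the input layout (one value per source feature, None when the qualifier is missing) is an assumption of the sketch.

```python
# Controlled values copied from the value format listed above.
MOL_TYPES = frozenset([
    "genomic DNA", "genomic RNA", "mRNA", "tRNA", "rRNA",
    "snoRNA", "snRNA", "scRNA", "pre-RNA", "other RNA",
    "other DNA", "unassigned DNA", "unassigned RNA",
])

def check_mol_types(source_mol_types):
    """Check the /mol_type values of all source features in one entry.

    Each element is the /mol_type value of one source feature, or
    None if that source feature has no /mol_type qualifier.
    Returns a list of problem descriptions (empty if none).
    """
    problems = []
    for value in source_mol_types:
        if value is None:
            problems.append("source feature lacks the mandatory /mol_type")
        elif value not in MOL_TYPES:
            problems.append(f"unknown /mol_type value: {value!r}")
    if len({v for v in source_mol_types if v is not None}) > 1:
        problems.append("/mol_type values differ within the entry")
    return problems
```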
Qualifier /note=
Definition any comment or additional information
Value format "text"
Example /note="This qualifier is equivalent to a comment."
Qualifier /number=
Definition a number to indicate the order of genetic elements (e.g.,
exons or introns) in the 5' to 3' direction
Value format unquoted text (single token)
Example /number=4
/number=6B
Comment text limited to integers, letters or combination of integers
and/or letters represented as an unquoted single token
(e.g. 5a, XIIb); any additional terms should be included in
/standard_name.
Example: /number=2A
/standard_name="long"
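The restriction on /number values (a single unquoted token of integers and/or letters) can be expressed as a pattern. This is a sketch only; the names NUMBER_TOKEN_RE and is_valid_number are hypothetical, and the sketch assumes letters are limited to the unaccented Latin alphabet.

```python
import re

# One unquoted token made only of letters and digits, e.g. 4, 6B, XIIb.
NUMBER_TOKEN_RE = re.compile(r"[A-Za-z0-9]+")

def is_valid_number(value):
    """Return True if value is acceptable as an unquoted /number token."""
    return NUMBER_TOKEN_RE.fullmatch(value) is not None
```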
Qualifier /operon=
Definition name of the operon the feature belongs to
Value format "text"
Example /operon="lac"
Comment currently valid only on Prokaryota-specific features
Qualifier /organelle=
Definition type of membrane-bound intracellular structure from which the
sequence was obtained
Value format mitochondrion, nucleomorph, plastid, mitochondrion:kinetoplast,
plastid:chloroplast, plastid:apicoplast, plastid:chromoplast,
plastid:cyanelle, plastid:leucoplast, plastid:proplastid
Example /organelle="mitochondrion"
/organelle="nucleomorph"
/organelle="plastid"
/organelle="mitochondrion:kinetoplast"
/organelle="plastid:chloroplast"
/organelle="plastid:apicoplast"
/organelle="plastid:chromoplast"
/organelle="plastid:cyanelle"
/organelle="plastid:leucoplast"
/organelle="plastid:proplastid"
Comment modifier text limited to values from controlled list
Qualifier /organism=
Definition scientific name of the organism that provided the
sequenced genetic material.
Value format "text"
Example /organism="Homo sapiens"
Comment the organism name which appears on the OS or ORGANISM line
will match the value of the /organism qualifier of the
source key in the simplest case of a one-source sequence.
Qualifier /partial
Definition differentiates between complete regions and partial ones
Value format none
Example /partial
Comment not to be used for new entries from 15-DEC-2001;
use '<' and '>' signs in the location descriptors to
indicate that the sequence is partial.
Qualifier /PCR_conditions=
Definition description of reaction conditions and components for PCR
Value format "text"
Example /PCR_conditions="Initial denaturation:94degC,1.5min"
Comment used with primer_bind key
Qualifier /phenotype=
Definition phenotype conferred by the feature
Value format "text"
Example /phenotype="erythromycin resistance"
Qualifier /pop_variant=
Definition population variant from which the sequence was obtained
Value format "text"
Example /pop_variant="population variant name"
Qualifier /plasmid=
Definition name of plasmid from which sequence was obtained
Value format "text"
Example /plasmid="C-589"
Qualifier /product=
Definition name of a product encoded by a sequence
Value format "text"
Example /product="catalase"
Qualifier /protein_id=
Definition protein identifier, issued by International collaborators;
this qualifier consists of a stable ID portion (3+5 format
with 3 letters and 5 digits) plus a version number
after the decimal point.
Value format <identifier>
Example /protein_id="AAA12345.1"
Comment when the protein sequence encoded by the CDS changes, only
the version number of the /protein_id value is incremented;
the stable part of the /protein_id remains unchanged and as a
result will permanently be associated with a given protein;
this qualifier is valid only on CDS features which translate
into a valid protein.
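The structure of a /protein_id value (stable 3+5 portion plus version) and the versioning rule can be sketched as follows. The names PROTEIN_ID_RE, split_protein_id and next_protein_id are hypothetical, and the sketch assumes the three letters are upper case, as in the example above.

```python
import re

# Stable part: 3 letters + 5 digits; version: integer after the point.
# Assumption: letters are upper case, as in the example AAA12345.1.
PROTEIN_ID_RE = re.compile(r"([A-Z]{3}\d{5})\.(\d+)")

def split_protein_id(value):
    """Return (stable_id, version) for a /protein_id value."""
    match = PROTEIN_ID_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"not a protein_id value: {value!r}")
    return match.group(1), int(match.group(2))

def next_protein_id(value):
    """Return the identifier to use after the protein sequence changes.

    Only the version number is incremented; the stable part is kept.
    """
    stable_id, version = split_protein_id(value)
    return f"{stable_id}.{version + 1}"
```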
Qualifier /proviral
Definition if the sequence shown is viral and integrated into another
organism's genome, this qualifier is used to denote that fact
Value format none
Example /proviral
Comment /proviral cannot be used in the same entry/record as /virion
Qualifier /pseudo
Definition indicates that this feature is a non-functional version of the
element named by the feature key
Value format none
Example /pseudo
Qualifier /rearranged
Definition if the sequence shown is DNA and a member of the immunoglobulin
family, this qualifier is used to denote that the sequence is
from rearranged DNA.
Value format none
Example /rearranged
Comment /rearranged cannot be used in the same entry/record as /germline
Qualifier /replace=
Definition indicates that the sequence identified in a feature's intervals is
replaced by the sequence shown in "text"; if no sequence is
contained within the qualifier, this indicates a deletion.
Value format "text"
Example /replace="a"
/replace=""
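The meaning of /replace, including the empty value for a deletion, can be illustrated with a short sketch. The function name apply_replace is hypothetical, and the sketch assumes the feature location is a single start..end span given as 1-based inclusive positions.

```python
def apply_replace(sequence, start, end, replacement):
    """Replace bases start..end (1-based, inclusive) with replacement.

    An empty replacement, as in /replace="", deletes the interval.
    """
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("interval lies outside the sequence")
    return sequence[:start - 1] + replacement + sequence[end:]
```

For the sequence "acgtacgt", a feature at 3 with /replace="a" gives "acatacgt", and a feature at 3..4 with /replace="" gives "acacgt".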
Qualifier /rpt_family=
Definition type of repeated sequence; "Alu" or "Kpn", for example
Value format "text"
Example /rpt_family="Alu"
Comment preferred usage is to qualify the repeat_region instead of any
of the constituent repeat_units
Qualifier /rpt_type=<repeat_type>
Definition organization of repeated sequence
Value format tandem, inverted, flanking, terminal, direct, dispersed, and
other
Example /rpt_type=INVERTED
Comment preferred usage is to qualify the repeat_region instead of any
of the constituent repeat_units; definitions of these values
will be added in a future release of this document; see
Singer, M. Int Rev Cytol 76, 67-112 (1982); Cell 26, 293-95
(1981); Hardman, N. Biochem J 234, 1-11 (1986).
Qualifier /rpt_unit=
Definition identity of repeat unit
Value format "text" or <base_range>
Example /rpt_unit="aagggc"
/rpt_unit=202..245
Comment used to indicate the literal sequence, or the base range of
the sequence that constitutes a repeat_region or a single
repeat_unit; the repeat family name should not be entered in
/rpt_unit="text"; /rpt_family should be used instead.
Qualifier /segment=
Definition name of viral or phage segment sequenced
Value format "text"
Example /segment="6"
Qualifier /serotype=
Definition serological variety of a species characterized by its
antigenic properties
Value format "text"
Example /serotype="B1"
Comment used only with the source feature key;
the Bacteriological Code recommends the use of the
term 'serovar' instead of 'serotype' for the
prokaryotes; see the International Code of Nomenclature
of Bacteria (1990 Revision) Appendix 10.B "Infraspecific
Terms".
Qualifier /serovar=
Definition serological variety of a species (usually a prokaryote)
characterized by its antigenic properties
Value format "text"
Example /serovar="O157:H7"
Comment used only with the source feature key;
the Bacteriological Code recommends the use of the
term 'serovar' instead of 'serotype' for prokaryotes;
see the International Code of Nomenclature of Bacteria
(1990 Revision) Appendix 10.B "Infraspecific Terms".
Qualifier /sex=
Definition sex of the organism from which the sequence was obtained
Value format "text"
Example /sex="female"
Qualifier /specific_host=
Definition natural host from which the sequence was obtained
Value format "text"
Example /specific_host="Rhizobium NGR234"
Qualifier /specimen_voucher=
Definition an identifier of the individual or collection of the source
organism and the place where it is currently stored, usually
an institution.
Value format "text"
Example /specimen_voucher="Smith s. n. 4-IV-1995 (U. S. Natl.
Herbarium)"
Qualifier /standard_name=
Definition accepted standard name for this feature
Value format "text"
Example /standard_name="dotted"
Comment use /standard_name to give full gene name, but use /gene to
give gene symbol (in the above example /gene="Dt").
Qualifier /strain=
Definition strain from which sequence was obtained
Value format "text"
Example /strain="BALB/c"
Qualifier /sub_clone=
Definition sub-clone from which sequence was obtained
Value format "text"
Example /sub_clone="lambda-hIL7.20g"
Comment the comments on /clone apply to /sub_clone
Qualifier /sub_species=
Definition name of sub-species of organism from which sequence was
obtained
Value format "text"
Example /sub_species="lactis"
Qualifier /sub_strain=
Definition sub-strain from which sequence was obtained
Value format "text"
Example /sub_strain="abis"
Qualifier /tissue_lib=
Definition tissue library from which sequence was obtained
Value format "text"
Example /tissue_lib="tissue library 772"
Qualifier /tissue_type=
Definition tissue type from which the sequence was obtained
Value format "text"
Example /tissue_type="liver"
Qualifier /transgenic
Definition identifies the source feature of the organism
which was the recipient of transgenic DNA
Value format none
Example /transgenic
Comment transgenic sequences must have at least two source
feature keys; the source feature key describing the
organism of the recipient DNA must span the whole
sequence; the /transgenic qualifier identifies the
organism which is displayed in the organism line and
determines that the entry will appear in the
DDBJ/EMBL/GenBank Synthetic Construct division;
multi-source entries including a /transgenic source
feature should not have a /focus qualifier.
Qualifier /translation=
Definition automatically generated one-letter abbreviated amino acid
sequence derived from either the universal genetic code or the
table as specified in /transl_table and as determined by
exceptions in the /transl_except and /codon qualifiers
Value format IUPAC one-letter amino acid abbreviation, "X" is to be used
for AA exceptions.
Example /translation="MASTFPPWYRGCASTPSLKGLIMCTW"
Comment to be used with CDS feature only; this is a mandatory qualifier
to the CDS feature key except for /pseudo CDSs;
see /transl_table for definition and location of genetic code
tables.
Qualifier /transl_except=
Definition translational exception: single codon the translation of which
does not conform to the genetic code defined by the organism and /codon
Value format (pos:<location>,aa:<amino_acid>) where amino_acid is the
amino acid coded by the codon at the given location
Example /transl_except=(pos:213..215,aa:Trp)
/transl_except=(pos:1017,aa:TERM)
/transl_except=(pos:2000..2001,aa:TERM)
/transl_except=(pos:X22222:15..17,aa:Ala)
Comment if the amino acid is not on the restricted vocabulary list use
e.g., '/transl_except=(pos:213..215,aa:OTHER)' with
'/note="name of unusual amino acid"';
for modified amino-acid selenocysteine use three letter code
'Sec' (one letter code 'U' in amino-acid sequence)
/transl_except=(pos:1002..1004,aa:Sec);
for partial termination codons where TAA stop codon is
completed by the addition of 3' A residues to the mRNA
either a single base_position or a base_range is used, e.g.
if partial stop codon is a single base:
/transl_except=(pos:1017,aa:TERM)
if partial stop codon consists of two bases:
/transl_except=(pos:2000..2001,aa:TERM) with
'/note="stop codon completed by the addition of 3' A residues
to the mRNA"'.
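The forms of /transl_except value shown above (a base range, a single base position, and a position in another entry) can be read with one pattern, sketched below. The names TRANSL_EXCEPT_RE and parse_transl_except are hypothetical; the sketch assumes the optional accession prefix consists of letters, digits, underscores and points, and it does not handle other location operators.

```python
import re

# (pos:[accession:]start[..end],aa:<amino_acid>)
TRANSL_EXCEPT_RE = re.compile(
    r"\(pos:(?:(?P<acc>[A-Za-z0-9_.]+):)?"
    r"(?P<start>\d+)(?:\.\.(?P<end>\d+))?,"
    r"aa:(?P<aa>[A-Za-z]+)\)"
)

def parse_transl_except(value):
    """Return (accession or None, start, end, amino_acid)."""
    match = TRANSL_EXCEPT_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"not a transl_except value: {value!r}")
    start = int(match.group("start"))
    # A single base position is treated as a range of length one.
    end = int(match.group("end")) if match.group("end") else start
    return match.group("acc"), start, end, match.group("aa")
```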
Qualifier /transl_table=
Definition definition of genetic code table used if other than universal
genetic code table. Tables used are described in appendix V,
section 7.5.5.
Value format integer
Example /transl_table=4
Comment genetic code exceptions outside range of specified tables are
reported in /codon or /transl_except qualifiers;
1=universal table 1; 2=non-universal table 2; etc.
Qualifier /transposon=
Definition transposable element from which the sequence was
obtained
Value format "text"
Example /transposon="Tn9"
Comment /transposon is legal on the repeat_region feature key.
Qualifier /usedin=
Definition indicates that the feature is used in a compound feature in
another entry
Value format Accession-number:feature-name or
Database_name::Acc_number:feature_label
Example /usedin=X10087:proteinx
Comment database_name is an abbreviation for the name of the database
in which the entry for the accession number can be found.
Qualifier /variety=
Definition variety (= varietas, a formal Linnaean rank) of organism
from which sequence was derived.
Value format "text"
Example /variety="insularis"
Comment use the cultivar qualifier for cultivated plant
varieties, i.e., products of artificial selection;
varieties other than plant and fungal varietas should be
annotated via /note, e.g. /note="breed:Cukorova"
Qualifier /virion
Definition viral genomic sequence as it is encapsidated (distinguished
from its proviral form integrated in a host cell's chromosome)
Value format none
Example /virion
Comment /virion cannot be used in the same entry/record as /proviral