EcoGene Topic Page
Topic Page for Annotation Science of Escherichia coli K-12
Description:
In order to maximize the benefit of sequencing genomes and to avoid errors during computational and experimental analyses, the intervals that define the information-rich features of a genome, such as coding regions, need to be annotated as accurately as possible. The EcoGene database combines the compilation of published experimental validations with the refinement of prediction methods to improve the annotation of genomic features. The annotation of genomic sequence features can be structural or functional. Functional annotation describes the experimental evidence or computational predictions for a gene's function, such as its gene or protein name, activity, description, regulation and mutant phenotype.
Structural annotation is epitomized by tertiary structural analyses of RNAs and proteins, in which the relative three-dimensional spatial positions of ribonucleotide bases and amino acid residues are specified at atomic resolution. Secondary structural annotation can represent base pairs in RNAs and intramolecular cysteine bridges in proteins as a two-dimensional picture. Accurate 3D and 2D representations of RNAs and proteins require that the primary sequence be accurately known. The sequence of an RNA or protein is determined by a theoretical transcription or translation of DNA sequence data.
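As an illustration of what a theoretical translation involves, the following minimal Python sketch translates a coding sequence with the standard genetic code (NCBI translation table 11, used for bacteria, assigns the same amino acids to sense codons as the standard table). The names `CODON_TABLE`, `START_CODONS` and `translate` are hypothetical, chosen for this example. The sketch assumes an in-frame sequence of A/C/G/T only, and it decodes an alternative GTG or TTG start codon as methionine, because the bacterial initiator tRNA inserts (formyl)methionine at the start codon.

```python
from itertools import product

# Standard genetic code in TCAG order; NCBI tables 1 and 11 share these sense-codon assignments.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumed start-codon set: the three most common start codons in E. coli.
START_CODONS = {"ATG", "GTG", "TTG"}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence up to (not including) the first stop codon."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    protein = []
    for i in range(0, len(cds), 3):
        amino_acid = CODON_TABLE[cds[i : i + 3]]
        if amino_acid == "*":  # a stop codon ends translation
            break
        protein.append(amino_acid)
    # The initiator tRNA reads a recognised start codon as (formyl)methionine.
    if protein and cds[:3] in START_CODONS:
        protein[0] = "M"
    return "".join(protein)


print(translate("GTGAAACGCTAA"))  # prints MKR (GTG start read as Met)
```

Replacing the first residue, rather than adding GTG and TTG to the codon table as methionine, keeps internal GTG and TTG codons translated as valine and leucine.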
The sequence annotation that determines the sequence of an RNA or protein product is its start-stop interval in genomic coordinates. Gene intervals in EcoGene are designated by a triple: Left End (LE), the smaller genomic coordinate of the gene or feature interval; Right End (RE), the larger genomic coordinate; and Orientation (Clockwise or Counterclockwise). The length and composition, as well as the tertiary structure, of an overexpressed protein depend on which translational start codon is chosen to direct translation initiation when the expression clone is designed.
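The LE/RE/Orientation triple can be modelled directly as a small data type. The Python sketch below is hypothetical: `GeneInterval` and its methods are illustrative names, not an EcoGene interface. It assumes 1-based, inclusive genomic coordinates and a genome supplied as a plain string, and it does not handle features that span the origin of the circular chromosome. It shows that translation starts at LE for a clockwise gene and at RE for a counterclockwise one, so choosing a different start codon changes only that end of the interval.

```python
from dataclasses import dataclass

# Maps each base to its Watson-Crick complement (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


@dataclass(frozen=True)
class GeneInterval:
    """Hypothetical LE/RE/Orientation triple; coordinates assumed 1-based and inclusive."""
    left_end: int      # LE: smaller genomic coordinate
    right_end: int     # RE: larger genomic coordinate
    orientation: str   # "Clockwise" or "Counterclockwise"

    def __post_init__(self) -> None:
        if not 1 <= self.left_end <= self.right_end:
            raise ValueError("require 1 <= LE <= RE")
        if self.orientation not in ("Clockwise", "Counterclockwise"):
            raise ValueError("orientation must be Clockwise or Counterclockwise")

    @property
    def length(self) -> int:
        return self.right_end - self.left_end + 1

    @property
    def start_coordinate(self) -> int:
        # Translation starts at LE on a clockwise gene and at RE on a counterclockwise gene.
        return self.left_end if self.orientation == "Clockwise" else self.right_end

    def extract(self, genome: str) -> str:
        """Return the feature sequence read 5' to 3' in the direction of the feature."""
        if self.right_end > len(genome):
            raise ValueError("interval extends past the end of the genome string")
        seq = genome[self.left_end - 1 : self.right_end]
        return seq if self.orientation == "Clockwise" else reverse_complement(seq)

    def move_start(self, codons: int) -> "GeneInterval":
        """Move the start codon upstream (codons > 0) or downstream (codons < 0); the stop end is unchanged."""
        shift = 3 * codons
        if self.orientation == "Clockwise":
            return GeneInterval(self.left_end - shift, self.right_end, self.orientation)
        return GeneInterval(self.left_end, self.right_end + shift, self.orientation)


genome = "GGCTATTTCATGG"  # toy sequence, not E. coli data
gene = GeneInterval(3, 11, "Counterclockwise")
print(gene.length, gene.extract(genome))  # prints: 9 ATGAAATAG
```

Because `move_start` returns a new interval, an annotation before and after a start-codon correction can be compared side by side, for example to see how many residues an overexpressed protein gains or loses.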
Gene intervals are used to create deletion strains and to design and interpret mutational analyses. Annotation errors can therefore lead to the misinterpretation of experimental results.
The genomic intervals annotated for non-genic features such as DNA repeats, transcription factor binding sites, and promoters also determine their length and sequence.
Details:
One example of this combined experimental and computational approach to annotation science is the use of the Verified Set of compiled N-terminal protein sequence validations to train a new model for ribosome binding sites based on information theory (Shultzaberger, 2001).
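Information-theory models of binding sites commonly score a candidate by its individual information: the sum, over aligned positions l, of a weight 2 + log2 f(b, l) bits, where f(b, l) is the frequency of base b at position l among the training sites. The Python sketch below is a simplified illustration under stated assumptions, not the published ribosome binding site model. The function names are hypothetical, the sites are assumed to be pre-aligned, equal-length A/C/G/T strings, and a simple pseudocount stands in for any small-sample correction a full implementation would apply. The toy training sites resemble the Shine-Dalgarno consensus AGGAGG and are not drawn from the Verified Set.

```python
import math
from typing import Dict, List, Sequence

BASES = "ACGT"


def weight_matrix(sites: Sequence[str], pseudocount: float = 0.5) -> List[Dict[str, float]]:
    """Per-position weights 2 + log2 f(b, l), in bits, from aligned training sites."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("training sites must be aligned and of equal length")
    matrix = []
    for pos in range(width):
        # Pseudocounts (an assumption of this sketch) keep unseen bases from giving log2(0).
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos].upper()] += 1
        total = sum(counts.values())
        matrix.append({base: 2.0 + math.log2(counts[base] / total) for base in BASES})
    return matrix


def individual_information(site: str, matrix: List[Dict[str, float]]) -> float:
    """Sum of per-position weights for one candidate site, in bits."""
    if len(site) != len(matrix):
        raise ValueError("site length must match the matrix width")
    return sum(column[base] for column, base in zip(matrix, site.upper()))


training = ["AGGAGG", "AGGAGG", "AGGAGA", "TGGAGG"]  # toy sites, not Verified Set data
matrix = weight_matrix(training)
print(individual_information("AGGAGG", matrix))  # positive: resembles the training sites
print(individual_information("CCCCCC", matrix))  # negative: unlike the training sites
```

A positive score means the candidate resembles the training sites more than a random sequence would; a negative score means it resembles them less.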
Bibliography (14 total):
- 2013
- 2012
- 2010
- 2009
  - Benítez-Páez A (2009) Considerations to improve functional annotations in biological databases. OMICS 13:527-35
  - Hu JC, Karp PD, Keseler IM, Krummenacker M, Siegele DA (2009) What we can learn about Escherichia coli through application of Gene Ontology. Trends Microbiol 17:269-78 (Review)
  - Keilwagen J, Baumbach J, Kohl TA, Grosse I (2009) MotifAdjuster: a tool for computational reassessment of transcription factor binding site annotations. Genome Biol 10:R46
  - Zheng Y, Pósfai J, Morgan RD, Vincze T, Roberts RJ (2009) Using shotgun sequence data to find active restriction enzyme genes. Nucleic Acids Res 37:e1
- 2007
- 2005
- 2001