TopicPage for Common Core of Escherichia coli K-12
The 2097 Common Core genes and 20 Common Core pseudogenes are currently derived from a set of 2150 core E. coli K-12 genes common to five E. coli genomes (Davids 2008).
The Topic CommonCore2150-5 is the starting point for the genes in this Topic.
The Common Core Gene Set
When additional E. coli genomes are included in the derivation of a Common Core, this EcoGene-based set will probably shrink in number.
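The expectation that the set will shrink follows from treating a common core as the intersection of per-genome gene sets. Adding another set to an intersection can keep or remove members but never add them. The sketch below is a minimal illustration under that assumption only. It is not the method of Davids (2008). The function name `common_core`, the ortholog-group IDs (`g1`, `g2`, ...), and the extra genome `EXTRA` are hypothetical toy data.

```python
def common_core(gene_sets: dict[str, set[str]]) -> set[str]:
    """Return the identifiers shared by every genome in gene_sets."""
    if not gene_sets:
        return set()
    # set.intersection accepts any number of sets and returns a new set.
    return set.intersection(*gene_sets.values())


# Toy data: hypothetical ortholog-group IDs, not real genes.
genomes = {
    "ECOLI": {"g1", "g2", "g3", "g4", "g5"},
    "ECOSA": {"g1", "g2", "g3", "g5", "g6"},
    "ECOED": {"g1", "g2", "g3", "g5"},
    "ECOUT": {"g1", "g2", "g3", "g5", "g7"},
}
core_before = common_core(genomes)

# Adding a hypothetical extra genome can only keep or remove shared IDs.
genomes["EXTRA"] = {"g1", "g2", "g5", "g8"}
core_after = common_core(genomes)

print(sorted(core_before))  # ['g1', 'g2', 'g3', 'g5']
print(sorted(core_after))   # ['g1', 'g2', 'g5']
assert core_after <= core_before
```

In practice, identifying "the same gene" across genomes requires an ortholog assignment step. The sketch assumes that step has already been done.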
The 2150 Common Core genes identified by Davids (2008) include four sequences annotated as IS element genes or IS pseudogenes: insF, insK, insN' and insO'. These IS sequences are horizontally transferred (HT) and are removed from this version of the Common Core, leaving 2146 genes and pseudogenes after this step.
The 2150 Common Core genes identified by Davids (2008) also include 29 prophage sequences, 5 of which are pseudogenes. These prophage genes and pseudogenes are HT sequences and are likewise removed from this version of the Common Core.
Removing the 4 IS and 29 prophage sequences from this list leaves 2117 Common Core genes, including 20 Common Core pseudogenes. Because the authors were unaware that they were examining pseudogenes, the 20 Core pseudogenes are not labeled as Common Core genes until they are studied further. They are located in the Core Pseudogenes subTopic.
EcoGene has currently annotated 2097 intact genes as the Common Core.
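The counts above can be checked with simple subtraction. This short sketch uses only the numbers stated on this page. The constant names are illustrative.

```python
# Counts as stated on this TopicPage; constant names are illustrative.
DAVIDS_CORE = 2150        # starting set from Davids (2008)
IS_SEQUENCES = 4          # insF, insK, insN', insO'
PROPHAGE_SEQUENCES = 29   # includes 5 prophage pseudogenes
CORE_PSEUDOGENES = 20     # held in the Core Pseudogenes subTopic

after_is = DAVIDS_CORE - IS_SEQUENCES         # remove IS sequences
after_ht = after_is - PROPHAGE_SEQUENCES      # remove prophage sequences
intact_core = after_ht - CORE_PSEUDOGENES     # set aside Core pseudogenes

print(after_is, after_ht, intact_core)  # 2146 2117 2097
```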
The user should evaluate the HT and Common Core predictions heuristically to assess their validity. Continuity is one way to do this, so the HT and Core designations are shown in the prophage/IS region of the GenePage EcoMaps, where blocks of HT and Core genes can be inspected for inconsistencies. As this evaluation is carried out within EcoGene, a curated form of these predictions will evolve, together with a log file of changes to the HT designations. These changes will draw on heuristic and bioinformatic analysis done in house and on findings from the literature. Predictions from different approaches will be combined into a more consistent and accurate set.
The aga operon provides an example.
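One simple form of the continuity check described above is to scan genes in chromosomal order. The scan flags any gene whose designation differs from both of its neighbors when those neighbors agree, such as a single Core gene inside an HT block. The sketch below is hypothetical. It is not EcoGene's procedure. The `Gene` type, the `flag_isolated` function, the two-label scheme, and the gene names `geneA` to `geneG` are all invented for illustration and do not describe the aga operon.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    label: str  # hypothetical designation: "HT" or "Core"


def flag_isolated(genes: list[Gene]) -> list[str]:
    """Return names of genes whose label differs from both flanking
    neighbors when the two neighbors share the same label."""
    flagged = []
    for i in range(1, len(genes) - 1):
        left, mid, right = genes[i - 1], genes[i], genes[i + 1]
        # Neighbors agree with each other but disagree with the middle gene.
        if left.label == right.label != mid.label:
            flagged.append(mid.name)
    return flagged


# Hypothetical region in chromosomal order.
region = [
    Gene("geneA", "HT"),
    Gene("geneB", "HT"),
    Gene("geneC", "Core"),
    Gene("geneD", "HT"),
    Gene("geneE", "HT"),
    Gene("geneF", "Core"),
    Gene("geneG", "Core"),
]
print(flag_isolated(region))  # ['geneC']
```

A flagged gene is only a candidate for manual inspection. It may be a genuine Core gene retained within a horizontally transferred region.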
The organism codes, strains, and NCBI RefSeq accession numbers of the five E. coli genome sequences that Davids (2008) used to define the Common Core are:
ECOLI: K-12 MG1655
ECOSA: O157:H7 Sakai
ECOED: O157:H7 EDL933
ECOUT: UTI89 (UPEC)
Please see Davids (2008) for their methods.
Bibliography (7 total)
Devillers H, Chiapello H, Schbath S, Karoui ME (2011) Robustness assessment of whole bacterial genome segmentations. J Comput Biol 18:1155-65
Vieira G, Sabarly V, Bourguignon PY, Durot M, Le Fèvre F, Mornico D, Vallenet D, Bouvet O, Denamur E, Schachter V, Médigue C (2011) Core and panmetabolism in Escherichia coli. J Bacteriol 193:1461-72