TopicPage for Minimal Gene Sets of Escherichia coli K-12
One approach to defining a minimal genome is to find the genes that are present in all the genomes in a multi-genome comparative sequence analysis.
Typically a Minimal Gene Set, or Common Core of genetic information, includes well conserved housekeeping genes for basic metabolism and macromolecular synthesis, many of which are essential genes. E. coli has numerous representatives of genes present in the oldest living ancestor genome as deduced by these methods.
The minimal set of genes obtained is a function of which and how many genomes are chosen for the analysis. As more E. coli genomes are completed, the Common Core will become better defined. When relatively few genomes are compared, it is simple to exclude a gene from the Common Core because it is missing from only one genome. But when a large number of genomes are compared, a rare gene loss in one strain need not always exclude the gene from a true Common Core, because that strain might have acquired a horizontally transmitted (HT) gene that allowed for the gene loss only in that strain.
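The dependence of the core on the genomes chosen, and the difference between a strict and a relaxed core, can be illustrated with a minimal sketch. The strain names, gene lists and the `max_missing` parameter below are hypothetical and chosen only for illustration; they are not EcoGene data or an EcoGene method.

```python
from collections import Counter

def common_core(genomes, max_missing=0):
    """Return the genes absent from at most `max_missing` of the genomes.

    `genomes` maps a genome name to the set of gene names it contains.
    max_missing=0 gives a strict core: the gene must be in every genome.
    """
    counts = Counter(gene for genes in genomes.values() for gene in set(genes))
    required = len(genomes) - max_missing
    return {gene for gene, n in counts.items() if n >= required}

# Hypothetical toy data, not real strain content.
genomes = {
    "strain_A": {"dnaA", "rpoB", "gyrA", "lacZ"},
    "strain_B": {"dnaA", "rpoB", "gyrA"},
    "strain_C": {"dnaA", "rpoB", "lacZ"},
}

# Comparing only two genomes gives a larger core than comparing all three.
two_genome_core = common_core({k: genomes[k] for k in ("strain_A", "strain_B")})
strict_core = common_core(genomes)
# Tolerating one missing genome readmits genes lost from a single strain.
relaxed_core = common_core(genomes, max_missing=1)

print(sorted(two_genome_core))
print(sorted(strict_core))
print(sorted(relaxed_core))
```

In this toy case the strict core shrinks as the third genome is added, while the relaxed setting keeps genes that are missing from only one strain.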
Nonetheless, within EcoGene a strict definition of Common Core genes is desired for consistency, and all genomes used to define a Common Core should have an allele of the core gene. A pseudogene allele is counted as the presence of the gene, but the pseudogene is noted. In most cases the gene will be excluded from the intraspecies Common Core gene sets in EcoGene.
If a published or outside unpublished study has a gene set annotated in EcoGene that allows for one or a few genomes to lack a Common Core gene, it will be noted in the description on the TopicPage for that publication. Otherwise, all Common Core genes are in all genomes compared.
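One way to picture the rule that a pseudogene allele counts as presence but is noted is the sketch below. The data layout (genome name mapped to gene name mapped to an allele status string) and the function name are assumptions made for illustration and do not describe how EcoGene stores or computes its gene sets.

```python
def strict_core_with_pseudogenes(alleles):
    """Return {gene: [genomes where the allele is a pseudogene]} for every
    gene that has an allele (intact or pseudogene) in all genomes.

    `alleles` maps genome name -> {gene name -> "intact" or "pseudogene"}.
    """
    if not alleles:
        return {}
    genomes = list(alleles)
    # A gene is in the strict core only if every genome has some allele of it.
    shared = set.intersection(*(set(alleles[g]) for g in genomes))
    return {
        gene: sorted(g for g in genomes if alleles[g][gene] == "pseudogene")
        for gene in sorted(shared)
    }

# Hypothetical toy data, not real strain content.
alleles = {
    "strain_A": {"dnaA": "intact", "rpoB": "intact", "lacZ": "intact"},
    "strain_B": {"dnaA": "intact", "rpoB": "pseudogene"},
}

core = strict_core_with_pseudogenes(alleles)
for gene, pseudo_in in core.items():
    note = f" (pseudogene in {', '.join(pseudo_in)})" if pseudo_in else ""
    print(gene + note)
```

Keeping the list of pseudogene-bearing genomes alongside each core gene makes it easy to filter those genes out afterwards if a stricter set is wanted.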
In the case of cross-species Common Cores, all compared genomes that define the Common Core subtype must have a copy of the Common Core ortholog. At phylogenetic distances great enough that synteny with E. coli has eroded, neighborhood-based confirmation of orthology is lost. A nearly isofunctional paralog can compensate for the loss of a Common Core ortholog in some species, and so as long as the phylogenetic distance is consistent with orthology these will be included. Because of this, cross-species comparisons outside the Enterobacteriaceae can include clusters of orthologs in the Common Core, which makes it necessary to include more than one member of the Common Core from each genome. To avoid this dilution of our concept of the Common Core, we limit our cross-species comparisons to the Enterobacteriaceae when we have confirmation of orthology from neighborhood analysis.
Each Common Core indexed as an EcoTopic is associated with a specific reference, either a publication or documentation associated with an unpublished data set. This reference will explain the alignment algorithms, cutoffs and other criteria used to define the Common Core, which are summarized on the TopicPages. Note that "Common Core" is EcoGene terminology and that various studies refer to cores or Minimal Gene Sets (MGS) or use other terms.
The Common Core subtypes have an identifier for the EcoTopic associated with that Common Core gene set. The first dataset entered is CommonCore2158-5.
Davids (2008) performed a comparative analysis of five E. coli genomes to derive a Common Core of 2158 K-12 genes shared by all five E. coli genomes.
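The identifier CommonCore2158-5 appears to combine the number of core genes (2158) with the number of genomes compared (five). The sketch below parses an identifier under that assumption; the pattern is inferred from this single example and is not a documented EcoGene naming rule, and the function and field names are hypothetical.

```python
import re
from typing import NamedTuple

class CommonCoreId(NamedTuple):
    gene_count: int
    genome_count: int

# Assumed pattern: "CommonCore<gene count>-<genome count>".
_PATTERN = re.compile(r"CommonCore(\d+)-(\d+)")

def parse_common_core_id(identifier):
    """Split an identifier into gene and genome counts, under the assumed pattern."""
    match = _PATTERN.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not in the assumed CommonCore format: {identifier!r}")
    return CommonCoreId(int(match.group(1)), int(match.group(2)))

parsed = parse_common_core_id("CommonCore2158-5")
print(parsed.gene_count, parsed.genome_count)
```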
Mulkidjanian et al. (2006) provide an example of what can be learned by comparing 14 cyanobacterial genomes.
Bibliography (7 total)
Mulkidjanian AY, Koonin EV, Makarova KS, Mekhedov SL, Sorokin A, Wolf YI, Dufresne A, Partensky F, Burd H, Kaznadzey D, Haselkorn R, Galperin MY (2006) The cyanobacterial genome core and the origin of photosynthesis. Proc Natl Acad Sci U S A 103:13126-31