CRISPR Repeats

Description:

The E. coli CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) repeats are 29 base pair tandem direct intergenic repeats with 32 or 33 bp variable non-repetitive sequence spacers, previously referred to as the iap-linked (IAP) repeats or 29 bp repeats (Nakata, 1989; Bachellier, 1996; Rudd, 1999).

Fourteen CRISPR repeats (Cluster I) are located downstream of the iap gene encoding the alkaline phosphatase isozyme conversion aminopeptidase Iap. The first CRISPR repeats were discovered when the E. coli iap gene was sequenced (Nishino, 1987). In the last paragraph of their paper, Nishino et al. (1987) report the discovery of the first five (of fourteen) CRISPR repeats downstream of the iap gene, which constitute an "unusual structure". They recognized the 29/32 repeat/non-repeat pattern, presented a multiple alignment, distinguished CRISPR from the REP sequences as a novel intergenic repeat family, and noted a dyad symmetry (Nishino, 1987).

A follow-up publication focused on the E. coli CRISPR repeats, sequencing the rest of Cluster I and discovering Cluster II, a set of seven CRISPR repeats 25 kb clockwise of Cluster I and 640 bp downstream of ygcE (Nakata, 1989). The authors also showed by DNA cross-hybridization that the Salmonella genome contains CRISPR repeats, so CRISPR are not limited to the E. coli genome.

Two additional CRISPR repeats (Cluster III) are located near Cluster II, 47 bp downstream of the ygcE gene, for a total of 23 Ecoli type CRISPR repeats in E. coli (Bachellier, 1996). Alternate CRISPR cluster designations refer to Cluster I as CRISPR1 and designate Cluster II and Cluster III together as a gapped cluster, CRISPR2 (Touchon, 2010). A Ypest type CRISPR cluster has been identified in E. coli K-12 between the clpA and serW genes and designated as the CRISPR3 cluster; K-12 has no Ypest type cas genes, although some other E. coli strains do contain Ypest type cas genes at a novel CRISPR4 cluster adjacent to CRISPR3 (Touchon, 2010).

Touchon, 2010 Figure 2

Touchon, 2010 Figure 5

Details:

CRISPR gene nomenclature

The cas and cse nomenclature is complicated by the use of numbers in the gene names. Numbers are not allowed in E. coli gene names because they are easily confused with allele numbers. Generally a conversion of 1,2,3 to A,B,C is done to create similar gene names in EcoGene. Thus the earlier designations cas1, cas2, cas3, cas5, cse1, cse2, cse3, and cse4 were converted to casA, casB, casC, casE, cseA, cseB, cseC, and cseD; however, these designations do not appear to have been widely used. Previously in the EcoGene annotations, in the absence of functional information, the ygene names were retained and the cas and cse names were used as synonyms.
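The number-to-letter conversion described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated (1, 2, 3 become A, B, C); the function name `number_to_letter_name` and the three-letter-stem check are assumptions made for the example, not part of any EcoGene tool.

```python
import re


def number_to_letter_name(gene: str) -> str:
    """Replace a trailing number in a gene name with a capital letter (1 -> A, 2 -> B, ...)."""
    # Assumption for this sketch: a three-letter lowercase stem followed by digits.
    match = re.fullmatch(r"([a-z]{3})(\d+)", gene)
    if match is None:
        raise ValueError(f"unexpected gene name: {gene!r}")
    stem, number = match.group(1), int(match.group(2))
    if not 1 <= number <= 26:
        raise ValueError(f"number out of range for a single letter: {number}")
    return stem + chr(ord("A") + number - 1)


# The numbered designations listed in the text above.
numbered = ["cas1", "cas2", "cas3", "cas5", "cse1", "cse2", "cse3", "cse4"]
converted = [number_to_letter_name(name) for name in numbered]
print(converted)
# ['casA', 'casB', 'casC', 'casE', 'cseA', 'cseB', 'cseC', 'cseD']
```

Note that cas5 maps to casE rather than casD because the rule follows the number itself, not the position in the list.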

Now, a new gene nomenclature has been proposed based on the functional characterization of a casABCDE (previously denoted as cseABD-casE-cseC) encoded Cascade complex that cleaves the CRISPR RNA precursor and retains the mature crRNAs to perform an anti-viral function assisted by the predicted RNA helicase YgcB(Cas3), as demonstrated with engineered anti-lambda sequences (Brouns, 2008). The previous EcoGene-only designations have now been replaced by the original numbered designations as synonyms only to reduce confusion. The new casA-E designations are used as primary gene names and the original ygene names are used as synonyms. The ygene names are retained for the remaining three cas/cse genes, including the YgcB(Cas3) helicase/annealase, the YgbT(Cas1) endonuclease and the predicted endonuclease YgbF(Cas2). CasE(YgcH) can cleave or facilitate autocleavage of CRISPR RNA (crRNA) precursors to ~67 nt fragments (Brouns, 2008).

Ecoli type CRISPR repeats in EcoGene are numbered according to their order on the chromosome. The two Ypest CRISPRs are labeled as Ypest_CRISPR-L for the intact repeat (28/28) and Ypest_CRISPR'-R for the partial repeat (16/28).

CRISPR Families

The CRISPR name was assigned to this family of intergenic repeats when a set of conserved adjacent genes was discovered to be uniquely associated with CRISPR sequences (Jansen, 2002). CRISPR repeats and their associated proteins have been predicted to function at the RNA level to protect against plasmids and phage, analogous to RNAi (Makarova, 2006).

CRISPR repeats are present in many prokaryotes and can have different lengths and sequences than the E. coli K-12 CRISPR repeats (Mojica, 2000; Touchon, 2010). Similar to the use of REP PCR to type enterobacterial strains, CRISPR PCR is used to type Mycobacterium tuberculosis complex strains in a process called spoligotyping (Brudey, 2006). Campylobacter strains are also typed using CRISPR sequences (Schouls, 2003). The E. coli iap-linked CRISPR repeats define a CRISPR subclass found in a subset of prokaryotic genomes, called the Ecoli subtype (Haft, 2005b). Ypest is another subtype.

E. coli K-12 has representatives of 4/5 of the Cas1-5 (CRISPR-associated) gene families: ygbT(cas1), ygbF(cas2), ygcB(cas3), and casD(ygcI,cas5). Cse1-4 are also conserved gene families associated with the Ecoli CRISPR subtype (Haft, 2005b). In E. coli K-12 these four Cse family genes are casA(ygcL,cse1), casB(ygcK,cse2), casE(ygcH,cse3), and casC(ygcJ,cse4).

CRISPR Function

A function of the E. coli Cascade CasABCDE-crRNA complex appears to be protection against phage, plasmid and transposon infection; CRISPR in other bacteria can help protect cells from phage and plasmid infection, including adaptation by dynamic spacer acquisition (Mojica, 2005; Barrangou, 2007; Brouns, 2008). CRISPR spacers can be derived from phage or plasmids (Pourcel, 2005; Bolotin, 2005; Mojica, 2005), although the E. coli K-12 Ecoli type CRISPR spacers have no DNA sequence identities in as-yet sequenced phage, plasmid or transposon DNA (Mojica, 2005; Touchon, 2010).

The E. coli K-12 CRISPR-cas promoters are repressed by Hns (Pul, 2010). The E. coli K-12 CRISPR system can be activated by an hns mutation to acquire spacers and cure an invading plasmid (Swarts, 2012). Overexpression of the Cas proteins in E. coli BL21 also leads to adaptation by spacer acquisition; Cas1, Cas2, a single CRISPR pair and the leader sequence are required for adaptation; the inserted repeat is always replicated from the leader proximal repeat (Yosef, 2012).

The Cas1(YgbT) multifunctional endonuclease and the CRISPR repeats may be involved in host DNA repair and chromosome segregation; ygbT(cas1) mutants are UV sensitive; genetic interactions suggest Cas1 involvement in RecBC and RuvB mediated DNA repair (Babu, 2011). Unlike Cas3 from S. thermophilus, E. coli Cas3 does not possess HD domain-dependent endonuclease activity (Sinkunas, 2011; Jamieson, 2011). E. coli Cas3 has two antagonistic activities, ATP-independent magnesium-dependent DNA-RNA R-loop annealing and ATP-dependent helicase RNA unwinding from a model R-loop, toggled by ATP levels; the Cascade complex can also promote R-loop formation (Jamieson, 2011).

Anti-Ypest_CRISPR

An anti-CRISPR spacer at the Ypest type CRISPR3 locus was identified in K-12 and other E. coli strains that lack the Ypest type CRISPR4 locus; this anti-CRISPR spacer is directed at the Ypest cas1 gene and presumably protects K-12 from invasion by phage and plasmid encoded Ypest type CRISPR systems (Touchon, 2010). The last 26 bp of the Ypest type 32 bp spacer is exactly complementary to bases 98-123 of the cas1 gene from E. coli UTI89 (UTI89_C0890, NCBI RefSeq NC_007946.1 from 887218 to 888201 bp).

Ypest type anti-CRISPR Sequences and Locations

Ypest_CRISPR-L is GTTCACTGCCGTACAGGCAGCTTAGAAA, located between genome coordinates 924976 and 925003 bp.
Ypest_CRISPR'-R is GTTCACTGCCGTACAG, located between genome coordinates 925036 and 925051 bp.
Anti-Ypest type cas1 spacer is GGTAACATACTCCACCCGCCCACCAT, located between genome coordinates 925010 and 925035 bp.
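
The sequences and coordinates listed above can be cross-checked with a short Python sketch. It confirms that each sequence length matches its inclusive coordinate span, that the partial repeat is a prefix of the intact repeat (16/28), and that the region between the two repeats is 32 bp. It also computes the reverse complement of the anti-cas1 spacer; treating the listed spacer as written 5' to 3' is an assumption of this sketch, and the cas1 sequence itself is not included here, so no match against UTI89 is attempted. The helper names are hypothetical.

```python
# (sequence, left end, right end) as listed above.
FEATURES = {
    "Ypest_CRISPR-L": ("GTTCACTGCCGTACAGGCAGCTTAGAAA", 924976, 925003),
    "Ypest_CRISPR'-R": ("GTTCACTGCCGTACAG", 925036, 925051),
    "anti-cas1 spacer": ("GGTAACATACTCCACCCGCCCACCAT", 925010, 925035),
}


def span_length(left: int, right: int) -> int:
    """Length of a feature given inclusive left and right coordinates."""
    return right - left + 1


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Each listed sequence should be exactly as long as its coordinate span.
lengths_ok = all(
    len(seq) == span_length(left, right) for seq, left, right in FEATURES.values()
)

# The partial repeat should match the start of the intact repeat.
repeat_l = FEATURES["Ypest_CRISPR-L"][0]
repeat_r = FEATURES["Ypest_CRISPR'-R"][0]
partial_is_prefix = repeat_l.startswith(repeat_r)

# Full spacer region lying between the two repeats.
full_spacer_length = span_length(
    FEATURES["Ypest_CRISPR-L"][2] + 1, FEATURES["Ypest_CRISPR'-R"][1] - 1
)

# Strand that would pair with the listed 26 bp anti-cas1 spacer.
pairing_strand = reverse_complement(FEATURES["anti-cas1 spacer"][0])

print(lengths_ok, partial_is_prefix, full_spacer_length)
print(pairing_strand)
```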


Ecoli type CRISPR Locations

CRISPR       Left End   Right End   Length   Orientation

CRISPR1      2875662    2875690     29       Clockwise
CRISPR2      2875723    2875751     29       Clockwise
CRISPR3      2875784    2875812     29       Clockwise
CRISPR4      2875845    2875873     29       Clockwise
CRISPR5      2875906    2875934     29       Clockwise
CRISPR6      2875967    2875995     29       Clockwise
CRISPR7      2876028    2876056     29       Clockwise
CRISPR8      2876089    2876117     29       Clockwise
CRISPR9      2876150    2876178     29       Clockwise
CRISPR10     2876212    2876240     29       Clockwise
CRISPR11     2876274    2876302     29       Clockwise
CRISPR12     2876335    2876363     29       Clockwise
CRISPR13     2876396    2876424     29       Clockwise
CRISPR14     2876457    2876485     29       Clockwise
CRISPR22     2901443    2901470     28       Clockwise
CRISPR23     2901503    2901531     29       Clockwise
CRISPR15     2902035    2902063     29       Clockwise
CRISPR16     2902096    2902124     29       Clockwise
CRISPR17     2902157    2902185     29       Clockwise
CRISPR18     2902218    2902246     29       Clockwise
CRISPR19     2902279    2902307     29       Clockwise
CRISPR20     2902340    2902368     29       Clockwise
CRISPR21     2902401    2902429     29       Clockwise

CRISPR-I     2875662    2876485     824      Clockwise
CRISPR-III   2901443    2901531     89       Clockwise
CRISPR-II    2902035    2902429     395      Clockwise
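
The cluster lengths and the 32 or 33 bp spacer sizes follow directly from the repeat coordinates in the table above. The Python sketch below recomputes them, assuming inclusive coordinates (length = right end - left end + 1) and taking each spacer as the gap between one repeat's right end and the next repeat's left end. The dictionary layout and function names are choices made for this example only.

```python
# Repeat coordinates (left end, right end) copied from the table above,
# grouped by cluster and listed in chromosomal order.
CLUSTERS = {
    "CRISPR-I": [
        (2875662, 2875690), (2875723, 2875751), (2875784, 2875812),
        (2875845, 2875873), (2875906, 2875934), (2875967, 2875995),
        (2876028, 2876056), (2876089, 2876117), (2876150, 2876178),
        (2876212, 2876240), (2876274, 2876302), (2876335, 2876363),
        (2876396, 2876424), (2876457, 2876485),
    ],
    "CRISPR-III": [(2901443, 2901470), (2901503, 2901531)],
    "CRISPR-II": [
        (2902035, 2902063), (2902096, 2902124), (2902157, 2902185),
        (2902218, 2902246), (2902279, 2902307), (2902340, 2902368),
        (2902401, 2902429),
    ],
}


def span_length(left: int, right: int) -> int:
    """Length of a feature given inclusive left and right coordinates."""
    return right - left + 1


def spacer_lengths(repeats):
    """Lengths of the gaps between consecutive repeats in one cluster."""
    return [nxt[0] - cur[1] - 1 for cur, nxt in zip(repeats, repeats[1:])]


summary = {
    name: {
        "repeats": len(repeats),
        "span": span_length(repeats[0][0], repeats[-1][1]),
        "spacers": spacer_lengths(repeats),
    }
    for name, repeats in CLUSTERS.items()
}

for name, info in summary.items():
    print(name, info["repeats"], info["span"], sorted(set(info["spacers"])))
```

With these coordinates the spans come out as 824, 89, and 395 bp, matching the cluster rows of the table, and only CRISPR-I contains 33 bp spacers (the two that follow CRISPR9 and CRISPR10).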


CRISPR databases

CRISPRdb is a database cataloging the CRISPR repeats in prokaryotes (Grissa, 2007).

The CRISPR subtypes have been organized as a guild using the TIGR Genome Properties system (Haft, 2005a). A TIGR website allows one to search CRISPR subtypes across genomes, and has information pages for the Genome Properties (GenProp0065), the CRISPR guild (GenProp0021), and the Ecoli subtype (GenProp0315).

Sophie Bachellier has a webpage about the E. coli CRISPR repeats, which refers to them as the 29 bp repeats.

There is also a CRISPR Wikipedia entry.


Non-PubMed Reviews

Bachellier S., Gilson E., Hofnung M., Hill C.W.,
in: Neidhardt F.C., Curtiss R., Ingraham J.L., Lin E.C.C., Low K.B., Magasanik B., Reznikoff W.S., Riley M., Schaechter M., Umbarger H.E. (Eds.),
Escherichia coli and Salmonella: Cellular and Molecular Biology, ASM Press, Washington, D.C., 1996, pp. 2012-2040.