cfos
The cfos gene, a member of the fos family of regulatory proteins, functions in processes such as cell differentiation, proliferation and apoptosis. fos genes encode leucine zipper proteins that dimerize with Jun family proteins to form the AP1 complex, whose activity is involved in a wide range of biological processes, including cell proliferation, differentiation, apoptosis and oncogenesis (Jochum et al., 2001). Using FootPrinter (Blanchette and Tompa, 2003), Blanchette and Tompa (2002) reported two conserved motifs in the promoters (+ 5'UTR) of the cfos orthologs of mouse, hamster, pig, human and Tetraodon nigroviridis; one of them was also conserved in the promoter region of the chicken cfos gene. Because the Fugu pufferfish is closely related to the (freshwater) Tetraodon pufferfish, we expected the Fugu intergenic sequence to contain the same two conserved motifs.
We could not detect significantly conserved blocks in any of the cfos data sets (see article, Material and Methods), and thus the two motifs previously described as conserved in pufferfish by Blanchette and Tompa (2002) could not be recovered. Our results indicate that the overall similarity between the mammalian and Fugu cfos orthologs is low and that the motifs, if biologically functional, are not located in a conserved region.

hoxb2
The transcription factor Hoxb2 is expressed within specific rhombomeres of the developing hindbrain (rhombomeres are a transient array of segments, r1 to r7, that compartmentalize the antero-posterior length of the hindbrain). It also functions in the organization of the neurons in the hindbrain (Davenne et al., 1999; Sham et al., 1993; Vieille-Grosjean et al., 1997). Scemama et al. (2002) showed, using in situ hybridization, that the expression profile of hoxb2 within the rhombomeres is conserved among human, mouse, zebrafish (Danio rerio) and striped bass (Morone saxatilis), whereas expression in the migrating neural crest tissues (observed in zebrafish and other vertebrates) is absent in striped bass. To assess whether the differences in expression of the respective hoxb2 orthologs result from differences in regulatory elements in their corresponding intergenic regions, Scemama et al. (2002) performed a comparative analysis of the hoxb2-hoxb3 region of striped bass, zebrafish, Fugu, human and mouse. This analysis revealed three significantly conserved regions containing binding sites for transcription factors that are known to be responsible for hoxb2 expression in the rhombomeres.
We identified eight significant blocks in the hoxb2 data set. The location of these blocks on the complete intergenic region of the Fugu hoxb2 ortholog is shown in the article (Figure 2). Block hoxb2 1.1 contains both the Hox/Pbx and Meis motifs previously reported by Scemama et al. (2002). Transfac screening revealed several additional potential binding sites in this block, e.g. a second Meis motif and several binding sites for homeodomain proteins. Scemama et al. (2002) also described some additional regulatory motifs located within this same region conserved between the vertebrate species. These motifs (Krox-20, Box1), shown to be essential for hoxb2 expression in rhombomeres in mouse (Sham et al., 1993; Maconochie et al., 1997; Vesque et al., 1996), were not recovered by our methodology using the current selection criteria. Box1, for instance, was detected by BlockSampler, but in a block far below the significance threshold (4th percentile). As pointed out by Scemama et al. (2002), the location of the Box1 and Krox-20 motifs is not conserved across the different intergenic sequences (i.e. they lie upstream of the Meis/Hox/Pbx location in mouse and downstream of it in Fugu and human). As a result, when these motifs are aligned, the sequence surrounding them might be less conserved, which would explain why they remain undetected by our methodology.
Blocks hoxb2 2.1 to 2.4 contain motifs previously described by Scemama et al. (2002) (see article, Table 2), located in a region conserved in striped bass, zebrafish, mouse and human. At the time Scemama et al. (2002) performed their analysis, the Fugu sequence was still incomplete. They therefore missed the corresponding region in the Fugu hoxb2 intergenic region, which we identified in this study. As expected, many of the motifs present in the striped bass hoxb2a intergenic region are also present in the closely related Fugu orthologous sequence (Fugu and striped bass diverged between 100 and 200 Mya). Screening with Transfac revealed additional potential motifs, some of which recur in multiple copies in the different blocks, for instance Cap, CdxA, NF-Y and SRY. In addition to the positions reported by Scemama et al. (2002), some of the previously described motifs, such as the octamer binding site and the CCAAT boxes, were also detected elsewhere in the (cluster 2) sequence (see article, Table 2). These repeated occurrences increase confidence in the biological functionality of the detected motifs. Some motifs reported by Scemama et al. (2002) to be located within the same conserved region, however, could not be recovered by our methodology, namely URTF, CBF1, Krox-20, Oct1, HNF1 and HOXD8,9,10. The latter three motifs were detected by BlockSampler in cluster 2, but belonged to non-significant blocks (the 4th, 18th and 39th percentile, respectively). For URTF, CBF1 and Krox-20, the sensitivity of our methodology was too low to recover the motifs: URTF and CBF1 are lost in the preselection step, probably as a result of the chosen selection criteria. Krox-20, on the other hand, is not present in the rat intergenic region and is thus lost in our analysis (see article, Material and Methods, selection procedure).
Note that block hoxb2 2.5 is located much further upstream of the transcription start site, circa 12 kb (see article, Figure 2), than the other blocks detected in cluster 2. According to Scemama et al. (2002) (based on data from the Fugu genome consortium), the complete Fugu hoxb2a intergenic region comprises only circa 5.4 kb. This indicates that block hoxb2 2.5 (see article, Table 2) is located outside this hoxb2a intergenic region. Closer inspection (using Ensembl) indicated that a pseudogene is present in the intergenic region of the Fugu hoxb2 ortholog (see article, Material and Methods). The conserved block hoxb2 2.5 thus most likely corresponds to the regulatory region of this pseudogene.
Blocks hoxb2 3.1 and 3.2 are significant blocks corresponding to conserved regions not previously described by Scemama et al. (2002). They are located near the transcription start site of the Fugu hoxb2a (see article, Figure 2) and contain some putative binding sites for upstream stimulating factors (USF; see article, Table 2). TFIID, previously described by Scemama et al. (2002), was detected by BlockSampler, but in a non-significant block (77th percentile) located within cluster 3.
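The percentile values quoted above (4th, 18th, 39th and 77th, all treated as non-significant) suggest a simple rank-based filter on block scores. The sketch below is only an illustration of that idea: the function names `percentile_ranks` and `significant_blocks`, the cutoff of 90 and the toy scores are all assumptions made here, and the actual BlockSampler scoring and selection criteria are those given in the article's Material and Methods.

```python
def percentile_ranks(scores):
    """Return, for each score, the percentage of scores strictly below it."""
    n = len(scores)
    return [100.0 * sum(s < x for s in scores) / n for x in scores]


def significant_blocks(blocks, cutoff=90.0):
    """Keep block names whose percentile rank is at or above `cutoff`.

    `blocks` maps a block name to a conservation score (hypothetical units).
    The default cutoff is a placeholder, not the value used in the article.
    """
    names = list(blocks)
    ranks = percentile_ranks([blocks[name] for name in names])
    return [name for name, rank in zip(names, ranks) if rank >= cutoff]


# Toy example with ten invented block scores.
toy_blocks = {f"block_{i}": float(i) for i in range(1, 11)}
kept = significant_blocks(toy_blocks, cutoff=90.0)
```

With such a filter, a motif such as TFIID that falls in a block at the 77th percentile would be discarded even though the sampler did find it, which is the situation described for several motifs above.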

pax6
Pax6 is a regulatory protein that plays a crucial role in the morphogenesis of the eye. It is also an important player in the development of the brain and spinal cord and functions during the development of the pancreas. Using expression studies in mice, Kammandel et al. (1999) identified distinct elements (sequence regions) controlling tissue-specific expression. Within these regulatory elements, several motifs were identified that are highly conserved in the intergenic sequences of the human, mouse and Fugu pax6 genes (Kammandel et al., 1999). In the pax6 data set, we detected 13 conserved blocks (see article, Table 3).
Six conserved blocks (pax6 1.1 to pax6 1.6; see article, Table 3) were located in mammalian cluster 1; their localization within the complete Fugu intergenic sequence is shown in the article (Figure 2). Blocks pax6 1.1, 1.3 and 1.6 each contain (part of) the region described by Kammandel et al. (1999) as minimally required for expression in the lens and cornea (see article, Table 3). Additionally, block pax6 1.1 contains a motif with consensus "CTTAATG", also described by Kammandel et al. (1999). Transfac screening identified the latter motif as a homeobox-binding site (CdxA; see article, Table 3). Interestingly, many more CdxA binding sites were found in the pax6 vertebrate sequences, and they mostly occurred multiple times in the conserved blocks (see article, Table 3). Block pax6 1.6 harbours many potential homeobox-binding sites, e.g. a homeodomain-binding site also reported by Kammandel et al. (1999) (see article, Table 3). The motifs described above have been shown to be located in a region responsible for expression in eye tissues of head surface ectoderm origin, e.g. lens and cornea (Kammandel et al., 1999). In block pax6 1.5, located in the same region (see article, Figure 3), we identified a few binding sites not previously described, such as SRY and EN-1 (see article, Table 3). Two blocks, pax6 1.2 and pax6 1.4, correspond to elements controlling expression in the developing pancreas described by Kammandel et al. (1999); these are spatially separated from pax6 1.1, 1.3, 1.5 and 1.6, as shown in the article (Figure 2). These blocks are characterized by the presence of a PBX-1 binding site (pax6 1.2) and two homeodomain-binding motifs (in pax6 1.2 and 1.4, respectively), as previously described by Kammandel et al. (1999). Transfac screening identified the homeodomain-binding site present in block pax6 1.2 as a HoxA3 motif (see article, Table 3).
In cluster 2, we detected four conserved blocks not previously described: pax6 2.1 to pax6 2.4 (see article, Table 3). These blocks were rich in homeodomain-binding sites such as CdxA, Nkx2-5 and En-1. Looking at the localization of the cluster 2 blocks on the pax6 Fugu intergenic region (see article, Figure 3), it is remarkable that block pax6 2.4 is situated several kb downstream from the other blocks of that cluster, i.e. closer to the transcription start site. This may be due to the presence of a duplicated region within the Fugu intergenic sequence.
Finally, three significantly conserved blocks (pax6 3.1 to pax6 3.3; see article, Table 3) were identified in pax6 cluster 3 (see article, Figure 3). These blocks were also not previously described, but Transfac screening shows that they contain many potential binding sites. As observed in the other conserved pax6 blocks, homeobox-binding sites such as CdxA, En-1, Nkx2-5, Hoxa3 and Msx1 are abundantly present. Other binding sites that were identified multiple times in the different pax6 blocks of cluster 3 include those for the upstream stimulating factor and the sex-determining region Y product (SRY). Block pax6 3.3 also contains three GATA-binding sites.

scl
scl encodes a transcription factor that functions in hematopoiesis and vasculogenesis. Comparative analysis of scl intergenic regions from human, mouse, chicken, Fugu and zebrafish revealed some strongly conserved regions (Göttgens et al., 2002). In one of these regions, Göttgens et al. (2002) identified five conserved elements, three of which (two GATA sites and a putative SKN1 site) have previously been shown to play a role in promoter activity in hematopoietic cell lines or to be important for activity of the midbrain enhancer (Bockamp et al., 1995; Sinclair et al., 1999). Göttgens et al. (2002) showed that the two additional unnamed motifs are necessary to ensure full scl promoter activity in erythroid cells. We detected one conserved block in the scl orthologs (see article, Table 4, Figure 2). This block contained the conserved unnamed motifs and the putative SKN1 site formerly described by Göttgens et al. (2002). We could not identify the SKN1 site by screening for existing motifs because Transfac does not provide a vertebrate motif matrix for SKN. One of the unnamed motifs was identified by Transfac screening as an En-1 binding site or part of a HOXA3 binding site (see article, Table 4). As in the previous data sets, such homeobox-binding sites were abundantly present.
The two conserved GATA sites formerly reported by Göttgens et al. (2002) were not detected by our methodology. These two motifs might be either too short or too isolated (i.e. lacking surrounding sequence conservation) to produce a significantly conserved block. For instance, the GATA site with consensus 'GCTTATCGGG' was recovered in a block with a conservation level below our threshold (in the 42nd percentile).
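A short, isolated site such as 'GCTTATCGGG' can still be located directly by exact string matching, even when the block containing it is not significantly conserved. The following is a minimal sketch of such a scan on both strands; the function names and the toy sequences are assumptions made for illustration and are not part of the pipeline described in the article, which relies on BlockSampler and Transfac screening.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_consensus(seq, consensus):
    """List (0-based start, strand) for exact matches of `consensus` in `seq`.

    A match to the consensus itself is reported on the '+' strand, a match
    to its reverse complement on the '-' strand. Degenerate (IUPAC) symbols
    and mismatches are not handled in this sketch.
    """
    seq = seq.upper()
    consensus = consensus.upper()
    rc = reverse_complement(consensus)
    width = len(consensus)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if window == consensus:
            hits.append((start, "+"))
        elif window == rc:
            hits.append((start, "-"))
    return hits


# Invented toy sequences, not real scl intergenic sequence.
forward_hits = find_consensus("AAGCTTATCGGGTT", "GCTTATCGGG")
reverse_hits = find_consensus("TTCCCGATAAGCAA", "GCTTATCGGG")
```

An exact scan of this kind only confirms that the site is present in a given sequence; it says nothing about conservation of the flanking sequence, which is what the block-based significance criterion measures.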
