Supplementary information


Overview
  1. Input data
  2. Two-step procedure
  3. Data reduction
  4. BlockSampler
  5. Evaluation of developed procedure
  6. References
Input data


BENCHMARK DATASETS

Each benchmark data set consists of orthologs from the following species: human, chimp, mouse, rat and Fugu.
The composition of the initial data sets is given in table 1.
Input data files can be downloaded from these links:

  • cfos_int_SINFRUG18.fasta
  • cfos_int_SINFRUG19.fasta
  • cfos_int_SINFRUG87.fasta
  • hoxb2intergenic.fasta
  • pax6intergenic.fasta
  • sclintergenicpaper.fasta
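To check the composition of a downloaded file, the records can be read with a few lines of Python. This is a minimal sketch that assumes the files are plain FASTA and that the first token of each header identifies the record; the record names in the example are hypothetical stand-ins, not taken from the actual files.

```python
from io import StringIO


def read_fasta(handle):
    """Return a dict mapping each record id to its upper-cased sequence."""
    records = {}
    current_id = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the id is the first whitespace-delimited header token.
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            records[current_id] = []
        elif current_id is not None:
            records[current_id].append(line.upper())
    return {rec_id: "".join(parts) for rec_id, parts in records.items()}


# Hypothetical two-record example standing in for a downloaded file.
example = StringIO(">human_demo\nACGTAC\nGTA\n>fugu_demo\nacgtt\n")
sequences = read_fasta(example)
lengths = {rec_id: len(seq) for rec_id, seq in sequences.items()}
```

For a real file, the same function can be called on an open file handle, for example one opened on hoxb2intergenic.fasta.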


ADDITIONAL DATASETS

    These datasets contain more than one organism that is distantly related to the mammals. They consist of different combinations of human, chimp, mouse, rat, dog, chicken, Fugu, Tetraodon and zebrafish.
    The composition of these data sets is given in … The input data files can be downloaded from these links.


    Two-step procedure


    DATA REDUCTION

    Table 2 lists the clusters that contain at least one subsequence from each mammalian ortholog. When a cluster contained more than one subsequence from a single ortholog, it was divided into subclusters (Figure 1). Each subcluster is represented by a profile consisting of its constituent subsequence IDs (one per ortholog), as given in table 2. Table 3 gives an overview of the generated subclusters.
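The filtering and subdivision described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the article: a cluster is modelled as a list of (ortholog, subsequence id) pairs, the species labels are hypothetical, and a subcluster is assumed to be any combination of exactly one subsequence per ortholog.

```python
from itertools import product

# Assumed labels for the mammalian orthologs in a benchmark data set.
MAMMALS = ("human", "chimp", "mouse", "rat")


def group_by_ortholog(cluster):
    """Group (ortholog, subsequence_id) pairs by ortholog."""
    grouped = {}
    for ortholog, subseq_id in cluster:
        grouped.setdefault(ortholog, []).append(subseq_id)
    return grouped


def covers_all_mammals(cluster, mammals=MAMMALS):
    """True if the cluster has at least one subsequence per mammalian ortholog."""
    grouped = group_by_ortholog(cluster)
    return all(mammal in grouped for mammal in mammals)


def split_into_subclusters(cluster):
    """Return one profile (ortholog -> subsequence id) per subcluster.

    Assumption: every combination of one subsequence per ortholog
    forms a subcluster.
    """
    grouped = group_by_ortholog(cluster)
    orthologs = sorted(grouped)
    combos = product(*(grouped[ortholog] for ortholog in orthologs))
    return [dict(zip(orthologs, combo)) for combo in combos]


# Hypothetical cluster with two mouse subsequences.
cluster = [("human", "h1"), ("chimp", "c1"), ("mouse", "m1"),
           ("mouse", "m2"), ("rat", "r1"), ("fugu", "f1")]
profiles = split_into_subclusters(cluster) if covers_all_mammals(cluster) else []
```

In this example the cluster yields two profiles that differ only in the mouse subsequence.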


    BLOCKSAMPLER


    PARAMETER SETTINGS

    Our analysis flow consists of three major algorithms (Avid, TribeMCL, BlockSampler), each of which has its own parameters. These parameters were fine-tuned through multiple test runs on several benchmark data sets with different parameter settings.
    More details about the choice of parameters can be found here.
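A systematic series of test runs of this kind can be organised as a grid over data sets and parameter values. The sketch below only enumerates the runs; the parameter names and values are hypothetical placeholders and do not correspond to the actual options of Avid, TribeMCL or BlockSampler.

```python
from itertools import product

# Hypothetical parameter names and values, for illustration only.
PARAMETER_GRID = {
    "clustering_inflation": [1.5, 2.0, 3.0],
    "min_block_width": [6, 8],
}
DATA_SETS = ["hoxb2", "pax6", "scl"]


def enumerate_runs(data_sets, grid):
    """Return one settings dict per (data set, parameter combination)."""
    names = sorted(grid)
    runs = []
    for data_set in data_sets:
        for values in product(*(grid[name] for name in names)):
            run = {"data_set": data_set}
            run.update(zip(names, values))
            runs.append(run)
    return runs


runs = enumerate_runs(DATA_SETS, PARAMETER_GRID)
```

Each entry in the resulting list describes one test run, so the outcomes can be compared per data set and per setting.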

    BlockSampler

    Using the two-step procedure we detected 8 significant blocks for hoxb2, 13 for pax6, 1 for scl and none for the cfos data set (see article). To validate these blocks we checked whether they contained transcription factor binding sites: we looked for previously described motifs (Göttgens et al., 2002; Kammandel et al., 1999; Scemama et al., 2002) and also screened the blocks against the Transfac database of vertebrate transcription factor profiles (Wingender et al., 2001). The results are summarized in the article; a more detailed description of the regulatory motifs recovered in the detected blocks can be found here.
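Checking a block for a previously described motif can be illustrated with a simple consensus search on both strands. This is a sketch only: it uses exact matching of an IUPAC consensus string, which is simpler than the profile-based Transfac screening, and the consensus and sequence shown are illustrative examples rather than data from the article.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(consensus, sequence):
    """Return sorted (start, strand) hits of an IUPAC consensus.

    Starts are 0-based positions on the forward strand.
    """
    pattern = "".join(IUPAC[base] for base in consensus.upper())
    # A lookahead lets overlapping occurrences be reported as well.
    regex = re.compile("(?=(" + pattern + "))")
    sequence = sequence.upper()
    width = len(consensus)
    hits = [(match.start(), "+") for match in regex.finditer(sequence)]
    for match in regex.finditer(reverse_complement(sequence)):
        # Convert the reverse-strand position to forward-strand coordinates.
        hits.append((len(sequence) - match.start() - width, "-"))
    return sorted(hits)


# Illustrative consensus and block sequence (not taken from the article).
hits = find_motif("WGATAR", "CCAGATAAGGTTTATCTCC")
```

Note that a palindromic consensus is reported once per strand at the same position.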


    Evaluation of developed procedure

    Alignment of conserved blocks (resulting from BlockSampler) compared to the alignments obtained using MAVID (Bray and Pachter, 2003; Bray and Pachter, 2004):

  • hoxb2
  • pax6
  • scl



    References

  • references
