Supplementary information

Overview

Input data
Two-step procedure
Data reduction
BlockSampler
Evaluation of developed procedure
References

Input data

BENCHMARK DATASETS

Each benchmark data set consists of orthologs from the following species: human, chimp, mouse, rat and Fugu.
The composition of the initial data sets is given in Table 1.
Input data files can be downloaded from these links:

cfos_int_SINFRUG18.fasta

cfos_int_SINFRUG19.fasta

cfos_int_SINFRUG87.fasta

hoxb2intergenic.fasta

pax6intergenic.fasta

sclintergenicpaper.fasta
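The input files listed above are in FASTA format. As a minimal sketch of how such a file could be loaded for inspection, the snippet below parses FASTA-formatted lines into a dictionary of sequences. The function name `read_fasta`, the record ids and the demo sequences are hypothetical and are not taken from the actual data sets; the header layout of the real files may differ.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {record_id: sequence}.

    Assumption: the record id is the first whitespace-delimited token
    of each header line (the text following '>').
    """
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            if current_id in records:
                raise ValueError(f"duplicate record id: {current_id!r}")
            records[current_id] = ""
        elif current_id is None:
            raise ValueError("sequence data found before the first header")
        else:
            # Sequence lines of one record are concatenated and upper-cased.
            records[current_id] += line.upper()
    return records


# Hypothetical demo input (not real benchmark sequences).
demo_lines = """>human_demo example header
ACGTAC
GGTA
>mouse_demo
acgtta
""".splitlines()

demo_records = read_fasta(demo_lines)
for record_id, sequence in demo_records.items():
    print(record_id, len(sequence))
```

To read one of the downloaded files instead of the demo input, an open file handle can be passed directly, for example `read_fasta(open("hoxb2intergenic.fasta"))`, since the function accepts any iterable of lines.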

ADDITIONAL DATASETS

These data sets contain more than one distantly related organism (compared to mammals). They consist of different combinations of human, chimp, mouse, rat, dog, chicken, Fugu, Tetraodon and zebrafish.
The composition of these data sets is given in The input data files can be downloaded from these links

Two-step procedure

DATA REDUCTION

Table 2 lists the clusters that contain at least one subsequence from each mammalian ortholog. When a cluster contained more than one subsequence from a single ortholog, it was divided into subclusters (Figure 1). Each subcluster is represented by a profile consisting of its constituent subsequence IDs (one per ortholog), as given in Table 2. Table 3 gives an overview of the generated subclusters.
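The selection and subdivision of clusters described above can be illustrated with a small sketch. It assumes that a cluster is stored as a mapping from ortholog name to a list of subsequence IDs, and that subclusters are formed by taking every combination of one subsequence per ortholog. Both the data layout and the combination rule are assumptions made for illustration; the exact subdivision used in the analysis is the one shown in Figure 1. The function names and the demo IDs are hypothetical.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

# Mammalian orthologs present in the benchmark data sets.
MAMMALS: Tuple[str, ...] = ("human", "chimp", "mouse", "rat")


def covers_all(cluster: Dict[str, List[str]],
               required: Sequence[str] = MAMMALS) -> bool:
    """True if the cluster holds at least one subsequence per required ortholog."""
    return all(cluster.get(species) for species in required)


def split_into_subclusters(cluster: Dict[str, List[str]],
                           required: Sequence[str] = MAMMALS
                           ) -> List[Tuple[str, ...]]:
    """Return one profile (one subsequence ID per ortholog) per subcluster.

    Assumption: every combination of one subsequence per ortholog is
    treated as a separate subcluster. Clusters that miss an ortholog
    are discarded and yield an empty list.
    """
    if not covers_all(cluster, required):
        return []
    return list(product(*(cluster[species] for species in required)))


# Hypothetical cluster: mouse contributes two subsequences.
demo_cluster = {
    "human": ["h3"],
    "chimp": ["c3"],
    "mouse": ["m2", "m5"],
    "rat": ["r4"],
}
demo_profiles = split_into_subclusters(demo_cluster)
for profile in demo_profiles:
    print(profile)
```

In this sketch a cluster with exactly one subsequence per ortholog yields a single profile, so it is kept unchanged, while the demo cluster yields two profiles that differ only in the mouse subsequence.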


BLOCKSAMPLER

PARAMETER SETTINGS

Our analysis flow consists of three major algorithms (AVID, TribeMCL and BlockSampler), each of which has its own parameters. These parameters were fine-tuned through multiple test runs on several benchmark data sets with different parameter settings.
More details about the choice of parameters can be found here.

BlockSampler

Using the two-step procedure, we detected 8 significant blocks in the hoxb2 data set, 13 in pax6, 1 in scl and none in cfos (see article). To validate these blocks, we checked whether they contained transcription factor binding sites: we looked for previously described motifs (Göttgens et al., 2002; Kammandel et al., 1999; Scemama et al., 2002) and also screened the blocks against the TRANSFAC database of vertebrate transcription factor profiles (Wingender et al., 2001). The results are summarized in the article; a more detailed description of the regulatory motifs recovered in the detected blocks can be found here.


Evaluation of developed procedure

Alignments of the conserved blocks (resulting from BlockSampler) compared with the alignments obtained using MAVID (Bray and Pachter, 2003; Bray and Pachter, 2004):

hoxb2

pax6

scl


References

references
