Supplementary information
- Input data
- Two-step procedure
- Data reduction
- BlockSampler
- Evaluation of developed procedure
- References
BENCHMARK DATASETS
The benchmark data sets always consist of orthologs from the following species: human, chimp, mouse, rat and Fugu.
The composition of the initial data sets is given in Table 1.
Input data files can be downloaded from these links:
cfos_int_SINFRUG18.fasta
cfos_int_SINFRUG19.fasta
cfos_int_SINFRUG87.fasta
hoxb2intergenic.fasta
pax6intergenic.fasta
sclintergenicpaper.fasta
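For readers who want to inspect the downloaded files, the following is a minimal sketch of a FASTA reader in Python. It assumes only the generic FASTA layout (a header line starting with ">" followed by one or more sequence lines); the function name `read_fasta` and the example headers are hypothetical, and the header conventions actually used in the files listed above are not documented here.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    # Emit the final record.
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical two-record input; real headers in the benchmark files may differ.
example = """>human_example
ACGTAC
GTTA
>fugu_example
ACGT
""".splitlines()

records = list(read_fasta(example))
for name, seq in records:
    print(name, len(seq))
```

To read one of the files above, an open file handle can be passed in place of the example list, for instance `read_fasta(open("pax6intergenic.fasta"))`.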
ADDITIONAL DATASETS
These datasets contain more than one distantly related organism (compared to mammals). They consist of different combinations of human, chimp, mouse, rat, dog, chicken, Fugu, Tetraodon and zebrafish.
The composition of these data sets is given in
The input data files can be downloaded from these links
DATA REDUCTION
Table 2 lists the clusters that contain at least one subsequence from each mammalian ortholog. When a cluster contained more than one subsequence from a single ortholog, it was divided into subclusters (Figure 1). Each subcluster is represented by a profile consisting of its constituent subsequence IDs (one per ortholog), as given in Table 2. Table 3 gives an overview of the generated subclusters.
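The filtering and splitting described above can be illustrated with a short Python sketch. It is not the code used in the study: it assumes that a cluster is stored as a mapping from ortholog name to a list of subsequence IDs, and that subclusters are formed by enumerating every combination of one subsequence per ortholog. The function names, cluster labels and subsequence IDs are all hypothetical.

```python
from itertools import product

# The mammalian orthologs of the benchmark data sets.
MAMMALS = ("human", "chimp", "mouse", "rat")


def filter_clusters(clusters, required):
    """Keep clusters with at least one subsequence from every required ortholog."""
    return {
        cid: members
        for cid, members in clusters.items()
        if all(members.get(org) for org in required)
    }


def split_into_subclusters(members, orthologs):
    """Return profiles holding exactly one subsequence ID per ortholog.

    Assumption: every combination of subsequences is a valid subcluster.
    """
    pools = [members[org] for org in orthologs]
    return [dict(zip(orthologs, combo)) for combo in product(*pools)]


# Hypothetical clusters: ortholog name -> subsequence IDs in the cluster.
clusters = {
    "cluster_A": {"human": ["h1"], "chimp": ["c1"], "mouse": ["m1", "m2"], "rat": ["r1"]},
    "cluster_B": {"human": ["h2"], "mouse": ["m3"]},  # lacks chimp and rat
}

kept = filter_clusters(clusters, MAMMALS)
profiles = split_into_subclusters(kept["cluster_A"], MAMMALS)
for profile in profiles:
    print(profile)
```

In this example, cluster_B is discarded because it lacks chimp and rat subsequences, and cluster_A yields two profiles that differ only in the mouse subsequence.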
BLOCKSAMPLER
PARAMETER SETTINGS
Our analysis flow consists of three major algorithms (Avid, TribeMCL, BlockSampler), each of which has its own parameters. These parameters were fine-tuned through multiple test runs on several benchmark data sets with different parameter settings.
More details about the choice of parameters can be found here.
Using the two-step procedure, we detected 8 significant blocks for hoxb2, 13 for pax6, 1 for scl and none in the cfos data set (see article). To validate these blocks, we checked whether they contained transcription factor binding sites: we looked for previously described motifs (Göttgens et al., 2002; Kammandel et al., 1999; Scemama et al., 2002) and also screened the blocks against the Transfac database of vertebrate transcription profiles (Wingender et al., 2001). The results are summarized in the article; a more detailed description of the regulatory motifs recovered in the detected blocks can be found here.
EVALUATION OF DEVELOPED PROCEDURE
Alignment of conserved blocks (resulting from BlockSampler) compared to the alignments obtained using MAVID (Bray and Pachter, 2003; Bray and Pachter, 2004):
hoxb2
pax6
scl
REFERENCES