For the sequences, we used the data constructed by Xie et al. 2008, Genome Res. The data consist of 22 genomic sequences from the mouse genome, each 1000 bp in length. In each of the first 20 sequences, binding sites for the transcription factors OCT4, SOX2 and FOXD3 are each inserted 3 times, within a region of at most 164 bp. The inserted nucleotides are sampled from the respective TRANSFAC PWMs. The last two sequences contain no inserted binding sites.
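As an illustration of what sampling a site from a PWM and planting it in a background sequence involves, the following is a minimal sketch and not the procedure of Xie et al. The PWM `toy_pwm` is a made-up matrix rather than a TRANSFAC entry, the helper names `sample_site` and `plant_sites` are hypothetical, and the placement rule (non-overlapping sites with random gaps inside one window) is an assumption, since the text only states that the sites lie within a region of at most 164 bp.

```python
import random

BASES = "ACGT"

def sample_site(pwm, rng):
    """Draw one site from a PWM given as per-position probability rows
    in A, C, G, T order (each base is sampled independently)."""
    return "".join(rng.choices(BASES, weights=row, k=1)[0] for row in pwm)

def plant_sites(background, sites, window_start, window_len, rng):
    """Overwrite non-overlapping stretches of `background` with `sites`,
    all inside one window. The placement rule is an assumption."""
    seq = list(background)
    total = sum(len(s) for s in sites)
    if total > window_len or window_start + window_len > len(seq):
        raise ValueError("sites do not fit in the window")
    # Spread the unused window length as random gaps before each site,
    # so that sites keep their order and never overlap.
    slack = window_len - total
    cuts = sorted(rng.randint(0, slack) for _ in sites)
    positions, pos, prev = [], window_start, 0
    for site, cut in zip(sites, cuts):
        pos += cut - prev
        prev = cut
        seq[pos:pos + len(site)] = site
        positions.append(pos)
        pos += len(site)
    return "".join(seq), positions

rng = random.Random(0)
# Hypothetical 6-bp PWM, for illustration only (not from TRANSFAC).
toy_pwm = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.7],
    [0.4, 0.1, 0.4, 0.1],
    [0.25, 0.25, 0.25, 0.25],
]
# Random background stands in for a 1000-bp mouse genomic sequence.
background = "".join(rng.choice(BASES) for _ in range(1000))
sites = [sample_site(toy_pwm, rng) for _ in range(3)]
planted, positions = plant_sites(background, sites, 400, 164, rng)
```

Here the three sampled sites replace background bases inside the window starting at position 400, so the sequence length stays at 1000 bp; whether the original data overwrite or insert bases is not stated in the text.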
For the ChIP-Seq data, we used the data of Chen et al. 2008, Cell. We downloaded the binding peak files for five transcription factors (TFs), KLF4, NANOG, OCT4, SOX2 and STAT3, from the GEO database under accession number GSE11431. For each TF we used 100 genomic sequences, each 500 bp in length and centered on one of the top 100 binding peaks for that TF.
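The window selection can be sketched as follows. This is a hedged illustration only: it assumes the peaks have already been parsed into `(chrom, center, score)` tuples and that "top" means highest score, neither of which is specified in the text, and the function name `top_peak_windows` is hypothetical. Retrieving the actual bases for each window from a genome assembly is left out.

```python
def top_peak_windows(peaks, n=100, width=500):
    """Return (chrom, start, end) half-open windows of `width` bp centered
    on the `n` highest-scoring peaks.

    `peaks` is an iterable of (chrom, center, score) tuples; this layout
    is an assumption, not the format of the GEO peak files.
    """
    ranked = sorted(peaks, key=lambda p: p[2], reverse=True)[:n]
    half = width // 2
    windows = []
    for chrom, center, _score in ranked:
        # Clamp at the chromosome start so every window keeps full width.
        # Chromosome ends are not checked in this sketch.
        start = max(0, center - half)
        windows.append((chrom, start, start + width))
    return windows

# Made-up peaks for illustration.
example_peaks = [("chr1", 1000, 5.0), ("chr2", 100, 9.0), ("chr3", 5000, 1.0)]
example_windows = top_peak_windows(example_peaks, n=2)
```

With the default arguments, each TF's peak list would yield 100 windows of 500 bp, matching the counts described above.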