Table 2: Details of the chosen set of clusters for each data set (benchmark and additional).

(a) Name of the data set; (b) similarity parameter and (c) inflation value used to generate this set of clusters; (d) number of clusters in the chosen set; and (e) number of subsequences in the largest cluster.

Gene (a)	P (b)	I (c)	# clusters (d)	# subsequences in largest cluster (e)
cfos	0	4	12	5
hoxb2	-10	4	4	6
pax6	0	4	20	6
scl	-10	4	11	4
EGR3	0	4	11	8
GSH1	-10	4	12	4
HIV-EP1	0	4	13	6
HOXB5	0	4	1	4
MEIS2	0	4	24	6
PCHD8	0	4	14	4