SynTReN: a generator of synthetic gene expression data for design and analysis of structure learning algorithms



This website contains supplementary information and the latest software update of SynTReN.

Release notes

version 1.2 (2007-06-08):

IMPORTANT: some significant changes have occurred which are _not_ backward compatible:

  • File naming conventions have changed: the gene expression dataset now has the suffix "_unnormalized_dataset.txt". For backward compatibility, the (old) normalized dataset with the suffix "_normalized_dataset.txt" is still provided.
  • The dataset format has changed to the more conventional genes-by-conditions format (one row per gene, one column per condition).
  • Datasets are more realistic: a maximum expression value per gene has been introduced, which improves the realism of the data and removes some artifacts from MA-plots.
  • The command line interface has been extended: users can now specify their own external input file (an example is provided in the data/samples folder) and, for example, generate a concentration series experiment.
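
Scripts written for the older dataset layout may need the table transposed. The following is a minimal sketch, assuming the older files were laid out conditions by genes and that both layouts are tab-delimited text with a header row and a label column; neither assumption is stated in these notes. The function names are hypothetical and are not part of SynTReN.

```python
import csv
import io


def transpose_table(rows):
    """Swap the rows and columns of a rectangular table (list of lists)."""
    if len({len(row) for row in rows}) > 1:
        raise ValueError("all rows must have the same number of fields")
    return [list(column) for column in zip(*rows)]


def to_genes_by_conditions(text, delimiter="\t"):
    """Convert delimited text from conditions-by-genes to genes-by-conditions.

    Assumption (not taken from the SynTReN documentation): the first row
    holds gene names and the first column holds condition names.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerows(transpose_table(rows))
    return out.getvalue()


# Hypothetical two-condition, two-gene table in the assumed older layout.
old_layout = "\tgeneA\tgeneB\ncond1\t0.1\t0.9\ncond2\t0.4\t0.2\n"
new_layout = to_genes_by_conditions(old_layout)
```

After conversion, each row of new_layout starts with a gene name and each column corresponds to one condition.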

version 1.1.3 (2006-03-23):

A command line interface has been added. Windows users can launch it with syntren_cli.bat; a corresponding launcher is provided for Linux users. The command line interface requires an .ini file. Sample .ini files can be found in ./data/samples/*.ini.


Tim Van den Bulcke*, Koenraad Van Leemput*,
Bart Naudts, Piet van Remortel, Hongwu Ma, Alain Verschoren, Bart De Moor and Kathleen Marchal

* Contributed equally



The development of algorithms to infer the structure of gene regulatory networks based on expression data is an important subject in bioinformatics research. Validation of these algorithms requires benchmark data sets for which the underlying network is known. Since experimental data sets of the appropriate size and design are usually not available, there is a clear need to generate well-characterized synthetic data sets that allow thorough testing of learning algorithms in a fast and reproducible manner.


In this paper we describe a network generator that creates synthetic transcriptional regulatory networks and produces simulated gene expression data that approximates experimental data. Network topologies are generated by selecting subnetworks from previously described regulatory networks. Interaction kinetics are modeled by equations based on Michaelis-Menten and Hill kinetics. Our results show that the statistical properties of these topologies more closely approximate those of genuine biological networks than do those of different types of random graph models. Several user-definable parameters adjust the complexity of the resulting data set with respect to the structure learning algorithms.
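
To illustrate the kind of kinetics mentioned above, here is a minimal sketch of the textbook Hill equation for an activator and a repressor. It is not SynTReN's actual rate law, which is not given on this page; the function names, the multiplicative combination of regulators, and the parameters (v_max, k, n) are assumptions chosen for illustration. With n = 1 the activation term reduces to the Michaelis-Menten form.

```python
def hill_activation(regulator, k, n):
    """Fraction of maximal activation at a given regulator concentration.

    k is the concentration giving half-maximal activation and n is the
    Hill coefficient; n = 1 gives the Michaelis-Menten form.
    """
    if regulator < 0 or k <= 0 or n <= 0:
        raise ValueError("regulator must be >= 0; k and n must be > 0")
    powered = regulator ** n
    return powered / (k ** n + powered)


def hill_repression(regulator, k, n):
    """Fraction of maximal expression remaining under a repressor."""
    return 1.0 - hill_activation(regulator, k, n)


def transcription_rate(v_max, activator, repressor, k_act, k_rep, n):
    """Illustrative rate for one gene with one activator and one repressor.

    Multiplying the two terms is an assumption made for this sketch.
    """
    return (v_max
            * hill_activation(activator, k_act, n)
            * hill_repression(repressor, k_rep, n))


# At regulator == k the Hill terms are half-maximal by construction.
half_max = hill_activation(1.0, 1.0, 2)
rate = transcription_rate(10.0, 1.0, 1.0, 1.0, 1.0, 2)
```

Larger values of n make the response more switch-like around k, which is one way such models can vary how hard the resulting data is for a structure learning algorithm.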


This network generation technique offers a valid alternative to existing methods. The topological characteristics of the generated networks resemble those of real transcriptional networks more closely than those of random graph models do. Simulation of the network scales well to large networks. The generator models different types of biological interactions and produces biologically plausible synthetic gene expression data.


Additional software information

Click here for additional information regarding the software and file formats.


Tim Van den Bulcke: tim dot vandenbulcke at esat dot kuleuven dot be
Koenraad Van Leemput: koen dot vanleemput at ua dot ac dot be
Piet van Remortel: piet dot vanremortel at ua dot ac dot be
Kathleen Marchal: kathleen dot marchal at biw dot kuleuven dot be
(corresponding author)

External links