SynTReN: a generator of synthetic gene expression data for design and analysis of structure learning algorithms

Software information

About

Additional information on the SynTReN software package and the generated files.

Installation & launching

The software is provided as an executable jar file. No installation is required; just place the files in a folder to which you have write access. You can launch the software either by double-clicking on SynTReN.jar or by using the following command line:

java -jar SynTReN.jar

Generated files

By default, all generated files are saved to a folder called "./data/results". File names follow this template:

nn<v>_nbgr<w>_hop<x>_bionoise<y>_expnoise<z>_(neigh|clust)Add_(dataset|external|network|correlatedExternal).(txt|sif|xml)

where:
  • v: the number of nodes of the 'foreground' network
  • w: the number of nodes of the 'background' network
  • x: the probability that an interaction is 'complex' (as explained in the paper)
  • y: the amount of biological noise
  • z: the amount of experimental noise
  • (neigh|clust): neighbour addition or cluster addition method
  • (dataset|external|network|correlatedExternal): the dataset (tab-delimited), the set of external nodes, the complete description of the network (in both xml and sif), and the list of correlated external inputs, respectively
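
As an illustration of the naming template above, the following Python sketch splits a result file name into its parts. It is not part of SynTReN: the function name, the field names, and the example file name are hypothetical. Because the exact formatting of the numeric values is not specified here, the sketch keeps every value as a string and only assumes that values contain no underscore.

```python
import re
from typing import Dict

# Regular expression mirroring the documented file name template.
# Assumption: parameter values never contain an underscore.
FILENAME_RE = re.compile(
    r"^nn(?P<foreground_nodes>[^_]+)"
    r"_nbgr(?P<background_nodes>[^_]+)"
    r"_hop(?P<complex_probability>[^_]+)"
    r"_bionoise(?P<biological_noise>[^_]+)"
    r"_expnoise(?P<experimental_noise>[^_]+)"
    r"_(?P<method>neigh|clust)Add"
    r"_(?P<content>dataset|external|network|correlatedExternal)"
    r"\.(?P<extension>txt|sif|xml)$"
)


def parse_result_filename(name: str) -> Dict[str, str]:
    """Return the template fields of a result file name as strings."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognised result file name: {name!r}")
    return match.groupdict()


# Hypothetical file name, made up for illustration only.
example = "nn100_nbgr0_hop0.3_bionoise0.1_expnoise0.1_neighAdd_dataset.txt"
fields = parse_result_filename(example)
print(fields["foreground_nodes"], fields["method"], fields["content"])
```

Keeping the values as strings avoids guessing how the generator writes numbers; convert them with int() or float() once the actual format is known.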

Content of the files

The content of the files with the following suffixes is:
  • _correlatedExternal.txt: A list of external input genes that are correlated. Each line represents one pair of correlated inputs, following the template: node <gene1> correlated with node <gene2>
  • _dataset.txt: This file contains the simulated microarray dataset in tab-delimited format. Columns are the different genes and each row gives the expression values for these genes under a different experimental condition.
  • _external.txt: The list of genes that are external inputs to the network.
  • _network.sif: The network in SIF format (Structure Information File). Each line represents an interaction between two genes. The template is: <gene1> (ac|re|du) <gene2>, where ac represents an activating interaction, re represents a repression interaction, and du represents a dual interaction. Dual interactions are randomly chosen as either activating or repressing for the generation of the simulated data.
  • _network.xml: This file is self-documenting to a certain extent. It lists all the nodes (i.e. genes). For each node it gives the type (e.g. marking external nodes), the noise model and the strength of the noise, the interaction type, and the parameters specifying the interaction. The file also contains an edge list defining the interactions between the genes. For a node's interaction type, the order in which that node's inputs enter the interaction equation is determined by the order in which its incoming edges are defined in the xml file.
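
The two line templates described above (for _network.sif and _correlatedExternal.txt) can be read with a few lines of Python. This is a minimal sketch under stated assumptions, not code shipped with SynTReN: the function names and the sample gene names are hypothetical, fields are assumed to be separated by whitespace, gene names are assumed to contain no whitespace, and <gene1> is treated as the source of each interaction.

```python
import re
from typing import Iterable, List, Tuple

# Interaction codes documented for _network.sif files.
INTERACTION_TYPES = {"ac": "activating", "re": "repressing", "du": "dual"}

# Template documented for _correlatedExternal.txt lines.
CORRELATED_RE = re.compile(r"^node\s+(\S+)\s+correlated\s+with\s+node\s+(\S+)$")


def parse_sif(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (gene1, interaction code, gene2) triples from SIF lines."""
    edges = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 3 or fields[1] not in INTERACTION_TYPES:
            raise ValueError(f"unexpected SIF line: {line!r}")
        edges.append((fields[0], fields[1], fields[2]))
    return edges


def parse_correlated(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (gene1, gene2) pairs of correlated external inputs."""
    pairs = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # ignore blank lines
        match = CORRELATED_RE.match(text)
        if match is None:
            raise ValueError(f"unexpected correlation line: {line!r}")
        pairs.append((match.group(1), match.group(2)))
    return pairs


# Hypothetical sample content, made up for illustration only.
sample_sif = ["geneA ac geneB", "geneB re geneC", "geneA du geneC"]
sample_correlated = ["node geneX correlated with node geneY"]

edges = parse_sif(sample_sif)
pairs = parse_correlated(sample_correlated)
for source, code, target in edges:
    print(source, INTERACTION_TYPES[code], target)
```

To read real files, pass an open file object to either function, since both accept any iterable of lines. Both raise ValueError on a line that does not match the documented template, so format differences are reported rather than silently skipped.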

Software requirements

  • Java 5.0 VM or higher