Understanding the Results Folder
The results folder has the following structure.
The opt folder
This folder contains all the paths and networks resulting from the analysis with the selected parameters.
This group of files (HT1_paths, HT2_paths, etc.) contains the results of the path-finding step for each population. The paths define the set of edges in the topology-weighted interaction network that connect mutated genes from one population to any of the mutated genes occurring in the other populations.
Each path is assigned a weight reflecting the degree of belief that the path contributes to the adaptive phenotype. The path weight is derived from the network topology and the relevance scores of the start and end nodes. For more information on the path weights, see the extended help file.
The file paths.txt collects all paths detected in all populations.
In the example below, the path runs from b1341 (start), which occurs in HT1, to b1296 (end). The path is defined as a set of connected edges, each with its own weight. The weight of the path here is 0.2641678655493167.
b2370 b2370 edge_pp(b2370,b2219,Directed)(0.8892726820276319)
b2219 b2219 edge_pp(b2219,b2220,Directed)(0.9046505351008906)
b2220 b2220 edge_pp(b2220,b1296,Directed)(0.9180894228446748)
b1296 (0.2641678655493167) Downstream
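To work with such a path programmatically, the lines can be split into edges and a final line carrying the path weight. The following is a minimal sketch in Python, assuming the line layout is exactly as in the example above; the regular expressions and the function name parse_path are illustrative and not part of the tool.

```python
import re

# Assumed line shapes, inferred only from the example above:
#   "<node> <node> edge_<type>(<source>,<target>,<direction>)(<weight>)"
#   "<end node> (<path weight>) <label>"
EDGE_RE = re.compile(r"edge_(\w+)\((\w+),(\w+),(\w+)\)\(([0-9.eE+-]+)\)")
END_RE = re.compile(r"^(\S+)\s+\(([0-9.eE+-]+)\)\s*(\S*)$")


def parse_path(lines):
    """Parse one path block into its edges, end node and path weight."""
    edges = []
    end_node = path_weight = label = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        edge = EDGE_RE.search(line)
        if edge:
            edge_type, source, target, direction, weight = edge.groups()
            edges.append((source, target, edge_type, direction, float(weight)))
            continue
        end = END_RE.match(line)
        if end:
            end_node = end.group(1)
            path_weight = float(end.group(2))
            label = end.group(3)
    return {"edges": edges, "end": end_node, "weight": path_weight, "label": label}


example = """\
b2370 b2370 edge_pp(b2370,b2219,Directed)(0.8892726820276319)
b2219 b2219 edge_pp(b2219,b2220,Directed)(0.9046505351008906)
b2220 b2220 edge_pp(b2220,b1296,Directed)(0.9180894228446748)
b1296 (0.2641678655493167) Downstream
"""

path = parse_path(example.splitlines())
for source, target, edge_type, direction, weight in path["edges"]:
    print(source, "->", target, edge_type, direction, weight)
print("end node:", path["end"], "path weight:", path["weight"])
```

The sketch only reads the values; it does not recompute the path weight, since that also depends on the relevance scores described in the extended help file.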
Edge Cost Folders
The optimization strategy, or sub-network inference step, takes the paths found in the path-finding step as input and searches for a sub-network that connects as many mutations as possible from different populations using the fewest edges. The latter is imposed by the cost parameter.
By default, a sweep is performed over the cost parameter. For each tested cost, the stochastic algorithm is run multiple times (as set by the number of repeats parameter). Hence, for each cost a separate folder exists that contains, for each repeated run of the algorithm, a network file with the selected nodes and edges (the sub-network).
For each cost, the network with the highest score is selected (best.result.network). Each cost folder also contains a results_summary folder with two items: ‘networkMetrics’, an overview of the summary statistics of the networks obtained in each repeated run at that cost, and ‘solutionstability’, a comparison of those networks in terms of score, genes and edges.
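A quick way to get an overview of the sweep is to count the network files in each cost folder. The sketch below is an illustration under two assumptions that are not confirmed by this help page: every cost folder is a direct sub-folder of the directory you pass in, and network files end in ".network". The function names are hypothetical.

```python
from pathlib import Path


def count_networks_per_cost(sweep_dir, suffix=".network"):
    """Map each sub-folder name to the number of network files directly inside it.

    Assumption: files ending in `suffix` are the per-run network files.
    """
    counts = {}
    for folder in sorted(Path(sweep_dir).iterdir()):
        if folder.is_dir():
            counts[folder.name] = sum(
                1
                for entry in folder.iterdir()
                if entry.is_file() and entry.name.endswith(suffix)
            )
    return counts


def report(sweep_dir):
    """Print one line per sub-folder, flagging folders without any network file."""
    for name, n in count_networks_per_cost(sweep_dir).items():
        status = "no network file found" if n == 0 else f"{n} network file(s)"
        print(f"{name}: {status}")
```

Calling report on the directory that holds the cost folders prints one line per folder. Every sub-folder is treated as a cost folder, so other folders in the same directory will appear in the listing as well.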
Resulting Networks Folder
In the folder ‘resulting_networks’, the best network obtained during the sweep (‘highestScoringSubnetwork’) is provided together with the merged network (resultingSubnetwork.weightednetwork). This information is contained in result_(# of repeats).network, which lists the cost at which each edge or node was recovered. Nodes are also prioritized based on the maximum cost at which they were first recovered.
The node prioritization is stored in rankedMutations.txt, a tab-delimited file in which each gene is followed by its rank. The number of ranks corresponds to the number of times the cost parameter was varied during the sweep, so the smaller the steps during the sweep, the higher the resolution of the ranking. Note that the folder for a particular cost is sometimes empty, meaning that no network could be selected at that cost (the cost was too stringent).
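The ranking file can be read with a few lines of Python. This is a minimal sketch, assuming each line holds a gene identifier and an integer rank separated by a tab; the function name read_ranked_mutations is illustrative, and the demo content is made up (gene identifiers reused from the path example, ranks invented).

```python
import csv
import io


def read_ranked_mutations(handle):
    """Return (gene, rank) pairs from a tab-delimited stream, sorted by rank.

    Assumption: column 1 is the gene, column 2 is an integer rank.
    """
    ranked = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or incomplete lines
        gene, rank = row[0].strip(), row[1].strip()
        try:
            ranked.append((gene, int(rank)))
        except ValueError:
            continue  # skip lines whose second column is not an integer
    return sorted(ranked, key=lambda pair: pair[1])


# Invented demo content, for illustration only.
demo = io.StringIO("b1296\t2\nb2370\t1\nb2219\t2\n")
ranked = read_ranked_mutations(demo)
for gene, rank in ranked:
    print(rank, gene)
```

For a real run, open the file with open("rankedMutations.txt", newline="") and pass the handle to the function. Sorting in ascending order is a convenience for reading the list and makes no claim about which end of the ranking is the most relevant.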
Networks are provided in a format that can be visualized as HTML. They can be viewed in any web browser without an internet connection. Hold CTRL or CMD and scroll with the mouse to zoom in and out (if you do not see a network at first, try zooming out).
Other formats (.sif, .xgmml) that allow offline visualization in other platforms, such as Cytoscape, are also provided.