QDB

Introduction

QDB (Query-Driven Biclustering) is a Bayesian biclustering framework for microarray data in which the prior distributions make it possible to introduce knowledge from a set of seed genes (the query) to guide the search for patterns. The algorithm is described and validated in the following paper:
Dhollander T, Sheng Q, Lemmens K, De Moor B, Marchal K, Moreau Y. Query-driven module discovery in microarray data. Bioinformatics, 23(19):2573-2580 (2007).

Supplementary information

Supplementary file 1

Supplementary file 2

Software

The software for query-driven biclustering (QDB) is freely available for ACADEMIC USE ONLY under the terms of its license. Registration is required to download the software; for commercial use, please contact us. QDB was implemented in R (version 2.4.1) and consists of a collection of R scripts. To run the software, perform the following steps:

Extract the files to a folder called 'QDB_v1.1'.
Set the R working directory to the QDB_v1.1 folder, for example: setwd("/home/usr/QDB_v1.1/")
The QDB software is run through the script 'qdb_main_CM.R'. Modify this script to set the desired parameters, load the expression data, specify the seed genes and define the output files:

Parameter settings: set the desired parameters in the 'params' section of the script. The script currently uses the default settings. For a detailed description of the parameters and their default values, see the manuscript.
Expression data: specify the name of the gene expression data file in the 'read data sources' section of the script. The expression data are assumed to form a matrix whose rows are genes and whose columns are the conditions under which gene expression was measured. The file must be tab-delimited text: the first row holds the identifiers of the experimental conditions, the first column holds the gene locus tags, and the remaining cells hold the expression matrix.
Seed genes: set the seed genes in the 'seed genes' section of the script. Seed genes can be specified in two ways: by their locus tags (these must match the row names in the expression data file) or by their indices in the expression data file (i.e. their row numbers).
Output files: these are specified in the 'save the result' section of the script. Two output files can be specified. The "tempresult.RData" file contains the output of the whole QDB run: the gene scores, condition scores and log-likelihood scores for all iterations of the algorithm. These data can be loaded into R with the load command. The "tempresult.bcl" file is a plain-text file containing a selection of the biclustering results produced by the resolution-sweep approach, chosen using the Akaike Information Criterion (AIC) as described in the manuscript.
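To check that an expression file has the expected layout before running QDB, it can be read with base R. The helper below, read_expression, is a hypothetical sketch and not part of the QDB scripts; it assumes the file format described above (tab-delimited, condition identifiers in the first row, locus tags in the first column). The locus tags and values in the usage example are made up for illustration.

```r
# Hypothetical helper (not part of QDB): read an expression file in the
# format described above into a numeric matrix.
#   - tab-delimited text
#   - first row: identifiers of the experimental conditions
#   - first column: gene locus tags (used as row names)
read_expression <- function(path) {
  df <- read.delim(path, row.names = 1, check.names = FALSE,
                   stringsAsFactors = FALSE)
  m <- as.matrix(df)
  if (!is.numeric(m))
    stop("expression values must be numeric")
  m
}

# Usage with a small, made-up file (tags and values are illustrative only)
tmp <- tempfile(fileext = ".txt")
writeLines(c("locus\tcond1\tcond2\tcond3",
             "b0001\t0.5\t-1.2\t0.3",
             "b0002\t1.1\t0.4\t-0.7"), tmp)
expr <- read_expression(tmp)
print(dim(expr))
```

check.names = FALSE keeps condition identifiers exactly as written in the header, instead of letting R rewrite them into syntactic names.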
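Because seed genes may be given either as locus tags or as row numbers, a quick consistency check can catch typos before a long run. The function seed_indices below is a hypothetical sketch (not part of QDB) that maps both forms onto row indices of the expression matrix and stops if a tag is missing or an index is out of range.

```r
# Hypothetical helper (not part of QDB): convert seed genes given either as
# locus tags (matched against the row names of the expression matrix) or as
# row numbers into a vector of row indices, failing loudly on bad input.
seed_indices <- function(expr, seeds) {
  if (is.character(seeds)) {
    idx <- match(seeds, rownames(expr))
    if (any(is.na(idx)))
      stop("seed genes not found in the expression data: ",
           paste(seeds[is.na(idx)], collapse = ", "))
  } else {
    idx <- as.integer(seeds)
    if (any(is.na(idx) | idx < 1L | idx > nrow(expr)))
      stop("seed gene indices must lie between 1 and ", nrow(expr))
  }
  idx
}

# Both forms select the same rows of a small illustrative matrix
expr <- matrix(rnorm(12), nrow = 4,
               dimnames = list(c("b0001", "b0002", "b0003", "b0004"), NULL))
seed_indices(expr, c("b0002", "b0004"))
seed_indices(expr, c(2, 4))
```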
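The names of the objects stored in tempresult.RData are not listed here, so one convenient way to inspect the file is to load it into a separate environment and look at what it contains. The helper load_result below is a hypothetical sketch using only base R's load() and mget(); it does not assume any particular object names.

```r
# Hypothetical helper (not part of QDB): load an .RData file such as
# tempresult.RData into its own environment, so it cannot overwrite objects
# in the current workspace, and return its contents as a named list.
load_result <- function(path) {
  env <- new.env()
  obj_names <- load(path, envir = env)  # load() returns the restored names
  mget(obj_names, envir = env)
}

# Usage (path as configured in the 'save the result' section):
# result <- load_result("tempresult.RData")
# names(result)                          # see which objects were stored
# bcl <- readLines("tempresult.bcl")     # the .bcl file is plain text
```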

After modifying the main file according to your desired settings, run it in R to start the QDB algorithm: source("/main files/qdb_main_CM.R")
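Setting the working directory and sourcing the main file can be combined into one call. The wrapper run_qdb below is a hypothetical convenience sketch, not part of the distribution: it switches to the QDB folder, sources the given script from there, and restores the previous working directory afterwards, even if the run stops with an error. The script path argument is a placeholder to be adapted to where qdb_main_CM.R sits in your installation.

```r
# Hypothetical wrapper (not part of QDB): run the main script from inside
# the QDB folder and restore the previous working directory afterwards.
run_qdb <- function(qdb_dir, main_script) {
  old_wd <- setwd(qdb_dir)             # setwd() returns the previous directory
  on.exit(setwd(old_wd), add = TRUE)   # restore it even if source() fails
  source(main_script)                  # evaluated in the global environment
  invisible(NULL)
}

# Usage (placeholder paths, adapt to your installation):
# run_qdb("/home/usr/QDB_v1.1/", "path/to/qdb_main_CM.R")
```

Because source() evaluates in the global environment by default, the objects created by the main script remain available in the R session after the wrapper returns.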

Citation

Dhollander, T. et al. (2007) Query-driven module discovery in microarray data. Bioinformatics, 23, 2573-2580.

Contact

kathleen.marchal<at>biw.kuleuven.be