Genome-wide integration on transcription factors, histone acetylation and gene expression reveals genes co-regulated by histone modification patterns

 

by  Yayoi Natsume-Kitatani, Motoki Shiga and Hiroshi Mamitsuka

 

This support page provides the MATLAB source code and data resources needed to reproduce the results shown in the paper.

 

To reproduce the results, follow the instructions below.

1. Load the resources listed below into MATLAB.

              Source code

              SelectCell.mat

              normvector.mat

              movmf_m.mat

 

              Input datasets

              genelist_GP.xls: gene list for GSE9217 (dataset GP)

              genelist_ES.xls: gene list for GSE9840 (dataset ES)

              genelist_TFHM.xls: gene list shared between the two ChIP-chip datasets (datasets TR and AH+)

              matrix_GP.txt: gene expression profile in GSE9217 (dataset GP)

              matrix_ES.txt: gene expression profile in GSE9840 (dataset ES)

              matrix_TR.txt: binding t-CDFs for 1756 genes in dataset TR

              matrix_AHplus.txt: binding t-CDFs for 1756 genes in dataset AH+
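
A minimal sketch of loading the numeric input files is given below. It assumes, without confirmation from this page, that each matrix_*.txt file is a plain numeric, whitespace- or tab-delimited table with one row per gene, located in the current folder. The struct name "data" is only an illustration; missing files are skipped rather than treated as an error.

```matlab
% Assumption: matrix_*.txt files are plain numeric delimited tables
% (one row per gene) stored in the current folder.
names = {'matrix_GP', 'matrix_ES', 'matrix_TR', 'matrix_AHplus'};
data = struct();
for i = 1:numel(names)
    fname = [names{i} '.txt'];
    if exist(fname, 'file') == 2
        % load on an ASCII file returns a numeric matrix
        data.(names{i}) = load(fname);
    else
        fprintf('Skipping %s (not found in the current folder)\n', fname);
    end
end
% The steps below use plain variable names, e.g.:
%   matrix_TR = data.matrix_TR;
% Gene lists in .xls format can be read with xlsread or readcell,
% depending on the MATLAB version.
```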

 

2. Cluster the genes in the ChIP-chip data.

   % according to TF-binding

              W_h=matrix_TR*matrix_TR';

              [normvector_h]=normvector(W_h,clsn);

              [bestclust_TR] = movmf_m(W_h,normvector_h,clsn,iter,kappa);

 

              clsn: the number of clusters (e.g., 10)

              iter: the number of iterations (e.g., 1000)

              kappa: concentration parameter of the von Mises-Fisher (vMF) distribution (e.g., 10)

 

   % according to histone acetylation

              W_k=matrix_AHplus*matrix_AHplus';

              [normvector_k]=normvector(W_k,clsn);

              [bestclust_AHplus] = movmf_m(W_k,normvector_k,clsn,iter,kappa);
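
The calls above expect clsn, iter and kappa to exist in the workspace before they run. The sketch below sets the example values and builds the gene-by-gene matrix from a small made-up table, only to show the shapes involved. The matrix X is a hypothetical stand-in for matrix_TR or matrix_AHplus, and normvector and movmf_m are deliberately not called here.

```matlab
% Example parameter values taken from the list above.
clsn  = 10;    % number of clusters
iter  = 1000;  % number of iterations
kappa = 10;    % vMF concentration parameter

% Hypothetical stand-in for matrix_TR: 4 genes (rows) x 3 factors (columns).
% The real matrices have 1756 rows.
X = [0.9 0.1 0.0;
     0.8 0.2 0.1;
     0.1 0.9 0.7;
     0.0 0.8 0.9];

% Entry (i,j) is the inner product of the profiles of gene i and gene j,
% so W is square (genes x genes) and symmetric.
W = X * X';
fprintf('W is %d x %d\n', size(W, 1), size(W, 2));
```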

 

   For reference, the expected results are provided in the files "bestclust_TR.txt" and "bestclust_AHplus.txt".
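
Because cluster IDs are assigned randomly, a direct element-wise comparison with the reference files can fail even when the grouping is identical. One possible check, sketched below with made-up label vectors, compares which gene pairs share a cluster. The names "mine" and "ref" are illustrative and would be replaced by your own result and the loaded reference labels.

```matlab
% Made-up labels for 6 genes: the same grouping under different cluster IDs.
mine = [1 1 2 3 3 2]';
ref  = [3 3 1 2 2 1]';

% Co-membership matrices: entry (i,j) is true when genes i and j
% fall in the same cluster.
same_mine = bsxfun(@eq, mine, mine');
same_ref  = bsxfun(@eq, ref,  ref');

% Fraction of gene pairs treated identically by the two clusterings;
% a value of 1 means the partitions match up to relabeling.
agreement = mean(same_mine(:) == same_ref(:));
fprintf('Pairwise agreement: %.3f\n', agreement);
```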

 

3. Cluster the genes in the microarray expression data (GSE9217: matrix_GP).

              Run "SelectCell".

              [Bestclust, GeneGroup, GeneGroupList]=SelectCell(matrix_GP, genelist_GP, clsn, iter, kappa, genelist_TFHM, bestclust_TR, bestclust_AHplus);

 

OUTPUT

              Bestclust: cluster IDs of genes in microarray data

              GeneGroup: cluster IDs of pattern-cells with t-values greater than 0.99

              GeneGroupList: gene lists of pattern-cells

 

  Other outputs are 1) the number of genes in each cell of TF-HM, 2) the t-CDFs of genes in each cell and 3) heatmaps of the gene counts in 1).
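
As an illustration of output 1), the sketch below counts genes in each cell of a TF-by-HM grid from two label vectors. It is not taken from SelectCell, whose internals are not shown on this page; the toy vectors tf_cluster and hm_cluster are hypothetical stand-ins for bestclust_TR and bestclust_AHplus restricted to one set of genes.

```matlab
% Toy cluster labels for 8 genes (hypothetical values).
tf_cluster = [1 1 2 2 3 3 1 2]';   % TF-binding cluster of each gene
hm_cluster = [1 2 2 2 1 3 1 3]';   % histone-acetylation cluster of each gene
n_clusters = 3;

% counts(i,j) = number of genes in TF cluster i and HM cluster j.
counts = accumarray([tf_cluster hm_cluster], 1, [n_clusters n_clusters]);
disp(counts);

% A heatmap of the counts could then be drawn with, for example:
%   imagesc(counts); colorbar;
```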

 

NOTE:

Cluster IDs are assigned randomly, so the resulting IDs may differ from those in the paper. To run this software on your own gene expression data, replace the following arguments:

   matrix_GP -> gene expression profile of your microarray dataset

   genelist_GP -> gene list of your dataset

 

The above procedure uses datasets previously reported in the following papers:

Harbison, C.T. et al. (2004) Nature, 431, 99-104.

Kurdistani, S.K. et al. (2004) Cell, 117, 721-733.

Bernstein, B.E. et al. (2004) Genome Biol, 5, R62.

Lee, Y.L. and Lee, C.K. (2008) Mol Cells, 26, 299-307.

GSE9217

GSE9840

 

In addition, the mixture model estimation uses the source code accompanying the following paper:

Banerjee, A. et al. (2005) J Mach Learn Res, 6, 1345-1382.