Computational detection of genomic cis-regulatory modules
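[Editor's note] Although gene prediction programs that locate protein-coding regions have been very successful in genome research, computational methods for analyzing noncoding regions and regulatory elements are still at an early stage. This article presents a new algorithm in this area, which the authors illustrate using cis-regulation of transcription in the fly embryo.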
Regulation of gene transcription is crucial for the function and development of all organisms. While gene prediction programs that identify protein-coding sequence are used with remarkable success in genome annotation, the development of computational methods to analyze noncoding regions and delineate transcriptional control elements is still in its infancy. Here we present novel algorithms that detect cis-regulatory modules through genome-wide scans for clusters of transcription factor binding sites, using three levels of prior information. When the binding sites for the factors are known, our statistical segmentation algorithm, Ahab, yields about 150 putative gap-gene-regulated modules, with no adjustable parameters other than a window size. If one or more related modules are known but their binding sites are not, repeated motifs can be found with a customized Gibbs sampler and passed to Ahab to predict genes with similar regulation. Finally, using only the genome, we developed a third algorithm, Argos, that counts and scores clusters of overrepresented motifs in a window of sequence. Argos recovers many of the known modules upstream of the segmentation genes without any training data. Complete results and module predictions across the genome are available at http://physics.rockefeller.edu/~siggia/.

We have demonstrated, in the case of body patterning in the Drosophila embryo, that our algorithms allow the genome-wide identification of regulatory modules. We believe that Ahab overcomes many of the problems of recent studies, and we estimate its false positive rate to be about 50%. Argos is the first successful attempt to predict regulatory modules using only the genome, without training data. Complete results and module predictions across the Drosophila genome are available at http://uqbar.rockefeller.edu/~siggia/.
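To make "scoring a window for a cluster of known binding sites" concrete, here is a minimal Python sketch. It is not Ahab. It assumes each factor is described by a position weight matrix (PWM), treats a window as a concatenation of single background bases and whole binding sites, and sums over all such segmentations with a forward recursion to get a log-likelihood ratio against pure background. The abstract says Ahab has no adjustable parameters beyond the window size; in this sketch, by contrast, the mixture `weights` (background first) are supplied by hand, and the background is a simple per-base model over A, C, G, T only. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: a PWM is one {base: probability} dict per site position.
PWM = List[Dict[str, float]]


def site_ratio(site: str, pwm: PWM, background: Dict[str, float]) -> float:
    """Likelihood of `site` under the PWM divided by its background likelihood."""
    ratio = 1.0
    for col, base in zip(pwm, site):
        ratio *= col.get(base, 0.0) / background[base]
    return ratio


def window_log_ratio(seq: str, pwms: Sequence[PWM], weights: Sequence[float],
                     background: Dict[str, float]) -> float:
    """Log-likelihood ratio of `seq`: motif+background mixture vs. background alone.

    weights[0] is the probability of emitting one background base; weights[k + 1]
    is the probability of emitting a whole site of pwms[k] (assumed, hand-set).
    z[i] sums, over every segmentation of seq[:i], the mixture likelihood divided
    by the background likelihood, avoiding the underflow of a raw base product.
    """
    n = len(seq)
    z = [1.0] + [0.0] * n
    for i in range(1, n + 1):
        total = z[i - 1] * weights[0]  # last segment is one background base
        for k, pwm in enumerate(pwms):
            width = len(pwm)
            if i >= width:  # last segment is a full site of factor k
                total += z[i - width] * weights[k + 1] * site_ratio(seq[i - width:i], pwm, background)
        z[i] = total
    return math.log(z[n])


def scan(genome: str, pwms: Sequence[PWM], weights: Sequence[float],
         background: Dict[str, float], window: int, step: int) -> List[Tuple[int, float]]:
    """Score sliding windows; a higher score suggests a denser cluster of sites."""
    starts = range(0, max(len(genome) - window, 0) + 1, step)
    return [(s, window_log_ratio(genome[s:s + window], pwms, weights, background))
            for s in starts]
```

For example, `max(scan(genome, pwms, weights, bg, window=500, step=100), key=lambda t: t[1])` picks the highest-scoring window. Summing over segmentations, rather than counting hits above a PWM threshold, lets many weak sites add up, which is the intuition behind looking for clusters rather than single strong sites.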
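The abstract does not say how its Gibbs sampler is customized, so the sketch below shows only the generic one-site-per-sequence site sampler. It assumes each known module contains exactly one ungapped copy of a motif of fixed `width` and that the background is uniform over A, C, G, T. Each step leaves out one sequence, rebuilds a pseudocount-smoothed PWM from the other sites, and resamples the left-out sequence's site in proportion to its PWM-versus-background likelihood ratio. `profile`, `gibbs_sites` and `consensus` are hypothetical names.

```python
import random
from typing import Dict, List, Sequence

BASES = "ACGT"


def profile(seqs: Sequence[str], starts: Sequence[int], width: int,
            skip: int = -1, pseudo: float = 0.5) -> List[Dict[str, float]]:
    """PWM from the current sites, leaving out seqs[skip] (skip=-1 keeps all)."""
    counts = [{b: pseudo for b in BASES} for _ in range(width)]
    n_sites = 0
    for j, (s, st) in enumerate(zip(seqs, starts)):
        if j == skip:
            continue
        n_sites += 1
        for c, base in enumerate(s[st:st + width]):
            counts[c][base] += 1
    total = n_sites + pseudo * len(BASES)  # every column sums to 1
    return [{b: col[b] / total for b in BASES} for col in counts]


def gibbs_sites(seqs: Sequence[str], width: int, iters: int = 1000, seed: int = 0) -> List[int]:
    """Return one sampled motif start per sequence (assumes one site per sequence)."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iters):
        i = rng.randrange(len(seqs))
        pwm = profile(seqs, starts, width, skip=i)
        s = seqs[i]
        weights = []
        for st in range(len(s) - width + 1):
            ratio = 1.0
            for c, base in enumerate(s[st:st + width]):
                ratio *= pwm[c][base] / 0.25  # assumed uniform background
            weights.append(ratio)
        starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts


def consensus(pwm: List[Dict[str, float]]) -> str:
    """Most probable base in each PWM column."""
    return "".join(max(col, key=col.get) for col in pwm)
```

Usage: `starts = gibbs_sites(seqs, 7)` followed by `profile(seqs, starts, 7)` gives a PWM, which is the kind of input a binding-site cluster scan needs; this is one way motifs from a few related modules could be turned into genome-wide predictions. Because the sampler is stochastic, one would normally run it from several seeds and keep the most consistent motif.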
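Argos works from the genome alone. As a rough illustration only (the abstract does not give Argos's statistics), the sketch below takes genome-wide k-mer frequencies as the background, counts each k-mer in a window, and adds up -log10 Poisson tail probabilities for words that occur at least twice and are significant at level `alpha`. The Poisson model, `k`, `alpha`, the window and step sizes, and all function names are assumptions. The sketch also ignores reverse complements and the fact that overlapping occurrences in a tandem repeat are not independent.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def kmer_freqs(genome: str, k: int) -> Dict[str, float]:
    """Genome-wide frequency of each k-mer, used as the background expectation."""
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def poisson_tail(count: int, mean: float) -> float:
    """P(X >= count) for X ~ Poisson(mean), summed upward from `count`."""
    if count <= 0:
        return 1.0
    term = math.exp(-mean + count * math.log(mean) - math.lgamma(count + 1))
    total, j = 0.0, count
    while True:
        total += term
        j += 1
        term *= mean / j
        if term <= 1e-17 * total:  # also exits if the first term underflowed to 0
            break
    return min(total, 1.0)


def window_score(window: str, freqs: Dict[str, float], k: int,
                 alpha: float = 1e-3) -> Tuple[float, List[str]]:
    """Sum of -log10 p over k-mers overrepresented in the window (hypothetical score)."""
    n_pos = len(window) - k + 1
    counts = Counter(window[i:i + k] for i in range(n_pos))
    score, hits = 0.0, []
    for word, c in counts.items():
        p_bg = freqs.get(word)
        if p_bg is None or c < 2:  # only repeated, genome-observed words count
            continue
        p = poisson_tail(c, p_bg * n_pos)
        if p < alpha:
            score += -math.log10(max(p, 1e-300))
            hits.append(word)
    return score, sorted(hits)


def scan(genome: str, k: int = 6, window: int = 500, step: int = 250,
         alpha: float = 1e-3) -> List[Tuple[int, float, List[str]]]:
    """Score sliding windows; high scores mark clusters of overrepresented words."""
    freqs = kmer_freqs(genome, k)
    results = []
    for s in range(0, max(len(genome) - window, 0) + 1, step):
        score, hits = window_score(genome[s:s + window], freqs, k, alpha)
        results.append((s, score, hits))
    return results
```

With a real genome, `kmer_freqs` would be computed once over all chromosomes, so a single window barely affects its own background. Note that a simple periodic repeat scores zero here because its words are common genome-wide, whereas a word repeated locally but rare elsewhere scores highly.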
1999-2005 Bioinformation Center, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences