Making sense of EST sequences by CLOBBing them
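[Editor's note] Using EST sequences for gene identification was once a very popular approach, but its error rate is high. To make effective use of EST information, the sequences must first be clustered. This paper describes research in that area; CLOBB is free software developed by the authors.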
Expressed sequence tags (ESTs) are single-pass reads from randomly selected cDNA clones. They provide a highly cost-effective method to access and identify expressed genes. However, they are prone to sequencing errors and typically define incomplete transcripts. To increase the amount of information obtainable from ESTs and reduce the impact of sequencing errors, it is necessary to cluster ESTs into groups sharing significant sequence similarity. As part of our ongoing EST programs investigating 'orphan' genomes, we have developed a clustering algorithm, CLOBB (Cluster on the basis of BLAST similarity), to identify and cluster ESTs. CLOBB may be used incrementally, preserving original cluster designations. It tracks cluster-specific events such as merging, identifies 'superclusters' of related clusters and avoids the expansion of chimeric clusters. Written in Perl, CLOBB is highly portable, relying only on a local installation of NCBI's freely available BLAST executable, and can be usefully applied to > 95% of current EST datasets. Analysis of the Danio rerio EST dataset demonstrates that CLOBB compares favourably with two less portable systems, UniGene and TIGR Gene Indices. CLOBB provides a highly portable EST clustering solution and can be freely downloaded from: http://www.nematodes.org/CLOBB
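The general idea of grouping ESTs by BLAST similarity can be illustrated with a minimal Python sketch. This is not CLOBB's actual algorithm, which is written in Perl and whose match criteria are not given in this abstract. The sketch assumes BLAST hits are available in the standard 12-column tabular format (query, subject, percent identity, alignment length, ...), and the function name `cluster_by_blast_hits` and the thresholds `min_identity` and `min_length` are hypothetical values chosen only for illustration. It performs plain single-linkage clustering with a union-find structure.

```python
from collections import defaultdict


def cluster_by_blast_hits(sequence_ids, hit_lines, min_identity=95.0, min_length=100):
    """Single-linkage clustering of sequences from tabular BLAST hits.

    Illustrative only: the thresholds are assumptions, not CLOBB's criteria.
    """
    # Every sequence starts as its own cluster representative.
    parent = {seq_id: seq_id for seq_id in sequence_ids}

    def find(x):
        # Walk up to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for line in hit_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        identity, length = float(fields[2]), int(fields[3])
        if query == subject:
            continue  # ignore self-hits
        if query not in parent or subject not in parent:
            continue  # ignore hits to sequences outside the dataset
        if identity >= min_identity and length >= min_length:
            union(query, subject)

    # Collect members under their root and return a deterministic ordering.
    clusters = defaultdict(list)
    for seq_id in sequence_ids:
        clusters[find(seq_id)].append(seq_id)
    return sorted(sorted(members) for members in clusters.values())


# Made-up example data: est1-est2 and est2-est3 pass the thresholds,
# est3-est4 does not (identity too low).
example_ids = ["est1", "est2", "est3", "est4"]
example_hits = [
    "est1\test2\t98.5\t250\t3\t0\t1\t250\t10\t259\t1e-120\t450",
    "est2\test3\t96.0\t180\t7\t0\t5\t184\t1\t180\t1e-80\t300",
    "est3\test4\t88.0\t120\t14\t0\t1\t120\t1\t120\t1e-30\t150",
]
print(cluster_by_blast_hits(example_ids, example_hits))
```

Single linkage of this kind will chain unrelated sequences together through a chimeric EST, which is the failure mode the abstract says CLOBB avoids; the sketch makes no attempt to detect chimeras, track merges, or preserve cluster identifiers across incremental runs.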
1999-2005 Bioinformation Center, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences