Efficient Boolean implementation of universal sequence maps
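[Editor's note] Some bioinformaticians have long been working to develop new representations of biological sequences that make sequence analysis and comparison easier. In 1990, Jeffrey published a method called Chaos Game Representation (CGR), which attracted wide attention. The paper summarized here is a new advance that builds on the CGR method.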
Recently, Almeida and Vinga offered a new approach for the representation of arbitrary discrete sequences, referred to as Universal Sequence Maps (USM), and discussed its applicability to genomic sequence analysis. Their work generalizes and extends the Chaos Game Representation (CGR) of DNA to arbitrary discrete sequences. We have considered issues associated with the practical implementation of USMs and offer a variation on the algorithm that 1) eliminates the overestimation of similar segment lengths, 2) permits the identification of arbitrarily long similar segments in the context of finite-word-length coordinate representations, 3) uses more computationally efficient operations, and 4) provides a simple conversion for recovering the USM coordinates. Computational performance comparisons and examples are provided. We have shown that the desirable properties of the USM encoding of nucleotide sequences can be retained in a practical implementation of the algorithm. In addition, the proposed implementation enables determination of local sequence identity at increased speed.
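
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a corner assignment of A=(0,0), C=(0,1), G=(1,1), T=(1,0), a 32-bit word, and hypothetical function names (`cgr_float`, `cgr_bits`, `shared_suffix_length`, `bits_to_float`). It contrasts the usual floating-point CGR iteration with a shift-register encoding in which the number of matching leading bits between two positions gives the length of their shared preceding segment.

```python
# Assumed corner assignment; any fixed 2-bit code per nucleotide would do.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}
WORD = 32  # assumed word length in bits


def cgr_float(seq, start=(0.5, 0.5)):
    """Classic CGR: move halfway toward the corner of each symbol."""
    x, y = start
    points = []
    for base in seq:
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def cgr_bits(seq, word=WORD):
    """Shift-register variant: the newest symbol occupies the top bit."""
    x = y = 0
    points = []
    for base in seq:
        cx, cy = CORNERS[base]
        x = (x >> 1) | (cx << (word - 1))
        y = (y >> 1) | (cy << (word - 1))
        points.append((x, y))
    return points


def shared_suffix_length(p, q, limit=WORD, word=WORD):
    """Count leading bits on which both coordinates agree.

    `limit` should be the number of symbols consumed so far, so that the
    zero-filled initial state is not counted as a match.
    """
    diff = (p[0] ^ q[0]) | (p[1] ^ q[1])
    return min(word - diff.bit_length(), limit)


def bits_to_float(value, n, word=WORD):
    """Recover the float coordinate after n <= word symbols (start 0.5)."""
    return value / 2 ** word + 0.5 / 2 ** n


if __name__ == "__main__":
    a = cgr_bits("ACGTTGCA")
    b = cgr_bits("GGGTTGCA")
    # The two sequences share their last six symbols.
    print(shared_suffix_length(a[-1], b[-1], limit=8))
```

In this sketch the comparison is a single XOR, OR, and bit-length operation per pair of positions. The result is capped at the word length, so detecting longer shared segments would need additional bookkeeping that is not shown here.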
1999-2005 Bioinformation Center, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences