Whole-proteome interaction mining
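[Editor's note] The goal of science and technology in the post-genomic era is to elucidate the functions of the proteins encoded by the genome. One strategy for achieving this is to study protein–protein interactions within the proteome, then identify the metabolic and regulatory pathways and the protein structural features involved in these interactions, and ultimately determine the functional roles of individual proteins. The work below applies data mining and knowledge learning to existing data to investigate protein–protein interactions at the proteome level.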
A major post-genomic scientific
and technological pursuit is to describe the functions performed by the
proteins encoded by the genome. One strategy is to first identify the
protein–protein interactions in a proteome, then determine pathways and
overall structure relating these interactions, and finally to
statistically infer functional roles of individual proteins. Although huge
amounts of genomic data are at hand, current experimental protein
interaction assays must overcome technical problems to scale up for
high-throughput analysis. In the meantime, bioinformatics approaches may
help bridge the information gap required for inference of protein
function. In this paper, a previously described data mining approach to
prediction of protein–protein interactions (Bock and Gough, 2001,
Bioinformatics, 17, 455–460) is extended to interaction mining on a
proteome-wide scale. An algorithm (the phylogenetic bootstrap) is
introduced, which suggests traversal of a phenogram, interleaving rounds
of computation and experiment, to develop a knowledge base of protein
interactions in genetically-similar organisms. The interaction mining approach was demonstrated by building a learning system based on 1,039 experimentally validated protein–protein interactions in the human gastric bacterium Helicobacter pylori. An estimate of the generalization performance of the classifier was derived from 10-fold cross-validation, which indicated expected upper bounds on precision of 80% and sensitivity of 69% when applied to related organisms. One such organism is the enteric pathogen Campylobacter jejuni, in which comprehensive machine learning prediction of all possible pairwise protein–protein interactions was performed. The resulting network of interactions shares an average protein connectivity characteristic in common with previous investigations reported in the literature, offering strong evidence supporting the biological feasibility of the hypothesized map. For inferences about complete proteomes in which the number of pairwise non-interactions is expected to be much larger than the number of actual interactions, we anticipate that the sensitivity will remain the same but precision may decrease. We present specific biological examples of two subnetworks of protein–protein interactions in C. jejuni resulting from the application of this approach, including elements of a two-component signal transduction systems for thermoregulation, and a ferritin uptake network.
1999-2005 Bioinformatics Center, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences