Biologists in Norway use a computer program to “read” the scientific literature and successfully predict gene interactions

 

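[Editor's note]

It seems odd at first that this paper could be published in Nature Genetics, but on reflection it probably has a lot to do with how much international journals prize creativity, especially original thinking. It is often said that doing science requires ideas. In many cases an idea sits behind nothing more than a sheet of window paper, and what matters is whether you can poke through it. If you think of something nobody else has thought of, the work is valuable even if it is simple.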

 

Biologists in Norway have used a computer program to "read" the scientific literature and successfully predict gene interactions.

This data-mining of the "biobibliome" provides a way of dealing with the ever-increasing torrent of biological data - millions of papers a year. But even more impressively, the completely automated process can make new genetic discoveries - essentially free research.

Most scientific papers are now published online, but there's no way any person could sift through all of them. "It's beyond human cognition," says geneticist Daniel Masys from the University of California in San Diego.

So Eivind Hovig of the Norwegian Radium Hospital in Oslo and his colleagues designed a computer program to do the job. They based their method on one simple assumption: if two genes are mentioned in the same paper, they must be biologically related.

The program scans through the titles and abstracts of scientific papers and picks out names of human genes. The researchers used it to search over 10 million papers on the publicly available database MEDLINE.

Networking

They found references to 13,712 different human genes and built up a network of which genes were likely to be related. Then they annotated the gene network with potential biological functions using the keywords or subject headings associated with each paper.
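The co-occurrence idea described above can be sketched in a few lines of Python. This is only a minimal illustration, not the PubGene implementation: the gene symbols, the toy records, and the function names `find_genes` and `build_cooccurrence_network` are all assumptions made for the example, and the article does not say how the real program matches gene names.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical gene dictionary for illustration only; a real one would
# contain thousands of symbols and their synonyms.
GENE_SYMBOLS = {"TP53", "MDM2", "BRCA1", "EGFR"}

def find_genes(text, symbols=GENE_SYMBOLS):
    """Return the known gene symbols that appear as whole tokens in text."""
    tokens = set(re.findall(r"[A-Za-z0-9]+", text))
    return tokens & symbols

def build_cooccurrence_network(records, symbols=GENE_SYMBOLS):
    """Count, for each unordered gene pair, how many records mention both."""
    edges = Counter()
    for text in records:
        genes = sorted(find_genes(text, symbols))
        for a, b in combinations(genes, 2):
            edges[(a, b)] += 1
    return edges

# Toy "title plus abstract" records, invented for this example.
records = [
    "MDM2 regulates TP53 stability in tumour cells.",
    "TP53 and MDM2 expression in breast cancer; BRCA1 status.",
    "EGFR signalling in lung tissue.",
]
network = build_cooccurrence_network(records)
```

Each key in `network` is an alphabetically ordered pair of gene symbols, and each value is the number of records in which both appear. A record that mentions only one gene, like the third one here, adds no edge.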

The result was a database called PubGene. For any particular gene, the researchers can call up a list of other genes it is likely to interact with, in order of probability, and a list of medical areas in which that gene is likely to be involved. This information gives researchers a valuable head start when they are planning their research.
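A lookup of this kind might look like the following sketch. It assumes that the co-occurrence count stands in for the "probability" mentioned above, which is a simplification chosen for the example; the article does not give PubGene's actual scoring. The edge counts and the function name `ranked_neighbours` are hypothetical.

```python
def ranked_neighbours(gene, edges):
    """List genes co-mentioned with `gene`, most frequently co-mentioned first."""
    scores = {}
    for (a, b), count in edges.items():
        if gene == a:
            scores[b] = count
        elif gene == b:
            scores[a] = count
    # Sort by descending count, then alphabetically to break ties.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Invented co-occurrence counts, for illustration only.
edges = {
    ("MDM2", "TP53"): 12,
    ("BRCA1", "TP53"): 5,
    ("BRCA1", "MDM2"): 1,
    ("EGFR", "KRAS"): 7,
}
top = ranked_neighbours("TP53", edges)
```

With these toy counts, `top` lists MDM2 ahead of BRCA1, and genes that never share a record with TP53 are left out.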

PubGene predicted which genes were likely to be related seven times better than random selection. When tested against other databases for a limited number of genes, it came up with only about half of the interactions known from other methods, such as lab experiments. But it did come up with some relationships that hadn't been predicted before and that were subsequently found to be real.
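One plausible way to compute figures like these is sketched below. It is an assumption made for illustration, not the evaluation used in the paper: "recall" is the share of known interactions that were predicted, and "fold over random" compares the hit rate of the predictions with the hit rate expected when picking gene pairs at random. The gene labels, the pairs, and the function name `evaluate` are hypothetical, and the sketch assumes both pair sets are non-empty.

```python
from itertools import combinations

def evaluate(predicted, known, genes):
    """Return (recall, fold improvement over random pair selection)."""
    all_pairs = {frozenset(p) for p in combinations(sorted(genes), 2)}
    predicted = {frozenset(p) for p in predicted}
    known = {frozenset(p) for p in known}
    hits = predicted & known
    recall = len(hits) / len(known)            # share of known pairs recovered
    precision = len(hits) / len(predicted)     # share of predictions that are known
    random_rate = len(known) / len(all_pairs)  # hit rate of a random pair
    return recall, precision / random_rate

# Invented toy data: five genes, four known interactions, three predictions.
genes = ["G1", "G2", "G3", "G4", "G5"]
known = [("G1", "G2"), ("G1", "G3"), ("G2", "G4"), ("G3", "G5")]
predicted = [("G1", "G2"), ("G1", "G3"), ("G4", "G5")]
recall, fold = evaluate(predicted, known, genes)
```

Pairs are stored as `frozenset` objects so that the order in which two genes are written does not matter when comparing predictions with the reference set.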

"It is an exploratory tool," says Masys. "They don't promise to give all possible insights, but it is an aid to trying to digest and condense these huge amounts of information."

The ultimate goal of scientists trying to analyse biological information is to have computers that can read and understand scientific papers in an intelligent way. "The holy grail is for a computer to be able to read an article like you or I would read it and extract the concepts and relate them all to each other," says Masys. "But we haven't yet got anywhere close to that automated understanding."

More at: Nature Genetics (vol 28, p 21)


1999-2005 Bioinformation Center, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences
All rights reserved.