Biologists in Norway use a computer program to “read” the scientific literature and successfully predict gene interactions
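[Editor's note] It may seem odd that this paper was published in Nature Genetics, but on reflection this probably has a lot to do with how journals abroad prize creativity, and original thinking in particular. People often say that research needs an idea. In fact, an idea is often hidden behind nothing more than a sheet of window paper, and the question is whether you can poke through it. If you think of something nobody else has, the work is valuable even if it is simple.

Biologists in Norway have used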
a computer program to "read" the scientific literature and
successfully predict gene interactions.

This data-mining of the
"biobibliome" provides a way of dealing with the
ever-increasing torrent of biological data - millions of papers a year.
But even more impressively, the completely automated process can make
new genetic discoveries - essentially free research.

Most scientific papers are now
published online, but there's no way any person could sift through all
of them. "It's beyond human cognition," says geneticist Daniel
Masys from the University of California in San Diego.

So Eivind Hovig of the
Norwegian Radium Hospital in Oslo and his colleagues designed a
computer program to do the job. They based their method on one simple
assumption: if two genes are mentioned in the same paper, they must be
biologically related.

The program scans through the
titles and abstracts of scientific papers and picks out names of human
genes. The researchers used it to search over 10 million papers on the
publicly available database MEDLINE.

Networking

They found references to 13,712
different human genes and built up a network of which genes were likely
to be related. Then they annotated the gene network with potential
biological functions using the keywords or subject headings associated
with each paper.

The result was a database
called PubGene. For any particular gene, the researchers can call up a
list of other genes it is likely to interact with, in order of
probability, and a list of medical areas in which that gene is likely to
be involved. This information gives researchers a valuable head start
when they are planning their research.

PubGene predicted which genes
were likely to be related seven times better than random selection. When
tested against other databases for a limited number of genes, it
came up with only about half the interactions that we know about
from other methods, such as lab experiments. But it did come up with
some relationships that hadn't been predicted before, and that were
subsequently found to be real.

"It is an exploratory
tool," says Masys. "They don't promise to give all possible
insights, but it is an aid to trying to digest and condense these huge
amounts of information."

The ultimate goal of scientists
trying to analyse biological information is to have computers that can
read and understand scientific papers in an intelligent way. "The
holy grail is for a computer to be able to read an article like you or I
would read it and extract the concepts and relate them all to each
other," says Masys. "But we haven't yet got anywhere close to
that automated understanding."

More at: Nature Genetics (vol 28, p 21)
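
The co-mention idea described in the article can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the PubGene implementation: the gene-symbol list, the sample abstracts and the function names are all hypothetical, and ranking neighbours by raw co-mention count is a simple stand-in for whatever scoring the researchers actually used.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical inputs: a tiny gene-symbol dictionary and made-up sample abstracts.
GENE_SYMBOLS = {"TP53", "MDM2", "BRCA1", "ATM"}

ABSTRACTS = [
    "MDM2 binds TP53 and promotes its degradation.",
    "ATM phosphorylates TP53 after DNA damage; BRCA1 is also a substrate of ATM.",
    "Loss of TP53 and amplification of MDM2 in sarcoma.",
]


def genes_in(text, symbols=GENE_SYMBOLS):
    """Return the known gene symbols that appear as whole tokens in the text."""
    tokens = set(re.findall(r"[A-Za-z0-9]+", text))
    return tokens & symbols


def build_network(abstracts):
    """Count, for every gene pair, how many abstracts mention both genes."""
    pair_counts = Counter()
    for text in abstracts:
        # Sorting gives each pair one canonical (alphabetical) orientation.
        for a, b in combinations(sorted(genes_in(text)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def neighbours(pair_counts, gene):
    """List genes co-mentioned with `gene`, most frequently co-mentioned first."""
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == gene:
            scores[b] += n
        elif b == gene:
            scores[a] += n
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


network = build_network(ABSTRACTS)
print(neighbours(network, "TP53"))
# [('MDM2', 2), ('ATM', 1), ('BRCA1', 1)]
```

Matching whole tokens against a fixed symbol list is the crudest possible way to pick out gene names. Real gene nomenclature has synonyms and symbols that collide with ordinary words, which this sketch ignores.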
1999-2005 Bioinformation Center, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences