Contact: Dan Krotz, firstname.lastname@example.org
There’s a reason scientific progress is slow: in genomic studies, for example, it may take weeks for a biologist to comb through a stack of journal articles and discover that one gene is functionally related to another. This relationship could lead to a new way to fight a disease, but not if it remains hidden.
Berkeley Lab researchers hope to accelerate this needle-in-a-haystack hunt with an innovative search engine that simulates the way scientists think. It’s called GenoPharm, and rather than search through data by keyword, like Google does, it searches by association, like scientists do.
“GenoPharm mimics the way a biologist searches through biomedical literature for connections between genes,” says Kasian Franks of Berkeley Lab’s Life Sciences Division, who developed the software with Life Sciences biologists Mina Bissell and Connie Myers. “It could enable a biologist to do in minutes what now takes them days.”
To use GenoPharm, a person enters a gene symbol and selects a context such as “molecular function” or “therapeutics.” The result is a web of relationships in which genes that are mentioned closer together in the scientific literature sit closer together in the web. Plug in “BRCA1,” for example, a gene that plays a role in breast cancer, and a GenoPharm search yields a sprawling network of associations. Some connections are known, some aren’t. By following one thread of relationships, a researcher can learn that BRCA1 is linked to a gene that performs DNA binding functions, which is related to another gene that’s the target of a drug that slows the growth of cancer cells.
“We are able to find indirect connections between genes and therapies that haven’t been noticed before,” says Franks. “The system is meant to serve as an add-on to a biologist’s brain.”
The idea for a search engine that maps associations came to Franks by way of his three young children. He noticed how each child processed information by taking two pieces of knowledge, combining them, and coming up with something new. Franks wondered whether he could get a computer to do the same thing — that is, help a biologist connect two genes in a previously unknown way.
He turned to the Geneva Development System, something akin to a search engine factory developed at Berkeley Lab to find contextual relationships in biomedical databases. The system measures the proximity of every word to every other word in millions of documents, and, when asked, reveals how a specific word is related to others. In developing the system, the team drew its inspiration from the way a person’s brain works when asked to list the words associated with the word “sky.” Nearly always, a person will immediately respond with “blue” and “cloud” largely because they are accustomed to seeing these words very near “sky” in text.
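The proximity idea described above can be illustrated with a toy sketch. This is not the Geneva Development System's code, which the article does not describe in detail. The function names, the inverse-distance weighting, the window size, and the tiny stopword list are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Assumed, minimal stopword list; a real system would need far more care.
STOPWORDS = {"the", "is", "a", "in", "and"}


def proximity_scores(documents, window=5):
    """Score word pairs by how near they occur to each other.

    Every co-occurrence within `window` tokens adds 1/distance to the
    pair's score, so adjacent words count more than distant ones.
    (The weighting scheme is an assumption, not the published method.)
    """
    scores = defaultdict(float)
    for text in documents:
        tokens = [t for t in text.lower().split() if t not in STOPWORDS]
        for i, word in enumerate(tokens):
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                other = tokens[j]
                if other == word:
                    continue
                weight = 1.0 / (j - i)
                # Store both directions so lookups work from either word.
                scores[(word, other)] += weight
                scores[(other, word)] += weight
    return scores


def associates(scores, word, top_n=3):
    """Return the `top_n` words most strongly associated with `word`."""
    related = [(other, s) for (w, other), s in scores.items() if w == word]
    # Highest score first; ties broken alphabetically for stable output.
    related.sort(key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in related[:top_n]]


docs = [
    "the sky is blue",
    "a cloud in the sky",
    "blue sky and white cloud",
]
scores = proximity_scores(docs)
print(associates(scores, "sky", top_n=2))  # ['blue', 'cloud']
```

With these three made-up sentences, “blue” and “cloud” come out as the closest associates of “sky” simply because they sit near it most often, which is the intuition the team describes.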
“We are literally mimicking the process of auto-association, which is a cognitive principle that describes how a human stores and recollects information,” says Franks.
In this manner, GenoPharm focuses the Geneva Development System’s powers on a database of 70,000 gene descriptions and PubMed functional references. Once an associative network surrounding a gene is generated, a separate database maps relevant diseases and therapies to each gene, creating an interlinking web of genes, diseases, and their therapies.
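As a rough illustration of that two-stage design, the sketch below walks a small associative network and then consults a separate lookup of diseases and therapies. The gene names, drug name, data layout, and the `linked_therapies` function are hypothetical placeholders, not GenoPharm's actual data or interface.

```python
from collections import deque

# Hypothetical associative network: gene -> genes it is associated with.
ASSOCIATIONS = {
    "GENE_A": ["GENE_B", "GENE_C"],
    "GENE_B": ["GENE_D"],
    "GENE_C": [],
    "GENE_D": [],
}

# Hypothetical separate database: gene -> known diseases and therapies.
ANNOTATIONS = {
    "GENE_A": {"diseases": ["disease_1"], "therapies": []},
    "GENE_D": {"diseases": ["disease_1"], "therapies": ["drug_x"]},
}


def linked_therapies(start, associations, annotations, max_hops=3):
    """Breadth-first walk from `start`, at most `max_hops` links deep.

    Returns {therapy: shortest chain of genes connecting `start` to a
    gene annotated with that therapy}.
    """
    found = {}
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        gene = path[-1]
        for therapy in annotations.get(gene, {}).get("therapies", []):
            # Keep only the first (shortest) chain found per therapy.
            found.setdefault(therapy, path)
        if len(path) - 1 >= max_hops:
            continue
        for neighbor in associations.get(gene, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return found


print(linked_therapies("GENE_A", ASSOCIATIONS, ANNOTATIONS))
# {'drug_x': ['GENE_A', 'GENE_B', 'GENE_D']}
```

Here the starting gene has no therapy of its own, but following the chain of associations two links out surfaces one, which is the kind of indirect connection the article describes.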
The system is still in the developmental stage. As Franks says, it isn’t easy getting a computer to do what comes naturally to a child, but his goal is to narrow the gap separating how computers and people process information.
“The successful outcome of this effort is in part due to the support of Mina Bissell’s Lab, which has encouraged unique multidisciplinary approaches in biological research,” says Franks. “We want to create a biomimetic search engine. And we want to show how it is different than what we have today because I think search engines need to change. After all, who needs 100,000 documents if they lack context?”
A Search Engine that Thinks, Almost
Feature Story • March 31, 2005