Berkeley Lab bioscientists Nomi Harris and Chris Mungall at the Aquatic Park Office. (Credit: Laurent-Philippe Albou/Berkeley Lab)

Rare diseases are … rare, right? Not as rare as you might think. As much as 10% of the population is thought to have a “rare disease.” Unfortunately, due to a lack of understanding, many rare diseases remain very difficult to diagnose and treat.

Inspired by the enormous unmet needs of people with rare diseases, a group of scientists from across the globe has teamed up to develop open-access tools and resources for sharing disease characteristics and treatment information. The research centers on an artificial intelligence-enabled catalog of disease descriptions called Mondo, which, like a Wikipedia for rare diseases, can be added to and improved by the scientific and medical community.

In a recent commentary in Nature Reviews Drug Discovery, the group explained how agreeing on a precise definition of each rare disease can lead to more accurate diagnoses and better treatments. They also shared results from a preliminary analysis suggesting that the number of different rare diseases may be higher than previously estimated.

The project team, led by Melissa Haendel of Oregon Health & Science University and Tudor Oprea of the University of New Mexico, includes Lawrence Berkeley National Laboratory (Berkeley Lab) researchers Chris Mungall, Nomi Harris, Deepak Unni, and Marcin Joachimiak. We spoke with Chris and Nomi about the project and why they are participating in it.

How do we decide what qualifies as a rare disease? 

Nomi: There’s no single definition of “rare disease” because it depends on which region or group you’re talking about. In the U.S., a rare disease is legally defined as one that affects fewer than 200,000 people in the country; in the EU, a rare disease is one that affects fewer than 1 in 2,000 people. Some diseases are rare in some groups but common in others – for example, Tay-Sachs disease is rare in the general population but much more common among Ashkenazi Jews, and tuberculosis is rare in the U.S. but is one of the top 10 causes of death worldwide.

All of us almost certainly know someone who has a rare disease, though they may be undiagnosed.
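
To make the two thresholds Nomi mentions concrete, here is a minimal Python sketch that checks a disease against each definition. The function names and the example figures (150,000 affected people in a population of 300 million) are illustrative assumptions, not data from the commentary; only the two cutoffs come from the definitions above.

```python
from fractions import Fraction

# Cutoffs as stated in the interview.
US_MAX_AFFECTED = 200_000                 # fewer than 200,000 people
EU_MAX_PREVALENCE = Fraction(1, 2000)     # fewer than 1 in 2,000 people


def is_rare_us(affected: int) -> bool:
    """U.S. rule: an absolute head count below the cutoff."""
    return affected < US_MAX_AFFECTED


def is_rare_eu(affected: int, population: int) -> bool:
    """EU rule: prevalence (affected / population) below the cutoff."""
    return Fraction(affected, population) < EU_MAX_PREVALENCE


# Hypothetical figures, chosen only to show that the rules can disagree.
affected, population = 150_000, 300_000_000
print(is_rare_us(affected))               # True
print(is_rare_eu(affected, population))   # False (exactly 1 in 2,000)
```

With these made-up numbers, the disease counts as rare under the U.S. head-count rule but sits exactly at the EU prevalence cutoff, so it would not count as rare there. That is one way to see why no single definition exists.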

How are the current systems or protocols for classifying rare diseases translating into problems in patient care? 

Nomi: To diagnose and treat a disease, we need to know how to define and characterize the disease. For common diseases, there are many cases to observe, so we have a pretty good idea of what that disease looks like – what the symptoms are, how to test for it, how to treat it. For rare diseases, there may be only scattered information – maybe one physician in South America has seen a case, and one researcher in China, but they aren’t sharing their information, so we don’t have a complete picture of what that disease looks like. And if we can’t precisely define a disease, then it’s hard to reliably diagnose it, and even harder to treat it optimally.

Our preliminary analysis, included in the commentary, suggests that the number of rare diseases may be higher than we thought – maybe around 10,000 different diseases, rather than the 5,000 to 7,000 previously estimated. That means that distinct rare diseases (for example, different varieties of thyroid cancer) have probably been lumped together, even though they may be separate subtypes that benefit from different treatments.

What needs to be done to improve and expedite rare disease research, diagnosis, and treatment?

Chris: As Nomi mentioned, it’s hard to come up with the best treatment for a disease if you’re not even sure what exactly that disease looks like, or if it is confused with a similar disease. To address this, our team is working to catalog the whole landscape of rare diseases. We’re bringing together separate efforts in rare disease research, and developing computational tools to help experts come up with a precise definition for each rare disease. We developed a new artificial intelligence algorithm that helps disambiguate and unify the disease definitions from different databases and reference sources. We call this unified set of disease definitions “Mondo,” from the Italian word for “world,” because it brings together information from all over the world.

To accelerate this important work, we hope that funding and regulatory agencies, patient advocacy groups, and biomedical researchers will join together to support a coordinated effort to build a complete catalog of rare diseases.
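
The unification step Chris describes, reconciling entries for the same disease from different databases, can be illustrated with a toy example. The sketch below is not Mondo's actual algorithm, which the team describes as combining several kinds of reasoning; it simply groups records that have already been marked as equivalent, using a standard union-find structure. The record identifiers, labels, and the `merge_equivalent` function are all hypothetical.

```python
from collections import defaultdict


def merge_equivalent(records, equivalences):
    """Group record IDs that are linked, directly or indirectly, as equivalent.

    records:      dict mapping a record ID to its label
    equivalences: iterable of (id_a, id_b) pairs asserted to be the same disease
    Returns a sorted list of sorted ID groups.
    """
    parent = {rid: rid for rid in records}

    def find(x):
        # Walk up to the group's representative, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in equivalences:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(list)
    for rid in records:
        groups[find(rid)].append(rid)
    return sorted(sorted(group) for group in groups.values())


# Placeholder records from three imaginary sources (not real database IDs).
records = {
    "SRC_A:1": "disease X",
    "SRC_B:7": "Disease X, type 1",
    "SRC_C:3": "X disease",
    "SRC_A:2": "disease Y",
}
equivalences = [("SRC_A:1", "SRC_B:7"), ("SRC_B:7", "SRC_C:3")]

print(merge_equivalent(records, equivalences))
```

Here the three entries for "disease X" end up in one group even though the first and third were never linked directly, while "disease Y" stays on its own. Deciding which entries are truly equivalent is the hard part, and this sketch takes those decisions as given.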

How can Berkeley Lab play a role in this effort?

Chris: Berkeley Lab has been at the forefront of efforts to establish standards for representing and sharing biomedical data. My specialty is ontologies, which are like specialized vocabularies for precisely describing a class of things, such as symptoms, diseases, biochemical processes, or even entire ecological systems. One of the most widely used ontologies in biological science, the Gene Ontology, was launched by a team that included several Berkeley Lab researchers. My group has helped to build many other important biomedical ontologies, including Mondo, and we write computational tools to help others build, use, and expand ontologies.

There are many advantages to engaging in this type of work at Berkeley Lab, including the presence of leading researchers in computer science, biology, and other relevant fields, and also a commitment to open science – meaning that anyone in the world is free to not only use the resources we develop, but also to contribute to them. When we’re attacking a big problem like accurately defining all rare diseases, we can use all the help we can get!

Berkeley Lab is a great place to engage in this research, but I also want to recognize the key contributions of our talented Mondo collaborators at Oregon State University, the Jackson Laboratory, the European Bioinformatics Institute, and many others.

What motivated you both, personally, to join this project?

Chris: One of my main areas of research is characterizing and interpreting regions of the genome using ontologies. Many rare diseases are Mendelian, which means the cause of the disease can be traced back to changes within or affecting parts of the genome. Other rare diseases may be environmental, or a mixture of environmental and genetic, and I’m very interested in how the environment influences the health of complex organisms like humans. This led to the creation of Mondo as a way to annotate genomes and environments. My role was developing the algorithms that used different kinds of reasoning to bring together multiple sources of information and organize it coherently.

Nomi: My master’s thesis involved applying artificial intelligence techniques to predict the risk of inheriting genetic disorders. After that, I worked for years on bioinformatics projects that didn’t directly relate to human health. I was excited to have a chance to get back into the medical realm and contribute to a project that we hope will ultimately help to improve the prospects of those with rare diseases.


# # #

Founded in 1931 on the belief that the biggest scientific challenges are best addressed by teams, Lawrence Berkeley National Laboratory and its scientists have been recognized with 13 Nobel Prizes. Today, Berkeley Lab researchers develop sustainable energy and environmental solutions, create useful new materials, advance the frontiers of computing, and probe the mysteries of life, matter, and the universe. Scientists from around the world rely on the Lab’s facilities for their own discovery science. Berkeley Lab is a multiprogram national laboratory, managed by the University of California for the U.S. Department of Energy’s Office of Science.

DOE’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time.