A handful of muck or a bucket of water can teem with millions of microorganisms — a few of which could be the next big thing when it comes to learning how to create biofuels or understanding the planet’s carbon cycle.

This search for the movers and shakers of the microbial world is getting easier thanks to a database of “fingerprints” maintained by Lawrence Berkeley National Laboratory (Berkeley Lab) scientists that surpassed one million entries earlier this year.

What secrets does a handful of soil hold? Greengenes, one of the world's largest databases of microbial fingerprints, is helping scientists worldwide better understand the diversity of microbes and how they can help us develop clean energy technologies and fight disease, among many other applications. (image: Berkeley Lab)

The database, called Greengenes, is one of the world’s largest collections of high-quality DNA sequences of 16S ribosomal RNA genes. These genes, which encode part of the cell’s protein-making machinery, are found in all bacteria and archaea, and in general each species has a unique variation. They’re genetic IDs, the one thing that can finger a specific microbe in a crowded lineup, if you know which 16S rRNA belongs to which microbe.

That’s where Greengenes comes in. Researchers from around the world can access the database online and enter 16S rRNA sequences extracted from samples of soil, water, and even intestinal bacteria. A match with a sequence in Greengenes is a giveaway that a specific microbe is in the sample. If there’s no match, perhaps a new species has been discovered.
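To make the idea of a "match" concrete, here is a minimal sketch in Python. It is not how Greengenes itself compares sequences, which this article does not describe. The reference sequences, the names `identity` and `best_match`, and the 97 percent cutoff (a commonly used rule of thumb for species-level 16S similarity) are all assumptions chosen for illustration, and the sequences are assumed to be already aligned to the same length.

```python
from typing import Dict, Optional, Tuple

# Hypothetical, made-up reference "fingerprints" for illustration only.
# Real 16S rRNA genes are roughly 1,500 bases long.
REFERENCES: Dict[str, str] = {
    "microbe_A": "ACGTACGTACGTACGTACGT",
    "microbe_B": "ACGTTCGTACGAACGTACCT",
    "microbe_C": "TTGTACGGACGTACCTAGGT",
}


def identity(a: str, b: str) -> float:
    """Fraction of positions at which two pre-aligned, equal-length sequences agree."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def best_match(
    query: str,
    references: Dict[str, str],
    threshold: float = 0.97,  # assumed cutoff, not a Greengenes setting
) -> Optional[Tuple[str, float]]:
    """Return (name, identity) of the closest reference, or None if nothing clears the threshold."""
    name, score = max(
        ((ref_name, identity(query, ref_seq)) for ref_name, ref_seq in references.items()),
        key=lambda pair: pair[1],
    )
    return (name, score) if score >= threshold else None
```

A query identical to `microbe_A` returns that name, while a query that resembles none of the references returns `None`, the toy equivalent of "perhaps a new species has been discovered." Real tools align sequences of differing lengths before scoring them, a step this sketch skips.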

In this way, Greengenes is fast becoming a go-to resource for scientists seeking to better understand what microbes do, their diversity, and what we can learn from them. The database launched in 2002 and now gets about 100 citations per year in scientific papers.

“Our goal is to develop the highest quality reference set so scientists can use it to better understand life at the microscopic scale. We want to cover as much microbial diversity on Earth as possible,” says Todd DeSantis, a scientist in Berkeley Lab’s Earth Sciences Division who led the development of the database under the auspices of Gary Andersen’s lab.

In one of its many hits, Stanford University scientists used the database to discover a microorganism in San Francisco Bay sediments that plays a role in the carbon and nitrogen cycles. The scientists could see the ammonia-oxidizing archaeon under the microscope, but they couldn’t grow it in the lab. They extracted its DNA, sequenced it, and compared it to known strains in Greengenes. It was unique, and a new organism was named: Candidatus Nitrosoarchaeum limnia SFB1.

The sediment underneath the San Francisco Bay is home to a newly discovered microorganism that plays a role in the planet's carbon cycle. Scientists saw the microbe under a microscope, but it took Greengenes to tag it as new. (image: Berkeley Lab)

A Cornell University-led team used Greengenes to identify microbes that efficiently convert industrial wastewater into methane. Their work could help scientists engineer microbial communities that are optimized to digest wastewater and emit methane for use as an energy source.

Elsewhere, a team from the University of Milan used the database to analyze bacterial DNA from stains on the pages of Leonardo da Vinci’s multi-volume Codex Atlanticus. They found matches to bacteria previously isolated from cleanrooms and human skin, which led the team to recommend new ways to protect texts from deterioration.

Italian scientists used Greengenes to detect bacteria from human skin and cleanrooms on Leonardo da Vinci's Codex Atlanticus, which has been handled by monks and historians for centuries. To better protect art and texts, the scientists now recommend rigorous monitoring of the conditions in storage facilities and improvements to handling procedures. (image: Mario Taddei via Wikimedia Commons)

And a Danish team used the database to improve the treatment of necrotizing enterocolitis, a disease marked by inappropriate bacteria colonizing an infant’s intestines.

Expect more uses for Greengenes as it continues to grow. When scientists find a 16S rRNA gene in the course of their research, they submit its sequence to one of many gene databanks. Greengenes scours these databanks for new entries. When it finds one, it uses a computer program to compare the sequence to other 16S rRNA genes and to ensure its quality. Only the best and most complete sequences are added.
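The screening step described above can be pictured with a short Python sketch. The article does not spell out the actual quality criteria Greengenes applies, so the two checks here (a minimum length as a stand-in for "complete" and a cap on ambiguous bases as a stand-in for "best"), the threshold values, and the function names are all hypothetical.

```python
from typing import Dict, Set


def passes_quality(seq: str, min_length: int = 1200, max_ambiguous: float = 0.01) -> bool:
    """Accept a sequence only if it is long enough and has few ambiguous bases.

    Both thresholds are illustrative assumptions, not Greengenes settings.
    """
    seq = seq.upper()
    if not seq or len(seq) < min_length:
        return False
    # Anything other than A, C, G, or T (for example N) counts as ambiguous.
    ambiguous = sum(base not in "ACGT" for base in seq)
    return ambiguous / len(seq) <= max_ambiguous


def select_new_references(candidates: Dict[str, str], existing: Set[str]) -> Dict[str, str]:
    """Keep candidates that are not already in the reference set and pass the quality check."""
    return {
        accession: seq
        for accession, seq in candidates.items()
        if accession not in existing and passes_quality(seq)
    }
```

Given a batch of newly deposited sequences keyed by accession, `select_new_references` would return only the entries that are both new and clean enough to serve as references. The comparison against other 16S rRNA genes that the article mentions is left out of this sketch.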

“There are tens of millions of 16S-like sequences in public databases, but we only want the highest quality sequences to use as references,” says DeSantis.

Lawrence Berkeley National Laboratory addresses the world’s most urgent scientific challenges by advancing sustainable energy, protecting human health, creating new materials, and revealing the origin and fate of the universe. Founded in 1931, Berkeley Lab’s scientific expertise has been recognized with 12 Nobel prizes. The University of California manages Berkeley Lab for the U.S. Department of Energy’s Office of Science. For more, visit www.lbl.gov.

Additional information:

  • Learn more about Greengenes.
  • Greengenes is also instrumental in the development of the PhyloChip, which quickly and accurately identifies microbes in complex samples.