Contact: Jon Bashor, [email protected]

An interview with the 10-gig E team leaders

When the Institute of Electrical and Electronics Engineers (IEEE) recently adopted a new 10-gigabit-per-second standard for Ethernet, the most widely installed local area network technology, the speed of Ethernet operations increased by an order of magnitude — at least on paper. A gigabit is a billion bits, and actually achieving that tenfold increase in Ethernet performance remains a challenge that can only be met with leading-edge equipment and expertise.

Lawrence Berkeley National Laboratory operates some of the world’s most powerful computing, data storage, and networking resources for the U.S. Department of Energy. Recently the Lab teamed with a number of commercial vendors, including Force10 Networks for switches, SysKonnect for network interfaces, FineTec Computers for clusters, Quartet Network Storage for online storage, and Ixia for line rate monitors, to assemble a “10-gig E” demonstration system. The system runs a true scientific application on one 11-processor cluster, then sends the resulting data across a 10-gigabit Ethernet connection to another cluster, where it is rendered for visualization.

In Berkeley Lab’s Access Grid Node, a Cactus code simulation of in-spiraling black holes is rendered by Visapult, demonstrating data transfer at ten billion bits per second.

Ten-gigabit Ethernet capability was showcased in a demonstration held at Berkeley Lab on Tuesday, July 2, 2002. Ixia’s line-monitoring equipment showed that performance peaked at an actual line rate of 10.6 gigabits per second, with the total amount of real data transferred during the demonstration and preceding 12 hours of trial runs reaching nearly 60 terabytes (60 trillion bytes).

When it comes to moving huge amounts of scientific data quickly across networks, the team from Berkeley Lab has been the undisputed champion of the high-performance computing and networking world for two years running. Last November, at the SC2001 conference in Denver, the Berkeley Lab team took top honors in the High-Performance Bandwidth Challenge, moving data across the network at a sustained rate of 3.3 gigabits per second in a live demonstration of computational steering and visualization. The team made use of the Albert Einstein Institute’s “Cactus” simulation code and Berkeley Lab’s Visapult parallel visualization system, running on hardware provided by Force10, SysKonnect, and FineTec.

The July 2 10-gigabit Ethernet demonstration at Berkeley Lab was assembled with help from the same vendors and served as a test run for this year’s High-Performance Bandwidth Challenge at SC2002 in Baltimore. The Berkeley Lab team and its partners will be seeking their third straight win.

The team primarily responsible for assembling the 10-gigabit demonstration system consists of four Berkeley Lab staffers: network engineers Mike Bennett and John Christman, and computer systems engineers John Shalf and George “Chip” Smith. Science Beat had a chance to talk with Bennett, Shalf, and Smith about the effort as the demonstration was being prepared.

From left, Mike Bennett, George “Chip” Smith, Raju Shah of Force10 Networks, John Shalf, and John Christman. The screens indicate a total data transfer rate of 10.6 gigabits per second.

Science Beat: “First of all, why is Lawrence Berkeley National Laboratory leading a demonstration project like this?”

Bennett: “We had been asked in January to serve as a technical advisor to a conference planned for March. The goal of the conference was to highlight the new IEEE standard for 10-gig E. For various reasons, the conference was pushed back until June to coincide with adoption of the standard. We were then asked what kind of demo we could put together that would show the difference that having 10-gig capability would make. I immediately thought of the Lab group that won the Bandwidth Challenge at SC2001 — they had a real scientific application that was bandwidth intensive.

“We put the demo system together for the conference, which was again delayed. Since we had a room full of equipment, we decided to salvage our effort and do a demo run here. It turned out to be really successful. Force10 loaned us the switches, FineTec donated enough computers to make it interesting, and Chip Smith worked with SysKonnect to get very high performance from their network interfaces. Quartet provided the network storage for the data to be visualized.

“The result is we proved that 10-gig E is a reality, not just a bunch of back-of-the-envelope calculations.”

Smith: “Also, Berkeley Lab has a long history of being at the forefront of networking, from putting the first supercomputer on ARPANET, to helping develop the TCP and IP protocols, to posting one of the earliest sites at the dawn of the World Wide Web. We’re carrying on that work, keeping the Lab at the forefront of technology and continuing to push the capabilities of that technology.”

Science Beat: “In lay terms, what does 10-gigabit Ethernet represent?”

Bennett: “In order to put 10-gigabit Ethernet in perspective, consider that the average desktop machine connects at 100 megabits per second [100 million bits per second]. In essence, the higher-speed technology is 100 times faster.

“Here’s an example of the advantage of faster data transfer: the file size of a raw digital version of The Matrix in AVI format is approximately 236 gigabits. With 10-gigabit Ethernet, transferring the entire movie file takes 23.6 seconds. In contrast, the same transfer from an average desktop machine using Fast Ethernet takes 2,360 seconds, or roughly 39 minutes, and over a DSL line it takes 66 hours. Still, the full benefit of 10-gigabit Ethernet has yet to be appreciated.”
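Bennett’s comparison is simple division: transfer time equals file size over line rate. Here is a minimal Python sketch of that arithmetic, assuming an idealized link with no protocol overhead and a DSL rate of roughly one megabit per second (the rate implied by the 66-hour figure):

```python
# Idealized transfer times for a ~236-gigabit file at several line rates.
MOVIE_BITS = 236e9  # the raw digital movie cited above, in bits

def transfer_seconds(size_bits: float, rate_bps: float) -> float:
    """Transfer time as size divided by rate, ignoring protocol overhead."""
    return size_bits / rate_bps

rates = [
    ("10-gigabit Ethernet", 10e9),
    ("Fast Ethernet (100 Mb/s)", 100e6),
    ("DSL (~1 Mb/s, assumed)", 1e6),
]
for name, rate_bps in rates:
    print(f"{name:>25}: {transfer_seconds(MOVIE_BITS, rate_bps):,.1f} seconds")
# -> 23.6 s for 10-gig E, 2,360 s (~39 min) for Fast Ethernet,
#    and 236,000 s (~66 h) for the assumed DSL line
```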

Science Beat: “Is this the first real-world demonstration of 10-gigabit Ethernet capability?”

Bennett: “As far as I know. A lot of the tests that have been publicized have been interoperability-based, to show that a product from Vendor A can interoperate with equipment from Vendor B, which is the aim of the IEEE standard. What the interoperability standard doesn’t address is whether you can take one vendor’s equipment and plug it into a cluster connected to a network and get that 10-gig level of performance.

“What we are demonstrating is that it does work in the real world. And it has real-world benefits. From a network engineering perspective, 10-gig E makes building a network much easier. You have one point-to-point connection, rather than ten 1-gig E connections to install and maintain.”

Ten billion bits a second: the real-world benefits

Shalf: “From the computing side, there’s also a real-world need and benefit. The source of data for our demonstration was the Cactus simulation code, developed by the Numerical Relativity group led by Ed Seidel at the Albert Einstein Institute of the Max Planck Institutes, in Potsdam, Germany. Cactus is a modular framework capable of supporting many different simulation applications, such as general relativity, binary neutron stars, magnetohydrodynamics, and chemistry, but in this case we were interested in binary black-hole mergers. These simulations will help us better understand what wave signatures we should be looking for in gravitational wave observatories like LIGO and VIRGO.

In Berkeley Lab’s Access Grid Node John Shalf, foreground, works with Visapult’s rendering of the huge Cactus simulation.

“Codes like Cactus can easily consume an entire supercomputer like the 3,328-processor IBM SP at NERSC. The Cactus team ran the code at NERSC for 1 million CPU-hours, or 114 CPU-years, performing the first-ever simulations of the in-spiraling coalescence of two black holes. When you make these big heroic runs, you don’t want to find out after a week that one parameter was wrong and the simulation fell apart after a few days. You need high bandwidth to keep up with the enormous data production rate of these simulations — one terabyte per time step — and with 10-gig E you can get an accurate look at how the code is running. Otherwise, you can only get low-resolution snapshots that are of limited usefulness.
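The data rate Shalf cites makes the bandwidth argument concrete: at one terabyte per time step, even an ideal 10-gigabit link needs more than thirteen minutes to move a single step. A quick back-of-the-envelope check in Python, assuming full line rate and 8 bits per byte:

```python
# Time to move one simulation time step (1 terabyte) at ideal line rates.
STEP_BITS = 1e12 * 8  # one terabyte per time step, in bits

for name, rate_bps in [("10-gig E", 10e9), ("1-gig E", 1e9), ("OC-3", 155e6)]:
    minutes = STEP_BITS / rate_bps / 60
    print(f"{name:>8}: {minutes:,.0f} minutes per time step")
# -> about 13 minutes, 133 minutes (2.2 hours), and 860 minutes (~14 hours)
```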

“Remote monitoring and visualization require a system that can provide visualization capability over wide area network connections without compromising interactivity or simulation performance. We used Visapult, developed by Wes Bethel of LBNL’s Visualization Group for DOE’s Next Generation Internet Combustion Corridor project several years ago. Visapult allows you to use your desktop workstation to perform interactive volume visualization of remotely computed datasets without downsampling the original data. It does so by using the same massively parallel, distributed-memory computational model as the simulation code, in order to keep up with the simulation’s data production rate. It also uses high-performance networking to distribute its computational pipeline across a WAN [wide area network], providing a remote visualization capability that is decoupled from the cycle time of the simulation code itself.”
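The decoupling Shalf describes, with network ingest keeping pace with the simulation while rendering proceeds at its own rate, is at heart a producer-consumer pipeline. The Python sketch below illustrates only that general pattern; the names and structure are ours, not Visapult’s actual design:

```python
import queue
import threading
import time

# Producer-consumer sketch of a decoupled visualization pipeline: one
# thread drains "the network" as fast as data arrives, while the renderer
# consumes blocks at its own pace. Illustrative only; not Visapult code.

frames: queue.Queue = queue.Queue(maxsize=8)  # bounded buffer between stages

def ingest(n_steps: int) -> None:
    """Stand-in for the WAN receiver: one data block per time step."""
    for step in range(n_steps):
        frames.put(f"timestep-{step}")  # real code would receive raw data here
    frames.put(None)  # sentinel marking end of stream

def render() -> None:
    """Stand-in for the renderer: consume blocks at its own pace."""
    while (block := frames.get()) is not None:
        time.sleep(0.01)  # pretend to composite and display the block
        print("rendered", block)

producer = threading.Thread(target=ingest, args=(20,))
producer.start()
render()
producer.join()
```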

Science Beat: “What about other applications for this capability?”

Bennett: “Initially, I think the major interest will come from the research and university communities, until the cost comes down, although right now we have found 10-gig E to cost about the same as aggregating ten 1-gig E connections. One area that could benefit would be health care. Having 10-gig E capability will allow streaming video at motion-picture quality, which could be useful in performing surgery and in teaching. It will also make it easier to transmit high-resolution medical images.

“Also, services that rely on bandwidth can benefit. Data centers operating web servers or providing bandwidth on demand for commercial clients would be able to offer better service, as would metropolitan area Ethernet service providers. Basically, any place now running 1-gig E stands to benefit from this. Farther down the road, I think the financial services industry will find this capability useful.”

Smith: “A couple of colleagues who work at Pixar [Animation Studios] came by to view the demo. Their computer animations are a good candidate to benefit from higher bandwidth connections. They said they were getting a new cluster in the coming weeks and this gave them some good ideas, especially since it is going to be a Linux cluster, as are ours.”

Science Beat: “What were the obstacles to achieving true 10-gigabit Ethernet performance?”

Bennett: “The first one is getting the 1-gig network interfaces to run as close to that line rate as possible. Many of them run at only 600 to 700 megabits per second. Chip Smith worked with SysKonnect to get up to the gigabit level.”

Smith: “The speed bump was with Linux. When you run Linux with the SysKonnect cards, the kernel libraries for those cards have a default behavior that limits them to an average line rate of 600 to 700 megabits per second.

“Working with SysKonnect, I was able to change one of the libraries in the kernel and, using a recent virtual Ethernet interface module, get 950 to 1,000 megabits per second from each interface. This enabled us to run the demonstration with one-third fewer machines than we would otherwise have needed. In the long run, getting this to work also saves money on machines, and on the per-port cost factored in when purchasing new machines, for anyone who wants to set up a similar system. It also shows that 1-gig E is viable in a cluster setting.”
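Smith’s fix lived in the SysKonnect kernel code itself, but the general host-tuning idea, giving the kernel buffers large enough to keep a fast link full, can be illustrated at the socket level. A hedged Python sketch; the 4-megabyte size is an arbitrary example, and this is not the driver change described above:

```python
import socket

# Illustrative host tuning: request large kernel socket buffers so a
# single TCP stream can keep a high-bandwidth path full. This is a
# generic technique, not the SysKonnect driver fix described above.

BUF_BYTES = 4 * 1024 * 1024  # 4 MB, an arbitrary example size

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, BUF_BYTES)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BUF_BYTES)

# The kernel may clamp the request to its configured maximum, so check
# what was actually granted before assuming the window is large enough.
granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
print(f"requested {BUF_BYTES:,} bytes, kernel granted {granted:,}")
sock.close()
```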

From left, Wes Bethel, John Christman, John Shalf, Chip Smith, and Mike Bennett, with the Force10 switch and the cluster running the Cactus application.

Bennett: “The second obstacle was getting network equipment that can deliver at that rate. Force10 provided equipment that could handle it. Thanks to all the contributing vendors, the demo was a success.

“But the most work involved building the cluster and getting the applications to run on it, which are John’s and Chip’s areas of expertise.”

Shalf: “And certainly it’s a nontrivial feat to design an application like Wes’s Visapult that can fully overlap its computation with pulling data off the network at full line rate. This requires considerable performance tuning at the application level, as well as novel visualization algorithms like the Rogers and Crawfis image-based rendering method, on which Visapult is loosely based.”

Science Beat: “Any other challenges?”

Bennett: “Well, it’s definitely an exciting process. When you’re working with new technology like this, you almost hope you’ll run into a new and interesting bug — something you haven’t seen before. It’s also exciting to be able to offer this to users of our network here at the Lab.”

Science Beat: “Did you notice any similarities to previous increases in bandwidth?”

Shalf: “At SC’95 we were asked, ‘With these OC-3 lines, how are you going to deal with this infinite bandwidth?’ Our demonstration shows that 10-gig E indeed isn’t ‘infinite bandwidth.’ We are quite capable of consuming this and more using an existing production simulation and visualization application. So our excitement over the possibilities this new technology unlocks is tempered by the fact that we remain a long way from anything approximating ‘infinite bandwidth.’”

Bennett: “I saw the same cycle when 1-gig E was rolled out in 1998. People thought it was too expensive and that no one would use all that bandwidth right away. But as the cost came down, demand and usage went up. Here at the Lab, we have 1-gig E network distribution connections to the buildings. As that fills up, we’re going to be looking at upgrading to 10-gig E.”

Additional information: