Contact: Jon Bashor, [email protected]
Members of Berkeley Lab’s Computing Sciences divisions are applying their expertise in running scientific codes and evaluating high-performance computers to produce “real-world” assessments of the world’s leading supercomputers. Their goal is to determine which architectures are best suited for advancing computational science.
With the re-emergence of viable vector computing systems such as the Earth Simulator and the Cray X1, and with IBM and DOE’s BlueGene/L taking the number-one spot on the TOP500 list of the world’s fastest computers, there is renewed debate about which architecture is best suited for running large-scale scientific applications. To cut through conflicting claims, researchers from Berkeley Lab’s Computational Research and NERSC Center divisions have been putting various architectures through their paces, running benchmarks as well as scientific applications key to Department of Energy programs.

The team includes Lenny Oliker, Julian Borrill, Andrew Canning, and John Shalf of CRD; Jonathan Carter and David Skinner of NERSC; and Stephane Ethier of the Princeton Plasma Physics Laboratory. Their evaluations have resulted in a half-dozen papers published in journals and presented at conferences in the United States, Norway, Japan, and Spain.

In the initial part of their study, the team traveled to Japan in December 2004 and evaluated five different systems, running four different scientific applications key to DOE research programs. As part of the effort, the group became the first international team to conduct a performance evaluation study of the 5,120-processor Earth Simulator. The team also assessed the performance of other leading systems, including the Cray X1 and the IBM Power3.
“This effort relates to the fact that the gap between peak and actual performance for scientific codes keeps growing,” said team leader Lenny Oliker. “Because of the increasing cost and complexity of HPC [high-performance computing] systems, it is critical to determine which classes of applications are best suited for a given architecture.” For the evaluation, the team selected four applications, each drawn from a research area key to DOE programs.
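The peak-versus-sustained gap Oliker describes is, at bottom, a simple ratio. The short sketch below is not taken from the team’s study; it uses purely hypothetical numbers to show how a “percent of peak” figure like the ones cited in this article is computed.

```python
# Minimal sketch (not from the team's study) of how a "percent of peak"
# figure is computed: peak is what the hardware could do with every
# floating-point unit busy every cycle; sustained is what the code achieves.

def percent_of_peak(sustained_gflops: float,
                    num_processors: int,
                    peak_gflops_per_processor: float) -> float:
    """Return sustained performance as a percentage of aggregate peak."""
    aggregate_peak_gflops = num_processors * peak_gflops_per_processor
    return 100.0 * sustained_gflops / aggregate_peak_gflops

# Hypothetical example: a code sustaining 500 Gflop/s on 1,024 processors
# that each peak at 1.5 Gflop/s is running at roughly 32.6 percent of peak.
print(f"{percent_of_peak(500.0, 1024, 1.5):.1f}% of peak")
```

By this measure, the 67 percent of peak reported below for LBMHD on the Earth Simulator is an exceptionally high sustained fraction for a full scientific application.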
“The four applications successfully ran on the Earth Simulator with high parallel efficiency,” Oliker said. “And they ran faster than on any other measured architecture — generally by a large margin.” However, Oliker added, only codes that scale well and are suited to the vector architecture may be run on the Earth Simulator. “Vector architectures are extremely powerful for the set of applications that map well to those architectures,” Oliker said. “But if even a small part of the code is not vectorized, overall performance degrades rapidly.”
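Oliker’s caution is essentially Amdahl’s law applied to vectorization. The sketch below is illustrative only, with an assumed 16x vector-over-scalar speedup rather than a figure measured in the study; it shows how quickly the un-vectorized remainder of a code comes to dominate the runtime.

```python
# Illustrative sketch of Amdahl's law applied to vectorization. The 16x
# vector speedup is an assumption for the example, not a measured figure.

def effective_speedup(vector_fraction: float, vector_speedup: float) -> float:
    """Overall speedup when only a fraction of the runtime vectorizes."""
    scalar_part = 1.0 - vector_fraction             # runs at scalar speed
    vector_part = vector_fraction / vector_speedup  # accelerated portion
    return 1.0 / (scalar_part + vector_part)

# Assume the vector units run 16x faster than scalar execution.
for f in (1.00, 0.99, 0.95, 0.90):
    print(f"{f:.0%} vectorized -> {effective_speedup(f, 16.0):.1f}x overall")

# Output: 100% -> 16.0x, 99% -> 13.9x, 95% -> 9.1x, 90% -> 6.4x.
# Leaving even 5 to 10 percent of the work un-vectorized forfeits roughly
# half or more of the machine's potential, which matches Oliker's point.
```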
One of the codes, LBMHD, ran at 67 percent of peak system performance, even when scaled up to 4,800 processors. However, as with most scientific inquiries, the ultimate solution to the problem is neither simple nor straightforward. “We’re at a point where no single architecture is well suited to the full spectrum of scientific applications,” Oliker said. “One size does not fit all, so we need a range of systems. It’s conceivable that future supercomputers would have heterogeneous architectures within a single system, with different sections of a code running on different components.”

One of the codes the group intended to run in this study, MADCAP (the Microwave Anisotropy Dataset Computational Analysis Package), did not scale well enough to be used on the Earth Simulator. MADCAP, developed by Julian Borrill, is a parallel implementation of cosmic microwave background map-making and power spectrum estimation algorithms. Since MADCAP has high input/output (I/O) requirements, its performance was hampered by the lack of a fast global file system on the Earth Simulator.
Undeterred, the team retuned MADCAP and returned to Japan to try again. The results, outlined in a paper titled “Performance characteristics of a cosmology package on leading HPC architectures” and presented at the Eleventh International Conference on High Performance Computing (HiPC) in Bangalore, India, showed that the Cray X1 had the best runtimes for MADCAP but the lowest parallel efficiency. The Earth Simulator and the IBM Power3 demonstrated the best scalability, and the code achieved the highest percentage of peak on the Power3. The paper concluded, “Our results highlight the complex interplay between the problem size, architectural paradigm, interconnect, and vendor-supplied numerical libraries, while isolating the I/O filesystem as the key bottleneck across all the platforms.”

BlueGene/L is currently the world’s fastest supercomputer, with the first Blue Gene system being installed at Lawrence Livermore National Laboratory. David Skinner is serving as Berkeley Lab’s representative to a new BlueGene/L consortium led by Argonne National Laboratory. The consortium aims to bring together institutions active in HPC research, building a community focused on the BlueGene family as a next step toward petascale computing. Members will work together to develop or port BlueGene applications and system software, conduct detailed performance analysis of applications, develop mutual training and support mechanisms, and contribute to future platform directions.