For the first time, the SC conference is releasing a comprehensive list of all Gordon Bell Prize winners since 1987. Here is that historical perspective.
Each year at SC, two key indicators of HPC performance are highlighted. Both were created to provide a more accurate measure of overall performance than the theoretical peak performance levels sometimes touted by vendors.
As Alan Laub, who was the first head of the Department of Energy’s SciDAC program, once said, “Peak performance – the manufacturer’s guarantee that you can’t compute faster than that.” So, coming up with metrics that provide defensible results is critical.
One, the TOP500 list, details how fast the LINPACK benchmark runs on the top 500 supercomputers of those who submit their performance data.
While the focus of the TOP500 list is on the systems themselves, the ACM Gordon Bell Prize recognizes the scientists who push those systems to get the most scientific productivity out of them, with particular emphasis on rewarding innovation in applying high-performance computing to applications in science, engineering, and large-scale data analytics.
At SC16, six finalists will compete for this year’s prize. Three of those submissions are based on calculations performed on the newest No. 1 system on the TOP500 list, the Sunway TaihuLight machine in China.
The Gordon Bell Prize was created in 1987 by Gordon Bell, who rose to fame as a computer designer for Digital Equipment Corp. Prizes may be awarded for peak performance or special achievements in scalability and time-to-solution on important science and engineering problems. Financial support of the $10,000 award is provided by Bell, and the recognition itself is highly valued by the scientists whose applications push the sustained performance of leading-edge supercomputers.
“Our community is fortunate to have the Gordon Bell Prize, because it documents scientific progress at the frontier of both supercomputing architectures and important computational science applications,” said Jeff Vetter, who leads the Future Technologies Group at Oak Ridge National Laboratory and was a member of the team that won the Gordon Bell Prize in 2010. In his 2013 book, “Contemporary High Performance Computing: From Petascale to Exascale,” Vetter notes that the prize is “the most well-known scientific accomplishment” for HPC performance on real scientific problems.
Horst Simon, deputy director of Lawrence Berkeley National Laboratory, was a member of the teams that won the Gordon Bell Prize in 1988 and 2009. “I was absolutely elated to be a member of the winning team in 1988 – I was early in my career and thought that an award for parallel performance was a great idea,” Simon said of the 1988 prize, which still hangs in his office. “I was fortunate to be in a great group at Boeing and be part of a great team at NASA. The award came at a time when parallel computing was a hot topic and it was a great career boost – it established my credentials in HPC.”
Simon has continued to add to those credentials, including serving as one of four editors of the twice-yearly TOP500 list, which was created in 1993.
But the Gordon Bell Prizes also provide insight into how scientific computing capabilities have changed over the years. The prizes for peak performance usually go to researchers using the world’s fastest supercomputer at the time. For example, the 1998 winning team got access to a 1,024 processor Cray T3E that was still on the factory floor in Chippewa Falls.
With the debut of the Earth Simulator in 2002, teams running applications on the system took home the Gordon Bell Prize for peak performance in 2002, 2003 and 2004. A similar pattern ensued when BlueGene/L ended the Earth Simulator’s dominance in late 2004. At SC16, several of the Gordon Bell Prize entries are based on results from the Sunway TaihuLight machine in China that grabbed the top spot on the latest TOP500 list in June.
Interestingly, though, despite the importance of the prize in the HPC community, there is no comprehensive list of winners online. But there are pieces. For SC2000, Blaise Barney of Lawrence Livermore National Laboratory compiled a list of all winners from 1987 to 1999, even noting that no prize was awarded in 1991.
Starting in 2006, ACM began co-sponsoring the prize and maintains a list of “ACM Gordon Bell Prize” winners from 2006 to the present, with links to the award papers. But for 2000? 2001? Up to 2005? It’s an online scavenger hunt – and an omission that even Wikipedia hasn’t addressed.
However, thanks to the efforts of SC communicators and awards chairs, the information was recorded and virtually squirreled away on conference websites. By digging through old press releases and awards lists, we have come up with the list below.
No doubt there are some errors or omissions, and in the interest of historical accuracy, we welcome any corrections and their sources. Please send them to email@example.com.
Following is the current list:
ACM Gordon Bell Prize Winners
General Purpose Computer
First Place: Robert Benner, John Gustafson, Gary Montry, Sandia National Laboratories; “Beam Stress Analysis, Surface Wave Simulation, Unstable fluid flow model,” 400 – 600 speedup on a 1,024 node N-CUBE
Honorable Mention: Robert Chervin, NCAR; “Global Ocean Model,” 450 Mflops on a Cray X/MP48
Honorable Mention: Marina Chen, Yale University; Erik DeBenedictis, Bell Labs; Geoffrey Fox, Caltech; Jingke Li, Yale University; David Walker, Caltech; “QCD and Circuit Simulation,” Speedups ranging from 39-458 on three applications run on CM hypercubes
Honorable Mention: Stavros Zenios, University of Pennsylvania; “Nonlinear network optimization,” 1.5 sec. Execution time on a Connection Machine
First Place: Phong Vu, Cray Research; Horst Simon, NASA Ames; Cleve Ashcraft, Yale University; Roger Grimes and John Lewis, Boeing Computer Services; Barry Peyton, Oak Ridge National Laboratory; “Static finite element analysis,” 1 Gflops on 8-proc. Cray Y-MP, Running time reduced from 15 min. to 30 sec.
Honorable Mention: Richard Pelz, Rutgers University; “Fluid flow problem using the spectral method,” 800 speedup on a 1,024 node N-CUBE
Compiler Parallelization
Honorable Mention: Marina Chen, Young-il Choo, Jingke Li and Janet Wu, Yale University; Erik DeBenedictis, Ansoft Corp.; “Automatic parallelization of a financial application,” 350 times speedup on a 1,024 node N-CUBE and 50 times speedup on a 64 node Intel iPSC-2.
First Place: Mark Bromley, Harold Hubschman, Alan Edelman, Bob Lordi, Jacek Myczkowski and Alex Vasilevsky, Thinking Machines; Doug McCowan and Irshad Mufti, Mobil Research; “Seismic data processing,” 6 Gflops on a CM-2 (also, 500 Mflops/$1M)
Honorable Mention: Sunil Arvindam, University of Texas, Austin; Vipin Kumar, University of Minnesota; V. Nageshwara Rao, University of Texas, Austin; “Parallel search for VLSI design,” 1,100 speedup on a 1,024 processor CM
First Place: Philip Emeagwali, University of Michigan; “Oil reservoir modeling,” 400 Mflops/$1M on a CM-2
Honorable Mention: Daniel Lopresti, Brown University; William Holmes, IDA Supercomputer Research Center; “DNA sequence matching,” 77k MIPs/$1M
Honorable Mention: Mark Bromley, Steve Heller, Cliff Lasser, Bob Lordi, Tim McNerney, Jacek Myczkowski, Irshad Mufti, Guy Steele, Jr. and Alex Vasilevsky, Thinking Machines; Doug McCowan, Mobil Research; “Seismic data processing,” 14 Gflops on a CM-2
First Place: Al Geist and G. Malcolm Stocks, Oak Ridge National Laboratory; Beniamino Ginatempo, University of Messina, Italy; William Shelton, U.S. Naval Research Laboratory; “Electronic structure of a high-temperature superconductor,” 800 Mflops/$1M on a 128-node Intel iPSC/860
Second Place: Gary Sabot, Lisa Tennies and Alex Vasilevsky, Thinking Machines; Richard Shapiro, United Technologies; “Grid generation program used to solve partial differential equations,” 1,900 speedup on a 2,048 node CM-2 (2.3 Gflops)
Honorable Mention: Eran Gabber, Amir Averbuch and Amiram Yihudai, Tel Aviv University; “Parallelizing Pascal Compiler,” 25x on a 25 node Sequent Symmetry
No prize awarded
First Place: Michael Warren, Los Alamos National Laboratory; John K. Salmon, Caltech; “Simulation of 9 million gravitating stars by parallelizing a tree code,” 5 Gflops on an Intel Touchstone Delta.
First Place: Hisao Nakanishi and Vernon Rego, Purdue University; Vaidy Sunderam, Emory University; “Simulation of polymer chains parallelized over a heterogeneous collection of distributed machines,” 1 Gflops/$1M.
First Place: Mark T. Jones and Paul Plassmann, Argonne National Laboratory; “Large, sparse linear system solver that enabled the solution of vortex configurations in superconductors and the modeling of the vibration of piezo-electric crystals,” 4 Gflops on an Intel Touchstone Delta. Speedups between 350 and 500.
First Place: Lyle N. Long and Matt Kamon, Penn. State University; Denny Dahl, Mark Bromley, Robert Lordi, Jack Myczkowski and Richard Shapiro, Thinking Machines; “Modeling of a shock front using the Boltzmann Equation,” 60 Gflops on a 1,024 processor CM-5
Honorable Mention: Peter S. Lomdahl, Pablo Tamayo, Niels Gronbech-Jensen and David M. Beazley, Los Alamos National Laboratory; “Simulating the micro-structure of grain boundaries in solids,” 50 Gflops on a 1,024 processor CM-5
First Place: Robert W. Means and Bret Wallach, HNC Inc.; Robert C. Lengel Jr., Tracor Applied Sciences; “Image analysis using the bispectrum analysis algorithm,” 6.5 Gflops/$1M on a custom-built machine called SNAP
First Place: David Womble, David Greenberg, Stephen Wheat and Robert Benner, Sandia National Laboratories; Marc Ingber, University of New Mexico; Greg Henry and Satya Gupta, Intel; “Structural mechanics modeling using the boundary element method,” 140 Gflops on a 1,904 node Intel Paragon
First Place: Stefan Goedecker, Cornell University; Luciano Colombo, Università di Milano; “Quantum mechanical interactions among 216 silicon atoms,” 3 Gflops/$1M on a cluster of eight HP workstations
Honorable Mention: H. Miyoshi, Foundation for Promotion of Material Science and Technology of Japan; M. Fukuda, T. Nakamura, M. Tuchiya, M. Yoshida, K. Yamamoto, Y. Yamamoto, S. Ogawa, Y. Matsuo and T. Yamane, National Aerospace Laboratory; M. Takamura, M. Ikeda, S. Okada, Y. Sakamoto, T. Kitamura and H. Hatama, Fujitsu Limited; M. Kishimoto, Fujitsu Laboratories Limited; “Isotropic Turbulence and other CFD codes,” 120 Gflops on a 140 processor Numerical Wind Tunnel
First Place: Masahiro Yoshida, Masahiro Fukuda and Takashi Nakamura, National Aerospace Laboratory (Japan); Atushi Nakamura, Yamagata University; Shini Hoiki, Hiroshima University; “Quantum chromodynamics simulation,” 179 Gflops on 128 processors of the Numerical Wind Tunnel
First Place: Panayotis Skordos, MIT; “Modeling of air flow in flue pipes,” 3.6 Gflops/$1M on a cluster of 20 HP workstations
First Place: Junichiro Makino and Makoto Taiji, University of Tokyo; “Simulation of the motion of 100,000 stars,” 112 Gflops using the Grape-4 machine with 288 processors
First Place: Toshiyuki Iwamiya, Masahiro Yoshida, Yuichi Matsuo, Masahiro Fukuda and Takashi Nakamura, National Aerospace Laboratory (Japan); “Fluid dynamics problem,” 111 Gflops on a 166 processor Numerical Wind Tunnel
Honorable Mention: Toshiyuki Fukushige and Junichiro Makino, University of Tokyo; “Simulation of the motion of 780,000 stars,” 333 Gflops using the Grape-4 machine w/ 1,269 processors
First Place: Adolfy Hoisie, Cornell University; Stefan Goedecker and Jurg Hutter, Max Planck Institute; “Electronic structures calculations,” 6.3 Gflops/$1M on an SGI Power Challenge with 6 MIPS R8000 processors
First Prize-Part 1: Michael S. Warren, Los Alamos National Laboratory; John K. Salmon, Caltech; “Simulating the motion of 322,000,000 self-gravitating particles,” 430 Gflops on ASCI Red using 4,096 processors
First Prize: Nhan Phan-Thien and Ka Yan Lee, University of Sydney; David Tullock, Los Alamos National Laboratory; “Modeling suspensions,” 10.8 Gflops/$1M on 28 DEC Alpha machines
First Prize-Part 2: Michael S. Warren, Los Alamos National Laboratory; John K. Salmon, Caltech; Donald J. Becker, NASA Goddard; M. Patrick Goda, Los Alamos National Laboratory; Thomas Sterling, Caltech; Gregoire S. Winckelmans, Universite Catholique de Louvain (Belgium); “Two problems: vortex fluid flow modeled with 360,000 particles; galaxy formation following 10,000,000 self-gravitating particles,” 18 Gflops/$1M on a cluster of 16 Intel Pentium Pros (200 MHz)
First Prize: Balazs Ujfalussy, Xindong Wang, Xiaoguang Zhang, Donald M. C. Nicholson, William A. Shelton and G. Malcolm Stocks, Oak Ridge National Laboratory; Andrew Canning, Lawrence Berkeley National Laboratory; Yang Wang, Pittsburgh Supercomputing Center; Balazs L. Gyorffy, H. H. Wills Physics Laboratory, UK; “First principles calculation of a unit cell (512 atoms) model of non-collinear magnetic arrangements for metallic magnets using a variation of the locally self-consistent multiple scattering method,” 657 Gflops on a 1,024-PE Cray T3E system (600 MHz)
Second Prize: Mark P. Sears, Sandia National Laboratories; Ken Stanley, University of California, Berkeley; Greg Henry, Intel; “Electronic structures: a silicon bulk periodic unit cell of 3072 atoms, and an aluminum oxide surface unit cell of 2160 atoms, using a complete dense generalized Hermitian eigenvalue-eigenvector calculation,” 605 Gflops on the ASCI Red machine with 9200 processors (200 Mhz.)
First Prize: Dong Chen, MIT; Ping Chen, Norman H. Christ, George Fleming, Chulwoo Jung, Adrian Kahler, Stephen Kasow, Yubing Luo, Catalin Malureanu and Cheng Zhong Sui, Columbia University; Robert G. Edwards and Anthony D. Kennedy, Florida State University; Alan Gara, Robert D. Mawhinney, John Parsons, Pavlos Vranas and Yuri Zhestkov, Columbia University; Sten Hansen, Fermi National Accelerator Laboratory; Greg Kilcup, Ohio State University; “3 lattice quantum chromodynamics computations,” 79.7 Gflops/$1M on a custom system with 2,048 PE’s using a Texas Instruments chip (32-bit floating point ops.)
Second Prize: Michael S. Warren, Timothy C. Germann, Peter S. Lomdahl and David M. Beazley, Los Alamos National Laboratory; John K. Salmon, Caltech; “Simulation of a shock wave propagating through a structure of 61 million atoms,” 64.9 Gflops/$1M using a 70 PE system of DEC Alpha’s (533 Mhz.)
First Prize: A. A. Mirin, R. H. Cohen, B. C. Curtis, W. P. Dannevik, A. M. Dimits, M. A. Duchaineau, D. E. Eliason and D. R. Schikore, Lawrence Livermore National Laboratory; S. E. Anderson, D. H. Porter and R. Woodward, University of Minnesota; L. J. Shieh and S. W. White, IBM; “Very high resolution simulation of fluid turbulence in compressible flows,” 1.18 Tflop/s on short run on 5832 CPU’s on ASCI Blue Pacific, 1.04 Tflop/s sustained on one-hour run, 600 Gflop/s on one-week run on 3840 CPU’s
First Prize: Atsushi Kawai, Toshiyuki Fukushige and Junichiro Makino, University of Tokyo; “Astrophysical n-body simulation,” 144 Gflops/$1M on custom-built GRAPE-5 32-processor system
First Prize, Shared: W. K. Anderson, NASA Langley Research Center; W. D. Gropp, D. K. Kaushik, B. F. Smith, Argonne National Laboratory; D. E. Keyes, Old Dominion University, Lawrence Livermore National Laboratory, and ICASE, NASA Langley Research Center; “Unstructured tetrahedral mesh fluid dynamics using PETSc library,” 156 Gflop/s on 2048 nodes of ASCI Red, using one CPU per node for computation
First Prize, Shared: H. M. Tufo, University of Chicago; P. F. Fischer, Argonne National Laboratory; “Spectral element calculation using a sparse system solver,” 319 Gflop/s on 2048 nodes of ASCI Red, using two CPU’s per node for computation
Competitors for this year’s prize for best performance tied, each achieving 1.34 teraflops.
First place: Tetsu Narumi, Ryutaro Susukita, Takahiro Koishi, Kenji Yasuoka, Hideaki Furusawa, Atsushi Kawai and Toshikazu Ebisuzaki; “Molecular Dynamic Simulation for NaCl for a Special Purpose Computer: MDM,” 1.34 Tflops.
First place: Junichiro Makino, Toshiyuki Fukushige and Masaki Koga; “Simulation of Black Holes in a Galactic Center on GRAPE-6,” 1.349 Tflops.
First place: Douglas Aberdeen, Jonathan Baxter and Robert Edwards; “92 cents/Mflops Ultra-Large Scale Neural Network Training on a PIII Cluster.”
Honorable Mention: Thomas Hauser, Timothy I. Mattox, Raymond P. LeBeau, Henry G. Dietz and P. George Huang, University of Kentucky; “High-Cost CFD on a Low-Cost Cluster.”
Alan Calder, B.C. Curtis, Jonathan Dursi, Bruce Fryxell, G. Henry, P. MacNeice, Kevin Olson, Paul Ricker, Robert Rosner, Frank Timmes, Henry Tufo, James Truran and Michael Zingale; “High-Performance Reactive Fluid Flow Simulations Using Adaptive Mesh Refinement on Thousands of Processors.”
Toshiyuki Fukushige and Junichiro Makino; “Simulation of black holes in a galactic center,” 11.55 Tflop/s.
Joon Hwang, Seung Kim and Chang Lee; “Study of impact locating on aircraft structure,” run on a low-cost cluster at 24.6 cents/Mflop/s, or less than 25 cents per 1 million floating-point operations per second.
Gabrielle Allen, Thomas Dramlitsch, Ian Foster, Nick Karonis, Matei Ripeanu, Edward Seidel and Brian Toonen for supporting efficient execution in the heterogeneous distributed computing environments with Cactus and Globus.
Satoru Shingu, Yoshinori Tsuda, Wataru Ohfuchi, Kiyoshi Otsuka, Earth Simulator Center, Japan Marine Science and Technology Center; Hiroshi Takahara, Takashi Hagiwara, Shin-ichi Habata, NEC Corporation; Hiromitsu Fuchigami, Masayuki Yamada, Yuji Sasaki, Kazuo Kobayashi, NEC Informatec Systems; Mitsuo Yokokawa, National Institute of Advanced Industrial Science and Technology; Hiroyuki Itoh, National Space Development Agency of Japan; “A 26.58 Tflops Global Atmospheric Simulation with the Spectral Transform Method on the Earth Simulator,” 26.58 Tflops simulation of a complex climate system using an atmospheric circulation model called AFES.
Special Award for Language
Hitoshi Sakagami, Himeji Institute of Technology; Hitoshi Murai, Earth Simulator Center, Japan Marine Science and Technology Center; Yoshiki Seo, NEC Corporation; Mitsuo Yokokawa, Japan Atomic Energy Research Institute; “14.9 Tflops Three-dimensional Fluid Simulation for Fusion Science with HPF on the Earth Simulator,” 14.9 Tflops run of a parallelized version of IMPACT-3D, an application written in High Performance Fortran that simulates the instability in an imploding system, such as the ignition of a nuclear device.
Mitsuo Yokokawa, Japan Atomic Energy Research Institute; Ken’ichi Itakura, Atsuya Uno, Earth Simulator Center, Japan Marine Science and Technology Center; Takashi Ishihara, Yukio Kaneda, Nagoya University; “16.4-Tflops Direct Numerical Simulation of Turbulence by a Fourier Spectral Method on the Earth Simulator.” New methods for handling the extremely data-intensive calculation of a three-dimensional Fast Fourier Transform on the Earth Simulator have allowed researchers to overcome a major hurdle for high performance simulations of turbulence.
Manoj Bhardwaj, Kendall Pierson, Garth Reese, Tim Walsh, David Day, Ken Alvin, James Peery, Sandia National Laboratories; Charbel Farhat, Michel Lesoinne, University of Colorado at Boulder; “Salinas: A Scalable Software for High Performance Structural and Solid Mechanics Simulation.” The structural mechanics community has embraced Salinas, engineering software over 100,000 lines long that has run on a number of advanced systems, including a sustained 1.16 Tflops performance on 3,375 ASCI White processors.
James C. Phillips, Gengbin Zheng, Sameer Kumar, Laxmikant V. Kale, University of Illinois at Urbana-Champaign; “NAMD: Biomolecular Simulation on Thousands of Processors.” Researchers achieved unprecedented scaling of NAMD, a code that renders an atom-by-atom blueprint of large biomolecules and biomolecular systems.
Dimitri Komatitsch, Chen Ji, and Jeroen Tromp, California Institute of Technology; and Seiji Tsuboi, Institute for Frontier Research on Earth Evolution, JAMSTEC; “A 14.6 Billion Degrees of Freedom, 5 Teraflop/s, 2.5 Terabyte Earthquake Simulation on the Earth Simulator.” The researchers used 1,944 processors of the Earth Simulator to model seismic wave propagation resulting from large earthquakes.
Volkan Akcelik, Jacobo Bielak, Ioannis Epanomeritakis, Antonio Fernandez, Omar Ghattas, Eui Joong Kim, Julio Lopez, David O’Hallaron and Tiankai Tu, Carnegie Mellon University; George Biros, Courant Institute, New York University; and John Urbanic, Pittsburgh Supercomputing Center; “High Resolution Forward and Inverse Earthquake Modeling on Terascale Computers.” The researchers developed earthquake simulation algorithms and tools and used them to carry out simulations of the 1994 Northridge earthquake in the Los Angeles Basin using 100 million grid points.
Special Achievement (“lifetime”)
Junichiro Makino and Hiroshi Daisaka, University of Tokyo; Eiichiro Kokubo, National Astronomical Observatory of Japan; and Toshiyuki Fukushige, University of Tokyo; “Performance Evaluation and Tuning of GRAPE-6—Towards 40 ‘Real’ Tflop/s.” The researchers benchmarked GRAPE-6, a sixth-generation special-purpose computer for gravitational many-body problems, and presented the measured performance for a few real applications with a top speed of 35.3 teraflops.
Akira Kageyama, Masanori Kameyama, Satoru Fujihara, Masaki Yoshida, Mamoru Hyodo, and Yoshinori Tsuda, JAMSTEC; “A 15.2 TFlops Simulation of Geodynamo on the Earth Simulator,” 15.2 TFlop/s on 4,096 processors of the Earth Simulator.
Mark F. Adams, Sandia National Laboratories; Harun H. Bayraktar, Abaqus Corp.; Tony M. Keaveny and Panayiotis Papadopoulos, University of California, Berkeley; “Ultrascalable implicit finite element analyses in solid mechanics with over half a billion degrees of freedom.”
Frederick H. Streitz, James N. Glosli, Mehul V. Patel, Bor Chan, Robert K. Yates, Bronis R. de Supinski, Lawrence Livermore National Laboratory; James Sexton and John A. Gunnels, IBM; “100+ TFlop Solidification Simulations on BlueGene/L.” The team achieved up to 107 teraflop/s (trillion floating-point operations per second) with a sustained rate of 101.7 teraflop/s over a seven-hour run on the IBM BlueGene/L’s 131,072 processors.
Francois Gygi, University of California, Davis; Erik W. Draeger, Martin Schulz and Bronis R. de Supinski, Lawrence Livermore National Laboratory; John A. Gunnels, Vernon Austel and James C. Sexton, IBM Watson Research Center; Franz Franchetti, Carnegie Mellon University; Stefan Kral, Christoph W. Ueberhuber and Juergen Lorenz, Vienna University of Technology; “Large-scale electronic structure calculations of high-Z metals on the BlueGene/L platform.” A sustained peak performance of 207.3 TFlop/s was measured on 65,536 nodes, corresponding to 56.5% of the theoretical full machine peak using all 128k CPUs.
Honorable Mention: Tetsu Narumi, Yousuke Ohno, Noriaki Okimoto, Takahiro Koishi, Atsushi Suenaga, Futatsugi, Ryoko Yanai, Ryutaro Himeno, Shigenori Fujikawa and Makoto Taiji, all of RIKEN; and Mitsuru Ikei, Intel Corp.; “A 185 Tflop/s Simulation of Amyloid-forming Peptides from Yeast Prion Sup35 with the Special-Purpose Computer System MD-GRAPE3.”
Pavlos Vranas, Gyan Bhanot, Matthias Blumrich, Dong Chen, Alan Gara, Philip Heidelberger, Valentina Salapura and James C. Sexton, all of IBM Watson Research Center; “The BlueGene/L supercomputer and quantum ChromoDynamics,” QCD simulation that achieved 12.2 Teraflops sustained performance with perfect speedup to 32K CPU cores.
James N. Glosli, David F. Richards, Kyle J. Caspersen, Robert E. Rudd and Frederick H. Streitz, all of Lawrence Livermore National Laboratory; and John Gunnels of IBM Watson Research Center; “Extending Stability Beyond CPU Millennium: A Micron-Scale Simulation of Kelvin-Helmholtz Instability.” The team that won the 2005 Gordon Bell Prize for a simulation investigating the solidification in tantalum and uranium at extreme temperatures and pressure, with simulations ranging in size from 64,000 atoms to 524 million atoms, used an expanded machine to conduct simulations of up to 62.5 billion atoms. The optimized ddcMD code is benchmarked at 115.1 Tflop/s in their scaling study and 103.9 Tflop/s in a sustained science run.
Gonzalo Alvarez, Michael S. Summers, Don E. Maxwell, Markus Eisenbach, Jeremy S. Meredith, Thomas A. Maier, Paul R. Kent, Eduardo D’Azevedo and Thomas C. Schulthess, all of Oak Ridge National Laboratory; and Jeffrey M. Larkin and John M. Levesque, both of Cray, Inc.; “New Algorithm to Enable 400+ TFlop/s Sustained Performance in Simulations of Disorder Effects in High-Tc.”
Lin-Wang Wang, Byounghak Lee, Hongzhang Shan, Zhengji Zhao, Juan Meza, Erich Strohmaier, and David H. Bailey, Lawrence Berkeley National Laboratory; “Linear Scaling Divide-and-Conquer Electronic Structure Calculations for Thousand Atom Nanostructures,” for special achievement in high performance computing for their research into the energy harnessing potential of nanostructures. Their method, which was used to predict the efficiency of a new solar cell material, achieved impressive performance and scalability.
Markus Eisenbach and Donald M. Nicholson, Oak Ridge National Laboratory; Cheng-gang Zhou, J.P. Morgan Chase; Gregory Brown, Florida State University; Jeffrey Larkin, Cray Inc.; and Thomas Schulthess, ETH Zurich; “A scalable method for ab initio computation of free energies in nanoscale systems,” on the Cray XT5 system at ORNL, sustaining 1.03 Petaflop/s in double precision on 147,464 cores.
Tsuyoshi Hamada, Nagasaki University; Tetsu Narumi, University of Electro-Communications, Tokyo; Rio Yokota, University of Bristol; Kenji Yasuoka, Keio University, Yokohama; Keigo Nitadori and Makoto Taiji, RIKEN Advanced Science Institute; “42 TFlops hierarchical N-body simulations on GPUs with applications in both astrophysics and turbulence.” The maximum corrected performance is 28.1 TFlops for the gravitational simulation, which results in a cost performance of 124 MFlops/$.
David E. Shaw, Ron O. Dror, John K. Salmon, J. P. Grossman, Kenneth M. Mackenzie, Joseph A. Bank, Cliff Young, Martin M. Deneroff, Brannon Batson, Kevin J. Bowers, Edmond Chow, Michael P. Eastwood, Douglas J. Ierardi, John L. Klepeis, Jeffrey S. Kuskin, Richard H. Larson, Kresten Lindorff-Larsen, Paul Maragakis, Mark A. Moraes, Stefano Piana, Yibing Shan and Brian Towles, all of D.E. Shaw Research; “Millisecond-scale molecular dynamics simulations on Anton.”
Abtin Rahimian and Ilya Lashuk, Georgia Tech; Shravan Veerapaneni, NYU; Aparna Chandramowlishwaran, Dhairya Malhotra, Logan Moon and Aashay Shringarpure, Georgia Tech; Rahul Sampath and Jeffrey Vetter, Oak Ridge National Laboratory; Richard Vuduc and George Biros, Georgia Tech; Denis Zorin, NYU; “Petascale Direct Numerical Simulation of Blood Flow on 200K Cores and Heterogeneous Architectures,” achieved 0.7 Petaflops/s of sustained performance on Jaguar.
Honorable mention (first): Anton Kozhevnikov, Institute for Theoretical Physics, ETH Zurich; Adolfo G. Eguiluz, The University of Tennessee, Knoxville; and Thomas C. Schulthess, Swiss National Supercomputer Center and Oak Ridge National Laboratory; “Toward First Principles Electronic Structure Simulations of Excited States and Strong Correlations in Nano- and Materials Science.”
Honorable mention (second): Tsuyoshi Hamada, Nagasaki University; and Keigo Nitadori, RIKEN Advanced Science Institute; “190 TFlops Astrophysical N-body Simulation on a Cluster of GPUs.”
Yukihiro Hasegawa, Next-Generation Supercomputer R&D Center, Riken; Jun-Ichi Iwata, Miwako Tsuji and Daisuke Takahashi, University of Tsukuba; Atsushi Oshiyama, University of Tokyo; Kazuo Minami, Taisuke Boku, University of Tsukuba; Fumiyoshi Shoji, Atsuya Uno and Motoyoshi Kurokawa, Next-Generation Supercomputer R&D Center, Riken; Hikaru Inoue and Ikuo Miyoshi, Fujitsu Ltd.; and Mitsuo Yokokawa, Next-Generation Supercomputer R&D Center, Riken; “First-principles calculations of electron states of a silicon nanowire with 100,000 atoms on the K computer.” A 3.08 petaflops sustained performance was measured for one iteration of the SCF calculation in a 107,292-atom Si nanowire calculation using 442,368 cores, which is 43.63% of the peak performance of 7.07 Pflop/s.
Scalability and Time to Solution
Takashi Shimokawabe, Takayuki Aoki, Tomohiro Takaki, Toshio Endo, Akinori Yamanaka, Naoya Maruyama, Akira Nukada, and Satoshi Matsuoka, all of Tokyo Institute of Technology; “Petascale phase-field simulation for dendritic solidification on the TSUBAME 2.0 supercomputer.” Simulations on the GPU-rich TSUBAME 2.0 supercomputer at the Tokyo Institute of Technology demonstrated good weak scaling and achieved 1.017 PFlops in single precision for the largest configuration, using 4,000 GPUs along with 16,000 CPU cores.
Because of the unusually high quality of all of the ACM Gordon Bell Prize finalists, the committee took the unusual step of awarding Honorable Mentions to the remaining three finalists’ papers:
“Atomistic nanoelectronics device engineering with sustained performances up to 1.44 Pflop/s” by Mathieu Luisier et al.,
“Petaflop biofluidics simulations on a two million-core system,” by Simone Melchionna et al., and
“A new computational paradigm in multiscale simulations: Application to brain blood flow,” by Leopold Grinberg et al.
Scalability and Time to Solution
Tomoaki Ishiyama, Keigo Nitadori, University of Tsukuba; and Junichiro Makino, Tokyo Institute of Technology; “4.45 Pflops astrophysical N-body simulation on K computer: the gravitational trillion-body problem.” The average performance on 24,576 and 82,944 nodes of the K computer is 1.53 and 4.45 Pflop/s, respectively, which corresponds to 49% and 42% of the peak speed.
Best Performance of a High Performance Application
Diego Rossinelli, Babak Hejazialhosseini, Panagiotis Hadjidoukas and Petros Koumoutsakos, all of ETH Zurich; Costas Bekas and Alessandro Curioni of IBM Zurich Research Laboratory; and Steffen Schmidt and Nikolaus Adams of Technical University Munich; “11 Pflop/s simulations of cloud cavitation collapse,” high throughput simulations of cloud cavitation collapse on 1.6 million cores of Sequoia reaching 55% of its nominal peak performance, corresponding to 11 Pflop/s.
Best Performance of a High Performance Application
David E. Shaw, J.P. Grossman, Joseph A. Bank, Brannon Batson, J. Adam Butts, Jack C. Chao, Martin M. Deneroff, Ron O. Dror, Amos Even, Christopher H. Fenton, Anthony Forte, Joseph Gagliardo, Gennette Gill, Brian Greskamp, C. Richard Ho, Douglas J. Ierardi, Lev Iserovich, Jeffrey S. Kuskin, Richard H. Larson, Timothy Layman, Li-Siang Lee, Adam K. Lerer, Chester Li, Daniel Killebrew, Kenneth M. Mackenzie, Shark Yeuk-Hai Mok, Mark A. Moraes, Rolf Mueller, Lawrence J. Nociolo, Jon L. Peticolas, Terry Quan, Daniel Ramot, John K. Salmon, Daniele P. Scarpazza, U. Ben Schafer, Naseer Siddique, Christopher W. Snyder, Jochen Spengler, Ping Tak Peter Tang, Michael Theobald, Horia Toma, Brian Towles, Benjamin Vitale, Stanley C. Wang and Cliff Young: all of D.E. Shaw Research; “Anton 2: raising the bar for performance and programmability in a special-purpose molecular dynamics supercomputer.” Anton 2 is the first platform to achieve simulation rates of multiple microseconds of physical time per day for systems with millions of atoms. Demonstrating strong scaling, the machine simulates a standard 23,558-atom benchmark system at a rate of 85 μs/day—180 times faster than any commodity hardware platform or general-purpose supercomputer.
Outstanding Achievement in High-performance Computing Scalability
Johann Rudi, Tobin Isaac and Omar Ghattas, University of Texas at Austin; A. Cristiano I. Malossi, Peter W. J. Staar, Yves Ineichen, Costas Bekas, Alessandro Curioni, IBM Research, Zurich; Georg Stadler, New York University; and Michael Gurnis, Caltech; “An extreme-scale implicit solver for complex PDEs: highly heterogeneous flow in earth’s mantle,” scaled to 1.5 million cores for severely nonlinear, ill-conditioned, heterogeneous, and anisotropic PDEs.
List compiled by Jon Bashor, SC16 Communications Committee member from Lawrence Berkeley National Lab.