Inside the Gordon Bell Prize Finalist Projects

The ACM Gordon Bell Prize, which comes with a $10,000 award courtesy of HPC luminary Gordon Bell, is widely considered the highest prize in high-performance computing. Each year, six finalists are selected who represent the pinnacle of outstanding research achievements in HPC. Last month, listings on the SC22 schedule revealed these finalists. Over the last few weeks, HPCwire got in touch with members of the six finalist teams to learn more about their projects.

Last year, for the first time, the Gordon Bell Prize nominees included two projects powered by exascale computing: specifically, China’s “new Sunway supercomputer,” also known as OceanLight. These research papers, at the time, constituted the most substantively “official” reveal of the system (which remains unranked). One of those OceanLight-powered papers, a challenge to Google’s quantum supremacy claim, won that year’s Gordon Bell Prize.

In 2022, OceanLight has exascale-caliber competition: not one but two of the other five finalist projects used the new American exascale supercomputer, Frontier, which launched earlier this year at Oak Ridge National Lab (ORNL). And, beyond OceanLight and Frontier, previous Top500-toppers Fugaku (RIKEN) and Summit (ORNL) both return to the list under multiple finalist teams, along with Perlmutter (at NERSC, the National Energy Research Scientific Computing Center) and Shaheen-2 (at KAUST, the King Abdullah University of Science and Technology).

And now: the finalist projects.

Using OceanLight to simulate millions of atoms

This year sees OceanLight return to the stage as the sole supercomputer behind a paper titled 2.5 Million-Atom Ab Initio Electronic-Structure Simulation of Complex Metallic Heterostructures with DGDFT, a project involving simulations of millions of atoms that made use of tens of millions of cores on OceanLight.

Abstract: Over the past three decades, ab initio electronic structure calculations of large, complex and metallic systems are limited to tens of thousands of atoms in both numerical accuracy and computational efficiency on leadership supercomputers. We present a massively parallel discontinuous Galerkin density functional theory (DGDFT) implementation, which adopts adaptive local basis functions to discretize the Kohn-Sham equation, resulting in a block-sparse Hamiltonian matrix. A highly efficient pole expansion and selected inversion (PEXSI) sparse direct solver is implemented in DGDFT to achieve O(N^1.5) scaling for quasi two-dimensional systems. DGDFT allows us to compute the electronic structures of complex metallic heterostructures with 2.5 million atoms (17.2 million electrons) using 35.9 million cores on the new Sunway supercomputer. In particular, the peak performance of PEXSI can achieve 64 PFLOPS (∼5 percent of theoretical peak), which is unprecedented for sparse direct solvers. This accomplishment paves the way for quantum mechanical simulations into mesoscopic scale for designing next-generation energy materials and electronic devices.

Per the SC22 schedule, this team includes researchers from the Chinese Academy of Sciences, Peking University, the Pilot National Laboratory for Marine Science and Technology, the National Research Center of Parallel Computer Engineering and Technology, the Qilu University of Technology and the University of Science and Technology of China.

“Our team is extremely excited [to be] nominated for the Gordon Bell Prize finalists as we started preparation for this work since last year,” said Qingcai Jiang, a researcher at the University of Science and Technology of China (USTC), in an email to HPCwire. “Our work for the first time achieves plane-wave precision electronic structure calculation for large-scale complex metallic heterostructures containing 2.5 million atoms (17.2 million electrons), and our optimization strategies make our work able to achieve peak performance of 64 PFLOPS (∼5 percent of theoretical peak), which is unprecedented for sparse direct solvers.”

Frontier powers biomedical literature analytics

The first of the projects powered by Frontier, titled ExaFlops Biomedical Knowledge Graph Analytics, also made use of ORNL’s previous chart-topper, Summit, and focuses on large-scale mining of biomedical research literature.

The Frontier supercomputer, one of six represented in the Gordon Bell Prize finalists.

Abstract: We are motivated by newly proposed strategies for mining large-scale corpora of scholarly publications (e.g., full biomedical literature), which consists of tens of millions of papers spanning decades of research. In this setting, analysts seek to discover relationships among concepts. They construct graph representations from annotated text databases and then formulate the relationship-mining problem as an all-pairs shortest paths (APSP) and validate connective paths against curated biomedical knowledge graphs (e.g., SPOKE). In this context, we present COAST (Exascale Communication-Optimized All-Pairs Shortest Path) and demonstrate 1.004 EF/s on 9,200 Frontier nodes (73,600 GCDs). We develop hyperbolic performance models (HYPERMOD), which guide optimizations and parametric tuning. The proposed COAST algorithm achieved the memory constant parallel efficiency of 99 percent in the single-precision tropical semiring. Looking forward, COAST will enable the integration of scholarly corpora like PubMed into the SPOKE biomedical knowledge graph.
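For readers unfamiliar with the "tropical semiring" mentioned in the abstract, the idea can be sketched in a few lines: replace multiplication with addition and addition with minimum, and repeated matrix "squaring" yields all-pairs shortest paths. The Python below is a minimal illustration under that assumption only. It is not the COAST algorithm, and the function names and the tiny example graph are hypothetical.

```python
import math

INF = math.inf  # "no edge" in the tropical (min-plus) semiring


def min_plus_product(a, b):
    """Matrix product where (+, *) is replaced by (min, +)."""
    n = len(a)
    return [
        [min(a[i][k] + b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def all_pairs_shortest_paths(weights):
    """APSP by repeated min-plus squaring of the weight matrix."""
    n = len(weights)
    dist = [row[:] for row in weights]
    for i in range(n):
        dist[i][i] = min(dist[i][i], 0.0)  # staying put costs nothing
    hops = 1
    while hops < n - 1:  # after each squaring, paths of up to 2*hops edges are covered
        dist = min_plus_product(dist, dist)
        hops *= 2
    return dist


# Hypothetical 4-node directed graph: 0 -> 1 -> 2 -> 3 plus an expensive shortcut 0 -> 3.
example = [
    [0.0, 1.0, INF, 10.0],
    [INF, 0.0, 2.0, INF],
    [INF, INF, 0.0, 1.0],
    [INF, INF, INF, 0.0],
]
dist = all_pairs_shortest_paths(example)
```

In this toy graph the three-edge route from node 0 to node 3 beats the direct edge, which is the kind of indirect connection between concepts that relationship mining looks for.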

Per the SC22 schedule, this team includes researchers from AMD, the Georgia Institute of Technology, ORNL and the University of California, San Francisco.

“The ability to identify paths between any pair of biomedical concepts with the richness of PubMed in a reasonable time has the potential to revolutionize biomedical research and apply national research funds more effectively,” said Ramakrishnan Kannan, group leader for discrete algorithms at ORNL, in an email to HPCwire. “The comparison of knowledge encoded within SPOKE, which is largely human-curated, against concept relationships that might be mined automatically from a scholarly database like PubMed will result in faster and automated integration of biomedical knowledge at scale.”

According to the team, this project is “the first exascale graph AI demonstration” to run at over one exaflops. “This first demonstration of exascale computation speed will transform the way we currently conduct search in complex heterogeneous knowledge graphs like SPOKE,” the research team told HPCwire. “Specifically, it will enable a new class of algorithms to be implemented in graphs of unprecedented size and complexity. This will greatly enhance the quality of biomedical research inquiry, and accelerate the time to patient diagnosis and care like never before.”

Four top-ten supercomputers enable plasma simulations

The second project to use Frontier: Pushing the Frontier in the Design of Laser-Based Electron Accelerators with Groundbreaking Mesh-Refined Particle-In-Cell Simulations on Exascale-Class Supercomputers. Though the title of the paper (which revolved around kinetic plasma simulations) winks at its use of Frontier, the team actually used four supercomputers: Frontier, Fugaku (RIKEN), Summit and Perlmutter (NERSC), meaning that this one paper used four of the top seven supercomputers on the most recent Top500 list. In an email to HPCwire, Jean-Luc Vay, a senior scientist at Lawrence Berkeley National Lab, outlined the science runs of the research, which were conducted on Frontier (up to 8,192 nodes), Fugaku (up to ~93,000 nodes) and Summit (up to 4,096 nodes).

The Perlmutter supercomputer.

Abstract: We present a first-of-kind mesh-refined (MR) massively parallel Particle-In-Cell (PIC) code for kinetic plasma simulations optimized on the Frontier, Fugaku, Summit, and Perlmutter supercomputers. Major innovations, implemented in the WarpX PIC code, include: (i) a three level parallelization strategy that demonstrated performance portability and scaling on millions of A64FX cores and tens of thousands of AMD and Nvidia GPUs, (ii) a groundbreaking mesh refinement capability that provides between 1.5x to 4x savings in computing requirements on the science case reported in this paper, (iii) an efficient load balancing strategy between multiple MR levels. The MR PIC code enabled 3D simulations of laser-matter interactions on Frontier, Fugaku, and Summit, which have so far been out of the reach of standard codes. These simulations helped remove a major limitation of compact laser-based electron accelerators, which are promising candidates for next generation high-energy physics experiments and ultra-high dose rate FLASH radiotherapy.

Per the SC22 schedule, this team includes researchers from Arm, Atos, CEA-Université Paris-Saclay, ENSTA Paris, GENCI, Lawrence Berkeley National Lab and RIKEN.

“Plasma accelerator technologies have the potential to provide particle accelerators that are much more compact than existing ones, opening the door to exciting novel applications in science, industry, security and health,” Vay explained. “Exploiting the most powerful supercomputers in the world to boost the research to make these complex machines a reality is so stimulating to all of us.”

“It’s thrilling for the entire team to be selected as finalist of the Gordon Bell Prize, even for the one of us (Axel Huebl), for whom it’s ‘déjà vu’ as he was already a finalist in 2012 with another (PIConGPU) team,” Vay added. “It’s the vindication of years of hard work from the U.S. DOE Exascale Computing Project contributors and longstanding collaborators from CEA Saclay in France, coupled to the more recent hard work with colleagues from various labs and private companies in France (Genci, Arm, Atos) and RIKEN in Japan.”

Geostatistics get a boost from Shaheen-2 and Fugaku

The exascale-enabled research only constitutes half the list. Another finalist paper, Reshaping Geostatistical Modeling and Prediction for Extreme-Scale Environmental Applications, used Shaheen-2 as well as Fugaku.

The Shaheen-2 supercomputer.

Abstract: We extend the capability of space-time geostatistical modeling using algebraic approximations, illustrating application-expected accuracy worthy of double precision from majority low-precision computations and low-rank matrix approximations. We exploit the mathematical structure of the dense covariance matrix whose inverse action and determinant are repeatedly required in Gaussian log-likelihood optimization. Geostatistics augments first-principles modeling approaches for the prediction of environmental phenomena given the availability of measurements at numerous locations; however, traditional Cholesky-based approaches grow cubically in complexity, gating practical extension to continental and global datasets now available. We combine the linear algebraic contributions of mixed-precision and low-rank computations within a tile-based Cholesky solver with on-demand casting of precisions and dynamic runtime support from PaRSEC to orchestrate tasks and data movement. Our adaptive approach scales on various systems and leverages the Fujitsu A64FX nodes of Fugaku to achieve up to 12X performance speedup against the highly optimized dense Cholesky implementation.
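To see why the covariance matrix's "inverse action and determinant" both come out of a single Cholesky factorization, here is a minimal, plain double-precision sketch in Python. It is the textbook dense approach whose cubic cost the abstract describes, not the team's mixed-precision, low-rank, tile-based solver. The function names and the 2x2 example are hypothetical.

```python
import math


def cholesky(a):
    """Dense Cholesky factor L (lower triangular) with a = L * L^T; O(n^3) work."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][j] = math.sqrt(a[i][i] - s)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def forward_solve(l, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i])
    return y


def gaussian_log_likelihood(cov, z):
    """Zero-mean Gaussian log-likelihood of observations z with covariance cov."""
    n = len(z)
    l = cholesky(cov)
    log_det = 2.0 * sum(math.log(l[i][i]) for i in range(n))  # log|cov| from the diagonal of L
    y = forward_solve(l, z)
    quad = sum(v * v for v in y)  # equals z^T cov^{-1} z
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)


# Hypothetical two-location example.
cov = [[4.0, 2.0], [2.0, 3.0]]
z = [1.0, 1.0]
loglik = gaussian_log_likelihood(cov, z)
```

An optimizer evaluates this function repeatedly while fitting the covariance parameters, so the factorization cost dominates once the number of measurement locations grows.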

Per the SC22 schedule, this team includes researchers from KAUST, ORNL and the University of Tennessee. Perhaps notably, the team also includes Jack Dongarra, one of SC22’s keynote speakers.

“For our exploratory science runs, and to demonstrate the acceptable accuracy of our algorithmic adaptations on Cholesky factorization and further manipulation of massive covariance matrices, we used Shaheen-2 at KAUST,” explained David Keyes, director of the Extreme Computing Research Center at KAUST, in an email to HPCwire. “Shaheen-2 has only 6,192 nodes, so we applied to use Fugaku at RIKEN to scale further and were generously considered by RIKEN. Fugaku has 158,976 nodes, about 25 times more than Shaheen-2, and each node has 48 cores, 1.5 times more than a Shaheen-2 node. However, each Fugaku node is equipped with only 32GB of memory, one-quarter as much as Shaheen-2’s 128GB per node, thus only one-sixth as much per core, which required us to make software adaptations.”
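The ratios in that quote can be checked with a few lines of arithmetic. The sketch below uses only the figures Keyes gives, with one assumption: the 32 cores per Shaheen-2 node is inferred from his statement that 48 cores is 1.5 times as many, and is not stated directly.

```python
from fractions import Fraction

# Figures as quoted; Shaheen-2 cores per node is inferred (48 / 1.5), not quoted.
shaheen = {"nodes": 6192, "cores_per_node": 32, "mem_gb_per_node": 128}
fugaku = {"nodes": 158976, "cores_per_node": 48, "mem_gb_per_node": 32}

node_ratio = Fraction(fugaku["nodes"], shaheen["nodes"])
core_ratio = Fraction(fugaku["cores_per_node"], shaheen["cores_per_node"])
mem_ratio = Fraction(fugaku["mem_gb_per_node"], shaheen["mem_gb_per_node"])

# Memory per core on each system, then the Fugaku-to-Shaheen-2 ratio.
mem_per_core_ratio = Fraction(
    fugaku["mem_gb_per_node"], fugaku["cores_per_node"]
) / Fraction(shaheen["mem_gb_per_node"], shaheen["cores_per_node"])

print(f"nodes: {float(node_ratio):.1f}x, cores/node: {float(core_ratio):.1f}x")
print(f"memory/node: {mem_ratio}, memory/core: {mem_per_core_ratio}")
```

One-quarter of the memory spread across 1.5 times the cores gives the one-sixth per-core figure, which is the constraint that drove the software adaptations.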

“Entering the Gordon Bell competition was exciting for all of the team members, especially the students and postdocs,” Keyes said. “It provided an opportunity to run on the world’s second ranked computer. The required algorithmic adaptations to architecture led to improvements in our tools that will be useful at all scales. More importantly, the nomination created excitement with the statistics community since 2022 appears to be the first time after 35 years of the prize that any significant spatial statistics computation, environmental or otherwise, has thus advanced.”

Simulating earthquakes with Fugaku

The final of Fugaku’s three appearances among the finalist list comes courtesy of Extreme Scale Earthquake Simulation with Uncertainty Quantification, which used the second-ranked system to advance scientific understanding of earthquakes and fields with similar dynamics.

The Fugaku supercomputer.

Abstract: We develop a stochastic finite element method with ultra-large degrees of freedom that discretize probabilistic and physical spaces using unstructured second-order tetrahedral elements with double precision using a mixed-precision implicit iterative solver that scales to the full Fugaku system and enables fast Uncertainty Quantification (UQ). The developed solver designed to attain high performance on a variety of CPU/GPU-based supercomputers enabled solving 37 trillion degrees-of-freedom problem with 19.8 percent peak FP64 performance on full Fugaku (89.8 PFLOPS) with 87.7 percent weak scaling efficiency, corresponding to 224-fold speedup over the state-of-the-art solver running on full Summit. This method, which has shown its effectiveness via solving huge (32-trillion degrees-of-freedom) practical problems, is expected to be a breakthrough in damage mitigation, and is expected to facilitate the scientific understanding of earthquake phenomena and have a ripple effect on other fields that similarly require UQ.

Per the SC22 schedule, this team includes researchers from Fujitsu, the Japan Agency for Marine-Earth Science and Technology, RIKEN and the University of Tokyo.

“We’re very happy to be selected as finalists,” wrote Tsuyoshi Ichimura, a professor with the Earthquake Research Institute at the University of Tokyo, in an email to HPCwire. “We believe that this has a great impact in showing that capability computing can contribute to an unprecedented Uncertainty Quantification (UQ).”

Leveraging Summit to search proteins

Last, but certainly not least: Extreme-Scale Many-against-Many Protein Similarity Search, which used the Summit supercomputer to perform protein similarity calculations across hundreds of millions of proteins in just a few hours.

The Summit supercomputer.

Abstract: Similarity search is one of the most fundamental computations that are regularly performed on ever-increasing protein datasets. Scalability is of paramount importance for uncovering novel phenomena that occur at very large scales. We unleash the power of over 20,000 GPUs on the Summit system to perform all-vs-all protein similarity search on one of the largest publicly available datasets with 405 million proteins, in less than 3.5 hours, cutting the time-to-solution for many use cases from weeks. The variability of protein sequence lengths, as well as the sparsity of the space of pairwise comparisons, make this a challenging problem in distributed memory. Due to the need to construct and maintain a data structure holding indices to all other sequences, this application has a huge memory footprint that makes it hard to scale the problem sizes. We overcome this memory limitation by innovative matrix-based blocking strategies, without introducing additional load imbalance.

Per the SC22 schedule, this team includes researchers from Indiana University, the Institute for Fundamental Biomedical Research, the Department of Energy’s Joint Genome Institute, Lawrence Berkeley National Lab, Microsoft, NERSC and the University of California, Berkeley.

In an email to HPCwire, the team stressed the importance of this research area to crucial fields. “Many-against-many sequence search is the backbone of biological sequence analysis used in drug discovery, healthcare, bioenergy, and environmental studies,” they wrote. “Our work is perhaps the first [Gordon Bell] finalist for a biological sequence analysis problem, which is surprising because sequence analysis is a perfect supercomputing application due to its data and compute intensive nature.”

“Our pipeline, PASTIS, performs a novel application of sparse matrices to narrow down the search space and to avoid quadratic number of sequence comparisons. Sparse matrix computations are much harder to map efficiently to modern supercomputing hardware, especially to GPU-equipped supercomputers such as the Summit system we’ve used in this work. Our approach cuts back the turnaround time from days to minutes in finding similar sequences in huge protein datasets to complete the subsequent analytical steps in bioinformatics and allow for exploratory analysis of data sets under different parameter settings.”
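One way to picture how a sparse matrix can narrow the search space: treat each sequence as a sparse row of short substrings (k-mers), and only pairs of sequences that share at least one k-mer ever become candidates for a full comparison. The Python sketch below illustrates that general idea under stated assumptions. The k-mer formulation, the function names and the toy sequences are hypothetical, and this is not the PASTIS implementation.

```python
from collections import defaultdict
from itertools import combinations


def kmers(seq, k):
    """Set of length-k substrings: the nonzero columns of this sequence's sparse row."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_pairs(seqs, k=3, min_shared=1):
    """Pairs of sequences sharing at least min_shared k-mers, with the shared count.

    Equivalent to the nonzeros above the diagonal of A * A^T, where A is the
    sparse sequence-by-k-mer indicator matrix.
    """
    # Inverted index: for each k-mer (column of A), the sequences containing it.
    index = defaultdict(list)
    for sid, seq in enumerate(seqs):
        for km in kmers(seq, k):
            index[km].append(sid)

    # Accumulate shared k-mer counts only for pairs that co-occur in some column.
    shared = defaultdict(int)
    for ids in index.values():
        for i, j in combinations(ids, 2):
            shared[(i, j)] += 1
    return {pair: count for pair, count in shared.items() if count >= min_shared}


# Hypothetical toy input: two related sequences and one unrelated one.
seqs = ["MKVLAA", "KVLAAG", "WWWWWW"]
pairs = candidate_pairs(seqs, k=3)
```

Here only the first two sequences are paired, so the unrelated third sequence is never compared against anything, which is how the quadratic number of comparisons is avoided when most pairs share nothing.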

What’s next

That’s all of them. For those keeping score at home: three finalist teams used Fugaku; three used Summit; two used Frontier; and OceanLight, Perlmutter and Shaheen-2 were each used by one finalist team. We’re still awaiting the reveal of the finalists for the Gordon Bell Special Prize for High Performance Computing-Based Covid-19 Research, which will be awarded for the third time at SC22. At SC22 itself (set to be held in Dallas from November 13-18) the finalists for both Gordon Bell Prizes will present their research ahead of the award ceremony.
