X-Stack FP: ComPort

For showing good ComPort by ensuring that numerically-intense code can be reliably ported across heterogeneous hardware and software.

About

Website for the DOE X-Stack award for reliable software addressing upcoming machines.

RESEARCH TEAM

* University of Utah: PI: Ganesh Gopalakrishnan, Co-PI: Pavel Panchekha
* University of California, Davis: PI: Cindy Rubio-Gonzalez
* University of Washington, Seattle: PI: Zachary Tatlock
* Lawrence Livermore National Laboratory: PI: Ignacio Laguna
* Pacific Northwest National Laboratory: PI: Ang Li

RESEARCH SUMMARY As part of the recently announced DOE-investment in Research on Adapting Scientific Software to Run on Next-Generation Supercomputers, the ComPort project develops Rigorous Testing Methods to Safeguard Software Porting.
Modern research is crucially based on the use of high-performance computing (HPC) working in tandem with machine learning (ML). With growing heterogeneity of hardware (CPUs, GPUs, accelerators) and software (various parallelism models), the overall numerical integrity of code can be affected by a variety of causes, not all of which are fully understood or even have been encountered in existing HPC systems. Problems such as innocuous-looking changes to compiler optimization flags leading to aberrant climatic predictions are already being faced by researchers. Similar variations can cause ML systems to misclassify data or program states, leading to incorrect HPC software behaviors in combined HPC/ML systems already in use, such as for characterizing the Sars COV-2 virus.
The ComPort project develops rigorous methods to verify – after each software upgrade or port – whether computational results agree with expected answers (say, as delivered by prior versions that have stood the test of time). It empowers the user to define how to rigorously test the numerical behavior of hardware and software, and also specify what results to accept. The ComPort software tool suite will support all this while also providing a high degree of automation and high-level user feedback to diagnose and repair software applications to facilitate software numerical correctness maintenance despite changing hardware and compilers.

This diagram from CACM February 2021, ``Keeping Science on Keel when Software Moves,’’ is one example of several that might be produced to characterize large codebases. This visual shows how a climate simulation code called CESM was analyzed and portrayed by Dan Millroy (LLNL postdoc) during his PhD (visualization credit: Liam Krauss of LLNL). This figure shows a three-dimensional, undirected representation of the example from Figure 6 in the CACM paper. Nodes are colored by community membership and sized based on a threshold centrality value. The red nodes represent model variables sensitive to specific CPU instructions. All nodes with eigenvector centrality ≤ 0.4 have a constant size, and those above the threshold are scaled and highlighted by increased reflectance.

X-Stack FP: ComPort

X-Stack FP: ComPort

About

Blog Posts

Welcome Blog

test blog post 00

Test Title

test blog post 01

Test Title