Data Understanding

  • DIY is a block-parallel library for implementing scalable algorithms that can execute both in-core and out-of-core.

  • GraphBLAS is an open effort, including an API, to define standard building blocks for graph algorithms in the language of linear algebra.
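The linear-algebra formulation GraphBLAS standardizes can be illustrated with plain NumPy (this is a conceptual sketch, not the GraphBLAS API): breadth-first search becomes repeated matrix-vector products over an adjacency matrix.

```python
import numpy as np

# Adjacency matrix of a small directed graph (hypothetical example):
# an edge i -> j is stored as A[j, i] = 1, so A @ v advances a frontier.
A = np.array([
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 1, 0],
], dtype=int)

def bfs_levels(A, source):
    """BFS expressed as repeated (masked) matrix-vector products."""
    n = A.shape[0]
    frontier = np.zeros(n, dtype=bool)
    frontier[source] = True
    visited = frontier.copy()
    levels = np.full(n, -1)
    levels[source] = 0
    level = 0
    while frontier.any():
        level += 1
        # One BFS step: next frontier = A * frontier, masked by unvisited.
        frontier = (A @ frontier).astype(bool) & ~visited
        levels[frontier] = level
        visited |= frontier
    return levels

levels = bfs_levels(A, 0)  # levels[i] = BFS distance of node i from node 0
```

In GraphBLAS proper, the same step is a masked matrix-vector multiply over a boolean semiring on sparse structures; the dense NumPy version above only shows the algebraic shape of the algorithm.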

  • ParaView is an open-source, multi-platform application for interactive scientific visualization. Catalyst, its in situ library, coordinates simulation, analysis, and visualization tasks.

  • Tess is a parallel Delaunay and Voronoi tessellation library. It includes support for density estimation.
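The structures Tess computes in parallel can be shown on a tiny point set with SciPy (an illustration of the tessellations themselves, not Tess's API):

```python
import numpy as np
from scipy.spatial import Delaunay, Voronoi

# A small 2-D point set: the four corners of a unit square plus its center.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])

tri = Delaunay(points)   # triangles covering the convex hull of the points
vor = Voronoi(points)    # dual diagram: one cell per input site

n_triangles = len(tri.simplices)
# Tessellation-based density estimation (as in Tess) weights each site by
# the inverse volume of its Voronoi cell: small cells imply high density.
```

For this configuration the Delaunay triangulation is the fan of four triangles meeting at the center point.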

  • VisIt is an open-source interactive, scalable visualization, animation, and analysis tool. Its libsim library enables its use in situ with simulations.

  • VTK-m is a toolkit of scientific visualization algorithms for emerging processor architectures. It provides the fine-grained concurrency that data analysis and visualization algorithms require at extreme scale.

  • EDDA is distribution-based data analysis and visualization software for in situ analytics. Built on Gaussian mixture models, probability distributions, and information theory, EDDA provides C++ and Python APIs that help scientists preserve salient information from their simulation output while delivering high-quality visualization and achieving significant data reduction.

  • FTK is a library that scales, simplifies, and delivers feature tracking algorithms for scientific datasets.

  • MFA is an open-source package for modeling scientific data with functional approximations based on high-dimensional multivariate B-spline and NURBS bases.
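A one-dimensional analogue of the functional models MFA builds can be sketched with SciPy's B-spline routines (illustration only, not MFA's API or its high-dimensional solvers):

```python
import numpy as np
from scipy.interpolate import splrep, splev

# Sample a smooth signal and model it with a cubic B-spline; MFA does the
# analogous fit in many dimensions with multivariate B-spline/NURBS bases.
x = np.linspace(0.0, 2.0 * np.pi, 50)
y = np.sin(x)

tck = splrep(x, y, k=3, s=0.0)   # knots, coefficients, and degree
y_fit = splev(x, tck)            # evaluate the functional model anywhere

max_err = float(np.max(np.abs(y_fit - y)))  # fit error at the samples
```

Once the data is encoded as spline coefficients, evaluation, differentiation, and resampling all operate on the compact functional model rather than on the raw data.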

  • Henson is a cooperative multitasking system for in situ processing, supporting programmable iterative execution of tasks.

  • Decaf is a dataflow system for the parallel communication of coupled tasks in an HPC workflow.

  • SENSEI is a system for scalable in situ analysis, visualization and code coupling, and is part of the DOE HPC Data-Vis SDK.

  • Seer is a lightweight wrapper library for enabling customized in situ capabilities to simulations.

Platform Readiness

  • Roofline is a visually intuitive performance model and set of tools developed to understand how computation, data movement, and locality constrain performance on modern multicore, manycore, and GPU-accelerated systems.
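The core of the roofline model fits in a few lines: attainable performance is bounded by either the compute ceiling or memory bandwidth times arithmetic intensity. The machine numbers below are hypothetical.

```python
# Minimal roofline model (hypothetical machine):
peak_gflops = 1000.0   # compute ceiling (GFLOP/s)
peak_bw = 100.0        # memory bandwidth ceiling (GB/s)

def attainable(ai):
    """ai = arithmetic intensity of a kernel, in FLOPs per byte moved."""
    return min(peak_gflops, peak_bw * ai)

# A stream-like kernel (low intensity) is bandwidth bound ...
low = attainable(0.25)
# ... while a dense-matmul-like kernel (high intensity) is compute bound.
high = attainable(50.0)

# The "ridge point" is the intensity where the two ceilings meet;
# kernels to its left are memory bound, kernels to its right compute bound.
ridge = peak_gflops / peak_bw
```

Plotting `attainable(ai)` on log-log axes against measured kernel intensities yields the familiar roofline chart.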

  • TAU Performance System is a portable profiling and tracing toolkit for performance analysis of parallel programs written in Fortran, C, C++, UPC, Java, Python, and others. SOSflow (Scalable Observation System for Scientific Workflows) provides a flexible, scalable, and programmable framework for observation, introspection, feedback, and control of HPC applications. APEX (Autonomic Performance Environment for eXascale) is a profiling and tracing library for asynchronous user-level threading systems without conventional call stacks or call graphs. APEX can capture performance data and adapt runtime behavior for applications written in HPX, OpenMP, OpenACC, Kokkos, CUDA, HIP/ROCm, and more.

  • Papyrus is a programming system that provides scalable, aggregate, persistent memory in an extreme-scale system for typical HPC usage scenarios. Papyrus provides a portable and scalable programming interface to access and manage parallel data structures on distributed NVM storage.

  • DRAGON enables all classes of GPGPU applications to transparently compute on terabyte datasets residing in NVM while ensuring the integrity of data buffers as necessary for NVM. DRAGON leverages the page-faulting mechanism on recent NVIDIA GPUs by extending the capabilities of CUDA Unified Memory (UM). Further, DRAGON improves overall performance by dynamically optimizing accesses to NVM.

  • OpenARC is the first open-source OpenACC/OpenMP compiler to support Altera FPGAs, in addition to NVIDIA/AMD GPUs and Intel Xeon Phis. OpenARC offers various additional directives and environment variables for internal tracing and architecture-specific optimizations. Combined with its built-in tuning tools, OpenARC lets users control the overall OpenACC-to-accelerator translation and optimization in a fine-grained but still abstract manner, offering very high tunability.

  • Clacc is an OpenACC-to-OpenMP4 translation framework, which builds on clang’s existing OpenMP compiler/runtime support and allows OpenACC programs to be compiled by the production-quality clang/LLVM programming system. OpenACC support in clang/LLVM will facilitate the programming of GPUs and other accelerators in DOE applications, and it will provide a popular compiler platform on which to perform research and development for related optimizations and tools (e.g., static analyzers, debuggers, editor extensions).

  • CSPACER is an efficient lightweight communication runtime for distributed memory computing. It implements the space consistency abstraction and can be used to implement custom communication schedules for irregular communication patterns.

  • CIVL is a formal software correctness verification tool, using symbolic execution and model checking.

  • Orio is an extensible autotuning framework that supports the rapid definition of new domain languages and code generators and enables efficient exploration of the optimization search space through a variety of numerical optimization strategies.

  • Performance Portability Solution is a collection of tools that provide mechanisms to (1) unify code variants meant for different devices; (2) orchestrate data movement between devices; and (3) generate a map of computation to devices through coarse dependency analysis. The tools can work with codes written in any programming language.

Scientific Data Management

  • ADIOS provides a simple, flexible way for scientists to describe the data in their code that may need to be written, read, or processed outside of the running simulation. By providing an XML file, external to the code, that describes the various elements, their types, and how to process them in a given run, the routines in the host code (either Fortran or C) can transparently change how they process the data.

  • DataSpaces is a middleware library and runtime providing asynchronous coupling of codes using RDMA for memory-memory data transfer.

  • Darshan is a toolkit for characterizing the I/O behavior of applications, used in production at many DOE compute facilities.

  • FastBit is an open-source data processing library in the spirit of the NoSQL movement. It offers a set of searching functions supported by compressed bitmap indexes. It treats user data in a column-oriented manner and is able to accelerate users' data selection tasks without imposing undue requirements.
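The bitmap-index idea behind FastBit can be sketched in a few lines of NumPy (a conceptual illustration, not FastBit's API or its compressed encodings): one bitmap per distinct value turns selections into bitwise operations.

```python
import numpy as np

# A column of values, as in column-oriented storage.
column = np.array([3, 1, 3, 2, 1, 3])

# Build one bitmap (boolean vector) per distinct value.
index = {v: column == v for v in np.unique(column)}

# Answer "WHERE col = 1 OR col = 2" as a bitwise OR of two bitmaps,
# without scanning the column again.
hits = index[1] | index[2]
rows = np.flatnonzero(hits)   # row ids of matching records
```

FastBit additionally compresses these bitmaps (with word-aligned run-length schemes) so the index stays small even for high-cardinality columns.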

  • HDF5 is a data model, library, and file format for storing and managing data. It supports an unlimited variety of data types and is designed for flexible and efficient I/O and high volume and complex data. HDF5 is portable and is extensible, allowing applications to evolve in their use of HDF5. The HDF5 Technology suite includes tools and applications for managing, manipulating, viewing, and analyzing data in the HDF5 format.
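A minimal HDF5 round trip can be shown with h5py, one of HDF5's language bindings (the file path and dataset name here are arbitrary examples):

```python
import os
import tempfile

import h5py
import numpy as np

# Write a dataset with an attribute, then read both back.
path = os.path.join(tempfile.mkdtemp(), "example.h5")

data = np.arange(12, dtype=np.float64).reshape(3, 4)
with h5py.File(path, "w") as f:
    # Groups ("grid/") and datasets form a hierarchy inside one file.
    dset = f.create_dataset("grid/temperature", data=data)
    dset.attrs["units"] = "K"   # self-describing metadata travels with the data

with h5py.File(path, "r") as f:
    back = f["grid/temperature"][:]          # read the whole dataset
    units = f["grid/temperature"].attrs["units"]
```

The same file can later be opened by any HDF5-aware tool or language binding, since the types, shapes, and attributes are stored in the file itself.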

  • MGARD (MultiGrid Adaptive Reduction of Data) is a technique for multilevel lossy compression of scientific data based on the theory of multigrid methods.

  • Mochi is an open ecosystem enabling the development of a variety of distributed services supporting the data-related needs of DOE scientists.

  • PnetCDF is a high performance, parallel I/O library for storing and accessing data in the NetCDF format.

  • ROMIO is a portable implementation of the I/O portion of the MPI standard, included in most vendor MPI implementations.

Artificial Intelligence / Machine Learning

  • DeepHyper is a scalable, open-source software package for automated machine/deep learning. It comprises two components: Neural Architecture Search (NAS), a fully automated search for high-performing deep neural network architectures, and Hyperparameter Search (HPS), which optimizes the hyperparameters of a given reference model.
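What an HPS loop does can be sketched with a generic random search (a conceptual illustration only, not DeepHyper's API; the objective function and parameter names are hypothetical stand-ins):

```python
import random

random.seed(0)

def objective(config):
    # Stand-in for the validation score of a model trained with `config`;
    # it peaks near lr=0.01 and units=64 in this toy example.
    lr, units = config["lr"], config["units"]
    return -(lr - 0.01) ** 2 - ((units - 64) / 256.0) ** 2

best, best_score = None, float("-inf")
for _ in range(100):
    # Sample a point from the hyperparameter search space ...
    config = {"lr": random.uniform(1e-4, 1e-1),
              "units": random.choice([16, 32, 64, 128, 256])}
    # ... evaluate it, and keep the best configuration seen so far.
    score = objective(config)
    if score > best_score:
        best, best_score = config, score
```

DeepHyper replaces the random sampler with smarter search strategies and evaluates the (expensive) objective in parallel across HPC nodes, but the evaluate-and-select loop is the same.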

  • AutoMOMML is an end-to-end, machine-learning-based framework for building predictive models of objectives such as performance and power. The framework adopts statistical approaches to reduce modeling complexity and automatically identifies and configures the most suitable learning algorithm to model the required objectives based on hardware and application signatures.

  • CAGNET (Communication-Avoiding Graph Neural nETworks) is a family of parallel algorithms for training GNNs that can asymptotically reduce communication compared to previous parallel GNN training methods. CAGNET algorithms are based on 1D, 1.5D, 2D, and 3D sparse-dense matrix multiplication, and are implemented with torch.distributed on GPU-equipped clusters. The algorithms are also demonstrated on a 2-layer GCN.
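The sparse-dense product at the heart of GNN training can be shown on one process with NumPy (a dense stand-in for the sparse adjacency matrix; this is not CAGNET's torch.distributed implementation):

```python
import numpy as np

np.random.seed(0)
n, f_in, f_out = 6, 4, 3                 # nodes, input/output feature sizes

# Random sparse-ish adjacency matrix with self-loops, row-normalized,
# as in a standard GCN propagation rule.
A = (np.random.rand(n, n) < 0.3).astype(float)
A_hat = A + np.eye(n)
A_hat = A_hat / A_hat.sum(axis=1)[:, None]

X = np.random.rand(n, f_in)              # node feature matrix
W = np.random.rand(f_in, f_out)          # layer weight matrix

# One GCN layer: Z = ReLU(A_hat @ X @ W). The A_hat @ X term is the
# sparse-dense multiply that CAGNET's 1D/1.5D/2D/3D algorithms distribute.
Z = np.maximum(A_hat @ X @ W, 0.0)
```

In CAGNET's 1D algorithm, for example, rows of `A_hat` and `X` are partitioned across processes, and each step communicates only the feature blocks a process needs, which is where the communication savings come from.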

  • PCNN (Parallel Convolutional Neural Network) is designed to run CNN model training on multiple CPUs in parallel. Its implementation uses the MPI and OpenMP programming models.