Graph Algorithms in the Language of Linear Algebra Approaches C++ Standardization
September 24, 2020 • Article
September 24, 2020—Graph algorithms and large-scale machine learning (ML) algorithms are key components of high-performance computing (HPC) analysis of large datasets, such as in intelligence, power grid analysis, health care, genetics, and chemistry. A number of different HPC graph frameworks are written in C++, but there is currently no agreement on the application programming interface (API) that should be used to develop graph analysis applications. A project headed by the SEI Emerging Technology Center’s (ETC) research scientist Scott McMillan recently took a step toward standardizing graph algorithm application development in C++.
The GraphBLAS, Basic Linear Algebra Subprograms for Graphs, spearheaded by McMillan and collaborators from industry, government, and academia, is a community-driven, open programming specification for graph analysis. The specification makes the development of high-performance graph algorithms simpler and hardware agnostic by defining the algorithms in terms of higher-level linear algebraic operations.
“Graph computations are fundamental to many important defense and national security applications,” said Matt Gaston, director of the ETC. These applications include logistics, intelligence, and mission planning. “GraphBLAS enables standardized and scalable development of graph applications as well as the ready adoption of future advanced computing capabilities.”
This summer, the ETC released version 3.0 of the GraphBLAS Template Library (GBTL), which adds functionality to the implementation of GraphBLAS in C++, one of the most common high-performance programming languages. “A number of researchers and developers performing graph analysis work have requested a C++ library because they have large-scale systems that are written in C++,” said McMillan. Some of these researchers, from the Department of Energy, are working large-scale graph and machine-learning problems on behalf of the U.S. government.
The ETC, with past help from Indiana University, initially developed the GBTL almost five years ago as a testbed for the development of the GraphBLAS API for the C programming language. With recent help from the Department of Energy’s Pacific Northwest National Laboratory (PNNL) and Carnegie Mellon University’s Electrical and Computer Engineering Department, the ETC has been working to improve the performance of the GBTL.
Though GraphBLAS was originally intended to facilitate graph analysis, its ability to perform a large class of linear algebra operations means it has many other potential applications. GraphBLAS can be used for applications requiring special algebraic operations based on semirings. It can perform these operations on both dense matrices as well as large sparse and hypersparse matrices populated by a relatively small amount of data, such as occur in social networks, cybersecurity, and biology.
The complexities of these special algebraic operations, coupled with different matrix storage types, make the proposed GraphBLAS API especially suited for efficient and more maintainable implementation in C++. “The efficient implementations in C are much more complicated,” said McMillan, drawing a comparison with the C programming language, which the team is also using. “A C implementation can take more than 10 times the amount of code to implement than in C++.”
Another beneficiary of GraphBLAS’ mathematical specification and the GBTL implementation is Spiral, a project led by CMU’s Franz Franchetti. Spiral aims to automatically generate code that will allow platform developers to realize faster and cheaper high-performance artificial intelligence and ML applications on leading-edge hardware architectures. The SEI and CMU team is expanding Spiral’s capabilities to tackle the irregular, data-intensive computations required for graphs, artificial intelligence (AI), and ML algorithms that can be expressed with GBTL.
With the updated library in hand, the GraphBLAS Forum committee is currently creating a formal specification for a C++ API. When complete, the library and API together will allow application developers to start building portable graph analysis programs in C++.
“Scott’s contributions to the GraphBLAS community in general and his specific contributions to the creation of both the GraphBLAS C and C++ APIs make the power of GraphBLAS available to software developers in popular languages for high-performance application development,” said Gaston. “Drawing on the collective expertise and knowledge of the Software Engineering Institute, Scott provides leadership and deep expertise to the GraphBLAS community, including a strong mathematical foundation, extensive software engineering knowledge and experience, and a passion for scaling mission-critical technologies.”
The next step for the SEI and CMU GraphBLAS team is to further improve the performance of GBTL for multithreaded systems. Version 3.0 of the library already significantly improved the single-threaded runtime of the GraphBLAS primitive operations, or standardized building blocks of graph algorithms. But there is still much room for improvement. “When the C++ specification is released, the plan is for GBTL to be a quality reference implementation, not only in terms of functionality but also in terms of speed of computation,” said McMillan.
The GBTL v3.0 is available at https://github.com/cmu-sei/gbtl. Learn more about the GraphBLAS Forum at http://graphblas.org. Read more about GraphBLAS from McMillan and his collaborators at https://resources.sei.cmu.edu/library/asset-view.cfm?assetid=644365, and read McMillan’s SEI Blog posts at https://insights.sei.cmu.edu/author/scott-mcmillan/.