Overview
H5CPP is a modern C++ interface to HDF5 for persisting vectors, matrices, tensors, strings, and structs with concise template-based code, while preserving full interoperability with the native HDF5 C API. In practice, it removes much of the needless plumbing that direct HDF5 use in C and C++ so often entails. From research and simulation to market data and production, storage lies on the critical path of many quantitative systems. With LLVM-based compiler-assisted reflection, even complex structs can often be handled with a one-line expression.
H5CPP began from a practical requirement: efficient storage for large numerical datasets with both indexed block access and sequential streaming. Existing serialization systems handled streams reasonably well, but not the kind of partial, multidimensional, file-backed access needed in numerical computing and financial engineering. HDF5 already had most of the right storage primitives: partial I/O, extendable datasets, compression, and broad interoperability across operating systems and scientific environments. What was missing was a modern C++ interface.
The earliest implementation began as a collection of templates. Later, contact with The HDF Group helped shape the first H5CPP11 project, followed in time by the C++17 version. I am especially thankful to Gerd Heber for his sustained guidance and generosity over the years; to Elena Pourmal and David Pearah for their encouragement, support, and influence on the project’s direction; and, from Fermilab, to Marc Paterno and Chris for their thoughtful input. Many of the stronger ideas in H5CPP were sharpened through those discussions; any mistakes, omissions, or rough edges are entirely my own.
What You End Up Writing by Hand
- manual datatype and dataspace construction
- repetitive shape bookkeeping
- verbose read/write setup
- resource lifetime management
- impedance mismatch with modern C++ container types
- added complexity when moving to MPI-enabled parallel I/O
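To make the shape bookkeeping item concrete, here is a minimal sketch of how templates can recover rank and extents from a container type, so the caller does not track them separately. It uses only the standard library. The names `shape_of` and `element_count` are hypothetical and chosen for illustration; they are not part of H5CPP or HDF5, and the library's own mechanism may differ.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an H5CPP or HDF5 function): returns the extents
// of a container so they need not be passed around by hand.
template <class T>
std::vector<std::size_t> shape_of(const std::vector<T>& v) {
    return { v.size() };        // rank 1, extent known only at run time
}

// Overload for a fixed-size 2-D array: both extents are part of the type.
template <class T, std::size_t Rows, std::size_t Cols>
std::vector<std::size_t> shape_of(const std::array<std::array<T, Cols>, Rows>&) {
    return { Rows, Cols };      // rank 2, row-major order
}

// Total number of elements: the product of all extents.
inline std::size_t element_count(const std::vector<std::size_t>& shape) {
    std::size_t n = 1;
    for (std::size_t d : shape) n *= d;
    return n;
}
```

With the plain C API, the same extents would typically be copied by hand into an `hsize_t` array before a dataspace is created; deducing them from the argument type removes one place where the code and the data can disagree.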
Where That Complexity Shows Up
- structured event data such as trades, quotes, fills, and order updates
- dense numerical datasets such as returns, features, factors, and signal matrices
- simulation, backtesting, and numerical research code
- shared datasets reused across research, replay, and production workflows
What You Get with H5CPP
- fast to integrate
- natural in modern C++
- safer through RAII
- well suited to structured records
- well suited to dense numerical arrays
- ready to scale from local workflows to parallel I/O
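The RAII point can be sketched with a small move-only guard around a C-style handle. This is an illustration under stated assumptions, not H5CPP's implementation: `raw_open`, `raw_close`, `open_count`, and `handle_guard` are hypothetical stand-ins for a handle-based C API, and negative values are assumed to mean "no handle".

```cpp
#include <utility>

// Hypothetical stand-in for a C-style handle API (not HDF5 itself).
using raw_handle = long;

// Counts handles that are currently open, so leaks are observable.
inline int& open_count() {
    static int n = 0;
    return n;
}

inline raw_handle raw_open() { ++open_count(); return 42; }
inline void raw_close(raw_handle) { --open_count(); }

// Move-only guard: closes the handle exactly once, when the owner is destroyed.
class handle_guard {
public:
    explicit handle_guard(raw_handle h) noexcept : h_(h) {}
    ~handle_guard() { reset(); }

    handle_guard(const handle_guard&) = delete;
    handle_guard& operator=(const handle_guard&) = delete;

    // Moving transfers ownership and leaves the source empty (-1).
    handle_guard(handle_guard&& other) noexcept
        : h_(std::exchange(other.h_, -1)) {}
    handle_guard& operator=(handle_guard&& other) noexcept {
        if (this != &other) {
            reset();
            h_ = std::exchange(other.h_, -1);
        }
        return *this;
    }

    raw_handle get() const noexcept { return h_; }

private:
    // Closes the held handle, if any, and marks the guard as empty.
    void reset() noexcept {
        if (h_ >= 0) {
            raw_close(h_);
            h_ = -1;
        }
    }
    raw_handle h_;
};
```

Because the destructor runs on every exit path, including early returns and exceptions, the close call cannot be forgotten, which is the failure mode that manual handle management tends to invite.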
H5CPP has been presented in HDF5 and C++ community venues over multiple years, including HUG sessions, HDF Group events, C++ community talks, and ISC-related material. Topics include compiler-assisted reflection, POD introspection, MPI/parallel I/O, throughput/latency trade-offs, and practical HDF5 workflows.