Overview

H5CPP is a modern C++ interface to HDF5 for persisting vectors, matrices, tensors, strings, and structs with concise template-based code, while preserving full interoperability with the native HDF5 C API. From research and simulation to market data and production, storage lies on the critical path of many quantitative systems. In practice, H5CPP removes much of the plumbing that direct HDF5 use in C and C++ so often entails. With LLVM-based compiler-assisted reflection, even complex structs can often be handled with a one-line expression.

H5CPP began from a practical requirement: efficient storage for large numerical datasets with both indexed block access and sequential streaming. Existing serialization systems handled streams reasonably well, but not the kind of partial, multidimensional, file-backed access needed in numerical computing and financial engineering. HDF5 already had most of the right storage primitives: partial I/O, extendable datasets, compression, and broad interoperability across operating systems and scientific environments. What was missing was a modern C++ interface.

The earliest implementation began as a collection of templates. Later, contact with The HDF Group helped shape the first H5CPP11 project, followed in time by the C++17 version. I am especially thankful to Gerd Heber for his sustained guidance and generosity over the years; to Elena Pourmal and David Pearah for their encouragement, support, and influence on the project’s direction; and, from Fermilab, to Marc Paterno and Chris for their thoughtful input. Many of the stronger ideas in H5CPP were sharpened through those discussions; any mistakes, omissions, or rough edges are entirely my own.

What You End Up Writing by Hand

  • manual datatype and dataspace construction
  • repetitive shape bookkeeping
  • verbose read/write setup
  • resource lifetime management
  • impedance mismatch with modern C++ container types
  • added complexity when moving to MPI-enabled parallel I/O
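
To make the shape bookkeeping and container mismatch concrete: a C-style I/O call generally wants one contiguous buffer plus an explicit list of dimensions, whereas application code often holds something like a nested std::vector. The sketch below is a minimal, standard-library-only illustration of that hand-written step. FlatMatrix and flatten are hypothetical names introduced here; they are not part of HDF5 or H5CPP.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper type (illustration only): a contiguous row-major
// buffer plus its shape, the form a C-style (pointer, dims) API expects.
struct FlatMatrix {
    std::vector<double> data;  // rows * cols values, row-major
    std::size_t rows = 0;
    std::size_t cols = 0;
};

// Copy a nested vector into one contiguous buffer and record its shape.
// Returns false if the rows are ragged, because a ragged container has
// no single rectangular shape to describe.
bool flatten(const std::vector<std::vector<double>>& nested, FlatMatrix& out) {
    out.data.clear();
    out.rows = nested.size();
    out.cols = nested.empty() ? 0 : nested.front().size();
    out.data.reserve(out.rows * out.cols);
    for (const auto& row : nested) {
        if (row.size() != out.cols) return false;
        out.data.insert(out.data.end(), row.begin(), row.end());
    }
    return true;
}
```

With the raw C API, rows and cols would then feed the dataspace description and data.data() the write call. This is the kind of step a template-based interface can derive from the container type instead.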

Where That Complexity Shows Up

  • structured event data such as trades, quotes, fills, and order updates
  • dense numerical datasets such as returns, features, factors, and signal matrices
  • simulation, backtesting, and numerical research code
  • shared datasets reused across research, replay, and production workflows
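
For structured records such as trades, the hand-written part is largely a per-member description of the struct. In the HDF5 C API a compound datatype is created with H5Tcreate(H5T_COMPOUND, size) and then populated with one H5Tinsert call per member, each needing a name, a byte offset, and a member type. The sketch below shows only that bookkeeping, using the standard library and no HDF5 calls. The Trade record, FieldInfo, and describe_trade are hypothetical names chosen for illustration, and this is not how H5CPP itself represents the information.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record type, chosen for illustration only.
struct Trade {
    std::int64_t  timestamp_ns;
    double        price;
    std::uint32_t quantity;
    char          side;  // 'B' or 'S'
};

// One entry per member: the facts a compound-type description needs.
struct FieldInfo {
    std::string name;
    std::size_t offset;  // byte offset of the member inside Trade
    std::size_t size;    // size of the member in bytes
};

// Hand-maintained table: it must be edited whenever Trade changes,
// which is the repetitive step that generated descriptors can remove.
std::vector<FieldInfo> describe_trade() {
    return {
        {"timestamp_ns", offsetof(Trade, timestamp_ns), sizeof(Trade::timestamp_ns)},
        {"price",        offsetof(Trade, price),        sizeof(Trade::price)},
        {"quantity",     offsetof(Trade, quantity),     sizeof(Trade::quantity)},
        {"side",         offsetof(Trade, side),         sizeof(Trade::side)},
    };
}
```

Every added or reordered member means another manual edit to this table, and a stale entry silently corrupts the stored layout. That maintenance burden is what compiler-assisted reflection is meant to take over.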

What You Get with H5CPP

  • fast to integrate
  • natural in modern C++
  • safer through RAII
  • well suited to structured records
  • well suited to dense numerical arrays
  • ready to scale from local workflows to parallel I/O
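
The RAII point can be illustrated without any HDF5 dependency. The sketch below assumes a mock C-style handle API (mock_open, mock_close, and the open-handle counter are hypothetical stand-ins, not HDF5 or H5CPP functions) and wraps it in a move-only owner, so that handles are released on every exit path, including exceptions. It shows the general technique only and makes no claim about H5CPP's internal handle types.

```cpp
#include <stdexcept>
#include <utility>

// --- Mock C-style handle API: hypothetical stand-in, NOT the HDF5 API ---
static int g_open_handles = 0;  // number of handles currently open
static int g_next_id = 0;       // next identifier to hand out

inline int  mock_open()     { ++g_open_handles; return g_next_id++; }
inline void mock_close(int) { --g_open_handles; }
inline int  open_handles()  { return g_open_handles; }

// Move-only RAII owner: the handle is closed exactly once, in the destructor.
class Handle {
public:
    Handle() : id_(mock_open()) {}
    ~Handle() { if (id_ >= 0) mock_close(id_); }

    Handle(const Handle&) = delete;             // no accidental double close
    Handle& operator=(const Handle&) = delete;

    Handle(Handle&& other) noexcept : id_(other.id_) { other.id_ = -1; }
    Handle& operator=(Handle&& other) noexcept {
        if (this != &other) {
            if (id_ >= 0) mock_close(id_);      // release what we currently own
            id_ = other.id_;
            other.id_ = -1;                     // moved-from object owns nothing
        }
        return *this;
    }

    int id() const { return id_; }

private:
    int id_;
};

// Opens two handles and then fails; both are released during stack unwinding.
inline void work_that_throws() {
    Handle file;
    Handle dataset;
    throw std::runtime_error("simulated I/O failure");
}
```

With paired open and close calls written by hand, the failure path in work_that_throws would need explicit cleanup before every early exit. Here the destructors do that work.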

H5CPP has been presented in HDF5 and C++ community venues over multiple years, including HUG sessions, HDF Group events, C++ community talks, and ISC-related material. Topics include compiler-assisted reflection, POD introspection, MPI/parallel I/O, throughput/latency trade-offs, and practical HDF5 workflows.