H5CPP
v1.14.0
Modern C++ templates for HDF5 serial and parallel I/O
Compiler-assisted reflection — h5cpp's strategy for turning user-defined C++ types into HDF5 compound descriptors without intrusive macros. C++26 horizon, today's h5cpp-compiler, and the POD shortcut.
C++ doesn't (yet) ship reflection in its standard library. h5cpp needs reflection to map user-defined struct types onto HDF5 H5T_COMPOUND descriptors automatically — without forcing users to write boilerplate registration macros for every type.
H5CPP's answer is three parallel paths to the same user-facing surface, organised by what the compiler in your hand can do:
| Path | Mechanism | C++ standard needed | What you write |
|---|---|---|---|
| Native reflection (header-only, future) | std::meta::* from P2996 + annotations from P3394 | C++26 | Just your struct. Optionally [[=h5::name{"x"}]] annotations. |
| External tooling (today) | h5cpp-compiler (Clang-based AST walker) | C++17 / 20 / 23 | Just your struct. Pre-build step emits the descriptor. |
| POD macro (today, no tooling) | H5CPP_REGISTER_STRUCT(T) macro, which registers the type at runtime | C++17+ | Your struct + one macro call. |
The user-visible API (h5::write(fd, "x", my_struct)) is identical across all three. Migrating from h5cpp-compiler to the C++26 path when your compiler catches up requires changing no application code.
Top-down — from the most general future state, down to the narrowest current shortcut:
C++26 will (finally) ship language-level reflection. Two proposals drive it:
- P2996: std::meta::* introspection. Walk a type's non-static data members and query their types, names, and offsets, all at constexpr time.
- P3394: annotations of the form [[=annotation_value]], readable through std::meta::annotations_of.

Combined, these are enough for h5cpp to walk any user type at constexpr time and emit the HDF5 compound descriptor with no external tool.
At constexpr time, h5::compound_type_for<record>() walks std::meta::nonstatic_data_members_of(^^record), reads each member's std::meta::offset_of, std::meta::type_of, std::meta::identifier_of, and any attached std::meta::annotations_of, then assembles the H5T_COMPOUND descriptor as a static const hid_t per type (lazy, one-shot).
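The "lazy, one-shot" descriptor per type can be illustrated without C++26 at all. The following is a minimal C++17 sketch, not h5cpp code: `handle_t` stands in for `hid_t`, and `build_descriptor`, `descriptor_for`, and `build_count` are hypothetical names introduced only for this example. It shows one plausible way to get exactly one descriptor per type, built on first use.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for HDF5's hid_t so the sketch builds without libhdf5 (assumption).
using handle_t = std::int64_t;

// Counts how many descriptors were actually built (demonstration only).
inline int& build_count() {
    static int n = 0;
    return n;
}

// Hypothetical builder. A real one would walk the members of T and call
// H5Tcreate / H5Tinsert; this one only hands out a fresh fake handle.
template <class T>
handle_t build_descriptor() {
    ++build_count();
    return 1000 + build_count();
}

// One descriptor per type T, built on first call and reused afterwards.
// Function-local statics are initialised exactly once (thread-safe since C++11).
template <class T>
handle_t descriptor_for() {
    static const handle_t id = build_descriptor<T>();
    assert(id > 0 && "descriptor handle must be valid");
    return id;
}

// Hypothetical user types.
struct record { int id; double value; };
struct sample { float x; float y; };
```

Calling `descriptor_for<record>()` twice returns the same handle and runs the builder once; `descriptor_for<sample>()` gets its own handle. Under C++26 the body of the builder would be the reflection walk described above.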
Key strategic point: the annotation handles are the same value types users already use at call sites. h5::chunk{1024} works the same way attached to a field (C++26) as it does inside an h5::write(...) call (today). One vocabulary, two usage sites — the syntax envelope changes ((...) at call sites, {...} inside [[=...]]), but the semantics don't.
Timeline: P2996 expected to land in C++26 (committee draft 2025, ratification 2026). GCC 16.1+, Clang 21+, MSVC 2026 are the likely first compilers with the feature complete. h5cpp is ready to add a h5cpp/reflection/ header layer the day the first major compiler ships a complete implementation.
See h5cpp-compiler Multi-Backend Architecture for the broader strategy document, including the multi-backend rollout plan that the C++26 reflection path will eventually feed.
The same descriptor that C++26 reflection will eventually emit inline, today's h5cpp-compiler emits via Clang Tooling at pre-build time. The user-facing experience is identical — write your struct, build, get HDF5 read/write support — but the type walker runs in a separate process.
Project: vargalabs/h5cpp-compiler, a Clang LibTooling-based AST walker. It runs as a pre-build step and emits the descriptor + scatter/gather specialisations as a header that the main h5cpp dispatch picks up automatically.
For a non-trivial type — one with std::vector, std::string, nested compounds, std::map<scalar, vector>, etc. — the compiler emits three pieces of generated code:
- The H5T_COMPOUND descriptor for the type.
- A gather<T> specialisation that walks the struct, packs variable-length fields into hvl_t relays, and hands the compound buffer to H5Dwrite. Zero-copy on write: hvl_t.p points directly into vector.data().
- A scatter<T> specialisation that reverses the process after H5Dread. One copy on read: HDF5's VLEN allocator produces the buffers, scatter .assign()s into the user vector, then calls H5Treclaim.

| Attribute on record / its fields | Emitted into |
|---|---|
| [[h5::dataset("/records")]] | Default storage path baked into the call-site dispatch |
| [[h5::version("2.1")]] | @version attribute on the dataset; consumed by on-read schema-migration logic |
| [[h5::doc(...)]] (class) | Dataset-level description attribute |
| [[h5::name("ID")]] | HDF5 field name "ID" instead of "id" in the compound |
| [[h5::index]] | Side-band index registration in gather |
| [[h5::doc(...)]] (field) | Per-field description attribute |
| [[h5::chunk(1024)]] | default_dcpl_for<record>() chunk shape |
| [[h5::gzip(8)]] \| h5::shuffle | Filter pipeline composed into the dcpl |
| [[h5::name("display_name")]] | HDF5 field name override |
| [[h5::on_missing("default")]] | scatter falls back to T{} when the on-disk record lacks this field; backward-compatible with v1 readers |
| [[h5::tag("schema_v2")]] | Schema migration tag; drives the multi-backend producer to emit equivalent versioning in Protobuf / JSON Schema / SQL |
| [[h5::ignore]] | Field omitted from the staging compound; not persisted |
h5cpp-compiler reads C++ attributes attached to fields to drive per-field behaviour. This is the same vocabulary that becomes annotations under C++26:
| Attribute | Effect |
|---|---|
| [[h5::name("x")]] | Override the on-disk field name |
| [[h5::ignore]] | Skip this field; don't include it in the compound |
| [[h5::chunk(1024)]] | Set chunk shape (per-dataset, applied to vector fields) |
| [[h5::gzip(8)]] | Compress this field |
| [[h5::on_missing("default")]] | Behaviour when an older file lacks this field |
| [[h5::tag("schema_v2")]] | Per-field schema tag for migrations |
| [[h5::doc("...")]] | Attach a documentation string visible to other backends |
h5cpp-compiler walks the type once but can emit artefacts for multiple backends in the same pass: HDF5 compound descriptor, Protobuf .proto, JSON Schema, SQL DDL, Avro. Same struct, many on-disk and over-the-wire forms — one source of truth. See h5cpp-compiler Multi-Backend Architecture for the design.
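The "walk once, emit many" idea can be sketched as a backend-neutral field list consumed by several emitters. This is an illustrative sketch only: `field_info`, `to_sql`, and `to_proto` are hypothetical names, the type mapping is deliberately tiny, and nothing here reflects how h5cpp-compiler is actually structured internally.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Backend-neutral description of one field, as a type walker might record it.
struct field_info {
    std::string name;     // on-disk / on-wire name, after any name override
    std::string kind;     // "int32" or "float64" in this sketch
    bool        ignored;  // true when the field is marked to be skipped
};

// Emit a SQL DDL statement from the field list.
inline std::string to_sql(const std::string& table,
                          const std::vector<field_info>& fields) {
    assert(!table.empty());
    std::string out = "CREATE TABLE " + table + " (";
    bool first = true;
    for (const field_info& f : fields) {
        if (f.ignored) continue;            // skipped fields are not persisted
        if (!first) out += ", ";
        out += f.name + (f.kind == "int32" ? " INTEGER" : " DOUBLE PRECISION");
        first = false;
    }
    return out + ");";
}

// Emit a Protobuf message from the same field list.
inline std::string to_proto(const std::string& msg,
                            const std::vector<field_info>& fields) {
    assert(!msg.empty());
    std::string out = "message " + msg + " { ";
    int tag = 1;                            // field numbers assigned in order
    for (const field_info& f : fields) {
        if (f.ignored) continue;
        out += (f.kind == "int32" ? "int32 " : "double ") + f.name +
               " = " + std::to_string(tag++) + "; ";
    }
    return out + "}";
}
```

Both emitters read the same list, so a renamed or ignored field changes every backend's output at once, which is the "one source of truth" property described above.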
Zero-copy is guaranteed for the write side of any tier. On the read side:
| Type tier | Read-side zero-copy? | Why |
|---|---|---|
| POD (contiguous) | ✔ | H5Dread writes into the destination directly |
| Non-POD with VLEN | ✘ (one copy) | HDF5's VLEN allocator owns the buffer; scatter assigns to user vector then calls H5Treclaim |
The read-side copy on tier-2 types can be eliminated later via H5Pset_vlen_mem_manager, but that's follow-up work. Don't promise zero-copy reads for non-POD types — promise zero-copy writes and one-copy reads.
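The asymmetry between zero-copy writes and one-copy reads is easier to see in code. The sketch below assumes a hypothetical `record` type with one vector field; `vlen_ref` mirrors the layout of HDF5's `hvl_t` so the example builds without libhdf5, and `fake_read` stands in for the buffer H5Dread would hand back. The `gather` and `scatter` functions here are simplified illustrations, not the generated specialisations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <vector>

// Same layout as HDF5's hvl_t {size_t len; void* p;}, redeclared locally (assumption).
struct vlen_ref {
    std::size_t len;
    void*       p;
};

// Hypothetical user type with one variable-length field.
struct record {
    int                 id;
    std::vector<double> samples;
};

// Staging layout that would be passed to H5Dwrite or filled by H5Dread.
struct record_staged {
    int      id;
    vlen_ref samples;
};

// Write side: no element copy, the relay points into the vector's own storage.
inline record_staged gather(record& r) {
    return record_staged{r.id, vlen_ref{r.samples.size(), r.samples.data()}};
}

// Read side: the library owns the buffer, so the elements are copied once into
// the user's vector and the buffer is released afterwards
// (H5Treclaim in HDF5, std::free in this stand-in).
inline void scatter(record_staged& s, record& out) {
    assert(s.samples.len == 0 || s.samples.p != nullptr);
    const double* first = static_cast<const double*>(s.samples.p);
    out.id = s.id;
    out.samples.assign(first, first + s.samples.len);
    std::free(s.samples.p);
    s.samples = vlen_ref{0, nullptr};
}

// Stand-in for a read result: a malloc'ed VLEN buffer (src assumed non-empty).
inline record_staged fake_read(int id, const std::vector<double>& src) {
    void* buf = std::malloc(src.size() * sizeof(double));
    assert(buf != nullptr);
    std::memcpy(buf, src.data(), src.size() * sizeof(double));
    return record_staged{id, vlen_ref{src.size(), buf}};
}
```

After `gather`, the staged pointer equals `samples.data()`, so nothing was copied. After `scatter`, the user's vector owns its own copy and the library buffer is gone, which is the single read-side copy the table refers to.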
For pure POD structs (trivially copyable, standard layout, no virtuals, no private members beyond plain data), neither C++26 reflection nor the external compiler is strictly necessary. The in-memory layout already equals the on-disk layout, so a single runtime registration macro, H5CPP_REGISTER_STRUCT, is enough.
The macro:
- Sets up h5::dt_t<sample> to build an H5T_COMPOUND whose field offsets come from offsetof(sample, field) and whose field types come from dt_t<decltype(field)>.
- After that, h5::write(fd, "/x", samples) uses the registered compound exactly as it would for any built-in type.

The macro works on any type satisfying:
| Constraint | Check |
|---|---|
| Trivially copyable | std::is_trivially_copyable_v<T> |
| Standard layout | std::is_standard_layout_v<T> |
| No virtuals | (subset of standard-layout) |
| No std::vector / std::string / smart pointers in fields | Those require scatter/gather (tier 2) |
| All fields recognisable to dt_t<F> | arithmetic, enum, nested registered struct, std::array<T,N>, std::complex<T> |
If any of these are violated, switch to the C++26 path or the external h5cpp-compiler. The static_assert inside H5CPP_REGISTER_STRUCT will tell you which constraint you missed.
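The offsetof-based approach and the compile-time checks can be sketched in plain C++17. This is not the H5CPP_REGISTER_STRUCT implementation: `field_desc`, `SKETCH_FIELD`, `is_pod_registrable_v`, `describe_sample`, and the `sample` layout are all hypothetical, and the rows are only recorded rather than passed to H5Tinsert.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

// One row of a compound descriptor: name, byte offset, byte size.
// In real code these rows would feed H5Tinsert; here they are only recorded.
struct field_desc {
    std::string name;
    std::size_t offset;
    std::size_t size;
};

// Hypothetical POD type to persist.
struct sample {
    int    id;
    double value;
    float  xyz[3];
};

// The first two constraints from the table, checked at compile time.
template <class T>
constexpr bool is_pod_registrable_v =
    std::is_trivially_copyable_v<T> && std::is_standard_layout_v<T>;

// Hypothetical helper macro (not h5cpp's): captures name, offset and size of one field.
#define SKETCH_FIELD(T, f) field_desc{#f, offsetof(T, f), sizeof(T::f)}

inline std::vector<field_desc> describe_sample() {
    static_assert(is_pod_registrable_v<sample>,
                  "sample must be trivially copyable and standard layout");
    std::vector<field_desc> fields{
        SKETCH_FIELD(sample, id),
        SKETCH_FIELD(sample, value),
        SKETCH_FIELD(sample, xyz),
    };
    // Every field must lie entirely inside the object.
    for (const field_desc& f : fields)
        assert(f.offset + f.size <= sizeof(sample));
    return fields;
}
```

Because the offsets come straight from the compiler, any padding between `id` and `value` is reflected in the descriptor automatically. A type holding a `std::vector` fails the trait check at compile time, which is the point at which the other two paths take over.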
| You have… | Use… |
|---|---|
| C++26 compiler (when shipped) | C++26 reflection (header-only, zero external tools) |
| C++17/20/23 + a clang toolchain available | h5cpp-compiler (full reflection, all type tiers) |
| C++17/20/23 + only POD types to persist | H5CPP_REGISTER_STRUCT (no external tools, one macro per type) |
| Already on h5cpp-compiler, migrating to C++26 | No application code change — same vocabulary, syntax envelope changes |