H5CPP v1.14.0
Modern C++ templates for HDF5 serial and parallel I/O
User-facing attribute set for HDF5 annotations on plain C++ structs. Vocabulary is intentionally identical to pb::* where the concept overlaps (rename, ignore, doc, version, alias, on_missing, name_all) — different namespace, same words. The HDF5-specific surface lives only under h5::*, with h5::chunk and h5::compress being storage-layer concerns that have no protobuf counterpart.
C++17 attribute syntax today; one-line lift to C++26 typed annotations tomorrow.
| Surface today (C++17 standard-attribute) | C++26 reflection form |
|---|---|
| [[h5::name("on_disk")]] | [[=h5::name{"on_disk"}]] |
| [[h5::ignore]] | [[=h5::ignore{}]] |
| [[h5::chunk(256)]] | [[=h5::chunk{256}]] |
| [[h5::compress("gzip", 6)]] | [[=h5::compress{h5::algo::gzip, 6}]] |
| [[h5::doc("description")]] | [[=h5::doc{"description"}]] |
| [[h5::alias("Session")]] | [[=h5::alias{"Session"}]] |
| [[h5::version("1")]] | [[=h5::version{"1"}]] |
| [[h5::name_all("pfx_", "_sfx")]] | [[=h5::name_all{"pfx_", "_sfx"}]] |
| [[h5::on_missing("ignore")]] | [[=h5::on_missing{h5::missing_policy::ignore}]] |
| [[h5::serialize_full]] | [[=h5::serialize_full{}]] |
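The only syntactic shift is (args) → {args} under the [[=...]] form; attribute names stay put. As the table shows, h5::compress and h5::on_missing additionally take enumerators (h5::algo::gzip, h5::missing_policy::ignore) in place of string literals in the C++26 form.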
h5:: namespace

These attributes use vocabulary identical to pb::* (defined in tasks/h5cpp-compiler-pb-attribute-taxonomy.md §2). They live in h5:: so the namespace stays self-contained for HDF5-only users; a user wanting both backends writes both [[h5::name(...)]] and [[pb::name(...)]] (typically with the same string).
| Attribute | Purpose | Example |
|---|---|---|
| [[h5::name("on_disk_name")]] | Rename a field for on-disk storage. Decouples C++ identifier from H5Tinsert field name. Drives the column name in HDF5 compound types and the dataset member name in VLEN layouts. | [[h5::name("temp_K")]] double temperature; |
| [[h5::ignore]] | Skip this field entirely. No H5Tinsert emitted; field absent from the generated row_t mirror and the compound type. | [[h5::ignore]] int debug_counter; |

| Attribute | Purpose | Example |
|---|---|---|
| [[h5::doc("description")]] | Trailing // comment in the generated header. Self-documenting generated code. | [[h5::doc("nanoseconds since epoch")]] std::uint64_t ts; |
| [[h5::alias("Name")]] | Alternative name for the generated namespace. The C++ template specialization still uses the real qualified type name; the alias only affects h5::generated::Name_::row_t and h5::generated::Name_::compound_type(). | struct [[h5::alias("Session")]] session_t { ... }; |
| [[h5::name_all("prefix", "suffix")]] | Class-level naming convention. Applies prefix + field_name + suffix to every field's on-disk name. Per-field h5::name overrides. | struct [[h5::name_all("sn_", "")]] sensor_t { ... }; |

| Attribute | Purpose |
|---|---|
| [[h5::version("N")]] | Schema version; emitted as a // version: "N" comment in the generated header. Pure metadata today; future h5cpp library versions may expose it through H5Oget_comment or custom attributes. |
| [[h5::on_missing("error" \| "create" \| "ignore")]] | Class-level. Runtime behavior when the target dataset does not exist. "create" (default) creates a chunked, extendable dataset. "error" throws std::runtime_error. "ignore" returns early from scatter<T> / gather<T>. Only affects tier-2 scatter/gather emission where the compiler emits dataset open/create logic. |
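The three on_missing policies in the table can be read as a small decision function. The following is a minimal sketch of that reading, not the code h5cpp-compiler emits: the names missing_policy, missing_action, parse_missing_policy and resolve_missing are hypothetical, and only the three policy strings, the "create" default and the std::runtime_error behavior come from the table.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical names throughout; only the policy strings come from the table.
enum class missing_policy { error, create, ignore };
enum class missing_action { create_dataset, skip_call };

// Map the attribute's string argument onto a policy.
// "create" is the documented default when the attribute is absent.
inline missing_policy parse_missing_policy(std::string_view text = "create") {
    if (text == "error")  return missing_policy::error;
    if (text == "create") return missing_policy::create;
    if (text == "ignore") return missing_policy::ignore;
    throw std::invalid_argument("unknown on_missing policy: " + std::string(text));
}

// Decide what a scatter/gather call does once it finds the dataset absent.
inline missing_action resolve_missing(missing_policy policy, const std::string& dataset) {
    switch (policy) {
        case missing_policy::create: return missing_action::create_dataset;
        case missing_policy::ignore: return missing_action::skip_call;
        case missing_policy::error:  break;  // falls through to the throw below
    }
    throw std::runtime_error("dataset does not exist: " + dataset);
}
```

Rejecting an unknown policy string with std::invalid_argument is an assumption of this sketch; the source does not say how a misspelled policy is handled.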
The full universal list mirrors tasks/h5cpp-compiler-pb-attribute-taxonomy.md §2. Any universal pb::* attribute not listed above has no HDF5 semantics (e.g. pb::field(N), pb::wire, pb::adapter are protobuf-wire concerns).
Without the HDF5-specific attributes below, the HDF5 backend either cannot emit valid VLEN storage (chunking is mandatory for extendable datasets) or loses access to features that are core to HDF5 storage semantics (compression filters).
| Attribute | Purpose | Example |
|---|---|---|
| [[h5::chunk(N)]] | Class-level. Dataset chunk size for tier-2 scatter/gather emission. Needed because extendable datasets, and datasets with compression filters, must use a chunked layout in HDF5. Defaults to 64 if omitted. | struct [[h5::chunk(256)]] session_t { ... }; |
| [[h5::compress("gzip", level)]] | Class-level. Add a compression filter to the dataset creation property list. Today only "gzip" is supported, emitting H5Pset_deflate(dcpl, level). Future algorithms (shuffle, szip, zstd) layer in as additional string → H5P dispatch entries. | struct [[h5::compress("gzip", 6)]] session_t { ... }; |
**compress algorithm inference.** If the user writes [[h5::compress(6)]] (integer only, no algorithm string), the compiler infers "gzip". This is the common case and avoids forcing users to quote "gzip" for the overwhelming majority of usage.
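One way to picture the inference rule is as a normalization step that runs after the attribute arguments are read. The sketch below is illustrative only: compress_spec and normalize_compress are hypothetical names, and the 0..9 range check is an assumption based on the levels H5Pset_deflate accepts, not something the source specifies.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical normalized form of an h5::compress attribute.
struct compress_spec {
    std::string algorithm;
    int level;
};

// Accepts both spellings: compress("gzip", 6) and compress(6).
// A missing algorithm string is inferred as "gzip".
inline compress_spec normalize_compress(std::optional<std::string_view> algorithm, int level) {
    const std::string_view algo = algorithm.value_or(std::string_view("gzip"));
    if (algo != "gzip")  // only gzip is supported today, per the table above
        throw std::invalid_argument("unsupported compression algorithm: " + std::string(algo));
    if (level < 0 || level > 9)  // assumption: mirror the deflate level range
        throw std::invalid_argument("gzip level must be in 0..9");
    return compress_spec{std::string(algo), level};
}
```

With this shape, adding a future algorithm would mean extending the dispatch on algo while the integer-only form keeps resolving to gzip.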
| Attribute | Purpose | Example |
|---|---|---|
| [[h5::serialize_full]] | Class-level. Force register_struct<T> (POD compound-type) emission even when the struct contains std::vector or std::string fields. Non-POD fields are implicitly skipped (as if [[h5::ignore]]); the resulting compound type has size sizeof(T) and no members, effectively serializing the full struct memory layout as an opaque blob. Useful when the user wants raw-byte round-trip fidelity and accepts platform-dependent memory layout. | struct [[h5::serialize_full]] raw_session_t { std::uint64_t ts; std::vector<double> v; }; |
Same pattern as pb:: for protobuf: class-level attributes set defaults; field-level overrides them.
Field-level h5::name overrides h5::name_all. In the example above:
- timestamp_ns → on-disk name sn_timestamp_ns (from name_all)
- label → on-disk name lbl (from name, overriding name_all)
- debug_samples → skipped entirely (from ignore)
- readings → on-disk name sn_readings (from name_all)

Class-level h5::doc, h5::alias, h5::version are emitted as C++ comments in the generated header. They do not affect HDF5 API calls.
Class-level h5::chunk, h5::compress, h5::on_missing affect the scatter<T> / gather<T> specializations emitted for tier-2 types. They have no effect on tier-1 register_struct<T>() emission because register_struct only creates a compound type, not a dataset.
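The field-naming precedence described in this section (h5::ignore drops the field, a per-field h5::name wins over the class-level h5::name_all) can be modelled in a few lines. This is a minimal sketch rather than the compiler's implementation: naming_convention, field_attrs and on_disk_name are hypothetical names, while the sn_ prefix and the lbl override are the values used in this section.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Class-level h5::name_all("prefix", "suffix").
struct naming_convention {
    std::string prefix;
    std::string suffix;
};

// Per-field attribute state, as the compiler might collect it (hypothetical).
struct field_attrs {
    std::string cpp_name;              // identifier in the C++ struct
    std::optional<std::string> name;   // h5::name("..."), if present
    bool ignore = false;               // h5::ignore
};

// Returns the on-disk name, or std::nullopt when the field is not emitted.
inline std::optional<std::string> on_disk_name(const field_attrs& f,
                                               const naming_convention& c) {
    if (f.ignore) return std::nullopt;         // ignore: field is skipped
    if (f.name)   return *f.name;              // field-level name overrides name_all
    return c.prefix + f.cpp_name + c.suffix;   // class-level default
}
```

For example, with naming_convention{"sn_", ""} a field readings resolves to sn_readings, and a field carrying name "lbl" resolves to lbl regardless of the prefix.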
The compiler classifies every matched struct into one of three tiers before emission:
| Tier | Criteria | Emission path | Attributes that apply |
|---|---|---|---|
| Tier 1 (POD) | All fields are builtin scalars, enums, fixed-size arrays, or nested POD structs. No std::vector, no std::string. | register_struct<T>() — H5Tcreate(H5T_COMPOUND, sizeof(T)) + H5Tinsert per field. | name, ignore, name_all, doc, alias, version, serialize_full (no-op on already-POD types) |
| Tier 2 (scatter) | Contains at least one std::vector<T> (where T is scalar/enum) or std::string field; all other fields are POD-compatible. | scatter<T>() + gather<T>() — row_t mirror with hvl_t, chunked extendable dataset, VLEN compound type. | All attributes |
| Invalid | Contains unsupported field types (e.g. std::vector<std::vector<T>>, pointers, non-POD nested classes without serialize_full). | Skipped — no emission. | None |
**serialize_full escape hatch.** When present at class scope, serialize_full bypasses tier classification and forces tier-1 emission. Non-POD fields are silently skipped. The emitted compound type has size sizeof(T) but zero members, making it an opaque blob.
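The classification rules in the table, together with the serialize_full escape hatch, can be approximated with standard type traits. This is a sketch under stated assumptions: the real classifier (utils::determine_tier) walks the Clang AST, whereas here the field types are listed by hand, and tier, is_vlen_field, is_pod_field_v and classify are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>
#include <vector>

enum class tier { invalid = 0, pod = 1, scatter = 2 };

// Variable-length fields allowed in tier 2: std::string, or std::vector of scalar/enum.
template <class T> struct is_vlen_field : std::false_type {};
template <> struct is_vlen_field<std::string> : std::true_type {};
template <class T, class A>
struct is_vlen_field<std::vector<T, A>>
    : std::bool_constant<std::is_arithmetic_v<T> || std::is_enum_v<T>> {};

// Approximation of "POD-compatible": scalars, enums, fixed-size arrays, nested POD structs.
// Pointers are excluded explicitly because the table lists them as unsupported.
template <class T>
constexpr bool is_pod_field_v = std::is_trivially_copyable_v<T> &&
                                std::is_standard_layout_v<T> && !std::is_pointer_v<T>;

// Classify a struct from its list of field types.
template <class... Fields>
constexpr tier classify(bool serialize_full = false) {
    if (serialize_full) return tier::pod;  // escape hatch: force tier-1 emission
    constexpr bool all_supported =
        ((is_pod_field_v<Fields> || is_vlen_field<Fields>::value) && ...);
    constexpr bool any_vlen = (is_vlen_field<Fields>::value || ...);
    if (!all_supported) return tier::invalid;
    return any_vlen ? tier::scatter : tier::pod;
}

static_assert(classify<std::uint64_t, double[3]>() == tier::pod);
static_assert(classify<std::uint64_t, std::vector<double>>() == tier::scatter);
static_assert(classify<std::vector<std::vector<double>>>() == tier::invalid);
static_assert(classify<std::uint64_t, std::vector<double>>(true) == tier::pod);
```

The last static_assert corresponds to the raw_session_t example: the struct would otherwise be tier 2, but serialize_full forces the tier-1 path.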
C++26 ships P2996R13 (Reflection for C++26) and P3394R4 (Annotations for Reflection). Under the typed annotation form, the same names above are constructor calls of structural-type values.
Implementation sketch for the h5:: value types (each must be a structural type — final struct, public data members, constexpr constructors):
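The original listing is not reproduced here; what follows is a hedged reconstruction for a subset of the attributes, written so that it also compiles as C++17. The namespace h5_sketch, the fixed_text helper and its 64-byte capacity are assumptions made for illustration; h5::algo and h5::missing_policy are the enumerator scopes named in the syntax table.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>

namespace h5_sketch {  // stand-in for h5::, to keep the sketch self-contained

// Fixed-capacity text: std::string is not a structural type, a char array is.
template <std::size_t N = 64>
struct fixed_text final {
    std::array<char, N> buf{};  // zero-filled, so the text stays NUL-terminated
    std::size_t len = 0;

    constexpr fixed_text() = default;
    constexpr fixed_text(const char* s) {
        // Copy at most N-1 characters; longer input is truncated.
        while (s[len] != '\0' && len + 1 < N) { buf[len] = s[len]; ++len; }
    }
    constexpr std::string_view view() const { return {buf.data(), len}; }
};

enum class algo { gzip };
enum class missing_policy { error, create, ignore };

struct name final {
    fixed_text<> value;
    constexpr name(const char* s) : value(s) {}
};
struct ignore final {};
struct chunk final {
    unsigned long long size;
    constexpr chunk(unsigned long long n) : size(n) {}
};
struct compress final {
    algo algorithm;
    int level;
    constexpr compress(algo a, int l) : algorithm(a), level(l) {}
};
struct on_missing final {
    missing_policy policy;
    constexpr on_missing(missing_policy p) : policy(p) {}
};

// Brace construction matches the C++26 column of the syntax table.
static_assert(name{"on_disk"}.value.view() == "on_disk");
static_assert(chunk{256}.size == 256);
static_assert(compress{algo::gzip, 6}.level == 6);
static_assert(on_missing{missing_policy::ignore}.policy == missing_policy::ignore);

#if defined(__cpp_nontype_template_args) && __cpp_nontype_template_args >= 201911L
// Structural types are exactly those usable as non-type template arguments (C++20).
template <auto V> struct probe {};
using chunk_probe = probe<chunk{256}>;
#endif

}  // namespace h5_sketch
```

The remaining attributes (doc, alias, version, name_all, serialize_full) would follow the same two shapes: a fixed_text payload for string arguments and an empty struct for flags.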
Under C++17 attribute syntax [[h5::chunk(256)]], h5cpp-compiler's Clang-Tooling backend parses the namespace-scoped attribute directly (today via the rewriter → clang::annotate envelope). The implementation work to drop the rewriter is mechanical — a single namespace-aware attribute matcher swapped in for the current per-string parser.
Under C++26 typed annotations [[=h5::chunk{256}]], the value is reflected via std::meta::annotations_of(^^member) and read at constexpr time from inside h5cpp itself, so h5cpp-compiler becomes an optional convenience, not a required tool.
C++26 annotations require the annotated value to be of structural type (no virtual functions, no mutable, no private non-static data, literal type for constexpr construction). The sketches above are designed to satisfy this — fixed-size std::array buffers instead of std::string, public data members, constexpr constructors.
Observations from the emitted code:
- timestamp_ns → sn_timestamp_ns (name_all applied)
- label → lbl (name overrides name_all)
- debug_samples → absent entirely (ignore)
- readings → sn_readings (name_all applied)
- on_missing("ignore") → early return in both scatter and gather when dataset absent
- alias("Session") → generated namespace is h5::generated::Session_ instead of sn__sensor__session_t_
- chunk(256) + compress("gzip", 6) → emitted in the dataset-creation branch (not shown above for brevity)

Per Steven's architecture:
- **h5::\*** is the canonical namespace when h5cpp is used directly, which is the common case. This doc specifies it.
- **h5::proto::\*** stays in tasks/h5cpp-compiler-multi-backend-architecture.md as the namespace for the multi-backend roof's protobuf sub-scope.

A user writing pure HDF5 code uses [[h5::name(...)]], [[h5::chunk(...)]], etc. A user writing multi-backend code uses [[h5::proto::field(N)]] for protobuf concerns and [[h5::name(...)]] for HDF5 concerns on the same struct. The two scopes don't compete.
| Attribute | Today | Status |
|---|---|---|
| h5::name("x") | Rewriter + consumer + producer wired for tier-1 and tier-2 | ✔ Complete |
| h5::ignore | Rewriter + consumer + producer wired for tier-1 and tier-2 | ✔ Complete |
| h5::chunk(N) | Class-level int read; passed to scatter_type_impl; emits H5Pset_chunk | ✔ Complete |
| h5::compress("gzip", level) | Class-level string+int read; emits H5Pset_deflate | ✔ Complete |

| Attribute | Today | Status |
|---|---|---|
| h5::doc("...") | Class-level string read; emitted as // doc: "..." comment | ✔ Complete |
| h5::alias("Name") | Class-level string read; emitted as comment; drives generated namespace name | ✔ Complete |
| h5::version("N") | Class-level string read; emitted as // version: "N" comment | ✔ Complete |
| h5::name_all("pre", "suf") | Class-level strings read; applied to all field on-disk names; per-field h5::name overrides | ✔ Complete |
| h5::on_missing("error" \| "ignore" \| "create") | Class-level string read; modifies scatter/gather exists logic | ✔ Complete |
| h5::serialize_full | Class-level flag; forces tier-1 emission; non-POD fields implicitly skipped | ✔ Complete |
No additional attributes are currently in scope for issue #32. Potential future additions (not designed, not requested):
| Attribute | Potential Purpose |
|---|---|
| h5::shuffle | Enable byte-shuffle filter before compression (common pre-filter for h5::compress). |
| h5::fletcher32 | Enable Fletcher32 checksum on the dataset. |
| h5::scaleoffset(int, int) | Enable scale-offset filter for lossy compression of floating-point data. |
| h5::dimension_labels("time", "channel") | Label dataset dimensions for self-describing data cubes. |
| h5::reject | Compile-error if the HDF5 producer is asked to emit this type. Counterpart of pb::reject. |
| h5::tier(N) | Escape hatch for the tier classifier: force a class into a specific tier when auto-detection is wrong. |
- Rewriter (h5_attr_translator.hpp) + AST reader (h5_attr_reader.hpp). Rewrites [[h5::xxx(...)]] → [[clang::annotate("h5::xxx", ...)]]. Generic enough to share across backends.
- Consumer (consumer.hpp): tier classification (utils::determine_tier), attribute reading, routing to the correct emission path.
- Producer (producer_h5.hpp): register_struct for tier-1, scatter_type / gather_type for tier-2. Wired name, ignore, chunk, compress.
- doc, alias, version, name_all, on_missing, serialize_full: all wired into both tier-1 and tier-2 paths.
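To make the rewriter step concrete, the sketch below performs the [[h5::xxx(...)]] → [[clang::annotate("h5::xxx", ...)]] mapping with a regular expression over source text. It is an approximation for illustration only: rewrite_h5_attributes is a hypothetical function, and nothing here is taken from h5_attr_translator.hpp, which may work on tokens rather than raw text.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Rewrite every [[h5::name(args)]] or [[h5::name]] in src into the
// clang::annotate envelope, leaving all other text untouched.
inline std::string rewrite_h5_attributes(const std::string& src) {
    // Group 1: attribute name. Group 2 (optional): raw argument list.
    static const std::regex attr(R"re(\[\[h5::(\w+)(?:\(([^\]]*)\))?\]\])re");
    std::string out;
    std::size_t last = 0;
    for (auto it = std::sregex_iterator(src.begin(), src.end(), attr);
         it != std::sregex_iterator(); ++it) {
        const std::smatch& m = *it;
        const auto pos = static_cast<std::size_t>(m.position(0));
        out.append(src, last, pos - last);  // text before the attribute
        out += "[[clang::annotate(\"h5::" + m[1].str() + "\"";
        if (m[2].matched && m[2].length() > 0)
            out += ", " + m[2].str();       // forward the arguments verbatim
        out += ")]]";
        last = pos + static_cast<std::size_t>(m.length(0));
    }
    out.append(src, last, std::string::npos);  // trailing text
    return out;
}
```

A text-level regex like this would also rewrite attributes that appear inside comments or string literals, which is one reason a production rewriter is more likely to operate on the token stream.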