H5CPP
v1.14.0
Modern C++ templates for HDF5 serial and parallel I/O
User-facing attribute set for protobuf annotations on plain C++ structs. Vocabulary is intentionally identical to h5::* where the concept overlaps (rename, ignore, doc, version, alias, on_missing, name_all) — different namespace, same words. The protobuf-specific surface lives only under pb::*, with pb::field(N) echoing the runtime template pb::field<N, &T::m>{} so the same word means the same thing at the annotation layer and the descriptor layer.
C++17 attribute syntax today; one-line lift to C++26 typed annotations tomorrow.
| Surface today (C++17 standard-attribute) | C++26 reflection form |
|---|---|
| `[[pb::field(2)]]` | `[[=pb::field{2}]]` |
| `[[pb::field(5, 6, 7)]]` (variadic: variant member → oneof) | `[[=pb::field{5, 6, 7}]]` |
| `[[pb::wire(sint32)]]` | `[[=pb::wire{pb::spec::sint32}]]` |
| `[[pb::adapter(Timestamp)]]` | `[[=pb::adapter{pb::adapter_kind::Timestamp}]]` |
| `[[pb::ignore]]` | `[[=pb::ignore{}]]` |
| `[[pb::name("on_wire")]]` | `[[=pb::name{"on_wire"}]]` |
| `[[pb::doc("comment")]]` | `[[=pb::doc{"comment"}]]` |
The only syntactic shift is `(args)` → `{args}` under the `[[=...]]` form; the names stay put.
pb:: namespace

These attributes use vocabulary identical to `h5::*` (defined in h5cpp-compiler-scatter-gather-design.md §"Tier 1 Must-have" and §"Tier 2 High value, low cost"). They live in `pb::` so the namespace stays self-contained for pb-only users; a user who wants both backends writes both `[[h5::name(...)]]` and `[[pb::name(...)]]`, typically with the same string.
| Attribute | Purpose | Example |
|---|---|---|
| `[[pb::name("on_wire_name")]]` | Rename a field or message, decoupling the C++ identifier from the .proto name. Drives the field name in the emitted .proto and the JSON serialization name (unless `pb::json_name` overrides it). The wire bytes themselves carry tag numbers, not names, so a rename is a .proto-schema-level change. | `[[pb::name("temperature_k"), pb::field(2)]] double temperature;` |
| `[[pb::ignore]]` | Skip this field entirely. Maps to `pb::ignore<&T::m>{}` in the emitted descriptor; the field is absent from the .proto and the runtime never touches it. | `[[pb::ignore]] int debug_counter;` |
| `[[pb::on_missing(value)]]` | Default value the decoder writes into the C++ member when the field is absent on the wire. proto3 fixes the wire-level default at the type's zero value, so this drives the C++-side initializer in the generated shim, not the schema. Because proto3 does not serialize zero-valued scalars that lack explicit presence, "absent" also covers a sender's explicit zero. | `[[pb::field(3), pb::on_missing(0.0)]] double sample_rate;` |
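The `pb::name` row's claim that names never reach the wire follows from how proto3 frames each field. A minimal sketch of the field key, `(field_number << 3) | wire_type`, using hypothetical helper names (not pb.hpp's API):

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; not part of pb.hpp.
// proto3 wire types, numbered as in the protobuf encoding spec
// (3 and 4 are the deprecated group markers and are omitted).
enum class wire_type : std::uint32_t {
    varint = 0,  // int32, int64, uint32, uint64, sint32, sint64, bool, enum
    i64    = 1,  // fixed64, sfixed64, double
    len    = 2,  // string, bytes, embedded messages, packed repeated fields
    i32    = 5,  // fixed32, sfixed32, float
};

// Every field on the wire is preceded by this key (itself varint-encoded).
// It carries only the field number and the wire type: the C++ member name
// and any pb::name(...) rename never reach the encoded bytes.
constexpr std::uint32_t field_key(std::uint32_t field_number, wire_type wt) {
    return (field_number << 3) | static_cast<std::uint32_t>(wt);
}

// [[pb::name("temperature_k"), pb::field(2)]] double temperature;
// gets key 0x11 (field 2, i64) whether or not it is renamed.
static_assert(field_key(2, wire_type::i64) == 0x11, "field 2, double");
static_assert(field_key(1, wire_type::varint) == 0x08, "field 1, varint");
```

Because the key depends only on the number, `pb::name` and `pb::field` evolve independently: renaming leaves existing encoded data readable, renumbering does not.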
| Attribute | Purpose | Example |
|---|---|---|
| `[[pb::doc("description")]]` | Trailing `//` comment on the field in the emitted .proto. Round-trips through protoc into the `SourceCodeInfo` of the emitted `FileDescriptorSet` (protoc keeps source info only when run with `--include_source_info`). Self-documenting schemas. | `[[pb::doc("nanoseconds since epoch"), pb::field(1)]] std::uint64_t ts;` |
| `[[pb::alias("old_name")]]` | Backward compatibility for schema evolution. Emitted at message scope as `reserved "old_name";`, so the legacy name can never be reused. | `[[pb::name("temp"), pb::alias("temperature_c"), pb::field(2)]] float temp;` |
| `[[pb::name_all("snake_case" \| "camelCase" \| "PascalCase")]]` | Class-level naming convention. The protobuf style guide calls for snake_case field names; `name_all("snake_case")` turns `bookmarkUrl` (C++) into `bookmark_url` (.proto) uniformly. A per-field `pb::name` overrides it. | `struct [[pb::name_all("snake_case")]] user_event_t { ... };` |
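One plausible reading of the `name_all("snake_case")` rule, sketched as a hypothetical helper; the real conversion in h5cpp-compiler may differ, notably in how it treats acronyms:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper, not h5cpp-compiler code. Inserts '_' before each
// uppercase letter (except at position 0) and lowercases it. Runs of
// capitals such as "HTTPServer" are split letter by letter, so a real
// implementation would need an acronym rule.
std::string to_snake_case(std::string_view in) {
    std::string out;
    out.reserve(in.size() + 4);
    for (std::size_t i = 0; i < in.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        if (std::isupper(c)) {
            if (i != 0) out.push_back('_');
            out.push_back(static_cast<char>(std::tolower(c)));
        } else {
            out.push_back(in[i]);
        }
    }
    return out;
}

// Usage: the example from the table, plus an already-conforming name.
inline void snake_case_examples() {
    assert(to_snake_case("bookmarkUrl") == "bookmark_url");
    assert(to_snake_case("sample_rate") == "sample_rate");
}
```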
| Attribute | Purpose |
|---|---|
| `[[pb::version(N)]]` | Schema version; emitted at message scope as `option (pb.schema_version) = N;`. Lets the decoder consult a version policy on read. |
The full universal list mirrors h5cpp-compiler-scatter-gather-design.md §"User-Facing Attribute System". Any universal h5::* attribute not listed above has no protobuf semantics (e.g. h5::chunk, h5::compress are HDF5-storage concerns).
Without these, the protobuf producer either can't emit a valid .proto (field numbers are mandatory) or loses access to features that are core to proto3 semantics (oneof, alternate wire types, well-known-type adapters, map auto-detection).
| Attribute | Purpose | Example |
|---|---|---|
| `[[pb::field(N)]]` | Required for every wire-emitted field. proto3 mandates explicit numbers; auto-assignment is disallowed because source reordering would break wire compatibility. Range [1, 2^29 - 1], excluding 19000–19999, which protobuf reserves for its own implementation. The argument may be an integer literal or an enum value whose underlying type converts to `std::uint32_t`; see §6 for the rationale and a worked example. | `[[pb::field(2)]] std::int32_t id;` or `[[pb::field(user_profile_tag::id)]] std::int32_t id;` |
| `[[pb::field(N1, N2, N3, ...)]]` | Variadic form on `std::variant` members. When the compiler sees a variant plus multiple tags, it emits `pb::oneof<&T::v, pb::alt<T1,N1>, pb::alt<T2,N2>, ...>{}`. The group name defaults to the C++ member identifier; override it with `pb::oneof_name(...)`. Variant alternative 0 must be `std::monostate` (the absent state). Same int-or-enum flexibility as the single-tag form. | `[[pb::field(5, 6, 7)]] std::variant<std::monostate, std::string, std::int64_t, double> payload;` |
| `[[pb::wire(spec)]]` | Pin the wire encoding when the natural C++ type → proto3 mapping isn't the one wanted. Valid `spec`: `sint32`, `sint64` (zigzag varint), `fixed32`, `fixed64` (4/8-byte little-endian unsigned), `sfixed32`, `sfixed64` (4/8-byte little-endian signed). | `[[pb::field(3), pb::wire(sint32)]] std::int32_t delta;` |
| `[[pb::adapter(name)]]` | Route the field through `pb::<name>_adapter` to bridge a non-protobuf C++ type to a well-known-type message. Today: `Timestamp` (`std::chrono::system_clock::time_point` ↔ `google.protobuf.Timestamp`) and `Duration` (`std::chrono::duration` ↔ `google.protobuf.Duration`). User-extensible by defining a new `pb::Foo_adapter`. | `[[pb::field(4), pb::adapter(Timestamp)]] std::chrono::system_clock::time_point when;` |
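Two rules from this table are easy to check at compile time. A sketch with hypothetical helper names: a validator for the `pb::field(N)` range, and the zigzag mapping that `pb::wire(sint32)` selects:

```cpp
#include <cstdint>

// Hypothetical compile-time helpers (illustrative names, not pb.hpp's).

// Valid proto3 field numbers: 1 .. 2^29 - 1, minus 19000..19999, which the
// protobuf implementation reserves for itself.
constexpr bool is_valid_field_number(std::uint32_t n) {
    return n >= 1 && n <= (1u << 29) - 1 && !(n >= 19000 && n <= 19999);
}

// ZigZag mapping behind pb::wire(sint32): 0->0, -1->1, 1->2, -2->3, ...
// Small negative values become small unsigned ones, so their varints stay
// short (a plain int32 -1 takes 10 bytes on the wire).
constexpr std::uint32_t zigzag32(std::int32_t v) {
    // v >> 31 is all ones for negative v and zero otherwise.
    return (static_cast<std::uint32_t>(v) << 1) ^ static_cast<std::uint32_t>(v >> 31);
}

// Inverse mapping: 0u - (u & 1u) is all ones when the low bit is set.
constexpr std::int32_t unzigzag32(std::uint32_t u) {
    return static_cast<std::int32_t>((u >> 1) ^ (0u - (u & 1u)));
}

static_assert(is_valid_field_number(2) && !is_valid_field_number(0), "");
static_assert(zigzag32(-1) == 1u && zigzag32(1) == 2u, "");
```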
Map fields auto-detect from std::map<K,V> / std::unordered_map<K,V> via is_pb_map_v (shipped on 3-pb-map). The emitted pb::field<N, &T::m>{} is unchanged; the runtime trait decides what to do. No new attribute required for maps at tier 1.
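A trait in the spirit of `is_pb_map_v` can be written with two partial specializations. This is a hypothetical stand-in (`is_map_like_v`), not the trait shipped on 3-pb-map, whose definition may differ:

```cpp
#include <map>
#include <string>
#include <type_traits>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for is_pb_map_v: true only for std::map and
// std::unordered_map specializations (any comparator, hash, or allocator).
template <class T>
struct is_map_like : std::false_type {};

template <class K, class V, class Cmp, class Alloc>
struct is_map_like<std::map<K, V, Cmp, Alloc>> : std::true_type {};

template <class K, class V, class Hash, class Eq, class Alloc>
struct is_map_like<std::unordered_map<K, V, Hash, Eq, Alloc>> : std::true_type {};

template <class T>
inline constexpr bool is_map_like_v = is_map_like<std::remove_cv_t<T>>::value;

// The descriptor entry stays pb::field<N, &T::m>{}; a trait like this lets
// the runtime switch to map encoding (each entry a nested message with
// key = field 1, value = field 2).
static_assert(is_map_like_v<std::map<std::string, int>>, "");
static_assert(is_map_like_v<std::unordered_map<int, double>>, "");
static_assert(!is_map_like_v<std::vector<int>>, "");
```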
| Attribute | Purpose | Example |
|---|---|---|
| `[[pb::reserved(N1, N2, "old_name")]]` | Class-level. Reserves tag numbers and/or names so future fields can't reuse them, which is proto3's standard schema-evolution mechanism. Numbers and names may be mixed in one attribute; because .proto does not allow mixing them in one `reserved` statement, the emitter writes one statement for numbers and one for names. proto3 has no per-field reservation; if listed at field scope, the compiler aggregates the entries into message-level `reserved` statements. | `struct [[pb::reserved(10, 11, "obsolete")]] event_t { ... };` |
| `[[pb::packed]]` | Force packed encoding for `std::vector<scalar>`. proto3 already packs repeated numeric fields by default, so this documents intent explicitly. Useful for `std::vector<bool>`, which is also packed by default but easy to overlook. | `[[pb::field(2), pb::packed]] std::vector<bool> flags;` |
| `[[pb::deprecated]]` | Marks the field with `[deprecated = true]` in the emitted .proto. Pure metadata; pb.hpp's wire path ignores it. | `[[pb::field(3), pb::deprecated]] std::int32_t legacy_count;` |
| `[[pb::package("foo.bar")]]` | Class-level. Emits `package foo.bar;` at the top of the generated .proto. Default: no package declaration, which is functional but generally undesirable for any project that ships beyond one binary. | `struct [[pb::package("com.vargalabs.events")]] event_t { ... };` |
| `[[pb::oneof_name("explicit")]]` | Override the auto-derived oneof group name, which otherwise mirrors the C++ variant member's identifier. Useful when the C++ name is private or internal and the wire-side name should differ. | `[[pb::field(5, 6, 7), pb::oneof_name("payload_kind")]] std::variant<...> _v;` |
| `[[pb::map_key_wire(spec)]]` / `[[pb::map_value_wire(spec)]]` | Override the wire encoding of a map's key or value. Deferred to v1.1 per the 3-pb-map design call ("natural mapping only" was Steven's selection on 2026-05-22). Listed here for taxonomy completeness; do not implement before user demand surfaces. | *(deferred)* |
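A sketch of how a .proto emitter might render `pb::reserved`, with a hypothetical function name. It follows the .proto rule that field numbers and names cannot share one `reserved` statement:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical emitter fragment, not h5cpp-compiler code. A mixed
// [[pb::reserved(10, 11, "obsolete")]] is written as two statements:
// one for the numbers, one for the quoted names.
std::string emit_reserved(const std::vector<std::uint32_t>& numbers,
                          const std::vector<std::string>& names) {
    std::string out;
    if (!numbers.empty()) {
        out += "  reserved ";
        for (std::size_t i = 0; i < numbers.size(); ++i) {
            if (i != 0) out += ", ";
            out += std::to_string(numbers[i]);
        }
        out += ";\n";
    }
    if (!names.empty()) {
        out += "  reserved ";
        for (std::size_t i = 0; i < names.size(); ++i) {
            if (i != 0) out += ", ";
            out += '"' + names[i] + '"';  // proto3 quotes reserved names
        }
        out += ";\n";
    }
    return out;
}

// Usage: the class-level example from the table.
inline void reserved_examples() {
    assert(emit_reserved({10, 11}, {"obsolete"}) ==
           "  reserved 10, 11;\n"
           "  reserved \"obsolete\";\n");
}
```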
| Attribute | Purpose |
|---|---|
| `[[pb::json_name("alternateJsonName")]]` | Override the JSON serialization name; drives protoc's `json_name = "..."` field option. Useful when matching an existing JS/Python API naming convention without touching the C++ name. |
| `[[pb::unknown_field_set(preserve \| skip)]]` | Per-class policy. `preserve` (future) round-trips unknown fields on decode; `skip` (current default per FR7) drops them. Tracks the open v1.1 decision in tasks/pb-feature-coverage-and-gaps.md §5. |
| `[[pb::target_syntax(proto3 \| edition2023)]]` | Class-level. Selects the .proto syntax line. Default: `proto3`. `edition2023` lands when we move past libprotoc 25.1 to 26+. |
| `[[pb::descriptor_set_out("file.desc")]]` | Class-level. Emit a binary `FileDescriptorSet` (the .desc / .pb file) alongside the .proto source, matching protoc's `--descriptor_set_out`. Useful for runtime reflection bridges. |
| `[[pb::service("ServiceName")]]` | Class-level, advanced. If the struct contains members of type `std::function<Response(Request)>` (or a similar trait-detectable RPC shape), emit a `service ServiceName { ... }` block with RPC declarations. Wire path untouched. Warrants its own design pass before implementation. |
| Attribute | Purpose |
|---|---|
| `[[pb::encode_with(fn)]]` / `[[pb::decode_with(fn)]]` | Custom per-field codec functions. A lighter alternative to defining a full `pb::<Name>_adapter` struct: pass a free function directly. Lands as a thin `adapter_field` variant in the descriptor. |
| `[[pb::tier(N)]]` | Escape hatch for the future tier classifier: force a class into a specific tier when auto-detection is wrong. Mirrors `h5::tier(N)` from the scatter-gather doc. |
| `[[pb::reject]]` | Compile error if the protobuf producer is asked to emit this type. Counterpart of `h5::reject`. Lets a class stay HDF5-only without ever leaking onto a wire. |
Same pattern as `h5::` for HDF5: class-level attributes set defaults, and field-level attributes override them.
C++26 ships P2996R13 (Reflection for C++26, voted in June 2025) and P3394R4 (Annotations for Reflection, adopted at Sofia 2025). Under the typed annotation form, the same names above are constructor calls of structural-type values.
Implementation sketch for the pb:: value types (each must be a structural type — final struct, public data members, constexpr constructors):
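A minimal sketch of such value types, assuming a hypothetical `pb_annot` namespace (so they do not collide with pb.hpp's runtime `pb::field<N, &T::m>` class template) and arbitrary capacities of 8 tags and 63 name characters. It compiles as C++17; the structural-type properties only start to matter once the types are used as C++20 template arguments or C++26 annotations:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical namespace for illustration; every type is a final struct
// with public, non-mutable members and constexpr constructors.
namespace pb_annot {

enum class spec : std::uint8_t { sint32, sint64, fixed32, fixed64, sfixed32, sfixed64 };
enum class adapter_kind : std::uint8_t { Timestamp, Duration };

// field{2} or field{5, 6, 7}: integers, enum values, or a mix, each folded
// to std::uint32_t at construction. Non-integral, non-enum arguments are
// rejected by the enable_if constraint.
struct field final {
    std::array<std::uint32_t, 8> tags{};  // fixed buffer, not std::vector
    std::size_t count = 0;

    template <class... Ts,
              class = std::enable_if_t<(sizeof...(Ts) >= 1) &&
                                       ((std::is_integral_v<Ts> || std::is_enum_v<Ts>) && ...)>>
    constexpr field(Ts... ts) : tags{}, count(sizeof...(Ts)) {
        static_assert(sizeof...(Ts) <= 8, "this sketch holds at most 8 tags");
        const std::uint32_t folded[] = {static_cast<std::uint32_t>(ts)...};
        for (std::size_t i = 0; i < sizeof...(Ts); ++i) tags[i] = folded[i];
    }
};

struct wire final {
    spec kind;
    constexpr wire(spec k) : kind(k) {}
};

struct adapter final {
    adapter_kind kind;
    constexpr adapter(adapter_kind k) : kind(k) {}
};

struct ignore final {};

// name{"on_wire"}: fixed-size char buffer instead of std::string.
// A doc type would follow the same shape.
struct name final {
    std::array<char, 64> value{};
    std::size_t size = 0;

    template <std::size_t N>
    constexpr name(const char (&s)[N]) : value{}, size(N - 1) {
        static_assert(N <= 64, "this sketch caps names at 63 characters");
        for (std::size_t i = 0; i + 1 < N; ++i) value[i] = s[i];
    }
};

}  // namespace pb_annot
```

Under C++26 the same values would be written as `[[=pb_annot::field{5, 6, 7}]]`; the sketch only exercises their constexpr construction.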
Under C++17 attribute syntax [[pb::field(2)]], h5cpp-compiler's Clang-Tooling backend parses the namespace-scoped attribute directly (no clang::annotate envelope). The implementation work is mechanical — a single namespace-aware attribute matcher swapped in for the current per-string parser.
Under C++26 typed annotations `[[=pb::field{2}]]`, the value is reflected via `std::meta::annotations_of(^^member)` and read at compile time from inside pb.hpp itself, so h5cpp-compiler becomes an optional convenience rather than a required tool.
C++26 annotations require the annotated value to be of structural type (no virtual functions, no mutable members, no private non-static data members, and a literal type for constexpr construction). The sketches above are designed to satisfy this: fixed-size std::array buffers instead of std::string, public data members, constexpr constructors. The C++26 syntax still needs verification against an actual g++-16 build before publishing, paralleling the verification gate in the scatter-gather doc §"Structural-type prerequisite".
pb::field(...) accepts integer literals, enum values, and any mix of the two. The compiler (Phase B) folds whatever you pass to its underlying std::uint32_t at AST-evaluation time — Clang's Expr::EvaluateAsInt(ctx) handles integer literals and enum constant references uniformly — and the emitted descriptor sees plain integers either way. Under C++26 reflection (Phase C) the same flexibility lives in the templated pb::field constructor in §5; same vocabulary, same payload.
The runtime is unaffected — the emitted pb::field<2, &T::m>{} template gets a plain integer regardless of which form the user wrote.
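Enums collapse a class's tag assignments into a single named source of truth. The typical pattern is one enum per message, with one enumerator per field, referenced from each `[[pb::field(...)]]` annotation.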
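A sketch of the pattern with illustrative names (`user_profile_tag` echoes the examples in this doc; the struct is hypothetical). The `pb::*` attributes are unknown to the C++ compiler itself, which ignores them (typically with a warning); only h5cpp-compiler reads them:

```cpp
#include <cstdint>
#include <string>
#include <type_traits>
#include <vector>

// One enum per message: the single named source of truth for tag numbers.
enum class user_profile_tag : std::uint32_t {
    id     = 1,
    name   = 2,
    // 3: reserved (was: legacy_email)
    scores = 4,
};

struct user_profile_t {
    [[pb::field(user_profile_tag::id)]]     std::int32_t        id;
    [[pb::field(user_profile_tag::name)]]   std::string         name;
    [[pb::field(user_profile_tag::scores)]] std::vector<double> scores;
};

// The integer the compiler would fold each argument to before emission.
static_assert(static_cast<std::uint32_t>(user_profile_tag::scores) == 4, "");
```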
The benefits, vs. raw integers:
- A single edit to `user_profile_tag::scores` updates every annotation referencing it: the same C++ name binding as any other identifier, and the same IDE support as any other rename.
- The enum carries `// reserved (was: legacy_email)` comments where they belong. This pairs naturally with `[[pb::reserved(...)]]` at class scope.
- Tags taken from another struct's enum (e.g. `[[pb::field(other_struct_tag::name)]]`) compile fine under the permissive default, but tooling, linters, or future opt-in modes can layer stricter binding on top without the library getting in the way.

Variant/oneof works the same way: the variadic `pb::field(...)` form takes any mix of enum and integer arguments.
`[[pb::reserved(...)]]` accepts the same forms: integers, enum values, or a mix, alongside quoted names.
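A hypothetical `event_t` showing both mixes: `pb::reserved` with an enum value, a raw integer, and a name, and a variadic `pb::field` with enum values plus a raw integer. As before, the C++ compiler ignores the `pb::*` attributes; only h5cpp-compiler reads them:

```cpp
#include <cstdint>
#include <string>
#include <type_traits>
#include <variant>

// Illustrative tag enum; the retired number stays named so it can be reserved.
enum class event_tag : std::uint32_t {
    id           = 1,
    legacy_flags = 2,  // retired
    text         = 5,
    count        = 6,
};

struct [[pb::reserved(event_tag::legacy_flags, 3, "legacy_email")]] event_t {
    [[pb::field(event_tag::id)]]
    std::int64_t id;
    // Alternative 0 is std::monostate (the absent state); the three tags map
    // to std::string, std::int64_t and double, with 7 given as a raw integer.
    [[pb::field(event_tag::text, event_tag::count, 7)]]
    std::variant<std::monostate, std::string, std::int64_t, double> payload;
};

// One tag per non-monostate alternative: 3 tags for 4 - 1 alternatives.
static_assert(std::variant_size_v<decltype(event_t::payload)> - 1 == 3, "");
```

Keeping a retired number as a named enumerator (`legacy_flags` here) is one way to make the reservation reference the same source of truth; a raw integer with a `// reserved` comment works just as well.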
The library accepts integers, enums, or any mix. No #define-gated strict mode, no class-level "must reference this enum" binding. Whether to commit to an enum convention is a project-level decision; the library stays out of it.
We are the guides — this is the user's story. The library's job is to clear the path, not to choose the route.
Here is the example from the 3-pb-map push, transcoded from today's clang::annotate form to the proposed pb:: attribute vocabulary:
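The transcoding is mechanical. On a hypothetical reading type (names, tags, and member types invented for this sketch, and assuming the legacy form uses one `clang::annotate` string per key), before and after look like this:

```cpp
#include <chrono>
#include <cstdint>
#include <map>
#include <string>
#include <type_traits>
#include <variant>

// Before: today's string-encoded clang::annotate form.
struct reading_before_t {
    [[clang::annotate("pb::field=1")]]
    std::uint64_t id;
    [[clang::annotate("pb::field=2"), clang::annotate("pb::wire=sint32")]]
    std::int32_t delta;
    [[clang::annotate("pb::field=3"), clang::annotate("pb::adapter=Timestamp")]]
    std::chrono::system_clock::time_point when;
    [[clang::annotate("pb::field=4")]]                // map: auto-detected
    std::map<std::string, double> labels;
    [[clang::annotate("pb::oneof_tags=5,6")]]         // separate oneof key
    std::variant<std::monostate, std::string, double> value;
};

// After: the proposed pb:: standard-attribute vocabulary.
struct reading_t {
    [[pb::field(1)]]                         std::uint64_t id;
    [[pb::field(2), pb::wire(sint32)]]       std::int32_t  delta;
    [[pb::field(3), pb::adapter(Timestamp)]] std::chrono::system_clock::time_point when;
    [[pb::field(4)]]                         std::map<std::string, double> labels;
    [[pb::field(5, 6)]]                      std::variant<std::monostate, std::string, double> value;
};
```

Only the annotations change; member types and order are identical, so both structs have the same layout and, per the status table below, should produce the same descriptor.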
Independently of which annotation phase the user is on, the tag argument itself can be an enum value rather than a raw integer. This is the convention recommended in §6 for projects that want a single named source of truth for tag assignments.
Mix and match is allowed: `[[pb::field(my_tag::name)]]` next to `[[pb::field(7)]]` next to `[[pb::field(other_tag::foo, 11)]]`. The library doesn't care; the compiler folds everything to integers before emission.
What h5cpp-compiler emits is the same regardless of whether the user wrote raw integers, enum values, or any mix: the descriptor is expressed in compile-time integer NTTPs.
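A self-contained sketch of why the form does not matter. A hypothetical `mock_pb::field<N, &T::m>` stands in for pb.hpp's runtime template, and grouping the entries in a `std::tuple` is an assumption about the descriptor's shape. Once enum arguments are folded to `std::uint32_t`, both spellings name exactly the same type:

```cpp
#include <cstdint>
#include <string>
#include <tuple>
#include <type_traits>

// Hypothetical miniature; not pb.hpp's real field template.
namespace mock_pb {
template <std::uint32_t N, auto Member>
struct field { static constexpr std::uint32_t number = N; };
}  // namespace mock_pb

enum class user_tag : std::uint32_t { id = 1, name = 2 };

struct user_t {
    std::int32_t id;
    std::string  name;
};

// As emitted from [[pb::field(1)]] / [[pb::field(2)]] ...
using from_ints = std::tuple<mock_pb::field<1, &user_t::id>,
                             mock_pb::field<2, &user_t::name>>;

// ... and from [[pb::field(user_tag::id)]] / [[pb::field(user_tag::name)]].
using from_enums = std::tuple<
    mock_pb::field<static_cast<std::uint32_t>(user_tag::id),   &user_t::id>,
    mock_pb::field<static_cast<std::uint32_t>(user_tag::name), &user_t::name>>;

static_assert(std::is_same_v<from_ints, from_enums>, "same descriptor type either way");
```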
Per Steven's call on 2026-05-22:

- **`pb::*`** is the canonical namespace when pb.hpp is used directly, which is the common case. This doc specifies it.
- **`h5::proto::*`** stays in tasks/h5cpp-compiler-multi-backend-architecture.md as the namespace for the multi-backend roof's protobuf sub-scope (alongside `h5::sql::*`, `h5::json::*`, `h5::avro::*`).

A future cleanup task can either (a) teach the multi-backend roof to also accept `pb::*` directly, or (b) leave the two scopes distinct. No work needed now.
In practice the user-facing impact is small — anyone using pb.hpp standalone writes [[pb::field(N)]]; anyone using the multi-backend roof writes [[h5::proto::field(N)]]. The two names mean the same thing in their respective scope; they don't compete.
Tier-by-tier delta between today's 30-pb-producer and this proposed surface:
| Attribute | Today | Status |
|---|---|---|
| `pb::field(N)` | `pb::field=N` (string-encoded via `clang::annotate`) | ✔ Implemented; needs syntax-form migration to standard-attribute |
| `pb::field(N1, N2, ...)` (variadic for variant→oneof) | `pb::oneof_tags=N1,N2,...` (separate string-encoded annotation) | ◇ Implemented under a different name; needs consolidation |
| `pb::wire(spec)` | `pb::wire=spec` | ✔ Implemented |
| `pb::adapter(name)` | `pb::adapter=Name` | ✔ Implemented |
| Universal `pb::name(...)` | not recognized | ✘ Missing |
| Universal `pb::ignore` | not recognized; the runtime `pb::ignore<>` exists in pb.hpp, but the compiler doesn't emit it from annotations | ✘ Missing |
| Universal `pb::doc(...)` | not recognized | ✘ Missing |
| Universal `pb::on_missing(...)` | not recognized | ✘ Missing |
`pb::reserved`, `pb::packed`, `pb::deprecated`, `pb::package`, `pb::oneof_name`, `pb::alias`, and `pb::name_all` all require a proper .proto text emitter alongside the existing `pb::meta::descriptor_t<T>` emitter. The descriptor emitter exists; the .proto emitter is the next major piece.
pb::json_name, pb::unknown_field_set, pb::target_syntax, pb::descriptor_set_out, pb::service, pb::encode_with/pb::decode_with, pb::tier, pb::reject — all deferred.
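Self-contained refactor of src/consumer_pb.hpp:181-244. The existing `parse_pb_*_attr_` family wraps `clang::AnnotateAttr` lookups; the new path needs a namespace-scoped-attribute matcher that recognizes `pb::*` attributes parsed by Clang into the AST directly (no `clang::annotate` envelope). One point to verify first: Clang drops attributes it does not recognize (emitting an unknown-attribute warning) instead of keeping them in the AST, so the `pb::*` spellings likely need to be registered through Clang's attribute-plugin hook (`ParsedAttrInfo`) before a matcher can see them. Otherwise mechanical work; it would unblock dropping the `clang::annotate` form entirely.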
1. Teach consumer_pb.hpp to recognize standard-attribute syntax `[[pb::field(N)]]` etc. alongside the existing `[[clang::annotate("pb::field=N")]]`. Both forms are accepted during a transition window; the new form is preferred in docs and tests. Zero impact on emitted descriptor output. Argument resolution uses `Expr::EvaluateAsInt(ctx)`, so integer literals and enum constant references (§6) go through the same code path, with no second pass needed.
2. Recognize the universal attributes (`pb::name`, `pb::ignore`, `pb::doc`, `pb::on_missing`). Today these are unsupported; recognizing them costs roughly one helper per attribute and feeds the upcoming .proto emitter.
3. Add a .proto text emitter as a sibling to the existing descriptor emitter: same AST walk, second producer. This drives the need for class-level attributes (`pb::package`, `pb::name`, `pb::reserved`, `pb::version`) and for propagating the universal `pb::doc` / `pb::alias`.
4. Layer the remaining attributes that depend on the .proto emitter. Each is small once the foundation exists.
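During the transition window the legacy strings still have to parse. A sketch of that step for the tag-carrying keys, with a hypothetical function name (it is not the `parse_pb_*_attr_` code in consumer_pb.hpp):

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>
#include <vector>

// Hypothetical helper. Parses a legacy annotation such as "pb::field=2" or
// "pb::oneof_tags=5,6,7" for the given key into tag numbers. Returns
// std::nullopt when the key does not match or any number is missing,
// negative, malformed, or out of range for std::uint32_t.
std::optional<std::vector<std::uint32_t>>
parse_legacy_tags(std::string_view annotation, std::string_view key) {
    if (annotation.size() <= key.size() + 1 ||
        annotation.substr(0, key.size()) != key ||
        annotation[key.size()] != '=') {
        return std::nullopt;
    }
    const char* cur = annotation.data() + key.size() + 1;
    const char* const end = annotation.data() + annotation.size();
    std::vector<std::uint32_t> tags;
    while (true) {
        std::uint32_t value = 0;
        const auto [ptr, ec] = std::from_chars(cur, end, value);
        if (ec != std::errc{}) return std::nullopt;  // empty, non-digit, or overflow
        tags.push_back(value);
        if (ptr == end) return tags;
        if (*ptr != ',') return std::nullopt;        // e.g. "pb::field=2x"
        cur = ptr + 1;
    }
}

// Usage: a single tag, and a key mismatch.
inline void legacy_examples() {
    assert(parse_legacy_tags("pb::field=2", "pb::field") ==
           std::vector<std::uint32_t>{2});
    assert(!parse_legacy_tags("pb::wire=sint32", "pb::field"));
}
```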