|
H5CPP
v1.14.0
Modern C++ templates for HDF5 serial and parallel I/O
|
|
User-facing attribute set for Avro annotations on plain C++ structs. Vocabulary is intentionally identical to h5::*, json::*, msgpack::*, cbor::*, and bson::* where the concept overlaps (rename, ignore, doc, alias, required) — different namespace, same words. The Avro-specific surface lives only under avro::*, with logical types (datetime, timestamp, decimal, uuid, date, time) and the fixed type being the only backend-specific keywords.
C++17 attribute syntax today; one-line lift to C++26 typed annotations tomorrow.
| Surface today (C++17 standard-attribute) | C++26 reflection form |
|---|---|
[[avro::name("on_wire")]] | [[=avro::name{"on_wire"}]] |
[[avro::ignore]] | [[=avro::ignore{}]] |
[[avro::required]] | [[=avro::required{}]] |
[[avro::datetime]] | [[=avro::datetime{}]] |
[[avro::timestamp]] | [[=avro::timestamp{}]] |
[[avro::decimal(10, 2)]] | [[=avro::decimal{10, 2}]] |
[[avro::uuid]] | [[=avro::uuid{}]] |
[[avro::date]] | [[=avro::date{}]] |
[[avro::time]] | [[=avro::time{}]] |
[[avro::fixed("Name")]] | [[=avro::fixed{"Name"}]] |
[[avro::doc("description")]] | [[=avro::doc{"description"}]] |
[[avro::alias("Name")]] | [[=avro::alias{"Name"}]] |
Only syntactic shift is (args) → {args} under the [[=...]] form. Names stay put.
avro:: namespaceThese attributes use vocabulary identical to h5::*, json::*, msgpack::*, cbor::*, and bson::*. They live in avro:: so the namespace stays self-contained for Avro-only users; a user wanting multiple backends writes [[h5::name(...)]], [[json::name(...)]], [[msgpack::name(...)]], [[cbor::name(...)]], [[bson::name(...)]], and [[avro::name(...)]] (typically with the same string).
| Attribute | Purpose | Example |
|---|---|---|
[[avro::name("on_wire_name")]] | Rename a field for the Avro wire format. Decouples C++ identifier from the record field name used during encode/decode. Drives the key string in the emitted descriptor's json_name field. | [[avro::name("display_name")]] std::string label; |
[[avro::ignore]] | Skip this field entirely. Property absent from the descriptor's fields[] array; runtime never encodes or decodes it. | [[avro::ignore]] int debug_counter; |
[[avro::required]] | Field must be present during deserialization. In Avro, all non-optional fields are implicitly required; this flag is metadata for the runtime. | [[avro::required]] std::int32_t id; |
| Attribute | Purpose | Example |
|---|---|---|
[[avro::doc("description")]] | Emitted as the doc pointer in the field descriptor. Self-documenting generated code; future tooling may extract it for schema documentation. | [[avro::doc("nanoseconds since epoch")]] std::uint64_t ts; |
[[avro::alias("Name")]] | Class-level. Emitted as the alias[] string in the descriptor. The C++ type name still drives the template specialization; the alias is metadata for tooling. | struct [[avro::alias("Session")]] session_t { ... }; |
The full universal list mirrors h5cpp-compiler-h5-attribute-taxonomy.md §2 and h5cpp-compiler-pb-attribute-taxonomy.md §2. Any universal attribute not listed above has no Avro semantics (e.g. h5::chunk, h5::compress are HDF5-storage concerns; pb::field(N), pb::wire are protobuf-wire concerns; json::format, json::pattern are JSON Schema validation concerns; msgpack::ext is a MessagePack-specific concern; cbor::tag is a CBOR-specific concern; bson::binary is a BSON-specific concern).
Without avro::datetime, avro::timestamp, avro::decimal, avro::fixed, and avro::uuid, the Avro backend cannot express Avro's rich logical-type system — a core requirement for any Avro codec.
| Attribute | Purpose | Example |
|---|---|---|
[[avro::datetime]] | Field-level. Forces the field to avro_type_t::timestamp_millis (logical type on long). Auto-detected for std::chrono::system_clock::time_point without the attribute. | [[avro::datetime]] std::int64_t created_at; |
[[avro::timestamp]] | Field-level. Forces the field to avro_type_t::timestamp_micros (logical type on long). | [[avro::timestamp]] std::int64_t updated_at; |
[[avro::decimal(P, S)]] | Field-level. Forces the field to avro_type_t::decimal with precision P and scale S (std::uint8_t each). Avro decimal is a logical type on bytes or fixed. If scale is omitted, it defaults to 0. | [[avro::decimal(10, 2)]] double price; |
[[avro::uuid]] | Field-level. Forces the field to avro_type_t::uuid (logical type on string). | [[avro::uuid]] std::string id; |
[[avro::date]] | Field-level. Forces the field to avro_type_t::date (logical type on int, days since epoch). | [[avro::date]] std::int32_t birth_date; |
[[avro::time]] | Field-level. Forces the field to avro_type_t::time_millis (logical type on int, milliseconds since midnight). | [[avro::time]] std::int32_t opening_time; |
[[avro::fixed("Name")]] | Field-level. On std::array<std::uint8_t, N>, forces the field to avro_type_t::fixed with the given name and size N (inferred from the array type). An optional second argument validates the size: [[avro::fixed("Name", 16)]]. | [[avro::fixed("MD5")]] std::array<std::uint8_t, 16> hash; |
Datetime semantics. std::chrono::system_clock::time_point is auto-detected as timestamp_millis even without [[avro::datetime]]. The emitted descriptor carries avro_type_t::timestamp_millis. The runtime serializes the time_point as milliseconds since Unix epoch, encoded as a signed 64-bit integer with the Avro timestamp-millis logical type.
Decimal semantics. Avro decimal follows IEEE 754-2008 decimal128 semantics in many implementations. The descriptor carries decimal_precision and decimal_scale in the field_desc. The runtime converts the C++ numeric value into the appropriate decimal encoding (typically a big-endian byte sequence interpreted as an unscaled integer).
Fixed semantics. Avro fixed is a named, sized binary type. Unlike bytes (variable-length), fixed has exactly size bytes on the wire. The descriptor carries fixed_name and fixed_size. The name is required because Avro schemas reference fixed types by name.
| C++ type | avro_type_t | Avro physical type | Notes |
|---|---|---|---|
bool | boolean | boolean | |
char, signed char, short, int | int32 | int | |
long, long long | int64 | long | |
unsigned char, unsigned short, unsigned int | int32 | int | Unsigned ≤ 32-bit widened to signed 32-bit |
unsigned long, unsigned long long | int64 | long | Unsigned 64-bit widened to signed 64-bit |
float | float32 | float | |
double, long double | float64 | double | long double truncated to 64-bit |
std::string | string | string | UTF-8 |
std::vector<unsigned char> | bytes | bytes | Variable-length binary |
std::array<unsigned char, N> | bytes or fixed | bytes or fixed | fixed if [[avro::fixed]] present; otherwise bytes |
std::vector<T> | array | array | item descriptor points to element type |
T[N] (C array) | array | array | Same emission as std::vector<T> |
std::map<K,V> | map | map | key and value descriptors; Avro requires string keys |
std::optional<T> | optional | [null, T] union | item descriptor points to inner type; required is always false |
enum class | int32 | int | Emitted as underlying integer type; Avro native enum is a future enhancement |
Nested struct S | object | record | Recursively serialized as nested Avro record |
std::chrono::system_clock::time_point | timestamp_millis | long with timestamp-millis logicalType | Auto-detected |
std::variant<...> | null | null | Gap. Not yet implemented. |
The compiler emits a self-contained C++ header defining avro::meta::descriptor<T> specializations. The runtime (deferred to a future issue) will include these headers and walk the descriptors at encode/decode time.
Key differences from the BSON/MessagePack/CBOR descriptors:
binary_subtype (BSON-specific) or ext_type (MessagePack-specific) or tag_type (CBOR-specific).fixed_name and fixed_size support Avro's named fixed types.decimal_precision and decimal_scale support Avro's decimal logical type.std::optional<T> emits as avro_type_t::optional with item pointer; the runtime interprets this as the Avro union [null, T].Example specialization for a record with logical types and fixed:
| Attribute | Where read | Where emitted | Test fixture |
|---|---|---|---|
avro::ignore | h5_attr_reader::has_attr(fld, "avro::ignore") | Skips field in fields[] | avro_primitives |
avro::required | h5_attr_reader::has_attr(fld, "avro::required") | Sets required = true in field desc | avro_primitives, avro_strings |
avro::name("...") | h5_attr_reader::read_field_string(fld, "avro::name") | Overrides json_name in field desc | avro_strings |
avro::doc("...") | h5_attr_reader::read_class_string(node, "avro::doc") | Emitted as doc pointer in field desc | avro_primitives, avro_nested |
avro::alias("...") | h5_attr_reader::read_class_string(node, "avro::alias") | Emitted as alias[] in descriptor | avro_primitives |
avro::datetime | h5_attr_reader::has_attr(fld, "avro::datetime") | Emits avro_type_t::timestamp_millis; also auto-detected for std::chrono::time_point | avro_datetime |
avro::timestamp | h5_attr_reader::has_attr(fld, "avro::timestamp") | Emits avro_type_t::timestamp_micros | avro_datetime |
avro::decimal(P, S) | h5_attr_reader::read_field_ints(fld, "avro::decimal") | Emits avro_type_t::decimal with precision/scale | avro_decimal |
avro::uuid | h5_attr_reader::has_attr(fld, "avro::uuid") | Emits avro_type_t::uuid | *(coverage in doc example)* |
avro::date | h5_attr_reader::has_attr(fld, "avro::date") | Emits avro_type_t::date | *(coverage in doc example)* |
avro::time | h5_attr_reader::has_attr(fld, "avro::time") | Emits avro_type_t::time_millis | *(coverage in doc example)* |
avro::fixed("Name", N) | h5_attr_reader::read_field_string(fld, "avro::fixed") + read_field_ints | Emits avro_type_t::fixed with fixed_name and fixed_size | avro_fixed |
| Attribute | Reason |
|---|---|
avro::on_missing | Avro has no schema-level default-value mechanism in the descriptor. Absence semantics live in the runtime decoder. (Same as JSON, MessagePack, CBOR, and BSON.) |
avro::chunk | HDF5 storage concern. |
avro::compress | HDF5 storage concern. |
avro::serialize_full | HDF5 tier-1 emission concern. |
avro::format | JSON Schema validation concern. |
avro::pattern | JSON Schema validation concern. |
avro::min / avro::max | JSON Schema validation concern. |
avro::version | No Avro schema format to version in the descriptor. |
avro::name_all | No wire naming convention needed; Avro uses record field names. |
avro::ext | MessagePack-specific concern. |
avro::tag | CBOR-specific concern. |
avro::binary | BSON-specific concern. Avro uses bytes and fixed instead. |
event_t carries alias = "Event" from [[avro::alias("Event")]]. The C++ template specialization still uses sn::sensor::event_t; the alias is metadata.id → avro_type_t::int64 with required = true from [[avro::required]].when → avro_type_t::timestamp_millis because std::chrono::system_clock::time_point is auto-detected. The runtime will encode it as milliseconds since epoch with the Avro timestamp-millis logical type.price → avro_type_t::decimal with decimal_precision = 10 and decimal_scale = 2 from [[avro::decimal(10, 2)]].hash → avro_type_t::fixed with fixed_name = "MD5" and fixed_size = 16 from [[avro::fixed("MD5")]] on std::array<std::uint8_t, 16>. The size is inferred from the array type.debug_counter → absent entirely (ignore).label → avro_type_t::string. Standard UTF-8 string.readings → avro_type_t::array with item = &item_1 where item_1.type = float64. The runtime walks the array, encoding each element as Avro double.flags → avro_type_t::optional with item = &opt_2 where opt_2.type = uint16. The runtime emits Avro null when the optional is empty, or the uint16 value (widened to int32 per Avro convention) when present. required is false regardless of any [[avro::required]] attribute because the field is std::optional.doc pointer is nullptr on all fields because no [[avro::doc]] was applied at field scope. Class-level doc is not wired into field descriptors today.The h5cpp architectural pattern is compiler emits descriptors → runtime consumes descriptors → I/O happens. The Avro backend follows this exactly.
Same rationale as HDF5, JSON, MessagePack, CBOR, and protobuf backends:
.cpp bloat: Descriptors are constexpr tables.The actual runtime will use a lightweight custom encoder/decoder (not an external library like avro-c++) to maintain the h5cpp philosophy of minimal dependencies and zero-copy where possible.