H5CPP
v1.14.0
Modern C++ templates for HDF5 serial and parallel I/O
User-facing attribute set for BSON annotations on plain C++ structs. Vocabulary is intentionally identical to h5::*, json::*, msgpack::*, cbor::*, and pb::* where the concept overlaps (rename, ignore, doc, alias, required) — different namespace, same words. The BSON-specific surface lives only under bson::*, with MongoDB-native types: datetime, timestamp, decimal, and binary(N).
C++17 attribute syntax today; one-line lift to C++26 typed annotations tomorrow.
| Surface today (C++17 standard-attribute) | C++26 reflection form |
|---|---|
| [[bson::name("on_wire")]] | [[=bson::name{"on_wire"}]] |
| [[bson::ignore]] | [[=bson::ignore{}]] |
| [[bson::required]] | [[=bson::required{}]] |
| [[bson::datetime]] | [[=bson::datetime{}]] |
| [[bson::timestamp]] | [[=bson::timestamp{}]] |
| [[bson::decimal]] | [[=bson::decimal{}]] |
| [[bson::binary(4)]] | [[=bson::binary{4}]] |
| [[bson::doc("description")]] | [[=bson::doc{"description"}]] |
| [[bson::alias("Name")]] | [[=bson::alias{"Name"}]] |
The only syntactic shift is (args) → {args} under the [[=...]] form. Names stay put.
bson:: namespace

These attributes use vocabulary identical to h5::*, json::*, msgpack::*, cbor::*, and pb::*. They live in bson:: so the namespace stays self-contained for BSON-only users; a user wanting multiple backends writes [[h5::name(...)]], [[json::name(...)]], [[msgpack::name(...)]], [[cbor::name(...)]], and [[bson::name(...)]] (typically with the same string).
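A minimal sketch of the multi-backend case, assuming a hypothetical record quote_t that is not part of h5cpp. Ordinary C++17 compilers ignore attributes in namespaces they do not recognize (typically with a warning, silenced here), so the struct stays plain C++.

```cpp
#include <cstdint>

// Unknown scoped attributes are ignored by ordinary compilers, usually with a
// warning; silence that diagnostic for this translation unit.
#pragma GCC diagnostic ignored "-Wattributes"

// Hypothetical record: one field exposed to three backends under the same key.
struct quote_t {
    [[h5::name("px")]] [[json::name("px")]] [[bson::name("px")]]
    double price = 0.0;

    [[bson::required]] std::int32_t id = 0;          // BSON decode must find this key
    [[bson::ignore]]   int          debug_counter = 0;  // skipped by the BSON backend only
};
```

Each backend reads only its own namespace, so the three name attributes are independent even though they carry the same string.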
| Attribute | Purpose | Example |
|---|---|---|
| [[bson::name("on_wire_name")]] | Rename a field for the BSON wire format. Decouples C++ identifier from the map key used during encode/decode. Drives the key string in the emitted descriptor's json_name field. | [[bson::name("display_name")]] std::string label; |
| [[bson::ignore]] | Skip this field entirely. Property absent from the descriptor's fields[] array; runtime never encodes or decodes it. | [[bson::ignore]] int debug_counter; |
| [[bson::required]] | Field must be present during deserialization. The runtime can use this to emit an error (or a default value) when the key is absent from the BSON document. | [[bson::required]] std::int32_t id; |

| Attribute | Purpose | Example |
|---|---|---|
| [[bson::doc("description")]] | Emitted as the doc pointer in the field descriptor. Self-documenting generated code; future tooling may extract it for schema documentation. | [[bson::doc("nanoseconds since epoch")]] std::uint64_t ts; |
| [[bson::alias("Name")]] | Class-level. Emitted as the alias[] string in the descriptor. The C++ type name still drives the template specialization; the alias is metadata for tooling. | struct [[bson::alias("Session")]] session_t { ... }; |
The full universal list mirrors h5cpp-compiler-h5-attribute-taxonomy.md §2 and h5cpp-compiler-pb-attribute-taxonomy.md §2. Any universal attribute not listed above has no BSON semantics (e.g. h5::chunk, h5::compress are HDF5-storage concerns; pb::field(N), pb::wire are protobuf-wire concerns; json::format, json::pattern are JSON Schema validation concerns; msgpack::ext is a MessagePack-specific concern; cbor::tag is a CBOR-specific concern).
Without bson::datetime, bson::timestamp, bson::decimal, and bson::binary, the BSON backend cannot express MongoDB's native extended types — a core requirement for any BSON codec.
| Attribute | Purpose | Example |
|---|---|---|
| [[bson::datetime]] | Field-level. Forces the field to bson_type_t::utc_datetime. Auto-detected for std::chrono::system_clock::time_point without the attribute, but explicit [[bson::datetime]] guarantees the type regardless of the C++ type. | [[bson::datetime]] std::int64_t created_at; |
| [[bson::timestamp]] | Field-level. Forces the field to bson_type_t::timestamp. BSON Timestamp is an internal MongoDB type (not a user-facing datetime); it carries both a seconds component and an increment. | [[bson::timestamp]] std::uint64_t op_ts; |
| [[bson::decimal]] | Field-level. Forces the field to bson_type_t::decimal128. Maps any C++ numeric type to IEEE 754-2008 decimal128. | [[bson::decimal]] double price; |
| [[bson::binary(N)]] | Field-level. Forces std::vector<std::uint8_t> to bson_type_t::bin with BSON binary subtype N (std::uint8_t, range [0, 255]). Standard subtypes: 0 generic, 1 function, 2 binary (old), 3 UUID (old), 4 UUID, 5 MD5, 128 to 255 user-defined. | [[bson::binary(4)]] std::vector<std::uint8_t> uuid; |
Datetime semantics. std::chrono::system_clock::time_point is auto-detected as utc_datetime even without [[bson::datetime]]. The emitted descriptor carries bson_type_t::utc_datetime. The runtime serializes the time_point as milliseconds since Unix epoch (BSON convention), encoded as a signed 64-bit integer in the BSON utc_datetime wire format.
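The conversion described above can be sketched with std::chrono alone. The helper name to_bson_millis is hypothetical, and the sketch assumes system_clock counts from the Unix epoch (guaranteed from C++20, and true of mainstream implementations before that).

```cpp
#include <chrono>
#include <cstdint>

// Convert a system_clock time_point to the signed 64-bit millisecond count
// stored by BSON utc_datetime. Finer-than-millisecond precision is truncated
// toward zero by duration_cast.
constexpr std::int64_t to_bson_millis(std::chrono::system_clock::time_point tp) {
    using namespace std::chrono;
    return duration_cast<milliseconds>(tp.time_since_epoch()).count();
}
```

The returned integer is what an encoder would then write as eight little-endian bytes after the element key.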
Timestamp semantics. BSON Timestamp is a 64-bit composite: high 32 bits are seconds since epoch, low 32 bits are an increment ordinal. It is not interchangeable with utc_datetime. The runtime packs the field value into this composite form.
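The composite layout can be written down directly. This is a sketch only; the function names are hypothetical and not taken from the h5cpp runtime.

```cpp
#include <cstdint>

// Pack seconds (high 32 bits) and increment (low 32 bits) into one BSON Timestamp.
constexpr std::uint64_t pack_bson_timestamp(std::uint32_t seconds, std::uint32_t increment) {
    return (static_cast<std::uint64_t>(seconds) << 32) | increment;
}

// Recover the seconds-since-epoch component.
constexpr std::uint32_t timestamp_seconds(std::uint64_t ts) {
    return static_cast<std::uint32_t>(ts >> 32);
}

// Recover the increment ordinal.
constexpr std::uint32_t timestamp_increment(std::uint64_t ts) {
    return static_cast<std::uint32_t>(ts & 0xFFFFFFFFu);
}
```

Because seconds occupy the high half, comparing two packed values as plain unsigned integers orders them by time first and by increment second.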
Decimal semantics. BSON Decimal128 follows IEEE 754-2008 decimal128 (34 significant digits, exponent range −6143 to +6144). The runtime converts the C++ numeric value into the 128-bit BSON decimal128 encoding.
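To make the 128-bit layout concrete, here is a deliberately limited sketch: it encodes a value that is already in decimal form (sign, integer coefficient, power-of-ten exponent) and does not attempt the double-to-decimal conversion. It assumes the binary-integer-decimal encoding used by BSON, a coefficient below 2^64, and an exponent in [-6176, 6111]; the type and function names are hypothetical.

```cpp
#include <cstdint>

// Two 64-bit halves of a decimal128 value; BSON writes `low` first, then `high`.
struct decimal128_bits {
    std::uint64_t high;
    std::uint64_t low;
};

// Encode (-1)^negative * coefficient * 10^exponent.
// Valid only while coefficient < 2^64 and exponent is within [-6176, 6111],
// so the coefficient fits the low word and the exponent field starts at bit 49.
constexpr decimal128_bits encode_decimal128(bool negative, std::uint64_t coefficient, int exponent) {
    constexpr int bias = 6176;  // decimal128 exponent bias
    const std::uint64_t biased = static_cast<std::uint64_t>(exponent + bias);
    return { (negative ? (1ULL << 63) : 0ULL) | (biased << 49), coefficient };
}
```

For example, the price 12.34 would be passed as coefficient 1234 with exponent -2, which keeps the two decimal places exact instead of inheriting binary rounding from a double.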
Binary semantics. std::vector<std::uint8_t> without [[bson::binary(N)]] still emits as bson_type_t::bin with subtype 0 (generic binary). The attribute overrides the subtype. Unlike MessagePack ext, BSON binary does not describe payload layout — it is always an opaque byte vector.
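The value part of a BSON binary element is a 32-bit little-endian length, one subtype byte, then the raw bytes. A minimal sketch of that layout follows; the function names are hypothetical, the element type byte and key are left out, and the extra inner length used by the deprecated subtype 2 is not handled.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Build the value bytes of a BSON binary element: int32 length (little-endian),
// subtype byte, then the payload copied verbatim.
inline std::vector<std::uint8_t> encode_binary_value(const std::vector<std::uint8_t>& data,
                                                     std::uint8_t subtype = 0) {
    const auto n = static_cast<std::uint32_t>(data.size());
    std::vector<std::uint8_t> out;
    out.reserve(5 + data.size());
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<std::uint8_t>((n >> shift) & 0xFF));
    out.push_back(subtype);
    out.insert(out.end(), data.begin(), data.end());
    return out;
}

// Usage: a 3-byte payload tagged with subtype 4.
inline void example_binary_subtype() {
    const auto v = encode_binary_value({0x01, 0x02, 0x03}, 4);
    assert(v.size() == 8 && v[0] == 3 && v[4] == 4);
}
```

Omitting the second argument gives subtype 0, matching the default for an unannotated byte vector.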
| C++ type | bson_type_t | BSON wire type | Notes |
|---|---|---|---|
| bool | boolean | 0x08 (boolean) | |
| char, signed char, short, int | int32 | 0x10 (int32) | Signed integers ≤ 32-bit |
| long, long long | int64 | 0x12 (int64) | Signed 64-bit integers |
| unsigned char, unsigned short, unsigned int, unsigned long, unsigned long long | int64 | 0x12 (int64) | Unsigned values widened to signed 64-bit (BSON has no unsigned types) |
| float, double, long double | float64 | 0x01 (double) | long double truncated to 64-bit |
| std::string | string | 0x02 (string) | UTF-8 string |
| std::vector<unsigned char> | bin | 0x05 (binary) | Raw binary blob; subtype 0 unless overridden by [[bson::binary(N)]] |
| std::vector<T> | array | 0x04 (array) | item descriptor points to element type |
| T[N] (C array) | array | 0x04 (array) | Same emission as std::vector<T> |
| std::map<K,V> | map | 0x03 (document) | key and value descriptors; keys must be strings for valid BSON |
| std::optional<T> | optional | absent or <T> | item descriptor points to inner type; encoded as absent when empty |
| enum class | int32 | 0x10 (int32) | Emitted as underlying integer type; no string mapping today |
| Nested struct S | object | 0x03 (document) | Recursively serialized as nested BSON document |
| std::chrono::system_clock::time_point | utc_datetime | 0x09 (utc_datetime) | Auto-detected; milliseconds since epoch |
| std::variant<...> | nil | 0x0A (null) | Gap. Not yet implemented. |
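The scalar rows of the table can be expressed as a compile-time lookup. This is an illustrative sketch rather than the compiler's actual dispatch: bson_wire_code is a hypothetical name, and containers, optionals, and nested structs are deliberately left out.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>

// Map a scalar C++ type to its BSON element type byte, following the table.
template <class T>
constexpr std::uint8_t bson_wire_code() {
    if constexpr (std::is_same_v<T, bool>)             return 0x08;  // boolean
    else if constexpr (std::is_floating_point_v<T>)    return 0x01;  // double (long double narrowed)
    else if constexpr (std::is_same_v<T, std::string>) return 0x02;  // UTF-8 string
    else if constexpr (std::is_enum_v<T>)              return 0x10;  // enums travel as int32
    else if constexpr (std::is_same_v<T, char> || std::is_same_v<T, signed char> ||
                       std::is_same_v<T, short> || std::is_same_v<T, int>)
                                                       return 0x10;  // int32
    else if constexpr (std::is_integral_v<T>)          return 0x12;  // long, long long, all unsigned
    else static_assert(sizeof(T) == 0, "type not covered by this sketch");
}
```

The integer branches test type identity rather than sizeof, so long maps to int64 even on platforms where it is 32 bits wide, as the table states.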
The compiler emits a self-contained C++ header defining bson::meta::descriptor<T> specializations. The runtime (deferred to a future issue) will include these headers and walk the descriptors at encode/decode time.
Key differences from the CBOR/MessagePack descriptors:
- binary_subtype is std::uint8_t (not tag_type or ext_type) because BSON binary carries a subtype byte per the BSON spec.
- No float16 or *_indef variants: BSON has no half-precision float or indefinite-length containers.
- Attributes (datetime, timestamp, decimal, binary) drive the BSON type directly.
- utc_datetime, timestamp, and decimal128 are first-class enum values because they are native BSON wire types.

Example specialization for a record with MongoDB extended types:
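The shape of such descriptor tables can be sketched as follows. This is a hypothetical layout and not the compiler's generated output: the sketch namespace, field_desc, record_t, required_count, and the member order are assumptions, and only the member names json_name, required, doc, binary_subtype, alias, and fields come from the text.

```cpp
#include <cstddef>
#include <cstdint>

namespace sketch {

// Subset of the BSON type enum; enumerator order here is arbitrary.
enum class bson_type_t : std::uint8_t { int32, int64, string, bin, utc_datetime };

// Hypothetical field descriptor; the real layout may differ.
struct field_desc {
    const char*  json_name;       // key used on the wire
    bson_type_t  type;
    bool         required;
    std::uint8_t binary_subtype;  // meaningful only when type == bin
    const char*  doc;             // nullptr when no [[bson::doc]] was given
};

struct record_t;                       // stand-in for an annotated user struct
template <class T> struct descriptor;  // primary template intentionally undefined

template <> struct descriptor<record_t> {
    static constexpr char alias[] = "Record";
    static constexpr field_desc fields[] = {
        {"id",         bson_type_t::int64,        true,  0, nullptr},
        {"created_at", bson_type_t::utc_datetime, false, 0, nullptr},
        {"uuid",       bson_type_t::bin,          false, 4, nullptr},
    };
};

// A consumer walks the table; here it counts the required fields at compile time.
template <class T>
constexpr std::size_t required_count() {
    std::size_t n = 0;
    for (const auto& f : descriptor<T>::fields) n += f.required ? 1 : 0;
    return n;
}

}  // namespace sketch
```

Keeping the table constexpr means a decoder could check required keys or pick an encoder per field without any registration code running at start-up.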
| Attribute | Where read | Where emitted | Test fixture |
|---|---|---|---|
| bson::ignore | h5_attr_reader::has_attr(fld, "bson::ignore") | Skips field in fields[] | bson_primitives |
| bson::required | h5_attr_reader::has_attr(fld, "bson::required") | Sets required = true in field desc | bson_primitives, bson_strings |
| bson::name("...") | h5_attr_reader::read_field_string(fld, "bson::name") | Overrides json_name in field desc | bson_strings |
| bson::doc("...") | h5_attr_reader::read_class_string(node, "bson::doc") | Emitted as doc pointer in field desc | bson_primitives, bson_nested |
| bson::alias("...") | h5_attr_reader::read_class_string(node, "bson::alias") | Emitted as alias[] in descriptor | bson_primitives |
| bson::datetime | h5_attr_reader::has_attr(fld, "bson::datetime") | Emits bson_type_t::utc_datetime; also auto-detected for std::chrono::system_clock::time_point | bson_datetime |
| bson::timestamp | h5_attr_reader::has_attr(fld, "bson::timestamp") | Emits bson_type_t::timestamp | bson_datetime |
| bson::decimal | h5_attr_reader::has_attr(fld, "bson::decimal") | Emits bson_type_t::decimal128 | bson_decimal |
| bson::binary(N) | h5_attr_reader::read_field_ints(fld, "bson::binary") | Emits bson_type_t::bin with binary_subtype = N | bson_binary |
| Attribute | Reason |
|---|---|
| bson::on_missing | BSON has no schema-level default-value mechanism. Absence semantics live in the runtime decoder, not the descriptor. (Same as JSON, MessagePack, and CBOR.) |
| bson::chunk | HDF5 storage concern. |
| bson::compress | HDF5 storage concern. |
| bson::serialize_full | HDF5 tier-1 emission concern. |
| bson::format | JSON Schema validation concern. |
| bson::pattern | JSON Schema validation concern. |
| bson::min / bson::max | JSON Schema validation concern. |
| bson::version | No BSON schema format to version. |
| bson::name_all | No wire naming convention needed; BSON uses document keys, not field names. |
| bson::ext | MessagePack-specific concern. BSON uses binary with subtypes instead. |
| bson::tag | CBOR-specific concern. BSON does not have tagged values. |
- trade_t carries alias = "Trade" from [[bson::alias("Trade")]]. The C++ template specialization still uses sn::mongo::trade_t; the alias is metadata.
- id → bson_type_t::int64 with required = true from [[bson::required]].
- created → bson_type_t::utc_datetime because std::chrono::system_clock::time_point is auto-detected. The key is renamed to "created_at" via [[bson::name("created_at")]].
- oplog_ts → bson_type_t::timestamp from [[bson::timestamp]].
- price → bson_type_t::decimal128 from [[bson::decimal]].
- uuid → bson_type_t::bin with binary_subtype = 4 from [[bson::binary(4)]] (RFC 4122 UUID).
- raw → bson_type_t::bin with binary_subtype = 0 (generic binary) because no [[bson::binary]] attribute is present.
- debug_counter → absent entirely (ignore).
- symbol → bson_type_t::string. Standard UTF-8 string (BSON type 0x02).
- tags → bson_type_t::array with item = &item_1 where item_1.type = float64. The runtime walks the array, encoding each element as BSON double.
- flags → bson_type_t::optional with item = &opt_2 where opt_2.type = uint16. The runtime omits the field when the optional is empty, or emits it as int32 (widened from uint16) when present.
- doc pointer is nullptr on all fields because no [[bson::doc]] was applied at field scope. Class-level doc is not wired into field descriptors today.

The h5cpp architectural pattern is: compiler emits descriptors → runtime consumes descriptors → I/O happens. The BSON backend follows this exactly.
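For orientation, the input record behind the walk-through above might be declared roughly as follows. The field names, attributes, and namespace come from the walk-through; the exact C++ types, default initializers, and member order are inferred and should be read as assumptions.

```cpp
#include <chrono>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Ordinary compilers ignore unknown scoped attributes; silence the warning.
#pragma GCC diagnostic ignored "-Wattributes"

namespace sn::mongo {

struct [[bson::alias("Trade")]] trade_t {
    [[bson::required]]           std::int64_t id = 0;               // int64, required
    [[bson::name("created_at")]] std::chrono::system_clock::time_point created{};  // utc_datetime
    [[bson::timestamp]]          std::uint64_t oplog_ts = 0;        // timestamp
    [[bson::decimal]]            double price = 0.0;                // decimal128
    [[bson::binary(4)]]          std::vector<std::uint8_t> uuid;    // bin, subtype 4
    std::vector<std::uint8_t>    raw;                               // bin, subtype 0
    [[bson::ignore]]             int debug_counter = 0;             // not in fields[]
    std::string                  symbol;                            // string
    std::vector<double>          tags;                              // array of float64
    std::optional<std::uint16_t> flags;                             // optional
};

}  // namespace sn::mongo
```

The struct remains a plain aggregate, so existing code that constructs or copies trade_t is unaffected by the annotations.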
Same rationale as for the HDF5, JSON, MessagePack, CBOR, and protobuf backends:
- No .cpp bloat: descriptors are constexpr tables.

The actual runtime will use a lightweight custom encoder/decoder (not an external library like libbson) to maintain the h5cpp philosophy of minimal dependencies and zero-copy where possible.