H5CPP
v1.14.0
Modern C++ templates for HDF5 serial and parallel I/O
h5cpp routes C++ containers to HDF5 layouts by capability rather than by name. The dispatch is the Walter Brown detection idiom: at compile time, the type is probed for the expressions that should be valid on it (value_type, iterator, begin/end, size, data, key_type, mapped_type), and the matching storage representation is selected from the result. std::vector<T> writes as a rank-1 dataset because it exposes contiguous data with a known size and an element type the library can serialise — not because it is std::vector. Any type with the same surface routes the same way.
| File | Purpose |
|---|---|
| container.cpp | Round-trips the supported std:: container families |
| tiny_containers.hpp | Header-only minimal custom containers: vec, flist, set, dict |
| detected.cpp | Demonstrates Walter Brown detection-idiom dispatch on non-std:: containers |
Both examples are header-only from the user side.
No generated compound descriptor header is needed here. The compound layouts used for pair, tuple, and map-like containers are handled by h5cpp traits.
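The useful mental model is: C++ type → detected shape → storage representation → HDF5 layout.
For example, std::vector<double> is detected as a contiguous sequence, stored as a linear_value_dataset, and lands in the file as a rank-1 dataset of double.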
No inheritance, no virtual interfaces, no adapter layer — only compile-time shape recognition. The library detects the expressions that are valid for T and lets overload resolution pick the storage path.
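The probing step can be sketched in plain C++17. This is a minimal illustration, not h5cpp's trait code: the probe aliases, the shape enum, and the order in which the probes are tested (map-like, then set-like, then contiguous, then iterable) are assumptions chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <list>
#include <map>
#include <set>
#include <type_traits>
#include <utility>
#include <vector>

// Detection idiom: detector<void, Op, Args...> is true_type exactly when
// Op<Args...> is a well-formed type; otherwise the primary template applies.
namespace detail {
template <class AlwaysVoid, template <class...> class Op, class... Args>
struct detector : std::false_type {};

template <template <class...> class Op, class... Args>
struct detector<std::void_t<Op<Args...>>, Op, Args...> : std::true_type {};
}  // namespace detail

template <template <class...> class Op, class... Args>
constexpr bool is_detected_v = detail::detector<void, Op, Args...>::value;

// Probes: each alias is only well-formed if T supports the expression.
template <class T> using data_op        = decltype(std::declval<T&>().data());
template <class T> using size_op        = decltype(std::declval<T&>().size());
template <class T> using begin_op       = decltype(std::declval<T&>().begin());
template <class T> using end_op         = decltype(std::declval<T&>().end());
template <class T> using key_type_op    = typename T::key_type;
template <class T> using mapped_type_op = typename T::mapped_type;

enum class shape { contiguous, sequence, set_like, map_like, unsupported };

// Classify by capability only; the type's name never appears.
template <class T>
constexpr shape detect_shape() {
    if constexpr (is_detected_v<key_type_op, T> && is_detected_v<mapped_type_op, T>)
        return shape::map_like;
    else if constexpr (is_detected_v<key_type_op, T>)
        return shape::set_like;
    else if constexpr (is_detected_v<data_op, T> && is_detected_v<size_op, T>)
        return shape::contiguous;
    else if constexpr (is_detected_v<begin_op, T> && is_detected_v<end_op, T>)
        return shape::sequence;
    else
        return shape::unsupported;
}

// A user type with the contiguous surface and no std:: ancestry.
struct my_buffer {
    using value_type = float;
    float storage[4] = {};
    float* data() { return storage; }
    std::size_t size() const { return 4; }
};
```

Because map-like types also expose key_type, the map test has to run before the set test; otherwise std::map would be classified as set-like.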
std::vector<T> writes directly. std::deque<T> and std::list<T> are staged into a temporary contiguous buffer first. The on-disk result is identical: a rank-1 HDF5 dataset of T. The C++ container type does not appear in the file.
container.cpp writes and reads:

- std::vector<T>, std::deque<T>, std::list<T>
- std::vector<std::array<T,N>>
- std::set<T>, std::multiset<T>
- std::unordered_set<T>, std::unordered_multiset<T>
- std::map<K,V>, std::multimap<K,V>
- std::unordered_map<K,V>, std::unordered_multimap<K,V>
- std::vector<std::pair<K,V>>, std::vector<std::tuple<...>>
- std::forward_list<T> through h5::append
- std::vector<std::string>
- std::vector<std::vector<T>>

The compact model:
h5cpp does not store “a C++ container implementation.” It stores the data model implied by the container.
std::vector<T> is the fast path: contiguous memory, direct pointer, one H5Dwrite.
std::deque<T> and std::list<T> are not contiguous, so h5cpp copies them into a staging buffer first. The file layout is still a rank-1 dataset of T.
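The fast path and the staging path can be contrasted in a few lines. In this sketch fake_dataset_write is a hypothetical stand-in for H5Dwrite (it only copies n elements from one pointer), and the fast path is picked by overloading on std::vector for brevity, whereas h5cpp picks it through the detection probes.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <list>
#include <vector>

// Hypothetical stand-in for H5Dwrite: "writes" n elements from one
// contiguous pointer by copying them into the returned rank-1 "dataset".
template <class T>
std::vector<T> fake_dataset_write(const T* ptr, std::size_t n) {
    return std::vector<T>(ptr, ptr + n);
}

// Fast path: std::vector already exposes contiguous data() and size(),
// so there is no copy before the write.
template <class T>
std::vector<T> write_rank1(const std::vector<T>& v) {
    return fake_dataset_write(v.data(), v.size());
}

// Fallback for deque, list, and other iterable sequences:
// copy into a contiguous staging buffer, then do one flat write.
template <class Container>
std::vector<typename Container::value_type> write_rank1(const Container& c) {
    std::vector<typename Container::value_type> staging(c.begin(), c.end());
    return fake_dataset_write(staging.data(), staging.size());
}
```

Partial ordering makes the std::vector overload win for vectors, so only non-contiguous sequences pay for the staging copy; the resulting "dataset" is the same either way.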
A vector<array<T,N>> is flattened into a rectangular dataset. The outer vector gives the row count; the inner array gives the fixed column count.
Conceptually, a vector of R arrays of length N becomes an R × N dataset of T.
Fixed-size rows, no variable-length machinery — the cleanest of the multi-dimensional paths.
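A sketch of the shape computation and row-major order, assuming illustrative helper names dims_of and flatten (not h5cpp functions). h5cpp writes these vectors directly, as the tables below note; the copy here only makes the element order of the R × N dataset explicit.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Dataset extent: rows come from the outer vector,
// columns from the compile-time inner extent N.
template <class T, std::size_t N>
std::array<std::size_t, 2> dims_of(const std::vector<std::array<T, N>>& rows) {
    return {rows.size(), N};
}

// Row-major flattening: row 0's N elements, then row 1's, and so on.
template <class T, std::size_t N>
std::vector<T> flatten(const std::vector<std::array<T, N>>& rows) {
    std::vector<T> out;
    out.reserve(rows.size() * N);
    for (const auto& row : rows)
        out.insert(out.end(), row.begin(), row.end());
    return out;
}
```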
Sorted sets are stored in their iteration order, which is the order defined by the set's comparator (ascending for the default std::less).
The result is a rank-1 dataset. For set, duplicates disappear. For multiset, duplicates remain.
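Staging in iteration order and rebuilding on read can be sketched as follows; stage_in_order and rebuild are hypothetical helper names. The multiset keeps its duplicate because staging copies every element the iterator visits, while the std::set never held it in the first place.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Write side: copy the elements in iteration order into a contiguous buffer.
template <class SetLike>
std::vector<typename SetLike::value_type> stage_in_order(const SetLike& s) {
    return std::vector<typename SetLike::value_type>(s.begin(), s.end());
}

// Read side: insert every staged element back into the requested set type.
template <class SetLike>
SetLike rebuild(const std::vector<typename SetLike::value_type>& buffer) {
    return SetLike(buffer.begin(), buffer.end());
}
```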
Hash sets round-trip as containers, but their order is not meaningful.
The file records the observed iteration order, which depends on the hash table's internal state and is not portable across runs, compilers, or library versions. Do not depend on it.
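When testing a hash-set round trip, compare contents rather than positions. same_elements is a hypothetical test helper built on std::is_permutation; the equality operator of the unordered containers already ignores order, so rebuilding and comparing is the other natural check.

```cpp
#include <algorithm>
#include <cassert>
#include <unordered_set>
#include <vector>

// Order-insensitive check: the on-disk sequence only has to hold the same
// elements, with the same multiplicities, as the container.
template <class HashSet>
bool same_elements(const HashSet& original,
                   const std::vector<typename HashSet::value_type>& on_disk) {
    if (original.size() != on_disk.size()) return false;
    return std::is_permutation(original.begin(), original.end(), on_disk.begin());
}
```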
std::forward_list<T> has no .data() and is not a good match for ordinary contiguous writes.
Instead, append it element by element to an unlimited, chunked dataset with h5::append, then read it back as an ordinary sequence container such as std::vector<T>.
This is the right pattern for single-pass input, packet streams, and append-style workflows.
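The append pattern can be modeled in memory to show why it suits single-pass input. chunked_appender is hypothetical and is not h5cpp's h5::append machinery: it buffers elements and flushes one full chunk at a time, so the producer never needs size() or data(), and a final flush writes the partial last chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <vector>

// Hypothetical in-memory model of appending to an unlimited, chunked dataset.
template <class T>
class chunked_appender {
public:
    explicit chunked_appender(std::size_t chunk) : chunk_(chunk) {
        assert(chunk > 0);
        buffer_.reserve(chunk);
    }
    void append(const T& value) {
        buffer_.push_back(value);
        if (buffer_.size() == chunk_) flush();  // a full chunk goes out at once
    }
    // Writes out whatever is buffered, including a partial last chunk.
    void flush() {
        if (buffer_.empty()) return;
        dataset_.insert(dataset_.end(), buffer_.begin(), buffer_.end());
        buffer_.clear();
        ++flushes_;
    }
    const std::vector<T>& dataset() const { return dataset_; }
    std::size_t flushes() const { return flushes_; }

private:
    std::size_t chunk_;
    std::vector<T> buffer_;
    std::vector<T> dataset_;  // stands in for the extendible on-disk dataset
    std::size_t flushes_ = 0;
};
```

With a chunk size of 2 and five elements, two full chunks flush during the loop and the explicit flush writes the fifth element.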
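Map-like containers are stored as rank-1 datasets of compound records, one { key, value } record per element.
Supported examples include std::map, std::multimap, std::unordered_map, and std::unordered_multimap.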
Sorted maps are written in key order. Unordered maps are written in bucket iteration order. Round-trip reconstruction preserves the source container's semantics; the on-disk order reflects whatever the writer's container produced at the moment of write.
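A sketch of the { key, value } record model. kv_record, to_records, and from_records are hypothetical names for illustration; the tables below refer to h5cpp's own record type as kv_t, whose definition is not reproduced here.

```cpp
#include <cassert>
#include <map>
#include <unordered_map>
#include <vector>

// Hypothetical mirror of one { key, value } compound record.
template <class K, class V>
struct kv_record {
    K key;
    V value;
};

// Write side: one record per element, in the container's iteration order.
template <class Map>
std::vector<kv_record<typename Map::key_type, typename Map::mapped_type>>
to_records(const Map& m) {
    std::vector<kv_record<typename Map::key_type, typename Map::mapped_type>> out;
    out.reserve(m.size());
    for (const auto& [k, v] : m) out.push_back({k, v});
    return out;
}

// Read side: insert every record back into the requested map type.
template <class Map, class K, class V>
Map from_records(const std::vector<kv_record<K, V>>& records) {
    Map m;
    for (const auto& r : records) m.emplace(r.key, r.value);
    return m;
}
```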
Sequences of pair and tuple become compound datasets.
The pair layout is a two-field compound record holding first and second.
Tuples are packed into an internal C-struct mirror before writing. The exact field naming is implementation-defined by h5cpp traits, but the model is simple: each tuple becomes one compound record, and each tuple element becomes one field of that record.
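The reason for the struct mirror can be shown directly: the memory layout of std::tuple is unspecified (some standard libraries place the elements in reverse order), so the field offsets of a compound type have to come from a plain struct. record_mirror, pack, and unpack are hypothetical names; H5Tinsert is the HDF5 call that takes such offsets, but it is not called here.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>

// Hypothetical C-struct mirror for std::tuple<int, double, char>.
struct record_mirror {
    int id;
    double score;
    char tag;
};

record_mirror pack(const std::tuple<int, double, char>& t) {
    return {std::get<0>(t), std::get<1>(t), std::get<2>(t)};
}

std::tuple<int, double, char> unpack(const record_mirror& r) {
    return {r.id, r.score, r.tag};
}

// Offsets a compound datatype would be described with, one per tuple element.
constexpr std::size_t field_offsets[] = {
    offsetof(record_mirror, id),
    offsetof(record_mirror, score),
    offsetof(record_mirror, tag),
};
```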
detected.cpp is the important proof.
It shows that h5cpp’s container dispatch is structural, not name-based. The example uses four custom containers from tiny_containers.hpp: tiny::vec<T>, tiny::flist<T>, tiny::set<T>, and tiny::dict<K,V>.
None of these derive from std::vector, std::list, std::set, or std::map.
Yet h5cpp still routes them correctly because the Walter Brown detection idiom recognizes their type surface.
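h5cpp probes each type for the expressions listed at the top: value_type, iterator, begin/end, size, data, key_type, and mapped_type.
The demo prints a trait card showing which of those expressions are valid for each tiny:: container.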
This is the key message: the trait outputs are the dispatch's input. Valid expressions define the category, the category selects the storage representation, and the representation determines the HDF5 layout.
After detection, h5cpp picks the storage model:
| Custom type | Detected shape | Storage representation | HDF5 layout |
|---|---|---|---|
| tiny::vec<T> | contiguous sequence | linear_value_dataset | rank-1 dataset of T |
| tiny::flist<T> | iterator sequence | linear_value_dataset | rank-1 dataset of T |
| tiny::set<T> | set-like | linear_value_dataset | rank-1 dataset of T |
| tiny::dict<K,V> | map-like | key_value_dataset | rank-1 compound dataset { key, value } |
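The selection step in the table can be sketched with the same kind of probes. The storage_representation function, the toy types, and returning the representation name as a string are illustrative assumptions; the point is that neither toy type inherits from, or is registered with, anything.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <type_traits>
#include <utility>
#include <vector>

namespace detail {
template <class AlwaysVoid, template <class...> class Op, class... Args>
struct detector : std::false_type {};
template <template <class...> class Op, class... Args>
struct detector<std::void_t<Op<Args...>>, Op, Args...> : std::true_type {};
}  // namespace detail

template <template <class...> class Op, class... Args>
constexpr bool is_detected_v = detail::detector<void, Op, Args...>::value;

template <class T> using data_op   = decltype(std::declval<T&>().data());
template <class T> using begin_op  = decltype(std::declval<T&>().begin());
template <class T> using key_op    = typename T::key_type;
template <class T> using mapped_op = typename T::mapped_type;

// Shape -> storage representation, using the names from the table above.
template <class T>
constexpr std::string_view storage_representation() {
    if constexpr (is_detected_v<key_op, T> && is_detected_v<mapped_op, T>)
        return "key_value_dataset";      // map-like
    else if constexpr (is_detected_v<data_op, T> || is_detected_v<begin_op, T>)
        return "linear_value_dataset";   // contiguous, iterator-only, or set-like
    else
        return "unsupported";
}

// Toy containers: no std:: base class, no registration, just the right members.
template <class T>
struct toy_vec {
    using value_type = T;
    std::vector<T> buf;
    T* data() { return buf.data(); }
    std::size_t size() const { return buf.size(); }
};

template <class K, class V>
struct toy_dict {
    using key_type = K;
    using mapped_type = V;
    using value_type = std::pair<K, V>;
    std::vector<value_type> items;
    auto begin() { return items.begin(); }
    auto end() { return items.end(); }
};

struct not_a_container { int x = 0; };
```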
Writing goes through the structural path for all four shapes. Reading back does not yet, which leaves the current asymmetry:
| Case | Current behavior |
|---|---|
| Contiguous custom vector-shape | Can write and read back into the custom type itself |
| Iterator-only custom sequence | Can write structurally; read back through std::vector<T> |
| Custom set-shape | Can write structurally; read back through std::set<T> |
| Custom map-shape | Can write structurally; read back through std::map<K,V> |
The write side is structural across all four shapes. The read side is structural for contiguous custom containers (they round-trip into themselves) but currently uses the matching std:: counterpart for iterator-only, set-like, and map-like custom containers. This is a construction-policy gap in the dispatcher, not a file-format limitation — the HDF5 layout on disk is identical regardless of which C++ container reads it back.
tiny::vec, tiny::flist, tiny::set, and tiny::dict are toy demos. The same detection-idiom dispatch picks up real-world third-party containers automatically — no specialization, no opt-in macro, no registration step.
Contiguous (round-trips into the custom type). Any container with .data() + .size() + value_type + T(size_t) ctor:
| Library | Type |
|---|---|
| Abseil | absl::FixedArray<T> |
| Abseil | absl::InlinedVector<T, N> |
| Folly | folly::small_vector<T, N> |
| Folly | folly::fbvector<T> |
| Boost.Container | boost::container::vector<T> |
| Boost.Container | boost::container::small_vector<T,N> |
| Boost.Container | boost::container::static_vector<T,N> |
| eve | eve::aligned_vector<T> |
These ride the read-side structural fallback added to H5Dread.hpp and round-trip into the custom type itself.
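The read-side requirement can be sketched by modelling the dataset as an in-memory buffer: the container is sized up front with C(n) and then filled through data(). read_contiguous and toy_fixed_array are hypothetical names; std::copy into out.data() stands in for an H5Dread into the container's memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical structural read into any contiguous container C.
template <class C>
C read_contiguous(const std::vector<typename C::value_type>& file_buffer) {
    static_assert(std::is_constructible_v<C, std::size_t>,
                  "structural read needs a C(std::size_t) constructor");
    C out(file_buffer.size());  // allocate n elements up front
    // Stand-in for reading the dataset straight into out.data().
    std::copy(file_buffer.begin(), file_buffer.end(), out.data());
    return out;
}

// Toy contiguous container: only value_type, C(n), data(), and size().
template <class T>
class toy_fixed_array {
public:
    using value_type = T;
    explicit toy_fixed_array(std::size_t n) : buf_(n) {}
    T* data() { return buf_.data(); }
    const T* data() const { return buf_.data(); }
    std::size_t size() const { return buf_.size(); }

private:
    std::vector<T> buf_;
};
```

A type that can only be built from an iterator pair fails the static_assert here, which mirrors the T(std::size_t) requirement noted further down.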
Set-like (read back through std::set today). Any container with key_type + value_type and no mapped_type:
| Library | Types |
|---|---|
| Abseil | absl::flat_hash_set<T>, absl::node_hash_set<T>, absl::btree_set<T> |
| TSL | tsl::robin_set<T>, tsl::hopscotch_set<T>, tsl::sparse_set<T> |
| parallel-hashmap | phmap::flat_hash_set<T>, phmap::parallel_flat_hash_set<T> |
| Boost | boost::container::flat_set<T>, boost::unordered_set<T> |
| Folly | folly::F14ValueSet<T>, folly::F14NodeSet<T> |
These ride the new is_set_like fallback in storage_representation_impl. Before today they resolved to unsupported and triggered a static_assert.
Map-like (read back through std::map today). Any container with key_type + mapped_type + value_type:
| Library | Types |
|---|---|
| Abseil | absl::flat_hash_map<K,V>, absl::node_hash_map<K,V>, absl::btree_map<K,V> |
| TSL | tsl::robin_map<K,V>, tsl::hopscotch_map<K,V>, tsl::sparse_map<K,V> |
| parallel-hashmap | phmap::flat_hash_map<K,V>, phmap::parallel_flat_hash_map<K,V> |
| Boost | boost::container::flat_map<K,V>, boost::unordered_map<K,V> |
| Folly | folly::F14ValueMap<K,V>, folly::F14NodeMap<K,V> |
These have been working since the original is_map_like fallback landed; listed here for completeness alongside their set counterparts.
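- Containers without a size_t constructor (for example, ones that only accept iterator pairs) won't round-trip on the structural read path; the dispatcher needs T(std::size_t).
- Linear-algebra types that expose a Scalar nested alias (Eigen, blaze, xtensor) are skipped by the stream pretty-printer through the Scalar nested-type veto, which is desirable: those types have their own operator<<.

| Type | kind | storage | Write mechanism |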
|---|---|---|---|
| int, float, enums | object | scalar | direct H5Dwrite |
| std::string, std::string_view | text | vlen_text_dataset | direct write via variable-length text type |
| std::vector<T>, std::array<T,N>, T[N] | contiguous | linear_value_dataset / c_array | direct H5Dwrite |
| std::vector<std::array<T,N>> | contiguous | fixed_inner_extent_dataset | direct write as rows × N |
| std::vector<std::complex<T>> | contiguous | linear_value_dataset | direct H5Dwrite |
| Linear algebra types | contiguous | linear_value_dataset | direct H5Dwrite |
| std::vector<std::string> | pointers | vlen_text_dataset | char* relay + H5T_VARIABLE |
| std::vector<std::vector<T>> | pointers | ragged_vlen_dataset | hvl_t relay + H5Tvlen_create |
| std::vector<NonTrivialPod> | pointers | linear_value_dataset | h5::gather then flat write |
| std::list<T>, std::deque<T>, std::forward_list<T> | iterators | linear_value_dataset | staging vector then flat write |
| std::set<T>, std::multiset<T> | iterators | linear_value_dataset | staging vector then flat write |
| std::unordered_set<T>, std::unordered_multiset<T> | iterators | linear_value_dataset | staging vector then flat write |
| std::map<K,V>, std::multimap<K,V> | iterators | key_value_dataset | kv_t compound + H5T_COMPOUND |
| std::unordered_map<K,V>, std::unordered_multimap<K,V> | iterators | key_value_dataset | kv_t compound + H5T_COMPOUND |
| std::tuple<Ts...> | composite | scalar | traits::pack(ref, buf) then compound scalar write |
| std::vector<std::tuple<Ts...>> | pointers | linear_value_dataset | pack each tuple then rank-1 compound write |
| std::list<std::tuple<Ts...>> | iterators | linear_value_dataset | pack each tuple through staging |
| std::pair<K,V> | object | scalar | direct compound write via dt_t<pair> |
| std::vector<std::pair<K,V>> | contiguous | linear_value_dataset | direct compound rank-1 write |
| std::complex<T> | object | scalar | direct write via dt_t<complex> |
| User aggregate registered via H5CPP_REGISTER_STRUCT(Foo) | object | scalar | direct compound write |
| Compiler-reflected tier-2 type | — | — | generated scatter path |
| tiny::vec<T>, absl::FixedArray<T>, folly::small_vector, boost::container::vector, … | detected contiguous | linear_value_dataset | structural write |
| tiny::flist<T> and any custom iterator-only sequence | detected sequential-like | linear_value_dataset | structural write through staging |
| tiny::set<T>, absl::flat_hash_set, tsl::robin_set, boost::flat_set, folly::F14ValueSet, … | detected set-like | linear_value_dataset | structural write through staging |
| tiny::dict<K,V>, absl::flat_hash_map, tsl::robin_map, boost::flat_map, folly::F14ValueMap, … | detected map-like | key_value_dataset | structural write as { key, value } |
| std::vector<bool> | — | unsupported | rejected correctly |
| deeply nested containers | — | mostly unsupported | compile-time stopper |
| unregistered POD aggregate | — | unsupported | requires registration or compiler-generated descriptor |

| Type | kind | storage | Read mechanism |
|---|---|---|---|
| int, float, enums | object | scalar | direct H5Dread |
| std::string, std::string_view | text | vlen_text_dataset | direct variable-length text read |
| std::vector<T>, std::array<T,N>, T[N] | contiguous | linear_value_dataset / c_array | direct H5Dread |
| std::vector<std::array<T,N>> | contiguous | fixed_inner_extent_dataset | direct read as rows × N |
| std::vector<std::complex<T>> | contiguous | linear_value_dataset | direct H5Dread |
| Linear algebra types | contiguous | linear_value_dataset | direct H5Dread |
| std::vector<std::string> | pointers | vlen_text_dataset | char* relay + reclaim |
| std::vector<std::vector<T>> | pointers | ragged_vlen_dataset | hvl_t relay + reclaim |
| std::list<T>, std::deque<T>, std::forward_list<T> | iterators | linear_value_dataset | read to staging vector, then assign/copy |
| std::set<T>, std::multiset<T> | iterators | linear_value_dataset | read to staging vector, then insert |
| std::unordered_set<T>, std::unordered_multiset<T> | iterators | linear_value_dataset | read to staging vector, then insert |
| std::map<K,V>, std::multimap<K,V> | iterators | key_value_dataset | read kv_t compound records, then insert |
| std::unordered_map<K,V>, std::unordered_multimap<K,V> | iterators | key_value_dataset | read kv_t compound records, then insert |
| std::tuple<Ts...> | composite | scalar | compound read then traits::unpack |
| std::vector<std::tuple<Ts...>> | pointers | linear_value_dataset | read packed compounds, unpack each |
| std::list<std::tuple<Ts...>>, set<...>, deque<...> | iterators | linear_value_dataset | unpack each, then assign/insert |
| std::pair<K,V> | object | scalar | direct compound read |
| std::vector<std::pair<K,V>> | contiguous | linear_value_dataset | direct compound rank-1 read |
| std::complex<T> | object | scalar | direct read via dt_t<complex> |
| User aggregate registered via H5CPP_REGISTER_STRUCT(Foo) | object | scalar | direct compound read |
| Compiler-reflected tier-2 type | — | — | generated gather path |
| tiny::vec<T>, absl::FixedArray<T>, folly::small_vector, boost::container::vector, … | detected contiguous vector-shape | linear_value_dataset | structural read into custom type |
| tiny::flist<T> and any custom iterator-only sequence | detected iterator-only sequence | linear_value_dataset | read back through std::vector<T> today |
| tiny::set<T>, absl::flat_hash_set, tsl::robin_set, … | detected set-like | linear_value_dataset | read back through std::set<T> today |
| tiny::dict<K,V>, absl::flat_hash_map, tsl::robin_map, … | detected map-like | key_value_dataset | read back through std::map<K,V> today |
| std::vector<bool> | — | unsupported | rejected correctly |
| deeply nested containers | — | unsupported / guarded | compile-time stopper |
| unregistered POD aggregate | — | unsupported | requires descriptor |
Two nested forms backed by HDF5 variable-length storage are wired and round-trip cleanly:
| C++ type | HDF5 model | Mechanism |
|---|---|---|
| std::vector<std::string> | variable-length text dataset | char* relay array + H5T_VARIABLE |
| std::vector<std::vector<T>> | ragged variable-length dataset | hvl_t relay array + H5Tvlen_create |
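The two relay arrays can be sketched without HDF5 headers. vlen_mirror is a local copy of the shape of HDF5's hvl_t (a length plus a void pointer); string_relay and ragged_relay are hypothetical names. Neither relay copies element data, so it is only valid while the source container is alive and unmodified.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Local mirror of hvl_t's fields so this compiles without HDF5.
struct vlen_mirror {
    std::size_t len;
    void* p;
};

// std::vector<std::string> -> one char pointer per string,
// the in-memory form used for variable-length strings.
std::vector<const char*> string_relay(const std::vector<std::string>& v) {
    std::vector<const char*> relay;
    relay.reserve(v.size());
    for (const auto& s : v) relay.push_back(s.c_str());
    return relay;
}

// std::vector<std::vector<T>> -> one { len, p } descriptor per row,
// each pointing into the caller's row storage.
template <class T>
std::vector<vlen_mirror> ragged_relay(std::vector<std::vector<T>>& rows) {
    std::vector<vlen_mirror> relay;
    relay.reserve(rows.size());
    for (auto& row : rows) relay.push_back({row.size(), row.data()});
    return relay;
}
```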
The example also demonstrates h5cpp’s STL stream output helpers.
Containers can be inserted directly into std::ostream. Long containers are truncated according to H5CPP_CONSOLE_WIDTH, with a trailing ellipsis (...).
Useful for examples, tests, and sanity checks. Not a serialisation format — use h5::write for that.
The iterable pretty-printer vetoes types exposing a Scalar nested alias, so Eigen, blaze, and xtensor matrices keep their own operator<< rather than being hijacked by a generic begin/end print loop. Linear-algebra libraries name their element type Scalar while STL containers name it value_type, which makes the alias a clean discriminator.
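A sketch of both behaviours: the Scalar veto as a compile-time condition, and truncation with a trailing ellipsis. The max_items parameter is a hypothetical stand-in for H5CPP_CONSOLE_WIDTH (the sketch does not assume whether h5cpp counts characters or elements), and fake_matrix imitates a linear-algebra type only by declaring Scalar.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Iterable: begin() and end() are valid on a const T.
template <class, class = void> struct is_iterable : std::false_type {};
template <class T>
struct is_iterable<T, std::void_t<decltype(std::declval<const T&>().begin()),
                                  decltype(std::declval<const T&>().end())>>
    : std::true_type {};

// Linear-algebra marker: a nested Scalar alias.
template <class, class = void> struct has_scalar : std::false_type {};
template <class T> struct has_scalar<T, std::void_t<typename T::Scalar>> : std::true_type {};

template <class T>
constexpr bool use_generic_printer = is_iterable<T>::value && !has_scalar<T>::value;

// Print at most max_items elements, then a trailing "...".
template <class C, std::enable_if_t<use_generic_printer<C>, int> = 0>
std::string pretty(const C& c, std::size_t max_items) {
    std::ostringstream os;
    os << '[';
    std::size_t i = 0;
    for (const auto& x : c) {
        if (i == max_items) { os << (i ? ",..." : "..."); break; }
        if (i) os << ',';
        os << x;
        ++i;
    }
    os << ']';
    return os.str();
}

// Hypothetical linalg stand-in: iterable, but vetoed by its Scalar alias.
struct fake_matrix {
    using Scalar = double;
    std::vector<double> v;
    auto begin() const { return v.begin(); }
    auto end() const { return v.end(); }
};
```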
Deeper nesting than these two forms still stops at compile time.
HDF5 itself can represent many of these shapes with nested variable-length and compound types, but h5cpp intentionally rejects them until the recursive packer/unpacker path is explicit.
Other intentional stoppers:
| C++ type | Why |
|---|---|
| std::vector<bool> | bit-packing specialization; no real contiguous bool* |
| std::array<std::string, N> | fixed array of variable-length elements needs explicit policy |
| std::array<std::vector<T>, N> | array-of-container guard |
| Unregistered POD aggregate | needs H5CPP_REGISTER_STRUCT(Foo) or generated descriptor |
| Arbitrary nested containers | recursive VLEN/compound chain not wired yet |
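For std::vector<bool> the capability probe itself does the rejecting. In this sketch has_pointer_data requires data() to exist and to return a pointer; the second condition matters because some standard library implementations declare a non-pointer data() on vector<bool>. contiguous_bytes is a hypothetical guard that turns the probe into a compile-time stopper.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// True only if T has a data() member AND it returns a pointer.
template <class, class = void> struct has_pointer_data : std::false_type {};
template <class T>
struct has_pointer_data<T, std::void_t<decltype(std::declval<T&>().data())>>
    : std::is_pointer<decltype(std::declval<T&>().data())> {};

// Hypothetical guard: refuse at compile time instead of writing something else.
template <class C>
std::size_t contiguous_bytes(C& c) {
    static_assert(has_pointer_data<C>::value,
                  "no contiguous data() pointer: not writable as a flat buffer");
    return c.size() * sizeof(typename C::value_type);
}
```

Calling contiguous_bytes on a std::vector<bool> fails to compile with the static_assert message, which is the intended behaviour for this stopper.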
The point of all this: h5cpp stores the data model implied by the container, not the container implementation. The detection idiom makes that possible — types with the same shape route to the same HDF5 layout, regardless of which library they come from. Hand-written overloads aren't needed for each new container type; the structural surface is the contract.