H5CPP
v1.14.0
Modern C++ templates for HDF5 serial and parallel I/O
A dataset is the unit HDF5 stores typed multi-dimensional arrays in. The point of this example is simple: everything you'd reach for the HDF5 C API for — H5Dcreate, H5Dwrite, H5Dread, H5Sselect_hyperslab, H5Pset_chunk, H5Pset_deflate, H5Pset_fill_value, H5Dextend — has a small composable C++ surface in h5cpp.
The whole vocabulary fits on one slide:
| File | Purpose |
|---|---|
| datasets.cpp | Ten sections that exercise the full dataset surface (incl. std::mdspan if available) |
| datasets.h5 | Output container (./datasets-h5dump -pH datasets.h5) |
Pure STL. The dataset surface itself is type-agnostic — if you want arma::mat, xt::xarray, Eigen::MatrixXd, or std::mdspan (C++23) instead of std::vector, the same h5::write / h5::read calls accept them. See examples/attributes for the linalg variants and the note at the bottom of this README about std::mdspan.
Every dataset has four pieces of header information plus the data array. h5cpp exposes them as composable arguments to h5::create / h5::write:
| HDF5 concept | h5cpp | Notes |
|---|---|---|
| Name | the path string | /group/subgroup/dataset — missing groups can be auto-created with h5::create_path |
| Datatype | template parameter T | Scalar, compound (via H5CPP_REGISTER_STRUCT), string, complex, fixed array |
| Dataspace | h5::current_dims{...}, h5::max_dims{...} | Use H5S_UNLIMITED for extendable axes |
| Storage layout | h5::chunk{...} + filters | Contiguous by default; chunking required for filters or unlimited dimensions |
Hyperslab selection (partial I/O) is the orthogonal vocabulary:
| HDF5 concept | h5cpp | What it means |
|---|---|---|
| start | h5::offset{} | First selected cell |
| count | h5::count{} | Number of blocks along each dimension |
| stride | h5::stride{} | Distance between successive block starts |
| block | h5::block{} | Shape of each block |
With no stride / block, count is the simple "size of the selection".
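
To make the four parameters concrete, here is a minimal sketch that enumerates the indices a selection touches along a single axis, following the HDF5 start/stride/count/block model. It uses only the standard library; `AxisSelection` and `selected_indices` are illustrative names and are not part of h5cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One axis of a hyperslab in the HDF5 start/stride/count/block model.
// Hypothetical helper type for illustration only, not an h5cpp type.
struct AxisSelection {
    std::size_t offset;      // first selected index
    std::size_t count;       // number of blocks
    std::size_t stride = 1;  // distance between successive block starts
    std::size_t block  = 1;  // number of elements in each block
};

// Return every index the selection touches along this axis, in order.
std::vector<std::size_t> selected_indices(const AxisSelection& s) {
    std::vector<std::size_t> out;
    out.reserve(s.count * s.block);
    for (std::size_t c = 0; c < s.count; ++c)        // walk the blocks
        for (std::size_t b = 0; b < s.block; ++b)    // walk inside one block
            out.push_back(s.offset + c * s.stride + b);
    return out;
}
```

With the defaults, `AxisSelection{2, 3}` selects indices 2, 3, 4, so count really is the size of the selection. With `AxisSelection{1, 3, 4, 2}` the same call yields 1, 2, 5, 6, 9, 10: three blocks of two cells whose starts are four apart. A multi-dimensional selection applies the same rule independently on each axis.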
When you don't need explicit control over chunk size or filters, hand a value to h5::write and h5cpp picks shape and policy from the value:
Result: contiguous storage, fixed shape {5}, no filters.
When you want chunking, compression, attributes, or unlimited dimensions, create the dataset first, then write into it:
h5::create<T> returns an h5::ds_t — a managed dataset handle. Attributes attach to it; the packet-table view h5::pt_t is the same handle from a different angle.
Same dataset, three reader shapes. h5::read<T> dispatches on T:
The first form lets h5cpp allocate. The second hands h5cpp a pre-sized buffer + count describing the shape on disk. The third uses offset + count to read just a sub-region — same hyperslab vocabulary as the write side.
Chunking is required for compression, fletcher32 checksums, or unlimited dimensions. The filter chain runs per chunk:
| Filter | What it does |
|---|---|
| h5::chunk{r, c} | rectangular chunk shape |
| h5::gzip{N} | DEFLATE level N (1..9) |
| h5::shuffle | byte-shuffle before compression |
| h5::fletcher32 | per-chunk checksum |
| h5::nbit | strip insignificant bits |
| h5::fill_value<T>{v} | pre-fill value for uninitialised cells |
Compression ratio depends on the data. Slowly-varying signals (sine, images, structured records) get 3-10x; high-entropy data (already-compressed payloads, random noise) gets ~1x and the pipeline overhead dominates.
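
Why byte-shuffling helps DEFLATE on slowly-varying data is easier to see in code. The sketch below is a plain standard-library model of the reordering idea (byte 0 of every element first, then byte 1, and so on); it is not the HDF5 filter implementation, and the function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Byte-shuffle: gather byte 0 of every element, then byte 1, and so on.
// `in` holds n elements of `elem_size` bytes each, back to back.
std::vector<std::uint8_t> byte_shuffle(const std::vector<std::uint8_t>& in,
                                       std::size_t elem_size) {
    assert(elem_size > 0 && in.size() % elem_size == 0);
    const std::size_t n = in.size() / elem_size;
    std::vector<std::uint8_t> out(in.size());
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t b = 0; b < elem_size; ++b)
            out[b * n + i] = in[i * elem_size + b];
    return out;
}

// Inverse transform: restore the original element-by-element byte order.
std::vector<std::uint8_t> byte_unshuffle(const std::vector<std::uint8_t>& in,
                                         std::size_t elem_size) {
    assert(elem_size > 0 && in.size() % elem_size == 0);
    const std::size_t n = in.size() / elem_size;
    std::vector<std::uint8_t> out(in.size());
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t b = 0; b < elem_size; ++b)
            out[i * elem_size + b] = in[b * n + i];
    return out;
}
```

Three little-endian 32-bit values 1, 2, 3 are stored as `01 00 00 00 02 00 00 00 03 00 00 00`. After shuffling they become `01 02 03` followed by nine zero bytes, and a long run of identical bytes is exactly what a DEFLATE-style compressor handles well. On high-entropy data no such runs appear, which matches the ~1x figure above.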
Pre-create with a fill value, then read before writing — every cell shows the fill. Common idioms: NaN for floats, sentinel integers for indices.
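
One practical detail of the NaN idiom: NaN never compares equal to anything, including itself, so "is this cell still the fill value?" has to be asked with `std::isnan` rather than `==`. The sketch below models a pre-filled dataset with a plain `std::vector`; both function names are hypothetical and no h5cpp call is involved.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Stand-in for a freshly created dataset whose fill value is NaN.
std::vector<double> make_prefilled(std::size_t size) {
    return std::vector<double>(size, std::numeric_limits<double>::quiet_NaN());
}

// Count cells that still hold the NaN fill, i.e. were never written.
// Uses std::isnan because `v == NaN` is always false.
std::size_t count_unwritten(const std::vector<double>& cells) {
    std::size_t n = 0;
    for (double v : cells)
        if (std::isnan(v)) ++n;
    return n;
}
```

A sentinel integer such as -1 for indices can be tested with ordinary equality, which is one reason it is the usual choice for integer datasets.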
Set max_dims to H5S_UNLIMITED on the axis you want to grow. Chunking is mandatory. h5::pt_t is the packet-table view of the dataset — it buffers appends and flushes them as chunks.
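
The buffering behaviour can be pictured with a toy model. The class below is a sketch under stated assumptions: `ChunkedAppender` is a made-up name, a `std::vector<int>` stands in for the dataset on disk, and how the real h5::pt_t stores a partial trailing chunk is not modelled. It only shows why appended values become visible in chunk-sized steps, and why the remainder appears on flush or destruction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a chunk-buffered appender. `storage` stands in for the
// dataset on disk. Hypothetical class, not h5cpp API.
class ChunkedAppender {
public:
    ChunkedAppender(std::vector<int>& storage, std::size_t chunk_size)
        : storage_(storage), chunk_size_(chunk_size) {
        buffer_.reserve(chunk_size_);
    }
    ~ChunkedAppender() { flush(); }  // remaining partial chunk goes out here

    ChunkedAppender(const ChunkedAppender&) = delete;
    ChunkedAppender& operator=(const ChunkedAppender&) = delete;

    // Buffer one value; write the buffer out once a full chunk is collected.
    void append(int value) {
        buffer_.push_back(value);
        if (buffer_.size() == chunk_size_) flush();
    }

    // Move whatever is buffered into storage.
    void flush() {
        storage_.insert(storage_.end(), buffer_.begin(), buffer_.end());
        buffer_.clear();
    }

private:
    std::vector<int>& storage_;
    std::size_t chunk_size_;
    std::vector<int> buffer_;
};
```

Appending six values with a chunk size of four leaves only the first four in `storage` while the appender is alive; the last two arrive when it goes out of scope or `flush()` is called.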
Two gotchas:
- Reading the dataset before the pt_t destructor runs returns the dataset as last flushed. Scope the pt explicitly or call its flush.

Write a small block into a larger dataset. Background is 0.0 (fill value), patch is 9.0:
Result:
The raw-buffer form (patch.data() + explicit h5::count) is the unambiguous way to write a sub-region: source layout is row-major, destination layout is row-major. Writing from arma::mat (column-major) also works, but the round-trip through h5::read<arma::mat> will appear transposed unless you compensate.
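
The transposition effect comes purely from how a linear buffer is indexed, which a few lines of standard C++ can show. The two accessors below are illustrative helpers, not h5cpp or Armadillo functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element (r, c) of a matrix with `cols` columns stored row-major
// (C order, which is also HDF5's on-disk order).
double at_row_major(const std::vector<double>& buf, std::size_t cols,
                    std::size_t r, std::size_t c) {
    return buf[r * cols + c];
}

// Element (r, c) of a matrix with `rows` rows stored column-major
// (Fortran order, used by column-major linear algebra containers).
double at_col_major(const std::vector<double>& buf, std::size_t rows,
                    std::size_t r, std::size_t c) {
    return buf[c * rows + r];
}
```

Take the buffer `{1, 2, 3, 4, 5, 6}`. Read as a row-major 2×3 matrix, element (0, 1) is 2. Read as a column-major 3×2 matrix, the value 2 sits at (1, 0). In general `at_row_major(buf, 3, r, c)` equals `at_col_major(buf, 3, c, r)`: the same bytes, viewed with the other convention, are the transpose.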
The same hyperslab vocabulary on the read side. Request a window of shape count starting at offset:
Reads the upper-left 3×4 corner — the patch from section 7 is visible at its (1,1) corner.
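
The offset + count arithmetic behind a patch write and a window read can be sketched without HDF5 at all. In the model below a row-major `std::vector` plays the dataset. The `Grid` type, the function names, and the concrete sizes (a 6×8 grid and a 2×2 patch) are assumptions chosen for illustration; they are not taken from datasets.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major 2-D grid standing in for a dataset. Hypothetical type.
struct Grid {
    std::size_t rows, cols;
    std::vector<double> cells;
    Grid(std::size_t r, std::size_t c, double fill)
        : rows(r), cols(c), cells(r * c, fill) {}
};

// Copy a count_r x count_c row-major patch into the grid at (off_r, off_c).
void write_patch(Grid& g, const double* src,
                 std::size_t off_r, std::size_t off_c,
                 std::size_t count_r, std::size_t count_c) {
    assert(off_r + count_r <= g.rows && off_c + count_c <= g.cols);
    for (std::size_t r = 0; r < count_r; ++r)
        for (std::size_t c = 0; c < count_c; ++c)
            g.cells[(off_r + r) * g.cols + (off_c + c)] = src[r * count_c + c];
}

// Return the count_r x count_c window starting at (off_r, off_c), row-major.
std::vector<double> read_window(const Grid& g,
                                std::size_t off_r, std::size_t off_c,
                                std::size_t count_r, std::size_t count_c) {
    assert(off_r + count_r <= g.rows && off_c + count_c <= g.cols);
    std::vector<double> out(count_r * count_c);
    for (std::size_t r = 0; r < count_r; ++r)
        for (std::size_t c = 0; c < count_c; ++c)
            out[r * count_c + c] = g.cells[(off_r + r) * g.cols + (off_c + c)];
    return out;
}
```

Writing a 2×2 patch of 9.0 at offset (1, 1) into a zero-filled 6×8 grid and then reading the 3×4 window at offset (0, 0) gives a first row of zeros, followed by two rows of the form `0 9 9 0`.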
Property-list fragments compose with |. The result is a real dcpl_t / lcpl_t you can store, reuse, and pass to many h5::create calls:
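
For readers unfamiliar with the pattern, the following is a generic sketch of how `operator|` can fold small fragments into one reusable bundle. Every type here (`Policy`, `Chunk`, `Gzip`, `Shuffle`) is a hypothetical stand-in; this is not how h5cpp implements its property lists, only an illustration of value-style composition.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bundle of creation settings (stand-in for a property list).
struct Policy {
    std::vector<std::size_t> chunk;  // empty means contiguous layout
    int  gzip_level = 0;             // 0 means no compression
    bool shuffle    = false;
};

// Hypothetical fragments, each describing one setting.
struct Chunk   { std::vector<std::size_t> dims; };
struct Gzip    { int level; };
struct Shuffle {};

// Each overload takes the bundle by value, applies one fragment, and
// returns the updated copy, so the left-hand operand is never mutated.
inline Policy operator|(Policy p, const Chunk& c) { p.chunk = c.dims; return p; }
inline Policy operator|(Policy p, const Gzip& g)  { p.gzip_level = g.level; return p; }
inline Policy operator|(Policy p, Shuffle)        { p.shuffle = true; return p; }
```

A bundle built as `Policy{} | Chunk{{64, 64}} | Shuffle{} | Gzip{6}` can be stored and reused. Deriving a variant with `base | Gzip{9}` leaves `base` untouched, which is what makes such bundles safe to pass to many creation calls.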
h5::create_path auto-creates missing intermediate groups. h5::utf8 marks link names as UTF-8.
Wired into CMake as examples-datasets. Pure STL — no linalg dependency. Running the binary writes datasets.h5 in the current directory:
std::mdspan (C++23)

std::mdspan<T, Extents, LayoutPolicy, AccessorPolicy> (P0009, <mdspan> since C++23) is a non-owning multi-dimensional view over a contiguous buffer. Structurally it's exactly what h5cpp passes around internally: pointer + extents.
Wired in h5cpp/H5Mmdspan.hpp, gated on the __cpp_lib_mdspan feature-test macro. The mapper provides:
- access_traits_t<std::mdspan<...>> with kind = contiguous
- storage_representation_impl resolving to linear_value_dataset
- impl::data, impl::size, impl::rank for the legacy raw paths

mdspan is non-owning, so the read path always uses a caller-owned buffer:
The mapper is a no-op if the standard library doesn't ship <mdspan>. H5CPP_HAS_MDSPAN is defined only when __cpp_lib_mdspan >= 202207L holds. Section 10 of datasets.cpp reflects that: when mdspan isn't available, it prints "skipped: this TU was not built with __cpp_lib_mdspan" instead of failing the build.
| Toolchain | Ships <mdspan> | Section 10 runs |
|---|---|---|
| libstdc++ 15+ | yes | yes |
| libstdc++ ≤ 14 | no | skipped |
| libc++ 17+ | yes | yes |
| libc++ ≤ 16 | no | skipped |
The examples-datasets target is built at C++23 (target_compile_features(... cxx_std_23)) so the gate trips automatically when the toolchain catches up — no CMake-level toggle needed.
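
The same feature-test gating can be reproduced in user code. The sketch below is standard-library only and does not show h5cpp's own header: `EXAMPLE_HAS_MDSPAN` and `sum_2d` are names invented for this illustration. It compiles on toolchains with and without <mdspan> and takes the mdspan path only when the macro test passes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(__has_include)
#  if __has_include(<version>)
#    include <version>  // defines __cpp_lib_mdspan when <mdspan> is shipped
#  endif
#endif

#if defined(__cpp_lib_mdspan) && __cpp_lib_mdspan >= 202207L
#  include <mdspan>
#  define EXAMPLE_HAS_MDSPAN 1  // local macro for this sketch only
#else
#  define EXAMPLE_HAS_MDSPAN 0
#endif

// Sum a rows x cols row-major buffer, through std::mdspan when available
// and through manual row-major indexing otherwise.
double sum_2d(const std::vector<double>& buf, std::size_t rows, std::size_t cols) {
    double total = 0.0;
#if EXAMPLE_HAS_MDSPAN
    std::mdspan view(buf.data(), rows, cols);  // non-owning, layout_right
    for (std::size_t i = 0; i < view.extent(0); ++i)
        for (std::size_t j = 0; j < view.extent(1); ++j)
            total += view[i, j];
#else
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            total += buf[i * cols + j];
#endif
    return total;
}
```

Both branches return the same value, so callers do not need to know which one was compiled. The default `layout_right` mapping used by the view is the same row-major indexing the fallback branch spells out by hand.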
- h5::read<std::mdspan<...>>(fd, path) (allocating return form) is not supported because mdspan is non-owning. Use the buffer-out overload with view.data_handle() as shown above.
- Extents may be static (std::extents<std::size_t, N, M>), dynamic (std::dextents<std::size_t, R>), or mixed. Layout policy is layout_right (row-major) by default, which matches HDF5's on-disk layout. layout_left (column-major) round-trips correctly but the on-disk shape will appear transposed in h5dump.
- AccessorPolicy is accepted but only the default accessor is exercised in the example.

The pieces are orthogonal. Type and path are mandatory. Dataspace is the shape. Policy is the property-list bundle. Hyperslab args are the per-call selection. You compose only what you need; defaults cover the rest.
datasets.cpp — rendered with syntax highlighting