H5CPP
v1.14.0
Modern C++ templates for HDF5 serial and parallel I/O
Templated dataset I/O — create, open, read, write, append (packet table), and the sparse-matrix CSC group form.
The dataset API is the bulk of H5CPP's surface. All operations take an element type T from the Supported Types matrix and return / populate that type directly — no manual buffer management, no shape bookkeeping when the destination is a value or container.
Datasets live below an open h5::fd_t and are addressed by POSIX-style path. Lifetime is RAII via h5::ds_t. Optional h5::offset / h5::stride / h5::count / h5::block arguments select a hyperslab for partial I/O; omitting them touches the whole extent.
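To make the hyperslab arguments concrete, the following standalone sketch lists which indices a one-dimensional selection touches under the usual HDF5 semantics: `count` blocks, each `block` elements long, with block starts `stride` apart and the first at `offset`. The function name `hyperslab_indices` is hypothetical and uses only the standard library; it illustrates the selection arithmetic and is not part of H5CPP.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Indices selected along one dimension by an HDF5-style hyperslab:
// `count` blocks of `block` consecutive elements, with block starts
// `stride` apart and the first block starting at `offset`.
// (Hypothetical helper for illustration only.)
std::vector<std::size_t> hyperslab_indices(std::size_t offset, std::size_t stride,
                                           std::size_t count, std::size_t block) {
    std::vector<std::size_t> out;
    out.reserve(count * block);
    for (std::size_t i = 0; i < count; ++i)        // one pass per block
        for (std::size_t j = 0; j < block; ++j)    // elements inside the block
            out.push_back(offset + i * stride + j);
    return out;
}
```

For example, offset 1, stride 3, count 2, block 2 selects elements 1, 2, 4 and 5.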
| Operation | Function | Returns |
|---|---|---|
| Allocate | h5::create<T>(fd, path, args...) | h5::ds_t |
| Open existing | h5::open(fd, path, dapl?) | h5::ds_t |
| Read — return by value | auto v = h5::read<T>(ds, args...) | T |
| Read — into reference | h5::read(ds, T& ref, args...) | void |
| Read — into raw pointer | h5::read(ds, T* ptr, h5::count{...}, args...) | void |
| Streaming read (C++20 ranges view) | for (auto v : h5::view<T>(ds)) … | std::ranges::input_range |
| Write a value | h5::write(ds, value, args...) | h5::ds_t |
| Write + create on demand | h5::write(fd, path, value, args...) | h5::ds_t |
| Append (packet table) | h5::append(pt, value) + h5::flush(pt) | void |
| Write sparse (CSC) | h5::write(parent, path, sparse_src) | h5::gr_t |
| Read sparse | auto m = h5::read<arma::SpMat<T>>(parent, path) | T (sparse) |
Every overload follows the same [fd, path] / [ds] parent dispatch matrix — pick the one that matches what you already have open.
h5::create — allocate a new dataset
Allocates a new HDF5 dataset of element type T. Shape comes from h5::current_dims; extendable dimensions from h5::max_dims; chunking, compression, and filter pipelines from h5::chunk{...} / h5::gzip{N} / h5::fletcher32 / h5::shuffle etc. (see Property Lists).
Parameters
| Name | Type | Description |
|---|---|---|
| fd | const h5::fd_t& | Open file descriptor. |
| dataset_path | const std::string& | POSIX-style path inside the file; intermediate groups are created when h5::default_lcpl is used. |
| args... | variadic | Optional: h5::current_dims, h5::max_dims, h5::chunk, filter pipeline, h5::dcpl_t, h5::lcpl_t, h5::dapl_t, explicit h5::dt_t<T>. |
Returns — h5::ds_t RAII handle.
Throws — h5::error::io::dataset::create on H5Dcreate2 failure (path conflict, invalid type, insufficient permissions).
Example
h5::open — open an existing dataset
If the dataset's access property list carries the H5CPP high-throughput pipeline tag, the pipeline's per-chunk cache is initialised here from the dataset's element size — subsequent reads / writes pick up the pre-warmed filter chain transparently.
Throws — h5::error::io::dataset::open (not present, no read permission, invalid DAPL).
Example
h5::read — read into a value, container, or buffer
Three flavours, chosen by what you have on hand:
| Form | Element count derived from… | Use when |
|---|---|---|
| Return-by-value | dataset's on-disk shape | One-shot read; you want the right T instance back |
| By-reference (T&) | ref's container size | You already have a target object (avoids allocation) |
| Raw pointer (T*) | h5::count{...} (required) | Interop with C buffers, scatter/gather pipelines |
Each form has an (fd, path, ...) and (file_path, dataset_path, ...) convenience overload that opens the dataset (or file + dataset) and forwards. Nine overloads in total — pick whichever matches your call context.
T follows Supported Types. Optional h5::offset / h5::stride / h5::block arguments select a hyperslab.
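The difference between the three forms is where the element count comes from. The sketch below models that with a hypothetical in-memory `toy_dataset` and three free functions; none of these names belong to H5CPP, and the real overloads go through HDF5 rather than copying from a vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical in-memory stand-in for a rank-1 dataset; not an H5CPP type.
struct toy_dataset {
    std::vector<double> storage;
};

// Return-by-value: the element count comes from the dataset's own extent.
std::vector<double> read_value(const toy_dataset& ds) {
    return ds.storage;
}

// By-reference: the element count comes from the size of the target object.
void read_into(const toy_dataset& ds, std::vector<double>& ref) {
    for (std::size_t i = 0; i < ref.size() && i < ds.storage.size(); ++i)
        ref[i] = ds.storage[i];
}

// Raw pointer: the caller has to state the element count explicitly.
void read_ptr(const toy_dataset& ds, double* ptr, std::size_t count) {
    for (std::size_t i = 0; i < count && i < ds.storage.size(); ++i)
        ptr[i] = ds.storage[i];
}
```

With a four-element dataset, `read_value` yields four elements, `read_into` fills exactly as many as the target already holds, and `read_ptr` copies only the count it is given.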
Throws — h5::error::io::dataset::read on H5Dread failure (type-conversion error, rank mismatch, invalid hyperslab).
Examples
h5::view<T> — C++20 ranges streaming read
A streaming view over a rank-1 chunked dataset. Returns a view_range satisfying std::ranges::input_range — usable in any range-for loop or std::ranges algorithm — that walks the dataset one chunk at a time through the standard h5cpp filter pipeline. The whole dataset is never materialised in memory.
The iterator pulls a chunk on demand, decompresses it through the configured filter chain (gzip / shuffle / Gorilla / custom — see FILTERS), and yields elements one at a time until exhausted. When the next chunk is needed, the iterator fetches it; the previous chunk is released. Memory footprint is one chunk + the iterator's small bookkeeping, regardless of the dataset's total size.
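The pull-one-chunk-at-a-time behaviour described above can be sketched without HDF5. In this standalone model, `chunk_source`, `make_source` and `chunk_stream` are hypothetical names: a callback stands in for the chunk fetch plus filter chain, and the iterator holds only the current chunk. It is deliberately a plain range-for iterator and does not claim to model std::ranges::input_range the way the library's view_range does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a chunked dataset: returns chunk number i,
// or an empty vector once the data is exhausted.
using chunk_source = std::function<std::vector<int>(std::size_t)>;

// Builds a source that serves `data` in pieces of `chunk` elements.
chunk_source make_source(std::vector<int> data, std::size_t chunk) {
    return [data = std::move(data), chunk](std::size_t i) {
        std::size_t lo = i * chunk;
        if (lo >= data.size()) return std::vector<int>{};
        std::size_t hi = std::min(lo + chunk, data.size());
        return std::vector<int>(data.begin() + lo, data.begin() + hi);
    };
}

class chunk_stream {
public:
    explicit chunk_stream(chunk_source src) : src_(std::move(src)) {}

    class iterator {
    public:
        iterator() = default;  // a null source marks the end of the stream
        explicit iterator(const chunk_source* src) : src_(src) { fetch(); }
        int operator*() const { return buf_[pos_]; }
        iterator& operator++() {
            if (++pos_ == buf_.size()) { ++chunk_; fetch(); }
            return *this;
        }
        bool operator!=(const iterator& other) const { return src_ != other.src_; }
    private:
        void fetch() {
            buf_ = (*src_)(chunk_);            // the previous chunk is released here
            pos_ = 0;
            if (buf_.empty()) src_ = nullptr;  // exhausted: compare equal to end()
        }
        const chunk_source* src_ = nullptr;
        std::vector<int> buf_;                 // only one chunk is held at a time
        std::size_t chunk_ = 0, pos_ = 0;
    };

    iterator begin() const { return iterator(&src_); }
    iterator end() const { return iterator(); }
private:
    chunk_source src_;
};
```

A loop such as `for (int v : chunk_stream(make_source(values, 3)))` then visits every element while holding at most three of them in the iterator's buffer.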
Constraints
| Constraint | Reason |
|---|---|
| Dataset must be rank-1 | The view yields scalars in storage order — higher ranks would need indexing semantics |
| Dataset must be chunked | The streaming model requires chunk-boundary I/O — contiguous and compact layouts can't be streamed by chunk |
| C++ standard ≥ C++20 | Uses <ranges> + concept-constrained iterators; gated on __cplusplus >= 202002L |
| Element type T must be HDF5-native | Pulled into the iterator's value buffer via the standard dt_t<T> pipeline |
Filter compatibility
Any filter chain that h5::impl::basic_pipeline_t supports is transparently handled — uncompressed, gzip / deflate, LZ4, Zstd, Gorilla, shuffle, fletcher32, custom filters. Filtered datasets benefit the most from the streaming view because they're typically the ones that don't fit in memory.
Throws — std::runtime_error on h5::view construction if the dataset isn't rank-1; h5::error::io::dataset::read on chunk-fetch failures during iteration.
Examples
view vs. ordinary read
| Use case | Choose |
|---|---|
| Dataset fits comfortably in memory | h5::read<T> |
| Dataset is multi-GB / unbounded | h5::view<T> |
| Need random access by index | h5::read<T> with hyperslab offset/count |
| Need sequential pass + reduction (sum / count / max / fold) | h5::view<T> |
| Want to pipe through std::ranges algorithms | h5::view<T> |
| Need rank-2+ dataset | h5::read<T> — view is rank-1 only |
| Non-chunked (contiguous / compact) dataset | h5::read<T> — view requires chunked layout |
h5::view complements rather than replaces the by-value / by-reference / by-pointer overloads — they handle the "load into a container" use case; view handles the "iterate without loading" use case.
h5::write — deposit a value into a dataset
The dispatch is compile-time SFINAE on T's storage representation — contiguous container, ragged VLEN, fixed-length string, sparse matrix, etc. all route through dedicated branches.
Throws — h5::error::io::dataset::write on H5Dwrite failure; h5::error::io::dataset::create if create-on-demand failed.
Examples
h5::append / h5::flush / h5::reset — streaming append
h5::pt_t is a packet-table descriptor wrapping an extendable chunked dataset. h5::append buffers values into the active in-memory chunk; when it fills, the chunk flushes to disk along the first (slowest-varying) dimension. Multi-rank packet tables write a hyperplane per call — ref's shape must match chunk_dims[1..rank-1].
Create the packet table via h5::create<T>(fd, path, h5::max_dims{H5S_UNLIMITED}, h5::chunk{N}).
Throws — h5::error::io::dataset::write on a flush failure; h5::error::io::dataset::close if the destructor's implicit flush fails.
Example — streaming loop
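The buffering behaviour of a packet table can be modelled in a few lines of standard C++. This is a sketch of the mechanism only: `packet_buffer` and its `disk` member are hypothetical stand-ins for h5::pt_t and the file, and flushing a partial chunk here simply records the buffered values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of packet-table buffering; `disk` stands in for the file.
class packet_buffer {
public:
    explicit packet_buffer(std::size_t chunk_size) : chunk_size_(chunk_size) {
        active_.reserve(chunk_size_);
    }
    // Buffer one value; write the chunk out as soon as it is full.
    void append(int value) {
        active_.push_back(value);
        if (active_.size() == chunk_size_) flush();
    }
    // Write whatever is buffered, even if the chunk is only partly filled.
    void flush() {
        if (active_.empty()) return;
        disk.push_back(active_);
        active_.clear();
    }
    std::vector<std::vector<int>> disk;  // chunks in the order they were written
private:
    std::size_t chunk_size_;
    std::vector<int> active_;            // the active in-memory chunk
};
```

Appending five values with a chunk size of two produces two full writes automatically; the fifth value stays in memory until `flush()` is called, which is why a streaming loop typically ends with an explicit flush.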
SFINAE-gated on is_sparse_v<T> — the dense and sparse overloads do not conflict. Source types: arma::SpMat / SpRow / SpCol, Eigen SparseMatrix<T, ColMajor, I> / SparseVector<T, ColMajor, I>.
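As a rough illustration of how a trait such as is_sparse_v<T> can keep two overloads from colliding, the sketch below gates a pair of function templates with std::enable_if_t. The trait definition, `toy_sparse` and `write_kind` are hypothetical and written for this example; H5CPP's own trait and overload set are not reproduced here.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

struct toy_sparse {};  // hypothetical sparse matrix type

// Hypothetical trait: false by default, specialised to true for sparse types.
template <class T> struct is_sparse : std::false_type {};
template <> struct is_sparse<toy_sparse> : std::true_type {};
template <class T> constexpr bool is_sparse_v = is_sparse<T>::value;

// Only participates in overload resolution for non-sparse types.
template <class T>
std::enable_if_t<!is_sparse_v<T>, std::string> write_kind(const T&) {
    return "dense";
}

// Only participates in overload resolution for sparse types.
template <class T>
std::enable_if_t<is_sparse_v<T>, std::string> write_kind(const T&) {
    return "sparse";
}
```

Because exactly one of the two conditions holds for any T, a call is never ambiguous, and the choice is made entirely at compile time.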
On-disk layout (see Supported Linear Algebra Types § Sparse storage layout):
Byte-compatible with scipy.sparse.csc_matrix, Julia HDF5.jl, and the 10x Genomics / Loompy convention.
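For readers unfamiliar with compressed sparse column storage, the sketch below builds the three CSC arrays from a small dense matrix. The field names `data`, `indices` and `indptr` follow scipy.sparse.csc_matrix; whether H5CPP names its datasets the same way is an assumption, and `csc` / `to_csc` are hypothetical helpers rather than library functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CSC triple, named after the scipy.sparse.csc_matrix attributes.
struct csc {
    std::vector<double> data;          // non-zero values, column by column
    std::vector<std::size_t> indices;  // row index of each stored value
    std::vector<std::size_t> indptr;   // where each column starts in data/indices
};

// Converts a dense matrix given as dense[row][col] into CSC form.
csc to_csc(const std::vector<std::vector<double>>& dense) {
    csc out;
    const std::size_t rows = dense.size();
    const std::size_t cols = rows ? dense[0].size() : 0;
    out.indptr.push_back(0);
    for (std::size_t c = 0; c < cols; ++c) {
        for (std::size_t r = 0; r < rows; ++r) {
            if (dense[r][c] != 0.0) {
                out.data.push_back(dense[r][c]);
                out.indices.push_back(r);
            }
        }
        out.indptr.push_back(out.data.size());  // end of column c
    }
    return out;
}
```

Column c occupies the half-open range indptr[c] to indptr[c + 1] in both `data` and `indices`, so `indptr` always has one more entry than the matrix has columns.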
Preconditions (not enforced implicitly — both would require mutating a const &):
- arma::SpMat: SpMat::sync() must have completed.
- Eigen::SparseMatrix: makeCompressed() must have been called; ColMajor is enforced via static_assert.
Example
h5::error::io::dataset::* hierarchy
fd_t the parent of every dataset