H5CPP
v1.14.0
Modern C++ templates for HDF5 serial and parallel I/O
A sparse matrix is written as an HDF5 group containing four datasets in canonical Compressed Sparse Column (CSC) form. The call shape is the same as for dense linalg, but the call returns h5::gr_t instead of h5::ds_t.
The on-disk layout is the convention scipy / 10x Genomics / Loompy / h5sparse already use, so the file round-trips with Python consumers without an out-of-band schema. is_sparse<T> selects this overload via SFINAE; non-sparse types stay on the dense h5::write path.
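To make the four-dataset layout concrete, here is a minimal stdlib-only sketch that builds the CSC arrays from coordinate triplets. It does not use h5cpp: CscGroup, Triplet and to_csc are hypothetical names introduced for illustration, the dataset names data / indices / indptr / shape follow the 10x layout mentioned on this page, and the {n_rows, n_cols} ordering of shape is an assumption.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical in-memory mirror of the four CSC datasets (not an h5cpp type).
struct CscGroup {
    std::vector<double>          data;     // nnz values, stored column by column
    std::vector<std::uint32_t>   indices;  // nnz 0-based row indices, one per value
    std::vector<std::uint32_t>   indptr;   // n_cols + 1 offsets into data / indices
    std::array<std::uint32_t, 2> shape{};  // assumed order: {n_rows, n_cols}
};

using Triplet = std::tuple<std::uint32_t, std::uint32_t, double>;  // (row, col, value)

// Build CSC arrays from triplets; assumes no duplicate (row, col) pairs.
inline CscGroup to_csc(std::uint32_t n_rows, std::uint32_t n_cols, std::vector<Triplet> t) {
    // Canonical CSC order: by column first, then by row within each column.
    std::sort(t.begin(), t.end(), [](const Triplet& a, const Triplet& b) {
        return std::tie(std::get<1>(a), std::get<0>(a)) <
               std::tie(std::get<1>(b), std::get<0>(b));
    });
    CscGroup m;
    m.shape = {n_rows, n_cols};
    m.indptr.assign(static_cast<std::size_t>(n_cols) + 1, 0u);
    for (const auto& [r, c, v] : t) {
        assert(r < n_rows && c < n_cols);
        m.indices.push_back(r);
        m.data.push_back(v);
        ++m.indptr[static_cast<std::size_t>(c) + 1];  // count entries per column
    }
    // A prefix sum turns per-column counts into start offsets.
    for (std::size_t j = 0; j < n_cols; ++j) m.indptr[j + 1] += m.indptr[j];
    assert(m.indptr.back() == m.data.size());
    return m;
}
```

Column j then owns the half-open range [indptr[j], indptr[j + 1]) of data and indices, which is the same invariant scipy's csc_matrix relies on.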
| File | Library | What it covers |
|---|---|---|
| arma.cpp | Armadillo | arma::SpMat<double> matrix + arma::SpCol<double> vector round-trip |
| eigen.cpp | Eigen3 | Eigen::SparseMatrix<double> + Eigen::SparseVector<double> round-trip |
| Makefile | n/a | Standalone build that resolves h5cpp/ + vendored arma/eigen from the worktree |
Sparse traits live in the per-library mapper headers (H5Marma.hpp, H5Meigen.hpp); the I/O dispatch is in H5Dsparse.hpp; the on-disk constants and the is_sparse<T> / sparse_traits<T> contract are in H5Tsparse.hpp.
Indices are fixed uint32 (10x Genomics / Loompy convention). h5::write throws if nnz, n_rows, or n_cols exceeds 2^32 - 1. Vectors (Arma SpRow/SpCol, Eigen SparseVector) are promoted to 1xN / Nx1 CSC matrices so a scipy reader sees a normal csc_matrix.
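The range check and the vector promotion can be sketched without any HDF5 dependency. This is an illustration of the rule stated above, not the code in H5Dsparse.hpp: PromotedVec, Entries, checked_u32, promote_col and promote_row are hypothetical names, std::overflow_error is an assumed exception type, and the entries are assumed sorted by position.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical mirror of the four CSC datasets; not an h5cpp type.
struct PromotedVec {
    std::vector<double>          data;
    std::vector<std::uint32_t>   indices;
    std::vector<std::uint32_t>   indptr;
    std::array<std::uint32_t, 2> shape{};  // assumed order: {n_rows, n_cols}
};

using Entries = std::vector<std::pair<std::uint64_t, double>>;  // (position, value)

// Narrow a 64-bit extent to uint32, throwing when it exceeds 2^32 - 1.
inline std::uint32_t checked_u32(std::uint64_t v, const char* what) {
    if (v > std::numeric_limits<std::uint32_t>::max())
        throw std::overflow_error(std::string(what) + " exceeds 2^32 - 1");
    return static_cast<std::uint32_t>(v);
}

// Column vector of length n -> n x 1 CSC: one column that holds every entry.
inline PromotedVec promote_col(std::uint64_t n, const Entries& e) {
    PromotedVec m;
    m.shape = {checked_u32(n, "n_rows"), 1u};
    for (const auto& [pos, val] : e) {
        assert(pos < n);
        m.indices.push_back(static_cast<std::uint32_t>(pos));
        m.data.push_back(val);
    }
    m.indptr = {0u, checked_u32(e.size(), "nnz")};
    return m;
}

// Row vector of length n -> 1 x n CSC: every entry sits in row 0 of its own column.
inline PromotedVec promote_row(std::uint64_t n, const Entries& e) {
    PromotedVec m;
    m.shape = {1u, checked_u32(n, "n_cols")};
    checked_u32(e.size(), "nnz");
    m.indptr.assign(static_cast<std::size_t>(n) + 1, 0u);
    for (const auto& [pos, val] : e) {
        assert(pos < n);
        m.indices.push_back(0u);
        m.data.push_back(val);
        ++m.indptr[static_cast<std::size_t>(pos) + 1];
    }
    for (std::size_t j = 0; j < n; ++j) m.indptr[j + 1] += m.indptr[j];
    return m;
}
```

Either way the result has the ordinary four-array shape, so a reader needs no special case for vectors.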
Via the project's CMake build:
Or standalone via the local Makefile (does not need the CMake build tree):
Expected output:
Inspecting the result with h5dump arma.h5 shows the four datasets and two attributes per CSC group exactly as the layout block above describes.
The arma example sticks to a hand-rolled max(abs(...)) over arma::SpMat::const_iterator rather than arma::norm(...), because norm pulls in the BLAS dnrm2_ symbol and we want the example to link without an external BLAS dependency.
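The same BLAS-free idea can be sketched on raw CSC arrays instead of Armadillo iterators. This is not the code from arma.cpp: Sparse and max_abs_residual are hypothetical names, and the sketch assumes both operands have the same number of columns and sorted row indices within each column.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical plain-array CSC holder; not an Armadillo or h5cpp type.
struct Sparse {
    std::vector<double>        data;
    std::vector<std::uint32_t> indices;  // row index per value, sorted within a column
    std::vector<std::uint32_t> indptr;   // n_cols + 1 column offsets
};

// max |a_ij - b_ij| over the union of both sparsity patterns, without BLAS.
inline double max_abs_residual(const Sparse& a, const Sparse& b) {
    assert(a.indptr.size() == b.indptr.size());
    double r = 0.0;
    for (std::size_t j = 0; j + 1 < a.indptr.size(); ++j) {
        std::size_t p = a.indptr[j], pe = a.indptr[j + 1];
        std::size_t q = b.indptr[j], qe = b.indptr[j + 1];
        // Two-pointer merge over the row indices of column j.
        while (p < pe || q < qe) {
            double d;
            if (q == qe || (p < pe && a.indices[p] < b.indices[q]))
                d = a.data[p++];                  // entry present only in a
            else if (p == pe || b.indices[q] < a.indices[p])
                d = -b.data[q++];                 // entry present only in b
            else
                d = a.data[p++] - b.data[q++];    // entry present in both
            r = std::max(r, std::fabs(d));
        }
    }
    return r;
}
```

For a lossless round-trip the two patterns coincide and the function returns exactly 0; the merge only matters when an entry was dropped or added along the way.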
Both libraries lazy-cache inserts; h5cpp does not call sync() / makeCompressed() implicitly to avoid mutating a user's const SpMat&.
| Library | Required call before h5::write |
|---|---|
| Armadillo | M.sync() after M(i,j) = x insertions |
| Eigen3 | M.makeCompressed() (or use setFromTriplets which leaves it compressed) |
A non-compressed Eigen matrix passed to h5::write will produce a wrong file — the runtime check is on the trait's valuePtr() access path. The arma sync() precondition is identical to what direct values / row_indices / col_ptrs access requires per the Arma docs.
| Library | Sparse matrix | Sparse vector | Storage order |
|---|---|---|---|
| Armadillo | ✔ arma::SpMat<T> | ✔ arma::SpRow<T>, arma::SpCol<T> | CSC (native) |
| Eigen3 | ✔ Eigen::SparseMatrix<T, ColMajor, I> | ✔ Eigen::SparseVector<T, ColMajor, I> | CSC (native) |
| Eigen3 | ✘ Eigen::SparseMatrix<T, RowMajor, I> | ✘ row-vector form | rejected at compile time — would need a transpose on write |
is_sparse<T> evaluates false for RowMajor Eigen types, so the dense h5::write path is selected instead — which then fails its own unsupported static_assert with a clear message. If you need CSR on disk, transpose to ColMajor first (SpMat tmp = src.transpose(); tmp.makeCompressed();).
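To show why row-major storage cannot be written as-is, the following sketch re-indexes row-compressed (CSR) arrays into column-compressed (CSC) arrays for the same matrix. It is a generic counting-sort conversion on plain vectors, not Eigen's or h5cpp's implementation; Compressed and csr_to_csc are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical holder for either compressed form; not an Eigen or h5cpp type.
struct Compressed {
    std::vector<double>        data;
    std::vector<std::uint32_t> indices;  // column indices for CSR, row indices for CSC
    std::vector<std::uint32_t> indptr;   // n_rows + 1 (CSR) or n_cols + 1 (CSC) offsets
};

// Convert the CSR arrays of a matrix into the CSC arrays of the same matrix.
inline Compressed csr_to_csc(const Compressed& csr, std::uint32_t n_cols) {
    assert(!csr.indptr.empty());
    const std::size_t n_rows = csr.indptr.size() - 1;
    const std::size_t nnz = csr.data.size();
    Compressed csc;
    csc.data.resize(nnz);
    csc.indices.resize(nnz);
    csc.indptr.assign(static_cast<std::size_t>(n_cols) + 1, 0u);
    // Pass 1: count entries per column, then prefix-sum into column offsets.
    for (std::uint32_t c : csr.indices) ++csc.indptr[static_cast<std::size_t>(c) + 1];
    for (std::size_t j = 0; j < n_cols; ++j) csc.indptr[j + 1] += csc.indptr[j];
    // Pass 2: scatter each entry into the next free slot of its column.
    std::vector<std::uint32_t> next(csc.indptr.begin(), csc.indptr.end() - 1);
    for (std::size_t i = 0; i < n_rows; ++i) {
        for (std::size_t p = csr.indptr[i]; p < csr.indptr[i + 1]; ++p) {
            const std::uint32_t dst = next[csr.indices[p]]++;
            csc.data[dst] = csr.data[p];
            csc.indices[dst] = static_cast<std::uint32_t>(i);
        }
    }
    return csc;
}
```

Every value moves to a different offset, so the conversion needs a full extra copy of the matrix. That cost is the reason to make it an explicit step in user code rather than a hidden part of the write.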
scipy accepts the uint32 indices without conversion. The file also matches the 10x Genomics layout (/matrix/{data, indices, indptr, shape}), so scanpy / cellranger / Loompy-adjacent tooling that targets 10x format reads it directly.
The one caveat is the h5sparse Python library — it dispatches on an attribute named h5sparse_format, not format, so its auto-detection won't fire. The group still reads fine with the snippet above.
The two .+ 1 adjustments are intrinsic to crossing the C/Python ↔ Julia boundary: scipy and the C convention are 0-based, while SparseMatrixCSC is 1-based. Writing both 0-based and 1-based copies to disk would double the index storage for one ecosystem's convenience, so the file stays 0-based and the Julia reader adjusts.
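For readers coming from the C++ side, the index shift amounts to the following. This is a sketch of the adjustment a 1-based consumer performs after reading, not part of h5cpp; OneBased and to_one_based are hypothetical names, and the field names colptr / rowval mirror Julia's SparseMatrixCSC.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical holder for the shifted index arrays.
struct OneBased {
    std::vector<std::int64_t> colptr;  // indptr + 1
    std::vector<std::int64_t> rowval;  // indices + 1
};

// Shift both 0-based uint32 index arrays to 1-based. Widening to int64 first
// means an index of 2^32 - 1 cannot wrap around to 0 when 1 is added.
inline OneBased to_one_based(const std::vector<std::uint32_t>& indptr,
                             const std::vector<std::uint32_t>& indices) {
    assert(!indptr.empty() && indptr.back() == indices.size());
    auto shift = [](const std::vector<std::uint32_t>& in) {
        std::vector<std::int64_t> out;
        out.reserve(in.size());
        for (std::uint32_t v : in) out.push_back(static_cast<std::int64_t>(v) + 1);
        return out;
    };
    return {shift(indptr), shift(indices)};
}
```

The shift is a single linear pass at read time, which is cheap compared with storing a second copy of both index arrays in every file.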
JLD.jl / JLD2.jl won't auto-recognize the file — they use their own ad-hoc layout (the historical _refs directory the earlier sparse note complained about). For Julia, route through HDF5.jl directly as above.
| Reader | Reads our files | Notes |
|---|---|---|
| Python h5py + scipy.sparse.csc_matrix | ✔ ok | 4-line snippet above; no conversion needed |
| Python scanpy / cellranger (10x layout) | ✔ ok | Same dataset names as 10x Genomics matrix format |
| Python h5sparse | ◇ partial | Auto-dispatch keyed off h5sparse_format not format; manual read works |
| Julia HDF5.jl + SparseArrays | ✔ ok | After .+ 1 to convert 0-based indices |
| Julia JLD.jl / JLD2.jl | ✘ na | Uses its own non-standard layout; bypass via HDF5.jl |
@format and @axis are advisory attributes — they let format-aware readers like Loompy dispatch correctly; plain h5py / HDF5.jl ignore them and read the four datasets directly.
The sparse example targets live in examples/CMakeLists.txt right after the linalg block. The sparse-arma target is gated on ARMADILLO_FOUND (same gate as the dense arma example); sparse-eigen has no gate because Eigen3 is required by the wider build. Output binaries land in <build>/examples-sparse-arma and <build>/examples-sparse-eigen.
| Target | Status |
|---|---|
| examples-sparse-arma | ✔ ok: SpMat 8x12 round-trip + SpCol vector round-trip, residual = 0 |
| examples-sparse-eigen | ✔ ok: SparseMatrix 10x15 + SparseVector round-trip, residual = 0 |
Both targets also build via the local Makefile in this directory (resolves headers from ../../h5cpp plus the vendored thirdparty/armadillo and thirdparty/eigen3 trees) for standalone iteration outside the CMake build tree.
- **examples/linalg/**: dense linalg-container round-trip; same h5::write / h5::read<T> call shape, returns h5::ds_t.
- **h5cpp/H5Tsparse.hpp**: is_sparse<T> / sparse_traits<T> contract, on-disk name constants.
- **h5cpp/H5Dsparse.hpp**: h5::write(parent, name, spmat) returning h5::gr_t, plus the symmetric h5::read<Sparse>.
- **h5cpp/H5Marma.hpp** / **h5cpp/H5Meigen.hpp**: per-library sparse_traits specializations.