H5CPP
v1.14.0
Modern C++ templates for HDF5 serial and parallel I/O
The examples fall into two families of MPI + HDF5 usage, with different dependencies:
| Family | Pattern | Needs MPI | Needs parallel HDF5 | Needs parallel FS |
|---|---|---|---|---|
| Parallel HDF5 (3 examples) | One shared file, MPI-IO collective/independent transfer | ✔ | ✔ | recommended |
| MPI + serial HDF5 (1 example) | One file per rank, plain serial HDF5 | ✔ | ✘ | ✘ |
If your HDF5 build doesn't have --enable-parallel (most distro packages don't), the parallel-HDF5 examples are skipped by CMake with a clear status message; the file-per-rank example still builds and runs.
| File | Tier | What it teaches |
|---|---|---|
| collective.cpp | parallel HDF5 | Collective transfer mode: all ranks must participate in each I/O call |
| independent.cpp | parallel HDF5 | Independent transfer mode: each rank issues its own MPI-IO op |
| throughput.cpp | parallel HDF5 | Aggregate write/read MB/s benchmark across ranks |
| file_per_rank.cpp | MPI + serial HDF5 | Embarrassingly parallel pattern: each rank writes its own private .h5 file |
CMake prints one of:
or:
or:
Then:
mpirun -n <N> launches N ranks. For SLURM-managed clusters, the same launch shape is srun -n <N> ./examples-mpi-<name> — CMake autodetects SLURM and prints the right command at configure time.
Each rank writes its own slab into a single shared .h5 file via HDF5's MPI-IO virtual driver. Two transfer modes:
| Mode | API | When to use |
|---|---|---|
| h5::collective | H5FD_MPIO_COLLECTIVE | Regular, predictable slabs (every rank touches every collective call). Highest throughput on a parallel filesystem. |
| h5::independent | H5FD_MPIO_INDEPENDENT | Irregular workloads where ranks may opt out of individual calls. Lower latency, lower aggregate throughput. |
Open the file with h5::mpiio({MPI_COMM_WORLD, MPI_INFO_NULL}) to attach the parallel driver. Pass the transfer mode as the last argument to h5::write / h5::read.
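To make the "each rank writes its own slab" idea concrete, here is a minimal sketch of the index arithmetic for a one-column-per-rank layout such as a (10 × 4) dataset shared by 4 ranks. The `Slab` type and both helper names are hypothetical and not part of H5CPP; the offset and count they produce are the values a rank would hand to its write or read call along with the transfer mode. Check collective.cpp for the exact call shape.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: the hyperslab one rank owns in a 2-D dataset.
struct Slab {
    std::array<std::size_t, 2> offset; // first element owned: (row, col)
    std::array<std::size_t, 2> count;  // extent owned: (rows, cols)
};

// One column per rank: rank r owns every row of column r.
Slab column_slab(std::size_t nrows, int rank, int world_size) {
    assert(rank >= 0 && rank < world_size);
    return Slab{{0, static_cast<std::size_t>(rank)}, {nrows, 1}};
}

// Row-major linear indices of the shared dataset that a slab covers.
// Useful for checking that slabs of different ranks do not overlap.
std::vector<std::size_t> linear_indices(const Slab& s, std::size_t ncols) {
    std::vector<std::size_t> idx;
    idx.reserve(s.count[0] * s.count[1]);
    for (std::size_t i = 0; i < s.count[0]; ++i)
        for (std::size_t j = 0; j < s.count[1]; ++j)
            idx.push_back((s.offset[0] + i) * ncols + (s.offset[1] + j));
    return idx;
}
```

With `nrows = 10` and 4 ranks, rank 2 gets offset (0, 2) and count (10, 1), which covers linear elements 2, 6, 10, and so on up to 38 of the shared dataset.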
On a Lustre/GPFS/BeeGFS volume, achievable aggregate throughput scales roughly linearly with the number of OSTs/stripes the file is spread over. On a node-local POSIX disk the aggregate plateaus at the disk's sequential bandwidth, so each rank sees roughly that bandwidth divided by world_size.
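The scaling rule above can be written down as a toy model. This is a back-of-the-envelope sketch, not a measurement: the function names, the per-stripe bandwidth parameter, and the `cap_mbps` ceiling (standing in for network or client limits) are all assumptions introduced for illustration. The node-local case is read here as a per-rank share of one device.

```cpp
#include <algorithm>
#include <cassert>

// Striped parallel filesystem: aggregate bandwidth grows with the stripe
// count until some other limit (assumed here as cap_mbps) is reached.
double striped_aggregate_mbps(int stripes, double per_stripe_mbps, double cap_mbps) {
    assert(stripes > 0);
    return std::min(stripes * per_stripe_mbps, cap_mbps);
}

// Node-local disk: all ranks share one device, so each rank's share of the
// sequential bandwidth shrinks as world_size grows.
double local_per_rank_mbps(double disk_mbps, int world_size) {
    assert(world_size > 0);
    return disk_mbps / world_size;
}
```

For example, 4 stripes at an assumed 500 MB/s each give 2000 MB/s aggregate, while 4 ranks sharing a 2000 MB/s local disk get about 500 MB/s each.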
Sample output from mpi-throughput (4 ranks, 80 MB / rank = 320 MB total, local SSD + page cache):
Numbers this high reflect the Linux page cache absorbing the 320 MB working set; real disk bandwidth only becomes the bottleneck when the total working set exceeds available RAM. To benchmark the filesystem itself rather than the cache, raise nrows until the total dataset is at least 2× system RAM, or call posix_fadvise with POSIX_FADV_DONTNEED on the file between write and read.
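A minimal sketch of the cache-dropping step, assuming Linux and a file that has already been closed by HDF5. The helper name `drop_file_cache` is hypothetical and not part of throughput.cpp. Because the advice only discards clean pages, the sketch flushes the file first.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Ask the kernel to drop cached pages of `path`.
// Returns 0 on success, otherwise an error number.
int drop_file_cache(const char* path) {
    int fd = ::open(path, O_RDONLY);
    if (fd < 0) return errno;
    // Dirty pages are not discarded by the advice, so flush them first.
    ::fsync(fd);
    // offset 0 with len 0 means "the whole file".
    int rc = ::posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    ::close(fd);
    return rc; // posix_fadvise returns the error number directly
}
```

In an MPI run, each rank (or one rank per node) would call this after the write phase, followed by a barrier, before timing the read phase. The call is advisory, so the kernel may keep some pages; the 2× RAM approach is the more robust of the two.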
Each rank writes to output_<rank>.h5. No cross-rank file coordination, so a stock serial HDF5 build is enough — the same library that ships in libhdf5-dev on Debian/Ubuntu.
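The naming scheme is simple enough to sketch in a few lines. Both helper names below are hypothetical; in the real example the rank would come from MPI_Comm_rank, and file_per_rank.cpp may build the name differently.

```cpp
#include <cassert>
#include <string>
#include <vector>

// File name for one rank, following the output_<rank>.h5 pattern.
std::string per_rank_filename(int rank) {
    assert(rank >= 0);
    return "output_" + std::to_string(rank) + ".h5";
}

// All file names of a run, e.g. for a post-processing step that
// gathers the per-rank files after the job has finished.
std::vector<std::string> all_rank_files(int world_size) {
    assert(world_size > 0);
    std::vector<std::string> files;
    files.reserve(static_cast<std::size_t>(world_size));
    for (int r = 0; r < world_size; ++r)
        files.push_back(per_rank_filename(r));
    return files;
}
```

Since the name depends only on the rank, no rank needs to communicate with any other to decide where to write.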
This is the right pattern for:
Each output file is a complete, standalone HDF5 container:
To present the per-rank files as a single logical dataset post-run, use an HDF5 virtual dataset (VDS) — see HDF5 docs; outside the scope of this example.
| Question | Use |
|---|---|
| Do you have a parallel filesystem (Lustre/GPFS/BeeGFS)? | Tier 1 (parallel HDF5) |
| Are ranks contributing to a single canonical dataset? | Tier 1 (parallel HDF5) |
| Are ranks doing independent compute (MC, ensembles)? | Tier 2 (file per rank) |
| Is the HDF5 build serial (--enable-parallel off)? | Tier 2 (file per rank; only choice) |
Tier 2's "post-process the per-rank files into one" overhead is usually amortised quickly when ranks run on heterogeneous nodes or different disks.
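The decision table can also be read as a small function. The sketch below is one possible encoding: the type and field names are hypothetical, and the precedence between rows (serial build first, then independent compute, then shared dataset or parallel filesystem) is an assumption, since the table itself does not rank the questions.

```cpp
#include <cassert>

enum class Tier { ParallelHdf5 = 1, FilePerRank = 2 };

// Hypothetical description of a workload, one flag per table row.
struct Workload {
    bool hdf5_is_parallel;      // HDF5 built with --enable-parallel
    bool parallel_filesystem;   // Lustre/GPFS/BeeGFS available
    bool single_shared_dataset; // ranks contribute to one canonical dataset
    bool independent_compute;   // MC runs, ensembles
};

Tier choose_tier(const Workload& w) {
    // A serial HDF5 build leaves file-per-rank as the only choice.
    if (!w.hdf5_is_parallel) return Tier::FilePerRank;
    // Independent work with no shared result needs no shared file.
    if (w.independent_compute && !w.single_shared_dataset) return Tier::FilePerRank;
    // A shared dataset or a parallel filesystem favours parallel HDF5.
    if (w.single_shared_dataset || w.parallel_filesystem) return Tier::ParallelHdf5;
    return Tier::FilePerRank;
}
```

For instance, a workload with a parallel build, a parallel filesystem, and a single shared dataset maps to Tier 1, while the same workload on a serial HDF5 build maps to Tier 2.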
All four targets report ✔ ok on this machine, which has OpenMPI and parallel HDF5 (HDF5 1.12.3 at /usr/local/HDF_Group/HDF5/1.12.3/, built with --enable-parallel).
| Target | Status | Notes |
|---|---|---|
| examples-mpi-file-per-rank | ✔ ok | Per-rank file written + readback-verified |
| examples-mpi-collective | ✔ ok | 4 ranks → (10 × 4) shared dataset, each rank reads its own column back |
| examples-mpi-independent | ✔ ok | Same shape, independent transfer mode |
| examples-mpi-throughput | ✔ ok | 4 ranks × 80 MB = 320 MB; ~4.2 GB/s write, ~5.0 GB/s read (local SSD + page cache) |
All four targets are gated on MPI_FOUND; the three parallel-HDF5 targets (collective, independent, throughput) are additionally gated on HDF5_IS_PARALLEL. When either is missing, CMake prints exactly which dependency is unavailable and skips the affected targets.
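The gating rule amounts to a two-variable truth table. The sketch below models it in C++ purely for illustration; the real logic lives in the CMake files, and the function name is hypothetical. Target names are taken from the status table.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Which example targets get built for a given pair of configure results.
std::vector<std::string> enabled_targets(bool mpi_found, bool hdf5_is_parallel) {
    std::vector<std::string> targets;
    if (!mpi_found) return targets; // no MPI: every target is skipped
    // Serial HDF5 is enough for the file-per-rank example.
    targets.push_back("examples-mpi-file-per-rank");
    if (hdf5_is_parallel) {
        // These three need the MPI-IO driver of a parallel HDF5 build.
        targets.push_back("examples-mpi-collective");
        targets.push_back("examples-mpi-independent");
        targets.push_back("examples-mpi-throughput");
    }
    return targets;
}
```

With MPI but a serial HDF5 build, only examples-mpi-file-per-rank is returned, which matches the behaviour described for stock distro packages.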
- examples/raw_memory/**: raw-pointer write/read shape that file-per-rank uses under the hood
- examples/packet-table/**: streaming append; an alternative for ranks producing data over time without coordinating offsets
- collective.cpp: rendered with syntax highlighting
- file_per_rank.cpp: rendered with syntax highlighting
- independent.cpp: rendered with syntax highlighting
- throughput.cpp: rendered with syntax highlighting