H5CPP
v1.14.0
Modern C++ templates for HDF5 serial and parallel I/O
This example shows a small pattern for streaming rows from a CSV file into an HDF5 packet table. The point is simple: a row-at-a-time text source becomes a compressed, chunked, attribute-annotated HDF5 dataset without anyone touching H5Tinsert by hand.
The CSV reader is the header-only Fast C++ CSV Parser. The sample data is a public-domain Monroe County crash dataset.
| File | Purpose |
|---|---|
| csv2hdf5.cpp | Reads input.csv row by row, appends each row to a packet table |
| struct.h | POD input_t: the on-disk row layout |
| generated.h | H5CPP-compiler output: register_struct<input_t> HDF5 compound type |
| input.csv | Sample CSV (copied next to the binary by the build) |
| Makefile | Standalone Makefile (CMake target is examples-csv) |
The C++ side defines the row as a plain POD. Strings are stored inline as fixed-length character arrays — the simplest representation for HDF5, and adequate when the strings are short and bounded. For long or variable-length text, splitting the strings into a separate dataset is often the better call.
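A minimal sketch of such a row type, assuming hypothetical field names (row_t, record_id, location and the 20-byte width are illustrative and are not taken from struct.h). The helper copy_fixed is also hypothetical; it shows one way to put a std::string into an inline character array without overrunning it:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical width for the inline string field (an assumption for this sketch).
constexpr std::size_t kLocationLen = 20;

// Hypothetical row layout: plain members only, strings stored inline.
struct row_t {
    long   record_id;
    double latitude;
    double longitude;
    char   location[kLocationLen];
};

// A row that is written byte-for-byte should be trivially copyable
// and have a predictable (standard) layout.
static_assert(std::is_trivially_copyable<row_t>::value, "row_t must be trivially copyable");
static_assert(std::is_standard_layout<row_t>::value, "row_t must be standard layout");

// Copy src into a fixed-size field: zero-fill first, then copy at most N-1
// bytes so the result is always NUL-terminated. Longer input is truncated.
template <std::size_t N>
void copy_fixed(char (&dst)[N], const std::string& src) {
    static_assert(N > 0, "destination must hold at least the terminator");
    std::memset(dst, 0, N);
    std::memcpy(dst, src.data(), std::min(src.size(), N - 1));
}
```

Silent truncation is a design choice of this sketch; a stricter variant could reject strings that do not fit.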
<h5cpp/all> pulls in everything h5cpp needs. The compiler-generated generated.h carries the HDF5 compound descriptor for input_t and is included after the h5cpp headers.
CSVReader<N> is templated on the number of columns. The header line lets you pick columns by name and ignore the rest:
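As a library-free illustration of selecting columns by header name, the sketch below resolves the wanted names to positions once and then extracts only those cells from each row. It is a simplified stand-in, not the Fast C++ CSV Parser API: the function names split_csv_line, select_columns and pick are hypothetical, and quoted fields or embedded commas are not handled.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Split one line on commas. No quoting or escaping support.
std::vector<std::string> split_csv_line(const std::string& line) {
    std::vector<std::string> cells;
    std::string cell;
    std::istringstream ss(line);
    while (std::getline(ss, cell, ',')) cells.push_back(cell);
    return cells;
}

// Map each wanted column name to its index in the header line.
// Columns that are present but not wanted are simply never indexed.
std::vector<std::size_t> select_columns(const std::string& header,
                                        const std::vector<std::string>& wanted) {
    const std::vector<std::string> names = split_csv_line(header);
    std::vector<std::size_t> idx;
    for (const std::string& w : wanted) {
        auto it = std::find(names.begin(), names.end(), w);
        if (it == names.end()) throw std::runtime_error("missing column: " + w);
        idx.push_back(static_cast<std::size_t>(it - names.begin()));
    }
    return idx;
}

// Extract the selected cells from one data line, in the order requested.
std::vector<std::string> pick(const std::string& line,
                              const std::vector<std::size_t>& idx) {
    const std::vector<std::string> cells = split_csv_line(line);
    std::vector<std::string> out;
    for (std::size_t i : idx) out.push_back(cells.at(i));  // at() throws on short rows
    return out;
}
```

Resolving names to indices once, up front, keeps the per-row work to a split and a few lookups.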
Then the row pump:
h5::append buffers row insertions internally and flushes them as chunks — single-row writes do not turn into single-row HDF5 transactions.
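The buffering idea can be sketched without HDF5. The class below is a hypothetical stand-in (chunk_buffer and its sink callback are invented for illustration and are not how H5CPP is implemented): rows accumulate in memory and the sink is called once per full chunk, with an explicit flush for the final partial chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical chunk-buffering appender. The sink stands in for
// "write one chunk to storage" and receives a contiguous block of rows.
template <typename Row>
class chunk_buffer {
public:
    using sink_t = std::function<void(const Row*, std::size_t)>;

    chunk_buffer(std::size_t chunk_rows, sink_t sink)
        : chunk_rows_(chunk_rows), sink_(std::move(sink)) {
        assert(chunk_rows_ > 0);
        buf_.reserve(chunk_rows_);
    }

    // Store one row; the sink is called only when a whole chunk is ready.
    void append(const Row& row) {
        buf_.push_back(row);
        if (buf_.size() == chunk_rows_) flush();
    }

    // Hand any buffered rows to the sink (used for the last partial chunk).
    void flush() {
        if (buf_.empty()) return;
        sink_(buf_.data(), buf_.size());
        buf_.clear();
    }

private:
    std::size_t chunk_rows_;
    sink_t sink_;
    std::vector<Row> buf_;
};
```

With a chunk size of 10, appending 25 rows results in two sink calls of 10 rows each, and a final flush delivers the remaining 5.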
Create the file, create the dataset, attach attributes, hand off to the packet-table handle:
A few things going on here:
- h5::ds_t is the dataset handle; attributes are written on it.
- h5::pt_t is the packet-table view of the same dataset; it knows how to buffer + flush appends.
- h5::max_dims{H5S_UNLIMITED} makes the dataset extendable along its single axis.
- h5::chunk{10} | h5::gzip{9} is a deliberately tiny chunk for a small demo. In production, size the chunk so that one chunk is ≈ 1 MiB or one network MTU.

generated.h is what the LLVM-based h5cpp compiler produces by scanning the TU. It is the HDF5 type descriptor for input_t, what would otherwise be a hand-rolled H5Tcreate(H5T_COMPOUND, ...) block:
You do not edit this file. The compiler regenerates it whenever struct.h or the source TU changes.
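To make concrete what a compound descriptor has to record, the sketch below builds a plain table of member name, byte offset and size for a hypothetical row type. It uses only the standard library and is not the contents of generated.h; row_t, member_t and describe_row are invented for illustration, and a real HDF5 descriptor would also carry an HDF5 datatype per member.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical row type standing in for input_t (its real fields are not shown here).
struct row_t {
    long   record_id;
    double latitude;
    double longitude;
    char   location[20];
};

// One entry per struct member: the information a compound type needs
// in order to locate that member inside each row.
struct member_t {
    std::string name;
    std::size_t offset;  // byte offset inside row_t
    std::size_t size;    // byte size of the member
};

// Hand-written here; the point of a generator is that nobody has to
// keep a table like this in sync with the struct by hand.
std::vector<member_t> describe_row() {
    return {
        {"record_id", offsetof(row_t, record_id), sizeof(row_t::record_id)},
        {"latitude",  offsetof(row_t, latitude),  sizeof(row_t::latitude)},
        {"longitude", offsetof(row_t, longitude), sizeof(row_t::longitude)},
        {"location",  offsetof(row_t, location),  sizeof(row_t::location)},
    };
}
```

Every time a field is added, renamed or reordered, a table like this has to change with it, which is the maintenance burden a generated descriptor removes.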
h5dump -pH output.h5:
Variable-length attribute strings, a fixed-size character-array column inside the compound, an unlimited-extent dimension chunked at 10, gzip-9 — all from the C++ above.
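The chunk of 10 rows is sized for the demo. A minimal sketch of deriving a row count from a byte budget, assuming the roughly 1 MiB target mentioned earlier (rows_per_chunk is a hypothetical helper, not part of H5CPP):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Number of rows that fit into target_bytes, never less than one row.
// The default budget of 1 MiB is an assumption for this sketch.
std::size_t rows_per_chunk(std::size_t row_bytes,
                           std::size_t target_bytes = 1024 * 1024) {
    assert(row_bytes > 0);
    return std::max<std::size_t>(1, target_bytes / row_bytes);
}
```

For a 48-byte row this gives 21845 rows per chunk; a row larger than the budget still gets a chunk of one row.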
The example is wired into the CMake build as examples-csv. The build copies input.csv next to the binary in the build directory so ./examples-csv runs without a path argument. To run from anywhere:
The CSV reader hands you typed columns. The struct is the on-disk row layout. The packet table buffers the appends. The compound type comes from the H5CPP compiler. No H5Tinsert, H5Sclose, or H5Dclose in user code.