H5CPP
v1.14.0
Modern C++ templates for HDF5 serial and parallel I/O
Scope: All compression, checksum, shuffle, and pre-processing filters that h5cpp implements natively or ships via third-party dependencies.
Target audience: Developers choosing filters for dataset creation pipelines.
Last updated: 2026-05-24 (branch 262-gorilla-xor-filter)
| Filter | HDF5 ID | Type | Source | h5cpp property | Best for |
|---|---|---|---|---|---|
| Deflate / Gzip | 1 | Lossless | zlib (opt. libdeflate) | h5::deflate{level} / h5::gzip{level} | General purpose, text, HDF5 default |
| Shuffle | 2 | Pre-processing | Native | h5::shuffle | Multi-byte numeric arrays before entropy codec |
| Fletcher32 | 3 | Checksum | Native | h5::fletcher32 | Data integrity verification |
| SZIP | 4 | Lossless | Vendored libaec/szip | h5::szip (C API) | Scientific integer imagery (Rice/Golomb) |
| N-Bit | 5 | Pre-processing | HDF5 C (passthrough) | h5::nbit | Compact integer storage |
| Scale-Offset | 6 | Pre-processing | HDF5 C (passthrough) | h5::scaleoffset | Quantised float/int storage |
| LZ4 | 32004 | Lossless | System liblz4 | h5::lz4 | Extreme speed, modest ratio |
| Zstd | 32015 | Lossless | Vendored zstd | h5::zstd | High ratio, tunable speed |
| Gorilla | 32016 | Lossless | Native | h5::gorilla{4\|8} | Smooth floating-point time series |
IDs 1-6 are assigned by The HDF Group. IDs 32004, 32015, 32016 are community-registered filter slots.
Filter details (h5cpp/H5Zall.hpp)

Deflate / Gzip (H5Z_FILTER_DEFLATE, id=1)

- Implementation: filter::deflate / filter::gzip (aliases).
- Backend: when H5CPP_HAS_LIBDEFLATE is defined, h5cpp uses the vendored libdeflate (v1.25.0) instead of zlib. libdeflate is typically 2-5× faster for both encode and decode, at the cost of slightly larger output.
- Fallback: compress2 / uncompress from system zlib.
- Parameters: params[0] = compression level (0-9, default from H5CPP_DEFAULT_COMPRESSION).

Shuffle (H5Z_FILTER_SHUFFLE, id=2)

- Algorithm: for N elements of B bytes, byte k of every element becomes contiguous.
- Implementation: filter::shuffle, pure C++ nested loops with no external dependencies.
- Parameters: params[0] = element size in bytes (inferred by HDF5).

Fletcher32 (H5Z_FILTER_FLETCHER32, id=3)

- Implementation: filter::fletcher32, a native loop over 16-bit words.
- Output size: size + 4 bytes (input plus the 32-bit checksum).
- Decode: returns 0 on mismatch (HDF5 treats this as filter failure).

Gorilla (H5Z_FILTER_GORILLA, id=32016)

- Implementation: filter::gorilla, a native bit-packed encoder/decoder with 6-bit leading and 6-bit meaningful length descriptors.
- Window-reuse heuristic: the previous window is reused only when prev_meaningful <= 12 + meaningful, which avoids pathological expansion when a sign-bit flip creates a 64-bit-wide block.
- Element sizes: float32 (4) and float64 (8). Other sizes fall back to memcpy.
- Parameters: params[0] = element size (must be explicit; auto-detection is intentionally disabled to avoid 4-vs-8 ambiguity).

SZIP (H5Z_FILTER_SZIP, id=4)

- Source: vendored in thirdparty/szip/.
- Implementation: filter::szip, a thin wrapper around SZ_BufftoBuffCompress / SZ_BufftoBuffDecompress.
- Parameters: params[0-3] = options_mask, bits_per_pixel, pixels_per_block, pixels_per_scanline.
- CMake option: H5CPP_USE_SZIP (ON by default).

LZ4 (H5Z_FILTER_LZ4, id=32004)

- Source: system library (find_library(lz4) + find_path(lz4.h)). Not vendored.
- Implementation: filter::lz4, which calls LZ4_compress_default / LZ4_decompress_safe.
- CMake option: H5CPP_USE_LZ4 (ON by default). If liblz4 is not found, the filter compiles as a passthrough.

Zstd (H5Z_FILTER_ZSTD, id=32015)

- Source: vendored in thirdparty/zstd/.
- Implementation: filter::zstd, which calls ZSTD_compress / ZSTD_decompress.
- CMake option: H5CPP_USE_ZSTD (ON by default).
- Parameters: params[0] = compression level (1-22).

libdeflate backend

- When H5CPP_HAS_LIBDEFLATE is defined, filter::deflate routes encode/decode through libdeflate instead of zlib.
- Source: vendored in thirdparty/libdeflate/.
- CMake option: H5CPP_USE_LIBDEFLATE (ON by default).
- Use when: you already use h5::gzip or h5::deflate but need higher single-threaded throughput.

Passthrough filters (HDF5 C)

These filters are registered and executed by the HDF5 C library itself during H5Dread / H5Dwrite. h5cpp provides property-list wrappers but does not reimplement the algorithms.
| Filter | ID | Wrapper | Why passthrough |
|---|---|---|---|
| N-Bit | 5 | h5::nbit | Bit-precision packing for integers; used rarely in trading/HPC paths. |
| Scale-Offset | 6 | h5::scaleoffset | Quantised float/int pre-processing; delegated to HDF5 C to avoid reverse-engineering internal cd_values descriptors. |
Both are valid in property chains and will be applied correctly by the underlying HDF5 library.
| Dependency | CMake option | Vendored? | Location | Compile def |
|---|---|---|---|---|
| zlib (gzip) | — (required) | No | System | — |
| libdeflate | H5CPP_USE_LIBDEFLATE | Yes | thirdparty/libdeflate/ | H5CPP_HAS_LIBDEFLATE |
| LZ4 | H5CPP_USE_LZ4 | No | System | H5CPP_HAS_LZ4 |
| Zstd | H5CPP_USE_ZSTD | Yes | thirdparty/zstd/ | H5CPP_HAS_ZSTD |
| SZIP | H5CPP_USE_SZIP | Yes | thirdparty/szip/ | H5CPP_HAS_SZIP |
| Gorilla | — (always) | N/A | Native in H5Zall.hpp | — |
| Shuffle | — (always) | N/A | Native in H5Zall.hpp | — |
| Fletcher32 | — (always) | N/A | Native in H5Zall.hpp | — |
Vendored libraries are built with EXCLUDE_FROM_ALL and linked statically into test executables; they do not install shared libraries unless explicitly requested.
Shuffle exposes byte-plane redundancy; gzip then compresses it. This pairing (h5::shuffle | h5::gzip{level}) is the HDF5 "sweet spot" for float/int matrices.
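To make the byte-plane argument concrete, here is a minimal standalone sketch of the transposition described for filter::shuffle, assuming a contiguous buffer of fixed-size elements. shuffle_bytes and unshuffle_bytes are hypothetical names; this illustrates the technique and is not h5cpp's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: byte k of element i moves to position k * n + i,
// so all k-th bytes (one "byte plane") end up contiguous.
std::vector<std::uint8_t> shuffle_bytes(const std::vector<std::uint8_t>& src, std::size_t elem_size) {
    assert(elem_size > 0 && src.size() % elem_size == 0);
    const std::size_t n = src.size() / elem_size;
    std::vector<std::uint8_t> dst(src.size());
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < elem_size; ++k)
            dst[k * n + i] = src[i * elem_size + k];
    return dst;
}

// Inverse transform: scatter each byte plane back into its elements.
std::vector<std::uint8_t> unshuffle_bytes(const std::vector<std::uint8_t>& src, std::size_t elem_size) {
    assert(elem_size > 0 && src.size() % elem_size == 0);
    const std::size_t n = src.size() / elem_size;
    std::vector<std::uint8_t> dst(src.size());
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < elem_size; ++k)
            dst[i * elem_size + k] = src[k * n + i];
    return dst;
}
```

For four big-endian 32-bit counters 0x00000100 to 0x00000103, shuffle_bytes produces the planes 00 00 00 00 | 00 00 00 00 | 01 01 01 01 | 00 01 02 03. Three of the four planes are single-value runs, which is exactly the redundancy a following gzip stage compresses cheaply.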
LZ4 encodes at ~500 MB/s or more and decodes at ~1-2 GB/s; accept a modest ratio in exchange for speed.
Zstd level 12 gives better ratios than gzip-6 at comparable decode speed.
Gorilla on a 1 M-element double-precision sine wave: output is ~86.7 % of raw size, and decoding runs at ~1 GB/s. If the chunk size is large, follow it with LZ4 as a second-stage compressor: h5::gorilla{8} | h5::lz4.
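Gorilla works on the XOR of consecutive values. The sketch below shows that per-value analysis for float64, assuming the classic Gorilla scheme (leading zeros, trailing zeros, meaningful bits) plus the window-reuse heuristic prev_meaningful <= 12 + meaningful. It assumes the heuristic is combined with the usual "new bits fit inside the old window" check. Reading the constant 12 as the cost of a fresh 6 + 6-bit descriptor is an interpretation, not something the source states. XorWindow, xor_window and reuse_previous_window are hypothetical names, and the real bit layout of filter::gorilla is not reproduced.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Per-value XOR decomposition (hypothetical, not h5cpp's encoder).
struct XorWindow {
    std::uint64_t x;  // XOR of current and previous bit patterns
    int leading;      // leading zero bits of x
    int trailing;     // trailing zero bits of x
    int meaningful;   // bits between them (0 when x == 0)
};

inline std::uint64_t bits_of(double v) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "float64 expected");
    std::uint64_t u;
    std::memcpy(&u, &v, sizeof u);  // well-defined type pun
    return u;
}

inline int count_leading_zeros(std::uint64_t x) {
    int n = 0;
    for (std::uint64_t mask = 1ULL << 63; mask && !(x & mask); mask >>= 1) ++n;
    return n;
}

inline int count_trailing_zeros(std::uint64_t x) {
    int n = 0;
    for (std::uint64_t mask = 1; mask && !(x & mask); mask <<= 1) ++n;
    return n;
}

inline XorWindow xor_window(double prev, double cur) {
    const std::uint64_t x = bits_of(prev) ^ bits_of(cur);
    if (x == 0) return {0, 64, 0, 0};  // repeated value: classic Gorilla emits one '0' bit
    const int lead = count_leading_zeros(x);
    const int trail = count_trailing_zeros(x);
    return {x, lead, trail, 64 - lead - trail};
}

// Reuse the previous window only if the new meaningful bits fit inside it
// and the old window is at most 12 bits wider than what is actually needed.
inline bool reuse_previous_window(const XorWindow& prev, const XorWindow& cur) {
    const bool fits = cur.leading >= prev.leading && cur.trailing >= prev.trailing;
    return fits && prev.meaningful <= 12 + cur.meaningful;
}
```

For example, xor_window(1.0, 2.0) reports 1 leading zero, 52 trailing zeros and 11 meaningful bits, because only the exponent field changes. The heuristic rejects reusing a 60-bit window left behind by a sign flip for a 14-bit XOR, since padding every later value to 60 bits would cost more than writing a new descriptor.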
Fletcher32 should be first in the pipeline so it checksums the raw data, not the compressed bitstream.
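For reference, here is a minimal sketch of a Fletcher-32 checksum over 16-bit words. It uses the size + 4 framing and the 0-on-mismatch return described for filter::fletcher32. It follows the common textbook convention: little-endian word assembly, zero padding of an odd trailing byte, and the checksum appended little-endian. That convention is an assumption and is not necessarily byte-for-byte what HDF5 or h5cpp writes to disk. fletcher32, encode_with_checksum and decode_with_checksum are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Textbook Fletcher-32: two running sums modulo 65535 over 16-bit words.
std::uint32_t fletcher32(const std::uint8_t* data, std::size_t len) {
    std::uint32_t sum1 = 0, sum2 = 0;
    for (std::size_t i = 0; i < len; i += 2) {
        // Little-endian word; an odd trailing byte is zero-padded.
        std::uint32_t word = data[i];
        if (i + 1 < len) word |= std::uint32_t(data[i + 1]) << 8;
        sum1 = (sum1 + word) % 65535;
        sum2 = (sum2 + sum1) % 65535;
    }
    return (sum2 << 16) | sum1;
}

// Encode: output is the input plus 4 checksum bytes (size + 4).
std::vector<std::uint8_t> encode_with_checksum(const std::vector<std::uint8_t>& in) {
    std::vector<std::uint8_t> out(in);
    const std::uint32_t c = fletcher32(in.data(), in.size());
    for (int b = 0; b < 4; ++b) out.push_back(std::uint8_t(c >> (8 * b)));
    return out;
}

// Decode: returns the payload size, or 0 on mismatch (the HDF5 failure signal).
std::size_t decode_with_checksum(const std::vector<std::uint8_t>& buf) {
    if (buf.size() < 4) return 0;
    const std::size_t n = buf.size() - 4;
    std::uint32_t stored = 0;
    for (int b = 0; b < 4; ++b) stored |= std::uint32_t(buf[n + b]) << (8 * b);
    return fletcher32(buf.data(), n) == stored ? n : 0;
}
```

With this convention, "abcde" checksums to 0xF04FC729 and "abcdef" to 0x56502D2A, the commonly quoted Fletcher-32 test values.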
SZIP's Rice coder is optimised for spatially correlated integer raster data.
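This small sketch shows why spatial correlation matters for a Rice coder. Each pixel is predicted from its left neighbour, the signed residual is zigzag-mapped to an unsigned value, and the Rice code length for parameter k is counted. This is textbook Rice/Golomb coding under simplifying assumptions (fixed k, left-neighbour predictor, first pixel sent verbatim). It is not SZIP's block-adaptive algorithm. zigzag, rice_length and rice_bits are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Zigzag map so small negative residuals also get small codes: 0,-1,1,-2,2 -> 0,1,2,3,4.
inline std::uint64_t zigzag(std::int64_t r) {
    return r >= 0 ? std::uint64_t(r) << 1 : (std::uint64_t(-(r + 1)) << 1) | 1u;
}

// Rice code length for parameter k: unary quotient, one stop bit, k remainder bits.
inline std::uint64_t rice_length(std::uint64_t v, unsigned k) {
    assert(k < 64);
    return (v >> k) + 1 + k;
}

// Bits for one scanline with a left-neighbour predictor; the first pixel is
// assumed to be sent verbatim and is not counted.
std::uint64_t rice_bits(const std::vector<std::int32_t>& row, unsigned k) {
    std::uint64_t bits = 0;
    for (std::size_t i = 1; i < row.size(); ++i)
        bits += rice_length(zigzag(std::int64_t(row[i]) - row[i - 1]), k);
    return bits;
}
```

For the smooth scanline {1000, 1001, 1003, 1002, 1004, 1005, 1005, 1006} with k = 1, the seven residuals cost 21 bits in total. Storing the same seven pixels raw as 32-bit values costs 224 bits. An uncorrelated scanline of similar magnitude produces large residuals and far longer codes.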
At runtime, h5::impl::filter::get_callback(H5Z_filter_t id) maps canonical HDF5 filter IDs to C++ function pointers.
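The kind of lookup get_callback performs can be pictured with a small, self-contained sketch. It assumes a callback signature shaped like HDF5's H5Z_func_t (flags, cd_nelmts, cd_values, nbytes, buf_size, buf). H5Z_filter_t is aliased to int here, as in HDF5, so that no <hdf5.h> is needed. filter_callback, stub_filter and this get_callback are hypothetical stand-ins. A real table would point at filter::deflate, filter::shuffle and the other implementations. Returning nullptr for the passthrough IDs is a choice of this sketch only.

```cpp
#include <cassert>
#include <cstddef>

using H5Z_filter_t = int;  // HDF5 declares H5Z_filter_t as int; aliased to stay self-contained

// Callback signature modelled on HDF5's H5Z_func_t.
using filter_callback = std::size_t (*)(unsigned flags, std::size_t cd_nelmts,
                                        const unsigned cd_values[], std::size_t nbytes,
                                        std::size_t* buf_size, void** buf);

// Hypothetical stub standing in for a real filter implementation.
template <int Id>
std::size_t stub_filter(unsigned /*flags*/, std::size_t /*cd_nelmts*/, const unsigned /*cd_values*/[],
                        std::size_t nbytes, std::size_t* /*buf_size*/, void** /*buf*/) {
    return nbytes;  // a real filter transforms *buf; the stub leaves it untouched
}

inline filter_callback get_callback(H5Z_filter_t id) {
    switch (id) {
        case 1:     return stub_filter<1>;      // deflate / gzip
        case 2:     return stub_filter<2>;      // shuffle
        case 3:     return stub_filter<3>;      // fletcher32
        case 4:     return stub_filter<4>;      // szip
        case 32004: return stub_filter<32004>;  // lz4
        case 32015: return stub_filter<32015>;  // zstd
        case 32016: return stub_filter<32016>;  // gorilla
        default:    return nullptr;             // e.g. N-Bit (5), Scale-Offset (6): left to HDF5
    }
}
```

A switch over integer IDs keeps the lookup branch-cheap and lets the compiler warn about duplicate cases. Unknown IDs fall through to a single, explicit "not handled here" result.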
Community filters (LZ4, Zstd, Gorilla) must be registered with HDF5 before use. h5cpp handles this automatically:
- gorilla_register_filter() is called inside h5::gorilla::copy_impl() via std::call_once.
- … .h5pl paths or pre-loaded shared libraries).
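The lazy, once-only registration described for Gorilla follows a standard std::call_once pattern, sketched below. register_with_hdf5 is a hypothetical stand-in that only counts calls; in real code it would wrap HDF5's H5Zregister with an H5Z_class2_t describing the filter. ensure_registered is likewise a hypothetical name for the hook that a property copy would call.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical stand-in for the real registration (H5Zregister in practice).
std::atomic<int> registration_count{0};
void register_with_hdf5() { ++registration_count; }

// Safe to call on every property copy, from any thread: the registration
// body runs exactly once, and later calls only check the once_flag.
void ensure_registered() {
    static std::once_flag flag;
    std::call_once(flag, register_with_hdf5);
}
```

Because the once_flag is a function-local static, users never need an explicit initialisation call. After the first registration, each subsequent property copy pays only for a cheap flag check.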
- examples/custom-pipeline/pipeline.cpp: custom filter pipeline composition
- examples/datasets/datasets.cpp: dataset creation with chunking and compression options