H5CPP v1.14.0
Modern C++ templates for HDF5 serial and parallel I/O

Sources: CVE/NVD databases, HDF Group security advisories, GitHub issues (HDFGroup/hdf5, h5py, JuliaIO/HDF5.jl), Pulse Security fuzzing report, Debian/Ubuntu bug trackers, HDF5 release notes (1.10.x–1.14.x, 2.0.x), and the 939-repo hdf5-cpp-field-study corpus.

1. Heap buffer overflow

Frequency: Dominant bug class in HDF5 CVEs. Fuzzing by Pulse Security found 89 unique memory-corruption crashes in h5dump alone.
Affected functions: H5MM_strndup, H5C__reconstruct_cache_entry, H5HL__fl_deserialize, H5FS__sinfo_serialize_node_cb, H5T__conv_struct_opt, H5T__ref_mem_setnull, compact-layout dataset reads, object header decode routines.
Trigger: Malformed/crafted HDF5 files with manipulated metadata attributes, free-block parameters, or object headers.
Impact: DoS, potential RCE in server-side ingestion pipelines. HDF Group's own security scope lists "buffer overflows, out-of-bounds reads/writes" as the #1 in-scope vulnerability class.
Recent CVEs: CVE-2025-2310, CVE-2025-6269, CVE-2025-2924, CVE-2025-7067, CVE-2025-6750.
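
The trigger described above, a length or size field taken from a crafted file and trusted by the decoder, can be illustrated with a minimal sketch. The layout here (a 4-byte little-endian length followed by the payload) and the name `decode_name` are assumptions made for illustration; this is not the real HDF5 on-disk encoding or any library routine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical layout: 4-byte little-endian length, then that many payload
// bytes. Returns std::nullopt instead of reading past the end of the buffer.
inline std::optional<std::string> decode_name(const std::vector<std::uint8_t>& buf,
                                              std::size_t offset) {
    constexpr std::size_t header = 4;
    // Reject offsets that leave no room for the length field.
    if (offset > buf.size() || buf.size() - offset < header) return std::nullopt;

    std::uint32_t len = 0;
    for (std::size_t i = 0; i < header; ++i)
        len |= static_cast<std::uint32_t>(buf[offset + i]) << (8 * i);

    // Compare against the bytes that remain; computing offset + len instead
    // could wrap around and defeat the check.
    const std::size_t remaining = buf.size() - offset - header;
    if (len > remaining) return std::nullopt;

    const auto* first = reinterpret_cast<const char*>(buf.data() + offset + header);
    return std::string(first, len);
}
```

A caller would treat an empty result as a malformed file and stop decoding, rather than continuing with a partially read record.
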

2. Memory leaks

Frequency: Extremely common. Debian bug #638753 has been open since 2011 for "threadsafe memory leaks." HDF Group fixes memory leaks in nearly every maintenance release.
Library leaks: Metadata cache discard paths (CVE-2025-7068), object header continuation messages, H5O_create_ohdr/H5FL_reg_calloc leaks (GitHub #4586), scale-offset filter leaks, datatype ID leaks (GitHub #2419).
Application leaks: Our corpus shows 94% of repos use new/delete, 65% use malloc/free, only 52.6% use std::unique_ptr. Users leak hid_t handles by forgetting H5Fclose/H5Dclose/H5Tclose — especially painful in long-running services.
Impact: OOM kills in ingestion pipelines, worker churn in clusters.
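
On the application side, the new/delete and malloc/free patterns counted above can usually be replaced by owning types. The sketch below assumes a C-style function that returns malloc'd memory; `c_api_make_label`, `make_label`, and `make_read_buffer` are hypothetical names, and for real HDF5 calls the deleter should invoke whichever release function the allocating API documents.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <vector>

// Deleter for memory handed back by a C API that allocates with malloc.
struct free_deleter {
    void operator()(void* p) const noexcept { std::free(p); }
};
using c_string = std::unique_ptr<char, free_deleter>;

// Stand-in for a C function returning a malloc'd string (hypothetical).
inline char* c_api_make_label(const char* text) {
    char* out = static_cast<char*>(std::malloc(std::strlen(text) + 1));
    if (out) std::strcpy(out, text);
    return out;
}

// Wrap the raw pointer immediately so every exit path releases it.
inline c_string make_label(const char* text) {
    return c_string(c_api_make_label(text));
}

// A read buffer owned by std::vector needs no matching delete[] or free.
inline std::vector<double> make_read_buffer(std::size_t n) {
    return std::vector<double>(n, 0.0);
}
```

Both return values clean up after themselves when they go out of scope, including on early returns and exceptions.
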

3. Use-after-free

Frequency: 15 unique UAF crashes found by Pulse Security fuzzing h5stat; additional UAF in h5dump (CVE-2026-34734).
Affected paths: H5T__conv_f_f (floating-point conversion), H5T__conv_struct (compound datatype conversion). Freed objects are referenced in memmove calls during H5Dread.
Root cause: Datatype conversion buffer lifetime mismatches — the conversion pipeline frees intermediate buffers while struct-nested conversions still hold references.
Impact: DoS; potential RCE depending on heap layout.

4. Thread safety / races

Frequency: From our 939-repo corpus: 596 repos (63.5%) use HDF5 with threading; 292 repos (31.1%) use locks but do not build with H5_THREADSAFE. This is the #1 application-level bug category by volume.
Root cause: HDF5 is not thread-safe by default. The pre-built binaries are not thread-safe. Even concurrent access to different files requires the thread-safe build because the library modifies global data structures (free-space manager, open-file lists, error stack).
Common failure modes:
- num_workers > 1 segfaults or deadlocks.
- H5_THREADSAFE (the library still touches globals internally).

Impact: Silent data corruption, segmentation faults, deadlocks. HDF Group explicitly warns: "Concurrent access to different datasets in a single HDF5 file AND concurrent access to different HDF5 files both require a thread-safe version."
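
Because the library touches global state, per-file or per-dataset locks do not help. One workaround sometimes used when a thread-safe build is unavailable is to route every HDF5 call in the process through a single mutex. The sketch below shows that idea only; `hdf5_mutex` and `with_hdf5_lock` are hypothetical helper names, and whether this is sufficient for a given application is an assumption that should be checked against the HDF5 documentation.

```cpp
#include <cassert>
#include <mutex>
#include <utility>

// One process-wide mutex shared by all HDF5 calls (hypothetical helper).
inline std::mutex& hdf5_mutex() {
    static std::mutex m;
    return m;
}

// Runs fn while holding the global lock and returns whatever fn returns.
// The mutex is not recursive: calling with_hdf5_lock from inside fn deadlocks.
template <class Fn>
decltype(auto) with_hdf5_lock(Fn&& fn) {
    std::lock_guard<std::mutex> guard(hdf5_mutex());
    return std::forward<Fn>(fn)();
}
```

Each worker thread would wrap its open, read, and close calls in `with_hdf5_lock`, which serializes I/O and therefore trades throughput for safety.
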

5. Resource / handle leaks

Frequency: Universal among raw-C-API users. Our corpus: 457 repos (48.6%) use raw C API only; 271 (28.9%) mix C and C++ APIs.
Pattern: hid_t file = H5Fopen(...); hid_t ds = H5Dopen(file, ...); H5Dread(ds, ...); — then forgetting the close cascade. Every hid_t is a leak if not explicitly closed.
Corpus evidence: Mixed C/C++ users have 4.4× more raw hid_t exposure than C-only users (they open both C and C++ API handles). The official H5Cpp.h C++ wrapper does not use RAII for all resources — users still leak.
Impact: Process-level fd exhaustion, metadata cache bloat, file lock contention.
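
The close cascade described above is the kind of bookkeeping a scoped wrapper can take over. The following is a minimal sketch, not the H5CPP implementation: `handle_id` stands in for hid_t, and `fake_open`/`fake_close` are hypothetical functions that exist only so the example runs without the HDF5 headers. In real code the template argument would presumably be a closer such as H5Fclose or H5Dclose.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

using handle_id = std::int64_t;  // stand-in for hid_t

// Move-only owner that calls Close exactly once for a valid handle.
template <int (*Close)(handle_id)>
class scoped_handle {
public:
    explicit scoped_handle(handle_id id = -1) noexcept : id_(id) {}
    scoped_handle(const scoped_handle&) = delete;
    scoped_handle& operator=(const scoped_handle&) = delete;
    scoped_handle(scoped_handle&& other) noexcept
        : id_(std::exchange(other.id_, -1)) {}
    scoped_handle& operator=(scoped_handle&& other) noexcept {
        if (this != &other) {
            reset();
            id_ = std::exchange(other.id_, -1);
        }
        return *this;
    }
    ~scoped_handle() { reset(); }

    handle_id get() const noexcept { return id_; }
    bool valid() const noexcept { return id_ >= 0; }
    void reset() noexcept {
        if (id_ >= 0) Close(id_);  // negative ids are treated as "no handle"
        id_ = -1;
    }

private:
    handle_id id_;
};

// Hypothetical open/close pair that counts outstanding handles.
inline int& open_count() { static int n = 0; return n; }
inline handle_id fake_open() { ++open_count(); return 42; }
inline int fake_close(handle_id) { --open_count(); return 0; }
using fake_file = scoped_handle<&fake_close>;
```

Declaring handles in the order file, dataset, datatype means the destructors run in reverse, so dependent objects close before the file that contains them.
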

6. MPI deadlocks

Frequency: Persistent in parallel HDF5. CGNS alone had 3 bug-fix releases for parallel multi-dataset I/O deadlocks. HDF Group fixed "several potential MPI deadlocks in library failure conditions" in 1.14.x.
Common failure modes:
- NULL data buffers passed to collective read/write functions cause deadlocks because a non-empty hyperslab selection is still created (CGNS #945).
- MPI_THREAD_MULTIPLE required but not requested (OpenMPI + parallel HDF5 cartesian communicator corruption).

Impact: HPC job failures, wasted cluster hours, silent data corruption in distributed writes.

7. File corruption

Frequency: Constant user complaint on forums. HDF5 has no journaling and no error recovery mechanism.
Library causes: H5Ocopy generating invalid files (GitHub #2653), version-bound mismatches, corrupted object headers from older library versions (NASA AURA incident 2007), H5Dwrite_chunk without close corrupting fill values.
Operational causes: Network share disconnections (sshfs unmount) cause segfaults on H5Fclose because the error stack dereferences freed file handles (h5py #2043). Power loss during SWMR writes leaves superblock status flags preventing reopen.
Impact: Unrecoverable scientific data loss. h5clear can only clear superblock marks — "it is not a general repair tool and should not be used to fix file corruption."

8. Type conversion mismatch

Frequency: Very common in cross-language workflows (Python → C++, Fortran → C).
Failure modes:
- H5Tget_member_name returns memory the user must free with H5free_memory(); freeing it with free() causes cross-CRT boundary crashes on Windows.
- _Float16 support was only added in 1.14.4; older code fails silently.
- h5dread_f subroutine resolution failures (GitHub #4557).

Impact: Silent data misinterpretation, segfaults in type-conversion paths, portability failures.

9. Segfaults in error paths

Frequency: Debian Mayhem project found h5copy, h5diff, h5dump, and h5import all crash with exit status 139 on malformed input. Julia HDF5.jl has recurring finalizer bugs ("not a file id", GitHub #194).
Library failures: When a file open or read fails, the error-stack teardown can dereference invalid hid_t handles. Network share disconnections trigger H5F_get_nrefs segfaults during close.
Application failures: Users check H5Dread < 0 but then dereference the read buffer anyway. Our corpus found 16,105 try/catch blocks across 361 repos — but C-API users have zero structured error recovery.
Impact: Crash on what should be a clean error return. Production pipelines die instead of handling the error.
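
The application-side failure, checking the status code but using the buffer anyway, can be avoided by making the buffer unreachable when the read fails. This is a small sketch under stated assumptions: `fake_read` is a hypothetical stand-in for a C-style call that returns a negative value on failure, and `read_values` is an illustrative wrapper, not an HDF5 or H5CPP function.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Stand-in for a C-style read: negative on failure, buffer left untouched.
inline int fake_read(bool succeed, double* out, std::size_t n) {
    if (!succeed) return -1;
    for (std::size_t i = 0; i < n; ++i) out[i] = static_cast<double>(i);
    return 0;
}

// The buffer only escapes this function when the read succeeded, so a
// caller cannot dereference data from a failed read by accident.
inline std::optional<std::vector<double>> read_values(bool succeed, std::size_t n) {
    std::vector<double> buf(n);
    if (fake_read(succeed, buf.data(), n) < 0) return std::nullopt;
    return buf;
}
```

The caller then has to test the optional before touching the data, which turns the forgotten check into code that visibly skips the error case.
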

10. Buffer over-read / integer overflow

Frequency: Less common than heap overflows but historically persistent. Multiple CVEs per year.
Affected paths: H5O_link_decode (CVE-2018-13869, CVE-2018-13870), H5O_attr_decode (CVE-2018-17435), H5D__select_io SIGFPE from division-by-zero (CVE-2018-17438), on-disk attribute size calculation overflow (CVE-2021-37501 / GHSA-rfgw-5vq3-wrjf).
Root cause: Insufficient bounds checking when decoding object headers from malformed files; integer overflow in size calculations leading to zero-sized allocations.
Impact: DoS via SIGFPE or over-read; information disclosure in server-side parsing.
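
The root cause named above, a size calculation that overflows or collapses to zero, can be guarded with a checked multiplication before any allocation. The helper below is a generic sketch; `checked_alloc_size` is a hypothetical name and does not correspond to any HDF5 routine.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <optional>

// Returns count * elem_size, or std::nullopt if either operand is zero or
// the product would not fit in std::size_t.
inline std::optional<std::size_t> checked_alloc_size(std::size_t count,
                                                     std::size_t elem_size) {
    // Rejecting zero avoids both zero-sized allocations and the division
    // by zero in the overflow test below.
    if (count == 0 || elem_size == 0) return std::nullopt;
    if (count > std::numeric_limits<std::size_t>::max() / elem_size)
        return std::nullopt;
    return count * elem_size;
}
```

A decoder would call this on every element count and element size read from the file and abort the decode when no value comes back.
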
| Rank | Bug Category | Primary Level | Frequency Indicator |
|---|---|---|---|
| 1 | Heap buffer overflow | Library | 89+ unique fuzzing crashes |
| 2 | Memory leaks | Library + App | Every release fixes leaks; 94% of corpus uses raw new/delete |
| 3 | Use-after-free | Library | 15+ UAF crashes in fuzzing; CVE-2026-34734 |
| 4 | Thread safety / races | Application | 63.5% of repos threaded; only 3.8% with H5_THREADSAFE |
| 5 | Resource / handle leaks | Application | 457 repos raw C API; 4.4× hid_t exposure in mixed users |
| 6 | MPI deadlocks | Library + App | Multiple CGNS/HDFGroup fixes per release |
| 7 | File corruption | Both | No journaling; h5clear explicitly "not a repair tool" |
| 8 | Type conversion mismatch | Application | Cross-language issues daily on forums |
| 9 | Segfaults in error paths | Both | h5copy/h5dump/h5diff all crash on bad input |
| 10 | Buffer over-read / int overflow | Library | 5+ CVEs in this class (2018–2021) |