H5CPP  v1.14.0
Modern C++ templates for HDF5 serial and parallel I/O
Top 10 Most Frequent HDF5-Related Bugs

Sources: CVE/NVD databases, HDF Group security advisories, GitHub issues (HDFGroup/hdf5, h5py, JuliaIO/HDF5.jl), Pulse Security fuzzing report, Debian/Ubuntu bug trackers, HDF5 release notes (1.10.x–1.14.x, 2.0.x), and the 939-repo hdf5-cpp-field-study corpus.


1. Heap Buffer Overflows (Library-Level — #1 CVE Category)

Frequency: The dominant bug class in HDF5 CVEs. Fuzzing by Pulse Security produced 89 unique memory-corruption crashes in h5dump alone.

Affected functions: H5MM_strndup, H5C__reconstruct_cache_entry, H5HL__fl_deserialize, H5FS__sinfo_serialize_node_cb, H5T__conv_struct_opt, H5T__ref_mem_setnull, compact-layout dataset reads, object header decode routines.

Trigger: Malformed/crafted HDF5 files with manipulated metadata attributes, free-block parameters, or object headers.

Impact: DoS, potential RCE in server-side ingestion pipelines. HDF Group's own security scope lists "buffer overflows, out-of-bounds reads/writes" as the #1 in-scope vulnerability class.

Recent CVEs: CVE-2025-2310, CVE-2025-6269, CVE-2025-2924, CVE-2025-7067, CVE-2025-6750.


2. Memory Leaks (Library + Application-Level)

Frequency: Extremely common. Debian bug #638753 has been open since 2011 for "threadsafe memory leaks." HDF Group fixes memory leaks in nearly every maintenance release.

Library leaks: Metadata cache discard paths (CVE-2025-7068), object header continuation messages, H5O_create_ohdr/H5FL_reg_calloc leaks (GitHub #4586), scale-offset filter leaks, datatype ID leaks (GitHub #2419).

Application leaks: Our corpus shows 94% of repos use new/delete, 65% use malloc/free, only 52.6% use std::unique_ptr. Users leak hid_t handles by forgetting H5Fclose/H5Dclose/H5Tclose — especially painful in long-running services.

Impact: OOM kills in ingestion pipelines, worker churn in clusters.


3. Use-After-Free (Library-Level)

Frequency: 15 unique UAF crashes found by Pulse Security fuzzing h5stat; additional UAF in h5dump (CVE-2026-34734).

Affected paths: H5T__conv_f_f (floating-point conversion) and H5T__conv_struct (compound datatype conversion), both part of the datatype conversion pipeline. Freed objects are referenced in memmove calls during H5Dread.

Root cause: Datatype conversion buffer lifetime mismatches — the conversion pipeline frees intermediate buffers while struct-nested conversions still hold references.

Impact: DoS; potential RCE depending on heap layout.


4. Thread-Safety / Race Conditions (Application-Level)

Frequency: From our 939-repo corpus: 596 repos (63.5%) use HDF5 + threading; 292 repos (31.1%) use locks but do NOT build with H5_THREADSAFE. This is the #1 application-level bug category by volume.

Root cause: HDF5 is not thread-safe by default. The pre-built binaries are not thread-safe. Even concurrent access to different files requires the thread-safe build because the library modifies global data structures (free-space manager, open-file lists, error stack).

Common failure modes:

  • PyTorch DataLoader with num_workers > 1 segfaults or deadlocks.
  • Fork-after-open corrupts HDF5 internal state (Python multiprocessing).
  • User-implemented mutexes that guard only some HDF5 calls in a build without H5_THREADSAFE: every call left outside the lock still touches library globals internally.

Impact: Silent data corruption, segmentation faults, deadlocks. HDF Group explicitly warns: "Concurrent access to different datasets in a single HDF5 file AND concurrent access to different HDF5 files both require a thread-safe version."
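One way to avoid concurrent entry into a non-thread-safe build is to route every HDF5 call through a single dedicated thread. The following is a minimal sketch of that idea in C++17, not a pattern prescribed by HDF Group or H5CPP. The class name `SerialIoThread` and the function `example_sum` are hypothetical, and the submitted lambdas stand in for real HDF5 calls so that the sketch builds without the library.

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical helper: runs submitted jobs one at a time on one worker thread,
// so code placed inside the jobs never executes concurrently.
class SerialIoThread {
public:
    SerialIoThread() : worker_([this] { run(); }) {}
    ~SerialIoThread() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stop_ = true;
        }
        cv_.notify_one();
        worker_.join();  // remaining jobs are drained before the thread exits
    }
    SerialIoThread(const SerialIoThread&) = delete;
    SerialIoThread& operator=(const SerialIoThread&) = delete;

    // Queue a callable and return a future for its result.
    template <class F>
    auto submit(F f) -> std::future<decltype(f())> {
        using R = decltype(f());
        auto task = std::make_shared<std::packaged_task<R()>>(std::move(f));
        std::future<R> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(m_);
            jobs_.push([task] { (*task)(); });
        }
        cv_.notify_one();
        return result;
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stop_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stop requested and queue is empty
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // executed outside the lock, always on the worker thread
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    bool stop_ = false;
    std::thread worker_;  // declared last so the members above exist before it starts
};

// Usage sketch: `counter` stands in for library state touched by I/O calls.
static int example_sum() {
    SerialIoThread io;
    int counter = 0;  // only ever modified on the I/O thread
    std::future<int> a = io.submit([&counter] { return ++counter; });
    std::future<int> b = io.submit([&counter] { return ++counter; });
    return a.get() + b.get();
}
```

The design trades parallelism for safety: callers block on the returned future, and any handle created inside a job should also be closed inside a job so that no HDF5 call escapes the worker thread.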


5. Resource / Handle Leaks (Application-Level)

Frequency: Universal among raw-C-API users. Our corpus: 457 repos (48.7%) use raw C API only; 271 (28.9%) mix C and C++ APIs.

Pattern: hid_t file = H5Fopen(...); hid_t ds = H5Dopen(file, ...); H5Dread(ds, ...); — then forgetting the close cascade. Every hid_t is a leak if not explicitly closed.

Corpus evidence: Mixed C/C++ users have 4.4× more raw hid_t exposure than C-only users (they open both C and C++ API handles). The official H5Cpp.h C++ wrapper does not use RAII for all resources — users still leak.

Impact: Process-level fd exhaustion, metadata cache bloat, file lock contention.
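The close cascade can be tied to scope with a small RAII guard. The sketch below assumes only that a handle is a 64-bit integer and that its close function takes the handle and returns an int, which matches the shape of H5Fclose, H5Dclose, and H5Tclose. The names `handle_t`, `HandleGuard`, `mock_open`, `mock_close`, and `read_something` are hypothetical and exist only so that the example builds without HDF5.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical stand-in for hid_t so the sketch compiles without HDF5 headers.
using handle_t = std::int64_t;

// Owns one handle and calls `closer` at most once when the guard is destroyed.
class HandleGuard {
public:
    using closer_fn = int (*)(handle_t);

    HandleGuard(handle_t id, closer_fn closer) noexcept : id_(id), closer_(closer) {}
    HandleGuard(const HandleGuard&) = delete;
    HandleGuard& operator=(const HandleGuard&) = delete;
    HandleGuard(HandleGuard&& other) noexcept
        : id_(std::exchange(other.id_, -1)), closer_(other.closer_) {}
    HandleGuard& operator=(HandleGuard&& other) noexcept {
        if (this != &other) {
            reset();
            id_ = std::exchange(other.id_, -1);
            closer_ = other.closer_;
        }
        return *this;
    }
    ~HandleGuard() { reset(); }

    handle_t get() const noexcept { return id_; }
    bool valid() const noexcept { return id_ >= 0; }

private:
    void reset() noexcept {
        if (id_ >= 0 && closer_ != nullptr) closer_(id_);
        id_ = -1;
    }
    handle_t id_;
    closer_fn closer_;
};

// Mock open/close pair that counts live handles; used only for demonstration.
static int g_open_handles = 0;
static handle_t mock_open() { ++g_open_handles; return 42; }
static int mock_close(handle_t) { --g_open_handles; return 0; }

// Early returns no longer leak: guards are destroyed in reverse order of creation.
static bool read_something(bool fail_midway) {
    HandleGuard file(mock_open(), mock_close);
    HandleGuard dataset(mock_open(), mock_close);
    if (fail_midway) return false;  // both guards still close their handles
    return file.valid() && dataset.valid();
}
```

With the real library, the constructor arguments would be the result of an open call and the matching close function; the guard skips the close call for negative (invalid) identifiers.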


6. MPI Deadlocks & Collective I/O Failures (Library + Application)

Frequency: Persistent in parallel HDF5. CGNS alone had 3 bug-fix releases for parallel multi-dataset I/O deadlocks. HDF Group fixed "several potential MPI deadlocks in library failure conditions" in 1.14.x.

Common failure modes:

  • Ranks passing NULL data buffers to collective read/write functions cause deadlocks because a non-empty hyperslab selection is still created (CGNS #945).
  • Mismatched collective operations (one rank takes an error path, others wait).
  • MPI_THREAD_MULTIPLE required but not requested (OpenMPI + parallel HDF5 cartesian communicator corruption).
  • HDF5 1.12.x CMake parallel build silently producing serial libraries (GitHub #2327).

Impact: HPC job failures, wasted cluster hours, silent data corruption in distributed writes.


7. File Corruption & Invalid File Handling (Both Levels)

Frequency: A constant user complaint on forums. HDF5 has no journaling and no error recovery mechanism.

Library causes: H5Ocopy generating invalid files (GitHub #2653), version-bound mismatches, corrupted object headers from older library versions (NASA AURA incident 2007), H5Dwrite_chunk without close corrupting fill values.

Operational causes: Network share disconnections (sshfs unmount) cause segfaults on H5Fclose because the error stack dereferences freed file handles (h5py #2043). Power loss during SWMR writes leaves superblock status flags preventing reopen.

Impact: Unrecoverable scientific data loss. h5clear can only clear superblock marks — "it is not a general repair tool and should not be used to fix file corruption."


8. Type Conversion / Datatype Mismatch Errors (Application-Level)

Frequency: Very common in cross-language workflows (Python → C++, Fortran → C).

Failure modes:

  • Compound type field alignment mismatches between languages.
  • Variable-length strings written by h5py unreadable in C++ (unknown_one_character_type).
  • H5Tget_member_name returns memory the user must free with H5free_memory() — cross-CRT boundary crashes on Windows if freed with free().
  • 16-bit float (_Float16) support only added in 1.14.4 — older code fails silently.
  • Fortran generic h5dread_f subroutine resolution failures (GitHub #4557).

Impact: Silent data misinterpretation, segfaults in type-conversion paths, portability failures.
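Alignment mismatches in compound types usually come from assuming a packed layout when the compiler has inserted padding. The sketch below shows one way to derive member offsets from the struct itself rather than hard-coding them. The `Sample` record, the `FieldLayout` table, and `padding_bytes` are hypothetical names for illustration. With HDF5, such offsets would typically be handed to H5Tinsert, and the library's HOFFSET macro performs the same offsetof computation.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical record type; the fields are illustrative only.
struct Sample {
    std::uint8_t flag;
    double       value;
    std::int32_t count;
};

// One row per member: the facts a compound datatype description needs.
struct FieldLayout {
    const char* name;
    std::size_t offset;  // position inside the in-memory struct
    std::size_t size;    // size of the member itself
};

// Offsets come from the compiler, so padding is accounted for automatically.
static const FieldLayout kSampleLayout[] = {
    {"flag",  offsetof(Sample, flag),  sizeof(Sample::flag)},
    {"value", offsetof(Sample, value), sizeof(Sample::value)},
    {"count", offsetof(Sample, count), sizeof(Sample::count)},
};

// Bytes the compiler added as padding; a fully packed layout would report 0.
static std::size_t padding_bytes() {
    std::size_t members = 0;
    for (const FieldLayout& f : kSampleLayout) members += f.size;
    return sizeof(Sample) - members;
}
```

A reader in another language that assumes the members sit back to back would misplace every field after the first whenever `padding_bytes()` is nonzero, which is why the offsets should travel with the datatype description.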


9. Segmentation Faults from Error-Handling Paths (Both Levels)

Frequency: Debian Mayhem project found h5copy, h5diff, h5dump, and h5import all crash with exit status 139 on malformed input. Julia HDF5.jl has recurring finalizer bugs ("not a file id" — GitHub #194).

Library failures: When a file open or read fails, the error-stack teardown can dereference invalid hid_t handles. Network share disconnections trigger H5F_get_nrefs segfaults during close.

Application failures: Users check H5Dread < 0 but then dereference the read buffer anyway. Our corpus found 16,105 try/catch blocks across 361 repos — but C-API users have zero structured error recovery.

Impact: Crash on what should be a clean error return. Production pipelines die instead of handling the exception.
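The "checked the status, used the buffer anyway" mistake can be made harder to write by letting the buffer escape only on success. The following C++17 sketch wraps a C-style status-returning call in a function that returns std::optional. The function `mock_read` is a hypothetical stand-in for a call such as H5Dread, and `read_checked` is an illustrative name rather than an H5CPP API.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical C-style reader: returns a negative value on failure and
// leaves the contents of `out` unspecified in that case.
static int mock_read(bool succeed, double* out, std::size_t n) {
    if (!succeed) return -1;
    for (std::size_t i = 0; i < n; ++i) out[i] = static_cast<double>(i);
    return 0;
}

// The buffer is handed to the caller only when the status check passed.
static std::optional<std::vector<double>> read_checked(bool succeed, std::size_t n) {
    std::vector<double> buf(n);
    if (mock_read(succeed, buf.data(), n) < 0) return std::nullopt;
    return buf;
}
```

A caller has to test the optional before reaching the data, so the failure path cannot silently fall through to code that reads an unfilled buffer.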


10. Buffer Over-Read & Integer Overflow (Library-Level)

Frequency: Less common than heap overflows but historically persistent. Multiple CVEs per year.

Affected paths: H5O_link_decode (CVE-2018-13869, CVE-2018-13870), H5O_attr_decode (CVE-2018-17435), H5D__select_io SIGFPE from division-by-zero (CVE-2018-17438), on-disk attribute size calculation overflow (CVE-2021-37501 / GHSA-rfgw-5vq3-wrjf).

Root cause: Insufficient bounds checking when decoding object headers from malformed files; integer overflow in size calculations leading to zero-sized allocations.

Impact: DoS via SIGFPE or over-read; information disclosure in server-side parsing.
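The size-calculation failure described above can be illustrated with a checked multiplication. This is a generic C++17 sketch of the defensive pattern, not the fix applied in the HDF5 sources. The function name `checked_buffer_size` is hypothetical, and treating a zero-sized result as invalid is an assumption made for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>

// Returns count * elem_size as a size_t, or std::nullopt when either factor
// is zero or the product does not fit in size_t.
static std::optional<std::size_t> checked_buffer_size(std::uint64_t count,
                                                      std::uint64_t elem_size) {
    if (count == 0 || elem_size == 0) return std::nullopt;  // no zero-sized buffers
    // Dividing the limit by one factor avoids performing the overflowing multiply.
    if (count > std::numeric_limits<std::size_t>::max() / elem_size) return std::nullopt;
    return static_cast<std::size_t>(count * elem_size);
}
```

Values decoded from a file would pass through a check like this before any allocation or copy, so a crafted count cannot wrap around to a small or zero size.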


Summary Table

| Rank | Bug Category                    | Primary Level | Frequency Indicator                                          |
|------|---------------------------------|---------------|--------------------------------------------------------------|
| 1    | Heap buffer overflow            | Library       | 89+ unique fuzzing crashes                                   |
| 2    | Memory leaks                    | Library + App | Every release fixes leaks; 94% of corpus repos use new/delete |
| 3    | Use-after-free                  | Library       | 15+ UAF crashes in fuzzing; CVE-2026-34734                   |
| 4    | Thread safety / races           | Application   | 63.5% of repos threaded; only 3.8% with H5_THREADSAFE        |
| 5    | Resource / handle leaks         | Application   | 457 repos raw C API; 4.4× hid_t exposure in mixed users      |
| 6    | MPI deadlocks                   | Library + App | Multiple CGNS/HDFGroup fixes per release                     |
| 7    | File corruption                 | Both          | No journaling; h5clear explicitly "not a repair tool"        |
| 8    | Type conversion mismatch        | Application   | Cross-language issues daily on forums                        |
| 9    | Segfaults in error paths        | Both          | h5copy/h5diff/h5dump/h5import all crash on bad input         |
| 10   | Buffer over-read / int overflow | Library       | 5+ CVEs in this class (2018–2021)                            |