Commit 30e5bde

HDF5: Explicit control over chunking (#1591)
* Chunking specification per dataset, explicit specification. Still need to filter out the warnings better
* Json internal
* Properly warn about unused items
* Maybe expose this publicly?
* CI Fixes
* Documentation
* Testing
* Revert "Maybe expose this publicly?" (reverts commit f00baa7)
* Remove todo comment
1 parent a0eca32 · commit 30e5bde

8 files changed: +277 −111 lines

docs/source/backends/hdf5.rst

Lines changed: 1 addition & 0 deletions
@@ -65,6 +65,7 @@ Any file object greater than or equal in size to threshold bytes will be aligned
 
 ``OPENPMD_HDF5_CHUNKS``: this sets defaults for data chunking via `H5Pset_chunk <https://support.hdfgroup.org/HDF5/doc/RM/H5P/H5Pset_chunk.htm>`__.
 Chunking generally improves performance and only needs to be disabled in corner-cases, e.g. when heavily relying on independent, parallel I/O that non-collectively declares data records.
+The chunk size can alternatively (or additionally) be specified explicitly per dataset, by specifying a dataset-specific chunk size in the JSON/TOML configuration of ``resetDataset()``/``reset_dataset()``.
 
 ``OPENPMD_HDF5_COLLECTIVE_METADATA``: this is an option to enable collective MPI calls for HDF5 metadata operations via `H5Pset_all_coll_metadata_ops <https://support.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetAllCollMetadataOps>`__ and `H5Pset_coll_metadata_write <https://support.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetCollMetadataWrite>`__.
 By default, this optimization is enabled as it has proven to provide performance improvements.
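
To illustrate the documented per-dataset override: a minimal, hypothetical sketch of how the new option can be combined with the global ``OPENPMD_HDF5_CHUNKS`` default. The series name, record names and extents are made up; the three-argument ``Dataset`` constructor and the TOML snippet follow the pattern used in ``examples/5_write_parallel.cpp`` further down in this commit.

#include <openPMD/openPMD.hpp>

using namespace openPMD;

int main()
{
    // OPENPMD_HDF5_CHUNKS (if set) still provides the global default
    // ("auto"/"none") for every dataset of this series
    Series series("chunks_%T.h5", Access::CREATE);

    // per-dataset override: explicit 25x60 chunks for a 100x300 dataset,
    // passed through the JSON/TOML options of the Dataset given to
    // resetDataset()
    Dataset ds(Datatype::FLOAT, {100, 300}, R"(
[hdf5.dataset]
chunks = [25, 60]
)");
    series.iterations[0].meshes["E"]["x"].resetDataset(ds);

    series.flush();
    return 0;
}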

docs/source/details/backendconfig.rst

Lines changed: 4 additions & 1 deletion
@@ -183,12 +183,15 @@ A full configuration of the HDF5 backend:
 .. literalinclude:: hdf5.json
    :language: json
 
-All keys found under ``hdf5.dataset`` are applicable globally (future: as well as per dataset).
+All keys found under ``hdf5.dataset`` are applicable globally as well as per dataset.
 Explanation of the single keys:
 
 * ``hdf5.dataset.chunks``: This key contains options for data chunking via `H5Pset_chunk <https://support.hdfgroup.org/HDF5/doc/RM/H5P/H5Pset_chunk.htm>`__.
   The default is ``"auto"`` for a heuristic.
   ``"none"`` can be used to disable chunking.
+
+  An explicit chunk size can be specified as a list of positive integers, e.g. ``hdf5.dataset.chunks = [10, 100]``. Note that this specification should only be used per-dataset, e.g. in ``resetDataset()``/``reset_dataset()``.
+
   Chunking generally improves performance and only needs to be disabled in corner-cases, e.g. when heavily relying on independent, parallel I/O that non-collectively declares data records.
 * ``hdf5.vfd.type`` selects the HDF5 virtual file driver.
   Currently available are:
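
For the global scope mentioned above, a hedged sketch: the same ``hdf5.dataset.chunks`` key can also be passed through the Series-level JSON/TOML options string, where ``"auto"`` or ``"none"`` applies to every dataset of the Series. The file name, mesh name and extent below are illustrative only.

#include <openPMD/openPMD.hpp>

using namespace openPMD;

int main()
{
    // globally disable HDF5 chunking for all datasets of this series,
    // e.g. for heavily independent, non-collective parallel I/O
    Series series("no_chunking.h5", Access::CREATE, R"(
[hdf5.dataset]
chunks = "none"
)");

    auto rho = series.iterations[0].meshes["rho"][MeshRecordComponent::SCALAR];
    rho.resetDataset(Dataset(Datatype::DOUBLE, {1000, 1000}));

    series.flush();
    return 0;
}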

examples/5_write_parallel.cpp

Lines changed: 7 additions & 1 deletion
@@ -54,6 +54,9 @@ type = "subfiling"
 ioc_selection = "every_nth_rank"
 stripe_size = 33554432
 stripe_count = -1
+
+[hdf5.dataset]
+chunks = "auto"
 )";
 
     // open file for writing
@@ -81,7 +84,10 @@ stripe_count = -1
     // example 1D domain decomposition in first index
    Datatype datatype = determineDatatype<float>();
    Extent global_extent = {10ul * mpi_size, 300};
-    Dataset dataset = Dataset(datatype, global_extent);
+    Dataset dataset = Dataset(datatype, global_extent, R"(
+[hdf5.dataset]
+chunks = [10, 100]
+)");
 
     if (0 == mpi_rank)
         cout << "Prepared a Dataset of size " << dataset.extent[0] << "x"
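
A note on the numbers in this example: with the 1D domain decomposition in the first index, each rank owns a 10 x 300 slab of the (10 * mpi_size) x 300 global dataset, so the explicit chunk shape ``[10, 100]`` splits every rank's slab into exactly three chunks and no chunk crosses a rank boundary. The ``chunks = "auto"`` entry added to the Series-level options keeps the default heuristic for any dataset that does not override it.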

include/openPMD/IO/HDF5/HDF5IOHandlerImpl.hpp

Lines changed: 1 addition & 1 deletion
@@ -118,9 +118,9 @@ class HDF5IOHandlerImpl : public AbstractIOHandlerImpl
 #endif
 
     json::TracingJSON m_config;
+    std::optional<nlohmann::json> m_buffered_dataset_config;
 
 private:
-    std::string m_chunks = "auto";
     struct File
     {
         std::string name;
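
The header change replaces the old string member, which could only carry an ``"auto"``/``"none"``-style setting, with an optional buffered JSON object able to hold the full ``hdf5.dataset`` sub-configuration, explicit chunk lists included. The actual logic is not part of this excerpt; the following is only a generic sketch of the buffering pattern such a member suggests, with hypothetical names and plain nlohmann::json calls.

#include <nlohmann/json.hpp>

#include <optional>

// Hypothetical illustration, not the openPMD implementation: a handler-like
// class buffers the global [hdf5.dataset] defaults once and merges them with
// a per-dataset configuration when a dataset is created.
struct DatasetConfigSketch
{
    std::optional<nlohmann::json> m_buffered_dataset_config;

    nlohmann::json effectiveConfig(nlohmann::json perDataset) const
    {
        // start from the buffered global defaults, falling back to "auto"
        nlohmann::json result = m_buffered_dataset_config.value_or(
            nlohmann::json{{"chunks", "auto"}});
        // per-dataset keys overwrite the global defaults
        result.update(perDataset);
        return result;
    }
};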

include/openPMD/auxiliary/JSON_internal.hpp

Lines changed: 4 additions & 0 deletions
@@ -91,6 +91,7 @@ namespace json
      * @return nlohmann::json const&
      */
     nlohmann::json const &getShadow() const;
+    nlohmann::json &getShadow();
 
     /**
      * @brief Invert the "shadow", i.e. a copy of the original JSON value
@@ -247,5 +248,8 @@ namespace json
      */
     nlohmann::json &
     merge(nlohmann::json &defaultVal, nlohmann::json const &overwrite);
+
+    nlohmann::json &filterByTemplate(
+        nlohmann::json &defaultVal, nlohmann::json const &positiveMask);
 } // namespace json
 } // namespace openPMD
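
``filterByTemplate`` is only declared here; its definition is not part of this excerpt. Purely as an illustration of what a positive-mask filter over ``nlohmann::json`` could look like (one plausible reading of the name, loosely related to the commit's aim of warning about unused items), here is a hypothetical sketch under a different name:

#include <nlohmann/json.hpp>

// Hypothetical sketch, not the openPMD implementation: recursively keep only
// those object keys of `val` that also occur in `mask`; non-object values are
// left untouched.
nlohmann::json &filterByTemplateSketch(
    nlohmann::json &val, nlohmann::json const &mask)
{
    if (val.is_object() && mask.is_object())
    {
        for (auto it = val.begin(); it != val.end();)
        {
            auto maskIt = mask.find(it.key());
            if (maskIt == mask.end())
            {
                // key not named in the positive mask: drop it
                it = val.erase(it);
            }
            else
            {
                // descend into nested objects
                filterByTemplateSketch(it.value(), *maskIt);
                ++it;
            }
        }
    }
    return val;
}

Applied to ``{"chunks": [10, 100], "typo": 1}`` with mask ``{"chunks": {}}``, such a filter would drop ``"typo"``, which is the kind of leftover a backend could then warn about.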
