* Extract verifyChunk
* Main implementation
  * Use magic number instead of API call (impl)
* Python bindings
* Throw errors if unsupported
* Testing
  * Only test if ADIOS2 version at least 2.9
* Documentation
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  (for more information, see https://pre-commit.ci)
* Fix ADIOS2 checks for multidimensional joined arrays
* Python test
* Expose this to the sliced Python API
* Fix
* Fix formatting
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  (for more information, see https://pre-commit.ci)
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
docs/source/usage/workflow.rst: 43 additions & 0 deletions
@@ -3,6 +3,49 @@
Workflow
========

Storing and reading chunks
--------------------------

1. **Chunks within an n-dimensional dataset**

Most commonly, chunks within an n-dimensional dataset are identified by their offset and extent.
The extent is the size of the chunk in each dimension, NOT the absolute coordinate within the entire dataset.

In the Python API, this is modeled to conform to the conventional ``__setitem__``/``__getitem__`` protocol.
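
For illustration, here is a minimal sketch of this slice-based protocol in Python (the file name, the mesh record ``E``/``x``, and the 64x64 global extent are made-up examples; an HDF5-enabled build is assumed):

.. code-block:: python

   import numpy as np
   import openpmd_api as io

   series = io.Series("example.h5", io.Access.create)
   E_x = series.iterations[0].meshes["E"]["x"]

   # declare a global dataset of extent 64x64
   local_data = np.ones((32, 64))
   E_x.reset_dataset(io.Dataset(local_data.dtype, [64, 64]))

   # store a chunk with offset [0, 0] and extent [32, 64] via __setitem__;
   # the write is registered here and performed at flush()
   E_x[0:32, 0:64] = local_data
   series.flush()

   # reading back (in a read-mode Series), __getitem__ registers a load
   # that is likewise performed at flush():
   #     chunk = E_x[0:32, 0:64]
   #     series.flush()
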
2. **Joined arrays (write only)**

Currently only supported with the ADIOS2 backend, version 2.9.0 or newer, under the conditions listed in the `ADIOS2 documentation on joined arrays <https://adios2.readthedocs.io/en/latest/components/components.html#shapes>`_.

In some cases, the concrete position of a chunk within a dataset does not matter, and computing indices is needless computational and mental overhead.
This commonly occurs for particle data, which the openPMD standard models as a list of particles.
The order of particles does not matter greatly, and making different parallel processes agree on indexing is error-prone boilerplate.

In such a case, at most one *joined dimension* can be specified in the Dataset, e.g. ``{Dataset::JOINED_DIMENSION, 128, 128}`` (3D for the sake of explanation; particle data would normally be 1D).
The chunk is then stored by specifying an empty offset vector ``{}``.
The chunk extent vector must be equal to the global extent in all non-joined dimensions, i.e. joined arrays allow no further sub-chunking other than concatenation along the joined dimension.
The entry at the joined dimension of the extent vector specifies the extent that this piece should have along the joined dimension.
In the Python API, the slice-based setter syntax can be used as an abbreviation, since the necessary information is determined from the passed array, e.g. ``record_component[()] = local_data``.
The global extent of the dataset along the joined dimension will then be the sum of all local chunk extents along the joined dimension.
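
A minimal Python sketch of a joined-array write (the file name and the species/record names are illustrative; this assumes the joined-dimension constant is exposed as ``io.Dataset.JOINED_DIMENSION`` in the Python bindings and that an ADIOS2-enabled build is used):

.. code-block:: python

   import numpy as np
   import openpmd_api as io

   # in a real parallel setup, an MPI communicator would be passed to the Series
   series = io.Series("particles.bp", io.Access.create)
   pos_x = series.iterations[0].particles["e"]["position"]["x"]

   local_data = np.random.rand(10)  # per-process piece, size may differ per rank

   # 1D dataset whose only dimension is the joined dimension:
   # no global extent and no offsets need to be computed by the writer
   pos_x.reset_dataset(io.Dataset(local_data.dtype, [io.Dataset.JOINED_DIMENSION]))

   # empty index: the offset along the joined dimension is assigned at flush time
   pos_x[()] = local_data
   series.flush()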

Since openPMD follows a struct-of-array layout of data, it is important not to lose the correlation of data between components. E.g., joining arrays must ensure that ``particles/e/position/x`` and ``particles/e/position/y`` are joined in a uniform way.

The openPMD-api makes the **following guarantee**:

Consider a Series written from ``N`` parallel processes between two (collective) flush points. For each parallel process ``n`` and dataset ``D``, let:

* ``chunk(D, n, i)`` be the ``i``-th chunk written to dataset ``D`` on process ``n``
* ``num_chunks(D, n)`` be the count of chunks written by process ``n`` to dataset ``D``
* ``joined_index(D, c)`` be the index of chunk ``c`` in the joining order of ``D``

Then, for any two datasets ``x`` and ``y``:

* If for any parallel process ``n`` the condition holds that ``num_chunks(x, n) = num_chunks(y, n)`` (between the two flush points!) ...
* ... then for any parallel process ``n`` and chunk index ``i`` less than ``num_chunks(x, n)``: ``joined_index(x, chunk(x, n, i)) = joined_index(y, chunk(y, n, i))``.

**TLDR:** Writing chunks to two joined arrays in a synchronous way (**1.** same order of store operations and **2.** between the same flush operations) will result in the same joining order in both arrays.
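
As a sketch of what "synchronous" means in practice (names and sizes are illustrative; every parallel process would execute the same pattern between the same two flush points):

.. code-block:: python

   import numpy as np
   import openpmd_api as io

   series = io.Series("particles.bp", io.Access.create)
   e = series.iterations[0].particles["e"]

   x = np.random.rand(5)
   y = np.random.rand(5)

   ds = io.Dataset(x.dtype, [io.Dataset.JOINED_DIMENSION])
   e["position"]["x"].reset_dataset(ds)
   e["position"]["y"].reset_dataset(ds)

   # same number of store operations, in the same order, on every process,
   # all between the same two flush points -> identical joining order,
   # so x[i] and y[i] remain correlated after joining
   e["position"]["x"][()] = x
   e["position"]["y"][()] = y
   series.flush()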