Commit 5cd4760

Documentation
1 parent ec404d2 commit 5cd4760

File tree

1 file changed: +42 -0 lines changed

1 file changed

+42
-0
lines changed

docs/source/usage/workflow.rst

Lines changed: 42 additions & 0 deletions
@@ -3,6 +3,48 @@

Workflow
========

Storing and reading chunks
--------------------------

1. **Chunks within an n-dimensional dataset**

   Most commonly, chunks within an n-dimensional dataset are identified by their offset and extent.
   The extent is the size of the chunk in each dimension, NOT the absolute coordinate within the entire dataset.

   In the Python API, this is modeled to conform to the conventional ``__setitem__``/``__getitem__`` protocol.
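
   For illustration, a minimal C++ sketch of storing one chunk by its offset and extent; the file name, dataset shape and the availability of the HDF5 backend are assumptions:

   .. code-block:: cpp

      #include <openPMD/openPMD.hpp>

      #include <vector>

      int main()
      {
          using namespace openPMD;

          Series series("data_%T.h5", Access::CREATE);
          auto E_x = series.iterations[0].meshes["E"]["x"];

          // declare the global dataset shape: 128 x 128 doubles
          E_x.resetDataset(Dataset(Datatype::DOUBLE, {128, 128}));

          // store the upper half: offset {0, 0}, extent {64, 128}
          // (the extent is a size per dimension, not an end coordinate)
          std::vector<double> local(64 * 128, 1.0);
          E_x.storeChunk(local, {0, 0}, {64, 128});

          // the local buffer must stay valid until this flush
          series.flush();
      }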

2. **Joined arrays (write only)**

   (Currently) supported only by ADIOS2 v2.9.0 and newer, under the conditions listed in the `ADIOS2 documentation on joined arrays <https://adios2.readthedocs.io/en/latest/components/components.html#shapes>`_.

   In some cases, the concrete position of a chunk within a dataset does not matter, and computing indices is needless computational and mental overhead.
   This commonly occurs for particle data, which the openPMD-standard models as a list of particles.
   The order of particles does not matter greatly, and making different parallel processes agree on indexing is error-prone boilerplate.

   In such a case, at most one *joined dimension* can be specified in the Dataset, e.g. ``{Dataset::JOINED_DIMENSION, 128, 128}`` (3D for the sake of explanation; particle data would normally be 1D).
   The chunk is then stored by specifying an empty offset vector ``{}``.
   The chunk extent vector must be equal to the global extent in all non-joined dimensions, i.e. joined arrays allow no further sub-chunking other than concatenation along the joined dimension.
   The entry of the extent vector at the joined dimension specifies the size that this piece will have along that dimension.
   The global extent of the dataset along the joined dimension is then the sum of all local chunk extents along that dimension.
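
   For illustration, a minimal C++ sketch of such a joined-array write; the file name, species name, values and the availability of an ADIOS2 v2.9+ backend are assumptions:

   .. code-block:: cpp

      #include <openPMD/openPMD.hpp>

      #include <vector>

      int main()
      {
          using namespace openPMD;

          Series series("joined_%T.bp", Access::CREATE);
          auto pos_x = series.iterations[0].particles["e"]["position"]["x"];

          // 1D dataset with a single, joined dimension: the global extent is
          // not given here, it will be the sum of all contributed chunk extents
          pos_x.resetDataset(Dataset(Datatype::DOUBLE, {Dataset::JOINED_DIMENSION}));

          // empty offset {}: the backend decides where this piece is placed
          std::vector<double> local{0., 1., 2., 3.};
          pos_x.storeChunk(local, {}, {local.size()});

          series.flush();
      }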

   Since openPMD follows a struct-of-array layout of data, it is important not to lose the correlation of data between components.
   E.g., when joining arrays, one must take care that ``particles/e/position/x`` and ``particles/e/position/y`` are joined in a uniform way.

   The openPMD-api makes the **following guarantee**:

   Consider a Series written from ``N`` parallel processes between two (collective) flush points. For each parallel process ``n`` and dataset ``D``, let:

   * ``chunk(D, n, i)`` be the ``i``'th chunk written to dataset ``D`` on process ``n``
   * ``num_chunks(D, n)`` be the count of chunks written by ``n`` to ``D``
   * ``joined_index(D, c)`` be the index of chunk ``c`` in the joining order of ``D``

   Then for any two datasets ``x`` and ``y``:

   * If ``num_chunks(x, n) = num_chunks(y, n)`` holds for every parallel process ``n`` (between the two flush points!)...
   * ...then for every parallel process ``n`` and every chunk index ``i < num_chunks(x, n)``: ``joined_index(x, chunk(x, n, i)) = joined_index(y, chunk(y, n, i))``.

   **TLDR:** Writing chunks to two joined arrays in a synchronized way (**1.** the same order of store operations and **2.** between the same flush operations) results in the same joining order in both arrays.
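
   For illustration, a hedged sketch of how this guarantee is typically relied upon; the helper function and its names are assumptions. Each process issues the store operations for ``x`` and ``y`` in the same order and between the same flush points, so corresponding pieces receive the same joined index in both arrays:

   .. code-block:: cpp

      #include <openPMD/openPMD.hpp>

      #include <vector>

      // Store one batch of particle positions on this process.
      // The buffers must stay valid until the next series.flush().
      void store_positions(openPMD::ParticleSpecies &e,
                           std::vector<double> &x,
                           std::vector<double> &y)
      {
          // one store operation per component, issued in the same order on
          // every process and between the same flush points, so the pieces of
          // position/x and position/y end up at the same joined index
          e["position"]["x"].storeChunk(x, {}, {x.size()});
          e["position"]["y"].storeChunk(y, {}, {y.size()});
      }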

Access modes
------------
