Workflow
========

Storing and reading chunks
--------------------------

1. **Chunks within an n-dimensional dataset**

   Most commonly, chunks within an n-dimensional dataset are identified by their offset and extent.
   The extent is the size of the chunk in each dimension, NOT the absolute coordinate within the entire dataset.

   In the Python API, this is modeled to conform to the conventional ``__setitem__``/``__getitem__`` protocol.
   A sketch of storing a chunk by offset and extent with the C++ API is given below, after this list.

2. **Joined arrays (write only)**

   Currently only supported with ADIOS2 v2.9.0 or newer, under the conditions listed in the `ADIOS2 documentation on joined arrays <https://adios2.readthedocs.io/en/latest/components/components.html#shapes>`_.

   In some cases, the concrete placement of a chunk within a dataset does not matter, and the computation of indices is a needless computational and mental overhead.
   This commonly occurs for particle data, which the openPMD-standard models as a list of particles.
   The order of particles does not matter greatly, and making different parallel processes agree on indexing is error-prone boilerplate.

   In such a case, at most one *joined dimension* can be specified in the Dataset, e.g. ``{Dataset::JOINED_DIMENSION, 128, 128}`` (3D for the sake of explanation; particle data would normally be 1D).
   The chunk is then stored by specifying an empty offset vector ``{}``.
   The chunk extent vector must be equivalent to the global extent in all non-joined dimensions (i.e. joined arrays allow no further sub-chunking other than concatenation along the joined dimension).
   The joined dimension of the extent vector specifies the extent that this piece should have along the joined dimension.
   The global extent of the dataset along the joined dimension will then be the sum of all local chunk extents along the joined dimension.
   A sketch of writing a joined array with the C++ API is given below, after this list.

   Since openPMD follows a struct-of-array layout of data, it is important not to lose the correlation of data between components.
   E.g., when joining arrays, care must be taken that ``particles/e/position/x`` and ``particles/e/position/y`` are joined in a uniform way.

   The openPMD-api makes the **following guarantee**:

   Consider a Series written from ``N`` parallel processes between two (collective) flush points. For each parallel process ``n`` and dataset ``D``, let:

   * ``chunk(D, n, i)`` be the ``i``'th chunk written to dataset ``D`` on process ``n``
   * ``num_chunks(D, n)`` be the count of chunks written by ``n`` to ``D``
   * ``joined_index(D, c)`` be the index of chunk ``c`` in the joining order of ``D``

   Then for any two datasets ``x`` and ``y``:

   * If for each parallel process ``n`` the condition ``num_chunks(x, n) = num_chunks(y, n)`` holds (between the two flush points!)...
   * ...then for any parallel process ``n`` and chunk index ``i`` less than ``num_chunks(x, n)``: ``joined_index(x, chunk(x, n, i)) = joined_index(y, chunk(y, n, i))``.

   **TLDR:** Writing chunks to two joined arrays synchronously (**1.** in the same order of store operations and **2.** between the same flush operations) results in the same joining order in both arrays.

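
The following is a minimal sketch of item **1.** above, storing a chunk identified by offset and extent through the C++ API. The file name, backend, mesh name and sizes are illustrative assumptions rather than anything prescribed by this section.

.. code-block:: cpp

   #include <openPMD/openPMD.hpp>

   #include <vector>

   using namespace openPMD;

   int main()
   {
       // Illustrative 2D dataset of global extent 128x128.
       Series series("chunks.h5", Access::CREATE);
       MeshRecordComponent E_x = series.iterations[0].meshes["E"]["x"];
       E_x.resetDataset(Dataset(Datatype::DOUBLE, {128, 128}));

       // This writer contributes the lower half of the dataset:
       // offset {64, 0}, extent {64, 128}. The extent is the size of
       // the chunk, not an absolute coordinate in the dataset.
       std::vector<double> local(64 * 128, 1.0);
       E_x.storeChunk(local, Offset{64, 0}, Extent{64, 128});

       // The buffer must remain valid until the next flush.
       series.flush();
   }

Using the Python API's slicing protocol mentioned above, the same store would read roughly ``E_x[64:128, 0:128] = local``, followed by a flush of the Series.
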
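
For item **2.**, the sketch below writes two joined particle components with the same number and order of store operations between the same flushes, so the guarantee above applies to them. The ADIOS2 file name, species name and local particle count are illustrative assumptions; the joined dimension and the empty offset vector are the mechanism described in this section.

.. code-block:: cpp

   #include <openPMD/openPMD.hpp>

   #include <vector>

   using namespace openPMD;

   int main()
   {
       // Joined arrays require the ADIOS2 backend, v2.9.0 or newer.
       Series series("particles.bp", Access::CREATE);
       auto e = series.iterations[0].particles["e"];

       // 1D particle data: the single dimension is the joined dimension.
       Dataset dset(Datatype::DOUBLE, {Dataset::JOINED_DIMENSION});
       e["position"]["x"].resetDataset(dset);
       e["position"]["y"].resetDataset(dset);

       // This process contributes 100 particles; other processes may
       // contribute different counts. The offset stays empty, the extent
       // along the joined dimension is the local number of particles.
       std::vector<double> x(100, 1.0), y(100, 2.0);
       e["position"]["x"].storeChunk(x, Offset{}, Extent{x.size()});
       e["position"]["y"].storeChunk(y, Offset{}, Extent{y.size()});

       // Both components receive one chunk each, in the same order and
       // between the same flushes, so they are joined in a uniform way.
       series.flush();
   }

The global extent along the joined dimension is then the sum of the per-process particle counts; no process needs to compute a global offset.
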

Access modes
------------