
Commit bc4997b

Finish implementation and add testing
1 parent 984d206 commit bc4997b


1 file changed

+223
-19
lines changed

docs/design_docs/cached_outputs.rst

Lines changed: 223 additions & 19 deletions
@@ -191,7 +191,7 @@ those on the LCRC server. For example:
191191
Design: unique identifier for cached outputs
192192
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
193193

194-
Date last modified: 2021/07/30
194+
Date last modified: 2021/08/03
195195

196196
Contributors: Xylar Asay-Davis
197197

@@ -209,10 +209,10 @@ database for that MPAS core on the LCRC server. For example:
209209
.. code-block:: json
210210
211211
{
212-
"ocean/global_ocean/QU240/mesh/mesh/culled_mesh.nc": "global_ocean/QU240/mesh/mesh/culled_mesh.210727.nc",
213-
"ocean/global_ocean/QU240/mesh/mesh/culled_graph.info": "global_ocean/QU240/mesh/mesh/culled_graph.210727.info",
214-
"ocean/global_ocean/QU240/mesh/mesh/critical_passages_mask_final.nc": "global_ocean/QU240/mesh/mesh/critical_passages_mask_final.210727.nc",
215-
"ocean/global_ocean/QU240/PHC/init/initial_state/initial_state.nc": "global_ocean/QU240/PHC/init/initial_state/initial_state.210727.nc",
212+
"ocean/global_ocean/QU240/mesh/mesh/culled_mesh.nc
213+
"ocean/global_ocean/QU240/mesh/mesh/culled_graph.info
214+
"ocean/global_ocean/QU240/mesh/mesh/critical_passages_mask_final.nc
215+
"ocean/global_ocean/QU240/PHC/init/initial_state/initial_state.nc
216216
"ocean/global_ocean/QU240/PHC/init/initial_state/init_mode_forcing_data.nc": "global_ocean/QU240/PHC/init/initial_state/init_mode_forcing_data.210727.nc"
217217
}
218218
@@ -240,62 +240,180 @@ to me how we achieve this flexibility without requiring that a given step
240240
either be set up as "normal" or "cached", and not both in the same work
241241
directory.
242242

243-
244-
245-
246243
Implementation
247244
--------------
248245

246+
The implementation is on
247+
`this branch <https://github.com/xylar/compass/tree/cached_init>`_.
248+
249249
.. _imp_cached:
250250

251251
Implementation: cached outputs
252252
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
253253

254-
Date last modified: 2021/07/30
254+
Date last modified: 2021/08/04
255255

256256
Contributors: Xylar Asay-Davis
257257

258+
Each step has a boolean attribute ``cached`` that defaults to ``False`` but
259+
which can be set to ``True`` by a process described in :ref:`imp_select`. If
260+
``cached == True``, when inputs and outputs are being processed, the usual
261+
inputs are ignored and instead the outputs are added as inputs. Targets in the
262+
``compass_cache`` database are selected using the dictionary stored in the
263+
MPAS core's ``cached_files.json``. Namelists and streams files are also not
264+
generated.
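As a rough illustration of the behavior described above (not the actual
compass code), the sketch below shows how the outputs of a cached step might be
turned into inputs that point at the ``compass_cache`` database. The function
name ``cached_inputs`` and the dictionary layout of each input are assumptions
made for this example; only the ``cached_files.json`` mapping and the database
name come from the text.

```python
def cached_inputs(step_path, outputs, cached_files):
    """Build the input list for a step whose ``cached`` attribute is True.

    step_path    -- hypothetical relative path of the step, including the
                    MPAS core, e.g. 'ocean/global_ocean/QU240/mesh/mesh'
    outputs      -- names of the step's output files
    cached_files -- dictionary loaded from the MPAS core's cached_files.json
    """
    inputs = []
    for filename in outputs:
        # Keys in the database are the full path of the output in the work dir
        key = f'{step_path}/{filename}'
        if key not in cached_files:
            raise ValueError(f'No cached file found for {key}')
        # The usual inputs are ignored; each output becomes an input whose
        # target lives in the compass_cache database
        inputs.append({'filename': filename,
                       'target': cached_files[key],
                       'database': 'compass_cache'})
    return inputs
```

Raising an error for a missing key is one possible design choice; it makes a
step fail at setup time rather than silently falling back to a normal run.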
258265

259266
.. _imp_select:
260267

261268
Implementation: selecting whether to use cached outputs
262269
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
263270

264-
Date last modified: 2021/07/30
271+
Date last modified: 2021/08/04
265272

266273
Contributors: Xylar Asay-Davis
267274

275+
The implementation includes the two mechanisms for selecting cached outputs
276+
described in :ref:`des_select`.
277+
278+
When setting up a test suite, a new list of lists called ``cached`` is created
279+
along with the list of test-case paths. By default, all test cases have an
280+
empty list of steps with cached outputs. Any line in a test suite file that is
281+
``cached`` (once white space is stripped away) will indicate that all steps in
282+
that test case should use cached outputs. This is accomplished by adding a
283+
special "step" named ``_all`` as the first step in the list for the given test
284+
case. If a line of the test suite file starts with ``cached:`` (after
285+
stripping away white space), the remainder of the line is a space-separated
286+
list of step names that should be set up with cached outputs. These steps
287+
are appended to the list of cached steps for the test case. If a test case has
288+
many steps with cached outputs, it may be convenient to have multiple lines
289+
starting with ``cached:``, as in this example.
268290

291+
.. code-block:: none
292+
293+
ocean/global_convergence/cosine_bell
294+
cached: QU60_mesh QU60_init QU90_mesh QU90_init QU120_mesh QU120_init
295+
cached: QU150_mesh QU150_init QU180_mesh QU180_init QU210_mesh QU210_init
296+
cached: QU240_mesh QU240_init
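The parsing rules described above can be sketched roughly as follows. This is
a minimal illustration under stated assumptions, not the code on the branch:
the function name ``parse_suite`` is hypothetical, and the handling of blank
lines and of a ``cached`` line that appears before any test case is a guess.

```python
def parse_suite(lines):
    """Return parallel lists of test-case paths and cached step names."""
    tests = []
    cached = []
    for line in lines:
        line = line.strip()
        if not line:
            # Assumption: blank lines are skipped
            continue
        if line == 'cached' or line.startswith('cached:'):
            if not tests:
                raise ValueError('"cached" must follow a test-case path')
            if line == 'cached':
                # The special "_all" step goes first in the list
                cached[-1].insert(0, '_all')
            else:
                # The rest of the line is a space-separated list of steps
                cached[-1].extend(line[len('cached:'):].split())
        else:
            tests.append(line)
            # By default, a test case has no steps with cached outputs
            cached.append([])
    return tests, cached
```

Because several ``cached:`` lines simply extend the same list, splitting a long
list of steps over multiple lines gives the same result as a single line.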
297+
298+
If a user is setting up individual test cases, they can indicate that all the
299+
steps in a test case should use cached outputs by adding the suffix ``c`` after the
300+
test number. While there is also a flag ``--cached`` that can be used to list
301+
steps of a single test case that should use cached outputs, this feature is likely
302+
to be too cumbersome to be broadly useful. Instead, developers should probably
303+
create a test suite for test cases where users are likely to want some steps
304+
with and others without cached outputs, as in the Cosine Bell example above.
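The ``c`` suffix on test numbers could be interpreted along the following
lines. This is only a sketch: the function name ``parse_test_numbers`` is
hypothetical, and reusing the special ``_all`` step for individually set up
test cases is an assumption carried over from the test-suite mechanism.

```python
def parse_test_numbers(args):
    """Split arguments such as ['40c', '41c', '42'] into numbers and caching.

    Returns a list of integer test numbers and a parallel list of lists of
    cached step names.
    """
    numbers = []
    cached = []
    for arg in args:
        if arg.endswith('c'):
            # Suffix "c": all steps of this test case use cached outputs
            numbers.append(int(arg[:-1]))
            cached.append(['_all'])
        else:
            numbers.append(int(arg))
            cached.append([])
    return numbers, cached
```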
269305

270306
.. _imp_update:
271307

272308
Implementation: updating cached outputs
273309
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
274310

275-
Date last modified: 2021/07/30
311+
Date last modified: 2021/08/04
276312

277313
Contributors: Xylar Asay-Davis
278314

315+
The new ``compass cache`` command has been added and is defined in the
316+
``compass.cache`` module. It takes a list of step paths as input and optional
317+
flags ``--dry_run`` (which doesn't copy the files to the directory on the LCRC
318+
server) and ``--date_string``, which lets a user supply a date stamp (YYMMDD)
319+
other than today's date.
320+
321+
As stated in the design, the command is only available on Chrysalis and Anvil
322+
and should be run in a work directory. To support caching files from multiple
323+
MPAS cores at the same time, ``compass cache`` produces an updated database
324+
file ``<mpas_core>_cached_files.json`` in the base of the work directory where
325+
the command is run. If this file already exists before ``compass cache`` is
326+
run, the information for the specified steps will be added if it is not yet
327+
in the database or will be updated, e.g. with new date stamps, if it does
328+
exist. If no ``<mpas_core>_cached_files.json`` exists, the file
329+
``cached_files.json`` from the python module ``compass.<mpas_core>`` is used as
330+
the starting point instead. If this file also doesn't exist, we start with an
331+
empty dictionary.
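The order in which the starting database is chosen might look something like
the sketch below. The function names and the ``package_dir`` argument are
hypothetical; in particular, the real command may locate the packaged
``cached_files.json`` through the python package machinery rather than through
a directory path as assumed here.

```python
import json
from pathlib import Path


def load_starting_database(work_dir, package_dir, mpas_core):
    """Pick the starting point for the cached-files database.

    work_dir    -- base of the work directory where the command is run
    package_dir -- assumed directory of the compass.<mpas_core> module
    mpas_core   -- name of the MPAS core, e.g. 'ocean'
    """
    candidates = [
        # 1. A database already produced by an earlier run in this work dir
        Path(work_dir) / f'{mpas_core}_cached_files.json',
        # 2. The database that ships with the MPAS core's python module
        Path(package_dir) / 'cached_files.json',
    ]
    for candidate in candidates:
        if candidate.is_file():
            with candidate.open() as handle:
                return json.load(handle)
    # 3. Neither file exists, so start from an empty dictionary
    return {}


def update_database(database, new_entries):
    """Add entries that are missing and overwrite those that already exist."""
    database.update(new_entries)
    return database
```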
332+
333+
As an example, when I made the following call yesterday (2021/08/03):
334+
335+
.. code-block:: bash
336+
337+
for mesh in QU60 QU90 QU120 QU150 QU180 QU210 QU240
338+
do
339+
for step in mesh init
340+
do
341+
compass cache -i ocean/global_convergence/cosine_bell/${mesh}/${step}
342+
done
343+
done
344+
345+
the result was a cache file ``ocean_cached_files.json`` like this:
346+
347+
.. code-block:: json
348+
349+
{
350+
"ocean/global_convergence/cosine_bell/QU60/mesh/mesh.nc
351+
"ocean/global_convergence/cosine_bell/QU60/mesh/graph.info
352+
"ocean/global_convergence/cosine_bell/QU60/init/namelist.ocean
353+
"ocean/global_convergence/cosine_bell/QU60/init/initial_state.nc
354+
"ocean/global_convergence/cosine_bell/QU90/mesh/mesh.nc
355+
"ocean/global_convergence/cosine_bell/QU90/mesh/graph.info
356+
"ocean/global_convergence/cosine_bell/QU90/init/namelist.ocean
357+
"ocean/global_convergence/cosine_bell/QU90/init/initial_state.nc
358+
...
359+
}
360+
361+
This file should be copied back to ``compass/ocean/cached_files.json`` in
362+
a branch of the compass repo, committed to the branch, and updated on
363+
``master`` with a pull request as normal.
364+
279365

280366
.. _imp_unique:
281367

282368
Implementation: unique identifier for cached outputs
283369
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
284370

285-
Date last modified: 2021/07/30
371+
Date last modified: 2021/08/04
286372

287373
Contributors: Xylar Asay-Davis
288374

375+
A date string is appended to the end of files in the ``compass_cache`` database
376+
on LCRC and stored in ``cached_files.json``. The date string defaults to the
377+
date the ``compass cache`` command is run but can be specified manually with
378+
the ``--date_string`` flag if desired.
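A minimal sketch of how a target name with a date string might be built is
given below. The function name ``cache_target`` is hypothetical. Dropping the
leading MPAS core directory and inserting the date before the extension follow
the pattern of the example database entries earlier in this document (e.g.
``culled_mesh.nc`` becoming ``culled_mesh.210727.nc``), but the details of the
real implementation may differ.

```python
import os
from datetime import datetime


def cache_target(output_path, date_string=None):
    """Return the path in the cache database for a step's output file.

    output_path -- path including the MPAS core, e.g.
                   'ocean/global_ocean/QU240/mesh/mesh/culled_mesh.nc'
    date_string -- optional date stamp (YYMMDD); defaults to today's date
    """
    if date_string is None:
        date_string = datetime.now().strftime('%y%m%d')
    # Assumption: the leading MPAS core directory is not part of the target
    _, relative_path = output_path.split('/', 1)
    # Insert the date stamp between the basename and the extension
    base, ext = os.path.splitext(relative_path)
    return f'{base}.{date_string}{ext}'
```

For instance, calling ``cache_target`` with
``'ocean/global_ocean/QU240/mesh/mesh/culled_graph.info'`` and the date string
``'210727'`` gives ``'global_ocean/QU240/mesh/mesh/culled_graph.210727.info'``.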
289379

290380
.. _imp_normal_or_cached:
291381

292382
Implementation: either "normal" or "cached" versions of a step
293383
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
294384

295-
Date last modified: 2021/07/30
385+
Date last modified: 2021/08/04
296386

297387
Contributors: Xylar Asay-Davis
298388

389+
The implementation leans heavily on the assumption that a given step will
390+
either be run with cached outputs or as normal, so that both versions are not
391+
available in the same work directory or as part of the same test suite.
392+
393+
Nevertheless, if a separate "cached" version of a step were desired, it would
394+
be necessary to make symlinks from the cached files in the location of the
395+
"uncached" version of the step to the location of the "cached" version. For
396+
example, if the "uncached" step is
397+
398+
.. code-block:: none
399+
400+
ocean/global_ocean/QU240/mesh/mesh
401+
402+
and the "cached" version of the step is
403+
404+
.. code-block:: none
405+
406+
ocean/global_ocean/QU240/cached/mesh/mesh
407+
408+
symlinks could be created on the LCRC server, e.g.
409+
410+
.. code-block:: none
411+
412+
/lcrc/group/e3sm/public_html/mpas_standalonedata/mpas-ocean/compass_cache/global_ocean/QU240/cached/mesh/mesh/culled_mesh.210803.nc
413+
-> /lcrc/group/e3sm/public_html/mpas_standalonedata/mpas-ocean/compass_cache/global_ocean/QU240/mesh/mesh/culled_mesh.210803.nc
414+
415+
and the ``cached`` attribute could be set to ``True`` in the constructor of the
416+
cached version of the step.
299417

300418
Testing
301419
-------
@@ -305,49 +423,135 @@ Testing
305423
Testing: cached outputs
306424
^^^^^^^^^^^^^^^^^^^^^^^
307425

308-
Date last modified: 2021/07/30
426+
Date last modified: 2021/08/04
309427

310428
Contributors: Xylar Asay-Davis
311429

430+
I have constructed cached versions of the following steps on the LCRC server,
431+
using test-case runs on Chrysalis.
432+
433+
.. code-block:: none
434+
435+
ocean/global_ocean/QU240/mesh/mesh/
436+
ocean/global_ocean/QU240/PHC/init/initial_state/
437+
ocean/global_ocean/QUwISC240/mesh/mesh/
438+
ocean/global_ocean/QUwISC240/PHC/init/initial_state/
439+
ocean/global_ocean/QUwISC240/PHC/init/ssh_adjustment/
440+
ocean/global_ocean/EC30to60/mesh/mesh/
441+
ocean/global_ocean/EC30to60/PHC/init/initial_state/
442+
ocean/global_ocean/WC14/mesh/mesh/
443+
ocean/global_ocean/WC14/PHC/init/initial_state/
444+
ocean/global_ocean/ECwISC30to60/mesh/mesh/
445+
ocean/global_ocean/ECwISC30to60/PHC/init/initial_state/
446+
ocean/global_ocean/ECwISC30to60/PHC/init/ssh_adjustment/
447+
ocean/global_ocean/SOwISC12to60/mesh/mesh/
448+
ocean/global_ocean/SOwISC12to60/PHC/init/initial_state/
449+
ocean/global_ocean/SOwISC12to60/PHC/init/ssh_adjustment/
450+
ocean/global_convergence/cosine_bell/QU60/mesh/
451+
ocean/global_convergence/cosine_bell/QU60/init/
452+
ocean/global_convergence/cosine_bell/QU90/mesh/
453+
ocean/global_convergence/cosine_bell/QU90/init/
454+
ocean/global_convergence/cosine_bell/QU120/mesh/
455+
ocean/global_convergence/cosine_bell/QU120/init/
456+
ocean/global_convergence/cosine_bell/QU180/mesh/
457+
ocean/global_convergence/cosine_bell/QU180/init/
458+
ocean/global_convergence/cosine_bell/QU210/mesh/
459+
ocean/global_convergence/cosine_bell/QU210/init/
460+
ocean/global_convergence/cosine_bell/QU240/mesh/
461+
ocean/global_convergence/cosine_bell/QU240/init/
462+
ocean/global_convergence/cosine_bell/QU150/mesh/
463+
ocean/global_convergence/cosine_bell/QU150/init/
464+
465+
I have set up and run versions of all these steps with cached outputs, together
466+
with forward runs (``performance_test`` in the global ocean test group, and
467+
``forward`` steps in the ``cosine_bell`` test case) that make use of the
468+
cached outputs as inputs. All tests ran successfully and were bit-for-bit with
469+
a baseline that was used to produce the cached outputs.
312470

313471
.. _test_select:
314472

315473
Testing: selecting whether to use cached outputs
316474
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
317475

318-
Date last modified: 2021/07/30
476+
Date last modified: 2021/08/04
319477

320478
Contributors: Xylar Asay-Davis
321479

480+
I added QUwISC240 test cases to the ocean ``nightly`` test suite, using cached
481+
outputs for the ``mesh`` and ``init`` test cases:
482+
483+
.. code-block:: none
484+
485+
ocean/global_ocean/QUwISC240/mesh
486+
cached
487+
ocean/global_ocean/QUwISC240/PHC/init
488+
cached
489+
ocean/global_ocean/QUwISC240/PHC/performance_test
490+
491+
I created a new test suite, ``cosine_bell_cached_init``, for the
492+
``cosine_bell`` test case that uses cached outputs for the ``mesh`` and
493+
``init`` steps at each default mesh resolution:
494+
495+
.. code-block:: none
496+
497+
ocean/global_convergence/cosine_bell
498+
cached: QU60_mesh QU60_init QU90_mesh QU90_init QU120_mesh QU120_init
499+
cached: QU150_mesh QU150_init QU180_mesh QU180_init QU210_mesh QU210_init
500+
cached: QU240_mesh QU240_init
501+
502+
I set up the remaining steps with cached outputs mentioned in
503+
:ref:`test_cached` as follows:
504+
505+
.. code-block:: bash
322506
507+
compass list
508+
509+
compass setup -n 40c 41c 42 60c 61c 62 80c 81c 82 85c 86c 87 90c 91c 92 \
510+
95c 96c 97 ...
511+
512+
Results were bit-for-bit with the same test cases run without cached outputs.
323513

324514
.. _test_update:
325515

326516
Testing: updating cached outputs
327517
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
328518

329-
Date last modified: 2021/07/30
519+
Date last modified: 2021/08/04
330520

331521
Contributors: Xylar Asay-Davis
332522

523+
All cached files used in the testing above were created with ``compass cache``
524+
on Chrysalis. Multiple runs of this command created, then updated the local
525+
``ocean_cached_files.json``, as expected. The files ended up in the expected
526+
directories on the LCRC server with the expected date strings appended to the
527+
file basename (before the extension).
528+
529+
The ``--dry_run`` feature also worked as expected, updating the
530+
``ocean_cached_files.json`` without copying files. The ``--date_string``
531+
flag could be used to specify an alternative suffix, as expected.
333532

334533
.. _test_unique:
335534

336535
Testing: unique identifier for cached outputs
337536
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
338537

339-
Date last modified: 2021/07/30
538+
Date last modified: 2021/08/04
340539

341540
Contributors: Xylar Asay-Davis
342541

542+
All files in the ``compass_cache`` database have date strings appended to them
543+
to make them unique. No testing has been performed yet to ensure that new
544+
cached files with new dates can be added, but I don't foresee any problems.
343545

344546
.. _test_normal_or_cached:
345547

346548
Testing: either "normal" or "cached" versions of a step
347549
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
348550

349-
Date last modified: 2021/07/30
551+
Date last modified: 2021/08/04
350552

351553
Contributors: Xylar Asay-Davis
352554

353-
555+
The implementation that I tested is based on this requirement. However, in the
556+
future, the requirement could be relaxed if need be using the approach I
557+
outlined in :ref:`imp_normal_or_cached`.
