Direct netCDF I/O for soca state/geometry (soca_io_mod)#1242
Conversation
Introduces src/soca/IO/soca_io_mod.F90 with soca_io_file_reader / soca_io_file_writer types (init/enqueue/commit/close), and routes all state, geometry, and balance-coefficient I/O through it. PE 0 calls nf90_get_var / nf90_put_var; mpp_broadcast and mpp_gather distribute / collect data on the geometry's pelist. Removes all fms_io_mod usages (register_restart_field, restore_state, save_restart, read_data, file_exist, field_exist, fms_io_init, fms_io_exit) from soca, including the global_soca_geom_counter shim. fms_init / fms_end are retained (subsystem init only, no I/O). Resolves the GDAS Tsnz_h shape mismatch by building nf90_get_var count from the file's actual dim sizes, not the caller's buffer rank. Per-domain state writes (ocn/sfc/ice/wav/bio) and reads now use the new module; the FMS code paths are removed in the same change rather than gated. Refs #1125. Obsoletes PR #1241.
Under FMS, the writer would implicitly add the .nc suffix, so test YAMLs and downstream readers reference 'ocn.<exp>.<typ>.<date>.nc'. The direct-netCDF writer in soca_io_mod is faithful to whatever string it's given, so files were landing without the extension and no consumer could find them.
Six soca_add_test() calls listed test_socs_parameters_diffusion as a dependency (should be test_soca_). Because the named test does not exist, ctest treated the dependency as void and ran the variational tests concurrently with parameters_diffusion under -j>1, causing races on the diffusion calibration outputs.
Drops the SOCA_IO_SERIAL / SOCA_IO_PARALLEL mode bifurcation (and the method= arg + io.method config knob) and rewrites the reader to do per-PE strided nf90_get_var, mirroring FMS mpp_io's domain-decomposed read pattern. Each PE opens the file once via NF90_NOWRITE and reads its own compute-domain tile via nf90_get_var(start, count) -- the same shape as fms_io's READ_RECORD_ for fileset.NE.MPP_SINGLE. Adds a module-level read cache mirroring fms_io's get_file_unit + files_read(i)%var(j) tables: - one nf90_open per (PE, filename) for the whole run - cached (varid, file_ndims, middle_dim sizes) per (file, var) - soca_io_close_all() flushes the cache; wired into soca_geom_end Caller updates: drop method= from reader/writer init() and the soca_io_method_from_config helper from soca_fields_mod / soca_geom_mod / soca_balance_mod. 1-deg 20-mem LETKF read phase (LocalEnsembleDA before solver ctor): baseline (FMS): 3.91 s before this commit (no cache): 13.92 s with this commit: 3.97 s Outputs are bit-identical.
…cache) Adds a second reader_commit implementation -- PE 0 nf90_get_var the global field + mpp_broadcast -- alongside the existing per-PE strided path. Both paths share the file-handle + var-metadata cache. Select via SOCA_IO_READ_MODE=broadcast|strided (default broadcast); the env var is latched on first reader_commit. Why broadcast as default: in DA cycling the page cache is always cold, since each cycle reads files that the previous cycle / model run just wrote. On cold cache the broadcast path (1 sequential read on PE 0, kernel readahead happy) beats strided (8 concurrent strided readers thrash the prefetcher and pay 8x the open-syscall cost). 1-deg 20-mem LETKF, rancor, taskset -c 0-7, cold cache (sample 1): broadcast: 13.5 s read strided: 30.5 s read Warm cache reverses the result (strided 3.9 s vs broadcast 13.3 s) but cycling never sees warm cache. On a parallel filesystem (Lustre/GPFS) or multi-node setup the strided path may win again -- toggle the env var to test.
Drops SOCA_IO_READ_MODE in favor of a yaml block on the geometry
config:
geometry:
io:
read mode: broadcast | strided
If absent the module-level default (broadcast) is kept.
soca_geom_init calls soca_io_read_mode_from_config(f_conf) once at
startup; the choice latches for every subsequent reader_commit.
Unrecognized values abort so typos surface immediately.
Annotates testinput/letkf.yml's geometry block with the two read-mode options so a contributor reading the canonical letkf example sees both choices without having to hunt for the soca_io_mod docstring.
The write path emits FMS-style auto-numbered dim names (xaxis_N / yaxis_N / zaxis_N / Time); the file-header docstring incorrectly claimed xh / yh / Time. Correct the comment -- no code change.
Writer: - enqueue holds pointers to caller buffers instead of allocating and copying (mirrors FMS register_restart_field contract). Compute-slice extraction moves from enqueue to commit, so peak per-writer memory drops from sum(var_bytes) to one (nx_c x ny_c) tile. - commit uses the 3D mpp_gather overload for 3D vars: one collective per var instead of nlevels, and no per-level gbuf3d(:,:,k) = gbuf2d memcpy on root. - Reuse gbuf3d / tile3 across 3D vars when nlevels matches. Reader: - commit_reader_strided hoists tile2 out of the per-var loop and reuses tile3 / tile4 when trailing dims match. - Dead tile3/tile4 zero-inits removed (read_var_strided fully fills the buffer). read_var_strided: fixed-size stack arrays for st / ct instead of per-call heap allocation. put_axis_coord_data: reuse idxbuf across axes. Cleanup: - Remove dead 'use mpi' and 'use fckit_configuration_module'. - Remove dead cartesian_axis parameter on writer_enqueue_1d (it was stored but never written to the netcdf output). - Drop unused nprocs out-param of mpi_pelist. - soca_io_close_all: use ncc on nf90_close instead of silently discarding the status. - commit_reader_scatter: matching dead-init cleanup; comment updated to flag it as pending parallel-ensemble I/O exercise. - Trim verbose / redundant comments.
|
Note, @shlyaeva, this uses the |
There was a problem hiding this comment.
Pull request overview
This PR replaces SOCA’s previous FMS restart I/O pathway (register_restart_field / save_restart / restore_state) with direct NetCDF Fortran (nf90_*) I/O implemented in a new soca_io_mod module, and updates geometry/fields/balance call sites to use a register-then-commit reader/writer API.
Changes:
- Added
src/soca/IO/soca_io_mod.F90providingsoca_io_reader/soca_io_writerplus basic file/variable existence helpers and a cached read-handle table. - Updated
soca_geom_mod,soca_fields_mod, andsoca_balance_modto use the new reader/writer API in place of FMS I/O. - Extended LETKF YAML test input with an
io.read modeknob and adjusted test dependencies naming intest/CMakeLists.txt.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| test/testinput/letkf.yml | Adds YAML configuration for geometry I/O read strategy. |
| test/CMakeLists.txt | Fixes/aligns diffusion parameter test dependency name. |
| src/soca/LinearVariableChange/Balance/soca_balance_mod.F90 | Switches sea-ice Jacobian read from read_data to soca_io_reader. |
| src/soca/IO/soca_io_mod.F90 | Introduces direct NetCDF I/O module with writer/reader and cached read handles. |
| src/soca/IO/CMakeLists.txt | Adds the new IO module to SOCA build sources. |
| src/soca/Geometry/soca_geom_mod.F90 | Replaces gridspec read/write via FMS restart API with soca_io_reader/writer; closes cached handles on geom end. |
| src/soca/Fields/soca_fields_mod.F90 | Replaces restart I/O and existence checks with soca_io_* equivalents; updates filename generation to include .nc. |
| src/soca/CMakeLists.txt | Adds IO subdirectory to the build. |
Comments suppressed due to low confidence (3)
src/soca/IO/soca_io_mod.F90:196
- The writer stores pointers to
srcdummy arguments (self%vars(... )%data* => src) and relies on them remaining valid untilcommit(). Per the Fortran standard this is only safe if the actual arguments have theTARGET(orPOINTER) attribute; otherwise the stored pointers can become undefined afterenqueue_*returns. Either (a) change the API to copy data into internal allocatables, or (b) require/enforceTARGETon all actual arrays passed toenqueue(including allocatable components likegeom%lon,varwrapper%data, etc.), and update call sites accordingly.
subroutine writer_enqueue_1d(self, name, src, long_name, units)
class(soca_io_writer), intent(inout) :: self
character(len=*), intent(in) :: name
real(kind=kind_real), target, intent(in) :: src(:)
character(len=*), optional, intent(in) :: long_name, units
call grow_if_needed(self)
self%nvars = self%nvars + 1
self%vars(self%nvars)%name = name
self%vars(self%nvars)%ndims = 1
self%vars(self%nvars)%nlevels = 1
self%vars(self%nvars)%data1d => src
self%vars(self%nvars)%long_name = name
self%vars(self%nvars)%units = 'none'
if (present(long_name)) self%vars(self%nvars)%long_name = long_name
if (present(units)) self%vars(self%nvars)%units = units
end subroutine writer_enqueue_1d
src/soca/IO/soca_io_mod.F90:401
- Same issue for 3D gathers:
gbuf3dis only allocated on root, but it is passed tompp_gatheron all PEs. Passing an unallocated allocatable as an actual argument is not safe even if non-root never reads it. Allocate a dummygbuf3d(1,1,1)(or otherwise ensure it is allocated) on non-root PEs before callingmpp_gather.
nlev = self%vars(v)%nlevels
if (is_root) then
if (allocated(gbuf3d)) then
if (size(gbuf3d, 3) /= nlev) deallocate(gbuf3d)
end if
if (.not. allocated(gbuf3d)) allocate(gbuf3d(self%nx_g, self%ny_g, nlev))
end if
if (allocated(tile3)) then
if (size(tile3, 3) /= nlev) deallocate(tile3)
end if
if (.not. allocated(tile3)) allocate(tile3(nx_c, ny_c, nlev))
tile3 = self%vars(v)%data3d(i_off : i_off + nx_c - 1, &
j_off : j_off + ny_c - 1, :)
call mpp_gather(self%isc, self%iec, self%jsc, self%jec, nlev, pelist, &
tile3, gbuf3d, is_root)
if (is_root) then
src/soca/Geometry/soca_geom_mod.F90:257
- The new
greader%enqueue(...)calls pass allocatable components likeself%lon,self%mask2d, etc. that are not declared withTARGET. Sincesoca_io_readerstores pointers to the enqueued buffers untilcommit(), these actual arguments should beTARGET(or the reader should be changed to copy into internal storage) to avoid non-standard/undefined pointer lifetimes.
call greader%init(self%Domain%mpp_domain, str)
call greader%enqueue("lonh", self%lonh)
call greader%enqueue("lath", self%lath)
call greader%enqueue("lonq", self%lonq)
call greader%enqueue("latq", self%latq)
call greader%enqueue("lon", self%lon)
call greader%enqueue("lat", self%lat)
call greader%enqueue("lonu", self%lonu)
call greader%enqueue("latu", self%latu)
call greader%enqueue("lonv", self%lonv)
call greader%enqueue("latv", self%latv)
call greader%enqueue("sin_rot", self%sin_rot)
call greader%enqueue("cos_rot", self%cos_rot)
call greader%enqueue("dx", self%dx)
call greader%enqueue("dy", self%dy)
call greader%enqueue("area", self%cell_area)
call greader%enqueue("mask2d", self%mask2d)
call greader%enqueue("mask2du", self%mask2du)
call greader%enqueue("mask2dv", self%mask2dv)
call greader%commit()
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Drop dangling references to the removed soca_io_read_mode_from_config / 'io.read mode' YAML key in soca_geom_mod and the letkf test input. The selector was already gone from soca_io_mod; without these the PR fails to compile.
- soca_io_mod: allocate gbuf2d / gbuf3d as 1x1 dummies on non-root so the actuals passed to mpp_gather are always allocated (assumed-shape dummy requires it). Pattern matches commit_reader_scatter. - soca_io_mod: refresh stale 'Data is copied in' comment to reflect the pointer-based enqueue semantics (caller buffer kept alive/unmutated through commit; actual must satisfy TARGET association rules). - soca_geom_mod: TARGET on the 'self' dummy in soca_geom_init, soca_geom_init_fieldset, and soca_geom_write so the allocatable components (self%lonh, self%lat, ..., self%mask2d*) are valid pointer targets for the soca_io reader/writer enqueue. TARGET on the local fieldData / fieldDataVars too. - soca_fields_mod: TARGET on h_common and on the local vars(:) wrapper array so vars(n)%data sections used in enqueue are valid pointer targets. - soca_balance_mod: TARGET on local kct for the same reason.
|
Thanks for the review. All six Copilot findings addressed:
All 7 writer/reader-heavy soca ctests pass locally; cluster-faithful timing unchanged from the prior commit on this branch. |
|
@travissluka thank you! I'll test this today in the workflow and report back. |
shlyaeva
left a comment
There was a problem hiding this comment.
Thank you @travissluka! This approval is based solely on testing with 0.25 degree soca in GFSv17 "hybrid" configuration (the one that also runs LETKF) for 1-2 cycles. The timings of I/O are slightly better than on develop 🎉 . I have not reviewed any of this code, and don't plan to review it.
(side note unrelated to this PR: the overall timing of 3DVar is significantly slower than with the code from a few weeks ago, I suspect something may have changed in how we calibrate vertical diffusion. we'll investigate that separately).
I think the slight improvement is because FMS was copying fields before writing them, whereas now we just pass pointers around. Memory usage should also be slightly less now. |
There was a problem hiding this comment.
Thank you for pointing me to looking at the memory, I just rebuilt with soca develop with using the same oops/etc branches for apples to apples comparison (the previous one again wasn't!), and actually there are two issues (one crucial).
- [major, needs fixing before the merge]. The issue with vertical diffusion that I mentioned above is due to this branch.
On develop, consistent with other builds:
0: group 1 of 1
0: Generating vertical scales
0: Reading from file
0: mask: NONE
0: scale type: Gaussian
0: vtScales: min: 5 mean: 7.17912 max: 15
0: Diffusion: vertical iterations: 450
0: Calculating vertical normalization...
0: Info : Writing file: ./staticb/vt_ocean.nc
on this branch:
0: ==================================================================================
0: saber::Diffusion calibration
0: ----------------------------------------------------------------------------------
0:
0: group 1 of 1
0: Generating vertical scales
0: Reading from file
0: mask: NONE
0: scale type: Gaussian
0: vtScales: min: -2.30585 mean: 7.85725 max: 32.6274
0: Diffusion: vertical iterations: 2128
0: Calculating vertical normalization...
0: Info : Writing file: ./staticb/vt_ocean.nc
0: ==================================================================================
I don't know yet why this happens, I'll dig into what exactly is happening in our workflow here, compare some files and analyze.
- Memory is actually worse here than on develop, especially for LETKF.
Var:
develop:
0: OOPS_STATS IncrementalAssimilation start - Runtime: 4.81 sec, Local Memory: 470.56 MB
0: OOPS_STATS Variational end - Runtime: 115.71 sec, Local Memory: 4.19 GB
0: OOPS_STATS Run end - Runtime: 117.60 sec, Memory: total: 418.78 GB, per task: min = 825.51 MB, max = 4189.71 MB
this branch:
0: OOPS_STATS IncrementalAssimilation start - Runtime: 5.85 sec, Local Memory: 517.79 MB
0: OOPS_STATS Variational end - Runtime: 156.98 sec, Local Memory: 3.81 GB
0: OOPS_STATS Run end - Runtime: 159.86 sec, Memory: total: 526.89 GB, per task: min = 1.05 GB, max = 3.81 GB
You'll notice the difference in runtime -- this is due to vertical diffusion behaving differently.
LETKF:
develop:
0: OOPS_STATS LocalEnsembleDA before solver ctor - Runtime: 30.10 sec, Local Memory: 1.78 GB
0: OOPS_STATS LocalEnsembleDA before solver - Runtime: 97.67 sec, Memory: total: 742.32 GB, per task: min = 2.83 GB, max = 5.08 GB
0: OOPS_STATS Run end - Runtime: 631.32 sec, Memory: total: 898.97 GB, per task: min = 2.98 GB, max = 8.45 GB
this branch:
0: OOPS_STATS LocalEnsembleDA before solver ctor - Runtime: 11.51 sec, Local Memory: 4.31 GB
0: OOPS_STATS LocalEnsembleDA before solver - Runtime: 77.66 sec, Memory: total: 1420.82 GB, per task: min = 5.66 GB, max = 7.71 GB
0: OOPS_STATS Run end - Runtime: 546.03 sec, Memory: total: 1597.63 GB, per task: min = 6.00 GB, max = 9.41 GB
@travissluka can you look into whether we can bring the memory down? I'll try to figure out why the vertical diffusion estimation is misbehaving.
|
@shlyaeva 1) Once I get number 1 fixed i'll push changes for a retest Edit: I'm not actually able to reproduce the vertical correlation bug. Given the value range, that does oddly seem like it's reading in temperature values, not correlation values 🤔 |
…e checks Three related fixes in the direct-netCDF reader path: 1. read_var_strided previously hardcoded "last Fortran dim is trailing time" and set ct(file_ndims)=1 unconditionally. For a file with spatial-only layout (Temp(z,y,x) with no leading Time), this read only level 1 of z into a multi-level destination, leaving the rest uninitialized -- showing up as garbage scale values in the vertical diffusion calibration. Now discriminate file_ndims == dst_rank (no time, fill every dim from the file) vs file_ndims > dst_rank (trailing time + middle squeeze, e.g. CICE Tsnz_h's nksnow=1). A total-element-count check catches silent partial-fills and dim-size mismatches (e.g. file z=75 vs destination z=25). 2. Drop the read_cache / cached_open / soca_io_close_all machinery. Each reader_commit nf90_opens, reads all enqueued vars (with inline inq_varid + inquire_variable + inquire_dimension; microseconds), and nf90_closes. Holding NetCDF4 handles open across commits was bloating LETKF per-task memory by ~MB-to-GB scale (HDF5 metadata + chunk caches per open file). Restores per-task max to match develop. 3. Add check_buf_1d / check_buf_2d helpers called from every reader/writer enqueue. Catches unallocated and wrong-sized caller buffers at the enqueue site instead of silently storing the pointer for a later get_var/put_var to mishandle.
|
Both items addressed in 2d7ab3c. Vertical diffusion: your Memory: dropped the read-side file/var cache that was keeping NetCDF4 handles (and their HDF5 chunk/metadata caches) open across commits. Per-task max now matches develop on the local emulator. 72/72 ctests pass. |
There was a problem hiding this comment.
I've retested cleanly (two separate experiments, the only difference is soca branch), and the latest fix fixed both the issue with reading vertical diffusion parameters, and with memory. Thank you @travissluka!
Var:
this branch:
Runtime: 132.91 sec, Memory: total: 405.00 GB, per task: min = 800.67 MB, max = 3408.76 MB
develop (ran after this branch):
Runtime: 111.65 sec, Memory: total: 418.95 GB, per task: min = 824.59 MB, max = 4297.21 MB
LETKF:
this branch:
Runtime: 541.38 sec, Memory: total: 880.34 GB, per task: min = 2.91 GB, max = 6.54 GB
develop:
Runtime: 642.86 sec, Memory: total: 899.06 GB, per task: min = 2.97 GB, max = 8.61 GB
Runtimes for the other tasks using soca I/O (bmat tasks, parallel recentering) have reasonably similar runtime and memory use.
DavidNew-NOAA
left a comment
There was a problem hiding this comment.
No objections on my end
|
@travissluka |
It turns out those |
FMS 2025.02 removed the ishift/jshift optional args from mpp_scatter, breaking compilation on spack-stack 2.1. Pre-apply the shift in the indices we pass; behavior unchanged on FMS 2024.02 (spack-stack 1.9.2).
|
Pushed in a547504. No answer changes expected. Note: this code path is dormant on this PR — wired up in the follow-on parallel-ensemble I/O PR — so it unblocks spack-stack 2.1 compile but isn't exercised functionally yet. |
There was a problem hiding this comment.
I ran two identical 0.25-deg multi-node 3DFGAT tests with a January 15th SOCA version and this branch. Residual norms are identical. I do see some improvement which is great, especially with the runtime in the new version:
These are with Spack-stack 1.9
This branch:
OOPS_STATS Run end - Runtime: 255.88 sec, Memory: total: 1140.07 GB, per task: min = 1.51 GB, max = 4.14 GB
Run: Finishing oops::Variational<SOCA, UFO and IODA observations> with status = 0
OOPS Ending 2026-05-20 14:41:34 (UTC+0000)
Jan15th:
OOPS_STATS Run end - Runtime: 358.17 sec, Memory: total: 1292.89 Gb, per task: min = 1.54 Gb, max = 4.51 Gb
Run: Finishing oops::Variational<SOCA, UFO and IODA observations> with status = 0
OOPS Ending 2026-05-20 14:39:44 (UTC+0000)
|
@travissluka I was getting the following error compiling your branch with spack stack 2.1
|
@DavidNew-NOAA Well, that's obnoxious. Does |
|
@travissluka I concur. |
|
Sounds good to me, I'm merging since both I and Doruk approved. If there are remaining issues, let's do follow-up PRs! |
|
Have other centers switched to spack 2 or 2.1? FWIW, I naively googled Welp, nevermind 😄 |
Rebased onto develop after the direct-netCDF soca_io_mod PR (#1242) landed. End-to-end ensemble-parallel I/O on top of the stateless direct-netCDF reader. Opt-in via the new oops StateSet/IncrementSet bulk-readEnsemble/writeEnsemble dispatch path; soca provides the model-level implementations. State/Increment: - State::{readEnsemble,writeEnsemble} and Increment::{readEnsemble, writeEnsemble} (Fields + interface Fortran bindings to match) - writeEnsemble: gather each member to a strided writer PE (soca_io_ensemble_root_pe assignment), then phase-2 per-member netCDF writes happen concurrently across writer PEs - readEnsemble: per-member loop today, honoring single-state read mode (strided vs scatter); parallel-across-members reads via soca_io_readers_commit_ensemble are wired for a follow-up soca_io_mod: - writer staging split into define/gather/write/close phases plus soca_io_writers_commit_ensemble; reader staging mirrored with read/distribute/close and soca_io_readers_commit_ensemble. Reader opens+closes ncid inside stage_read (no global file-handle cache, matching develop's stateless model post-2d7ab3cf) - new public knobs: soca_io_ensemble_write_parallel, soca_io_ensemble_read_scatter, soca_io_single_state_read_scatter, soca_io_async_mpi, soca_io_ensemble_root_pe - mpi_pelist_and_comm helper for the writer/reader PE subsets Geometry: - soca_io_config_from_yaml called from soca_geom_init resolves the parallel/sequential, scatter/strided, async-mpi knobs from geometry.io once; values persist module-level for the run Companion to JCSDA-internal/oops setID fix in StateSet::buildFromConfigs (needed so PseudoModel and HTLM dispatch see correct per-member IDs when StateSet takes the HasReadEnsemble path).
Description
Resolves #1225
Rips out FMS I/O! Replaces SOCA's
register_restart_field/save_restart/restore_statepaths with directnf90_*I/O in a new modulesrc/soca/IO/soca_io_mod.F90. Same on-disk file format. The register-then-commit API mirrors FMS semantics so call sites insoca_geom_mod,soca_fields_mod, andsoca_balance_modare minimally adapted.Why:
Issue(s) addressed
n/a
Dependencies
n/a
Impact
soca-only. Same files on disk; same public Fortran interface in
soca_geom,soca_fields,soca_balance. FMS dependency forregister_restart_field/save_restart/restore_stateis removed;mpp_gather/mpp_scatter/mpp_broadcastare still used.Manual Testing Instructions (optional)
n/a
Checklist