Conversation

@danieljvickers
Member

@danieljvickers danieljvickers commented Dec 10, 2025

User description

Description

I became aware that post_process does not currently compile with the --no-mpi flag. Even though we don't use post_process without parallel IO, I still wanted to fix the compilation so that it doesn't throw errors or confuse anyone.

I also noticed that I had missed about six calls to the old macros, so I removed them and cleaned up the code a little.

Type of change

Please delete options that are not relevant.

  • Code Cleanup

Scope

  • This PR comprises a set of related changes with a common goal

If you cannot check the above box, please split your PR into multiple PRs that each have a common goal.

How Has This Been Tested?

I ran the test suite locally. These changes should not be breaking.

Test Configuration:

  • What computers and compilers did you use to test this:

Checklist

  • I have added comments for the new code
  • I added Doxygen docstrings to the new code
  • I have made corresponding changes to the documentation (docs/)
  • I have added regression tests to the test suite so that people can verify in the future that the feature is behaving as expected
  • I have added example cases in examples/ that demonstrate my new feature performing as expected.
    They run to completion and demonstrate "interesting physics"
  • I ran ./mfc.sh format before committing my code
  • New and existing tests pass locally with my changes, including with GPU capability enabled (both NVIDIA hardware with NVHPC compilers and AMD hardware with CRAY compilers) and disabled
  • This PR does not introduce any repeated code (it follows the DRY principle)
  • I cannot think of a way to condense this code and reduce any introduced additional line count

If your code changes any source files (anything in src/simulation)

To make sure the code is performing as expected on GPU devices, I have:


PR Type

Bug fix, Enhancement


Description

  • Removed deprecated _OLD macro variants from GPU parallelization macros

  • Fixed post-processing compilation with --no-mpi flag by adding conditional MPI guards

  • Replaced all GPU_PARALLEL_LOOP_OLD calls with GPU_PARALLEL_LOOP in MPI communication code (see the sketch after this list)

  • Improved code consistency and removed redundant macro definitions
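
For illustration, here is a schematic before/after of the macro migration (a sketch only, not a verbatim diff from the PR; clause lists and loop bodies vary per call site, and buff/q are placeholder names):

    ! Old call-style form (removed in this PR):
    #:call GPU_PARALLEL_LOOP_OLD(collapse=2, private='[r]')
        do k = 0, n
            do j = 0, m
                r = j + (m + 1)*k
                buff(r) = q(j, k)
            end do
        end do
    #:endcall GPU_PARALLEL_LOOP_OLD

    ! New paired-directive form:
    $:GPU_PARALLEL_LOOP(collapse=2, private='[r]')
    do k = 0, n
        do j = 0, m
            r = j + (m + 1)*k
            buff(r) = q(j, k)
        end do
    end do
    $:END_GPU_PARALLEL_LOOP()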


Diagram Walkthrough

flowchart LR
  A["Deprecated _OLD Macros"] -->|Remove| B["Macro Definition Files"]
  C["MPI-dependent Code"] -->|Add Conditional Guards| D["Post-processing Module"]
  E["Old Macro Calls"] -->|Replace with New| F["MPI Communication Code"]
  B --> G["Cleaner Codebase"]
  D --> G
  F --> G

File Walkthrough

Relevant files
Code cleanup
acc_macros.fpp
Remove deprecated ACC_PARALLEL_LOOP_OLD macro                       

src/common/include/acc_macros.fpp

  • Removed deprecated ACC_PARALLEL_LOOP_OLD macro definition
  • Kept ACC_PARALLEL_LOOP macro intact for current usage
  • Eliminated 32 lines of redundant macro code
+0/-32   
omp_macros.fpp
Remove deprecated OMP_PARALLEL_LOOP_OLD macro                       

src/common/include/omp_macros.fpp

  • Removed deprecated OMP_PARALLEL_LOOP_OLD macro definition
  • Kept OMP_PARALLEL_LOOP macro intact for current usage
  • Eliminated 47 lines of redundant macro code with compiler-specific
    logic
+0/-47   
parallel_macros.fpp
Remove deprecated GPU_PARALLEL_LOOP_OLD macro                       

src/common/include/parallel_macros.fpp

  • Removed deprecated GPU_PARALLEL_LOOP_OLD macro definition
  • Kept GPU_PARALLEL_LOOP macro intact for current usage
  • Eliminated 17 lines of redundant wrapper macro code
+0/-17   
Enhancement
m_mpi_common.fpp
Replace deprecated GPU_PARALLEL_LOOP_OLD calls                     

src/common/m_mpi_common.fpp

  • Replaced all 6 instances of GPU_PARALLEL_LOOP_OLD calls with
    GPU_PARALLEL_LOOP
  • Updated macro invocation syntax from #:call to $: notation
  • Fixed indentation and formatting for improved code readability
  • Added $:END_GPU_PARALLEL_LOOP() closing statements
+119/-119
Bug fix
m_start_up.fpp
Add MPI conditional compilation guards                                     

src/post_process/m_start_up.fpp

  • Added #ifdef MFC_MPI guards around MPI-dependent code sections
  • Protected MPI_ALLREDUCE call in energy FFT computation
  • Protected MPI transpose subroutines s_mpi_transpose_x2y and
    s_mpi_transpose_y2z
  • Protected MPI initialization and FFT setup code in
    s_initialize_modules and s_initialize_mpi_domain
  • Protected MPI communicator cleanup in s_finalize_modules
  • Enabled post-processing compilation with --no-mpi flag
+24/-0   


CodeAnt-AI Description

Allow post-processing to compile and run without MPI and remove legacy parallel macros

What Changed

  • Post-processing no longer calls MPI routines when built without MPI support; MPI calls and communicator handling are enclosed in preprocessor guards so serial builds compile and run (see the sketch after this list).
  • Replaced legacy GPU_PARALLEL_LOOP_OLD macro invocations with the unified GPU_PARALLEL_LOOP macros across communication/unpacking code so generated parallel regions are consistent.
  • Removed old macro definitions and added matching macro end markers so code generation no longer references deprecated macros.
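
A minimal sketch of that guard pattern (illustrative; the subroutine and variable names are placeholders, not code from the PR):

    subroutine s_example_exchange(q, n)
        ! Hypothetical routine: MPI-only code is fenced off so that a
        ! --no-mpi build never references MPI symbols.
#ifdef MFC_MPI
        use mpi
#endif
        implicit none
        integer, intent(in) :: n
        real, intent(inout) :: q(0:n)
#ifdef MFC_MPI
        integer :: ierr
        call MPI_BARRIER(MPI_COMM_WORLD, ierr)   ! compiled only with MPI
#endif
        q = 2.0*q   ! purely local work runs in both builds
    end subroutine s_example_exchange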

Impact

✅ Builds post-processing in non-MPI (serial) configurations
✅ Fewer compilation failures caused by leftover legacy macros
✅ Consistent generated parallel regions during communication unpacking


Summary by CodeRabbit

Release Notes

  • Refactor
    • Removed deprecated parallel loop macros from the codebase.
    • Updated GPU parallel loop syntax to a new standard across all parallel regions.
    • Reorganized MPI-specific code paths to improve distributed execution support and code clarity.


@codeant-ai

codeant-ai bot commented Dec 10, 2025

CodeAnt AI is reviewing your PR.



@coderabbitai
Contributor

coderabbitai bot commented Dec 10, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

This PR removes deprecated parallel loop macros (ACC_PARALLEL_LOOP_OLD, OMP_PARALLEL_LOOP_OLD, GPU_PARALLEL_LOOP_OLD) from macro definition files and updates all their usages in MPI common code to the new macro syntax. Additionally, it adds MPI-specific conditional compilation guards to startup initialization routines.

Changes

Cohort / File(s) Summary
Macro definition removals
src/common/include/acc_macros.fpp, src/common/include/omp_macros.fpp, src/common/include/parallel_macros.fpp
Deleted legacy *_LOOP_OLD macro definitions (ACC_PARALLEL_LOOP_OLD, OMP_PARALLEL_LOOP_OLD, GPU_PARALLEL_LOOP_OLD), removing associated clause assembly and directive generation blocks.
MPI macro usage updates
src/common/m_mpi_common.fpp
Replaced all invocations of GPU_PARALLEL_LOOP_OLD with the new GPU_PARALLEL_LOOP(...) / $:END_GPU_PARALLEL_LOOP() syntax across multiple MPI pack/unpack blocks, preserving collapse directives and loop bodies.
MPI conditional initialization
src/post_process/m_start_up.fpp
Added #ifdef MFC_MPI guards to enable MPI-specific code paths for buffer allocation, FFT planning, domain decomposition initialization, and communicator setup.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Specific areas requiring attention:
    • Verify all GPU_PARALLEL_LOOP_OLD → GPU_PARALLEL_LOOP(...) / $:END_GPU_PARALLEL_LOOP() replacements in m_mpi_common.fpp are complete and syntactically correct across all MPI paths (qbmm_comm, mv_in, pb_in blocks and all mpi_dir branches)
    • Confirm that collapse directives and private variable declarations are preserved correctly in the new macro syntax
    • Validate MPI conditional guards in m_start_up.fpp correctly wrap FFT and domain initialization logic without breaking non-MPI builds

Possibly related PRs

  • Add line numbering to gpu loops #1029 — Introduced the new start/end GPU/OpenMP/OpenACC macro split and initially preserved the old call-style macros with _OLD suffixes, which this PR now removes entirely.

Suggested labels

Review effort 4/5, refactor, size:XXL

Poem

🐰 The old macros fade like morning dew,
GPU loops now wear a fresher hue,
MPI paths bloom where guards now stand,
New syntax blooms across the land!
Legacy loops make way with grace,
A cleaner parallel computing space.

Pre-merge checks

✅ Passed checks (3 passed)
  • Title check (✅ Passed): The title clearly summarizes the main changes: code cleanup and fixing post-processing compilation with the --no-mpi flag.
  • Docstring Coverage (✅ Passed): No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Description check (✅ Passed): The PR description provides clear context about the changes: removing deprecated macros and fixing post-process compilation with the --no-mpi flag. The template structure is followed with description, type of change, scope, and testing information.

@qodo-code-review
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

The removal of compiler-conditional mappings in the old OMP loop macro may change performance/behavior for different compilers; confirm the simplified '!$omp target teams' directive still matches required parallelism for NVIDIA/CCE/AMD toolchains.

    #:set omp_clause_val = default_val.strip('\n') + private_val.strip('\n') + reduction_val.strip('\n') + &
        & copy_val.strip('\n') + copyin_val.strip('\n') + &
        & copyout_val.strip('\n') + create_val.strip('\n') + &
        & no_create_val.strip('\n') + present_val.strip('\n') + &
        & deviceptr_val.strip('\n') + attach_val.strip('\n')

    #:set omp_clause_val = omp_clause_val.strip('\n')
    #:set omp_directive = '!$omp target teams ' + omp_clause_val + extraOmpArgs_val.strip('\n')

    #:set omp_end_directive = '!$omp end target teams'
    $:omp_directive
    $:code
    $:omp_end_directive
#:enddef

#:def OMP_PARALLEL_LOOP(collapse=None, private=None, parallelism='[gang, vector]', &
    & default='present', firstprivate=None, reduction=None, reductionOp=None, &
    & copy=None, copyin=None, copyinReadOnly=None, copyout=None, create=None, &
    & no_create=None, present=None, deviceptr=None, attach=None, extraOmpArgs=None)
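
Background for this concern (a general OpenMP note, not a claim about what MFC's macros emit): a bare !$omp target teams construct only creates a league of teams, and loop iterations are not divided among them unless a loop-distribution construct follows; that gap is what the removed per-compiler mappings handled. A self-contained sketch of the difference, with hypothetical arrays:

    program teams_demo
        implicit none
        integer, parameter :: n = 1024
        real :: a(n), b(n), c(n)
        integer :: i
        a = 1.0; b = 2.0

        ! Bare construct: each team's initial thread redundantly runs the
        ! whole loop; correct here only because every write is idempotent.
        !$omp target teams map(to: a, b) map(from: c)
        do i = 1, n
            c(i) = a(i) + b(i)
        end do
        !$omp end target teams

        ! Distributed form: iterations are split across teams and threads.
        !$omp target teams distribute parallel do map(to: a, b) map(from: c)
        do i = 1, n
            c(i) = a(i) + b(i)
        end do
        !$omp end target teams distribute parallel do

        print *, 'c(1) =', c(1)
    end program teams_demo
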
MPI Guards Completeness

New MPI guards were added, but ensure all MPI-dependent variables (e.g., ranks, communicator sizes, fft_wrt path) and routines used inside guarded sections are not referenced elsewhere when MFC_MPI is undefined, to avoid undefined symbol or logic discrepancies in no-MPI builds.

        call s_initialize_variables_conversion_module()
        call s_initialize_data_input_module()
        call s_initialize_derived_variables_module()
        call s_initialize_data_output_module()

        ! Associate pointers for serial or parallel I/O
        if (parallel_io .neqv. .true.) then
            s_read_data_files => s_read_serial_data_files
        else
            s_read_data_files => s_read_parallel_data_files
        end if

#ifdef MFC_MPI
        if (fft_wrt) then

            num_procs_x = (m_glb + 1)/(m + 1)
            num_procs_y = (n_glb + 1)/(n + 1)
            num_procs_z = (p_glb + 1)/(p + 1)

            Nx = m_glb + 1
            Ny = n_glb + 1
            Nz = p_glb + 1

            Nxloc = (m_glb + 1)/num_procs_y
            Nyloc = n + 1
            Nyloc2 = (n_glb + 1)/num_procs_z
            Nzloc = p + 1

            Nf = max(Nx, Ny, Nz)

            @:ALLOCATE(data_in(Nx*Nyloc*Nzloc))
            @:ALLOCATE(data_out(Nx*Nyloc*Nzloc))

            @:ALLOCATE(data_cmplx(Nx, Nyloc, Nzloc))
            @:ALLOCATE(data_cmplx_y(Nxloc, Ny, Nzloc))
            @:ALLOCATE(data_cmplx_z(Nxloc, Nyloc2, Nz))

            @:ALLOCATE(En_real(Nxloc, Nyloc2, Nz))
            @:ALLOCATE(En(Nf))

            size_n(1) = Nx
            inembed(1) = Nx
            onembed(1) = Nx

            fwd_plan_x = fftw_plan_many_dft(1, size_n, Nyloc*Nzloc, &
                                            data_in, inembed, 1, Nx, &
                                            data_out, onembed, 1, Nx, &
                                            FFTW_FORWARD, FFTW_MEASURE)

            size_n(1) = Ny
            inembed(1) = Ny
            onembed(1) = Ny

            fwd_plan_y = fftw_plan_many_dft(1, size_n, Nxloc*Nzloc, &
                                            data_out, inembed, 1, Ny, &
                                            data_in, onembed, 1, Ny, &
                                            FFTW_FORWARD, FFTW_MEASURE)

            size_n(1) = Nz
            inembed(1) = Nz
            onembed(1) = Nz

            fwd_plan_z = fftw_plan_many_dft(1, size_n, Nxloc*Nyloc2, &
                                            data_in, inembed, 1, Nz, &
                                            data_out, onembed, 1, Nz, &
                                            FFTW_FORWARD, FFTW_MEASURE)

            call MPI_CART_CREATE(MPI_COMM_WORLD, 3, (/num_procs_x, &
                                                      num_procs_y, num_procs_z/), &
                                 (/.true., .true., .true./), &
                                 .false., MPI_COMM_CART, ierr)
            call MPI_CART_COORDS(MPI_COMM_CART, proc_rank, 3, &
                                 cart3d_coords, ierr)

            call MPI_Cart_SUB(MPI_COMM_CART, (/.true., .true., .false./), MPI_COMM_CART12, ierr)
            call MPI_COMM_RANK(MPI_COMM_CART12, proc_rank12, ierr)
            call MPI_CART_COORDS(MPI_COMM_CART12, proc_rank12, 2, cart2d12_coords, ierr)

            call MPI_Cart_SUB(MPI_COMM_CART, (/.true., .false., .true./), MPI_COMM_CART13, ierr)
            call MPI_COMM_RANK(MPI_COMM_CART13, proc_rank13, ierr)
            call MPI_CART_COORDS(MPI_COMM_CART13, proc_rank13, 2, cart2d13_coords, ierr)

        end if
#endif
    end subroutine s_initialize_modules

    subroutine s_mpi_FFT_fwd

        integer :: j, k, l

#ifdef MFC_MPI

        do l = 1, Nzloc
            do k = 1, Nyloc
                do j = 1, Nx
                    data_in(j + (k - 1)*Nx + (l - 1)*Nx*Nyloc) = data_cmplx(j, k, l)
                end do
            end do
        end do

        call fftw_execute_dft(fwd_plan_x, data_in, data_out)

        do l = 1, Nzloc
            do k = 1, Nyloc
                do j = 1, Nx
                    data_cmplx(j, k, l) = data_out(j + (k - 1)*Nx + (l - 1)*Nx*Nyloc)
                end do
            end do
        end do

        call s_mpi_transpose_x2y !!Change Pencil from data_cmplx to data_cmpx_y

        do l = 1, Nzloc
            do k = 1, Nxloc
                do j = 1, Ny
                    data_out(j + (k - 1)*Ny + (l - 1)*Ny*Nxloc) = data_cmplx_y(k, j, l)
                end do
            end do
        end do

        call fftw_execute_dft(fwd_plan_y, data_out, data_in)

        do l = 1, Nzloc
            do k = 1, Nxloc
                do j = 1, Ny
                    data_cmplx_y(k, j, l) = data_in(j + (k - 1)*Ny + (l - 1)*Ny*Nxloc)
                end do
            end do
        end do

        call s_mpi_transpose_y2z !!Change Pencil from data_cmplx_y to data_cmpx_z

        do l = 1, Nyloc2
            do k = 1, Nxloc
                do j = 1, Nz
                    data_in(j + (k - 1)*Nz + (l - 1)*Nz*Nxloc) = data_cmplx_z(k, l, j)
                end do
            end do
        end do

        call fftw_execute_dft(fwd_plan_z, data_in, data_out)

        do l = 1, Nyloc2
            do k = 1, Nxloc
                do j = 1, Nz
                    data_cmplx_z(k, l, j) = data_out(j + (k - 1)*Nz + (l - 1)*Nz*Nxloc)
                end do
            end do
        end do

#endif

    end subroutine s_mpi_FFT_fwd

    impure subroutine s_initialize_mpi_domain

        num_dims = 1 + min(1, n) + min(1, p)

#ifdef MFC_MPI
        ! Initialization of the MPI environment
        call s_mpi_initialize()

        ! Processor with rank 0 assigns default user input values prior to reading
        ! those in from the input file. Next, the user inputs are read in and their
        ! consistency is checked. The detection of any inconsistencies automatically
        ! leads to the termination of the post-process.
        if (proc_rank == 0) then
            call s_assign_default_values_to_user_inputs()
            call s_read_input_file()
            call s_check_input_file()

            print '(" Post-processing a ", I0, "x", I0, "x", I0, " case on ", I0, " rank(s)")', m, n, p, num_procs
        end if

        ! Broadcasting the user inputs to all of the processors and performing the
        ! parallel computational domain decomposition. Neither procedure has to be
        ! carried out if the simulation is in fact not truly executed in parallel.
        call s_mpi_bcast_user_inputs()
        call s_initialize_parallel_io()
        call s_mpi_decompose_computational_domain()
        call s_check_inputs_fft()

#endif

    end subroutine s_initialize_mpi_domain

    impure subroutine s_finalize_modules
        ! Disassociate pointers for serial and parallel I/O
        s_read_data_files => null()

!        if (sim_data .and. proc_rank == 0) then
!            call s_close_intf_data_file()
!            call s_close_energy_data_file()
!        end if

        if (fft_wrt) then
            if (c_associated(fwd_plan_x)) call fftw_destroy_plan(fwd_plan_x)
            if (c_associated(fwd_plan_y)) call fftw_destroy_plan(fwd_plan_y)
            if (c_associated(fwd_plan_z)) call fftw_destroy_plan(fwd_plan_z)
            if (allocated(data_in)) deallocate (data_in)
            if (allocated(data_out)) deallocate (data_out)
            if (allocated(data_cmplx)) deallocate (data_cmplx)
            if (allocated(data_cmplx_y)) deallocate (data_cmplx_y)
            if (allocated(data_cmplx_z)) deallocate (data_cmplx_z)
            if (allocated(En_real)) deallocate (En_real)
            if (allocated(En)) deallocate (En)
            call fftw_cleanup()
        end if

#ifdef MFC_MPI
        if (fft_wrt) then
            if (MPI_COMM_CART12 /= MPI_COMM_NULL) call MPI_Comm_free(MPI_COMM_CART12, ierr)
            if (MPI_COMM_CART13 /= MPI_COMM_NULL) call MPI_Comm_free(MPI_COMM_CART13, ierr)
            if (MPI_COMM_CART /= MPI_COMM_NULL) call MPI_Comm_free(MPI_COMM_CART, ierr)
        end if
#endif
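
One pattern that keeps no-MPI builds consistent is to pair each guard with an explicit serial fallback instead of guarding the call alone (an illustrative fragment; En_local and En_global are placeholder names, not identifiers from the PR):

    ! Reduce the per-rank spectrum when MPI is available; in a serial
    ! build the local result is already the global one.
#ifdef MFC_MPI
        call MPI_ALLREDUCE(En_local, En_global, Nf, MPI_DOUBLE_PRECISION, &
                           MPI_SUM, MPI_COMM_WORLD, ierr)
#else
        En_global = En_local
#endif
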
GPU Loop Macro Migration

The transition from '#:call ..._OLD' to '$:GPU_PARALLEL_LOOP' plus '$:END_GPU_PARALLEL_LOOP()' must match the macro interface; verify nesting and private variable 'r' scoping and that END macro expands correctly for both OpenACC and OpenMP backends.
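
For orientation, the paired macros bracket the loop with backend-specific directives roughly along these lines (an illustrative guess, not the verbatim expansions; the authoritative definitions live in parallel_macros.fpp, acc_macros.fpp, and omp_macros.fpp, and buff/q below are placeholder names):

    ! OpenACC backend (illustrative):
    !$acc parallel loop collapse(2) gang vector default(present) private(r)
    do k = 0, n
        do j = 0, m
            r = j + (m + 1)*k
            buff(r) = q(j, k)
        end do
    end do
    !$acc end parallel loop

    ! OpenMP offload backend (illustrative):
    !$omp target teams distribute parallel do collapse(2) private(r)
    do k = 0, n
        do j = 0, m
            r = j + (m + 1)*k
            buff(r) = q(j, k)
        end do
    end do
    !$omp end target teams distribute parallel do

The migrated call sites under review: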

                    $:GPU_PARALLEL_LOOP(collapse=4,private='[r]')
                    do l = 0, p
                        do k = 0, n
                            do j = -buff_size, -1
                                do i = 1, nVar
                                    r = (i - 1) + v_size* &
                                        (j + buff_size*((k + 1) + (n + 1)*l))
                                    q_comm(i)%sf(j + unpack_offset, k, l) = real(buff_recv(r), kind=stp)
#if defined(__INTEL_COMPILER)
                                    if (ieee_is_nan(q_comm(i)%sf(j, k, l))) then
                                        print *, "Error", j, k, l, i
                                        error stop "NaN(s) in recv"
                                    end if
#endif
                                end do
                            end do
                        end do
                    end do
                    $:END_GPU_PARALLEL_LOOP()

                    if (qbmm_comm) then
                        $:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
                        do l = 0, p
                            do k = 0, n
                                do j = -buff_size, -1
                                    do i = nVar + 1, nVar + 4
                                        do q = 1, nb
                                            r = (i - 1) + (q - 1)*4 + v_size* &
                                                (j + buff_size*((k + 1) + (n + 1)*l))
                                            pb_in(j + unpack_offset, k, l, i - nVar, q) = real(buff_recv(r), kind=stp)
                                        end do
                                    end do
                                end do
                            end do
                        end do
                        $:END_GPU_PARALLEL_LOOP()

                        $:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
                        do l = 0, p
                            do k = 0, n
                                do j = -buff_size, -1
                                    do i = nVar + 1, nVar + 4
                                        do q = 1, nb
                                            r = (i - 1) + (q - 1)*4 + nb*4 + v_size* &
                                                (j + buff_size*((k + 1) + (n + 1)*l))
                                            mv_in(j + unpack_offset, k, l, i - nVar, q) = real(buff_recv(r), kind=stp)
                                        end do
                                    end do
                                end do
                            end do
                        end do
                        $:END_GPU_PARALLEL_LOOP()
                    end if
                #:elif mpi_dir == 2
                    $:GPU_PARALLEL_LOOP(collapse=4,private='[r]')
                    do i = 1, nVar
                        do l = 0, p
                            do k = -buff_size, -1
                                do j = -buff_size, m + buff_size
                                    r = (i - 1) + v_size* &
                                        ((j + buff_size) + (m + 2*buff_size + 1)* &
                                         ((k + buff_size) + buff_size*l))
                                    q_comm(i)%sf(j, k + unpack_offset, l) = real(buff_recv(r), kind=stp)
#if defined(__INTEL_COMPILER)
                                    if (ieee_is_nan(q_comm(i)%sf(j, k, l))) then
                                        print *, "Error", j, k, l, i
                                        error stop "NaN(s) in recv"
                                    end if
#endif
                                end do
                            end do
                        end do
                    end do
                    $:END_GPU_PARALLEL_LOOP()

                    if (qbmm_comm) then
                        $:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
                        do i = nVar + 1, nVar + 4
                            do l = 0, p
                                do k = -buff_size, -1
                                    do j = -buff_size, m + buff_size
                                        do q = 1, nb
                                            r = (i - 1) + (q - 1)*4 + v_size* &
                                                ((j + buff_size) + (m + 2*buff_size + 1)* &
                                                 ((k + buff_size) + buff_size*l))
                                            pb_in(j, k + unpack_offset, l, i - nVar, q) = real(buff_recv(r), kind=stp)
                                        end do
                                    end do
                                end do
                            end do
                        end do
                        $:END_GPU_PARALLEL_LOOP()

                        $:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
                        do i = nVar + 1, nVar + 4
                            do l = 0, p
                                do k = -buff_size, -1
                                    do j = -buff_size, m + buff_size
                                        do q = 1, nb
                                            r = (i - 1) + (q - 1)*4 + nb*4 + v_size* &
                                                ((j + buff_size) + (m + 2*buff_size + 1)* &
                                                 ((k + buff_size) + buff_size*l))
                                            mv_in(j, k + unpack_offset, l, i - nVar, q) = real(buff_recv(r), kind=stp)
                                        end do
                                    end do
                                end do
                            end do
                        end do
                        $:END_GPU_PARALLEL_LOOP()
                    end if
                #:else
                    ! Unpacking buffer from bc_z%beg
                    $:GPU_PARALLEL_LOOP(collapse=4,private='[r]')
                    do i = 1, nVar
                        do l = -buff_size, -1
                            do k = -buff_size, n + buff_size
                                do j = -buff_size, m + buff_size
                                    r = (i - 1) + v_size* &
                                        ((j + buff_size) + (m + 2*buff_size + 1)* &
                                         ((k + buff_size) + (n + 2*buff_size + 1)* &
                                          (l + buff_size)))
                                    q_comm(i)%sf(j, k, l + unpack_offset) = real(buff_recv(r), kind=stp)
#if defined(__INTEL_COMPILER)
                                    if (ieee_is_nan(q_comm(i)%sf(j, k, l))) then
                                        print *, "Error", j, k, l, i
                                        error stop "NaN(s) in recv"
                                    end if
#endif
                                end do
                            end do
                        end do
                    end do
                    $:END_GPU_PARALLEL_LOOP()

                    if (qbmm_comm) then
                        $:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
                        do i = nVar + 1, nVar + 4
                            do l = -buff_size, -1
                                do k = -buff_size, n + buff_size
                                    do j = -buff_size, m + buff_size
                                        do q = 1, nb
                                            r = (i - 1) + (q - 1)*4 + v_size* &
                                                ((j + buff_size) + (m + 2*buff_size + 1)* &
                                                 ((k + buff_size) + (n + 2*buff_size + 1)* &
                                                  (l + buff_size)))
                                            pb_in(j, k, l + unpack_offset, i - nVar, q) = real(buff_recv(r), kind=stp)
                                        end do
                                    end do
                                end do
                            end do
                        end do
                        $:END_GPU_PARALLEL_LOOP()

                        $:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
                        do i = nVar + 1, nVar + 4
                            do l = -buff_size, -1
                                do k = -buff_size, n + buff_size
                                    do j = -buff_size, m + buff_size
                                        do q = 1, nb
                                            r = (i - 1) + (q - 1)*4 + nb*4 + v_size* &
                                                ((j + buff_size) + (m + 2*buff_size + 1)* &
                                                 ((k + buff_size) + (n + 2*buff_size + 1)* &
                                                  (l + buff_size)))
                                            mv_in(j, k, l + unpack_offset, i - nVar, q) = real(buff_recv(r), kind=stp)
                                        end do
                                    end do
                                end do
                            end do
                        end do
                        $:END_GPU_PARALLEL_LOOP()
                    end if

@codeant-ai codeant-ai bot added the size:L This PR changes 100-499 lines, ignoring generated files label Dec 10, 2025
Comment on lines +968 to 974

q_comm(i)%sf(j + unpack_offset, k, l) = real(buff_recv(r), kind=stp)
#if defined(__INTEL_COMPILER)
if (ieee_is_nan(q_comm(i)%sf(j, k, l))) then
print *, "Error", j, k, l, i
error stop "NaN(s) in recv"
end if
#endif
Suggestion: Fix the NaN check to use the correct array index with unpack_offset to validate the newly assigned value. [possible issue, importance: 8]

Suggested change
q_comm(i)%sf(j + unpack_offset, k, l) = real(buff_recv(r), kind=stp)
#if defined(__INTEL_COMPILER)
if (ieee_is_nan(q_comm(i)%sf(j, k, l))) then
print *, "Error", j, k, l, i
error stop "NaN(s) in recv"
end if
#endif
q_comm(i)%sf(j + unpack_offset, k, l) = real(buff_recv(r), kind=stp)
#if defined(__INTEL_COMPILER)
if (ieee_is_nan(q_comm(i)%sf(j + unpack_offset, k, l))) then
print *, "Error", j, k, l, i
error stop "NaN(s) in recv"
end if
#endif

@cubic-dev-ai cubic-dev-ai bot left a comment
No issues found across 5 files

print *, "Error", j, k, l, i
error stop "NaN(s) in recv"
end if
if (ieee_is_nan(q_comm(i)%sf(j, k, l))) then

Suggestion: Logic error: the NaN check tests a different array element than the one assigned (it omits unpack_offset), so a NaN written into the target element can go undetected and the check may read an unrelated location (possibly out of bounds); make the check use the same index j + unpack_offset. [logic error]

Severity Level: Minor ⚠️

Suggested change
if (ieee_is_nan(q_comm(i)%sf(j, k, l))) then
if (ieee_is_nan(q_comm(i)%sf(j + unpack_offset, k, l))) then
Why it matters? ⭐

The check currently reads a different location (j,k,l) than the element just written (j+unpack_offset,k,l), so it can miss NaNs or read unrelated memory. Changing the nan-test to inspect q_comm(i)%sf(j+unpack_offset,k,l) fixes a real logic bug and validates the value just written.

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** src/common/m_mpi_common.fpp
**Line:** 970:970
**Comment:**
	*Logic Error: Logic error: the NaN check tests a different array element than the one assigned (it omits `unpack_offset`), so a NaN written into the target element can go undetected and the check may read an unrelated location (possibly out of bounds); make the check use the same index `j + unpack_offset`.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

print *, "Error", j, k, l, i
error stop "NaN(s) in recv"
end if
if (ieee_is_nan(q_comm(i)%sf(j, k, l))) then

Suggestion: Logic error: after unpacking the buffer in the mpi_dir==2 branch the code assigns into q_comm(i)%sf(j, k + unpack_offset, l) but the NaN check inspects q_comm(i)%sf(j, k, l), so it validates the wrong element; update the check to include k + unpack_offset. [logic error]

Severity Level: Minor ⚠️

Suggested change
if (ieee_is_nan(q_comm(i)%sf(j, k, l))) then
if (ieee_is_nan(q_comm(i)%sf(j, k + unpack_offset, l))) then
Why it matters? ⭐

Same class of bug as above: assignment writes into k+unpack_offset but the NaN check inspects k. Updating the checked index to k+unpack_offset ensures the freshly assigned element is validated and prevents false negatives.

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** src/common/m_mpi_common.fpp
**Line:** 1025:1025
**Comment:**
	*Logic Error: Logic error: after unpacking the buffer in the mpi_dir==2 branch the code assigns into `q_comm(i)%sf(j, k + unpack_offset, l)` but the NaN check inspects `q_comm(i)%sf(j, k, l)`, so it validates the wrong element; update the check to include `k + unpack_offset`.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

print *, "Error", j, k, l, i
error stop "NaN(s) in recv"
end if
if (ieee_is_nan(q_comm(i)%sf(j, k, l))) then

Suggestion: Logic error: in the z-direction unpacking the code writes into q_comm(i)%sf(j, k, l + unpack_offset) but the NaN test reads q_comm(i)%sf(j, k, l), so the validation misses the element that was just written; change the check to use l + unpack_offset. [logic error]

Severity Level: Minor ⚠️

Suggested change
if (ieee_is_nan(q_comm(i)%sf(j, k, l))) then
if (ieee_is_nan(q_comm(i)%sf(j, k, l + unpack_offset))) then
Why it matters? ⭐

The code writes into l+unpack_offset but tests l, so it doesn't validate the written element. Adjusting the NaN check to l+unpack_offset fixes a real correctness issue.

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** src/common/m_mpi_common.fpp
**Line:** 1084:1084
**Comment:**
	*Logic Error: Logic error: in the z-direction unpacking the code writes into `q_comm(i)%sf(j, k, l + unpack_offset)` but the NaN test reads `q_comm(i)%sf(j, k, l)`, so the validation misses the element that was just written; change the check to use `l + unpack_offset`.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

@codeant-ai

codeant-ai bot commented Dec 10, 2025

CodeAnt AI finished reviewing your PR.

@codeant-ai

codeant-ai bot commented Dec 10, 2025


@codecov

codecov bot commented Dec 11, 2025

Codecov Report

❌ Patch coverage is 16.92308% with 54 lines in your changes missing coverage. Please review.
✅ Project coverage is 44.11%. Comparing base (8d9a83b) to head (0f32e58).
⚠️ Report is 1 commits behind head on master.

Files with missing lines          | Patch % | Lines
src/common/m_mpi_common.fpp       | 18.03%  | 50 Missing ⚠️
src/post_process/m_start_up.fpp   |  0.00%  |  4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1083      +/-   ##
==========================================
- Coverage   44.16%   44.11%   -0.06%     
==========================================
  Files          71       71              
  Lines       20197    20250      +53     
  Branches     1970     1970              
==========================================
+ Hits         8921     8933      +12     
- Misses      10148    10189      +41     
  Partials     1128     1128              

☔ View full report in Codecov by Sentry.

@danieljvickers
Member Author

@sbryngelson I think that this is ready for merge
