(Closes #3157, #3306) Initial implementation of maximal parallel region trans.#3205
(Closes #3157, #3306) Initial implementation of maximal parallel region trans.#3205LonelyCat124 wants to merge 26 commits intomasterfrom
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #3205 +/- ##
========================================
Coverage 99.95% 99.95%
========================================
Files 382 384 +2
Lines 54107 54212 +105
========================================
+ Hits 54085 54190 +105
Misses 22 22 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@MetBenjaminWent Chris said you had some cases worth trying this with as functionality tests? |
|
The examples I've been give by to look at are: |
|
Made a bit more progress with this now - there was definitely some missing logic for bdy_impl3 to work. One thing that is apparent that applying things in this way means we result in barriers that live outside parallel regions, which should be purged by @sergisiso @arporter Am I ok to make that change as part of this PR? |
…y on applying inside if/loop nodes
|
I fixed the previous issue, and this has raised some other challenges with the cases I receieved from MO.
I assume we just need to use |
|
Yeah, I'd just been looking at the PR earlier today after I looked at this - I guess its still a while until we'd have it ready, but if it could handle cases like this it would definitely help (assuming the rest is otherwise ok). |
|
I copied this chunk of code into a minimally compiling module: module example_module
implicit none
type :: Dims
integer :: j_start, j_end, i_start, i_end
end type
contains
subroutine sub(tdims, r_rho_levels, dtrdz_charney_grid, fqw, dqw, dqw_nt, gamma2, blm1, omp_block)
type(Dims), intent(inout) :: tdims
integer, intent(inout) :: r_rho_levels(:,:,:)
integer, intent(inout) :: dtrdz_charney_grid(:,:,:)
integer, intent(inout) :: fqw(:,:,:)
integer, intent(inout) :: dqw(:,:,:)
integer, intent(inout) :: dqw_nt(:,:,:)
integer, intent(inout) :: gamma2(:,:)
integer, intent(inout) :: blm1, omp_block
integer :: jj, k, r_sq, rr_sq, l, i, j
do jj = tdims%j_start, tdims%j_end, omp_block
do k = blm1, 2, -1
l = 0
do j = jj, min(jj+omp_block-1, tdims%j_end)
do i = tdims%i_start, tdims%i_end
r_sq = r_rho_levels(i,j,k)*r_rho_levels(i,j,k)
rr_sq = r_rho_levels(i,j,k+1)*r_rho_levels(i,j,k+1)
dqw(i,j,k) = (-dtrdz_charney_grid(i,j,k) * (rr_sq * fqw(i,j,k + 1) - r_sq * fqw(i,j,k)) + dqw_nt(i,j,k)) * gamma2(i,j)
end do
end do
end do
end do
end subroutine
end moduleThe analysis gives the following output: Interestingly, it does take a few seconds to prove that the |
|
I used There are 2 things left to potentially look at.
|
|
@LonelyCat124 Could we finish this PR without the Assignments as there are a few cases that are just contiguous loops that we could already do? And then do the Assignments in a follow up as they seem complicated and worth disucssing separately. |
@sergisiso I'm slightly hesitant since some of the examples I was given by MO do need these (and in fact create wrong code without it) if it were to be applied. I think at least I'd need/want to add a check to see if any I had started implemeting some logic now for Assignment but the next accesses check is somewhat complicated. Edit Though: I think we're also limited without Matthew's #3213 anyway so maybe the assignment checks aren't so urgent. Edit 2: This was my current plan: But the next access check needs to happen in apply when applying the parallel transformation and becomes a bit of a mess I think (since we may end up having to split the parallel regions again at this point). |
@LonelyCat124 I need some context about this issue, can you paste and example here |
|
@LonelyCat124 I thought without assignments it would be encapsulating in a transformation what your already implemented in |
Yeah it would probably be similar, but I was testing it with some code (https://code.metoffice.gov.uk/trac/lfric_apps/browser/main/trunk/science/physics_schemes/source/boundary_layer/bdy_impl3.F90) and we do a lot worse than the manual implementation. I think I'm happy to just go with this simple version initially though and then improve it, probably makes reviewing more straight forward. I'll write some tests for the current functionality and add some stuff to the docstrings. |
@sergisiso An example of one we'd do wrong right now would be any code where have |
|
These last 2 would also be an issue without this transformation if the do the typical |
|
@sergisiso This is ready for another look when you're back off leave. |
sergisiso
left a comment
There was a problem hiding this comment.
@LonelyCat124 Thanks for renaming and adding the Profiling example, I made a few more comments for the implementation, but it is getting closer for the basic functionality.
| end subroutine x""" | ||
| psyir = fortran_reader.psyir_from_source(code) | ||
| MaximalOMPParallelRegionTrans().apply(psyir.children[0].children[:]) | ||
| assert len(psyir.walk(OMPParallelDirective)) == 0 |
There was a problem hiding this comment.
Also add some tests where the transformation adds some directives, and with nested ifs/loops as this are relevant for the transformation.
There was a problem hiding this comment.
Oh and the block of only barriers since you mention it in a comment.
|
@sergisiso This should be ready for another look now. Edit: Missing some new coverage, will wait. |
|
@sergisiso Coverage fixed, ready for review again now |
sergisiso
left a comment
There was a problem hiding this comment.
It is almost there now @LonelyCat124 , just some more minor things to clean up.
| class MaximalProfilingTrans(MaximalRegionTrans): | ||
| '''Applies Profiling to the largest possible region. |
There was a problem hiding this comment.
Maybe call it:
class MaximalProfilingOutsideDirectives(MaximalRegionTrans):
''' Applies profiling hooks to the largest possible region outside directives regions '''
to make clear what are we going for here.
| _allowed_contiguous_nodes = (Assignment, Call, CodeBlock) | ||
| _transformation = ProfileTrans | ||
|
|
||
| def __init__(self, routine_name: str = ""): |
There was a problem hiding this comment.
I know this may not be possible if we bring this into its own file, but for now the routine_name doesn't feel quite right as an argument, I would prefer the name check to be outside, where it is called:
if routine_name not in PROFILING_IGNORE:
MaximalProfilingTrans().apply(children)
|
|
||
| :param routine_name: The name of the Routine being profiled. | ||
| ''' | ||
| _allowed_contiguous_nodes = (Assignment, Call, CodeBlock) |
There was a problem hiding this comment.
Maybe add a comment saying: "We purposely don't encompase Directive, or Return statements (which would create unclosed hooks)."
Also another picky name change but "_allowed_contiguous_statements" may be better (as we don't really validate other kinds inner nodes I think)
| '''Abstract transformation containing the functionality to add | ||
| the largest allowed transformation to the provided code segment. |
There was a problem hiding this comment.
"Abstract transformation containing the functionality to apply another transformation to the largest code segment possible while satisfying its validation and any additionally provided constraint." ?
| The _transformation should be a transformation class to apply to | ||
| the computed set of regions. | ||
|
|
||
| The _allowed_contiguous_nodes is a tuple of Node classes that are allowed | ||
| as statements in the transformed region. Note that upon finding a Loop or | ||
| IfBlock, the node's children will be checked to determine whether its safe | ||
| to contain the Loop or IfBlock in the transformed section.''' |
There was a problem hiding this comment.
I think this is now sufficinenlty documented with the "#:", I would remove it from here.
| the largest allowed transformation to the provided code segment. | ||
|
|
||
| Subclasses should override the _transformation and | ||
| _allowed_contiguous_nodes members to control the functionality. |
| super().__init__() | ||
| self._routine_name = routine_name | ||
|
|
||
| def _satisfies_minimum_region_rules(self, region: list[Node]) -> bool: |
There was a problem hiding this comment.
There are still some differences from the previous implementation. The most common pattern is that something that before looked like:
if (TRIM(cdrw) == 'WRITE') then
CALL profile_psy_data_8 % PreStart("zdfosm", "osm_rst-r8", 0, 0)
if (lwp) then
! PSyclone CodeBlock (unsupported code) reason:
! - Unsupported statement: Write_Stmt
WRITE(numout, *) '---- osm-rst ----'
end if
call iom_rstput(kt, nitrst, numrow, 'wn', ww)
call iom_rstput(kt, nitrst, numrow, 'hbl', hbl)
call iom_rstput(kt, nitrst, numrow, 'dh', dh)
if (ln_osm_mle) then
call iom_rstput(kt, nitrst, numrow, 'hmle', hmle)
end if
CALL profile_psy_data_8 % PostEnd
return
end if
Now does not have any profiling hook inserted. Do you know why not?
No description provided.