Skip to content

Comments

240 add skeleton fab script#246

Open
hiker wants to merge 34 commits intoMetOffice:mainfrom
hiker:240_add_skeleton_fab_script
Open

240 add skeleton fab script#246
hiker wants to merge 34 commits intoMetOffice:mainfrom
hiker:240_add_skeleton_fab_script

Conversation

@hiker
Copy link

@hiker hiker commented Jan 27, 2026

PR Summary

Sci/Tech Reviewer: @MatthewHambley
Code Reviewer: @t00sa

Adds a first Fab build script for Skeleton. To keep this change minimal, it's command line only (i.e. no cylc integration, which can come later).

Code Quality Checklist

  • I have performed a self-review of my own code
  • My code follows the project's style guidelines
  • Comments have been included that aid understanding and enhance the readability of the code
  • My changes generate no new warnings
  • All automated checks in the CI pipeline have completed successfully

Testing

  • I have tested this change locally, using the LFRic Core rose-stem suite
  • If required (e.g. API changes) I have also run the LFRic Apps test suite using this branch
  • If any tests fail (rose-stem or CI) the reason is understood and acceptable (e.g. kgo changes)
  • I have added tests to cover new functionality as appropriate (e.g. system tests, unit tests, etc.)
  • Any new tests have been assigned an appropriate amount of compute resource and have been allocated to an appropriate testing group (i.e. the developer tests are for jobs which use a small amount of compute resource and complete in a matter of minutes)

trac.log

Security Considerations

  • I have reviewed my changes for potential security issues
  • Sensitive data is properly handled (if applicable)
  • Authentication and authorisation are properly implemented (if applicable)

Performance Impact

  • Performance of the code has been considered and, if applicable, suitable performance measurements have been conducted

AI Assistance and Attribution

  • Some of the content of this change has been produced with the assistance of Generative AI tool name (e.g., Met Office Github Copilot Enterprise, Github Copilot Personal, ChatGPT GPT-4, etc) and I have followed the Simulation Systems AI policy (including attribution labels)

Documentation

  • Where appropriate I have updated documentation related to this change and confirmed that it builds correctly

PSyclone Approval

  • If you have edited any PSyclone-related code (e.g. PSyKAl-lite, Kernel interface, optimisation scripts, LFRic data structure code) then please contact the TCD Team

Sci/Tech Review

  • I understand this area of code and the changes being added
  • The proposed changes correspond to the pull request description
  • Documentation is sufficient (do documentation papers need updating)
  • Sufficient testing has been completed

(Please alert the code reviewer via a tag when you have approved the SR)

Code Review

  • All dependencies have been resolved
  • Related Issues have been properly linked and addressed
  • CLA compliance has been confirmed
  • Code quality standards have been met
  • Tests are adequate and have passed
  • Documentation is complete and accurate
  • Security considerations have been addressed
  • Performance impact is acceptable

@hiker hiker marked this pull request as draft January 27, 2026 06:04
@github-actions github-actions bot added the cla-required The CLA has not yet been signed by the author of this PR - added by GA label Jan 27, 2026
@github-actions github-actions bot added cla-signed The CLA has been signed as part of this PR - added by GA and removed cla-required The CLA has not yet been signed by the author of this PR - added by GA labels Jan 27, 2026
@hiker
Copy link
Author

hiker commented Jan 27, 2026

This PR only adds a command-line build system using Fab for the Skeleton apps. It is not at all integrated into cylc or any other test suite. Tests have been added to the infrastructure/build/fab scripts which cover the newly added infrastructure files there.

Some unit tests are based on some AI input to setup the frame work, but have been manually tweaked to ensure code coverage.

Documentation is in form of a README in the above directory, please let me know if you want me to add more elsewhere.

@hiker hiker marked this pull request as ready for review January 27, 2026 07:37
@yaswant yaswant requested review from t00sa and removed request for mike-hobson and stevemullerworth February 5, 2026 09:26
Copy link
Collaborator

@MatthewHambley MatthewHambley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I note that a lot of requests made in the original review on SRS have not been addressed so for convenience I have repeated them here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generic build components should be in lfric_build at the top level, not infrastructure/build.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All tools, and nearly all makefiles and PSyclone tools reside in core/infrastructure, while lfric_build is a basically empty directory (containing an unused scripts and tests for it). Wouldn't it be easier for all developers to maintain that mapping (which also keeps the root directory one directory cleaner)?

Is there a document outlining these 'should be' statements? I get caught up frequently by something that is envisioned to work differently, which causes me a lot of wasted time. And I would also like to understand the reasoning behind some of these rules.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The build tooling is where it is due to it all being in Gung-ho in the early days. When the infrastructure was calved off, the build tooling went with it because it was not Gung-ho specific. It is also not infrastructure specific but has not been moved due to resourcing constraints.

Now we are moving to a new build system, we can put the generic build tooling where it belongs, at the top level. This process has been started by creating a directory for it and adding a module for functionality we knew we wanted.

There is no document because the restructuring of the core repository is still under discussion. We do know, however, that we want the generic build support at the top level. I include the most recent diagram of our thinking regarding the rest of the layout. This is liable to change in detail, but the broad brush strokes are correct.

Image

Given that the only change required is to move the files, very little effort has been wasted. The mechanism through which to we find out about these plans is through review.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the context really helped, appreciated!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generic build components should be in lfric_build at the top level, not infrastructure/build.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question: why. Templaterator, ... are all what I would call generic build components, and the are under infrastructure/build. Why make it more confusing for the user?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but they are in the wrong place too.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I didn't know that this kind of refactoring was the goal. If all these should should be moved into the build directory, it makes sense. I will do this as the final commit before resubmitting this.


:returns List[str]: list of all supported compiler profiles.
'''
return ["full-debug", "fast-debug", "production", "unit-tests"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a general problem with there being nothing magical or required about "full debug" or any of the other "profiles." However, specifically there are also integration tests which don't seem to have been covered.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean here. It is the main idea that there is nothing magical, allowing sites to add their own configs (e.g. memory-leack-check, performance-check).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is certainly a point, but also unit and integration tests don't have profiles. They always use the same set of arguments.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, but still, tests should have some well defined flags. I see two options:

  1. They use the same flags used in the actual build. Then we don't need a separate set of test-options
  2. There are dedicated options for testing, which might be different from the main executable.

In general, I would suggest 1. (i.e. testing will be compiled in the same fab-repository), and this would make sure that the right code is tested. But if tests depend on compiler optimisation (I never looked into details), then we would need a dedicated mode for tests, to consistently test the right thing.

Comment on lines +97 to +107
def setup_cray(self, build_config: BuildConfig) -> None:
'''
This method sets up the Cray compiler and linker flags.
For now call an external function, since it is expected that
this configuration can be very lengthy (once we support
compiler modes).

:param build_config: the Fab build configuration instance
:type build_config: :py:class:`fab.BuildConfig`
'''
setup_cray(build_config, self.args)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a function called setup_cray which calls a function called setup_cray can only ever cause confusion. That goes for all the others. Why does this facade exist?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few reasons, one being pep8 support. If we copied all these files into one config file, it would grow very big (close to 1000 lines, which is afaik a pep8 limit). Even if it's under, having big files make it much harder to manage scripts. For example, if you search for a ifort command line option, you might end up in the ifx section of the script - which you might not be able to see on the screen.

It also makes it easier to setup a site-configuration (it makes it very obvious which files you need if you want to start from scratch). And it kind of simulate current makefile system as well, where you have one config/make file per compiler suite.

One option we could use would be to rewrite them as mixin. I didn't to it this way since that would potentially create havoc if a user adds other (existing) methods to these scripts as mixin, then python's resolution order would apply, and ... chaotic things would happen.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The presence of these confusingly named duplicate functions is a smell. It suggests a problem with the design. Maybe the concept of site configurations would benefit from further consideration.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what you mean with duplicate functions. The method names (setup_cray(self)) and functions called (setup_cray())? I assume this is what you mean, and will rename the latter to setup_cray_script().

We could remove the setup_cray(self) method, but the advantage of having a method in between is that a site that has a few different compilers available, can overwrite these setup methods to do customisation (i.e. each compiler is updated in its own function). ATM, I mostly set up a site in update_toolbox (since it's only one compiler in what I have done so far), so if you setup three compilers that would make for a rather longish function.

Maybe I should change this and set up compiler by overwriting the setup_xx method.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of the user experience we want to have is that building is always achieved through the same procedure. Specifically, changing into a project directory and issuing a command. The same command. So please rename all build scripts to build. Note that we do not add the .py extension to executables.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yaswant , @t00sa , @Pierre-siddall , @cameronbateman-mo

I did not see this mentioning in any coding style. IMHO, the fafct that typical coding filename extensions are not used in LFRic is pretty ... confusing. You can tell from the extension which kind of script it is, it makes it easier for tools to do the proper highlighting (and avoids the issue that you try to view a binary :) ). I always find it confusing if I see a script and can't even be certain if it is a script in the first place. If this is indeed LFRic coding style, it should be documented to become explicit.

Additionally, esp. if you work on a base class which will affect several other derived classes, having individual names for each applications make it much easier to know where you are in your editor (since an editor tab will usually not have enough space to display the whole path). E.g. you add or modify a method in LFRicBase, and then open one derived class after another (and then need to go back to fix something else up). I often end up with half a dozen fab* scripts. If they were all called build, I would not be able to see which tab is which build script (am happy to share a screenshot of my editor).

Also, using build might give the wrong impression to users, since there is no indication that this script requires Fab to be installed. An alternative would be to use .fav as suffix, but that then also confuses editors :)

Furthermore, we are still experimenting about the best design of scripts. For example, I often use PSyclone's kernel extraction, which requires a new step to be run before PSyclone (and files to be added to the build system etc). ATM, I am using a mixin class to add the new step, and than for any apps that I want to use kernel extraction with, I just create a derived class form the Fab build of this class, add the mixin, and it all works. But I have two different build scripts then (e.g. fab_lfric_atm.py, and fab_lfric_atm_extract.py. Obviously, other designs are possible (use extraction as a command line option and have if-tests to decide in the script what steps to do). For now, I prefer the design that allows me to create a derived script (since kernel extraction is pretty niche, so I don't want other people to get confused by code that in all likelihood they will never need). Additionally, we also use this in lfric inputs, to have separate scripts that build part of the tools (and one script that builds all). Using OO design here makes it easy to avoid code duplication. But we obviously can't do that with just one name.

IMHO, the condition of 'just one name' feels like a left-over from using Makefiles :) We have the opportunity with Fab to allow to design code that's easier to understand and manage, and given that I don't see the 'one name' as an explicit requirement, I would argue that this is the better approach.

Feedback welcome.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is common practice to drop the extension for executables, for instance executable shell scripts do not usually have the .sh extension. It is not in any code style document because it is common practice. The extension is still used when the file is a module in a package. It doesn't cause confusion because from the user's point of view it isn't a Python (or anything else) file, it is an executable which they will execute.

Most editors I have come across which provide syntax highlighting will also interrogate the "shebang" line to inform highlighting mode. Any executable implemented using Python should have a shebang.

Yes, having multiple identically named files open can be confusing, but this will be an infrequent occurrence. Particularly after initial development. However, the key thing to bare in mind here is that this is user facing and so should favour ease of use for the user, even when that is at the expense of development ease.

The requirement of the Fab framework is a project level one and will be documented at that level. A try/except block around the import could be used to provide a fuller error message but the user will still get a "missing module" error without it.

If there is a requirement to support exotic usage such as kernel extraction then projects which require this can have a .py file somewhere within them, possibly a directory fab which can then be imported into the build script and anywhere else it is needed.

Rather than being a "left-over" we looked at the way Make has a consistent user experience and said "Yes! That's what we want for our new system too." Furthermore, it is a common approach among build, and related, tooling. See CMake or Ninja which both use a standard command and a commonly named configuration file. An advantage of this approach is to clearly signal to the user when they can build (they see the configuration file) and how to do it, the use the build command.

@hiker
Copy link
Author

hiker commented Feb 9, 2026

I note that a lot of requests made in the original review on SRS have not been addressed so for convenience I have repeated them here.

I note that all my comments and questions to your original reviews (as originally agreed a few months ago on MetOffice/lfric-baf#83) have not been addressed so for convenience I will repeat them here.

It will take me a while to get through all your comments, but I will start adding comments now, so ideally we can discuss some of the issue and reach an agreement while I still work on other comments.

@hiker
Copy link
Author

hiker commented Feb 11, 2026

@MatthewHambley , thanks a lot for you helpful review.
As noted in my comments, I don't understand (or agree) with a few comments.

Copy link
Collaborator

@MatthewHambley MatthewHambley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only a few things left to bottom out.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is common practice to drop the extension for executables, for instance executable shell scripts do not usually have the .sh extension. It is not in any code style document because it is common practice. The extension is still used when the file is a module in a package. It doesn't cause confusion because from the user's point of view it isn't a Python (or anything else) file, it is an executable which they will execute.

Most editors I have come across which provide syntax highlighting will also interrogate the "shebang" line to inform highlighting mode. Any executable implemented using Python should have a shebang.

Yes, having multiple identically named files open can be confusing, but this will be an infrequent occurrence. Particularly after initial development. However, the key thing to bare in mind here is that this is user facing and so should favour ease of use for the user, even when that is at the expense of development ease.

The requirement of the Fab framework is a project level one and will be documented at that level. A try/except block around the import could be used to provide a fuller error message but the user will still get a "missing module" error without it.

If there is a requirement to support exotic usage such as kernel extraction then projects which require this can have a .py file somewhere within them, possibly a directory fab which can then be imported into the build script and anywhere else it is needed.

Rather than being a "left-over" we looked at the way Make has a consistent user experience and said "Yes! That's what we want for our new system too." Furthermore, it is a common approach among build, and related, tooling. See CMake or Ninja which both use a standard command and a commonly named configuration file. An advantage of this approach is to clearly signal to the user when they can build (they see the configuration file) and how to do it, the use the build command.


:returns List[str]: list of all supported compiler profiles.
'''
return ["full-debug", "fast-debug", "production", "unit-tests"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is certainly a point, but also unit and integration tests don't have profiles. They always use the same set of arguments.

Comment on lines +97 to +107
def setup_cray(self, build_config: BuildConfig) -> None:
'''
This method sets up the Cray compiler and linker flags.
For now call an external function, since it is expected that
this configuration can be very lengthy (once we support
compiler modes).

:param build_config: the Fab build configuration instance
:type build_config: :py:class:`fab.BuildConfig`
'''
setup_cray(build_config, self.args)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The presence of these confusingly named duplicate functions is a smell. It suggests a problem with the design. Maybe the concept of site configurations would benefit from further consideration.

from templaterator import Templaterator


class LFRicBase(FabBase):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Zero configuration is an important feature of Fab but the fact that this class exists means it is not being used. This is "some configuration" mode.

"point precision.")

group.add_argument(
'--precision-default', type=str, default=None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this default actually mean in a world where the defaults are defined elsewhere and are not uniform. i.e. different precision bubbles default to different values.

Comment on lines 424 to 425
except ValueError:
pass
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was not clear from the code. Some comments to explain each attempt would help.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but they are in the wrong place too.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The build tooling is where it is due to it all being in Gung-ho in the early days. When the infrastructure was calved off, the build tooling went with it because it was not Gung-ho specific. It is also not infrastructure specific but has not been moved due to resourcing constraints.

Now we are moving to a new build system, we can put the generic build tooling where it belongs, at the top level. This process has been started by creating a directory for it and adding a module for functionality we knew we wanted.

There is no document because the restructuring of the core repository is still under discussion. We do know, however, that we want the generic build support at the top level. I include the most recent diagram of our thinking regarding the rest of the layout. This is liable to change in detail, but the broad brush strokes are correct.

Image

Given that the only change required is to move the files, very little effort has been wasted. The mechanism through which to we find out about these plans is through review.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

Comment on lines 392 to 397
def get_psyclone_config(self) -> List[str]:
'''
:returns: the command line options to pick the right
PSyclone config file.
'''
return ["--config", str(self._psyclone_config)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obviously I am not asking you to explain how OO works for every method. But when an otherwise pointless method exists, for the purpose of being over-ridden, that is worth documenting.

@hiker hiker requested a review from allynt as a code owner February 20, 2026 07:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla-signed The CLA has been signed as part of this PR - added by GA

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add Fab build script for Skeleton apps

2 participants