JobTypes currently describe how applications are called, depending on the underlying (software) layers used to run them.
The layers we have identified (based on our experience) are:
- Workload manager (e.g. Slurm)
- Process manager (e.g. OpenMPI)
- Container technology (e.g. Singularity)
- Software call (e.g. echo hola)
Currently, workload manager types like Slurm or Torque describe how a batch script is submitted. In addition, a SingularityJob type describes how to call a Singularity container inside the workload manager. In particular, a SingularityJob always uses mpirun as the process manager, but I think this is not always necessary (e.g. a sequential Singularity job). Finally, the particular software call is expressed through a command job option.
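For reference, the current style bundles everything into the singularity job itself. A rough sketch of such a declaration, based only on the options mentioned in this issue (the exact current schema may differ), could look like this:

```yaml
# Sketch of the current style (not the exact schema): the software call is a
# job option of the singularity job, and mpirun is implied by the type itself.
current_job:
  type: hpc.nodes.SingularityJob
  properties:
    job_options:
      modules:
        - singularity
      command: 'flow inputfile outputfile'
  relationships:
    - type: job_managed_by_wm   # assumed current relationship name (see comment in the example below)
      target: computational_resources
```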
In my opinion, the top (workload manager) and bottom (software call) layers of the hierarchy are mandatory, while the other layers could be optional and interrelated, in order to express (in a more flexible way) the requirements and execution mode of every application. As a suggestion, a contained_in relationship could express this kind of hierarchical structure.
As a (simplified) example, the following options could be acceptable:
```
srun command
srun [mpirun] command
srun [singularity] command
srun [mpirun] [singularity] command
```
This flexibility could be expressed by a blueprint like the following example:
```yaml
computational_resources:
  type: hpc.nodes.WorkloadManager
  properties:
    ...

mpi:
  type: hpc.nodes.MPIJob
  properties:
    job_options:
      modules:
        - openmpi
      ...
      flags:
        - '--mca orte_tmpdir_base /tmp '
        - '--mca pmix_server_usock_connections 1'
  relationships:
    - type: contained_in  # job_managed_by_wm
      target: computational_resources

container:
  type: hpc.nodes.SingularityJob
  properties:
    job_options:
      modules:
        - singularity
      ...
  relationships:
    - type: contained_in
      target: mpi

software:
  type: hpc.nodes.ShellJob
  properties:
    job_options:
      command: 'flow inputfile outputfile'
  relationships:
    - type: contained_in
      target: container
```
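Following this idea, the simplest case (srun command) would keep only the two mandatory layers, with the software call contained directly in the workload manager. A minimal sketch reusing the same proposed node types:

```yaml
# Minimal case: srun command (no process manager, no container)
computational_resources:
  type: hpc.nodes.WorkloadManager
  properties:
    ...

software:
  type: hpc.nodes.ShellJob
  properties:
    job_options:
      command: 'flow inputfile outputfile'
  relationships:
    - type: contained_in
      target: computational_resources   # software call sits directly under the workload manager
```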
Even if we consider other container solutions (e.g. Docker), or single-node parallel jobs, we could interchange the MPI and container contained_in relationships, avoiding in some cases the need to match the MPI vendor and version inside and outside the container:
srun [singularity | docker] [mpirun] command
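Reusing the nodes from the blueprint above, that ordering could be obtained just by swapping the contained_in targets. A sketch under the same proposed types (only the relevant parts shown):

```yaml
# Swapped hierarchy: srun [singularity] [mpirun] command
container:
  type: hpc.nodes.SingularityJob
  relationships:
    - type: contained_in
      target: computational_resources   # container now directly under the workload manager

mpi:
  type: hpc.nodes.MPIJob
  relationships:
    - type: contained_in
      target: container                 # mpirun runs inside the container

software:
  type: hpc.nodes.ShellJob
  relationships:
    - type: contained_in
      target: mpi
```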
To think