Work Queue: Hardware-level isolation between tasks? #4370
Replies: 10 comments
-
Thanks for getting in touch to talk it over. The general assumptions of WQ are these:

So I think we can break this down into two distinct problems:

TaskVine does have a number of advantages over WQ, but in this aspect of resource assignment it works the same. So let's address the problem here in WQ first, and then we can port the same solution over to TaskVine. Based on your discussion so far, it seems to me the problem is #1: WQ is telling each task how many cores to use, but not which ones. If we could figure out some improvement using (for example)
-
Hi! I think your assessment is correct, and whether tasks respect their 'resource allocation' should be up to the user, not WQ. Ideally, a task could find out which resources WQ reserved for it (in its environment), and from there restrict itself to those resources in some way (taskset, cgroups, what have you).

I do not know how the WQ worker manages the tasks it schedules. Does it internally assign e.g. cores to a task, or does it simply keep track of the total 'cores in use' across all running tasks? In the first case, it might be possible to pipe this reservation to the task quite straightforwardly.

As mentioned in my first post, we usually run in a SLURM environment.

Small disclaimer: this really is not familiar territory for me, so I might be complicating things unnecessarily. Feel free to think outside my poorly informed box.
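To make the "restrict itself in some way" step concrete, here is a minimal task-side sketch. It assumes WQ exposed the reservation under a hypothetical environment variable `WQ_ALLOCATED_CORES` (no such variable exists today), and uses `os.sched_setaffinity`, which is Linux-only:

```python
import os

def restrict_to_assigned_cores(env_var="WQ_ALLOCATED_CORES"):
    """Pin this process (and its future children) to the cores listed
    in env_var, e.g. "0,1,4,5". Returns the core set, or None if the
    variable is absent. NOTE: WQ_ALLOCATED_CORES is hypothetical."""
    raw = os.environ.get(env_var)
    if raw is None:
        return None
    cores = {int(c) for c in raw.split(",") if c.strip()}
    # Linux-only; pid 0 means "the calling process".
    os.sched_setaffinity(0, cores)
    return cores
```

A task could call this once at startup, before spawning any worker threads, so the affinity mask is inherited everywhere.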
-
(FYI, I'm moving this over to GitHub Discussions; I will start an issue once we have a firm idea of what's needed.)
-
At the moment, each worker is assigned a fungible number of cores.

But as you have pointed out, the mechanisms for this seem to vary a lot across operating systems, batch systems, sites, etc. But for your case:

1- Modify the application to insert the preamble in the place where it defines tasks. (Although this might be quite difficult given the whole software stack.)

I'm actually leaning towards 3, because the worker (and factory) may already know that they are running in SLURM, whereas the application may be agnostic. What do you think?
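As an illustration of what such a preamble could do, here is a sketch that pins a task by prefixing its command line with `taskset` (from util-linux). It only builds the command; the core list would have to come from the worker's own bookkeeping:

```python
def taskset_preamble(cores, task_command):
    """Prefix task_command with a taskset invocation so the kernel
    enforces the core mask; the task itself needs no modification."""
    cpu_list = ",".join(str(c) for c in sorted(cores))
    return ["taskset", "--cpu-list", cpu_list] + list(task_command)
```

For example, `taskset_preamble({4, 5, 6, 7}, ["python", "sim.py"])` yields a command line confined to cores 4-7, without touching the application code.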
-
Generally speaking, our tasks (solving Schrödinger, running molecular dynamics) are long-running processes, so overhead should never be a concern (for us). If it is not a complex change to insert which resources (cores, GPUs) WQ assigns to a task into the task environment, then that already seems like a solid solution. From there, users could still choose to enforce or ignore this assignment in their task definition. I think this approach corresponds to your suggestion 1 and should be possible within our framework of Parsl.

A possible caveat: the optimal subset of resources for a task (e.g., which 8 out of 32 cores maximise cache locality, minimise memory bandwidth congestion, etc.) is difficult to specify without detailed knowledge of the CPU infrastructure. The WQ worker should probably not concern itself with such low-level optimisations; that feels like scheduler territory.

Overall, I think this should be opt-in behaviour.
-
(Gentle bump) @dthain, what is your opinion on this? Perhaps we should ask the Parsl devs for their views too?
-
My apologies, I mistakenly thought you had reached a conclusion. Given that this is opt-in, site-specific behavior, we should rely on general command insertion rather than building up a whole new capability.

In the meantime, I was poking at the capability to insert a command from the worker side of things, to handle cases where the person deploying is not necessarily the same as the application author. This PR implements that capability.
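To sketch what such worker-side command insertion looks like from the wrapper's perspective: the wrapper receives the original task command as its arguments, adjusts the environment, and then replaces itself with the task. The wrapper itself is hypothetical; `SLURM_CPUS_PER_TASK` is a standard SLURM variable:

```python
import os
import sys

def task_environment(base_env):
    """Copy base_env, capping OpenMP threads to the SLURM allocation
    when SLURM_CPUS_PER_TASK is present and the user has not already
    set OMP_NUM_THREADS."""
    env = dict(base_env)
    n = env.get("SLURM_CPUS_PER_TASK")
    if n and "OMP_NUM_THREADS" not in env:
        env["OMP_NUM_THREADS"] = n
    return env

def run_wrapped(task):
    """Replace this process with the task (exec, not spawn), so
    signals and exit codes pass straight through to the worker."""
    if not task:
        sys.exit("usage: wrapper <task command> [args...]")
    os.environ.update(task_environment(os.environ))
    os.execvp(task[0], task)
```

The exec pattern matters here: the worker still sees the task's own exit status, so accounting and retries behave as if no wrapper were present.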
-
I would prefer for this functionality not to hinge upon SLURM being available (because it will not always be). If WQ could expose environment variables, similar to how it works for GPUs, then I imagine a simple wrapper already goes a long way towards fixing the problem.

Tagging @benclifford for his view on how we could insert this functionality into the
-
We can definitely add the capability to expose specific core assignments to tasks, and I will add that to the short-term queue here. I think there are so many potential configurations that I am reluctant to hard-code how those assignments are enforced, at least yet. (I'm thinking about different batch systems, containers, topologies, nested workers, OMP, MKL, etc.)

If you are willing to deploy the
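Once a concrete core assignment is exposed, the task side can at least cap the common threading runtimes without the worker hard-coding any enforcement. A small sketch; the variable names below are the real knobs for OpenMP, Intel MKL and OpenBLAS, while deriving them from an exposed core list is the proposal being discussed:

```python
def thread_limit_env(cores):
    """Map a concrete core assignment onto the environment variables
    honoured by the common threading runtimes."""
    n = str(len(cores))
    return {
        "OMP_NUM_THREADS": n,       # OpenMP runtimes
        "MKL_NUM_THREADS": n,       # Intel MKL
        "OPENBLAS_NUM_THREADS": n,  # OpenBLAS
    }
```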
-
Great! That would really help us out. I agree that a one-size-fits-all solution is unlikely given the variety of execution environments. To me, it's preferable for users to explicitly specify how they enforce resource assignments (i.e., through some transparent plugin/wrapper setup), rather than it being hidden away in complex code wizardry. I will definitely try it out for our setup and report back to you.
-
TL;DR
Can we isolate computational resources to individual tasks managed by a single worker in WorkQueue?
We are using Parsl with WQ to build molecular modelling workflows (psiflow). The WorkQueueExecutor is convenient because it can schedule differently sized tasks together in a single resource block (i.e., an HPC job allocation). However, it does not force those tasks to use different resources (#3886), leading to thread contention issues or potentially more severe oversubscription problems. WQ does set several environment variables (OMP_NUM_THREADS, ...), but those do not specify 'allocated cores' or similar.

Some possible ideas:

- srun wrappers. Obviously, that will only work in SLURM environments and becomes more complex when running in containers.

Any insight would be greatly appreciated.
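As a sketch of the srun-wrapper idea: a per-task prefix could confine each task to a slice of the allocation. The flags used (`--ntasks`, `--cpus-per-task`, `--exact`) are standard srun options, though whether `--exact` yields the desired isolation depends on the site's SLURM configuration:

```python
def srun_wrapper(task_command, n_cores):
    """Run task_command as a single SLURM step confined to n_cores
    of the current allocation."""
    return ["srun", "--ntasks=1", f"--cpus-per-task={n_cores}",
            "--exact"] + list(task_command)
```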