fix_worker_issue #570

Merged
Nicholas-Schaub merged 17 commits into PolusAI:master from hamshkhawar:fix_worker_omeconverter on Mar 6, 2026

Conversation

@hamshkhawar
Member

Toil throws an error if the minimum number of CPU workers is more than one, so this is a fix for running jobs with Toil.

ndonyapour added a commit to ndonyapour/image-tools that referenced this pull request Feb 27, 2026
Contributor

@Nicholas-Schaub left a comment

  1. It looks like we removed preadator. I think that's okay, but let's not lose sight of what preadator was doing: it allowed multiple images to be processed in parallel using multiple processes, with each BioReader/BioWriter pair using multiple threads. In this code, there seems to be no distinction between the two.
  2. It's unclear to me why we try to do a full direct read and write of a file and then, if there's an error, fall back to a tiled read. Even when we do a tiled read, we still load the full image into memory by storing it in "final_image", which defeats the purpose of doing tiled reads and writes. Then we don't even do a tiled write.

I realize this change was made somewhere along the way, but we cannot lose one of the primary reasons this tool exists: a scalable way to convert images to OME format that can run on more or less any hardware.
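The two-level model described above (a pool of worker processes across images, each conversion using several threads internally) can be sketched roughly as follows; NUM_WORKERS, NUM_THREADS, and convert_one are illustrative names, not the tool's actual API:

```python
# Sketch only: process-level parallelism across images, thread-level
# parallelism inside each conversion. Names here are illustrative.
import os
from concurrent.futures import ProcessPoolExecutor, as_completed

NUM_WORKERS = max(1, (os.cpu_count() or 2) // 2)  # separate processes
NUM_THREADS = 2                                   # threads per process

def convert_one(path: str) -> str:
    # The real worker would open BioReader(path, max_workers=NUM_THREADS)
    # and stream tiles to a BioWriter; here it just tags the input.
    return f"{path}:converted"

def convert_all(paths: list) -> list:
    # One task per image, bounded by NUM_WORKERS processes.
    with ProcessPoolExecutor(max_workers=NUM_WORKERS) as executor:
        futures = [executor.submit(convert_one, p) for p in paths]
        return [f.result() for f in as_completed(futures)]
```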

```diff
-) as executor:
-    threads = []
+# Use ProcessPoolExecutor for multiprocessing
+with ProcessPoolExecutor(max_workers=NUM_THREADS) as executor:
```
Contributor

We should make sure to draw a distinction between the number of threads and the number of processes.

Member Author

I added separate ENV variables.


```python
if platform.startswith("linux"):
    NUM_THREADS = len(os.sched_getaffinity(0)) // 2  # type: ignore
if not NUM_THREADS_ENV or NUM_THREADS_ENV == "1":
```
Contributor

I'm not sure I understand this. If we define the number of threads in the environment as 1, we want to override it?

Member Author

I have redefined it. It wasn't overriding the ENV variable; it was only setting it to half of the CPU cores when not defined.
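A minimal sketch of that precedence, assuming an env var named NUM_THREADS (the exact variable names in the tool may differ): an explicitly set value always wins, and the half-the-cores default applies only when it is unset.

```python
# Sketch only: an explicitly set env var always wins; the half-the-cores
# default applies only when the variable is unset or empty.
import os

def resolve_num_threads(env: dict) -> int:
    value = env.get("NUM_THREADS")
    if value:  # user set it, even to "1": respect it
        return int(value)
    try:
        cores = len(os.sched_getaffinity(0))  # Linux: cores usable by us
    except AttributeError:
        cores = os.cpu_count() or 2           # macOS/Windows fallback
    return max(1, cores // 2)
```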

```python
if br.Z > 1:
    suffix_parts.append(f"_z{z}")
if num_series > 1:
    suffix_parts.append(f"_level_{idx}")
```
Contributor

This seems to break with how we define other components. For example, "z{z}" instead of "z_{z}".

Should we maybe do "s{idx}"?

Member Author

Each series is saved independently; I renamed "level" to "s".
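The convention settled on here ("_z{z}" only when there are multiple z-slices, "_s{idx}" only when there are multiple series) could look like this; the function and its arguments are illustrative, not the tool's exact code:

```python
# Sketch only: append a suffix part only when that dimension is ambiguous.
def build_suffix(z: int, num_z: int, idx: int, num_series: int) -> str:
    parts = []
    if num_z > 1:
        parts.append(f"_z{z}")
    if num_series > 1:
        parts.append(f"_s{idx}")
    return "".join(parts)
```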

"""Process a single view (t,c,z) from a BioReader."""
try:
# Try direct read first
final_image = br[:, :, z, c, t]
Contributor

This is a complete rewrite of our scalable read/write algorithm. We need to restore our previous method because it was designed for scalability.
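The scalable pattern being asked for is a tile-at-a-time copy, so only one tile is ever resident in memory. A sketch with numpy arrays standing in for the BioReader/BioWriter slicing (the tile size and function name are illustrative; bfio commonly uses a 1024-pixel tile edge):

```python
# Sketch only: copy a large 2D plane tile by tile, never materializing
# the whole image. numpy arrays stand in for BioReader/BioWriter here.
import numpy as np

TILE = 1024  # typical tile edge

def copy_tiled(src, dst, tile=TILE):
    height, width = src.shape[:2]
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            y2, x2 = min(y + tile, height), min(x + tile, width)
            dst[y:y2, x:x2] = src[y:y2, x:x2]  # one tile resident at a time
```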

```python
),
try:
    # Explicitly set the series/level parameter when opening the file
    with BioReader(inp_image, max_workers=NUM_THREADS, level=idx) as br:
```
Contributor

Remember, workers here is actually threads and not processes.

Member Author

I now have two separate ENV variables: NUM_WORKERS defines the number of separate processes, and NUM_THREADS defines how many threads each process uses internally.

Comment on lines +199 to +205
```python
write_image(
    br=br,
    c=c,
    image=final_image,
    out_path=out_path,
    max_workers=NUM_THREADS,
)
```
Contributor

Same as above. Reading and writing a whole plane is not viable for certain image types like whole brain slices, pathology, etc.

Member Author

fixed

Comment on lines 250 to 256

```diff
@@ -141,6 +254,14 @@ def convert_image(
     out_path=out_path,
     max_workers=NUM_THREADS,
 )
```
Contributor

I realize this isn't your code, but we shouldn't be trying to store the entire image output in memory when we may have memory bound nodes.

Member Author

I am now writing each tile separately

Comment on lines +290 to +300

```diff
 with ProcessPoolExecutor(max_workers=NUM_THREADS) as executor:
     futures = []
     for files in fps():
         file = files[1][0]
-        threads.append(
-            executor.submit(convert_image, file, file_extension, out_dir),
-        )
+        futures.append(executor.submit(convert_image, file, POLUS_IMG_EXT, out_dir))

 for f in tqdm(
-    as_completed(threads),
-    total=len(threads),
+    as_completed(futures),
+    total=len(futures),
     mininterval=5,
-    desc=f"converting images to {file_extension}",
+    desc=f"converting images to {POLUS_IMG_EXT}",
```
Contributor

This code seems redundant with what's in the main typer function. It seems like we should maybe remove the code in the main typer function and just call "batch_convert"

Member Author

fixed
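The deduplication suggested above might look like the following, where the CLI entrypoint reduces to argument handling plus one call; batch_convert's name and signature are assumed here, not taken from the tool:

```python
# Sketch only: one home for the conversion loop; the CLI just delegates.
from pathlib import Path

def batch_convert(files: list, file_extension: str, out_dir: Path) -> list:
    # In the real tool this would submit convert_image to a
    # ProcessPoolExecutor; here it only computes the output paths.
    return [out_dir / (f.stem + file_extension) for f in files]

def main(inp_dir: Path, file_extension: str, out_dir: Path) -> list:
    # The typer-decorated entrypoint would reduce to this single call.
    return batch_convert(sorted(inp_dir.glob("*")), file_extension, out_dir)
```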

@hamshkhawar force-pushed the fix_worker_omeconverter branch from 55e95ee to 745b2c2 on March 2, 2026 at 17:30
Member Author

@hamshkhawar left a comment

@Nicholas-Schaub
I’ve addressed the comments and updated the dependencies and Python version. It appears some dependencies (e.g., Typer and others) require Python 3.10 and are not installed under Python 3.9 in GitHub Actions, which is causing the failures. We likely need to update the Python version used in the GitHub Actions workflow.


@Nicholas-Schaub Nicholas-Schaub merged commit c36895b into PolusAI:master Mar 6, 2026
3 of 4 checks passed