
feat: async run_experiment via RunHandle + cancellation + status widget #10

Open
hinderling wants to merge 7 commits into pertzlab:main from hinderling:feat/async-run-handle

Conversation


@hinderling hinderling commented May 15, 2026

Summary

Move the MDA feed loop onto a worker thread, expose live status through a RunHandle (psygnal Signal), and add a napari dock widget that mirrors + steers the current run. Replaces the synchronous-blocking run_experiment / continue_experiment API.

Draft: breaks the public API. Notebook updates required (see below) before merging. The async demo notebook included here is a test artifact — it must be removed before merge (see Demo notebook section).

Why

The controller's feed loop ran on the main thread, so:

  • napari froze for the duration of every run (no Qt-event processing).
  • run_experiment blocked the calling cell — no interactive monitoring / cancellation without Ctrl-C (which sometimes left device state half-set).
  • Status was opaque: "what timepoint are we on, are we lagging?" was unanswerable.
  • No clean way to cancel or pause a long run.

Moving the loop onto its own thread fixes all of these: napari is responsive by construction, the cell returns immediately, and cancellation / pause / live status become natural.

What changed

New: faro/core/run_status.py

  • RunStatus — immutable snapshot dataclass: state, current_event_index, current_fov, n_events_total, n_events_consumed, n_frames_received, started_at / finished_at, lag_ms, background_errors, fatal_error, …
  • RunHandle — owns the worker thread + cooperative cancel/pause events, carries the run's (sorted) event list. Methods: status(), wait(), cancel(), pause(), resume(), is_running(), is_paused(). Signal: statusChanged (psygnal) emitting the latest RunStatus.
  • RunState: pending → running ⇄ pausing/paused → done/error (cancelling on cancel).
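The RunStatus/RunHandle split above can be sketched roughly as follows. This is a minimal stdlib-only illustration of the cooperative cancel/pause mechanics, with field and method names taken from this PR's description; the real implementation (psygnal statusChanged signal, the full field set, the worker-thread wiring) will differ.

```python
# Hypothetical sketch of RunStatus + RunHandle (names from the PR text;
# not the actual faro implementation).
import threading
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class RunStatus:
    state: str = "pending"          # pending/running/pausing/paused/done/error
    n_events_total: int = 0
    n_events_consumed: int = 0
    n_frames_received: int = 0
    fatal_error: Optional[str] = None

class RunHandle:
    def __init__(self, n_events_total: int):
        self._status = RunStatus(n_events_total=n_events_total)
        self._lock = threading.Lock()
        self.cancel_event = threading.Event()
        self.pause_event = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def status(self) -> RunStatus:
        with self._lock:
            return self._status

    def _update(self, **changes) -> RunStatus:
        # Immutable snapshot semantics: each update replaces the dataclass.
        with self._lock:
            self._status = replace(self._status, **changes)
            return self._status     # real code would emit statusChanged here

    def cancel(self) -> None:
        self.pause_event.clear()    # a cancel while paused must release the loop
        self.cancel_event.set()

    def pause(self) -> None:
        self.pause_event.set()

    def resume(self) -> None:
        self.pause_event.clear()

    def is_running(self) -> bool:
        return self._thread is not None and self._thread.is_alive()

    def wait(self, timeout: Optional[float] = None) -> None:
        if self._thread is not None:
            self._thread.join(timeout)
```

Note how cancel() clears the pause event before setting the cancel event, matching the "cancelling on cancel" transition: a run cancelled while paused still unblocks the feed loop.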

faro/core/controller.py

  • Controller.runStarted = Signal(object) fires on each new run/continue carrying the fresh RunHandle.
  • run_experiment / continue_experiment spawn a worker thread and return the handle immediately; validation still runs synchronously on the caller. Events are sorted once and stashed on the handle so the widget renders them in execution order.
  • _run_worker centralises pre-flight setup and wraps the feed loop so failures land in handle.fatal_error instead of crashing the user.
  • _run_mda_with_events polls cancel_event and pause_event each iteration — pause halts feeding after the in-flight backpressure window drains; resume continues.
  • fix: the engine queue is recreated per run. A cancelled run aborts the engine mid-drain, leaving a stale STOP_EVENT behind; reusing the queue made the next run's engine consume that sentinel and stall after a few events ("stuck at 3/80").
  • fix: _bump_status_for_frame skips IMG_STIM snaps — a stim emission is the SLM-illuminated snap paired with its imaging frame; counting it double-updated lag/elapsed and drifted the frame count off the RTMEvent count.
  • napari preview: the controller no longer carries its own preview-layer machinery, and live mode no longer has to be manually disconnected before a run. napari-micromanager's own _NapariMDAHandler keeps routing frames into the preview layer throughout the run; the controller just stops continuous sequence acquisition once at MDA start to avoid a snap-buffer race. Notebooks can drop the old "break the CoreViewerLink before running" dance.
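The per-iteration cancel/pause polling described above can be illustrated with a stripped-down feed loop. This is a sketch under assumed names (feed_one, set_state are stand-ins); the real _run_mda_with_events also handles the backpressure window, engine cancellation, and status emission per dequeued RTMEvent.

```python
# Illustrative feed loop: poll cancel_event and pause_event each iteration.
# Names and structure are assumptions; the actual controller code differs.
import threading
import time

def feed_loop(events, feed_one, cancel_event: threading.Event,
              pause_event: threading.Event, set_state):
    for event in events:
        if cancel_event.is_set():
            set_state("cancelling")
            return "cancelled"
        if pause_event.is_set():
            # Pause halts *feeding*; the engine drains what is already queued.
            set_state("paused")
            while pause_event.is_set():
                if cancel_event.is_set():   # cancel while paused still exits
                    return "cancelled"
                time.sleep(0.05)
            set_state("running")
        feed_one(event)
    return "done"
```

A fresh queue per run (the "stuck at 3/80" fix) matters precisely because this loop can exit via the cancelled branch with events and the STOP_EVENT sentinel still sitting in the old queue.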

New: faro/widgets/experiment_status.py

ExperimentStatusWidget — a napari dock panel that mirrors and controls the current run:

  • State chip, legend (imaging / stim / ref).
  • Event strip — one cell per RTMEvent, color-coded by type, past=opaque / future=dimmed progress fill, current cell bordered. Scales to thousands of events.
  • FOV map — one dot per unique stage position, equal-aspect, visit-order path, active dot recolored to the current event type.
  • Stats — event N/M, elapsed, scheduled, lag (red > 5 s), remaining, errors.
  • Pause / Resume + Stop buttons.
  • Theme-adaptive (napari light/dark), auto-rebinds on every new run via runStarted.
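The "auto-rebinds on every new run via runStarted" behaviour boils down to dropping the previous handle's subscription before taking the new one. The sketch below shows that pattern with a tiny stand-in signal class instead of psygnal/Qt; the widget and handle attribute names are assumptions taken from this PR.

```python
# Minimal stand-in signal, used only to illustrate the rebind pattern
# (the real widget uses psygnal signals and Qt widgets).
class MiniSignal:
    def __init__(self):
        self._slots = []
    def connect(self, fn):
        self._slots.append(fn)
    def disconnect(self, fn):
        self._slots.remove(fn)
    def emit(self, *args):
        for fn in list(self._slots):
            fn(*args)

class StatusWidgetSketch:
    def __init__(self, controller):
        self._handle = None
        controller.runStarted.connect(self._on_run_started)

    def _on_run_started(self, handle):
        if self._handle is not None:
            # Clean up the previous run's subscription before rebinding,
            # so a finished handle can't keep driving the widget.
            self._handle.statusChanged.disconnect(self._refresh)
        self._handle = handle
        handle.statusChanged.connect(self._refresh)

    def _refresh(self, status):
        self.last_status = status   # real widget updates strip / map / stats
```

With this shape, continue_experiment "just works": each new handle emitted through runStarted replaces the old subscription, and stale emissions from a cancelled run no longer reach the UI.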

Async/Qt fixes folded in

  • PYMM_SIGNALS_BACKEND=psygnal forced in faro/microscope/base.py — with a QApplication loaded, pymmcore-plus otherwise picks the Qt signal backend and queues frameReady to the main thread; if the main thread is blocked (handle.wait()), frames never reach the controller. Forcing psygnal keeps the data path direct/synchronous on the engine thread.
  • Widget connects statusChanged with thread="main" + drives psygnal.qt.start_emitting_from_queue() so worker-thread emits reach QWidgets safely.
  • uv.lock: bumped pymmcore-widgets past an upstream fix (_presets_widget crashing on an empty device label during MDA events).
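The backend override is a one-liner, but ordering matters: the environment variable has to be set before pymmcore-plus is imported, or the Qt backend may already be selected. The widget-side calls are shown as comments because they need psygnal and a running QApplication; treat the exact call sites as assumptions from this PR.

```python
# Force the psygnal signal backend so frameReady stays direct/synchronous
# on the engine thread even when a QApplication exists. Must run before
# pymmcore-plus is imported (faro does this in faro/microscope/base.py).
import os
os.environ["PYMM_SIGNALS_BACKEND"] = "psygnal"

# Widget side (requires psygnal + Qt; shown for orientation only):
#   handle.statusChanged.connect(widget._refresh, thread="main")
#   from psygnal.qt import start_emitting_from_queue
#   start_emitting_from_queue()  # main-thread QTimer drains psygnal's queue
```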

BREAKING: notebook updates required

Before

ctrl.run_experiment(events, stim_mode="current")   # blocked here
ctrl.finish_experiment()

After — choose one:

(a) Blocking equivalent (smallest diff):

ctrl.run_experiment(events, stim_mode="current").wait()
ctrl.finish_experiment()

(b) Non-blocking, with status / cancel / pause:

handle = ctrl.run_experiment(events, stim_mode="current")
# other cells can run; handle.status() / handle.cancel() / handle.pause()
handle.wait()                  # block at the end if desired
ctrl.finish_experiment()

Optional napari widget:

from faro.widgets import ExperimentStatusWidget
viewer.window.add_dock_widget(ExperimentStatusWidget(ctrl), name="Experiment")

Demo notebook (test artifact — remove before merge)

experiments/02_demo_sim_optogenetic/demo_sim_optogenetic_napari_async.ipynb is included only to exercise this PR against the virtual-microscope optogenetic backend (async run, pause/resume, cancel/restart, the status widget, multi-FOV). It doubles as a worked example of what the migrated notebooks could look like. It should be deleted before this PR merges — the real deliverable is the API + widget, not this notebook.

What to check / test before merging

  • Every notebook in experiments/* that calls run_experiment / continue_experiment — migrate to .wait() or the non-blocking flow. Confirm none rely on the old blocking return.
  • Notebooks that manually tear down the napari live link / CoreViewerLink before a run — that workaround is no longer needed; verify removing it and that the preview layer keeps updating during the run.
  • tests/hardware/* — update for the new RunHandle return type; run on the Moench rig.
  • Multi-channel imaging: the widget's frame counter / strip cursor assume ~1 imaging frame per RTMEvent. For multi-channel plans n_frames_received outpaces the RTMEvent count — verify the strip/stats still read sensibly or gate the assumption.
  • continue_experiment + the widget: confirm the strip/map rebuild correctly for the appended events and the FOV map merges positions.
  • Headless / no-Qt runs (CI, non-microscope dev machine) — import faro stays Qt-free; .wait() path works without a QApplication.
  • Cancel-then-restart and pause/resume on real hardware (verified on the simulator; engine-abort semantics differ per device).
  • Bump the virtual-microscope lockfile pin (uv lock --upgrade-package virtual-microscope) to pick up the fixes now on its default branch (JIT pre-warm; SimCameraDevice digital ROI / MDA-teardown fix). Without this the demo notebook's first ~4 s of frames stall and the napari Snap preview freezes after a run. Commit the uv.lock change separately (it is not async/widget code).

Related (separate repo)

Two virtual-microscope fixes were needed for the demo notebook and have already landed on its default branch (virtual-env):

  • JIT pre-warm — pre-warms the numba physics-step JIT before the RealtimeEngine starts; otherwise the first ~4 s of snaps stall behind a compile holding the sim lock, so frames arrive in a burst instead of paced.
  • SimCameraDevice digital ROI — implements real ROI cropping. It also fixes an MDA-teardown bug: the camera previously raised NotImplementedError from set_roi, which aborted MDARunner._finish_run before it emitted sequenceFinished; napari-micromanager then never cleared _mda_running, so the Snap preview silently stopped updating after a run.

These are not part of this PR — faro just needs the lockfile bump above to pick them up.

Verification

Exercised end-to-end against the virtual-microscope optogenetic backend (napari + napari-micromanager + the widget):

  • Live status flows worker → widget on the main thread (psygnal queued delivery); strip / FOV map / stats update in real time.
  • Cancel mid-run, then restart from the notebook — reaches steady state, no stall.
  • Pause halts feeding after the backpressure window drains; resume runs to completion.
  • Frame count tracks RTMEvents 1:1 for single-channel plans; stim snaps no longer double-count.
  • 87 unit tests pass.

Compatibility notes

  • Headless / no Qt: works — psygnal delivers slots synchronously without Qt. Widget package is opt-in (import faro.widgets); import faro / import faro.core stay Qt-free.
  • MDA engines other than pymmcore-plus: no regression — the controller still talks to hardware exclusively through AbstractMicroscope.

Screenshot

Screenshot 2026-05-16 at 11 16 31 AM

hinderling and others added 7 commits May 16, 2026 11:35

Move the MDA feed loop onto a worker thread, expose live status through a
RunHandle + psygnal Signal, and add a minimal napari widget that mirrors
the current run.

Breaking change:
  ctrl.run_experiment(events, ...) and ctrl.continue_experiment(...) now
  return a RunHandle immediately instead of blocking until the run is
  done. Existing notebooks that did `ctrl.run_experiment(events, ...)`
  must be updated to either `handle = ctrl.run_experiment(events, ...);
  handle.wait()` for the old blocking semantics, or to use the new
  non-blocking flow (poll handle.status(), subscribe to
  handle.statusChanged, call handle.cancel() to stop early).

What's in this commit:

- faro/core/run_status.py (new):
  * RunStatus -- immutable snapshot dataclass with state, event/FOV
    indices, frame count, lag_ms, error info.
  * RunHandle -- owns the worker thread + cooperative cancel event,
    exposes status()/wait()/cancel()/is_running() + a psygnal
    statusChanged signal that emits the latest RunStatus on each update.
    Subscribers on the main thread see queued-connection delivery via
    psygnal's Qt integration.

- faro/core/controller.py:
  * Controller exposes a class-level runStarted = Signal(object). Fires
    on every new run/continue so widgets can re-bind.
  * run_experiment / continue_experiment spawn a worker thread, return
    the handle, emit runStarted. Validation still happens synchronously
    so a bad event list raises on the calling thread.
  * _run_worker centralises pre-flight setup (writer init -- including
    the potentially-slow zarr rmtree on overwrite -- and Analyzer
    construction) and wraps the feed loop in try/except so worker-side
    failures land in handle.fatal_error rather than crashing the user.
  * _run_mda_with_events accepts the handle, checks handle.cancel_event
    at each loop iteration and in the backpressure throttle, asks the
    engine to cancel the in-flight event when set, and emits status
    updates on each RTMEvent dequeue.
  * _on_frame_ready (and ControllerSimulated._on_frame_ready) call a
    shared _bump_status_for_frame helper that increments
    n_frames_received and computes lag_ms vs event.min_start_time.
  * Now off the main thread, all the prior Qt-pumping helpers
    (_pump_qt_and_sleep, _qt_join, _wait_for_frame_pumping_qt) and the
    superqt ensure_main_thread import are obsolete and removed. The
    preview-layer machinery (viewer=, _on_preview_frame, _apply_preview,
    PREVIEW_LAYER_NAME) is also removed -- napari-micromanager's own
    _NapariMDAHandler already routes generator events into the preview
    layer.
  * finish_experiment now waits for the current handle before shutting
    down the Analyzer.
  * _pending_sentinels guarded by a Lock since extend_experiment now
    runs on the calling thread while the feed loop runs on the worker.

- faro/widgets/experiment_status.py (new):
  * ExperimentStatusWidget -- read-out of state, FOV, event index,
    frame count, lag, elapsed time, error count. Has a Stop button
    that calls handle.cancel(). Subscribes to controller.runStarted
    so it automatically re-binds when a new run begins; cleans up the
    previous handle's signal subscription on each rebind.

Verified end-to-end via a Qt smoke test:
  - Live updates flow from the worker thread to the widget on the main
    thread (psygnal+Qt queued delivery).
  - Stop button triggers handle.cancel(); the worker's cancel-check
    fires within one iteration and the run exits at the next event
    boundary.
  - Starting a new run re-binds the widget to the new handle and resets
    the progress bar / counters.

The OmeZarrWriter init in _run_worker still pulled image height/width
via self._mic.mmc.getImageHeight/Width -- a pymmcore-plus-specific
call that breaks any non-pymmcore microscope.

Use the AbstractMicroscope-level convention: subclasses populate
self.image_height / self.image_width on the microscope instance (Moench
already does this in init_scope). Fall back to mmc if the attributes
aren't present but mmc is, so existing pymmcore-only microscopes keep
working without code changes. Raise a clear error when neither path is
available.

Three independent bugs surfaced when running the new async
run_experiment + ExperimentStatusWidget against a napari viewer
(reproduced with the optogenetic virtual_microscope backend):

1. pymmcore-plus's signals_backend() auto-selects the *qt* backend
   whenever a QApplication is loaded. core.mda.events.frameReady then
   becomes a QtCore.SignalInstance and cross-thread emits land in
   Qt.QueuedConnection, where they're delivered only when the main
   thread pumps events. With Controller.run_experiment now spawning a
   worker and RunHandle.wait() joining on it, the main thread is
   typically idle-blocked exactly when the engine is firing frames --
   so the controller's _on_frame_ready never ran, the engine completed
   "successfully" with zero frames received, and the pipeline never
   saw any data. Force PYMM_SIGNALS_BACKEND=psygnal in
   faro/microscope/base.py so the data path stays direct/synchronous
   on the engine thread regardless of whether Qt is loaded. The
   widget-side path (RunHandle.statusChanged) still uses psygnal's
   own queued delivery -- see fix #2.

2. ExperimentStatusWidget connected handle.statusChanged with the
   default (direct) connection. Status updates emitted from the worker
   thread therefore ran the widget's _refresh slot synchronously
   off-main, calling QLabel.setText / QProgressBar.setValue from a
   non-GUI thread. Under napari that lands in vispy's OpenGL
   compositor and aborts with "Cannot make QOpenGLContext current in
   a different thread" -> SIGABRT (kernel hard-crash in VSCode
   Jupyter). Switch to connect(..., thread="main") so psygnal queues
   the call into its main-thread queue.

3. psygnal's queued callbacks live in QueuedCallback._GLOBAL_QUEUE,
   which nothing drains by default -- the widget would be invoked on
   the main thread, but only when something explicitly calls
   psygnal.emit_queued(). RunHandle's docstring claims auto-Qt
   delivery; that's not how psygnal actually works. Call
   psygnal.qt.start_emitting_from_queue() in the widget's __init__,
   which installs a main-thread QTimer that fires emit_queued() on
   every Qt event-loop tick. Idempotent and global, so multiple
   widgets / multiple runs are safe.

Lockfile: bump pymmcore-widgets (8c8f76e -> 48ff414) so the unrelated
upstream crash in pymmcore_widgets._presets_widget._on_property_changed
when handed an empty device label (virtual_microscope's shutter)
is included. Without that bump, the MDA engine itself aborts on the
first setShutterOpen() once frames actually start flowing.

Verified end-to-end against virtual_microscope's optogenetic backend:
- headless async run: 5/5 frames (regression check, unchanged)
- napari.Viewer() + handle.wait():     5/5 frames (was 0/5)
- napari + napari-micromanager + widget: 5/5 frames, no crash, exit 0
- widget visibly updates progress / frames / state mid-experiment
  (sampled QLabel.text() while pumping Qt events)
- 87 unit tests still pass

Sibling of demo_sim_optogenetic.ipynb that exercises the new async
run_experiment + RunHandle + ExperimentStatusWidget end-to-end against
virtual_microscope's optogenetic backend, with a live napari viewer
dock-attached.

Walks through: handle = ctrl.run_experiment(...) is non-blocking, the
kernel is free; poll handle.status() while it runs; subscribe to
handle.statusChanged from the kernel side; cancel via the widget Stop
button or handle.cancel(); handle.wait() blocks if you want the
old synchronous semantics; continue_experiment() re-binds the widget
automatically via runStarted.

Phases are concatenated with combine(..., axis="t") per the new
RTMSequence API.

Backend changes that make an async run inspectable and steerable --
the data the new ExperimentStatusWidget renders, plus two bug fixes
surfaced while building it.

run_status.py
  - RunHandle.events: optional snapshot of the (sorted) RTMEvents the
    handle is driving, so widgets can render per-event visualisations
    (event strip, FOV map) that need the full plan up front.
  - Pause/resume: RunState gains "pausing"/"paused"; RunHandle gains
    pause()/resume()/is_paused() and a pause_event the feed loop polls.
    cancel() now also clears the pause event so a cancel while paused
    still releases the feed loop.

controller.py
  - run_experiment / continue_experiment sort events once (by
    min_start_time, then position) and stash the sorted list on the
    handle, so the order the worker processes matches what the widget
    displays.
  - Feed loop honors pause_event: before pulling the next RTMEvent it
    checks the flag, flips state to "paused", and idles until resume()
    -- the MDA engine drains whatever is already queued, then waits.
  - fix: the engine queue (self._queue) is recreated per run. The
    finally-block feeds a STOP_EVENT sentinel to stop the engine; on a
    *cancelled* run cancel_mda() aborts the engine, which may stop
    without draining the queue, leaving stale events + the sentinel
    behind. Reusing that queue made the next run's engine consume the
    stale sentinel and exit after a few events ("stuck at 3/80"). A
    fresh queue per run fixes it.
  - fix: _bump_status_for_frame skips IMG_STIM frames. A stim emission
    is the SLM-illuminated snap paired with its imaging frame; counting
    it double-updated the status (lag/elapsed refreshing twice per stim
    event) and made n_frames_received drift away from the RTMEvent
    count. Imaging + ref frames are the meaningful data frames.

Verified end-to-end against the optogenetic virtual-microscope backend:
cancel mid-run then restart reaches steady state (no stall); pause
halts feeding after the backpressure window drains and resume continues
to completion; frame count tracks RTMEvents 1:1 for single-channel plans.

Rework the minimal status widget into a full run dashboard, driven by
the RunHandle data exposed in the previous commit.

Components (top to bottom):
  - State chip -- RUNNING / PAUSED / DONE / ... as plain text in a
    translucent-neutral rounded chip (no per-state fill: a colored
    banner competed with the imaging/stim/ref legend colors).
  - Legend chips -- imaging / stim / ref; the chip matching the current
    event type is fully opaque, the others dimmed.
  - EventStrip -- one cell per RTMEvent, color-coded by type. Past +
    current cells opaque (progress fill), future cells dimmed. Same-type
    runs are coalesced into single fills so thousands of events render
    with correct alpha instead of over-stacking at sub-pixel widths.
    Empty state draws a "(no events loaded)" placeholder.
  - FovMap -- one dot per unique FOV position, equal-aspect (a straight
    line of FOVs stays a line), grey visit-order path, active dot
    recolored to the current event type. Pinned square via resizeEvent.
    Paints its own rounded panel background; "FOV X/Y" counter in the
    corner.
  - Stats form -- event N/M, elapsed, scheduled, lag, remaining, errors.
    Times formatted hh:mm:ss with the leading unit suffixed and dropped
    when zero; lag turns red past 5 s. Wrapped in a shaded panel echoing
    napari's layer-controls boxes.
  - Pause/Resume + Stop buttons.

Threading / theming details:
  - statusChanged is connected with thread="main" and the widget calls
    psygnal.qt.start_emitting_from_queue() so worker-thread emits are
    delivered on the GUI thread (drives QWidgets safely under napari).
  - A 250 ms QTimer ticks the elapsed/remaining clocks between status
    emissions so time fields don't freeze between frames.
  - The strip cursor tracks n_frames_received (actual snaps), not
    n_events_consumed (the feed loop runs 3-4 ahead via backpressure,
    which made the strip jump several cells at run start).
  - Colors/fonts derive from the Qt palette so the widget adapts to
    napari's light/dark theme; corner radii match napari widgets.

Add a second stage position (20, 20, 0) to the baseline / stim /
recovery sequences so the demo exercises a 2-FOV acquisition -- the
ExperimentStatusWidget's FOV map then shows both positions and the
visit-order path between them. Drop the frame interval 1.5s -> 1s.

@hinderling hinderling force-pushed the feat/async-run-handle branch from d473b9b to 3c0e798 Compare May 16, 2026 09:53
@hinderling hinderling marked this pull request as ready for review May 16, 2026 11:29
@hinderling

@alandolt can you have a look and see whether you spot any general issues with this architecture change? Still a few open TODOs before merging, but the main idea is there, I think. It would be great to have your input before I start migrating the other notebooks etc. I think this will also be useful longer-term, e.g. running experiments on different microscopes simultaneously with BO, in combination with pymmcore-proxy.

