feat: async run_experiment via RunHandle + cancellation + status widget #10
Open
hinderling wants to merge 7 commits into

Conversation
Move the MDA feed loop onto a worker thread, expose live status through a
RunHandle + psygnal Signal, and add a minimal napari widget that mirrors
the current run.
Breaking change:
ctrl.run_experiment(events, ...) and ctrl.continue_experiment(...) now
return a RunHandle immediately instead of blocking until the run is
done. Existing notebooks that did `ctrl.run_experiment(events, ...)`
must be updated to either `handle = ctrl.run_experiment(events, ...);
handle.wait()` for the old blocking semantics, or to use the new
non-blocking flow (poll handle.status(), subscribe to
handle.statusChanged, call handle.cancel() to stop early).
What's in this commit:
- faro/core/run_status.py (new):
* RunStatus -- immutable snapshot dataclass with state, event/FOV
indices, frame count, lag_ms, error info.
* RunHandle -- owns the worker thread + cooperative cancel event,
exposes status()/wait()/cancel()/is_running() + a psygnal
statusChanged signal that emits the latest RunStatus on each update.
Subscribers on the main thread see queued-connection delivery via
psygnal's Qt integration.
- faro/core/controller.py:
* Controller exposes a class-level runStarted = Signal(object). Fires
on every new run/continue so widgets can re-bind.
* run_experiment / continue_experiment spawn a worker thread, return
the handle, emit runStarted. Validation still happens synchronously
so a bad event list raises on the calling thread.
* _run_worker centralises pre-flight setup (writer init -- including
the potentially-slow zarr rmtree on overwrite -- and Analyzer
construction) and wraps the feed loop in try/except so worker-side
failures land in handle.fatal_error rather than crashing the user.
* _run_mda_with_events accepts the handle, checks handle.cancel_event
at each loop iteration and in the backpressure throttle, asks the
engine to cancel the in-flight event when set, and emits status
updates on each RTMEvent dequeue.
* _on_frame_ready (and ControllerSimulated._on_frame_ready) call a
shared _bump_status_for_frame helper that increments
n_frames_received and computes lag_ms vs event.min_start_time.
* With the feed loop off the main thread, the prior Qt-pumping helpers
(_pump_qt_and_sleep, _qt_join, _wait_for_frame_pumping_qt) and the
superqt ensure_main_thread import are obsolete and removed. The
preview-layer machinery (viewer=, _on_preview_frame, _apply_preview,
PREVIEW_LAYER_NAME) is also removed -- napari-micromanager's own
_NapariMDAHandler already routes generator events into the preview
layer.
* finish_experiment now waits for the current handle before shutting
down the Analyzer.
* _pending_sentinels guarded by a Lock since extend_experiment now
runs on the calling thread while the feed loop runs on the worker.
- faro/widgets/experiment_status.py (new):
* ExperimentStatusWidget -- read-out of state, FOV, event index,
frame count, lag, elapsed time, error count. Has a Stop button
that calls handle.cancel(). Subscribes to controller.runStarted
so it automatically re-binds when a new run begins; cleans up the
previous handle's signal subscription on each rebind.
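The RunStatus/RunHandle shape described above can be sketched with stdlib primitives only. Field and method names follow the bullet list; a plain callback list stands in for the psygnal statusChanged signal (which in the real code queues delivery to the main thread), and feed_loop is a hypothetical worker body:

```python
from __future__ import annotations

import threading
import time
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RunStatus:
    state: str = "pending"          # pending / running / cancelling / done / error
    n_frames_received: int = 0
    fatal_error: str | None = None


class RunHandle:
    def __init__(self, target) -> None:
        self.cancel_event = threading.Event()   # cooperative cancel flag
        self._status = RunStatus()
        self._lock = threading.Lock()
        self._callbacks: list = []              # stand-in for statusChanged
        self._thread = threading.Thread(target=target, args=(self,), daemon=True)

    def start(self) -> None:
        self._set_status(replace(self.status(), state="running"))
        self._thread.start()

    def _set_status(self, status: RunStatus) -> None:
        with self._lock:
            self._status = status
        for cb in list(self._callbacks):        # psygnal would queue these
            cb(status)

    def status(self) -> RunStatus:
        with self._lock:
            return self._status

    def wait(self, timeout: float | None = None) -> None:
        self._thread.join(timeout)

    def cancel(self) -> None:
        self._set_status(replace(self.status(), state="cancelling"))
        self.cancel_event.set()

    def is_running(self) -> bool:
        return self._thread.is_alive()


def feed_loop(handle: RunHandle) -> None:
    # Worker-side loop: bump the frame count until the cancel flag is seen.
    frames = 0
    while not handle.cancel_event.is_set():
        frames += 1
        handle._set_status(replace(handle.status(), n_frames_received=frames))
        time.sleep(0.001)
    handle._set_status(replace(handle.status(), state="done"))


handle = RunHandle(feed_loop)
handle.start()
time.sleep(0.02)
handle.cancel()
handle.wait()
```

The real RunHandle additionally carries event/FOV indices, lag_ms, and error info on the snapshot; the sketch keeps only enough state to show the thread-safe snapshot + cooperative cancel pattern.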
Verified end-to-end via a Qt smoke test:
- Live updates flow from the worker thread to the widget on the main
thread (psygnal+Qt queued delivery).
- Stop button triggers handle.cancel(); the worker's cancel-check
fires within one iteration and the run exits at the next event
boundary.
- Starting a new run re-binds the widget to the new handle and resets
the progress bar / counters.
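The shared _bump_status_for_frame helper mentioned above might look roughly like this; the signature and the pure-function style are assumptions (the real helper reads event.min_start_time and the run clock on the controller):

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Status:
    n_frames_received: int = 0
    lag_ms: float = 0.0


def bump_status_for_frame(status: Status, min_start_time: float,
                          run_t0: float, now: float) -> Status:
    # lag_ms: how late the frame landed vs. the event's scheduled
    # min_start_time, measured from the run start run_t0.
    scheduled = run_t0 + min_start_time
    return replace(
        status,
        n_frames_received=status.n_frames_received + 1,
        lag_ms=max(0.0, (now - scheduled) * 1000.0),
    )


# A frame scheduled 2 s into the run that arrives at t0 + 2.5 s lags 500 ms.
s = bump_status_for_frame(Status(), min_start_time=2.0, run_t0=100.0, now=102.5)
```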
The OmeZarrWriter init in _run_worker still pulled image height/width via
self._mic.mmc.getImageHeight/Width -- a pymmcore-plus-specific call that
breaks any non-pymmcore microscope.

Use the AbstractMicroscope-level convention instead: subclasses populate
self.image_height / self.image_width on the microscope instance (Moench
already does this in init_scope). Fall back to mmc if the attributes
aren't present but mmc is, so existing pymmcore-only microscopes keep
working without code changes. Raise a clear error when neither path is
available.
Three independent bugs surfaced when running the new async run_experiment
+ ExperimentStatusWidget against a napari viewer (reproduced with the
optogenetic virtual_microscope backend):

1. pymmcore-plus's signals_backend() auto-selects the *qt* backend
whenever a QApplication is loaded. core.mda.events.frameReady then
becomes a QtCore.SignalInstance and cross-thread emits land in
Qt.QueuedConnection, delivered only when the main thread pumps events.
With Controller.run_experiment now spawning a worker and RunHandle.wait()
joining on it, the main thread is typically idle-blocked exactly when the
engine is firing frames -- so the controller's _on_frame_ready never ran,
the engine completed "successfully" with zero frames received, and the
pipeline never saw any data. Fix: force PYMM_SIGNALS_BACKEND=psygnal in
faro/microscope/base.py so the data path stays direct/synchronous on the
engine thread regardless of whether Qt is loaded. The widget-side path
(RunHandle.statusChanged) still uses psygnal's own queued delivery -- see
fix #2.

2. ExperimentStatusWidget connected handle.statusChanged with the default
(direct) connection. Status updates emitted from the worker thread
therefore ran the widget's _refresh slot synchronously off-main, calling
QLabel.setText / QProgressBar.setValue from a non-GUI thread. Under
napari that lands in vispy's OpenGL compositor and aborts with "Cannot
make QOpenGLContext current in a different thread" -> SIGABRT (kernel
hard-crash in VSCode Jupyter). Fix: connect(..., thread="main") so
psygnal queues the call into its main-thread queue.

3. psygnal's queued callbacks live in QueuedCallback._GLOBAL_QUEUE, which
nothing drains by default -- the widget would be invoked on the main
thread, but only when something explicitly calls psygnal.emit_queued().
RunHandle's docstring claimed auto-Qt delivery; that's not how psygnal
actually works.
Fix: call psygnal.qt.start_emitting_from_queue() in the widget's
__init__, which installs a main-thread QTimer that fires emit_queued() on
every Qt event-loop tick. Idempotent and global, so multiple widgets /
multiple runs are safe.

Lockfile: bump pymmcore-widgets (8c8f76e -> 48ff414) to include the
unrelated upstream fix for the crash in
pymmcore_widgets._presets_widget._on_property_changed when handed an
empty device label (virtual_microscope's shutter). Without that bump, the
MDA engine itself aborts on the first setShutterOpen() once frames
actually start flowing.

Verified end-to-end against virtual_microscope's optogenetic backend:
- headless async run: 5/5 frames (regression check, unchanged)
- napari.Viewer() + handle.wait(): 5/5 frames (was 0/5)
- napari + napari-micromanager + widget: 5/5 frames, no crash, exit 0
- widget visibly updates progress / frames / state mid-experiment
  (sampled QLabel.text() while pumping Qt events)
- 87 unit tests still pass
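The queued-delivery model behind fixes 2 and 3 can be illustrated with stdlib primitives. connect_main and emit_queued below are hypothetical stand-ins for psygnal's connect(..., thread="main") and psygnal.emit_queued(); the point is that enqueued callbacks run only when something on the main thread drains the queue:

```python
import queue
import threading

_MAIN_QUEUE: queue.Queue = queue.Queue()   # stand-in for psygnal's global queue


def connect_main(callback):
    # Returns an emitter that enqueues instead of calling directly --
    # the moral equivalent of signal.connect(callback, thread="main").
    def emit(*args):
        _MAIN_QUEUE.put((callback, args))
    return emit


def emit_queued():
    # Drain pending callbacks on the calling (main/GUI) thread. This is
    # what psygnal.qt.start_emitting_from_queue() arranges via a QTimer
    # ticking on the Qt event loop.
    while True:
        try:
            cb, args = _MAIN_QUEUE.get_nowait()
        except queue.Empty:
            return
        cb(*args)


seen = []
emit = connect_main(seen.append)

worker = threading.Thread(target=lambda: [emit(i) for i in range(3)])
worker.start()
worker.join()

before_drain = list(seen)   # nothing delivered until the queue is drained
emit_queued()               # the "main thread" pumps the queue
```

Without the QTimer installed by start_emitting_from_queue(), the program above never calls emit_queued() and `seen` stays empty forever, which is exactly the bug-3 symptom.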
Sibling of demo_sim_optogenetic.ipynb that exercises the new async
run_experiment + RunHandle + ExperimentStatusWidget end-to-end against
virtual_microscope's optogenetic backend, with a live napari viewer
dock-attached.

Walks through:
- handle = ctrl.run_experiment(...) is non-blocking; the kernel is free
- poll handle.status() while it runs
- subscribe to handle.statusChanged from the kernel side
- cancel via the widget Stop button or handle.cancel()
- handle.wait() blocks if you want the old synchronous semantics
- continue_experiment() re-binds the widget automatically via runStarted

Phases are concatenated with combine(..., axis="t") per the new
RTMSequence API.
Backend changes that make an async run inspectable and steerable --
the data the new ExperimentStatusWidget renders, plus two bug fixes
surfaced while building it.
run_status.py
- RunHandle.events: optional snapshot of the (sorted) RTMEvents the
handle is driving, so widgets can render per-event visualisations
(event strip, FOV map) that need the full plan up front.
- Pause/resume: RunState gains "pausing"/"paused"; RunHandle gains
pause()/resume()/is_paused() and a pause_event the feed loop polls.
cancel() now also clears the pause event so a cancel while paused
still releases the feed loop.
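The pause/resume/cancel interplay described above can be sketched with two threading.Event flags. RunFlags and feed_loop are hypothetical stand-ins (the real flags live on RunHandle and the real loop feeds the MDA engine queue); note how cancel() clears the pause event so a paused loop is released:

```python
import threading
import time


class RunFlags:
    """Cooperative pause/cancel flags, per the semantics above."""

    def __init__(self) -> None:
        self.pause_event = threading.Event()    # set => feed loop idles
        self.cancel_event = threading.Event()

    def pause(self) -> None:
        self.pause_event.set()

    def resume(self) -> None:
        self.pause_event.clear()

    def cancel(self) -> None:
        self.cancel_event.set()
        self.pause_event.clear()                # release a paused feed loop


def feed_loop(flags: RunFlags, events: list, fed: list) -> None:
    for ev in events:
        # Before pulling the next event: idle while paused, bail on cancel.
        while flags.pause_event.is_set() and not flags.cancel_event.is_set():
            time.sleep(0.002)                   # state would read "paused" here
        if flags.cancel_event.is_set():
            return
        fed.append(ev)                          # hand the event to the engine


flags = RunFlags()
fed: list = []
flags.pause()                                   # start paused
worker = threading.Thread(target=feed_loop, args=(flags, list(range(5)), fed))
worker.start()
time.sleep(0.05)
paused_snapshot = list(fed)                     # nothing fed while paused
flags.resume()
worker.join(2.0)
```

In the real controller the MDA engine still drains its backpressure window after pause(), so feeding halts but a few already-queued events complete; the sketch only models the feeding side.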
controller.py
- run_experiment / continue_experiment sort events once (by
min_start_time, then position) and stash the sorted list on the
handle, so the order the worker processes matches what the widget
displays.
- Feed loop honors pause_event: before pulling the next RTMEvent it
checks the flag, flips state to "paused", and idles until resume()
-- the MDA engine drains whatever is already queued, then waits.
- fix: the engine queue (self._queue) is recreated per run. The
finally-block feeds a STOP_EVENT sentinel to stop the engine; on a
*cancelled* run cancel_mda() aborts the engine, which may stop
without draining the queue, leaving stale events + the sentinel
behind. Reusing that queue made the next run's engine consume the
stale sentinel and exit after a few events ("stuck at 3/80"). A
fresh queue per run fixes it.
- fix: _bump_status_for_frame skips IMG_STIM frames. A stim emission
is the SLM-illuminated snap paired with its imaging frame; counting
it double-updated the status (lag/elapsed refreshing twice per stim
event) and made n_frames_received drift away from the RTMEvent
count. Imaging + ref frames are the meaningful data frames.
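The stale-sentinel bug above can be reproduced with a plain queue.Queue. drain_engine is a hypothetical stand-in for the MDA engine's event iterator, which consumes until it sees the STOP sentinel:

```python
import queue

STOP_EVENT = object()   # sentinel the finally-block feeds to stop the engine


def drain_engine(q: queue.Queue) -> list:
    # Consume events until the STOP sentinel, like the engine's iterator.
    consumed = []
    while True:
        item = q.get()
        if item is STOP_EVENT:
            return consumed
        consumed.append(item)


# A cancelled run aborts the engine before it drains the queue, leaving
# stale events plus the sentinel behind.
stale = queue.Queue()
for item in ("ev1", "ev2", STOP_EVENT):
    stale.put(item)

# Reusing that queue: the next run's events sit behind the stale sentinel,
# so its engine consumes the leftovers and exits early ("stuck at 3/80").
for item in ("ev3", "ev4"):
    stale.put(item)
reused_result = drain_engine(stale)   # stops at the stale sentinel

# A fresh queue per run sees only the new run's events.
fresh = queue.Queue()
for item in ("ev3", "ev4", STOP_EVENT):
    fresh.put(item)
fresh_result = drain_engine(fresh)
```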
Verified end-to-end against the optogenetic virtual-microscope backend:
cancel mid-run then restart reaches steady state (no stall); pause
halts feeding after the backpressure window drains and resume continues
to completion; frame count tracks RTMEvents 1:1 for single-channel plans.
Rework the minimal status widget into a full run dashboard, driven by
the RunHandle data exposed in the previous commit.
Components (top to bottom):
- State chip -- RUNNING / PAUSED / DONE / ... as plain text in a
translucent-neutral rounded chip (no per-state fill: a colored
banner competed with the imaging/stim/ref legend colors).
- Legend chips -- imaging / stim / ref; the chip matching the current
event type is fully opaque, the others dimmed.
- EventStrip -- one cell per RTMEvent, color-coded by type. Past +
current cells opaque (progress fill), future cells dimmed. Same-type
runs are coalesced into single fills so thousands of events render
with correct alpha instead of over-stacking at sub-pixel widths.
Empty state draws a "(no events loaded)" placeholder.
- FovMap -- one dot per unique FOV position, equal-aspect (a straight
line of FOVs stays a line), grey visit-order path, active dot
recolored to the current event type. Pinned square via resizeEvent.
Paints its own rounded panel background; "FOV X/Y" counter in the
corner.
- Stats form -- event N/M, elapsed, scheduled, lag, remaining, errors.
Times formatted hh:mm:ss with the leading unit suffixed and dropped
when zero; lag turns red past 5 s. Wrapped in a shaded panel echoing
napari's layer-controls boxes.
- Pause/Resume + Stop buttons.
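The same-type coalescing in the EventStrip can be sketched as a single run-length pass. coalesce_runs is a hypothetical helper; the real widget maps each (type, start, length) run to one fill rect so alpha doesn't over-stack at sub-pixel cell widths:

```python
def coalesce_runs(event_types: list) -> list:
    # Collapse consecutive same-type events into (type, start, length)
    # runs, so thousands of events paint as a handful of rects.
    runs: list = []
    for i, kind in enumerate(event_types):
        if runs and runs[-1][0] == kind:
            prev_kind, start, length = runs[-1]
            runs[-1] = (prev_kind, start, length + 1)
        else:
            runs.append((kind, i, 1))
    return runs


strip = coalesce_runs(["img", "img", "stim", "img"])
```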
Threading / theming details:
- statusChanged is connected with thread="main" and the widget calls
psygnal.qt.start_emitting_from_queue() so worker-thread emits are
delivered on the GUI thread (drives QWidgets safely under napari).
- A 250 ms QTimer ticks the elapsed/remaining clocks between status
emissions so time fields don't freeze between frames.
- The strip cursor tracks n_frames_received (actual snaps), not
n_events_consumed (the feed loop runs 3-4 ahead via backpressure,
which made the strip jump several cells at run start).
- Colors/fonts derive from the Qt palette so the widget adapts to
napari's light/dark theme; corner radii match napari widgets.
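The stats-form time formatting ("hh:mm:ss with the leading unit suffixed and dropped when zero") admits a plausible implementation like the following; fmt_duration is a hypothetical helper and the exact padding may differ from the widget's:

```python
def fmt_duration(seconds: float) -> str:
    # Unit-suffixed h/m/s fields, dropping the leading field while zero:
    # 3723 s -> "1h 02m 03s", 65 s -> "1m 05s", 7 s -> "7s".
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}h {minutes:02d}m {secs:02d}s"
    if minutes:
        return f"{minutes}m {secs:02d}s"
    return f"{secs}s"
```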
Add a second stage position (20, 20, 0) to the baseline / stim / recovery sequences so the demo exercises a 2-FOV acquisition -- the ExperimentStatusWidget's FOV map then shows both positions and the visit-order path between them. Drop the frame interval 1.5s -> 1s.
@alandolt can you have a look if you see any general issues with this architecture change? Still a few open TODOs before merging, but the main idea is there, I think! Would be great to have your input before I start migrating the other notebooks etc. I think this will also be useful longer-term, e.g. running experiments on different microscopes simultaneously with BO, in combo with
Summary
Move the MDA feed loop onto a worker thread, expose live status through a
RunHandle (psygnal Signal), and add a napari dock widget that mirrors +
steers the current run. Replaces the synchronous, blocking
run_experiment / continue_experiment API.

Why

The controller's feed loop ran on the main thread, so run_experiment
blocked the calling cell -- no interactive monitoring or cancellation
without Ctrl-C (which sometimes left device state half-set).

Moving the loop onto its own thread fixes this: napari is responsive by
construction, the cell returns immediately, and cancellation / pause /
live status become natural.
What changed

New: faro/core/run_status.py
- RunStatus -- immutable snapshot dataclass: state, current_event_index,
  current_fov, n_events_total, n_events_consumed, n_frames_received,
  started_at / finished_at, lag_ms, background_errors, fatal_error, …
- RunHandle -- owns the worker thread + cooperative cancel/pause events,
  carries the run's (sorted) event list. Methods: status(), wait(),
  cancel(), pause(), resume(), is_running(), is_paused(). Signal:
  statusChanged (psygnal) emitting the latest RunStatus.
- RunState: pending → running ⇄ pausing/paused → done/error (cancelling
  on cancel).

faro/core/controller.py
- Controller.runStarted = Signal(object) fires on each new run/continue
  carrying the fresh RunHandle.
- run_experiment / continue_experiment spawn a worker thread and return
  the handle immediately; validation still runs synchronously on the
  caller. Events are sorted once and stashed on the handle so the widget
  renders them in execution order.
- _run_worker centralises pre-flight setup and wraps the feed loop so
  failures land in handle.fatal_error instead of crashing the user.
- _run_mda_with_events polls cancel_event and pause_event each iteration
  -- pause halts feeding after the in-flight backpressure window drains;
  resume continues.
- A cancelled run can leave stale events + the STOP_EVENT sentinel in the
  engine queue; reusing the queue made the next run's engine consume that
  sentinel and stall after a few events ("stuck at 3/80"). The queue is
  now recreated per run.
- _bump_status_for_frame skips IMG_STIM snaps -- a stim emission is the
  SLM-illuminated snap paired with its imaging frame; counting it
  double-updated lag/elapsed and drifted the frame count off the RTMEvent
  count.
- _NapariMDAHandler keeps routing frames into the preview layer
  throughout the run; the controller just stops continuous sequence
  acquisition once at MDA start to avoid a snap-buffer race. Notebooks
  can drop the old "break the CoreViewerLink before running" dance.

New: faro/widgets/experiment_status.py
- ExperimentStatusWidget -- a napari dock panel that mirrors and controls
  the current run; it re-binds automatically to each new handle via
  runStarted.

Async/Qt fixes folded in
- PYMM_SIGNALS_BACKEND=psygnal forced in faro/microscope/base.py -- with
  a QApplication loaded, pymmcore-plus otherwise picks the Qt signal
  backend and queues frameReady to the main thread; if the main thread is
  blocked (handle.wait()), frames never reach the controller. Forcing
  psygnal keeps the data path direct/synchronous on the engine thread.
- The widget connects statusChanged with thread="main" and drives
  psygnal.qt.start_emitting_from_queue() so worker-thread emits reach
  QWidgets safely.
- uv.lock: bumped pymmcore-widgets past an upstream fix (_presets_widget
  crashing on an empty device label during MDA events).

BREAKING: notebook updates required
Before
After — choose one:
(a) Blocking equivalent (smallest diff):
(b) Non-blocking, with status / cancel / pause:
Optional napari widget:
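A usage sketch covering the options above, assuming an existing ctrl controller, events list, and napari viewer; the ExperimentStatusWidget constructor signature is an assumption:

```python
# Before (old blocking API):
# ctrl.run_experiment(events)          # returned only when the run finished

handle = ctrl.run_experiment(events)   # now returns a RunHandle immediately

# (a) Blocking equivalent (smallest diff):
handle.wait()

# (b) Non-blocking, with status / cancel / pause:
print(handle.status().state)                               # poll a snapshot
handle.statusChanged.connect(lambda s: print(s.n_frames_received))
handle.pause(); handle.resume()
handle.cancel()                        # stop at the next event boundary

# Optional napari widget:
from faro.widgets.experiment_status import ExperimentStatusWidget
viewer.window.add_dock_widget(ExperimentStatusWidget(ctrl))
```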
Demo notebook (test artifact -- remove before merge)

experiments/02_demo_sim_optogenetic/demo_sim_optogenetic_napari_async.ipynb
is included only to exercise this PR against the virtual-microscope
optogenetic backend (async run, pause/resume, cancel/restart, the status
widget, multi-FOV). It doubles as a worked example of what the migrated
notebooks could look like. It should be deleted before this PR merges --
the real deliverable is the API + widget, not this notebook.

What to check / test before merging
- Notebooks under experiments/* that call run_experiment /
  continue_experiment -- migrate to .wait() or the non-blocking flow.
  Confirm none rely on the old blocking return.
- Notebooks that break the CoreViewerLink before a run -- that workaround
  is no longer needed; verify removing it and that the preview layer
  keeps updating during the run.
- tests/hardware/* -- update for the new RunHandle return type; run on
  the Moench rig.
- Plans where n_frames_received outpaces the RTMEvent count -- verify the
  strip/stats still read sensibly or gate the assumption.
- continue_experiment + the widget: confirm the strip/map rebuild
  correctly for the appended events and the FOV map merges positions.
- import faro stays Qt-free; the .wait() path works without a
  QApplication.
- virtual-microscope lockfile pin -- uv lock --upgrade-package
  virtual-microscope to pick up the fixes now on its default branch (JIT
  pre-warm; SimCameraDevice digital ROI / MDA-teardown fix). Without this
  the demo notebook's first ~4 s of frames stall and the napari Snap
  preview freezes after a run. Commit the uv.lock change separately (it
  is not async/widget code).

Related (separate repo)
Two virtual-microscope fixes were needed for the demo notebook and have
already landed on its default branch (virtual-env):
- JIT pre-warm before the RealtimeEngine starts; otherwise the first ~4 s
  of snaps stall behind a compile holding the sim lock, so frames arrive
  in a burst instead of paced.
- SimCameraDevice digital ROI -- implements real ROI cropping. It also
  fixes an MDA-teardown bug: the camera previously raised
  NotImplementedError from set_roi, which aborted MDARunner._finish_run
  before it emitted sequenceFinished; napari-micromanager then never
  cleared _mda_running, so the Snap preview silently stopped updating
  after a run.

These are not part of this PR -- faro just needs the lockfile bump above
to pick them up.
Verification
Exercised end-to-end against the virtual-microscope optogenetic backend (napari + napari-micromanager + the widget):
Compatibility notes
- Qt is only required for the widget (import faro.widgets); import faro /
  import faro.core stay Qt-free.
- Writer image dimensions follow the AbstractMicroscope-level
  image_height / image_width convention, with an mmc fallback for
  pymmcore-only microscopes.

Screenshot