
Always raise a received result-as-error in spawn tasks#288

Open
goodboy wants to merge 3 commits into main from raise_runinactor_error

Conversation


@goodboy goodboy commented Jan 3, 2022

Hopefully fixes #287.

This small change ensures that any actor spawn task that receives an error from its remote target will raise that error locally, thus triggering a surrounding nursery error and possible teardown.
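The gist of the change can be sketched with a plain asyncio stand-in (hypothetical names, not tractor's actual trio-based API): a spawn task that receives an exception as the child's "result" re-raises it locally, so whatever is supervising the group of tasks sees the error and tears down.

```python
# Hypothetical asyncio stand-in for the change (not tractor's actual API):
# the per-child spawn task re-raises a received "result-as-error" locally.
import asyncio


async def remote_child(result_q: asyncio.Queue) -> None:
    # stand-in for a child actor shipping its error back as the result
    await result_q.put(RuntimeError("boom in child"))


async def spawn_task(result_q: asyncio.Queue):
    result = await result_q.get()
    if isinstance(result, BaseException):
        # the core of the fix: raise the received error locally so the
        # surrounding supervisor sees it and can tear everything down
        raise result
    return result


async def main() -> str:
    q: asyncio.Queue = asyncio.Queue()
    try:
        await asyncio.gather(remote_child(q), spawn_task(q))
    except RuntimeError as err:
        return str(err)
    return "no error"


print(asyncio.run(main()))  # boom in child
```

In trio/tractor terms, raising inside the spawn task is what triggers the surrounding nursery's cancellation of its siblings.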

ping @didimelli


goodboy commented Jan 4, 2022

Ahah, so the (main) issue with this solution is that we no longer accumulate errors from all child actors as before, since some actors may get cancelled and their results (even when they're errors) abandoned.

I've been trying some other things, but the way to solve the underlying issue and collect as many errors as possible is likely to do a 2nd pass over all child portals at nursery close, once all subprocs are known to be terminated, and then fill any late-received errors into the `errors: dict`.
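That 2nd pass could look roughly like this (a sketch with made-up names, not tractor's real internals): once all subprocesses have terminated, rescan the `errors: dict` entries that hold a cancellation placeholder and swap in any error the child managed to deliver late.

```python
# Sketch of the proposed 2nd error-collection pass (illustrative names,
# not tractor's actual code).
class Cancelled(Exception):
    """Stand-in for a `trio.Cancelled` recorded during result retrieval."""


def rescan_errors(errors: dict, late_results: dict) -> dict:
    # after all subprocs are known terminated, replace cancellation
    # placeholders with any late-received child error
    for uid, err in errors.items():
        if isinstance(err, Cancelled):
            late = late_results.get(uid)
            if isinstance(late, BaseException):
                errors[uid] = late
    return errors


errors = {"child-1": Cancelled(), "child-2": ValueError("early error")}
late = {"child-1": RuntimeError("late remote error")}
print(type(rescan_errors(errors, late)["child-1"]).__name__)  # RuntimeError
```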

In an effort to support `.run_in_actor()` error raising by our nursery,
we ideally collect as many child errors as possible during nursery
teardown and error collection/propagation.

Here we try a couple of things:
- factor the per-actor error retrieval into a new
  `pack_and_report_errors()`
- when a result retrieval via `exhaust_portal()` is cancelled, pack the
  `trio.Cancelled` into the `errors: dict`, expecting to rescan those
  entries for errors after process termination.
- at the end of the spawn task, conduct a timed-out 2nd retrieval of any
  late-delivered error from the child task, for each entry in `errors`
  containing a cancelled.
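The timed-out 2nd retrieval in the last bullet might be sketched like this (an asyncio stand-in with hypothetical names; the real code would presumably use trio's `move_on_after` against the child's portal):

```python
# Sketch: give each cancelled entry a short window to deliver its error
# late, then give up and keep the recorded cancellation.
import asyncio


async def retrieve_late_error(result_q: asyncio.Queue, timeout: float):
    try:
        return await asyncio.wait_for(result_q.get(), timeout)
    except asyncio.TimeoutError:
        # child never delivered anything; leave the cancelled entry as-is
        return None


async def main():
    delivered: asyncio.Queue = asyncio.Queue()
    await delivered.put(RuntimeError("late error"))
    got = await retrieve_late_error(delivered, timeout=0.1)
    missed = await retrieve_late_error(asyncio.Queue(), timeout=0.05)
    return str(got), missed


print(asyncio.run(main()))  # ('late error', None)
```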

This still causes a bunch of cancellation tests to fail, seemingly due to
the race where the OCA nursery may have requested cancellation of
children *before* they can remote-error, and thus the `MultiError`
matching expectations aren't (always) going to be correct. Previously we
always waited for all `.run_in_actor()` results to arrive and did
**not** raise any errors early (which in turn triggers local
cancellation).
goodboy added the bug ("Something isn't working") and supervision labels on Jan 5, 2022

goodboy commented Jan 5, 2022

Oh yeah, ignore the mp runs there 😂
Code hasn't been changed for that backend!

    f"{subactor.uid}")
    nursery.cancel_scope.cancel()

    # if errors:

This was the alternative approach mentioned in #287: cancelling the ria_nursery instead of raising any final error retrieved from the portal.

I think this approach is a bit harder to extend wrt supervisor strats being overloaded by a user.



Successfully merging this pull request may close these issues.

Nursery does not re-raise error from .run_in_actor() remote task
