Relax panics in async/futures to traps/errors#12688
Relax panics in async/futures to traps/errors#12688alexcrichton wants to merge 3 commits intobytecodealliance:mainfrom
Conversation
This commit is an admittance that I don't believe we're going to get to a point where we are confident enough in the fuzzing of component-model-async such that we could confidently say we're exercising the vast majority of possible panics. Development of component-model-async has shown a steady trickle of panics over the course of the development of the feature, and this trend has been persistent over time as well. An attempt was made in bytecodealliance#12119 to add a fuzzer dedicated to async events but that didn't actually find anything in development and it has missed a number of panics present before and discovered after its introduction. Overall I do not know how to improve the fuzzer to the point that it would find pretty much all of the existing async-related panics over time. To help address this concern of the `concurrent.rs` implementation this commit goes through and replaces things like `unwrap()`, `assert!`, `panic!`, and `unreachable!` with an error-producing form. The benefit of this is that a bug in the implementation is less likely to result in a panic and instead just results in a non-spec-compliant trap. The downside of doing this though is that it can become unclear what errors are "first class traps", or expected to be guest reachable, and which are expected to be bugs in Wasmtime. To help address this I've performed a few refactorings here as well. * Some traps previously present as error strings are now promoted to using `Trap::Foo` instead. This has some refactoring of the Rust/C side as well to make it easier to define new variants. Tests were additionally added for any trap messages that weren't previously tested as being reachable. * A new `bail_bug!` macro was added (internally) for Wasmtime. This is coupled with a concrete `WasmtimeBug` error type (exported as `wasmtime::WasmtimeBug`). The intention is that `bail!` continues to be "here's a string and I'm a bit too lazy to make a concrete error" while `bail_bug!` indicates "this is a bug in wasmtime please report this if you see it". The rough vision is that if an error condition is reached, and the system is not broken in such a way that panicking is required, then `bail_bug!` can be used to indicate a bug in Wasmtime as opposed to panicking. This reduces the real-world impact of hitting these scenarios by downgrading a CVE-worthy `panic!` into a bug-worthy non-spec-compliant trap. Not all panics are able to be transitioned to this as some are load bearing from a safety perspective or similar (or indicate something equally broken), but the vast majority of cases are suitable for "return a trap, lock down the store, and let destructors take care of everything else". This change additionally has resulted in API changes for `FutureReader` and `StreamReader`. For example creation of these types now returns a `Result` for when the `ResourceTable` is full, for example, instead of panicking.
|
I'm asking for two reviews here: @fitzgen from a high level and the concept of The only remaining major area of panics that I know of are the various public-facing buffer types within streams/futures. There's a lot of |
Subscribe to Label Actioncc @fitzgen DetailsThis issue or pull request has been labeled: "fuzzing", "wasi", "wasmtime:api", "wasmtime:c-api"Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
dicej
left a comment
There was a problem hiding this comment.
LGTM; thanks for doing this! Just a few questions/comments inline.
The only remaining major area of panics that I know of are the various public-facing buffer types within streams/futures. There's a lot of unreachable!() and unwrap() and such through those methods as assumptions are made about the state of streams/etc. None of the methods return Result since the type is supposed to be a static proof of everything in a particular state. I didn't know quite how to thread that needle myself, so @dicej if you've got thoughts on that it'd be appreciated. (and example here is DirectSource)
As a user of the API, I'd be a bit annoyed by a function which could only ever return Err(_) due to a bug in the implementation, so I'd be inclined to leave those alone (i.e. not add Result to the return types), and I can't think of a lightweight alternative (I assume catch_unwind is a non-starter). Maybe we could focus on fuzzing those APIs directly, e.g. without even involving Wasm, necessarily?
| body.drop(&mut store); | ||
| return Err(ErrorCode::InternalError(Some(format!("{err:#}")))); | ||
| body.drop(&mut store).map_err(HttpError::trap)?; | ||
| return Err(HttpError::trap(err)); |
There was a problem hiding this comment.
If I'm reading this correctly, this was previously an error the guest could recover from but now it's a trap. Is that intended?
| /// | ||
| /// Returns an error if the state `T` isn't present. | ||
| fn take(&self) -> Result<LockedStateGuard<'_, T>> { | ||
| let result = self.inner.lock().unwrap().take(); |
There was a problem hiding this comment.
Not related to your changes since the original code was already using Mutex::lock, but I'm wondering if we should use try_lock instead of lock here and in with given that there shouldn't be any actual contention, i.e. we could treat it like a thread-safe RefCell. Seems better to panic (or trap if preferred) if there is unexpected contention than to block silently and unexpectedly.
| let thread = *self.set_thread(CurrentThread::None).guest().unwrap(); | ||
| let thread = match self.set_thread(CurrentThread::None)?.guest() { | ||
| Some(t) => *t, | ||
| None => bail_bug!("expected task whene exiting"), |
There was a problem hiding this comment.
| None => bail_bug!("expected task whene exiting"), | |
| None => bail_bug!("expected task when exiting"), |
| string_encoding: StringEncoding::Utf8, | ||
| }, | ||
| if let Some(caller) = guest_caller { | ||
| assert_eq!(caller, instance.unwrap()); |
There was a problem hiding this comment.
What's the rationale for removing this assertion without replacing it with a trap?
This commit is an admittance that I don't believe we're going to get to a point where we are confident enough in the fuzzing of component-model-async such that we could confidently say we're exercising the vast majority of possible panics. Development of component-model-async has shown a steady trickle of panics over the course of the development of the feature, and this trend has been persistent over time as well.
An attempt was made in #12119 to add a fuzzer dedicated to async events but that didn't actually find anything in development and it has missed a number of panics present before and discovered after its introduction. Overall I do not know how to improve the fuzzer to the point that it would find pretty much all of the existing async-related panics over time.
To help address this concern of the
concurrent.rsimplementation this commit goes through and replaces things likeunwrap(),assert!,panic!, andunreachable!with an error-producing form. The benefit of this is that a bug in the implementation is less likely to result in a panic and instead just results in a non-spec-compliant trap. The downside of doing this though is that it can become unclear what errors are "first class traps", or expected to be guest reachable, and which are expected to be bugs in Wasmtime. To help address this I've performed a few refactorings here as well.Some traps previously present as error strings are now promoted to using
Trap::Fooinstead. This has some refactoring of the Rust/C side as well to make it easier to define new variants. Tests were additionally added for any trap messages that weren't previously tested as being reachable.A new
bail_bug!macro was added (internally) for Wasmtime. This is coupled with a concreteWasmtimeBugerror type (exported aswasmtime::WasmtimeBug). The intention is thatbail!continues to be "here's a string and I'm a bit too lazy to make a concrete error" whilebail_bug!indicates "this is a bug in wasmtime please report this if you see it".The rough vision is that if an error condition is reached, and the system is not broken in such a way that panicking is required, then
bail_bug!can be used to indicate a bug in Wasmtime as opposed to panicking. This reduces the real-world impact of hitting these scenarios by downgrading a CVE-worthypanic!into a bug-worthy non-spec-compliant trap. Not all panics are able to be transitioned to this as some are load bearing from a safety perspective or similar (or indicate something equally broken), but the vast majority of cases are suitable for "return a trap, lock down the store, and let destructors take care of everything else".This change additionally has resulted in API changes for
FutureReaderandStreamReader. For example creation of these types now returns aResultfor when theResourceTableis full, for example, instead of panicking.