-
Notifications
You must be signed in to change notification settings - Fork 916
Fix TypeError when converting Rust encoding errors to Python #5668
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
src/err/mod.rs
Outdated
| /// str::from_utf8(bytes).map_err(|e| e.into_with_bytes(bytes)) | ||
| /// # } | ||
| /// ``` | ||
| pub trait IntoPyErrWithBytes { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather that adding a new trait I would prefer if we just made a constructor on PyUnicodeDecodeError which returns the PyErr. Internally it could make use of PyUnicodeDecodeError::new_utf8 to construct it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've moved the implementation to a new constructor for PyUnicodeDecodeError.
I don't want to call PyUnicodeDecodeError::new_utf8 since it is fallible and I don't want to have to worry about handling errors while already creating an error. I've kept the implementation mostly the same using a private Utf8ErrorWithBytes struct which implements PyErrArguments and calls PyUnicodeDecodeError::new_err under the hood.
Lemme know if this approach sounds reasonable to you, or you want any more changes
feddff7 to
a8b289c
Compare
|
Will add some tests for my changes |
davidhewitt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this is looking great, brilliant to fix a broken corner in our error handling!
| let bytes = types::PyBytes::new(py, &bytes).into_any(); | ||
| let start = types::PyInt::new(py, start).into_any(); | ||
| let end = types::PyInt::new(py, end).into_any(); | ||
| let reason = types::PyString::new(py, "invalid utf-8").into_any(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it's hard to do better than this with the current API, we'd have to peek the invalid bytes and deduce the reason for failure?
(Probably not worth doing that, just checking.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can check for incomplete byte sequences (i.e. unexpected end of input) versus invalid bytes by checking if err.error_len() is None. But the Rust error type does not differentiate between the different kinds of invalid bytes (invalid start byte, invalid continuation, encoding out of range, etc...)
| pub fn new_err_from_utf8(bytes: &[u8], err: std::str::Utf8Error) -> crate::PyErr { | ||
| Utf8ErrorWithBytes { | ||
| err, | ||
| bytes: bytes.to_vec(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we avoid this intermediate allocation? I guess not, because we don't have py: Python here.
... is it possibly better to take 'py: Python as an argument, call new_utf8 above to construct Python bytes immediately, and then use PyErr::from_value inside this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EDIT I just saw your comment about fallibility.
I think if new_utf8 returns an error, we could just bubble that up with return e. I guess that only happens in weird edge cases like keyboard interrupt, memory allocation error etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... or we just return PyResult<PyErr> here and let callers decide whether to flatten the other errors away, but the API does feel a bit awkward then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I didn't really want to return a PyResult<PyErr>, in part because it would be an awkward API and in part to keep symmetry between new_err_from_utf8 and new_err.
Adding a py: Python argument would also break the symmetry with new_err, but if you think that's worth it to avoid the extra allocation then I can make that change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On reflection I think that having the return type just be PyErr is good enough, the failure mode is unlikely and users will probably always just flatten the outer error anyway because there's not an obvious alternative.
Taking py: Python I think is fine; the discussion we've had around other parts of the PyO3 error handling story is strongly leaning towards the conclusion that we will want to require py: Python to create a PyErr in the future (because it leads to more efficient implementations, as we find here).
src/err/impls.rs
Outdated
| impl std::convert::From<exceptions::Utf8ErrorWithBytes> for PyErr { | ||
| fn from(err: exceptions::Utf8ErrorWithBytes) -> PyErr { | ||
| exceptions::PyUnicodeDecodeError::new_err(err) | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation may not be needed if the changes to new_err_from_utf8 to call from_utf8 as proposed in my other comment.
|
I think the failing test should be due to Rust diagnostic messages changing in 1.92, not related to my change |
davidhewitt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the slow reply. I see you worked out that merging with main would address the CI issues 👍
| pub fn new_err_from_utf8(bytes: &[u8], err: std::str::Utf8Error) -> crate::PyErr { | ||
| Utf8ErrorWithBytes { | ||
| err, | ||
| bytes: bytes.to_vec(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On reflection I think that having the return type just be PyErr is good enough, the failure mode is unlikely and users will probably always just flatten the outer error anyway because there's not an obvious alternative.
Taking py: Python I think is fine; the discussion we've had around other parts of the PyO3 error handling story is strongly leaning towards the conclusion that we will want to require py: Python to create a PyErr in the future (because it leads to more efficient implementations, as we find here).
Fixes #5651.
Fixes the implementation of
PyErrArgumentsforstring::FromUtf8Errorandffi::IntoStringError.Removes the broken implementation of
PyErrArgumentsforstr::Utf8Error,string::FromUtf16Error, andchar::DecodeUtf16Error.Adds a convenience trait called
IntoPyErrWithBytesforstr::Utf8Errorto allow easy conversion toPyErr. The implementation ofFrom<str::Utf8Error>forPyErrwas being used in a few places internally in PyO3, so this felt like the best solution. And unlike the UTF-16 errors,str::Utf8Errorcontains everything needed to construct a PythonUnicodeDecodeErrorexcept the source bytes. I exposed this API publicly since it's probably a pretty common use case (at least compared to UTF-16), but let me know if you want any changes (especially regarding API naming).