Fix TypeError when converting Rust encoding errors to Python #5668

staticintlucas · 2025-12-01T22:11:56Z

Fixes the implementation of PyErrArguments for string::FromUtf8Error and ffi::IntoStringError.
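As a std-only illustration, here is why these two types can be fixed. No PyO3 is involved, and the helper names from_utf8_parts and cstring_parts are hypothetical. Both error types own the original input alongside a str::Utf8Error, so every UnicodeDecodeError argument can still be recovered from them:

```rust
use std::ffi::CString;

/// Hypothetical helper: returns (source bytes, valid_up_to, error_len) when
/// `bytes` is not valid UTF-8. FromUtf8Error hands the original Vec<u8> back.
fn from_utf8_parts(bytes: Vec<u8>) -> Option<(Vec<u8>, usize, Option<usize>)> {
    match String::from_utf8(bytes) {
        Ok(_) => None,
        Err(err) => {
            let utf8 = err.utf8_error(); // Utf8Error is Copy
            Some((err.into_bytes(), utf8.valid_up_to(), utf8.error_len()))
        }
    }
}

/// Hypothetical helper: the same idea for CString::into_string, whose
/// IntoStringError keeps the original CString.
fn cstring_parts(c: CString) -> Option<(Vec<u8>, usize, Option<usize>)> {
    match c.into_string() {
        Ok(_) => None,
        Err(err) => {
            let utf8 = err.utf8_error();
            // into_bytes() drops the trailing nul terminator.
            Some((err.into_cstring().into_bytes(), utf8.valid_up_to(), utf8.error_len()))
        }
    }
}
```

valid_up_to() gives the start offset, and error_len() is None when the input ends in the middle of a multi-byte sequence.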

Removes the broken implementation of PyErrArguments for str::Utf8Error, string::FromUtf16Error, and char::DecodeUtf16Error.
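For contrast, here is a minimal std-only sketch of what the UTF-16 error types expose. The unpaired_surrogates and is_valid_utf16 helpers are illustrative, not part of any API. char::DecodeUtf16Error reports only the offending code unit via unpaired_surrogate(), and string::FromUtf16Error offers no position information at all. Neither provides the start and end offsets a UnicodeDecodeError needs.

```rust
/// Illustrative helper: collects the unpaired surrogates reported while
/// decoding `units`. The error carries the bad code unit, not its index.
fn unpaired_surrogates(units: &[u16]) -> Vec<u16> {
    char::decode_utf16(units.iter().copied())
        .filter_map(|r| r.err())
        .map(|e| e.unpaired_surrogate())
        .collect()
}

/// Illustrative helper: String::from_utf16 only says *whether* decoding failed;
/// FromUtf16Error has no accessors for where or why.
fn is_valid_utf16(units: &[u16]) -> bool {
    String::from_utf16(units).is_ok()
}
```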

Adds a convenience trait called IntoPyErrWithBytes for str::Utf8Error to allow easy conversion to PyErr. The implementation of From<str::Utf8Error> for PyErr was being used in a few places internally in PyO3, so this felt like the best solution. And unlike the UTF-16 errors, str::Utf8Error contains everything needed to construct a Python UnicodeDecodeError except the source bytes. I exposed this API publicly since it's probably a pretty common use case (at least compared to UTF-16), but let me know if you want any changes (especially regarding API naming).
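Below is a std-only sketch of the extension-trait shape described here. It assumes a hypothetical DecodeErrorArgs struct in place of PyErr and a hypothetical IntoArgsWithBytes trait that mirrors into_with_bytes from the diff. It shows why the bytes are the missing piece: when error_len() is None, the end offset can only be found from the length of the input.

```rust
use std::str::{self, Utf8Error};

/// Hypothetical stand-in for the positional data of a UnicodeDecodeError;
/// not a PyO3 type.
#[derive(Debug, PartialEq)]
struct DecodeErrorArgs {
    object: Vec<u8>,
    start: usize,
    end: usize,
}

/// Hypothetical extension trait with the same call shape as the proposed
/// IntoPyErrWithBytes::into_with_bytes.
trait IntoArgsWithBytes {
    fn into_with_bytes(self, bytes: &[u8]) -> DecodeErrorArgs;
}

impl IntoArgsWithBytes for Utf8Error {
    fn into_with_bytes(self, bytes: &[u8]) -> DecodeErrorArgs {
        let start = self.valid_up_to();
        // error_len() is None when the input ended mid-sequence; the bad
        // range then runs to the end of the buffer, which only `bytes` knows.
        let end = match self.error_len() {
            Some(len) => start + len,
            None => bytes.len(),
        };
        DecodeErrorArgs { object: bytes.to_vec(), start, end }
    }
}

/// Call site in the style of the doc example from the diff.
fn parse(bytes: &[u8]) -> Result<&str, DecodeErrorArgs> {
    str::from_utf8(bytes).map_err(|e| e.into_with_bytes(bytes))
}
```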

Icxolu · 2025-12-02T17:33:30Z

src/err/mod.rs

+/// str::from_utf8(bytes).map_err(|e| e.into_with_bytes(bytes))
+/// # }
+/// ```
+pub trait IntoPyErrWithBytes {


Rather than adding a new trait, I would prefer that we just add a constructor on PyUnicodeDecodeError which returns the PyErr. Internally it could use PyUnicodeDecodeError::new_utf8 to construct it.

I've moved the implementation to a new constructor for PyUnicodeDecodeError.

I don't want to call PyUnicodeDecodeError::new_utf8 since it is fallible and I don't want to have to worry about handling errors while already creating an error. I've kept the implementation mostly the same using a private Utf8ErrorWithBytes struct which implements PyErrArguments and calls PyUnicodeDecodeError::new_err under the hood.
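Roughly, this approach can be modelled without PyO3 as below. This is a sketch under stated assumptions, not the PR's code. LazyErr, Args and normalize are hypothetical stand-ins for PyO3's lazily constructed error state, and the end-offset rule is an assumption. The constraint it illustrates is that the deferred closure must be 'static, so it has to own a copy of the bytes (the bytes.to_vec() call) instead of borrowing the caller's slice.

```rust
use std::str::Utf8Error;

/// Stand-in for the Python (encoding, object, start, end, reason) arguments.
type Args = (&'static str, Vec<u8>, usize, usize, &'static str);

/// Hypothetical lazy error: the boxed closure is 'static + Send + Sync,
/// so it cannot capture a borrowed &[u8].
struct LazyErr(Box<dyn FnOnce() -> Args + Send + Sync>);

impl LazyErr {
    /// Builds the arguments on demand (PyO3 defers this until they are needed).
    fn normalize(self) -> Args {
        (self.0)()
    }
}

/// Owns everything needed to build the arguments later.
struct Utf8ErrorWithBytes {
    err: Utf8Error,
    bytes: Vec<u8>,
}

impl Utf8ErrorWithBytes {
    fn arguments(self) -> Args {
        let start = self.err.valid_up_to();
        // Assumption: a truncated sequence runs to the end of the input.
        let end = self.err.error_len().map_or(self.bytes.len(), |n| start + n);
        ("utf-8", self.bytes, start, end, "invalid utf-8")
    }
}

fn new_err_from_utf8(bytes: &[u8], err: Utf8Error) -> LazyErr {
    // The intermediate copy discussed in review: the closure must own its data.
    let args = Utf8ErrorWithBytes { err, bytes: bytes.to_vec() };
    LazyErr(Box::new(move || args.arguments()))
}
```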

Let me know if this approach sounds reasonable to you, or if you want any more changes.

staticintlucas · 2025-12-02T22:41:35Z

Will add some tests for my changes ~~tomorrow~~ next week to make codecov happy

davidhewitt

Thanks, this is looking great, brilliant to fix a broken corner in our error handling!

newsfragments/5668.fixed.md

davidhewitt · 2025-12-12T11:20:56Z

src/err/impls.rs

+        let bytes = types::PyBytes::new(py, &bytes).into_any();
+        let start = types::PyInt::new(py, start).into_any();
+        let end = types::PyInt::new(py, end).into_any();
+        let reason = types::PyString::new(py, "invalid utf-8").into_any();


I guess it's hard to do better than this with the current API, we'd have to peek the invalid bytes and deduce the reason for failure?

(Probably not worth doing that, just checking.)

We can check for incomplete byte sequences (i.e. unexpected end of input) versus invalid bytes by checking whether err.error_len() is None. But the Rust error type does not differentiate between the different kinds of invalid bytes (invalid start byte, invalid continuation byte, encoding out of range, etc.).
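Here is a minimal sketch of that distinction. decode_reason is a hypothetical helper. Borrowing CPython's "unexpected end of data" wording for truncated input is an assumption; "invalid utf-8" is the reason string from the diff above.

```rust
use std::str::Utf8Error;

/// Hypothetical helper: picks a reason string from the only distinction
/// Utf8Error exposes, truncated input versus an invalid byte sequence.
fn decode_reason(err: &Utf8Error) -> &'static str {
    match err.error_len() {
        // Input ended in the middle of a multi-byte sequence.
        None => "unexpected end of data",
        // Rust does not say *why* the bytes are invalid, so one generic
        // reason covers bad start bytes, bad continuations, overlongs, etc.
        Some(_) => "invalid utf-8",
    }
}
```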

davidhewitt · 2025-12-12T11:25:05Z

src/exceptions.rs

+    pub fn new_err_from_utf8(bytes: &[u8], err: std::str::Utf8Error) -> crate::PyErr {
+        Utf8ErrorWithBytes {
+            err,
+            bytes: bytes.to_vec(),


Can we avoid this intermediate allocation? I guess not, because we don't have py: Python here.

... is it possibly better to take py: Python as an argument, call new_utf8 above to construct Python bytes immediately, and then use PyErr::from_value inside this function?

EDIT I just saw your comment about fallibility.

I think if new_utf8 returns an error, we could just bubble that up with return e. I guess that only happens in weird edge cases like keyboard interrupt, memory allocation error etc.

... or we just return PyResult<PyErr> here and let callers decide whether to flatten the other errors away, but the API does feel a bit awkward then.
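To make the flattening option concrete, here is a std-only sketch. ExcStub is a hypothetical type standing in for PyErr, and try_build_decode_error is a hypothetical fallible constructor. A caller of a PyResult<PyErr>-style API would collapse both arms into one error like this:

```rust
/// Hypothetical stand-in for PyErr.
#[derive(Debug, PartialEq)]
struct ExcStub(String);

/// Hypothetical fallible constructor: Ok holds the error we meant to build,
/// Err holds whatever went wrong while building it.
fn try_build_decode_error(alloc_ok: bool) -> Result<ExcStub, ExcStub> {
    if alloc_ok {
        Ok(ExcStub("UnicodeDecodeError".into()))
    } else {
        Err(ExcStub("MemoryError".into()))
    }
}

/// Flattening: whichever error we ended up with is the one that gets raised.
fn flatten(res: Result<ExcStub, ExcStub>) -> ExcStub {
    res.unwrap_or_else(|e| e)
}
```

Either way the caller ends up raising a single exception, which is why the extra Result layer buys little.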

Yeah I didn't really want to return a PyResult<PyErr>, in part because it would be an awkward API and in part to keep symmetry between new_err_from_utf8 and new_err.

Adding a py: Python argument would also break the symmetry with new_err, but if you think that's worth it to avoid the extra allocation then I can make that change

On reflection I think that having the return type just be PyErr is good enough, the failure mode is unlikely and users will probably always just flatten the outer error anyway because there's not an obvious alternative.

Taking py: Python I think is fine; the discussion we've had around other parts of the PyO3 error handling story is strongly leaning towards the conclusion that we will want to require py: Python to create a PyErr in the future (because it leads to more efficient implementations, as we find here).

davidhewitt · 2025-12-12T11:27:19Z

src/err/impls.rs

+impl std::convert::From<exceptions::Utf8ErrorWithBytes> for PyErr {
+    fn from(err: exceptions::Utf8ErrorWithBytes) -> PyErr {
+        exceptions::PyUnicodeDecodeError::new_err(err)
+    }
+}


This implementation may not be needed if new_err_from_utf8 is changed to call from_utf8, as proposed in my other comment.
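For reference, a From impl like this one mainly enables the ? operator at call sites. Here is a std-only sketch with hypothetical stand-ins: ExcStub replaces PyErr, and a local Utf8ErrorWithBytes only borrows the PR's name.

```rust
use std::str::{self, Utf8Error};

/// Hypothetical stand-in for PyErr.
#[derive(Debug, PartialEq)]
struct ExcStub {
    start: usize,
    object: Vec<u8>,
}

/// Local stand-in that only borrows the PR's type name.
struct Utf8ErrorWithBytes {
    err: Utf8Error,
    bytes: Vec<u8>,
}

impl From<Utf8ErrorWithBytes> for ExcStub {
    fn from(e: Utf8ErrorWithBytes) -> Self {
        ExcStub { start: e.err.valid_up_to(), object: e.bytes }
    }
}

fn first_char(bytes: &[u8]) -> Result<char, ExcStub> {
    // `?` calls From::from on the wrapper; that conversion is what the impl provides.
    let s = str::from_utf8(bytes)
        .map_err(|err| Utf8ErrorWithBytes { err, bytes: bytes.to_vec() })?;
    Ok(s.chars().next().unwrap_or('\0'))
}
```

If the constructor instead builds the error directly, call sites would call it inside map_err and the From impl becomes unnecessary.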

src/exceptions.rs

src/err/impls.rs

staticintlucas · 2025-12-12T22:20:32Z

I think the failing test is due to Rust diagnostic messages changing in 1.92, not to my change.

davidhewitt

Sorry for the slow reply. I see you worked out that merging with main would address the CI issues 👍

davidhewitt · 2025-12-19T14:35:05Z

staticintlucas force-pushed the main branch from 52b35e1 to ba5f15e on December 1, 2025 22:13

Icxolu reviewed Dec 2, 2025

staticintlucas force-pushed the main branch 3 times, most recently from feddff7 to a8b289c on December 2, 2025 21:28

Fix bug causing TypeError when converting Rust encoding errors to Python

febd0ca

staticintlucas force-pushed the main branch from a8b289c to febd0ca on December 2, 2025 21:41

Add test coverage for UnicodeDecodeError changes

3203dbd

staticintlucas closed this Dec 8, 2025

staticintlucas force-pushed the main branch from 10f675e to 3203dbd on December 8, 2025 21:31

staticintlucas reopened this Dec 8, 2025

Merge branch 'PyO3:main' into main

27e3f88

staticintlucas force-pushed the main branch from bcdfe1a to 27e3f88 on December 8, 2025 21:57

davidhewitt reviewed Dec 12, 2025

Fix some review comments

5251339

Merge branch 'PyO3:main' into main

8d7b7de

davidhewitt reviewed Dec 19, 2025
