@anthonyryan1
Contributor

Right now, we've got a lot of code attempting to correct every malformed UTF-8 string into something usable. This pull request takes an alternate approach: instead of trying to coax every string into something valid, we replace each invalid sequence with a replacement character that marks the broken encoding.
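
To illustrate the replace-rather-than-repair idea, here is a minimal sketch in Python. It is not the code from this pull request. The function name `replace_invalid_utf8` is hypothetical, and it assumes that the "broken character encoding character" is the Unicode replacement character U+FFFD.

```python
def replace_invalid_utf8(raw: bytes) -> str:
    """Decode bytes as UTF-8, substituting U+FFFD for invalid sequences.

    Hypothetical helper for illustration only. No attempt is made to
    guess the original encoding or to repair the string.
    """
    # errors="replace" makes the decoder emit U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace")


# A Latin-1 encoded "é" (0xE9) is not valid UTF-8 in this position.
broken_name = b"caf\xe9.txt"
clean_name = replace_invalid_utf8(broken_name)
```

Valid input passes through unchanged, so the only visible effect is on strings that were already broken. The trade-off is that the original bytes are not recoverable from the result.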

Also ensures that JSON encode failures no longer fail silently and return an HTTP 200.
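
The failure handling can be sketched along the same lines. This is a hedged Python illustration, not the pull request's implementation. The function `encode_response` is hypothetical, and the choice of status 500 is an assumption, since the description only says the response should not be a silent 200.

```python
import json


def encode_response(payload):
    """Return (status_code, body) for a JSON response.

    Hypothetical helper: an encoding failure is reported as an error
    status (500 is assumed here) instead of an empty 200 response.
    """
    try:
        body = json.dumps(payload)
    except (TypeError, ValueError) as exc:
        # Surface the failure to the client rather than hiding it.
        error_body = json.dumps({"error": f"JSON encoding failed: {exc}"})
        return 500, error_body
    return 200, body


# Raw bytes are not JSON serializable in Python, so this takes the error path.
status, body = encode_response({"name": b"\xff\xfe"})
```

Returning the status alongside the body keeps the decision about success or failure in one place, so a caller cannot send a success code with an empty payload by accident.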

One of two possible fixes for Novik#2977.

The other approach is to collect every broken UTF-8 name and add unit tests for the UTF-8 repair class. I have saved some strings that I've observed causing these problems in the wild, and can share them with anyone motivated to write tests and fix the UTF-8 fixer instead.
