refactor: explicit charset validation functions for JSON and URL-encoded payloads#678
refactor: explicit charset validation functions for JSON and URL-encoded payloads#678Phillip9587 wants to merge 1 commit intoexpressjs:masterfrom
Conversation
bjohansebas
left a comment
There was a problem hiding this comment.
This is definitely a breaking change. For example, if I currently provide charset=utf-16be for JSONs, it would be decoded correctly and no error would be thrown, with your current PR this would be considered invalid, which makes me wonder whether the specification takes UTF-16BE or UTF-16LE encoding into account.
Additionally, how does this behave with iconv-lite? Given that iconv-lite correctly supports UTF-16BE and UTF-16LE, it should be able to decode the JSONs correctly.
My initial comments, i’ll continue reviewing these changes.
|
why wouldn't we just limit to UTF-8 at this point? |
|
@ctcpip I orgininally intented to keep this backward compatible but if we target a new major we should definitly limit it to |
|
it's already not backward-compatible, so my suggestion is to just adhere to the latest JSON RFC and only allow utf-8 |
bc9e2e9 to
18c8bc0
Compare
This PR improves the handling and validation of character encodings for the JSON and URL-encoded parsers, ensuring compliance with relevant standards and providing more robust, maintainable code.
The main change is extracting the previously inlined charset validation functions into utility functions. I also included tests for these functions.
isValidJsonCharsetutf-utf-8,utf-16orutf-32case-insensitive according to RFC 7159 Section 8.1 (Stricter, Maybe semver major)isValidUrlencodedCharsetutf-8andiso-8859-1Now checks case-insensitivethe charset is already lower-cased in thegetCharset()helper functionEdit 21.01.2025: I changed the PR so that the JSON Parser now is RFC 8259 compatible and ONLY allows
utf-8.