Contributor

@DXist DXist commented Mar 20, 2023

This PR adds a length_utf16 validator.

My project exposes data from Salesforce via a JSON Schema-based API. I want to validate field lengths the same way Salesforce does: by counting UTF-16 code units.

UTF-16 is the string representation used in JavaScript, Java and Salesforce Apex. I think this validator could be useful to others as well. A good use case is aligning backend and frontend length validators.

An example of the mismatch between UTF-16 code units and Unicode code points: the symbol '𝔠' takes 2 UTF-16 code units but is still 1 Unicode code point.

Should I put the implementation behind an optional length_utf16 feature?
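The difference can be checked with the standard library alone. This is a minimal sketch; the helper names `utf16_len` and `scalar_len` are made up for illustration and are not part of this PR.

```rust
/// Length of `s` in UTF-16 code units, the unit reported by
/// JavaScript's `String.length` and Java's `String.length()`.
fn utf16_len(s: &str) -> usize {
    s.encode_utf16().count()
}

/// Length of `s` in Unicode scalar values (what `str::chars` yields).
fn scalar_len(s: &str) -> usize {
    s.chars().count()
}
```

For "𝔠" (U+1D520), `utf16_len` returns 2 because the character is encoded as a surrogate pair, while `scalar_len` returns 1.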

@DXist DXist mentioned this pull request Mar 20, 2023
Owner

Keats commented Mar 20, 2023

I don't think it makes sense to add that to the library; it's better implemented as a custom validator.
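As a rough idea of what such a custom check could look like, here is a standalone sketch. The function name, the limit, and the `String` error type are all assumptions made for illustration; hooking it into the crate's custom-validator mechanism would need the crate's own error type and is not shown.

```rust
/// Hypothetical limit, chosen only for this example.
const MAX_UTF16_UNITS: usize = 80;

/// Hypothetical custom check: rejects values longer than
/// `MAX_UTF16_UNITS` UTF-16 code units.
fn validate_utf16_length(value: &str) -> Result<(), String> {
    let units = value.encode_utf16().count();
    if units > MAX_UTF16_UNITS {
        return Err(format!(
            "length {} exceeds {} UTF-16 code units",
            units, MAX_UTF16_UNITS
        ));
    }
    Ok(())
}
```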

@LeoniePhiline

@Keats The need for a UTF-16 code unit length validator is very common (think of essentially all web form handling), since the maxlength attribute of HTML form fields counts UTF-16 code units.

If the frontend counts UTF-16 code units and the backend counts UTF-8 code units, inconsistencies arise whenever a value contains characters whose encoded length differs between UTF-16 and UTF-8.

As a result, the server rejects values that passed client-side validation whenever the server's UTF-8 representation is longer than the browser's UTF-16 representation.
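A small sketch of that failure mode, assuming the two sides apply the same numeric limit under the two counting rules described above. Both function names are hypothetical.

```rust
/// Client-side rule: at most `max` UTF-16 code units,
/// which is how HTML `maxlength` counts.
fn passes_utf16_limit(value: &str, max: usize) -> bool {
    value.encode_utf16().count() <= max
}

/// Server-side rule as described above: at most `max`
/// UTF-8 code units, i.e. bytes of the Rust `str`.
fn passes_utf8_limit(value: &str, max: usize) -> bool {
    value.len() <= max
}
```

With a limit of 5, the value "h\u{e9}llo" ("héllo" with a precomposed é) passes the UTF-16 rule with 5 code units but fails the UTF-8 rule, because é occupies 2 bytes and the string is 6 bytes long.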

Validator::Regex(_) => "regex",
Validator::Range { .. } => "range",
Validator::Length { .. } => "length",
Validator::LengthUTF16 { .. } => "length_utf16",

Suggested change
- Validator::LengthUTF16 { .. } => "length_utf16",
+ Validator::LengthUtf16 { .. } => "length_utf16",

For consistency with Rust code style, you might want to use Utf in identifiers.

E.g. https://doc.rust-lang.org/std/str/struct.EncodeUtf16.html

Contributor Author

I'll address the code style suggestions if crate users want the approach of an extra built-in validator type.

We could gather more comments and thumbs-up on the PR description to get more feedback.

Owner

Yeah, I think it makes more sense to add a parameter to the length validator, as mentioned in #250; otherwise we just duplicate things that are 99% the same.
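One way a parameterized length check might look is sketched below. This is purely illustrative: the `LengthUnit` enum, its variant names, and `validate_length` are hypothetical and do not reflect what #250 proposes or the crate's actual API.

```rust
/// Hypothetical selector for the unit a length check counts in.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum LengthUnit {
    /// Unicode scalar values (`str::chars`).
    Chars,
    /// UTF-16 code units (`str::encode_utf16`).
    Utf16,
}

/// Measures `value` in the requested unit.
fn measured_len(value: &str, unit: LengthUnit) -> usize {
    match unit {
        LengthUnit::Chars => value.chars().count(),
        LengthUnit::Utf16 => value.encode_utf16().count(),
    }
}

/// A single check shared by both units; only the measuring step differs.
/// `None` means the corresponding bound is not enforced.
fn validate_length(
    value: &str,
    min: Option<usize>,
    max: Option<usize>,
    unit: LengthUnit,
) -> bool {
    let len = measured_len(value, unit);
    min.map_or(true, |m| len >= m) && max.map_or(true, |m| len <= m)
}
```

The bounds logic is written once, and the unit only changes how the value is measured, which is the duplication the parameter approach avoids.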
