Replace printables table with unicode_data.rs tables#155527
Conversation
|
If you want to modify |
|
rustbot has assigned @Mark-Simulacrum. Use Why was this reviewer chosen?The reviewer was selected based on:
|
This comment has been minimized.
This comment has been minimized.
c2aeaba to
fd13759
Compare
This comment has been minimized.
This comment has been minimized.
ab09b17 to
a799ecf
Compare
Nominating for libs-api to FCP this. @Jules-Bertholet can you write up how that affects the public API of std? i.e., where is that unprintability used (only in Debug impls of str)? |
There was a problem hiding this comment.
Can you split out the re-ordering and renaming in char/methods.rs? It's very hard to review the diff for me when methods are moved around in the file. It also seems entirely unrelated to the core change here and I'd rather have separate commits at least.
The changes look broadly reasonable though, I'd be happy to accept them if separated out (including maybe from the libs-api facing change).
|
Reminder, once the PR becomes ready for a review, use |
a799ecf to
6079a98
Compare
|
@rustbot ready 6079a98 is the libs-API-relevant change. It affects the Note that we may also wish to stop escaping format control characters which are not default-ignorable. The list of characters this would affect: https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7BCf%7D-%5Cp%7BDefault_Ignorable_Code_Point%7D |
|
We discussed this in today's @rust-lang/libs-api meeting; +1 for us, but we'd like someone with more unicode knowledge to weigh in to be safe so cc @Manishearth (offtopic, it would be nice to have an @rust-lang/unicode-knowers ping group since these issues arise pretty often) |
|
I didn't look too closely, but this seems fine. From a quick look the printability concern is for debug output, and yes, being more conservative there makes sense. |
|
While you shouldn't depend on ICU4X in the stdlib, it may be worth using ICU4X to get your unicode properties, instead of fetching them yourself. This does mean you are beholden to ICU4X for unicode updates though. |
|
@rfcbot merge libs-api |
|
Team member @Amanieu has proposed to merge this. The next step is review by the rest of the tagged team members: No concerns currently listed. Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! See this document for info about what commands tagged team members can give me. |
Rollup of 25 pull requests Successful merges: - rust-lang/rust#157251 (`rust-analyzer` subtree update) - rust-lang/rust#157533 (Subtree sync for rustc_codegen_cranelift) - rust-lang/rust#154742 (Add APIs for case folding to the standard library) - rust-lang/rust#155144 (mir_build: Add an extra intermediate step in MIR building for patterns ) - rust-lang/rust#156222 (Stabilize `Result::map_or_default` and `Option::map_or_default`) - rust-lang/rust#157016 (add `extern "tail"` calling convention) - rust-lang/rust#157264 (diagnostics: Fix ICE building a trait ref in method suggestions) - rust-lang/rust#157386 (Parse deprecated note links separately in rustc_resolve) - rust-lang/rust#157483 (fix windows-gnu TLS leak) - rust-lang/rust#157488 (compiletest: inject `#![windows_subsystem = "windows"]` to debuginfo tests on Windows) - rust-lang/rust#157509 (remove solaris implementation for File::lock, it has the wrong semantics) - rust-lang/rust#157521 (Rename `SyncView::{as_pin => as_pin_ref}`) - rust-lang/rust#156136 (Move tests box) - rust-lang/rust#156573 (Add unwinder_private_data_size for wasm64 target) - rust-lang/rust#156783 (docs: make `Rc::into_raw` clickable in `Rc::increment_strong_count` doc) - rust-lang/rust#156840 (Stabilize `PathBuf::into_string`) - rust-lang/rust#156936 (Remove FIXME about impl PinCoerceUnsized for UnsafePinned<T>) - rust-lang/rust#157365 (Revert "LLVM 23: Run AssignGUIDPass in some places") - rust-lang/rust#157380 (clarify compiler_fence (and fence) docs) - rust-lang/rust#157471 (Debug assert that parsed attributes are in the `BUILTIN_ATTRIBUTE_MAP`) - rust-lang/rust#157485 (Rename `errors.rs` file to `diagnostics.rs` (1/N)) - rust-lang/rust#157494 (Convert `QueryRegionConstraint` into a struct) - rust-lang/rust#157526 (std tests: skip a slow test on Miri) - rust-lang/rust#157531 (ci: bump x86_64-gnu base image to 26.04) - rust-lang/rust#157556 (Add `BTree::append()` change to 1.96.0 relnotes) Failed merges: - rust-lang/rust#155527 (Replace printables table with `unicode_data.rs` tables)
And rename a struct field.
This gets rid of the `printable.py` script, ensuring that `unicode-table-generator` handles all our Unicode data table generation needs. I've elected to give each Unicode property its own table, instead of merging them all into one. This is slightly less efficient in terms of space, but should allow us to expose these tables in the future with public methods on `char`.
These characters may be hidden/invisible otherwise.
6079a98 to
4db9ff5
Compare
|
This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers. |
|
@rustbot ready |
|
@bors r+ |
…Mark-Simulacrum Replace printables table with `unicode_data.rs` tables This gets rid of the `printable.py` script, ensuring that `unicode-table-generator` handles all our Unicode data table generation needs. There are also some drive-by documentation improvements in `library/core/char/methods.rs`. There is one change in behavior: we now consider all characters with the `Default_Ignorable_Code_Point` property to be unprintable. These characters can be hidden/invisible otherwise. I've chosen to give each Unicode property its own table, instead of merging them all into one. This is slightly less efficient in terms of space, but should allow us to expose these tables in the future with public methods on `char`. @rustbot label A-Unicode
…Mark-Simulacrum Replace printables table with `unicode_data.rs` tables This gets rid of the `printable.py` script, ensuring that `unicode-table-generator` handles all our Unicode data table generation needs. There are also some drive-by documentation improvements in `library/core/char/methods.rs`. There is one change in behavior: we now consider all characters with the `Default_Ignorable_Code_Point` property to be unprintable. These characters can be hidden/invisible otherwise. I've chosen to give each Unicode property its own table, instead of merging them all into one. This is slightly less efficient in terms of space, but should allow us to expose these tables in the future with public methods on `char`. @rustbot label A-Unicode
…Mark-Simulacrum Replace printables table with `unicode_data.rs` tables This gets rid of the `printable.py` script, ensuring that `unicode-table-generator` handles all our Unicode data table generation needs. There are also some drive-by documentation improvements in `library/core/char/methods.rs`. There is one change in behavior: we now consider all characters with the `Default_Ignorable_Code_Point` property to be unprintable. These characters can be hidden/invisible otherwise. I've chosen to give each Unicode property its own table, instead of merging them all into one. This is slightly less efficient in terms of space, but should allow us to expose these tables in the future with public methods on `char`. @rustbot label A-Unicode
…Mark-Simulacrum Replace printables table with `unicode_data.rs` tables This gets rid of the `printable.py` script, ensuring that `unicode-table-generator` handles all our Unicode data table generation needs. There are also some drive-by documentation improvements in `library/core/char/methods.rs`. There is one change in behavior: we now consider all characters with the `Default_Ignorable_Code_Point` property to be unprintable. These characters can be hidden/invisible otherwise. I've chosen to give each Unicode property its own table, instead of merging them all into one. This is slightly less efficient in terms of space, but should allow us to expose these tables in the future with public methods on `char`. @rustbot label A-Unicode
Rollup of 31 pull requests Successful merges: - #141030 (Expand free alias types during variance computation) - #154853 (mgca: Register `ConstArgHasType` when normalizing projection consts) - #155527 (Replace printables table with `unicode_data.rs` tables) - #156629 (Stabilize `core::range::{legacy, RangeFull, RangeTo}`) - #157280 (traits: Allow escaping self types in ExistentialTraitRef::with_self_ty) - #157282 (Fix post-monomorphization error note race in the parallel frontend) - #157352 (Make the retained dep graph deterministic under the parallel frontend) - #157601 (Emit error for unused target expression in glob and list delegations) - #157611 (Update `browser-ui-test` version to `0.24.0`) - #157620 (Add a strategy FnMut to inject behavior into the flush cycle) - #157645 (Windows TLS - Only register the `atexit` hook when `cleanup` can be unloaded) - #157646 (Keep rename-imported main alive in dead-code analysis under `--test`) - #157647 (Start using comptime for reflection intrinsics and their wrapper functions) - #157719 (resolve: Partially revert "Remove a special case for dummy imports") - #155153 (Ensure Send/Sync is not implemented for std::env::Vars{,Os}) - #155198 (fix(mgca): Allow specifying generic args (of enum) on enum itself in unit & tuple variant constructions in (direct) const args) - #155323 (refactor `TypeRelativePath::AssocItem` to use `AliasTerm`) - #156497 (fix-155516: Don't suggest wrong unwrap expect) - #156583 (Support defaults for static EIIs) - #157013 (Ensure inferred let pattern types are well-formed) - #157196 (Only suggest reborrowing a moved value where the reborrow is valid) - #157230 (borrowck: avoid ICE describing fields on generic params) - #157288 (platform support: add SNaN erratum to MIPS targets) - #157330 (remove `IsTypeConst` from `hir::TraitItemKind`) - #157350 (compiletest: ignore SVG `y` offset in by-lines comparison) - #157355 (Add `or_try_*` variants for `HashMap` and `BTreeMap` Entry APIs) - #157577 (Fix parser error recovery treating 'dyn' as a strict keyword in Rust 2015 when used in `dyn + dyn`) - #157670 (Rename `errors.rs` file to `diagnostics.rs` (4/N)) - #157691 (Move symbol hiding code to a new file) - #157700 (Rename `errors.rs` file to `diagnostics.rs` (5/N)) - #157703 (Fix doc link to Instant sub in saturating caveat) Failed merges: - #157699 (Arg splat experiment - hir FnDecl impl)
Rollup merge of #155527 - Jules-Bertholet:riir-printable, r=Mark-Simulacrum Replace printables table with `unicode_data.rs` tables This gets rid of the `printable.py` script, ensuring that `unicode-table-generator` handles all our Unicode data table generation needs. There are also some drive-by documentation improvements in `library/core/char/methods.rs`. There is one change in behavior: we now consider all characters with the `Default_Ignorable_Code_Point` property to be unprintable. These characters can be hidden/invisible otherwise. I've chosen to give each Unicode property its own table, instead of merging them all into one. This is slightly less efficient in terms of space, but should allow us to expose these tables in the future with public methods on `char`. @rustbot label A-Unicode
Rollup of 31 pull requests Successful merges: - rust-lang#141030 (Expand free alias types during variance computation) - rust-lang#154853 (mgca: Register `ConstArgHasType` when normalizing projection consts) - rust-lang#155527 (Replace printables table with `unicode_data.rs` tables) - rust-lang#156629 (Stabilize `core::range::{legacy, RangeFull, RangeTo}`) - rust-lang#157280 (traits: Allow escaping self types in ExistentialTraitRef::with_self_ty) - rust-lang#157282 (Fix post-monomorphization error note race in the parallel frontend) - rust-lang#157352 (Make the retained dep graph deterministic under the parallel frontend) - rust-lang#157601 (Emit error for unused target expression in glob and list delegations) - rust-lang#157611 (Update `browser-ui-test` version to `0.24.0`) - rust-lang#157620 (Add a strategy FnMut to inject behavior into the flush cycle) - rust-lang#157645 (Windows TLS - Only register the `atexit` hook when `cleanup` can be unloaded) - rust-lang#157646 (Keep rename-imported main alive in dead-code analysis under `--test`) - rust-lang#157647 (Start using comptime for reflection intrinsics and their wrapper functions) - rust-lang#157719 (resolve: Partially revert "Remove a special case for dummy imports") - rust-lang#155153 (Ensure Send/Sync is not implemented for std::env::Vars{,Os}) - rust-lang#155198 (fix(mgca): Allow specifying generic args (of enum) on enum itself in unit & tuple variant constructions in (direct) const args) - rust-lang#155323 (refactor `TypeRelativePath::AssocItem` to use `AliasTerm`) - rust-lang#156497 (fix-155516: Don't suggest wrong unwrap expect) - rust-lang#156583 (Support defaults for static EIIs) - rust-lang#157013 (Ensure inferred let pattern types are well-formed) - rust-lang#157196 (Only suggest reborrowing a moved value where the reborrow is valid) - rust-lang#157230 (borrowck: avoid ICE describing fields on generic params) - rust-lang#157288 (platform support: add SNaN erratum to MIPS targets) - rust-lang#157330 (remove `IsTypeConst` from `hir::TraitItemKind`) - rust-lang#157350 (compiletest: ignore SVG `y` offset in by-lines comparison) - rust-lang#157355 (Add `or_try_*` variants for `HashMap` and `BTreeMap` Entry APIs) - rust-lang#157577 (Fix parser error recovery treating 'dyn' as a strict keyword in Rust 2015 when used in `dyn + dyn`) - rust-lang#157670 (Rename `errors.rs` file to `diagnostics.rs` (4/N)) - rust-lang#157691 (Move symbol hiding code to a new file) - rust-lang#157700 (Rename `errors.rs` file to `diagnostics.rs` (5/N)) - rust-lang#157703 (Fix doc link to Instant sub in saturating caveat) Failed merges: - rust-lang#157699 (Arg splat experiment - hir FnDecl impl)
Rollup of 31 pull requests Successful merges: - rust-lang#141030 (Expand free alias types during variance computation) - rust-lang#154853 (mgca: Register `ConstArgHasType` when normalizing projection consts) - rust-lang#155527 (Replace printables table with `unicode_data.rs` tables) - rust-lang#156629 (Stabilize `core::range::{legacy, RangeFull, RangeTo}`) - rust-lang#157280 (traits: Allow escaping self types in ExistentialTraitRef::with_self_ty) - rust-lang#157282 (Fix post-monomorphization error note race in the parallel frontend) - rust-lang#157352 (Make the retained dep graph deterministic under the parallel frontend) - rust-lang#157601 (Emit error for unused target expression in glob and list delegations) - rust-lang#157611 (Update `browser-ui-test` version to `0.24.0`) - rust-lang#157620 (Add a strategy FnMut to inject behavior into the flush cycle) - rust-lang#157645 (Windows TLS - Only register the `atexit` hook when `cleanup` can be unloaded) - rust-lang#157646 (Keep rename-imported main alive in dead-code analysis under `--test`) - rust-lang#157647 (Start using comptime for reflection intrinsics and their wrapper functions) - rust-lang#157719 (resolve: Partially revert "Remove a special case for dummy imports") - rust-lang#155153 (Ensure Send/Sync is not implemented for std::env::Vars{,Os}) - rust-lang#155198 (fix(mgca): Allow specifying generic args (of enum) on enum itself in unit & tuple variant constructions in (direct) const args) - rust-lang#155323 (refactor `TypeRelativePath::AssocItem` to use `AliasTerm`) - rust-lang#156497 (fix-155516: Don't suggest wrong unwrap expect) - rust-lang#156583 (Support defaults for static EIIs) - rust-lang#157013 (Ensure inferred let pattern types are well-formed) - rust-lang#157196 (Only suggest reborrowing a moved value where the reborrow is valid) - rust-lang#157230 (borrowck: avoid ICE describing fields on generic params) - rust-lang#157288 (platform support: add SNaN erratum to MIPS targets) - rust-lang#157330 (remove `IsTypeConst` from `hir::TraitItemKind`) - rust-lang#157350 (compiletest: ignore SVG `y` offset in by-lines comparison) - rust-lang#157355 (Add `or_try_*` variants for `HashMap` and `BTreeMap` Entry APIs) - rust-lang#157577 (Fix parser error recovery treating 'dyn' as a strict keyword in Rust 2015 when used in `dyn + dyn`) - rust-lang#157670 (Rename `errors.rs` file to `diagnostics.rs` (4/N)) - rust-lang#157691 (Move symbol hiding code to a new file) - rust-lang#157700 (Rename `errors.rs` file to `diagnostics.rs` (5/N)) - rust-lang#157703 (Fix doc link to Instant sub in saturating caveat) Failed merges: - rust-lang#157699 (Arg splat experiment - hir FnDecl impl)
|
@rust-timer build d97e782 |
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (d97e782): comparison URL. Overall result: ❌ regressions - no action neededBenchmarking means the PR may be perf-sensitive. Consider adding rollup=never if this change is not fit for rolling up. @rustbot label: -S-waiting-on-perf -perf-regression Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary -1.9%, secondary -2.2%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (primary 2.1%, secondary -4.2%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeResults (primary 0.2%, secondary 0.5%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 517.006s -> 516.074s (-0.18%) |
View all comments
This gets rid of the
printable.pyscript, ensuring thatunicode-table-generatorhandles all our Unicode data table generation needs.There are also some drive-by documentation improvements in
library/core/char/methods.rs.There is one change in behavior: we now consider all characters with the
Default_Ignorable_Code_Pointproperty to be unprintable. These characters can be hidden/invisible otherwise.I've chosen to give each Unicode property its own table, instead of merging them all into one. This is slightly less efficient in terms of space, but should allow us to expose these tables in the future with public methods on
char.@rustbot label A-Unicode