Skip to content

Implement Debug for C-like enums with a concatenated string#155452

Open
makai410 wants to merge 3 commits into
rust-lang:mainfrom
makai410:enum-debug-array
Open

Implement Debug for C-like enums with a concatenated string#155452
makai410 wants to merge 3 commits into
rust-lang:mainfrom
makai410:enum-debug-array

Conversation

@makai410

@makai410 makai410 commented Apr 17, 2026

Copy link
Copy Markdown
Member

View all comments

Fixes: #114106 #133945

Related to: #88793

Continuation of: #109615 #114190

r? @ghost (I want to see the perf first)

@rustbot

rustbot commented Apr 17, 2026

Copy link
Copy Markdown
Collaborator

Changes to the code generated for builtin derived traits.

cc @nnethercote

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Apr 17, 2026
@makai410

Copy link
Copy Markdown
Member Author

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rust-bors

This comment has been minimized.

rust-bors Bot pushed a commit that referenced this pull request Apr 17, 2026
Implement `Debug` for C-like enums with a concatenated string
@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 17, 2026
@rust-bors

rust-bors Bot commented Apr 18, 2026

Copy link
Copy Markdown
Contributor

☀️ Try build successful (CI)
Build commit: 5a131e3 (5a131e34e34187b36455f6f6809f9cd14e7ab55f, parent: e9e32aca5a4ffd08cbc29547b039d64b92a2c03b)

@rust-timer

This comment has been minimized.

@rust-timer

Copy link
Copy Markdown
Collaborator

Finished benchmarking commit (5a131e3): comparison URL.

Overall result: ❌✅ regressions and improvements - please read:

Benchmarking means the PR may be perf-sensitive. It's automatically marked not fit for rolling up. Overriding is possible but disadvised: it risks changing compiler perf.

Next, please: If you can, justify the regressions found in this try perf run in writing along with @rustbot label: +perf-regression-triaged. If not, fix the regressions and do another perf run. Neutral or positive results will clear the label automatically.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

mean range count
Regressions ❌
(primary)
0.4% [0.2%, 2.4%] 50
Regressions ❌
(secondary)
0.9% [0.2%, 1.6%] 9
Improvements ✅
(primary)
-0.4% [-0.4%, -0.4%] 1
Improvements ✅
(secondary)
-0.0% [-0.0%, -0.0%] 2
All ❌✅ (primary) 0.4% [-0.4%, 2.4%] 51

Max RSS (memory usage)

Results (primary -3.6%, secondary -0.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
2.6% [1.2%, 6.0%] 5
Improvements ✅
(primary)
-3.6% [-5.4%, -2.5%] 4
Improvements ✅
(secondary)
-4.4% [-6.3%, -1.6%] 3
All ❌✅ (primary) -3.6% [-5.4%, -2.5%] 4

Cycles

Results (primary 2.8%, secondary 2.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

mean range count
Regressions ❌
(primary)
2.8% [2.8%, 2.8%] 1
Regressions ❌
(secondary)
2.1% [2.1%, 2.1%] 1
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 2.8% [2.8%, 2.8%] 1

Binary size

Results (primary 0.4%, secondary 0.7%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

mean range count
Regressions ❌
(primary)
0.5% [0.0%, 1.4%] 26
Regressions ❌
(secondary)
0.7% [0.1%, 1.1%] 8
Improvements ✅
(primary)
-0.7% [-1.1%, -0.3%] 2
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.4% [-1.1%, 1.4%] 28

Bootstrap: 492.421s -> 491.503s (-0.19%)
Artifact size: 394.25 MiB -> 394.27 MiB (0.01%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Apr 18, 2026
Comment thread compiler/rustc_builtin_macros/src/deriving/debug.rs Outdated
@makai410

Copy link
Copy Markdown
Member Author

There was a leftover condition from an earlier PR that I pulled in without a proper review, such that it actually didn't apply the optimization to large enums <_<;

I also want to try skipping bounds checks. I suppose it could improve runtime performance, at least on my machine, based on some pretty rough benchmarks with an enum of 10,000 variants.

@makai410

Copy link
Copy Markdown
Member Author

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rust-bors

This comment has been minimized.

rust-bors Bot pushed a commit that referenced this pull request Apr 18, 2026
Implement `Debug` for C-like enums with a concatenated string
@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 18, 2026
@nnethercote nnethercote self-assigned this Apr 18, 2026
@nnethercote

Copy link
Copy Markdown
Contributor

I'm happy to review this once it's ready.

@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 18, 2026
Comment on lines +284 to +288
let variant_names = def
.variants
.iter()
.map(|v| v.disr_expr.is_none().then_some(v.ident.name.as_str()))
.collect::<Option<ThinVec<_>>>()?;

@makai410 makai410 Apr 18, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, I think I missed a valid case, considering:

enum Uwu {
    QwQ = 0,
    AwA = 1,
}

which has explicit discriminants but is actually dense.

View changes since the review

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we can use an offset when some variants have negative discriminants while the overall variants remain dense.

I'll do a follow-up PR to implement these.

@rust-bors

rust-bors Bot commented Apr 18, 2026

Copy link
Copy Markdown
Contributor

☀️ Try build successful (CI)
Build commit: 7348f26 (7348f264e27b4bbcab21ed426532865b98fc5f55, parent: 8da2d28cbd5a4e2b93e028e709afe09541671663)

@rust-timer

This comment has been minimized.

@rust-timer

Copy link
Copy Markdown
Collaborator

Finished benchmarking commit (7348f26): comparison URL.

Overall result: ❌✅ regressions and improvements - please read:

Benchmarking means the PR may be perf-sensitive. It's automatically marked not fit for rolling up. Overriding is possible but disadvised: it risks changing compiler perf.

Next, please: If you can, justify the regressions found in this try perf run in writing along with @rustbot label: +perf-regression-triaged. If not, fix the regressions and do another perf run. Neutral or positive results will clear the label automatically.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

mean range count
Regressions ❌
(primary)
0.5% [0.2%, 7.1%] 50
Regressions ❌
(secondary)
1.2% [0.2%, 4.1%] 12
Improvements ✅
(primary)
-0.6% [-0.6%, -0.6%] 2
Improvements ✅
(secondary)
-0.2% [-0.5%, -0.0%] 13
All ❌✅ (primary) 0.5% [-0.6%, 7.1%] 52

Max RSS (memory usage)

Results (primary 0.7%, secondary 0.6%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

mean range count
Regressions ❌
(primary)
3.2% [1.8%, 6.9%] 5
Regressions ❌
(secondary)
2.4% [0.8%, 6.1%] 10
Improvements ✅
(primary)
-5.6% [-7.2%, -3.9%] 2
Improvements ✅
(secondary)
-5.6% [-6.3%, -4.8%] 3
All ❌✅ (primary) 0.7% [-7.2%, 6.9%] 7

Cycles

Results (primary 4.4%, secondary 3.2%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

mean range count
Regressions ❌
(primary)
4.4% [4.4%, 4.4%] 1
Regressions ❌
(secondary)
3.2% [1.8%, 5.1%] 3
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 4.4% [4.4%, 4.4%] 1

Binary size

Results (primary 0.4%, secondary 1.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

mean range count
Regressions ❌
(primary)
0.6% [0.1%, 1.5%] 20
Regressions ❌
(secondary)
1.0% [0.1%, 2.1%] 20
Improvements ✅
(primary)
-0.3% [-0.5%, -0.0%] 8
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.4% [-0.5%, 1.5%] 28

Bootstrap: 493.313s -> 504.472s (2.26%)
Artifact size: 394.36 MiB -> 394.34 MiB (-0.01%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 18, 2026
@theemathas

Copy link
Copy Markdown
Contributor

This will probably run into #148423. Furthermore, I believe this is likely to cause unsoundness in actual programs, due to the enumflags2 crate which I think has a macro that modifies the discriminant values of enums.

@makai410

Copy link
Copy Markdown
Member Author

Given that it's a lang issue and we basically have no way to work around in this PR...

@rustbot blocked #148423

@rustbot rustbot added S-blocked Status: Blocked on something else such as an RFC or other implementation work. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Apr 19, 2026
@makai410

Copy link
Copy Markdown
Member Author

We still have compile time regression, I suppose that for small enums it might not be worth doing this kind of optimization, I don't think the runtime performance would vary a lot but the compile time has already increased a lot.

@rust-bors

This comment has been minimized.

@panstromek

panstromek commented May 28, 2026

Copy link
Copy Markdown
Contributor

@makai410 The piece of code I was looking for when we were discussing this at all hands is here: https://github.com/makai410/rust/blob/b16703bbee4b9ec6e790bbb88115c84be293635b/tests/ui/deriving/deriving-all-codegen.stdout#L445, the optimization was introduced in #98190

The derive generates array of names, and array of dyn Debug references to fields and passes that to internal non-generic debug_struct_fields_finish function. There are also specialized functions for smaller structs.

I think what you probably want to do here is very similar - generate just the arrays that are unique to each struct and pass them to some internal function to avoid duplicating a lot of similar code in multiple derives.

Also, with this approach it's maybe not even worth it to concatenate the string at all, and just use list of names and discriminant as an index, but we'd have to measure that.

@makai410

makai410 commented Jun 9, 2026

Copy link
Copy Markdown
Member Author

Also, with this approach it's maybe not even worth it to concatenate the string at all, and just use list of names and discriminant as an index, but we'd have to measure that.

Hi @panstromek, sorry for the delay. The reason for doing concatenated string is that there was a #109615 which tried a names array but it turned out to be not much of an improvement, and the idea of concatenated string was inspired by #133945 (comment), which suggested that the concatenated string could reduce the number of relocations, so I still want to try the concatenated string here.

As for the threshold number of variants, I was planning to write a benchmark tool to measure it. The idea is basically to use several well-known rust projects as compilation targets, add a custom compiler flag (-Zbike-shedding-threshold=100), sweep the threshold from 100 to 200 in steps of 20, measure the runtime improvement and compile-time overhead, and compute their ratio. A higher ratio means we trade less compile-time overhead for more runtime improvement. We can then plot the threshold on the X axis and the ratio on the Y axis, and then the threshold corresponding to the highest ratio is likely what we want, but worth noting here we still have to look at the actual number of the compile-time overhead at the highest ratio point, which might be a large number(?), to check if the number is acceptable. This is just a rough idea that can probably miss something, but well I think the overall direction is correct?

EDIT:
The problem might not be that complicated. I suspect there wouldn't be much difference between 100 and 200, since I don't think there are many enums with that many variants in this range. I suspect even 50 variants is rare in practice... So maybe I should just pick several favorite numbers and run several rounds of perf to see the results :)

@panstromek

Copy link
Copy Markdown
Contributor

I see, I didn't know about #109615, the concatenated string approach makes sense then.

So maybe I should just pick several favorite numbers and run several rounds of perf to see the results

Yea, I'd probably do something lazy like that :D. You can kinda bisect to it. But note that I'm not sure how many C-like enums with derive(Debug) we have in our benchmarks suite - especially big ones - so we might not be able to see the effect without adding a new one (or at least confirming it locally).


btw. Maybe it's also worth trying to pad all variants to the same length and indexing the resulting string by max_length * discriminant, to avoid generating second array. This way the derive would be bit more minimal (at the cost of some wasted space), something like this:

enum Fieldless0 { A, BB, CCC }
//...
const __NAMES: &str = "A  BB CCC";
debug_c_like_enum(
    __names,
   discriminant_value(self),
   3 /* max variant name len */
)

@makai410

makai410 commented Jun 9, 2026

Copy link
Copy Markdown
Member Author

Sounds good, let's see the perf of the offset one first.

@makai410 makai410 force-pushed the enum-debug-array branch from b16703b to d9870e7 Compare June 9, 2026 15:45
@rustbot

rustbot commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator

Changes to the code generated for builtin derived traits.

cc @nnethercote

@rustbot

rustbot commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

@ZuseZ4

ZuseZ4 commented Jun 10, 2026

Copy link
Copy Markdown
Member

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 10, 2026
@rust-bors

This comment has been minimized.

rust-bors Bot pushed a commit that referenced this pull request Jun 10, 2026
Implement `Debug` for C-like enums with a concatenated string
@rust-bors

rust-bors Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

☀️ Try build successful (CI)
Build commit: 5b7107a (5b7107a5c175fa4886818ef2b0c5ae6a5d9ade7b, parent: d56483a91d6cf5041351a3208b8d08f98f0c8b56)

@rust-timer

This comment has been minimized.

@rust-timer

Copy link
Copy Markdown
Collaborator

Finished benchmarking commit (5b7107a): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking means the PR may be perf-sensitive. It's automatically marked not fit for rolling up. Overriding is possible but disadvised: it risks changing compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-0.4% [-0.7%, -0.2%] 9
All ❌✅ (primary) - - 0

Max RSS (memory usage)

Results (primary 2.2%, secondary 0.6%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

mean range count
Regressions ❌
(primary)
5.0% [3.0%, 7.1%] 2
Regressions ❌
(secondary)
1.4% [1.0%, 2.2%] 3
Improvements ✅
(primary)
-3.4% [-3.4%, -3.4%] 1
Improvements ✅
(secondary)
-2.0% [-2.0%, -2.0%] 1
All ❌✅ (primary) 2.2% [-3.4%, 7.1%] 3

Cycles

This perf run didn't have relevant results for this metric.

Binary size

Results (primary -0.1%, secondary 1.3%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
1.3% [0.3%, 1.8%] 12
Improvements ✅
(primary)
-0.1% [-0.2%, -0.0%] 11
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -0.1% [-0.2%, -0.0%] 11

Bootstrap: 525.47s -> 516.414s (-1.72%)
Artifact size: 400.80 MiB -> 401.32 MiB (0.13%)

@rustbot rustbot removed perf-regression Performance regression. S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Jun 10, 2026
@panstromek

panstromek commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

This looks legit. wg-grammar has those two big C-like enums with derive debug.

I'm suspicious of the numbers in bootstrap, that might be a noise spike or return from noise spike (we see these on wall-time numbers sometimes).

@nnethercote

Copy link
Copy Markdown
Contributor

Given that it's a lang issue and we basically have no way to work around in this PR...

@rustbot blocked #148423

@makai410 Has the blocking problem been resolved?

@makai410

Copy link
Copy Markdown
Member Author

Given that it's a lang issue and we basically have no way to work around in this PR...
@rustbot blocked #148423

@makai410 Has the blocking problem been resolved?

I added bounds check back in the new commits, so in the worst scenario like #148423, it could panic or output an incorrect result but won't trigger UB, at least. So it depends on whether that's acceptable.

@makai410

Copy link
Copy Markdown
Member Author

I think the perf result looks ideal, not affecting most common cases and being able to improve cases when it actually matters.

One more optimization we can do here is to use u16 or even u8 instead of usize to store the offsets when possible, which I'd like to do as a follow-up PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

S-blocked Status: Blocked on something else such as an RFC or other implementation work. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Poor codegen for Debug impls of C-like enums with many variants

7 participants