feat: Implement SORT() directive for linker scripts#1994
Conversation
- Introduces Harvester engine in layout.rs to collect and sort sections lexicographically. - Implements physical section reordering in elf_writer.rs with O(1) skip logic for optimized performance. - Adds comprehensive integration test and 50k-section stress test verification. - Verified physical byte layout is contiguous and alphabetical via objdump. Benchmark (Warm cache): - GNU ld: 0.221s - wild: 0.093s Fixes wild-linker#1661
davidlattimore
left a comment
There was a problem hiding this comment.
I've only partially reviewed, but I just noticed that this is a new PR. Was there a reason to open a new PR rather than continuing with 1857? Also, I just noticed that some of the comments I made on that PR haven't yet been addressed in this PR, so it's probably worthwhile looking back at the comments on that PR
| let _span = debug_span!("write_file", filename = %object.input).entered(); | ||
| let _file_span = layout.args().common().trace_span_for_file(object.file_id); | ||
|
|
||
| // Fast O(1) lookup to skip harvested sections |
There was a problem hiding this comment.
Anything that involves a loop is unlikely to be O(1).
| let _file_span = layout.args().common().trace_span_for_file(object.file_id); | ||
|
|
||
| // Fast O(1) lookup to skip harvested sections | ||
| let epilogue = layout |
There was a problem hiding this comment.
I feel like there's an easier way to get the epilogue. Pretty sure we store the FileId of the epilogue somewhere
| // The actual build-id will be filled in later once all writing has completed. It's important | ||
| // that we fill it with zeros now however, since if we're overwriting an existing file, there | ||
| // might be other data there and we don't zero it, then the build ID will be hashing that data. | ||
|
|
There was a problem hiding this comment.
This extra blank line seems unnecessary and separates the above comment from the code it's referring to
| harvested_sections_registry.push(sec.clone()); | ||
| } | ||
|
|
||
| // Crucial: We MUST sort by the IDs here, otherwise the Binary Search in Step 3 will fail. |
There was a problem hiding this comment.
To what extent was this PR written by an LLM and to what extent do you understand the code that was produced? I ask here because referring to "step 3" seems like the kind of thing an LLM would do. There doesn't seem to be numbered "steps" defined anywhere.
There was a problem hiding this comment.
yeh that was during the merge conflict i coudn't figure it out, so i pasted my file and the file from the wild repo and asked it to make a new merged file, it asked for some context as to what I was doing/ changing to the code and added these comments I should have paid more attention in removing them
There was a problem hiding this comment.
if you go over our conversation on zulip, you should get the idea if I understand what Im doing, respectfully
There was a problem hiding this comment.
Thanks for the explanation. Hope I didn't offend, it's just that I know this PR is a difficult undertaking for someone without previous experience in the codebase and I worry when I see signs of LLM use in those sorts of circumstances. That makes sense that you used it for fixing merge conflicts and that it introduced changes that were perhaps unintended.
There was a problem hiding this comment.
wasn't offended at all, it didn't even merge those files properly I had to restart and fix those myself line by line, while i asked it to merge the file I also asked it to review my work and that is where the comments are from, I removed them as i was fixing the merge conflicts but ig this one got left
| pub(crate) size: u64, | ||
| pub(crate) alignment: Alignment, | ||
| pub(crate) mem_offset: u64, | ||
| pub(crate) _name: &'data [u8], |
|
|
||
| __attribute__((used, section(".text.func_a"))) int func_a() { return 1; } | ||
|
|
||
| __attribute__((used, section(".text.func_b"))) int func_b() { return 2; } |
There was a problem hiding this comment.
This test doesn't seem to actually be testing that the sections are sorted. If I disable sorting, e.g. by always setting SectionPattern.sorted to false, the test still passes. It'd also be good to test that sorted sections from multiple input objects get correctly sorted together.
|
|
||
| Ok(SectionPattern { name, sorted: true }) | ||
| } else { | ||
| // Original behavior: bare pattern |
There was a problem hiding this comment.
Documenting "original behaviour" makes sense from the perspective of reading a PR, but doesn't make sense for someone reading the code later, after the PR is merged. Comments should always be written with the future reader in mind. This comment is probably unnecessary.
| fn parse_pattern<'input>(input: &mut &'input BStr) -> winnow::Result<SectionPattern<'input>> { | ||
| // Handling SORT(...) wrapper: SORT is an alias for SORT_BY_NAME in GNU ld. | ||
| if input.starts_with(b"SORT") { | ||
| // Consume "SORT" |
There was a problem hiding this comment.
Comments like this don't really add anything. It's pretty much just repeating what the next line says.
| Ok(pattern) | ||
| fn parse_pattern<'input>(input: &mut &'input BStr) -> winnow::Result<SectionPattern<'input>> { | ||
| // Handling SORT(...) wrapper: SORT is an alias for SORT_BY_NAME in GNU ld. | ||
| if input.starts_with(b"SORT") { |
There was a problem hiding this comment.
I'm pretty sure there is a helper that checks starts_with and then consumes
| *(.text) | ||
|
|
||
| /* Protect our harvested sections from the GC! */ | ||
| KEEP(*(SORT(.text.*))) |
There was a problem hiding this comment.
It'd be good to test both with and without KEEP and make sure that GC works as expected for sorted sections that aren't referenced.
yeh that just skipped my mind, i will close this one and go back to the previous one |
Transitioned from hashmaps to flat vectors (Vec) for harvested sections. This ensures continuous memory layout and significantly improves L1 cache locality during the final layout binary search.
Fixes #1661