fix(resolve): detect unused patches when multiple patches share a PackageId#16987
fix(resolve): detect unused patches when multiple patches share a PackageId#16987raushan728 wants to merge 2 commits into
Conversation
|
r? @weihanglo rustbot has assigned @weihanglo. Use Why was this reviewer chosen?The reviewer was selected based on:
|
| .entry(summary.package_id()) | ||
| .or_insert_with(HashSet::new) | ||
| .insert(url); | ||
| let mut used_patches = HashSet::new(); |
There was a problem hiding this comment.
It seems we literally build the same set twice: once in register_used_patches and once here.
There was a problem hiding this comment.
One solution could be to maintain an unused_patches_with_url that also includes the URL. But we need to consider this issue first to evaluate the real impact of this change.
There was a problem hiding this comment.
I intentionally built the set twice locally to strictly avoid altering the Resolve struct, because modifying or replacing unused_patches could affect how [patch.unused] is encoded in cargo.lock via encode.rs.
If we add unused_patches_with_url to Resolve, should we keep it as a purely in-memory field (and skip serializing it), or are you open to modifying the lockfile structure to store the URLs as well?
| for (url, summaries) in registry.patches().iter() { | ||
| for summary in summaries.iter() { | ||
| let unused = summary.package_id(); | ||
| if !used_patches.contains(&(url.clone(), unused)) { | ||
| // Show alternative source URLs if the source URLs being patched | ||
| // cannot be found in the crate graph. | ||
| match source_ids_grouped_by_pkg_name.get(&unused.name()) { | ||
| Some(ids) if ids.iter().all(|id| id.canonical_url() != url) => { | ||
| let mut ids: Vec<_> = ids.into_iter().collect(); | ||
| ids.sort_by_key(|id| id.url()); // SourceId implements Ord but url() is safe | ||
| let mut help = "perhaps you meant one of the following:".to_owned(); | ||
| for id in ids { | ||
| help.push_str("\n\t"); | ||
| help.push_str(&id.display_registry_name()); | ||
| } | ||
| ws.gctx().shell().print_report( | ||
| &[Level::WARNING | ||
| .secondary_title(format!("patch `{unused}` {MESSAGE}")) | ||
| .element(Level::HELP.message(help))], | ||
| false, | ||
| )?; | ||
| } | ||
| _ => unemitted_unused_patches.push(unused), |
There was a problem hiding this comment.
This change means X is in unused_patches if and only if there is at least one URL where X is declared as a patch but is not used there. Before the change, PackageId X was in unused_patches if and only if X was never used under any URL.
Will this cause any issues for consumers of unused_patches?
There was a problem hiding this comment.
From my exploration, the main consumers of unused_patches() are the warning emitter itself and encode.rs (which writes the [[patch.unused]] table in cargo.lock).
With this change, if a PackageId is used from one URL but unused from another, it will indeed be recorded in the lockfile's unused section. Since the lockfile only stores the PackageId, it loses the source context, which is technically inaccurate but doesn't break the parser.
To fully fix this ambiguity without confusion, we might actually need to make cargo.lock source aware for unused patches. How would you prefer to handle this?
There was a problem hiding this comment.
To fully fix this ambiguity without confusion, we might actually need to make
cargo.locksource aware for unused patches. How would you prefer to handle this?
I think once we decide to include it in the cargo.lock, this may require bumping the lockfile. Because if one user upgrades their toolchain, we will have lockfile changes, and they will need to submit that change. So this should be a lockfile version bump required change. And we need to use this mechanism to maintain compatibility.
This issue is far more complex than I thought. It is not technically challenging, but it requires a lot of design work. That is also why we marked it as needing design.
So I personally prefer to rework the design to make sure we are on the same page with everyone else about how to deliver this improvement. We should decide whether we need to update the lockfile first, because the implementation would be very different based on these decisions.
But I also saw @weihanglo's comment from #12464 (comment). I'm not sure that simply adding some warnings would be enough to move some parts forward first. I would love to hear your thoughts on this.
There was a problem hiding this comment.
Regarding @weihanglo comment
#12464 (comment)
Another approach is we can emit a warning without touching the lockfile at this time. Something like,
patch `foo 0.1.0 (/path/to/foo)` was not used because there is already a same package in the dependency graph. This unused patch wasn't previously added to the lockfile, but will be in the future.
Based on this, we have two ways to move :
-
Warning-only (Phased): We use the in-memory
(CanonicalUrl, PackageId)logic only to emit the warning to the user. We intentionally exclude these specific overlaps fromResolve::unused_patchesso they do not leak intoCargo.lock. This solves the UX issue now without requiring a lockfile version bump. -
Full lockfile bump (Needs Design): We pause, design how
[[patch.unused]]should encode the URL, bump the lockfile version, and implement the complete fix.
I can go with 1 if we want to deliver the warning first, or hold off for the full design. What do you think is the best path forward?
…kageId Previously, Cargo tracked used patches solely by PackageId. If multiple sources patched the same dependency, unused patches were incorrectly marked as used. This updates the logic to track usage by both source URL and PackageId in-memory, ensuring accurate warnings while keeping Cargo.lock backward-compatible. Fixes rust-lang#12471
|
This PR was rebased onto a different master commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers. |
|
@epage Sorry for the direct ping! The review has been inactive for a few days, and we are a bit stuck on an architectural decision here. Could you please share your thoughts and guide me on how to move forward? |
Sorry for the delayed response. I will have another look. Please do not ping other maintainers if one maintainer is already involved in the PR. I had a business trip last week. I will find some time to check it this week or this weekend. |
Fixes #12471
Previously, Cargo determined patch usage solely by checking if the
PackageIdexisted in the crate graph, which caused false positives if multiple patch sources pointed to the same dependency.Updated
register_used_patchesandemit_warnings_of_unused_patchesto independently verify patch usage by inspecting graph edges for specific(CanonicalUrl, PackageId)pairs.Left
Resolve::unused_patchesasVec<PackageId>to ensure strictCargo.lockbackward compatibility without altering the serialized format.