revision: priority queue for streaming walks + walk_ops vtable by spkrka · Pull Request #2118 · gitgitgadget/git

spkrka · 2026-05-19T08:43:06Z

Summary

Building on the limit_list() priority queue change from PR #2114, this adds a missing release_revisions() call needed for the new allocation pattern, extends the O(log N) priority queue approach to the non-limited streaming walk in get_revision_1(), and refactors the walk dispatch into a vtable pattern.

Commit 1 (same as PR #2114)

revision: use priority queue in limit_list() — replaces O(N) sorted linked-list insertion with O(log N) priority queue in limit_list().

Commit 2

pack-objects: call release_revisions() after cruft traversal — enumerate_and_traverse_cruft_objects() never called release_revisions(). This was harmless on master (the linked-list queue is consumed during traversal), but becomes a real leak once commit 3 switches to a prio_queue whose backing array must be freed. Fixes the linux-leaks CI failure in t7704.

Commit 3

revision: use priority queue for non-limited streaming walks — adds a commit_queue field to rev_info alongside the existing commits linked list. The two representations are mutually exclusive: setup uses the list, then the walk drains it into the priority queue. Removes the list parameter from process_parents() and eliminates commit_list_insert_by_date() entirely.

Performance: git rev-list HEAD on a 2.3M-commit repository: ~21s → ~6.4s (3.3x speedup).

Commit 4

revision: introduce revision_walk_ops for get_revision_1() — replaces the nested if/else dispatch in get_revision_1() with a revision_walk_ops vtable (init/next/expand function pointers). Five walk modes (reflog, topo, streaming, no_walk, limited) are each self-contained. The ops are selected once and cached on rev_info.

Design notes

This takes a two-phase approach: linked list during setup, priority queue during the walk. External callers that read revs->commits between prepare_revision_walk() and the walk loop are unchanged. This contrasts with the alternative approach of replacing commits entirely with a prio_queue (see Jeff King's jk/revs-commits-prio-queue branch at https://github.com/peff/git).

Tradeoffs are discussed in the mailing list thread for PR #2114.

Test plan

All existing tests pass (t6001, t6012, t6019, t4013, t4202, t1411 specifically verified)
linux-leaks CI (t7704) passes with the release_revisions() fix
Correctness verified on 2.3M-commit monorepo across multiple range sizes

limit_list() maintains a date-sorted work queue of commits using a linked list with commit_list_insert_by_date() for insertion. Each insertion walks the list to find the right position — O(n) per insert. In repositories with merge-heavy histories, the symmetric difference can contain thousands of commits, making this O(n) insertion the dominant cost. Replace the sorted linked list with a prio_queue (binary heap). This gives O(log n) insertion and O(log n) extraction instead of O(n) insertion and O(1) extraction, which is a net win when the queue is large. The still_interesting() and everybody_uninteresting() helpers are updated to scan the prio_queue's contiguous array instead of walking a linked list. process_parents() already accepts both a commit_list and a prio_queue parameter, so the change in limit_list() simply switches which one is passed. Benchmark: git rev-list --left-right --count HEAD~N...HEAD Repository: 2.3M commits, merge-heavy DAG (monorepo) Best of 5 runs, times in seconds: commits in symmetric diff baseline patched speedup -------------- -------- ------- ------- 10 0.01 0.01 1.0x 50 0.01 0.01 1.0x 3751 21.23 8.49 2.5x 4524 21.70 8.29 2.6x 10130 20.10 6.65 3.0x No change for small traversals; 2.5-3.0x faster when the queue grows to thousands of commits. Signed-off-by: Kristofer Karlsson <krka@spotify.com>

enumerate_and_traverse_cruft_objects() initializes a rev_info on the stack and uses it for a traversal, but never calls release_revisions() afterwards. This currently does not cause a leak on master because the commit queue uses a linked list that is consumed during traversal. However, a subsequent commit will switch the revision walking machinery to use a prio_queue, which dynamically allocates its backing array. Without this cleanup, that allocation is reported as a leak by LeakSanitizer (384 bytes in the linux-leaks CI job via t7704). Add the missing release_revisions() call. Signed-off-by: Kristofer Karlsson <krka@spotify.com>

Replace the O(N) sorted linked-list insertion used by get_revision_1() for non-limited walks with an O(log N) priority queue. Add a commit_queue field to rev_info alongside the existing commits linked list. The two representations are mutually exclusive: setup and external callers that need list access (mark_edges_uninteresting, bisect, etc.) use the linked list, then get_revision_1() lazily drains it into the priority queue on the first call for non-limited, non-no_walk, non-topo, non-reflog walks. The conversion function rev_info_commit_list_to_queue() is public so callers that know they will iterate can convert early. The parent-rewriting path (rewrite_one) always merges newly discovered commits into the priority queue, which is correct because rewrite_one_1 only calls process_parents for non-limited walks where the queue is active. Combined with the limit_list() priority queue change in the parent commit, this eliminates all O(N) sorted linked-list insertion from the revision walk machinery. Performance: git rev-list HEAD on a 2.3M-commit repository improves from ~21s to ~6.4s (3.3x speedup). Signed-off-by: Kristofer Karlsson <krka@spotify.com>

Extract the per-mode dispatch logic in get_revision_1() into a revision_walk_ops struct with function pointers for next() and expand(). Each walk mode (reflog, topo, streaming, no_walk, limited) gets its own small helper functions, and get_walk_ops() selects the right ops table once per walk. This replaces the scattered if/else chains that checked the same mode conditions at multiple points in the loop body. The loop is now mode-agnostic: it calls ops->next() to pop the next commit and ops->expand() to process parents, with NULL expand for limited walks that skip parent processing entirely. No functional change. Signed-off-by: Kristofer Karlsson <krka@spotify.com>

spkrka force-pushed the revision-walk-ops branch 2 times, most recently from 789b94a to 029889e Compare May 19, 2026 13:23

spkrka added 4 commits May 19, 2026 15:27

spkrka force-pushed the revision-walk-ops branch from 029889e to 7d4da97 Compare May 19, 2026 13:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

revision: priority queue for streaming walks + walk_ops vtable#2118

revision: priority queue for streaming walks + walk_ops vtable#2118
spkrka wants to merge 4 commits into
gitgitgadget:masterfrom
spkrka:revision-walk-ops

spkrka commented May 19, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

spkrka commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Commit 1 (same as PR #2114)

Commit 2

Commit 3

Commit 4

Design notes

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

spkrka commented May 19, 2026 •

edited

Loading