feat: bin changelog entries by day in stability calculation #143
Code Review
This pull request refactors the calculate_stability function in src/utils.rs to reduce memory usage: changelog entries are counted and binned by day on the fly, which avoids allocating a temporary vector. Feedback on the changes notes that a stateful closure with side effects inside Iterator::filter is a Rust anti-pattern, and suggests replacing the iterator chain with an idiomatic for loop to improve safety and readability.
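As a rough illustration of the suggested shape (this is not the actual code in src/utils.rs), counting distinct days with a plain for loop might look like the sketch below. The function name `count_release_days`, the use of Unix-second timestamps, and the assumption that entries arrive sorted by time are all hypothetical.

```rust
/// Seconds per day, used to map a Unix timestamp to a day index.
const SECS_PER_DAY: u64 = 86_400;

/// Counts how many distinct days appear among changelog timestamps,
/// without collecting them into a temporary Vec.
///
/// Assumption: `timestamps` yields Unix seconds in sorted order
/// (ascending or descending), so entries from the same day are adjacent.
fn count_release_days(timestamps: impl IntoIterator<Item = u64>) -> usize {
    // All mutable state lives in plain local variables rather than
    // inside a closure passed to `Iterator::filter`.
    let mut last_day: Option<u64> = None;
    let mut count = 0;

    for ts in timestamps {
        let day = ts / SECS_PER_DAY;
        // Only the first entry seen for a given day is counted.
        if last_day != Some(day) {
            count += 1;
            last_day = Some(day);
        }
    }
    count
}
```

Keeping the state in local variables makes the mutation visible at the loop site, whereas a predicate passed to `filter` is conventionally expected to be free of side effects.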
If a package has multiple changelog entries within a day, those are likely either duplicates or reflect some sort of short-term packaging problem, rather than indicating a higher frequency of releases. Treating them as a single release therefore reflects package stability more accurately for the purpose of the stability calculation. Such duplicate changelog entries can cause stability to be significantly underestimated for some packages; for example, the `systemd` package currently has a stability score of around 0.260 without this change, and 0.416 with it.
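To make the effect of binning concrete, here is a minimal sketch that compares the raw entry count with the per-day count. It assumes changelog times are available as Unix seconds and that "day" means a UTC day; the helper `raw_and_binned_counts` is a hypothetical name and is not part of chunkah. How the resulting count feeds into the stability score is not shown here.

```rust
use std::collections::BTreeSet;

/// Seconds per day, used to map a Unix timestamp to a UTC day index.
const SECS_PER_DAY: i64 = 86_400;

/// Returns `(raw entry count, number of distinct UTC days)`.
///
/// A set is used so the result does not depend on the order of the
/// input; this trades a small allocation for simplicity.
fn raw_and_binned_counts(timestamps: &[i64]) -> (usize, usize) {
    let days: BTreeSet<i64> = timestamps
        .iter()
        // `div_euclid` rounds toward negative infinity, so timestamps
        // before the epoch still land in the correct day bin.
        .map(|ts| ts.div_euclid(SECS_PER_DAY))
        .collect();
    (timestamps.len(), days.len())
}
```

With this helper, two entries logged a few minutes apart on the same day contribute two to the raw count but only one to the binned count, which is the quantity the change proposes to use.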
jlebon left a comment
LGTM, thanks!
The underlying issue, of course, is that changelogs are only a proxy for what we're really interested in. What we really want is churn measured at the consumption level. A closer approximation to this is Bodhi updates.
This kind of integration is out of scope for chunkah, but for anyone really interested in optimizing to that level we could make it easier to feed auxiliary stability data for components (i.e. not just xattrs).