fix: use commits-first approach to avoid rate limiting and missing contributors #409

Merged
zkoppert merged 2 commits into main from fix/contributors-missing-rate-limit
Feb 28, 2026

@zkoppert
Collaborator

Summary

Fixes #392: contributors were silently dropped from the output because of rate limiting on large repos/orgs.

Problem

When start_date and end_date are set, get_contributors() fetches all-time contributors via repo.contributors(), then makes a separate API call per contributor to check for commits in the date range. For large repos this causes rate limiting, and the broad except Exception handler silently drops the entire repo's contributors.
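The failure mode described above can be illustrated with a small, self-contained sketch. This is not the project's code: `RateLimitError`, `has_commits_in_range`, `contributors_old_style`, and the `budget` parameter are hypothetical stand-ins, and the real handler may behave differently in detail. The sketch only shows why one broad handler around a per-contributor loop loses every contributor once a single call fails.

```python
class RateLimitError(Exception):
    """Hypothetical stand-in for the error an API client raises when quota runs out."""


def has_commits_in_range(login, calls, budget):
    """Pretend per-contributor API call; fails once the call budget is spent."""
    calls.append(login)
    if len(calls) > budget:
        raise RateLimitError(f"rate limited while checking {login}")
    return True


def contributors_old_style(logins, budget):
    """Hypothetical per-contributor loop wrapped in one broad handler."""
    calls = []
    found = []
    try:
        for login in logins:
            # One API call per all-time contributor.
            if has_commits_in_range(login, calls, budget):
                found.append(login)
    except Exception:
        # The broad handler hides the rate-limit error, and the partial
        # result collected so far is discarded along with it.
        return None, len(calls)
    return found, len(calls)


print(contributors_old_style(["a", "b", "c"], budget=10))
print(contributors_old_style(["a", "b", "c"], budget=2))
```

With enough budget all three logins come back; with a budget of two calls the third call raises, and the caller sees no contributors at all and no error.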

Solution

Use a commits-first approach: fetch commits in the date range directly (repo.commits(since=, until=)) and extract unique authors. This reduces API calls from O(N), where N is the number of all-time contributors, to O(M/30), where M is the number of commits in the period and results are paginated.
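A minimal sketch of the aggregation step, assuming each commit object exposes an `author` that is either `None` or has a `login` attribute. The helper names `count_authors` and `fake_commit` are illustrative and not taken from the PR; in real use the iterable would come from a call like `repo.commits(since=start_date, until=end_date)`, as described above.

```python
from collections import Counter
from types import SimpleNamespace


def count_authors(commits):
    """Count commits per author login, skipping commits with no linked author."""
    counts = Counter()
    for commit in commits:
        author = commit.author
        if author is None:
            # Commits whose author is not linked to an account are skipped.
            continue
        counts[author.login] += 1
    return dict(counts)


def fake_commit(login):
    """Build a stand-in commit object for the example (hypothetical helper)."""
    author = SimpleNamespace(login=login) if login else None
    return SimpleNamespace(author=author)


commits = [
    fake_commit("alice"),
    fake_commit(None),
    fake_commit("bob"),
    fake_commit("alice"),
]
print(count_authors(commits))  # {'alice': 2, 'bob': 1}
```

Because the counts are built from commits inside the date range, each value is a per-period contribution count rather than an all-time one.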

Bonus: contribution_count now reflects the actual count for the specified period, not the all-time count.

Testing

  • All 62 tests pass (2 new tests added for None author handling and commit aggregation)
  • Linting passes (pylint 10.00/10, flake8, mypy, black, isort all clean)
  • 99.69% code coverage

Copilot AI review requested due to automatic review settings February 28, 2026 00:34
@zkoppert zkoppert requested a review from jmeridth as a code owner February 28, 2026 00:34
@github-actions github-actions bot added the fix label Feb 28, 2026
@zkoppert zkoppert self-assigned this Feb 28, 2026
Contributor

Copilot AI left a comment

Pull request overview

This PR addresses missing contributors on large repos/orgs by switching get_contributors() to a commits-first strategy for date-bounded runs, reducing API calls and rate-limit risk while making counts reflect the selected period.
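To make the reduction in API calls concrete, here is a rough back-of-the-envelope estimate. It assumes a page size of 30 (GitHub's REST default), one paginated listing of contributors plus one call per contributor for the old approach, and only paginated commit listing for the new one. The function names and the example figures are hypothetical, not measurements from the PR.

```python
import math

PER_PAGE = 30  # assumed page size (GitHub REST default)


def calls_per_contributor_approach(n_contributors, per_page=PER_PAGE):
    """Estimated calls: paginated contributor listing plus one call per contributor."""
    listing = math.ceil(n_contributors / per_page)
    return listing + n_contributors


def calls_commits_first(m_commits, per_page=PER_PAGE):
    """Estimated calls: paginated commit listing only (at least one request)."""
    return max(1, math.ceil(m_commits / per_page))


# Hypothetical repo: 3000 all-time contributors, 600 commits in the report month.
print(calls_per_contributor_approach(3000))  # 3100
print(calls_commits_first(600))              # 20
```

The estimate ignores retries and any other requests the tool makes; it is only meant to show how the two approaches scale.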

Changes:

  • Refactors get_contributors() to enumerate commits in the start_date/end_date range and aggregate unique authors/counts.
  • Updates and adds unit tests to cover commit aggregation and skipping commits with None authors.
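The kind of test described in the second bullet could look roughly like the sketch below. `authors_in_range` is a hypothetical stand-in for the code under test, not the project's `get_contributors()`, and the date strings are made up; the point is only the mocking pattern for a commit whose `author` is `None`.

```python
from unittest.mock import MagicMock


def authors_in_range(repo, start_date, end_date):
    """Hypothetical stand-in for the code under test, not the project's function."""
    counts = {}
    for commit in repo.commits(since=start_date, until=end_date):
        if commit.author is None:
            continue  # skip commits with no linked author
        login = commit.author.login
        counts[login] = counts.get(login, 0) + 1
    return counts


def make_commit(login):
    """Build a mock commit; a login of None models a commit with no author."""
    commit = MagicMock()
    if login is None:
        commit.author = None
    else:
        commit.author.login = login
    return commit


def make_repo(logins):
    """Build a mock repo whose commits() yields one mock commit per login."""
    repo = MagicMock()
    repo.commits.return_value = iter([make_commit(login) for login in logins])
    return repo


mock_repo = make_repo(["user", None, "user"])
result = authors_in_range(mock_repo, "2022-01-01", "2022-12-31")

# Two commits are aggregated for "user"; the authorless commit is skipped.
assert result == {"user": 2}
mock_repo.commits.assert_called_once_with(since="2022-01-01", until="2022-12-31")
```

Asserting on the `since`/`until` arguments also checks that the date range is actually passed through to the commit listing.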

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| contributors.py | Implements commits-first contributor discovery and per-period contribution counting when a date range is provided. |
| test_contributors.py | Updates existing tests for the new behavior and adds new cases for None authors and aggregation across multiple commits. |
Comments suppressed due to low confidence (1)

test_contributors.py:140

  • test_get_contributors_skip_users_with_no_commits (and its docstring) no longer matches what the test actually verifies after the commits-first refactor; it now asserts the happy path with a single commit. Renaming the test and updating the docstring to reflect the current behavior will keep the suite understandable and avoid future confusion.
    @patch("contributors.contributor_stats.ContributorStats")
    def test_get_contributors_skip_users_with_no_commits(self, mock_contributor_stats):
        """
        Test the get_contributors function skips users with no commits in the date range.
        """
        mock_repo = MagicMock()
        mock_commit = MagicMock()
        mock_commit.author.login = "user"
        mock_commit.author.avatar_url = (
            "https://avatars.githubusercontent.com/u/12345678?v=4"
        )

        mock_repo.full_name = "owner/repo"
        mock_repo.commits.return_value = iter([mock_commit])
        ghe = ""

…ntributors

When date filtering is active, fetch commits in the date range directly
and extract unique authors, instead of iterating all-time contributors
and making a separate API call per contributor to check for commits.

The previous approach made O(N) API calls where N is the number of
all-time contributors, which exhausted rate limits on large repos/orgs.
When rate limiting occurred mid-iteration, the broad exception handler
silently dropped all contributors for the affected repository.

The new approach makes O(M/30) API calls where M is the number of
commits in the date range, which is orders of magnitude fewer for
monthly reports.

Additionally, contribution_count now reflects the actual count for the
specified period rather than the misleading all-time count.

Fixes #392

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@zkoppert zkoppert force-pushed the fix/contributors-missing-rate-limit branch from 796dc07 to 545219a Compare February 28, 2026 00:39
Address PR review feedback:
- Add return value assertions to test_get_contributors and
  test_get_contributors_skip_bot for stricter contract verification
- Rename test_get_contributors_skip_users_with_no_commits to
  test_get_contributors_with_single_commit to match actual behavior

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@zkoppert zkoppert merged commit 541f0ea into main Feb 28, 2026
33 checks passed
@zkoppert zkoppert deleted the fix/contributors-missing-rate-limit branch February 28, 2026 01:03
Development

Successfully merging this pull request may close these issues.

Doesn't output all contributors

3 participants