@Snider Snider commented Feb 2, 2026

This submission introduces a new rate-limiting feature that allows for fine-grained control over the application's HTTP requests. The system is configurable on a per-domain basis and can dynamically adjust to server-side rate limits. New CLI flags have been added to provide users with flexible control over the rate-limiting behavior.

Fixes #51


PR created automatically by Jules for task 1795631143193828042 started by @Snider

This commit introduces a configurable rate-limiting system for all HTTP requests made by the application.

Key features include:
- A token bucket algorithm for rate limiting.
- Per-domain configuration via a YAML file (`--rate-config`).
- Wildcard domain matching (e.g., `*.archive.org`).
- Dynamic adjustments based on `429` responses and `Retry-After` headers.
- New CLI flags (`--rate-limit`, `--burst`) for on-the-fly configuration.
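
The wildcard matching listed above could work roughly as follows. This is a minimal sketch, not the code from this PR: `matchDomain` and `lookupRate` are hypothetical names, and it assumes that `*.archive.org` matches subdomains but not the bare `archive.org`, that an exact entry wins over a wildcard, that the longest matching wildcard wins otherwise, and that map keys are stored in lower case.

```go
package main

import (
	"fmt"
	"strings"
)

// matchDomain reports whether host matches pattern. A pattern is either an
// exact hostname or a wildcard such as "*.archive.org". Assumption: the
// wildcard form matches any subdomain but not the bare domain itself.
func matchDomain(pattern, host string) bool {
	pattern = strings.ToLower(pattern)
	host = strings.ToLower(host)
	if strings.HasPrefix(pattern, "*.") {
		suffix := pattern[1:] // ".archive.org"
		// Require at least one character before the suffix.
		return len(host) > len(suffix) && strings.HasSuffix(host, suffix)
	}
	return pattern == host
}

// lookupRate returns the requests-per-second value for host. An exact entry
// wins; otherwise the longest matching wildcard wins; otherwise fallback.
func lookupRate(domains map[string]float64, fallback float64, host string) float64 {
	if r, ok := domains[strings.ToLower(host)]; ok {
		return r
	}
	best, bestLen := fallback, -1
	for pattern, r := range domains {
		if strings.HasPrefix(pattern, "*.") && matchDomain(pattern, host) && len(pattern) > bestLen {
			best, bestLen = r, len(pattern)
		}
	}
	return best
}

func main() {
	rates := map[string]float64{"*.archive.org": 0.5, "api.github.com": 1}
	for _, host := range []string{"web.archive.org", "api.github.com", "example.com"} {
		fmt.Printf("%s -> %.1f req/s\n", host, lookupRate(rates, 2, host))
	}
}
```

Choosing the longest wildcard keeps the lookup deterministic even though Go map iteration order is random.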

I began by creating a new `http` package to centralize the rate-limiting logic. I then integrated this package into the `website` and `github` collectors, ensuring that all outgoing HTTP requests are subject to the new rate-limiting rules.

Throughout the implementation, I added comprehensive unit and integration tests to validate the new functionality. This process also uncovered several pre-existing issues in the test suite, which I have now fixed. These fixes include:
- Correcting mock implementations for `http.Client` and `vcs.GitCloner`.
- Updating outdated function signatures in tests and examples.
- Resolving missing dependencies and syntax errors in test files.
- Stabilizing flaky tests.

Co-authored-by: Snider <631881+Snider@users.noreply.github.com>

@coderabbitai

coderabbitai bot commented Feb 2, 2026

Warning

Rate limit exceeded

@Snider has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 17 minutes and 6 seconds before requesting another review.


@gemini-code-assist

Summary of Changes

Hello @Snider, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the application's network request handling by introducing a robust and configurable rate-limiting system. It provides users with the ability to specify request rates and burst limits, either directly through CLI flags or via a detailed YAML configuration. Crucially, the system intelligently responds to server-side rate limits by pausing and retrying requests, preventing unnecessary hammering of APIs and improving overall stability when interacting with external services. This change ensures more compliant and efficient data collection.

Highlights

  • Configurable Rate Limiting: Introduced a new rate-limiting feature that allows for fine-grained control over HTTP requests, configurable globally, per-domain, or via a YAML configuration file.
  • Dynamic Rate Adjustment: The system can dynamically adjust to server-side rate limits by respecting the 'Retry-After' header in HTTP 429 Too Many Requests responses.
  • New CLI Flags: Added new command-line flags (`--rate-limit`, `--burst`, `--rate-config`) to the `collect github repo`, `collect github repos`, and `collect website` commands for flexible user control over rate-limiting behavior.
  • HTTP Client Integration: Refactored existing components (GitHub client, Git cloner, Website downloader) to accept and utilize a custom http.Client, enabling the injection of the new rate-limiting transport.
  • New pkg/http Module: Created a new pkg/http module containing the core rate-limiting logic, including Config parsing, Limiter implementation, and RateLimitingRoundTripper.
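
To illustrate the dynamic adjustment highlight, here is a minimal sketch of how a `Retry-After` value might be turned into a wait duration. It is not taken from this PR: `retryAfter` is a hypothetical helper. The header can carry either a number of seconds or an HTTP date, and the sketch handles both using only the standard library.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// retryAfter converts a Retry-After header value into a delay. The value is
// either a non-negative number of seconds or an HTTP date. The bool result
// is false when the value is empty or cannot be parsed.
func retryAfter(value string) (time.Duration, bool) {
	if value == "" {
		return 0, false
	}
	if secs, err := strconv.Atoi(value); err == nil {
		if secs < 0 {
			return 0, false
		}
		return time.Duration(secs) * time.Second, true
	}
	if t, err := http.ParseTime(value); err == nil {
		d := time.Until(t)
		if d < 0 {
			d = 0 // a date in the past means "retry now"
		}
		return d, true
	}
	return 0, false
}

func main() {
	for _, v := range []string{"5", "Mon, 02 Jan 2006 15:04:05 GMT", "soon"} {
		d, ok := retryAfter(v)
		fmt.Printf("%q -> %v (ok=%v)\n", v, d, ok)
	}
}
```

A caller would typically fall back to its own default delay when the second result is false, and may want to cap very large values.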

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a configurable rate-limiting feature, which is a great addition. The implementation is solid, with a new http.RoundTripper to handle rate limiting and retries. I've identified a few areas for improvement:

  • There's significant code duplication in the command files for handling rate-limit flags. This should be refactored into a helper function.
  • The retry logic in the RateLimitingRoundTripper doesn't handle context cancellation during the wait, and has a potential issue with non-rewindable request bodies.
  • The integration with go-git introduces a global mutex that serializes all git clone operations, which could be a performance bottleneck.
  • There's some unused code in pkg/http/ratelimiter.go that can be removed.

Overall, these are great changes that add valuable functionality. Addressing the feedback will make the implementation more robust and maintainable.

}

// Wait and retry the request once.
time.Sleep(delay)

high

The retry logic has a couple of issues:

  1. Context Cancellation: time.Sleep(delay) blocks and does not respect the request's context. If the request is cancelled, this goroutine will wait for the full delay before returning, leaking resources for that duration.
  2. Non-Rewindable Body: Retrying a request is problematic if the request has a body (POST, PUT, etc.). The transport consumes and closes the body on the first attempt, so unless the request provides GetBody (or the caller buffers the body), the second attempt has nothing left to send. While the current usage in the codebase is for GET requests, this makes the RoundTripper unsafe for general use.

For the context cancellation, you should use a timer that can be interrupted by the context's Done channel.

		timer := time.NewTimer(delay)
		select {
		case <-timer.C:
		case <-req.Context().Done():
			timer.Stop()
			return nil, req.Context().Err()
		}
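
Both points can be sketched together using only the standard library. This is an illustration under stated assumptions, not the PR's code: `waitCtx`, `rewind`, `cancelledWait`, and `replayedBody` are hypothetical names, and the URL is a placeholder. The sketch relies on `http.NewRequest` populating `GetBody` for `*strings.Reader`, `*bytes.Reader`, and `*bytes.Buffer` bodies.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// waitCtx blocks for d or until ctx is done, whichever comes first.
func waitCtx(ctx context.Context, d time.Duration) error {
	timer := time.NewTimer(d)
	defer timer.Stop()
	select {
	case <-timer.C:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

var errBodyNotReplayable = errors.New("request body cannot be replayed")

// rewind returns a copy of req that is safe to send a second time.
// Requests with a body need GetBody to obtain a fresh reader.
func rewind(req *http.Request) (*http.Request, error) {
	retry := req.Clone(req.Context())
	if req.Body == nil || req.Body == http.NoBody {
		return retry, nil
	}
	if req.GetBody == nil {
		return nil, errBodyNotReplayable
	}
	body, err := req.GetBody()
	if err != nil {
		return nil, err
	}
	retry.Body = body
	return retry, nil
}

// cancelledWait shows that a cancelled context ends the wait immediately.
func cancelledWait() error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return waitCtx(ctx, time.Hour)
}

// replayedBody drains a request body once (as a first attempt would),
// rewinds it, and returns what a second attempt would send.
func replayedBody(payload string) (string, error) {
	req, err := http.NewRequest(http.MethodPost, "https://example.invalid/upload", strings.NewReader(payload))
	if err != nil {
		return "", err
	}
	if _, err := io.Copy(io.Discard, req.Body); err != nil {
		return "", err
	}
	retry, err := rewind(req)
	if err != nil {
		return "", err
	}
	b, err := io.ReadAll(retry.Body)
	return string(b), err
}

func main() {
	fmt.Println("wait:", cancelledWait())
	body, err := replayedBody("payload")
	fmt.Println("replayed:", body, err)
}
```

Returning an error for bodies without `GetBody` lets the RoundTripper hand the original 429 response back to the caller instead of sending an empty retry.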

Comment on lines +57 to +63
cloneMutex.Lock()
originalClient := githttp.DefaultClient
githttp.DefaultClient = githttp.NewClient(g.httpClient)
defer func() {
	githttp.DefaultClient = originalClient
	cloneMutex.Unlock()
}()

high

Using a package-level mutex to guard the modification of the global githttp.DefaultClient will serialize all git clone operations across the application. This will become a performance bottleneck if multiple repositories need to be cloned concurrently.

While this appears to be a workaround for a limitation in the go-git library's API for PlainClone, it's an important trade-off to be aware of. The current implementation correctly prevents data races but at the cost of concurrency.

Comment on lines +54 to +87
config := &borghttp.Config{
	Defaults: borghttp.Rate{
		RequestsPerSecond: 1, // GitHub API has strict limits
		Burst:             1,
	},
	Domains: make(map[string]borghttp.Rate),
}

if rateConfig != "" {
	var err error
	config, err = borghttp.ParseConfig(rateConfig)
	if err != nil {
		return fmt.Errorf("error parsing rate config: %w", err)
	}
}

if rateLimit != "" {
	parts := strings.Split(rateLimit, "/")
	if len(parts) != 2 || (parts[1] != "s" && parts[1] != "m") {
		return fmt.Errorf("invalid rate limit format: %s (e.g., 2/s or 120/m)", rateLimit)
	}
	rate, err := strconv.ParseFloat(parts[0], 64)
	if err != nil {
		return fmt.Errorf("invalid rate: %w", err)
	}
	if parts[1] == "m" {
		rate = rate / 60
	}
	config.Defaults.RequestsPerSecond = rate
}

if burst > 0 {
	config.Defaults.Burst = burst
}

medium

The logic for parsing rate-limiting flags and constructing the borghttp.Config is duplicated in cmd/collect_github_repo.go, cmd/collect_github_repos.go, and cmd/collect_website.go.

To improve maintainability and reduce code duplication, consider refactoring this logic into a shared helper function. This function could take the command's flag set and default rate/burst values, and return a configured *borghttp.Config or an error.
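
One possible shape for such a helper is sketched below. It is only an illustration: `Rate` and `Config` are local stand-ins that mirror the fields visible in the excerpt (the real types live in `borghttp`), `parseRate` and `applyOverrides` are hypothetical names, and the check that the rate is positive is an addition that the excerpt does not perform.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Rate and Config are stand-ins for borghttp.Rate and borghttp.Config.
type Rate struct {
	RequestsPerSecond float64
	Burst             int
}

type Config struct {
	Defaults Rate
	Domains  map[string]Rate
}

// parseRate converts a flag value such as "2/s" or "120/m" into
// requests per second.
func parseRate(s string) (float64, error) {
	count, unit, ok := strings.Cut(s, "/")
	if !ok || (unit != "s" && unit != "m") {
		return 0, fmt.Errorf("invalid rate limit format: %s (e.g., 2/s or 120/m)", s)
	}
	rate, err := strconv.ParseFloat(count, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid rate: %w", err)
	}
	// Addition not present in the excerpt: reject zero, negative, and NaN.
	if !(rate > 0) {
		return 0, fmt.Errorf("rate must be positive: %s", s)
	}
	if unit == "m" {
		rate /= 60
	}
	return rate, nil
}

// applyOverrides applies the --rate-limit and --burst flag values on top of
// cfg, leaving the existing defaults in place when a flag is unset.
func applyOverrides(cfg *Config, rateLimit string, burst int) error {
	if rateLimit != "" {
		r, err := parseRate(rateLimit)
		if err != nil {
			return err
		}
		cfg.Defaults.RequestsPerSecond = r
	}
	if burst > 0 {
		cfg.Defaults.Burst = burst
	}
	return nil
}

func main() {
	cfg := &Config{Defaults: Rate{RequestsPerSecond: 1, Burst: 1}, Domains: map[string]Rate{}}
	if err := applyOverrides(cfg, "120/m", 5); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg.Defaults)
}
```

Each command would then keep only its own default rate and burst, and call the helper after optionally loading the YAML file.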

Comment on lines +1 to +28
package http

import (
	"context"

	"golang.org/x/time/rate"
)

// Limiter is a rate limiter that can be dynamically adjusted.
type Limiter struct {
	limiter *rate.Limiter
}

// NewLimiter creates a new Limiter.
func NewLimiter(r rate.Limit, b int) *Limiter {
	return &Limiter{
		limiter: rate.NewLimiter(r, b),
	}
}

// Wait waits for a token from the bucket.
func (l *Limiter) Wait(ctx context.Context) error {
	return l.limiter.Wait(ctx)
}

// SetLimit sets the rate limit.
func (l *Limiter) SetLimit(r rate.Limit) {
	l.limiter.SetLimit(r)
}

medium

This Limiter struct and its associated functions in this file appear to be unused in the application. It's a wrapper around rate.Limiter that doesn't add any new functionality. Consider removing this file and its test to reduce code complexity.

Development

Successfully merging this pull request may close these issues.

feat: Configurable rate limiting per domain
