Configurable Rate Limiting per Domain #88

Snider · 2026-02-02T00:53:47Z

This submission introduces a new rate-limiting feature that allows for fine-grained control over the application's HTTP requests. The system is configurable on a per-domain basis and can dynamically adjust to server-side rate limits. New CLI flags have been added to provide users with flexible control over the rate-limiting behavior.

Fixes #51

PR created automatically by Jules for task 1795631143193828042 started by @Snider

This commit introduces a configurable rate-limiting system for all HTTP requests made by the application. Key features include: - A token bucket algorithm for rate limiting. - Per-domain configuration via a YAML file (`--rate-config`). - Wildcard domain matching (e.g., `*.archive.org`). - Dynamic adjustments based on `429` responses and `Retry-After` headers. - New CLI flags (`--rate-limit`, `--burst`) for on-the-fly configuration. I began by creating a new `http` package to centralize the rate-limiting logic. I then integrated this package into the `website` and `github` collectors, ensuring that all outgoing HTTP requests are subject to the new rate-limiting rules. Throughout the implementation, I added comprehensive unit and integration tests to validate the new functionality. This process also uncovered several pre-existing issues in the test suite, which I have now fixed. These fixes include: - Correcting mock implementations for `http.Client` and `vcs.GitCloner`. - Updating outdated function signatures in tests and examples. - Resolving missing dependencies and syntax errors in test files. - Stabilizing flaky tests. Co-authored-by: Snider <631881+Snider@users.noreply.github.com>

google-labs-jules · 2026-02-02T00:53:48Z

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.

For security, I will only act on instructions from the user who triggered this task.

coderabbitai · 2026-02-02T00:53:56Z

Warning

Rate limit exceeded

@Snider has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 17 minutes and 6 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch feat/rate-limiting-1795631143193828042

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist · 2026-02-02T00:54:15Z

Summary of Changes

Hello @Snider, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the application's network request handling by introducing a robust and configurable rate-limiting system. It provides users with the ability to specify request rates and burst limits, either directly through CLI flags or via a detailed YAML configuration. Crucially, the system intelligently responds to server-side rate limits by pausing and retrying requests, preventing unnecessary hammering of APIs and improving overall stability when interacting with external services. This change ensures more compliant and efficient data collection.

Highlights

Configurable Rate Limiting: Introduced a new rate-limiting feature that allows for fine-grained control over HTTP requests, configurable globally, per-domain, or via a YAML configuration file.
Dynamic Rate Adjustment: The system can dynamically adjust to server-side rate limits by respecting the 'Retry-After' header in HTTP 429 Too Many Requests responses.
New CLI Flags: Added new command-line flags (--rate-limit, --burst, --rate-config) to collect github repo, collect github repos, and collect website commands for flexible user control over rate-limiting behavior.
HTTP Client Integration: Refactored existing components (GitHub client, Git cloner, Website downloader) to accept and utilize a custom http.Client, enabling the injection of the new rate-limiting transport.
New pkg/http Module: Created a new pkg/http module containing the core rate-limiting logic, including Config parsing, Limiter implementation, and RateLimitingRoundTripper.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces a configurable rate-limiting feature, which is a great addition. The implementation is solid, with a new http.RoundTripper to handle rate limiting and retries. I've identified a few areas for improvement:

There's significant code duplication in the command files for handling rate-limit flags. This should be refactored into a helper function.
The retry logic in the RateLimitingRoundTripper doesn't handle context cancellation during the wait, and has a potential issue with non-rewindable request bodies.
The integration with go-git introduces a global mutex that serializes all git clone operations, which could be a performance bottleneck.
There's some unused code in pkg/http/ratelimiter.go that can be removed.
Overall, these are great changes that add valuable functionality. Addressing the feedback will make the implementation more robust and maintainable.

gemini-code-assist · 2026-02-02T00:56:03Z

pkg/http/roundtripper.go

+		}
+
+		// Wait and retry the request once.
+		time.Sleep(delay)


The retry logic has a couple of issues:

Context Cancellation: time.Sleep(delay) blocks and does not respect the request's context. If the request is cancelled, this goroutine will wait for the full delay before returning, leaking resources for that duration.

Non-Rewindable Body: Retrying a request can be problematic if the request has a body (POST, PUT, etc.). If the request body is an io.Reader that is not also an io.ReadSeeker, it can only be read once. The second attempt to send the request will fail because the body has already been consumed. While the current usage in the codebase is for GET requests, this makes the RoundTripper unsafe for general use.

For the context cancellation, you should use a timer that can be interrupted by the context's Done channel.

timer := time.NewTimer(delay) select { case <-timer.C: case <-req.Context().Done(): timer.Stop() return nil, req.Context().Err() }

gemini-code-assist · 2026-02-02T00:56:03Z

pkg/vcs/git.go

+	cloneMutex.Lock()
+	originalClient := githttp.DefaultClient
+	githttp.DefaultClient = githttp.NewClient(g.httpClient)
+	defer func() {
+		githttp.DefaultClient = originalClient
+		cloneMutex.Unlock()
+	}()


Using a package-level mutex to guard the modification of the global githttp.DefaultClient will serialize all git clone operations across the application. This will become a performance bottleneck if multiple repositories need to be cloned concurrently.

While this appears to be a workaround for a limitation in the go-git library's API for PlainClone, it's an important trade-off to be aware of. The current implementation correctly prevents data races but at the cost of concurrency.

gemini-code-assist · 2026-02-02T00:56:03Z

cmd/collect_github_repo.go

+			config := &borghttp.Config{
+				Defaults: borghttp.Rate{
+					RequestsPerSecond: 1, // GitHub API has strict limits
+					Burst:             1,
+				},
+				Domains: make(map[string]borghttp.Rate),
+			}
+
+			if rateConfig != "" {
+				var err error
+				config, err = borghttp.ParseConfig(rateConfig)
+				if err != nil {
+					return fmt.Errorf("error parsing rate config: %w", err)
+				}
+			}
+
+			if rateLimit != "" {
+				parts := strings.Split(rateLimit, "/")
+				if len(parts) != 2 || (parts[1] != "s" && parts[1] != "m") {
+					return fmt.Errorf("invalid rate limit format: %s (e.g., 2/s or 120/m)", rateLimit)
+				}
+				rate, err := strconv.ParseFloat(parts[0], 64)
+				if err != nil {
+					return fmt.Errorf("invalid rate: %w", err)
+				}
+				if parts[1] == "m" {
+					rate = rate / 60
+				}
+				config.Defaults.RequestsPerSecond = rate
+			}
+
+			if burst > 0 {
+				config.Defaults.Burst = burst
+			}


The logic for parsing rate-limiting flags and constructing the borghttp.Config is duplicated in cmd/collect_github_repo.go, cmd/collect_github_repos.go, and cmd/collect_website.go.

To improve maintainability and reduce code duplication, consider refactoring this logic into a shared helper function. This function could take the command's flag set and default rate/burst values, and return a configured *borghttp.Config or an error.

gemini-code-assist · 2026-02-02T00:56:04Z

pkg/http/ratelimiter.go

+package http
+
+import (
+	"context"
+	"golang.org/x/time/rate"
+)
+
+// Limiter is a rate limiter that can be dynamically adjusted.
+type Limiter struct {
+	limiter *rate.Limiter
+}
+
+// NewLimiter creates a new Limiter.
+func NewLimiter(r rate.Limit, b int) *Limiter {
+	return &Limiter{
+		limiter: rate.NewLimiter(r, b),
+	}
+}
+
+// Wait waits for a token from the bucket.
+func (l *Limiter) Wait(ctx context.Context) error {
+	return l.limiter.Wait(ctx)
+}
+
+// SetLimit sets the rate limit.
+func (l *Limiter) SetLimit(r rate.Limit) {
+	l.limiter.SetLimit(r)
+}


This Limiter struct and its associated functions in this file appear to be unused in the application. It's a wrapper around rate.Limiter that doesn't add any new functionality. Consider removing this file and its test to reduce code complexity.

google-labs-jules bot mentioned this pull request Feb 2, 2026

feat: Configurable rate limiting per domain #51

Open

5 tasks

gemini-code-assist bot reviewed Feb 2, 2026

View reviewed changes

Copilot AI mentioned this pull request Feb 2, 2026

[WIP] Combine multiple PRs into a single squash commit #112

Merged

9 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Configurable Rate Limiting per Domain #88

Configurable Rate Limiting per Domain #88

Uh oh!

Snider commented Feb 2, 2026

Uh oh!

google-labs-jules bot commented Feb 2, 2026

Uh oh!

coderabbitai bot commented Feb 2, 2026

Rate limit exceeded

Uh oh!

gemini-code-assist bot commented Feb 2, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Feb 2, 2026

Uh oh!

gemini-code-assist bot Feb 2, 2026

Uh oh!

gemini-code-assist bot Feb 2, 2026

Uh oh!

gemini-code-assist bot Feb 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Configurable Rate Limiting per Domain #88

Are you sure you want to change the base?

Configurable Rate Limiting per Domain #88

Uh oh!

Conversation

Snider commented Feb 2, 2026

Uh oh!

google-labs-jules bot commented Feb 2, 2026

Uh oh!

coderabbitai bot commented Feb 2, 2026

Rate limit exceeded

Uh oh!

gemini-code-assist bot commented Feb 2, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Feb 2, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Feb 2, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Feb 2, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Feb 2, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants