feat: Implement bandwidth limiting for collect commands #90

Snider · 2026-02-02T00:54:04Z

This commit introduces a new bandwidth limiting feature to the borg collect command. The feature is implemented using a token bucket algorithm in a new pkg/ratelimit package. The rate limiter is integrated with the http.Client via a custom http.RoundTripper, and the feature is exposed to the user through a new --bandwidth flag on the collect command. The bandwidth limiting feature has been applied to the website and github collectors, and unit and integration tests have been added to verify the functionality. However, there are still some build errors in the cmd package that need to be addressed.

Fixes #58

PR created automatically by Jules for task 2342788311576399235 started by @Snider

This commit introduces a new bandwidth limiting feature to the `borg collect` command. The feature is implemented using a token bucket algorithm in a new `pkg/ratelimit` package. The rate limiter is integrated with the `http.Client` via a custom `http.RoundTripper`, and the feature is exposed to the user through a new `--bandwidth` flag on the `collect` command. The bandwidth limiting feature has been applied to the `website` and `github` collectors, and unit and integration tests have been added to verify the functionality. The following changes have been made: - Created a new `pkg/ratelimit` package with a token bucket implementation. - Integrated the rate limiter with `http.Client` using a custom `http.RoundTripper`. - Added a `--bandwidth` flag to the `collect` command. - Applied the bandwidth limit to the `website` and `github` collectors. - Added unit tests for the rate limiter and bandwidth parsing logic. - Added integration tests for the `collect website` and `collect github repo` commands. The following issues were encountered and were being addressed when the session ended: - Build errors in the `cmd` package, specifically in `cmd/all.go` and `cmd/all_test.go`. - The need for a `MockGithubClient` in the `mocks` package. - The `website` package needs to be refactored to reduce code duplication. - The rate limiter's performance can be improved. Co-authored-by: Snider <631881+Snider@users.noreply.github.com>

google-labs-jules · 2026-02-02T00:54:05Z

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.

For security, I will only act on instructions from the user who triggered this task.

coderabbitai · 2026-02-02T00:54:13Z

Warning

Rate limit exceeded

@Snider has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 16 minutes and 49 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch feat/bandwidth-limiting-2342788311576399235

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist · 2026-02-02T00:54:26Z

Summary of Changes

Hello @Snider, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the borg collect command by adding a crucial bandwidth limiting feature. This allows users to control the rate at which data is downloaded from external sources like websites and GitHub, preventing network congestion and providing more granular control over resource usage during data collection operations. The implementation uses a robust token bucket algorithm and is seamlessly integrated into the existing HTTP client infrastructure.

Highlights

Bandwidth Limiting: Introduced a new --bandwidth flag to the collect command, allowing users to specify a download rate limit (e.g., '1MB/s', '500KB/s').
Rate Limiter Implementation: A new pkg/ratelimit package was added, implementing a token bucket algorithm to control data transfer rates. This includes a rateLimitedRoundTripper for http.Client.
Integration with Collectors: The bandwidth limiting feature has been integrated into both the website and github data collectors, ensuring that their HTTP requests respect the specified rate.
Testing: Comprehensive unit and integration tests have been added for the new ratelimit package and to verify the --bandwidth flag's functionality across different collect subcommands.
Refactoring: The pkg/github and pkg/website packages were refactored to allow injecting custom http.Client instances, facilitating the integration of the rate limiter and improving testability.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces a valuable bandwidth limiting feature. The implementation is well-structured, particularly the use of a custom http.RoundTripper to integrate the rate limiter. My review focuses on improving the robustness and performance of the new ratelimit package. I've identified a critical bug that can lead to a panic, and I'm recommending replacing the custom rate limiter with the standard golang.org/x/time/rate package to address several potential issues like resource leaks and performance. I've also included some suggestions to improve the tests for this new feature.

gemini-code-assist · 2026-02-02T00:56:03Z

pkg/ratelimit/ratelimit.go

+	if bytesPerSec <= 0 {
+		return transport
+	}
+	return &rateLimitedRoundTripper{
+		transport: transport,
+		limiter:   NewLimiter(bytesPerSec, time.Second),
+	}


This implementation can cause a panic if transport is nil and bytesPerSec > 0. The rateLimitedRoundTripper will be created with a nil transport, and a subsequent call to RoundTrip will cause a nil pointer dereference (e.g. when called from cmd/all.go).

You should default to http.DefaultTransport if the provided transport is nil.

if bytesPerSec <= 0 { return transport } if transport == nil { transport = http.DefaultTransport } return &rateLimitedRoundTripper{ transport: transport, limiter: NewLimiter(bytesPerSec, time.Second), }

gemini-code-assist · 2026-02-02T00:56:03Z

pkg/ratelimit/ratelimit.go

+// Limiter is a simple token bucket rate limiter.
+type Limiter struct {
+	c chan time.Time
+}
+
+// NewLimiter creates a new Limiter.
+func NewLimiter(rate int64, per time.Duration) *Limiter {
+	l := &Limiter{
+		c: make(chan time.Time, rate),
+	}
+	go func() {
+		ticker := time.NewTicker(per / time.Duration(rate))
+		defer ticker.Stop()
+		for t := range ticker.C {
+			select {
+			case l.c <- t:
+			default:
+			}
+		}
+	}()
+	return l
+}
+
+// Wait blocks until a token is available.
+func (l *Limiter) Wait() {
+	<-l.c
+}


While this custom token bucket implementation is clever, it has a few potential issues:

Goroutine Leak: The goroutine started in NewLimiter runs indefinitely and has no mechanism to be stopped. In a long-running service, this would constitute a resource leak.

Potential Panic: time.NewTicker panics if its duration argument is not positive. If rate is very large (e.g., > 10^9 for per=time.Second), per / time.Duration(rate) can become zero, causing a panic.

Performance: The Read method in rateLimitedResponseBody waits for each byte individually (for i := 0; i < n; i++ { b.limiter.Wait() }). This results in high overhead from channel operations for large reads.

Consider using the standard golang.org/x/time/rate package, which provides a robust, performant, and well-tested token bucket implementation that avoids these issues.

gemini-code-assist · 2026-02-02T00:56:03Z

cmd/collect_github_repo_test.go

+func TestCollectGithubRepoCmd_Bandwidth(t *testing.T) {
+	// Setup mock Git cloner
+	mockCloner := &mocks.MockGitCloner{
+		DN:  datanode.New(),
+		Err: nil,
+	}
+	oldCloner := GitCloner
+	GitCloner = mockCloner
+	defer func() {
+		GitCloner = oldCloner
+	}()
+
+	rootCmd := NewRootCmd()
+	rootCmd.AddCommand(GetCollectCmd())
+
+	// Execute command with a bandwidth limit
+	out := filepath.Join(t.TempDir(), "out")
+	_, err := executeCommand(rootCmd, "collect", "github", "repo", "https://github.com/testuser/repo1", "--output", out, "--bandwidth", "1KB/s")
+	if err != nil {
+		t.Fatalf("collect github repo command failed: %v", err)
+	}
+}


This test is named TestCollectGithubRepoCmd_Bandwidth, but it doesn't seem to actually test the bandwidth limiting functionality. It mocks the GitCloner and only verifies that the command runs without error when the --bandwidth flag is present.

For this test to be effective, it should verify that the download speed is actually being limited. This would likely require not mocking the git cloning process and instead using a test repository, similar to how TestCollectWebsiteCmd_Bandwidth uses an httptest.Server.

Additionally, it's unclear if the bandwidth limiting (which is implemented at the http.RoundTripper level) is applied to the git clone operation. Please clarify if this is intended and implemented.

gemini-code-assist · 2026-02-02T00:56:03Z

cmd/collect_github_repos_test.go

+// getPublicReposWithAPIURL is a copy of the private function in pkg/github/github.go, so we can test it.
+type githubClient struct {
+	client *http.Client
+}
+
+var getPublicReposWithAPIURL func(g *githubClient, ctx context.Context, apiURL, userOrOrg string) ([]string, error)


This appears to be dead code, likely a leftover from before GetPublicReposWithAPIURL was made public. It should be removed to improve code clarity.

google-labs-jules bot mentioned this pull request Feb 2, 2026

feat: Bandwidth limiting #58

Open

5 tasks

gemini-code-assist bot reviewed Feb 2, 2026

View reviewed changes

Copilot AI mentioned this pull request Feb 2, 2026

[WIP] Combine multiple PRs into a single squash commit #112

Merged

9 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Implement bandwidth limiting for collect commands #90

feat: Implement bandwidth limiting for collect commands #90

Uh oh!

Snider commented Feb 2, 2026

Uh oh!

google-labs-jules bot commented Feb 2, 2026

Uh oh!

coderabbitai bot commented Feb 2, 2026

Rate limit exceeded

Uh oh!

gemini-code-assist bot commented Feb 2, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Feb 2, 2026

Uh oh!

gemini-code-assist bot Feb 2, 2026

Uh oh!

gemini-code-assist bot Feb 2, 2026

Uh oh!

gemini-code-assist bot Feb 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

feat: Implement bandwidth limiting for collect commands #90

Are you sure you want to change the base?

feat: Implement bandwidth limiting for collect commands #90

Uh oh!

Conversation

Snider commented Feb 2, 2026

Uh oh!

google-labs-jules bot commented Feb 2, 2026

Uh oh!

coderabbitai bot commented Feb 2, 2026

Rate limit exceeded

Uh oh!

gemini-code-assist bot commented Feb 2, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Feb 2, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Feb 2, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Feb 2, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Feb 2, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants