-
Notifications
You must be signed in to change notification settings - Fork 0
Configurable Rate Limiting per Domain #88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This commit introduces a configurable rate-limiting system for all HTTP requests made by the application. Key features include: - A token bucket algorithm for rate limiting. - Per-domain configuration via a YAML file (`--rate-config`). - Wildcard domain matching (e.g., `*.archive.org`). - Dynamic adjustments based on `429` responses and `Retry-After` headers. - New CLI flags (`--rate-limit`, `--burst`) for on-the-fly configuration. I began by creating a new `http` package to centralize the rate-limiting logic. I then integrated this package into the `website` and `github` collectors, ensuring that all outgoing HTTP requests are subject to the new rate-limiting rules. Throughout the implementation, I added comprehensive unit and integration tests to validate the new functionality. This process also uncovered several pre-existing issues in the test suite, which I have now fixed. These fixes include: - Correcting mock implementations for `http.Client` and `vcs.GitCloner`. - Updating outdated function signatures in tests and examples. - Resolving missing dependencies and syntax errors in test files. - Stabilizing flaky tests. Co-authored-by: Snider <631881+Snider@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello @Snider, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the application's network request handling by introducing a robust and configurable rate-limiting system. It provides users with the ability to specify request rates and burst limits, either directly through CLI flags or via a detailed YAML configuration. Crucially, the system intelligently responds to server-side rate limits by pausing and retrying requests, preventing unnecessary hammering of APIs and improving overall stability when interacting with external services. This change ensures more compliant and efficient data collection. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a configurable rate-limiting feature, which is a great addition. The implementation is solid, with a new http.RoundTripper to handle rate limiting and retries. I've identified a few areas for improvement:
- There's significant code duplication in the command files for handling rate-limit flags. This should be refactored into a helper function.
- The retry logic in the
RateLimitingRoundTripperdoesn't handle context cancellation during the wait, and has a potential issue with non-rewindable request bodies. - The integration with
go-gitintroduces a global mutex that serializes all git clone operations, which could be a performance bottleneck. - There's some unused code in
pkg/http/ratelimiter.gothat can be removed.
Overall, these are great changes that add valuable functionality. Addressing the feedback will make the implementation more robust and maintainable.
| } | ||
|
|
||
| // Wait and retry the request once. | ||
| time.Sleep(delay) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The retry logic has a couple of issues:
- Context Cancellation:
time.Sleep(delay)blocks and does not respect the request's context. If the request is cancelled, this goroutine will wait for the full delay before returning, leaking resources for that duration. - Non-Rewindable Body: Retrying a request can be problematic if the request has a body (
POST,PUT, etc.). If the request body is anio.Readerthat is not also anio.ReadSeeker, it can only be read once. The second attempt to send the request will fail because the body has already been consumed. While the current usage in the codebase is forGETrequests, this makes theRoundTripperunsafe for general use.
For the context cancellation, you should use a timer that can be interrupted by the context's Done channel.
timer := time.NewTimer(delay)
select {
case <-timer.C:
case <-req.Context().Done():
timer.Stop()
return nil, req.Context().Err()
}| cloneMutex.Lock() | ||
| originalClient := githttp.DefaultClient | ||
| githttp.DefaultClient = githttp.NewClient(g.httpClient) | ||
| defer func() { | ||
| githttp.DefaultClient = originalClient | ||
| cloneMutex.Unlock() | ||
| }() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a package-level mutex to guard the modification of the global githttp.DefaultClient will serialize all git clone operations across the application. This will become a performance bottleneck if multiple repositories need to be cloned concurrently.
While this appears to be a workaround for a limitation in the go-git library's API for PlainClone, it's an important trade-off to be aware of. The current implementation correctly prevents data races but at the cost of concurrency.
| config := &borghttp.Config{ | ||
| Defaults: borghttp.Rate{ | ||
| RequestsPerSecond: 1, // GitHub API has strict limits | ||
| Burst: 1, | ||
| }, | ||
| Domains: make(map[string]borghttp.Rate), | ||
| } | ||
|
|
||
| if rateConfig != "" { | ||
| var err error | ||
| config, err = borghttp.ParseConfig(rateConfig) | ||
| if err != nil { | ||
| return fmt.Errorf("error parsing rate config: %w", err) | ||
| } | ||
| } | ||
|
|
||
| if rateLimit != "" { | ||
| parts := strings.Split(rateLimit, "/") | ||
| if len(parts) != 2 || (parts[1] != "s" && parts[1] != "m") { | ||
| return fmt.Errorf("invalid rate limit format: %s (e.g., 2/s or 120/m)", rateLimit) | ||
| } | ||
| rate, err := strconv.ParseFloat(parts[0], 64) | ||
| if err != nil { | ||
| return fmt.Errorf("invalid rate: %w", err) | ||
| } | ||
| if parts[1] == "m" { | ||
| rate = rate / 60 | ||
| } | ||
| config.Defaults.RequestsPerSecond = rate | ||
| } | ||
|
|
||
| if burst > 0 { | ||
| config.Defaults.Burst = burst | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic for parsing rate-limiting flags and constructing the borghttp.Config is duplicated in cmd/collect_github_repo.go, cmd/collect_github_repos.go, and cmd/collect_website.go.
To improve maintainability and reduce code duplication, consider refactoring this logic into a shared helper function. This function could take the command's flag set and default rate/burst values, and return a configured *borghttp.Config or an error.
| package http | ||
|
|
||
| import ( | ||
| "context" | ||
| "golang.org/x/time/rate" | ||
| ) | ||
|
|
||
| // Limiter is a rate limiter that can be dynamically adjusted. | ||
| type Limiter struct { | ||
| limiter *rate.Limiter | ||
| } | ||
|
|
||
| // NewLimiter creates a new Limiter. | ||
| func NewLimiter(r rate.Limit, b int) *Limiter { | ||
| return &Limiter{ | ||
| limiter: rate.NewLimiter(r, b), | ||
| } | ||
| } | ||
|
|
||
| // Wait waits for a token from the bucket. | ||
| func (l *Limiter) Wait(ctx context.Context) error { | ||
| return l.limiter.Wait(ctx) | ||
| } | ||
|
|
||
| // SetLimit sets the rate limit. | ||
| func (l *Limiter) SetLimit(r rate.Limit) { | ||
| l.limiter.SetLimit(r) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This submission introduces a new rate-limiting feature that allows for fine-grained control over the application's HTTP requests. The system is configurable on a per-domain basis and can dynamically adjust to server-side rate limits. New CLI flags have been added to provide users with flexible control over the rate-limiting behavior.
Fixes #51
PR created automatically by Jules for task 1795631143193828042 started by @Snider